In this post, we are going to walk through the essential definitions in probability theory. Understanding these concepts is critical for comprehending more advanced concepts in probability theory. Here, you will learn:

  • The key concepts, such as random variables and conditional probabilities.
  • How these notions are related.

Random Variables

The random variable is one of the essential definitions in probability theory. The outcomes of a random phenomenon determine the values of a random variable. In other words, a random variable is a variable whose values are determined by a random event. A random variable must be measurable, which is what allows us to assign probabilities to its possible values. The domain of a random variable is the sample space. For example, when rolling a die, only six outcomes are possible: {1,2,3,4,5,6}.


In mathematics, a random variable is usually denoted with upper-case Roman letters such as X and Y. However, this notation is not universal, and you should expect to see different conventions. Don’t worry: the notation in use is usually stated in whatever article you read.

Using more precise mathematical notation, a random variable X is a measurable function X: \Omega \rightarrow E from the sample space \Omega of all possible outcomes to some measurable space E. Let’s have an illustrative example. Assume we would like to roll a die, and the event of interest is that the result is at most 4. Here, we have \Omega=\{1,2,3,4,5,6\}, the random variable X is the outcome of rolling the die, and the event is \{X \leq 4\}.
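The die example can be sketched in a few lines of Python. This is only an illustration under the assumption of a fair die; the names `omega`, `X`, and `probability` are made up for this sketch, and the event is taken to be X \leq 4.

```python
from fractions import Fraction

# Sample space of a fair six-sided die; every outcome is assumed equally likely.
omega = [1, 2, 3, 4, 5, 6]

def X(outcome):
    """Random variable: maps each outcome to the number shown on the die."""
    return outcome

def probability(event):
    """P(event) under the uniform measure: favourable outcomes / all outcomes."""
    favourable = [w for w in omega if event(w)]
    return Fraction(len(favourable), len(omega))

# Event: the die shows a value of at most 4.
p_event = probability(lambda w: X(w) <= 4)
print(p_event)  # 2/3
```

Exact fractions are used instead of floats so that the result can be compared directly with a hand calculation.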


Conditional probability

One of the essential definitions in probability theory is conditional probability. This is because many events depend on preceding events or on partial information that is available. Recognizing and calculating this dependency can lead to a more precise probability estimate. For example, how many layers you should wear depends on the weather!

Conditional probability definition

Conditional probability, in simple terms, refers to the likelihood of an event (or chain of events) given that another event (or chain of events) has happened.

Let’s have an example. Assume we roll a die and then flip a fair coin. What is the probability of getting the number one (Event A) when rolling the die and getting heads (Event B) when flipping the coin? Clearly, A and B are independent, since the die has no influence on the coin. So, the probability of getting the number one (in rolling the die) and getting heads (in flipping the coin) is as below:

    \[P(A \cap B) = P(A)P(B)={\frac{1}{6} \times \frac{1}{2}} = \frac{1}{12}\]
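One way to check this value is to enumerate the joint sample space of the die and the coin. The sketch below assumes a fair die and a fair coin; the helper `probability` and the labels "H" and "T" are illustrative choices, not part of the original example.

```python
from fractions import Fraction
from itertools import product

die = [1, 2, 3, 4, 5, 6]
coin = ["H", "T"]

# Joint sample space: every (die, coin) pair; all 12 pairs are equally likely.
omega = list(product(die, coin))

def probability(event):
    """Fraction of equally likely outcomes for which the event holds."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def A(w):
    """Event A: the die shows one."""
    return w[0] == 1

def B(w):
    """Event B: the coin shows heads."""
    return w[1] == "H"

p_A = probability(A)
p_B = probability(B)
p_A_and_B = probability(lambda w: A(w) and B(w))
print(p_A, p_B, p_A_and_B)  # 1/6 1/2 1/12
```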

Let’s take a look at a conditional situation. Assume we want to calculate the probability of getting heads (in flipping the coin) given that we got one (in rolling the die). This is a conditional statement: B is conditioned on A having happened, which is written P(B|A).

The mathematical formulation of the conditional probability of two events is as below:

Conditional probability calculation

If P(A)>0, then the conditional probability of B given A is as below:

    \[P(B|A)=\frac{P(A \cap B)}{P(A)}\]

NOTE: If the two events A and B are independent, i.e., P(A \cap B)=P(A)P(B), we have:

    \[P(B|A)=\frac{P(A \cap B)}{P(A)}=\frac{P(A)P(B)}{P(A)}=P(B)\]
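As a small sanity check, the conditional probability of heads given that the die shows one can be computed by dividing P(A \cap B) by the probability of the conditioning event A. The code below is a sketch for the fair die and fair coin only; `conditional` is a hypothetical helper written for this illustration.

```python
from fractions import Fraction
from itertools import product

# Joint sample space of a fair die and a fair coin (12 equally likely pairs).
omega = list(product(range(1, 7), "HT"))

def probability(event):
    """Fraction of equally likely outcomes for which the event holds."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def conditional(b, a):
    """P(b | a) = P(a and b) / P(a); only defined when P(a) > 0."""
    p_a = probability(a)
    if p_a == 0:
        raise ValueError("conditioning event has probability zero")
    return probability(lambda w: a(w) and b(w)) / p_a

def A(w):
    """Event A: the die shows one."""
    return w[0] == 1

def B(w):
    """Event B: the coin shows heads."""
    return w[1] == "H"

p_B_given_A = conditional(B, A)
print(p_B_given_A)  # 1/2
```

Because the die and the coin do not influence each other, conditioning on A leaves the probability of heads unchanged.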

Bayes’ Rule

In order to explain Bayes’ rule, let’s start with a simple special case. Assume we have two events A and B. Do you agree with the following statement?

    \[A = A \cap \Omega = A \cap (B \cup B^c) = (A  \cap B) \cup (A  \cap B^c)\]

A = A \cap \Omega: The intersection of any event with the sample space is that particular event.
\Omega = B \cup B^c: The union of any event with its complement event is the whole sample space.

NOTE: The events E_1 = A \cap B and E_2 = A \cap B^c are mutually exclusive. Can you see why?

Now, let’s calculate the probability of event A:

(1)   \begin{equation*} \begin{split}P(A) & = P(A \cap (B \cup B^c)) = P(A  \cap B) + P(A  \cap B^c) \\& = P(A | B) P(B) + P(A | B^c) P(B^c)\end{split}\end{equation*}

Above, we used the conditional probability rules to expand the event intersections. The above equation asserts that the probability of event A is a weighted average of the conditional probability of A given that B has occurred and the conditional probability of A given that B has not happened. This is a very useful formula as a lot of times, directly calculating the probability of an event such as A may not be easy or even possible. This rule conditions the likelihood of an event on different events.
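To see the weighted average at work, here is a minimal sketch with a single fair die. The events are chosen purely for illustration: A is "the die shows an even number" and B is "the die shows at most 3".

```python
from fractions import Fraction

# Sample space of a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]

def probability(event):
    """Fraction of equally likely outcomes for which the event holds."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def conditional(a, b):
    """P(a | b) = P(a and b) / P(b), assuming P(b) > 0."""
    return probability(lambda w: a(w) and b(w)) / probability(b)

def A(w):
    """Illustrative event A: the die shows an even number."""
    return w % 2 == 0

def B(w):
    """Illustrative event B: the die shows at most 3."""
    return w <= 3

def B_c(w):
    """Complement of B: the die shows more than 3."""
    return not B(w)

# Weighted average of the two conditional probabilities.
total = conditional(A, B) * probability(B) + conditional(A, B_c) * probability(B_c)
print(total, probability(A))  # 1/2 1/2
```

Here P(A|B) = 1/3 and P(A|B^c) = 2/3, each weighted by 1/2, which recovers P(A) = 1/2.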


The fact is, in this rule, the probability of an event is conditioned on different mutually exclusive events (B and B^c), which together form the whole sample space as B \cup B^c = \Omega.

The above special case can be extended to the more general rule below. It is usually called the law of total probability, and it is the key ingredient of Bayes’ rule.

Law of total probability

Assume we have some mutually exclusive events E_1, \dots, E_n with \Omega = \bigcup_{i=1}^{n}E_i, i.e., exactly one of the E_i events must occur. Then, the probability of the event A is calculated as below:

(2)   \begin{equation*} \begin{split}P(A) = \sum_{i=1}^{n}P(A|E_i)P(E_i)\end{split}\end{equation*}
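As a rough sketch of formula (2), consider a hypothetical two-stage experiment that is not part of the text above: roll a fair die, then flip a fair coin as many times as the die shows. Take E_i as "the die shows i" and A as "at least one flip is heads", so that P(A|E_i) = 1 - (1/2)^i. The last lines also invert the conditioning via P(E_1|A) = P(A|E_1)P(E_1)/P(A), which is the form in which Bayes’ rule is usually stated.

```python
from fractions import Fraction

# Hypothetical experiment (an assumption for this sketch): roll a fair die,
# then flip a fair coin as many times as the die shows.
# E_i = "the die shows i"; A = "at least one flip is heads".
prior = {i: Fraction(1, 6) for i in range(1, 7)}          # P(E_i)
likelihood = {i: 1 - Fraction(1, 2) ** i for i in prior}  # P(A | E_i)

# Formula (2): sum of P(A | E_i) * P(E_i) over the partition.
p_A = sum(likelihood[i] * prior[i] for i in prior)
print(p_A)  # 107/128

# Inverting the conditioning: P(E_i | A) = P(A | E_i) * P(E_i) / P(A).
posterior = {i: likelihood[i] * prior[i] / p_A for i in prior}
print(posterior[1])  # 32/321
```

The posterior values sum to one, as they must, because the E_i events still partition the sample space after conditioning on A.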

The concept of independence

We previously discussed conditional probabilities. You learned about the probability of an event A conditioned on another event B, written P(A|B). Intuitively, we can infer that if A and B are independent, then P(A|B) does not depend on B, and it simply equals P(A). We previously investigated the formulation as well. Technically, being independent is mutual, i.e., when P(A|B)=P(A), then P(B|A)=P(B) and vice versa. We can conclude that when A and B are independent, then P(A \cap B)=P(A)P(B). This formulation leads to the following definition:

Independent Events

Two events A and B are said to be independent if P(A \cap B)=P(A)P(B). Otherwise, they are called dependent.
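The definition translates directly into a check that compares P(A \cap B) with P(A)P(B). The sketch below uses a single fair die, and the three events are chosen only for illustration; note that independence is not always obvious from the description of the events.

```python
from fractions import Fraction

# Sample space of a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]

def probability(event):
    """Fraction of equally likely outcomes for which the event holds."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def independent(a, b):
    """True when P(a and b) equals P(a) * P(b)."""
    return probability(lambda w: a(w) and b(w)) == probability(a) * probability(b)

def even(w):
    return w % 2 == 0

def at_most_2(w):
    return w <= 2

def at_most_3(w):
    return w <= 3

print(independent(even, at_most_2))  # True:  1/6 == 1/2 * 1/3
print(independent(even, at_most_3))  # False: 1/6 != 1/2 * 1/2
```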


In a previous post, you learned what probability is, along with some mathematical background. In this post, you acquired knowledge about the fundamentals of probability theory and its key concepts. What you have learned so far aims to prepare you to use probability notions and to strengthen your background for more advanced probabilistic concepts in Machine Learning. Do you have any questions or suggestions? Feel free to comment and share your point of view.
