Probability theory is the branch of mathematics concerned with probability. The notion of probability is used to measure the level of uncertainty. Probability theory aims to represent uncertain phenomena in terms of a set of axioms. In short, when we cannot be exact about the possible outcomes of a system, we represent the situation using the likelihood of different outcomes and scenarios.

The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful, none of which (fortunately) we have to reason on. Therefore the true logic for this world is the calculus of Probabilities, which takes account of the magnitude of the probability which is, or ought to be, in a reasonable man’s mind.

James Clerk Maxwell

In this post, you will learn:

• What is probability theory?
• Why is it important in Artificial Intelligence and Machine Learning?
• The fundamental definitions in probability theory
• Some mathematical background

## Probability theory in Machine Learning

Probability theory is of great importance in many different branches of science. Let’s focus on Artificial Intelligence empowered by Machine Learning. The question is, “How is knowing probability going to help us in Artificial Intelligence?” In AI applications, we aim to design an intelligent machine to do a task. First, the model should get a sense of the environment via modeling.

As there is ambiguity regarding the possible outcomes, the model works based on estimation and approximation, which are done via probability. Second, as the machine tries to learn from the data (environment), it must reason about the process of learning and decision making. Such reasoning is not possible without considering all possible states, scenarios, and their likelihood. Third, to measure and assess the machine capabilities, we must utilize probability theory as well.

## Probability Axioms

Let’s roll a die and ask the following informal question: What is the chance of getting six as the outcome? It is equivalent to another more formal question: What is the probability of getting a six when rolling a die? Informal answer: The same as getting any other number, most probably. Formal response: 1/6. How do we interpret the calculation of 1/6? Well, it is clear that when you roll a die, you get a number in the set {1,2,3,4,5,6}, and you do NOT get any other number. We can call {1,2,3,4,5,6} the outcome space, outside of which nothing may happen. If the die is fair, each of the six outcomes is equally likely, so each has probability 1/6. To mathematically define those chances, some universal definitions and rules must be applied, so we all agree on them.

To this aim, it is crucial to know what governs probability theory. We start with axioms. The definition of an axiom is as follows: “a statement or proposition which is regarded as being established, accepted, or self-evidently true.” Before stepping into the axioms, we should have some preliminary definitions.

### Sample and Event Space

Probability theory is mainly associated with random experiments. For a random experiment, we cannot predict with certainty which event may occur. However, the set of all possible outcomes might be known.

### Sample Space

Definition: We call the set of all possible outcomes the sample space and we denote it by $\Omega$.

After defining the sample space, we should define an event.

### Event

Definition: An event is a set containing some of the possible outcomes. Any event is a subset of the sample space $\Omega$. The empty set $\emptyset$ is called the impossible event as it is null and does not represent any outcome.

Now, let’s discuss some operations on events.

• Union: For any set of events $E_1, E_2, \ldots, E_n$, the union event $\bigcup_{i=1}^{n} E_i$ consists of all outcomes that belong to at least one of the $E_i$ events. Ex: $A \cup B$ occurs if $A$ or $B$ occurred.
• Intersection: For any set of events $E_1, E_2, \ldots, E_n$, the intersection event $\bigcap_{i=1}^{n} E_i$ consists of all outcomes that belong to every one of the $E_i$ events. Ex: $A \cap B$ occurs if $A$ and $B$ both occurred.
• Mutually Exclusive: Two events $A$ and $B$ are mutually exclusive if they cannot occur concurrently. In other words, $A \cap B = \emptyset$. Ex: In a single roll of a die, let (A) be getting an even number and (B) be getting an odd number. A and B are clearly mutually exclusive.
• Complement Set: For any event $E$, we denote $E^c$ as the complement of $E$, which stands for all outcomes in the sample space that are not in $E$. Basically, $E \cup E^c = \Omega$ and $E \cap E^c = \emptyset$.
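The operations above map directly onto Python's built-in `set` type. The following is a minimal sketch, assuming a single roll of a six-sided die as the experiment; the event names (`even`, `odd`, `greater_than_three`) are illustrative and not part of the definitions.

```python
# Sample space for one roll of a six-sided die (assumed example).
sample_space = {1, 2, 3, 4, 5, 6}

# Illustrative events; the names are hypothetical.
even = {2, 4, 6}
odd = {1, 3, 5}
greater_than_three = {4, 5, 6}

# Union: outcomes that belong to at least one of the events.
union = even | greater_than_three

# Intersection: outcomes that belong to both events.
intersection = even & greater_than_three

# Complement: outcomes in the sample space that are not in the event.
complement_even = sample_space - even

# Mutually exclusive events have an empty intersection.
mutually_exclusive = even.isdisjoint(odd)

print(sorted(union))            # [2, 4, 5, 6]
print(sorted(intersection))     # [4, 6]
print(sorted(complement_even))  # [1, 3, 5]
print(mutually_exclusive)       # True
```

Note that an event and its complement together cover the whole sample space, which can be checked with `even | complement_even == sample_space`.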

### Axioms

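Andrey Kolmogorov, in 1933, proposed the Kolmogorov axioms, which form the foundations of probability theory. They can be expressed as follows: Assume we have the probability space $(\Omega, \mathcal{F}, P)$, where $\Omega$ is the sample space, $\mathcal{F}$ is the event space (the collection of events), and $P$ is the probability measure. Then $P$ is a real-valued function $P: \mathcal{F} \to \mathbb{R}$ that satisfies all of the following axioms:

1. For any event $E \in \mathcal{F}$, $P(E) \geq 0$ (the probability of occurrence is non-negative).
2. $P(\Omega) = 1$.
3. $P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)$ for any countable set of mutually exclusive events $E_1, E_2, \ldots$.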
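To make the axioms concrete, here is a small sketch that checks them on a finite example. It assumes a fair six-sided die with a uniform measure, takes the event space to be every subset of the sample space, and only checks the third axiom for pairs of mutually exclusive events (the finite case, not the full countable statement). The helper names `prob` and `all_events` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Assumed example: a fair six-sided die.
sample_space = frozenset({1, 2, 3, 4, 5, 6})

def prob(event):
    """Uniform probability measure: |event| / |sample space|."""
    return Fraction(len(event), len(sample_space))

def all_events(space):
    """Every subset of a finite sample space (its power set)."""
    items = sorted(space)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

events = all_events(sample_space)

# Axiom 1: every event has non-negative probability.
axiom_1 = all(prob(e) >= 0 for e in events)

# Axiom 2: the sample space has probability one.
axiom_2 = prob(sample_space) == 1

# Axiom 3 (finite, pairwise case): probabilities of mutually
# exclusive events add up.
axiom_3 = all(prob(a | b) == prob(a) + prob(b)
              for a in events for b in events if a.isdisjoint(b))

print(axiom_1, axiom_2, axiom_3)  # True True True
print(prob({6}))                  # 1/6
```

Using `Fraction` instead of floating-point numbers keeps the equality checks exact.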

### Outcomes

Using the axioms, we can conclude some fundamental characteristics as below:

• If event $A$ is a subset of event $B$ ($A \subseteq B$), then $P(A) \leq P(B)$.
• If $E$ is an event and $E^c$ is its complementary set (all outcomes in the sample space $\Omega$ that are not in $E$), then $P(E^c) = 1 - P(E)$.
• The probability of the empty set is zero ($P(\emptyset) = 0$) as the empty set is the complementary set of the sample space $\Omega$.
• For any event $E$, we have the probability bound of $0 \leq P(E) \leq 1$.
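The properties listed above can be spot-checked numerically. The sketch below again assumes a fair six-sided die with a uniform measure; the events `a` and `b` and the helper `prob` are illustrative choices, and checking one example is of course not a proof.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die with a uniform measure.
sample_space = frozenset(range(1, 7))

def prob(event):
    """Uniform probability measure: |event| / |sample space|."""
    return Fraction(len(event), len(sample_space))

a = frozenset({6})        # "roll a six"
b = frozenset({2, 4, 6})  # "roll an even number"; note a is a subset of b

# Subset property: a subset cannot be more probable than its superset.
monotone = a <= b and prob(a) <= prob(b)

# Complement rule: P(E^c) = 1 - P(E).
complement_rule = prob(sample_space - b) == 1 - prob(b)

# The impossible event has probability zero.
empty_is_zero = prob(frozenset()) == 0

# Every probability lies between 0 and 1.
bounded = 0 <= prob(b) <= 1

print(monotone, complement_rule, empty_is_zero, bounded)
```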

## Math Background

To tackle and solve a probability problem, there is always a need to count how many elements are in the event and in the sample space. Here, we discuss some important counting principles and techniques.

### Counting all possible outcomes

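Let’s consider the special case of having two experiments, $\mathcal{E}_1$ and $\mathcal{E}_2$. The basic principle of counting states that if one experiment ($\mathcal{E}_1$) results in N possible outcomes and another experiment ($\mathcal{E}_2$) leads to M possible outcomes, then conducting the two experiments together has $N \times M$ possible outcomes in total. Assume experiment $\mathcal{E}_1$ has N possible outcomes $\{a_1, a_2, \ldots, a_N\}$ and $\mathcal{E}_2$ has M possible outcomes $\{b_1, b_2, \ldots, b_M\}$.

It is easy to prove the principle for this special case. All you need is to count all possible outcomes of the two experiments:

$$
\begin{matrix}
(a_1, b_1) & (a_1, b_2) & \cdots & (a_1, b_M) \\
(a_2, b_1) & (a_2, b_2) & \cdots & (a_2, b_M) \\
\vdots & \vdots & \ddots & \vdots \\
(a_N, b_1) & (a_N, b_2) & \cdots & (a_N, b_M)
\end{matrix}
$$

The table has N rows and M columns, so it contains $N \times M$ pairs. The generalized principle of counting can be expressed as below: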

### Generalized Basic Principle of Counting

Assume we have q different experiments with the corresponding numbers of possible outcomes $N_1, N_2, \ldots, N_q$. Then we can conclude that there is a total of $N_1 \times N_2 \times \cdots \times N_q$ outcomes for conducting all q experiments.
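Both counting principles can be illustrated by enumerating joint outcomes with `itertools.product`. This is a minimal sketch; the three experiments used here (a coin flip, a die roll, and a color draw) are assumed examples.

```python
from itertools import product
from math import prod

# Assumed example experiments.
coin = ["H", "T"]                 # 2 outcomes
die = [1, 2, 3, 4, 5, 6]          # 6 outcomes
color = ["red", "green", "blue"]  # 3 outcomes

# Two experiments: every (coin, die) pair is one joint outcome.
pairs = list(product(coin, die))
print(len(pairs))  # 12, i.e. 2 * 6

# q experiments: the count is the product of the individual counts.
experiments = [coin, die, color]
joint_outcomes = list(product(*experiments))
expected_count = prod(len(e) for e in experiments)

print(len(joint_outcomes), expected_count)  # 36 36
```

Enumerating outcomes is only practical for small experiments; for larger ones, the product formula gives the count without listing anything.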

### Permutation

What is a permutation? Suppose we have three persons called Michael, Bob, and Alice. Assume the three of them stand in a queue. How many possible arrangements do we have? Take a look at the arrangements as follows:

• Michael, Bob, Alice
• Michael, Alice, Bob
• Bob, Michael, Alice
• Bob, Alice, Michael
• Alice, Michael, Bob
• Alice, Bob, Michael

As above, you will see six permutations. Right? But we cannot always write out all possible situations! We need some math. The intuition behind this problem is that we have three places to fill in a queue when we have three persons. For the first place, we have three choices. For the second place, there are two remaining choices. Finally, there is only one choice left for the last place, which gives $3 \times 2 \times 1 = 6$ arrangements. We can extend this conclusion to the experiment in which we have $n$ choices. Hence, we get the following number of permutations:

$$n \times (n-1) \times \cdots \times 2 \times 1 = n!$$

NOTE: The descending multiplication from $n$ to $1$ as above (the product of all positive integers less than or equal to n) is denoted $n!$ and called the factorial of n.
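The queue example can be reproduced with Python's standard library. This sketch lists every arrangement of the three persons using `itertools.permutations` and compares the count with `math.factorial`.

```python
from itertools import permutations
from math import factorial

people = ["Michael", "Bob", "Alice"]

# Every possible ordering of the three persons in the queue.
arrangements = list(permutations(people))
for arrangement in arrangements:
    print(", ".join(arrangement))

# The number of arrangements equals n! for n distinct persons.
print(len(arrangements), factorial(len(people)))  # 6 6
```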

### Combination

A combination is a selection of objects from a larger set of objects in which the order does not matter. For example, assume we have a total of $n$ objects. In how many ways can we select $r$ objects from those $n$ objects? Let’s get back to the above example. Assume we have three candidates named Michael, Bob, and Alice, and we only desire to select two candidates. How many different combinations of candidates exist? There are three: {Michael, Bob}, {Michael, Alice}, and {Bob, Alice}. Let’s get back to the general question: How many selections can we have if we desire to pick $r$ objects from $n$ objects?

### Combination

The number of unordered selections of $r$ objects from $n$ objects is denoted $\binom{n}{r}$ and calculated as:

$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$

NOTE: A combination refers to an unordered selection. That is, the selection $\{a, b\}$ is the same as $\{b, a\}$, i.e., the order does NOT matter.
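As a quick check of the formula, the sketch below enumerates the two-candidate selections from the earlier example with `itertools.combinations` and compares the count with both `math.comb` and the factorial expression.

```python
from itertools import combinations
from math import comb, factorial

candidates = ["Michael", "Bob", "Alice"]
n, r = len(candidates), 2

# Unordered selections of r candidates: (Michael, Bob) appears once,
# and (Bob, Michael) is not listed separately.
selections = list(combinations(candidates, r))
for selection in selections:
    print(selection)

# The count from enumeration, the library function, and the formula agree.
by_formula = factorial(n) // (factorial(r) * factorial(n - r))
print(len(selections), comb(n, r), by_formula)  # 3 3 3
```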

The above definition can be generalized.

### Generalization

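Assume we have $n$ objects and $k$ distinct groups with $n_1, n_2, \ldots, n_k$ objects each, where $n_1 + n_2 + \cdots + n_k = n$. The number of unordered possible divisions of n objects into these distinct groups can be calculated as below:

$$\binom{n}{n_1, n_2, \ldots, n_k} = \frac{n!}{n_1!\, n_2! \cdots n_k!}$$

## Conclusion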

In this article, you learned about probability theory, why it is important in Machine Learning, and what its fundamental concepts are. Probability theory is of great importance in Machine Learning since Machine Learning deals with uncertainty and predictions. The basics above should help you understand probability concepts and put them to use. Have any questions? Feel free to ask by commenting below.
