![Hands-On Mathematics for Deep Learning](https://wfqqreader-1252317822.image.myqcloud.com/cover/81/36698081/b_36698081.jpg)
Conditional probability
Conditional probabilities are useful when the occurrence of one event affects the probability of another. If we have two events, A and B, where B has occurred and we want to find the probability of A occurring, we write this as follows:
![](https://epubservercos.yuewen.com/FF11E0/19470372701459106/epubprivate/OEBPS/Images/Chapter_1032.jpg?sign=1739282456-2WjbBxl5OCj9R7l4jqt7bXSOXuRYnu5W-0-54dea04d405d7fbf3a5f05f2faca0b33)
Here, P(B) > 0.
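The definition P(A|B) = P(A ∩ B)/P(B) can be sketched by counting outcomes on a small finite sample space. The die-roll events below are illustrative choices, not taken from the text:

```python
# A minimal sketch of P(A | B) = P(A ∩ B) / P(B) on a finite sample
# space, using exact arithmetic via Fraction.
from fractions import Fraction

omega = set(range(1, 7))   # sample space of one fair die roll
A = {2, 4, 6}              # event A: the roll is even
B = {4, 5, 6}              # event B: the roll is greater than 3

p_B = Fraction(len(B), len(omega))            # P(B) = 1/2
p_A_and_B = Fraction(len(A & B), len(omega))  # P(A ∩ B) = 1/3

# The conditional probability is defined only when P(B) > 0
p_A_given_B = p_A_and_B / p_B
print(p_A_given_B)   # → 2/3
```

Knowing the roll exceeded 3 raises the probability of an even roll from 1/2 to 2/3.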
However, if the two events, A and B, are independent, then we have the following:
![](https://epubservercos.yuewen.com/FF11E0/19470372701459106/epubprivate/OEBPS/Images/Chapter_1104.jpg?sign=1739282456-ticRwlTKZtCfYRv64d1xDSN6pvgCFulW-0-8169bb744f0d397d6fc974a122f880db)
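Independence can be checked the same way. A sketch with two fair dice, where the events are again illustrative:

```python
# For independent events, P(A | B) = P(A): conditioning on B tells us
# nothing new about A. Two dice rolled together are a classic case.
from fractions import Fraction

omega = {(i, j) for i in range(1, 7) for j in range(1, 7)}
A = {(i, j) for (i, j) in omega if i == 6}   # first die shows 6
B = {(i, j) for (i, j) in omega if j == 6}   # second die shows 6

def p(event):
    return Fraction(len(event), len(omega))

p_A_given_B = p(A & B) / p(B)
assert p_A_given_B == p(A)   # independence: conditioning changes nothing
print(p_A_given_B)           # → 1/6
```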
Additionally, if P(A|B) > P(A), then it is said that B attracts A. However, if A attracts B^C (the complement of B), then it repels B.
The following are some of the axioms of conditional probability:
- P(Ω|B) = 1.
- P(∅|B) = 0.
- P(A^C|B) = 1 − P(A|B).
- P(·|B) is a probability function that works only for subsets of B.
- P(A|B) = P(A ∩ B|B).
- If A ⊂ B, then P(A|B) = P(A)/P(B).
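A few standard properties of conditional probability can be verified numerically on a small sample space. This is a minimal sketch; the die-roll events are chosen purely for illustration:

```python
# Checking properties of P(· | B) by counting outcomes on a finite
# sample space with exact fractions.
from fractions import Fraction

omega = set(range(1, 7))
B = {1, 2, 3, 4}

def cond(A):
    return Fraction(len(A & B), len(B))   # P(A | B)

assert cond(omega) == 1                   # the whole space is certain given B
assert cond(set()) == 0                   # the empty event is impossible
A = {2, 3, 5}
assert cond(omega - A) == 1 - cond(A)     # complement rule under conditioning
C = {1, 2}                                # C is a subset of B
assert cond(C) == Fraction(len(C), 6) / Fraction(len(B), 6)  # P(C)/P(B)
print("all properties hold")
```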
The following equation is known as Bayes' rule:
![](https://epubservercos.yuewen.com/FF11E0/19470372701459106/epubprivate/OEBPS/Images/Chapter_1751.jpg?sign=1739282456-qfpxe1BTAaaB9BgRNuQe2bChAZxmG2Kk-0-8c8af3ed2e7241d5e5d6bba589761c65)
This can also be written as follows:
![](https://epubservercos.yuewen.com/FF11E0/19470372701459106/epubprivate/OEBPS/Images/Chapter_68.jpg?sign=1739282456-L7MxR3BGwnUPpfPP6KCFB9VHuilXY4wM-0-a72bdf08f1b07c78b2a3cd8f792c6259)
Here, we have the following:
- P(A) is called the prior.
- P(A|B) is the posterior.
- P(B|A) is the likelihood.
- P(B) acts as a normalizing constant.
![](https://epubservercos.yuewen.com/FF11E0/19470372701459106/epubprivate/OEBPS/Images/Chapter_273.jpg?sign=1739282456-xVA1ftzhTIn4dMyDH9gkWGAW2OUQipop-0-7f1034d0e3ac3baa153490667ff81ded)
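As a worked example of Bayes' rule, consider a diagnostic-test style calculation. The prior, likelihood, and false-positive rate below are made-up numbers for illustration:

```python
# Bayes' rule with illustrative numbers: prior P(D) = 0.01,
# likelihood P(+|D) = 0.95, false-positive rate P(+|not D) = 0.05.
p_D = 0.01
p_pos_given_D = 0.95
p_pos_given_notD = 0.05

# Normalizing constant P(+), split over the two cases D and not-D
p_pos = p_pos_given_D * p_D + p_pos_given_notD * (1 - p_D)

# Posterior P(D | +) = P(+ | D) P(D) / P(+)
p_D_given_pos = p_pos_given_D * p_D / p_pos
print(round(p_D_given_pos, 3))
```

Even with an accurate test, the posterior stays small (about 0.16) because the prior is so low; this is exactly the interplay of prior, likelihood, and normalizing constant described above.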
Often, we end up having to deal with complex events, and to effectively navigate them, we need to decompose them into simpler events.
This leads us to the concept of partitions. A partition is defined as a collection of events B1, B2, ..., Bn that together make up the sample space, such that the events are pairwise disjoint (Bi ∩ Bj = ∅ for all i ≠ j) and B1 ∪ B2 ∪ ... ∪ Bn = Ω.
In the coin flipping example, the sample space is partitioned into two possible events—heads and tails.
If A is an event and Bi is a partition of Ω, then we have the following:
![](https://epubservercos.yuewen.com/FF11E0/19470372701459106/epubprivate/OEBPS/Images/Chapter_225.jpg?sign=1739282456-P5bdL5835H624rumdLYyUGpKLKSvXucJ-0-3124f9876f2bca375c617fec5e0ae757)
We can also rewrite Bayes' formula with partitions so that we have the following:
![](https://epubservercos.yuewen.com/FF11E0/19470372701459106/epubprivate/OEBPS/Images/Chapter_58.jpg?sign=1739282456-5SRzaEYp9gbSTLa9JMuP6iMFuErmwYP5-0-3870b5b1195f4bb98c2120b950dde754)
Here, P(A) = ∑i P(A|Bi)P(Bi), with the sum taken over all events in the partition.
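The partitioned form can be sketched directly. The three-event partition and its probabilities below are illustrative numbers, not from the text:

```python
# Law of total probability and Bayes' rule over a partition B1, B2, B3.
p_B = [0.5, 0.3, 0.2]            # P(B_i); the values sum to 1
p_A_given_B = [0.1, 0.4, 0.8]    # P(A | B_i)

# P(A) = sum_i P(A | B_i) P(B_i)
p_A = sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))

# Bayes with partitions: P(B_j | A) = P(A | B_j) P(B_j) / P(A)
posterior = [pa * pb / p_A for pa, pb in zip(p_A_given_B, p_B)]
print(round(p_A, 2))   # total probability of A
print(posterior)       # posterior over the partition; sums to 1
```

Note that the posterior probabilities over the partition always sum to 1, since the denominator P(A) is exactly the total of the numerators.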