Probability Theory

Probability theory provides the mathematical foundation for quantifying uncertainty. In ML, it underpins everything from model predictions to hypothesis testing and Bayesian inference.

Basic Definitions

Sample space $\Omega$: the set of all possible outcomes.

Event $A \subseteq \Omega$: a subset of outcomes.

Probability measure $P$: a function assigning probabilities to events, satisfying:

  • $P(A) \geq 0$ for all events $A$
  • $P(\Omega) = 1$
  • Countable additivity: $P(\cup_i A_i) = \sum_i P(A_i)$ for disjoint $A_i$

Conditional Probability

The probability of $A$ given that $B$ has occurred:

\[P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0\]

Chain rule:

\[P(A \cap B) = P(A|B) P(B) = P(B|A) P(A)\]

Independence

Events $A$ and $B$ are independent if:

\[P(A \cap B) = P(A) P(B)\]

Equivalently: $P(A \mid B) = P(A)$ (knowing $B$ gives no information about $A$).

Conditional independence: $A$ and $B$ are independent given $C$ if:

\[P(A \cap B | C) = P(A|C) P(B|C)\]

Foundation of Naive Bayes classifiers and graphical models.

Bayes’ Theorem

\[P(A|B) = \frac{P(B|A) P(A)}{P(B)}\]

Expanded form:

\[P(A|B) = \frac{P(B|A) P(A)}{P(B|A) P(A) + P(B|\neg A) P(\neg A)}\]

For multiple hypotheses $H_1, \ldots, H_k$:

\[P(H_i | D) = \frac{P(D | H_i) P(H_i)}{\sum_j P(D | H_j) P(H_j)}\]

Terms:

  • $P(H_i)$: prior (belief before seeing data)
  • $P(D \mid H_i)$: likelihood (probability of data under hypothesis)
  • $P(H_i \mid D)$: posterior (updated belief after seeing data)
  • $P(D)$: evidence (normalizing constant)

Law of Total Probability

If $B_1, \ldots, B_k$ form a partition of $\Omega$:

\[P(A) = \sum_{i=1}^k P(A | B_i) P(B_i)\]

Used to compute marginal probabilities from conditionals.

Combinatorics

Permutations: number of ways to arrange $k$ items from $n$:

\[P(n, k) = \frac{n!}{(n-k)!}\]

Combinations: number of ways to choose $k$ items from $n$ (order doesn’t matter):

\[\binom{n}{k} = \frac{n!}{k!(n-k)!}\]

Binomial coefficient properties:

  • $\binom{n}{0} = \binom{n}{n} = 1$
  • $\binom{n}{k} = \binom{n}{n-k}$
  • Pascal’s identity: $\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$

Key Probability Distributions

Distribution Type Formula Mean Variance
Bernoulli Discrete $P(X=1) = p$ $p$ $p(1-p)$
Binomial Discrete $\binom{n}{k} p^k (1-p)^{n-k}$ $np$ $np(1-p)$
Geometric Discrete $(1-p)^{k-1} p$ $1/p$ $(1-p)/p^2$
Poisson Discrete $\frac{\lambda^k e^{-\lambda}}{k!}$ $\lambda$ $\lambda$
Uniform Continuous $\frac{1}{b-a}$ on $[a,b]$ $\frac{a+b}{2}$ $\frac{(b-a)^2}{12}$
Normal Continuous $\frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ $\mu$ $\sigma^2$

See Probability Distributions for full details.