Probability Distributions
Discrete Distributions
Bernoulli Distribution
Single binary trial with success probability $p$:
\[P(X = k) = p^k (1-p)^{1-k}, \quad k \in \{0, 1\}\]
- Mean: $p$
- Variance: $p(1-p)$
- Used for: binary classification, coin flips
Binomial Distribution
Number of successes in $n$ independent Bernoulli trials:
\[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n\]
- Mean: $np$
- Variance: $np(1-p)$
- Used for: A/B testing, quality control
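As a quick numerical check of the formulas above, the sketch below evaluates the Binomial PMF directly and recovers the mean and variance by summing over the support. The function name `binomial_pmf` and the values `n = 10`, `p = 0.3` are illustrative assumptions, not part of the original notes.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Illustrative parameter values (assumed for this example).
n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The PMF should sum to 1, and the moments should match np and np(1-p).
total = sum(pmf)
mean = sum(k * q for k, q in enumerate(pmf))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print(f"total={total:.6f} mean={mean:.6f} variance={variance:.6f}")
```

Up to floating-point rounding, the mean and variance should agree with $np = 3$ and $np(1-p) = 2.1$.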
Geometric Distribution
Number of trials up to and including the first success:
\[P(X = k) = (1-p)^{k-1} p, \quad k = 1, 2, \ldots\]
- Mean: $1/p$
- Variance: $(1-p)/p^2$
- Memoryless property: $P(X > m+n \mid X > m) = P(X > n)$
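The memoryless property follows from the survival function $P(X > n) = (1-p)^n$ (the first $n$ trials all fail), which can be obtained by summing the PMF over $k > n$. A minimal sketch checking it numerically; the helper names and the values of `m`, `n`, `p` are assumptions for illustration.

```python
def geometric_survival(n: int, p: float) -> float:
    """P(X > n): the first n trials are all failures."""
    return (1 - p) ** n


def conditional_survival(m: int, n: int, p: float) -> float:
    """P(X > m + n | X > m), computed from the definition of conditional probability."""
    return geometric_survival(m + n, p) / geometric_survival(m, p)


# Illustrative values (assumed): having already waited m trials changes nothing.
m, n, p = 3, 4, 0.2
lhs = conditional_survival(m, n, p)
rhs = geometric_survival(n, p)
print(f"P(X > m+n | X > m) = {lhs:.6f}, P(X > n) = {rhs:.6f}")
```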
Negative Binomial Distribution
Number of trials needed to reach $r$ successes:
\[P(X = k) = \binom{k-1}{r-1} p^r (1-p)^{k-r}, \quad k = r, r+1, \ldots\]
- Mean: $r/p$
- Variance: $r(1-p)/p^2$
- Used for: overdispersed count data
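The sketch below evaluates this PMF, checks that $r = 1$ reduces to the Geometric case, and approximates the mean by truncating the infinite sum. The function name, the parameter values, and the truncation point of 400 trials are assumptions made for the example.

```python
from math import comb


def negative_binomial_pmf(k: int, r: int, p: float) -> float:
    """P(X = k): the r-th success occurs exactly on trial k."""
    if k < r:
        return 0.0  # at least r trials are needed for r successes
    return comb(k - 1, r - 1) * p**r * (1 - p) ** (k - r)


# Illustrative values (assumed).
r, p = 3, 0.4

# Truncate the infinite support; the tail beyond 400 trials is negligible here.
support = range(r, 400)
total = sum(negative_binomial_pmf(k, r, p) for k in support)
mean = sum(k * negative_binomial_pmf(k, r, p) for k in support)

# With r = 1 the PMF reduces to the Geometric PMF (1-p)^(k-1) p.
geometric_at_5 = (1 - p) ** 4 * p

print(f"total={total:.6f} mean={mean:.6f} (r/p = {r / p:.6f})")
```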
Poisson Distribution
Count of events in fixed interval with rate $\lambda$:
\[P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots\]
- Mean: $\lambda$
- Variance: $\lambda$
- Used for: rare events, count regression, arrival processes
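One way to see why the Poisson models rare events: a Binomial with many trials and a small success probability $p = \lambda/n$ is close to Poisson($\lambda$). The sketch below checks that, along with the equal mean and variance. The values $\lambda = 3$ and $n = 1000$ and the truncation points are assumptions for illustration.

```python
from math import comb, exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


lam = 3.0  # illustrative rate (assumed)
n = 1000   # many trials, each with small success probability lam / n

# Mean and variance from a truncated sum; the tail beyond k = 60 is negligible.
ks = range(60)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)

# Largest pointwise gap between the two PMFs over k = 0..20.
max_gap = max(
    abs(poisson_pmf(k, lam) - binomial_pmf(k, n, lam / n)) for k in range(21)
)

print(f"mean={mean:.6f} variance={variance:.6f} max_gap={max_gap:.2e}")
```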
Multinomial Distribution
Generalization of binomial to $k$ categories:
\[P(X_1 = n_1, \ldots, X_k = n_k) = \frac{n!}{n_1! \cdots n_k!} p_1^{n_1} \cdots p_k^{n_k}\]
where $\sum_i n_i = n$ and $\sum_i p_i = 1$.
- Used for: multi-class classification, word counts in documents
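A small sketch of the Multinomial PMF, including a check that the two-category case matches the Binomial PMF. The function name and the example counts and probabilities are assumptions for illustration.

```python
from math import comb, factorial, prod


def multinomial_pmf(counts, probs):
    """P(X_1 = n_1, ..., X_k = n_k) for category probabilities probs."""
    n = sum(counts)
    # Multinomial coefficient n! / (n_1! ... n_k!), kept in exact integers.
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)
    return coeff * prod(p**c for p, c in zip(probs, counts))


# Illustrative example (assumed): 4 draws over 3 categories.
value = multinomial_pmf((2, 1, 1), (0.5, 0.3, 0.2))

# With two categories the formula reduces to the Binomial PMF.
two_category = multinomial_pmf((3, 7), (0.3, 0.7))
binomial = comb(10, 3) * 0.3**3 * 0.7**7

print(f"value={value:.6f} two_category={two_category:.6f} binomial={binomial:.6f}")
```

For the first example the coefficient is $4!/(2!\,1!\,1!) = 12$ and the probability term is $0.5^2 \cdot 0.3 \cdot 0.2 = 0.015$, giving $0.18$.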
Continuous Distributions
Uniform Distribution
Equal probability over interval $[a, b]$:
\[f(x) = \frac{1}{b-a}, \quad a \leq x \leq b\]
- Mean: $\frac{a+b}{2}$
- Variance: $\frac{(b-a)^2}{12}$
- Used for: initialization, baseline models
Normal (Gaussian) Distribution
Symmetric bell-shaped distribution, central to statistics because of the Central Limit Theorem:
\[f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\]
- Mean: $\mu$
- Variance: $\sigma^2$
- Central Limit Theorem: the standardized sum of i.i.d. variables with finite variance converges in distribution to a Normal
Standard Normal: $\mu = 0, \sigma = 1$
\[\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}\]
68-95-99.7 rule (approximate):
- About 68% of mass within $\mu \pm \sigma$
- About 95% of mass within $\mu \pm 2\sigma$
- About 99.7% of mass within $\mu \pm 3\sigma$
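The rule can be reproduced from the Normal CDF. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `mass_within` is an assumption for illustration.

```python
from statistics import NormalDist


def mass_within(k: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Probability mass of Normal(mu, sigma^2) inside mu +/- k * sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} sigma: {mass_within(k):.4f}")
```

The three values come out at roughly 0.6827, 0.9545, and 0.9973, and they do not depend on the choice of $\mu$ and $\sigma$.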
Multivariate Normal Distribution
Generalization to $\mathbb{R}^d$:
\[f(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x}-\mu)^T \Sigma^{-1} (\mathbf{x}-\mu)\right)\]
- Mean vector: $\mu \in \mathbb{R}^d$
- Covariance matrix: $\Sigma \in \mathbb{R}^{d \times d}$ (symmetric, positive definite)
- Used for: Gaussian processes, Kalman filters, factor analysis
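To make the density formula concrete, here is a minimal sketch restricted to $d = 2$, where the determinant and inverse of $\Sigma$ can be written out by hand. The function name `mvn2_pdf` and the example inputs are assumptions; a general-$d$ version would normally rely on a linear algebra library instead.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def mvn2_pdf(x, mu, cov):
    """Density of a 2-D Normal at x, using an explicit 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    dx0, dx1 = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form (x-mu)^T Sigma^{-1} (x-mu), with
    # Sigma^{-1} = (1/det) * [[d, -b], [-c, a]].
    quad = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det
    # For d = 2 the normalizer (2*pi)^(d/2) * |Sigma|^(1/2) is 2*pi*sqrt(det).
    return exp(-0.5 * quad) / (2 * pi * sqrt(det))


# With a diagonal covariance the density factorizes into two 1-D Normals.
joint = mvn2_pdf((1.0, 2.0), (0.0, 0.0), ((4.0, 0.0), (0.0, 9.0)))
product = NormalDist(0.0, 2.0).pdf(1.0) * NormalDist(0.0, 3.0).pdf(2.0)
print(f"joint={joint:.8f} product={product:.8f}")
```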
Exponential Distribution
Time between events in a Poisson process:
\[f(x) = \lambda e^{-\lambda x}, \quad x \geq 0\]
- Mean: $1/\lambda$
- Variance: $1/\lambda^2$
- Memoryless property: $P(X > s+t \mid X > s) = P(X > t)$
- Used for: survival analysis, queueing theory
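Integrating the density gives the CDF $F(x) = 1 - e^{-\lambda x}$, which can be inverted to draw samples: if $U$ is uniform on $(0, 1]$, then $-\ln(U)/\lambda$ is Exponential($\lambda$). The sketch below uses this inverse-transform approach and compares sample moments to $1/\lambda$ and $1/\lambda^2$. The seed, sample size, and $\lambda = 2$ are assumptions for illustration.

```python
import random
from math import log


def sample_exponential(lam: float, rng: random.Random) -> float:
    """Inverse-transform sample: solve 1 - exp(-lam * x) = u for x."""
    u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is always defined
    return -log(u) / lam


rng = random.Random(0)  # fixed seed (assumed) for reproducibility
lam = 2.0               # illustrative rate (assumed)
n_samples = 100_000

samples = [sample_exponential(lam, rng) for _ in range(n_samples)]
sample_mean = sum(samples) / n_samples
sample_var = sum((x - sample_mean) ** 2 for x in samples) / n_samples

print(f"sample mean={sample_mean:.4f} (theory {1 / lam:.4f})")
print(f"sample var ={sample_var:.4f} (theory {1 / lam**2:.4f})")
```

The standard library also provides `random.expovariate(lam)`, which draws from the same distribution directly.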
Gamma Distribution
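Generalizes the sum of exponential waiting times: for integer $\alpha$, the sum of $\alpha$ independent Exponential($\beta$) variables is Gamma($\alpha, \beta$):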
\[f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}, \quad x > 0\]
- Shape: $\alpha > 0$, Rate: $\beta > 0$
- Mean: $\alpha/\beta$
- Variance: $\alpha/\beta^2$
- Used for: Bayesian priors, waiting times
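For an integer shape, the link to exponential waiting times can be checked by simulation: summing $\alpha$ independent Exponential($\beta$) draws should give sample moments close to $\alpha/\beta$ and $\alpha/\beta^2$. The seed, sample size, and parameter values below are assumptions for illustration.

```python
import random

rng = random.Random(1)   # fixed seed (assumed) for reproducibility
alpha, beta = 3, 2.0     # integer shape and rate (illustrative values)
n_samples = 50_000


def sum_of_exponentials(shape: int, rate: float) -> float:
    """One Gamma(shape, rate) draw built as a sum of exponential waiting times."""
    return sum(rng.expovariate(rate) for _ in range(shape))


samples = [sum_of_exponentials(alpha, beta) for _ in range(n_samples)]
sample_mean = sum(samples) / n_samples
sample_var = sum((x - sample_mean) ** 2 for x in samples) / n_samples

print(f"sample mean={sample_mean:.4f} (theory {alpha / beta:.4f})")
print(f"sample var ={sample_var:.4f} (theory {alpha / beta**2:.4f})")
```

Note that Python's `random.gammavariate(alpha, beta)` takes `beta` as a scale rather than a rate, so the rate parameterization used here corresponds to `random.gammavariate(alpha, 1 / beta)`.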
Beta Distribution
Distribution on $[0, 1]$, conjugate prior for Bernoulli/Binomial:
\[f(x) = \frac{1}{B(\alpha, \beta)} x^{\alpha-1} (1-x)^{\beta-1}, \quad 0 \leq x \leq 1\]
where $B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$
- Mean: $\frac{\alpha}{\alpha+\beta}$
- Variance: $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
- Used for: Bayesian inference, modeling probabilities
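Conjugacy means the posterior stays in the Beta family: starting from a Beta($\alpha, \beta$) prior and observing $s$ successes and $f$ failures, the posterior is Beta($\alpha + s, \beta + f$). A minimal sketch of that update; the function names, the uniform Beta(1, 1) prior, and the observed counts are assumptions for illustration.

```python
def beta_mean(a: float, b: float) -> float:
    """Mean of Beta(a, b)."""
    return a / (a + b)


def beta_variance(a: float, b: float) -> float:
    """Variance of Beta(a, b)."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


def update(a: float, b: float, successes: int, failures: int):
    """Posterior parameters after observing Bernoulli/Binomial outcomes."""
    return a + successes, b + failures


# Illustrative data (assumed): uniform prior, then 7 successes and 3 failures.
prior = (1.0, 1.0)
posterior = update(*prior, successes=7, failures=3)

print(f"prior mean={beta_mean(*prior):.4f}, variance={beta_variance(*prior):.4f}")
print(f"posterior={posterior}, mean={beta_mean(*posterior):.4f}, "
      f"variance={beta_variance(*posterior):.4f}")
```

The posterior mean moves from the prior mean toward the observed success rate, and the variance shrinks as more data arrive.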
Dirichlet Distribution
Multivariate generalization of Beta, conjugate prior for Multinomial:
\[f(\mathbf{p}) = \frac{1}{B(\alpha)} \prod_{i=1}^k p_i^{\alpha_i-1}\]
where $\sum_i p_i = 1$ and $B(\alpha) = \frac{\prod_i \Gamma(\alpha_i)}{\Gamma(\sum_i \alpha_i)}$
- Used for: topic models (LDA), categorical priors
Student’s t-Distribution
Heavy-tailed alternative to Normal:
\[f(x) = \frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\nu\pi} \Gamma(\frac{\nu}{2})} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}\]
- Degrees of freedom: $\nu > 0$
- Mean: $0$ (for $\nu > 1$)
- Variance: $\frac{\nu}{\nu-2}$ (for $\nu > 2$)
- As $\nu \to \infty$, converges to Standard Normal
- Used for: robust statistics, small-sample inference
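The sketch below evaluates the t density on the log scale (using `math.lgamma` so that large $\nu$ does not overflow) and compares it with the Standard Normal density, both in the tail and in the large-$\nu$ limit. The function name and the evaluation points are assumptions for illustration.

```python
from math import exp, lgamma, log, log1p, pi
from statistics import NormalDist


def student_t_pdf(x: float, nu: float) -> float:
    """Density of Student's t with nu degrees of freedom."""
    # Log of Gamma((nu+1)/2) / (sqrt(nu*pi) * Gamma(nu/2)).
    log_norm = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    # Log of (1 + x^2/nu)^(-(nu+1)/2).
    log_kernel = -(nu + 1) / 2 * log1p(x * x / nu)
    return exp(log_norm + log_kernel)


standard_normal = NormalDist()

# Heavy tails: far from zero, a low-nu t density exceeds the Normal density.
tail_t = student_t_pdf(4.0, 3)
tail_normal = standard_normal.pdf(4.0)

# Large nu: the t density approaches the Standard Normal density.
gap_large_nu = abs(student_t_pdf(1.0, 1e6) - standard_normal.pdf(1.0))

print(f"tail_t={tail_t:.6f} tail_normal={tail_normal:.6f} gap={gap_large_nu:.2e}")
```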
Chi-Squared Distribution
Sum of squared standard Normals:
If $Z_1, \ldots, Z_k \sim \mathcal{N}(0, 1)$ are independent, then $X = \sum_i Z_i^2 \sim \chi^2_k$
- Degrees of freedom: $k$
- Mean: $k$
- Variance: $2k$
- Used for: hypothesis testing, confidence intervals
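The definition can be simulated directly: draw $k$ independent standard Normals, square and sum them, and compare the sample moments with $k$ and $2k$. The seed, sample size, and $k = 5$ are assumptions for illustration.

```python
import random

rng = random.Random(2)  # fixed seed (assumed) for reproducibility
k = 5                   # degrees of freedom (illustrative value)
n_samples = 50_000


def chi_squared_draw(dof: int) -> float:
    """One chi-squared draw as a sum of dof squared standard Normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof))


samples = [chi_squared_draw(k) for _ in range(n_samples)]
sample_mean = sum(samples) / n_samples
sample_var = sum((x - sample_mean) ** 2 for x in samples) / n_samples

print(f"sample mean={sample_mean:.3f} (theory {k})")
print(f"sample var ={sample_var:.3f} (theory {2 * k})")
```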
Summary Table
| Distribution | Parameters | Support | Mean | Variance | Typical Use |
|---|---|---|---|---|---|
| Bernoulli | $p$ | $\{0,1\}$ | $p$ | $p(1-p)$ | Binary outcomes |
| Binomial | $n, p$ | $\{0,\ldots,n\}$ | $np$ | $np(1-p)$ | Count of successes |
| Poisson | $\lambda$ | $\{0,1,\ldots\}$ | $\lambda$ | $\lambda$ | Rare events |
| Uniform | $a, b$ | $[a,b]$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ | Baseline, initialization |
| Normal | $\mu, \sigma^2$ | $\mathbb{R}$ | $\mu$ | $\sigma^2$ | General modeling |
| Exponential | $\lambda$ | $[0,\infty)$ | $1/\lambda$ | $1/\lambda^2$ | Waiting times |
| Gamma | $\alpha, \beta$ | $(0,\infty)$ | $\alpha/\beta$ | $\alpha/\beta^2$ | Bayesian priors |
| Beta | $\alpha, \beta$ | $[0,1]$ | $\frac{\alpha}{\alpha+\beta}$ | $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ | Probability priors |
| Student’s t | $\nu$ | $\mathbb{R}$ | $0$ ($\nu > 1$) | $\frac{\nu}{\nu-2}$ ($\nu > 2$) | Robust statistics |