Expectation and Variance

Expected Value

The expected value (mean, expectation) is the long-run average value of a random variable.

For discrete variables:

$$ E[X] = \sum_x x \cdot p(x) $$

For continuous variables:

$$ E[X] = \int_{-\infty}^{\infty} x \cdot f(x) dx $$

Notation: $E[X]$, $\mathbb{E}[X]$, or $\mu_X$
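The discrete formula can be checked by hand on a small case. A minimal sketch, assuming a fair six-sided die as the example distribution (the helper name `expected_value` and the dict-based pmf are illustrative choices, not part of these notes):

```python
from fractions import Fraction

def expected_value(pmf):
    """Return E[X] = sum of x * p(x) for a pmf given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

# Hypothetical example distribution: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}
mean_die = expected_value(die)
print(mean_die)  # 7/2
```

Exact fractions are used here so the result is not blurred by floating-point rounding.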

Properties of Expectation

Linearity: $E[aX + bY] = aE[X] + bE[Y]$ (holds whether or not $X$ and $Y$ are independent)
Constant: $E[c] = c$
Independence: If $X$ and $Y$ are independent, $E[XY] = E[X]E[Y]$
Law of the unconscious statistician (LOTUS):

$$ E[g(X)] = \sum_x g(x) p(x) \quad \text{(discrete)} $$

$$ E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x) dx \quad \text{(continuous)} $$
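LOTUS and linearity can both be verified on a small pmf. The sketch below assumes a fair die again; `expect` is a hypothetical helper that applies $g$ inside the sum, so the distribution of $g(X)$ is never constructed.

```python
from fractions import Fraction

def expect(g, pmf):
    """E[g(X)] = sum of g(x) * p(x), computed directly from the pmf of X."""
    return sum(g(x) * p for x, p in pmf.items())

# Hypothetical example distribution: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

e_x = expect(lambda x: x, die)              # 7/2
e_x2 = expect(lambda x: x * x, die)         # 91/6
e_linear = expect(lambda x: 2 * x + 3, die) # 10

# Linearity: E[2X + 3] equals 2 E[X] + 3.
print(e_linear == 2 * e_x + 3)  # True
# In general E[g(X)] is not g(E[X]): here E[X^2] differs from (E[X])^2.
print(e_x2 == e_x ** 2)         # False
```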

Variance

Variance measures the spread or dispersion around the mean:

$$ \text{Var}(X) = E[(X - \mu)^2] $$

Computational formula:

$$ \text{Var}(X) = E[X^2] - (E[X])^2 $$

Notation: $\text{Var}(X)$, $\sigma_X^2$, or $\sigma^2$
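The definition and the computational formula should agree exactly. A short check, assuming a fair die as the example (the names `expect`, `var_definition` and `var_shortcut` are illustrative):

```python
from fractions import Fraction

# Hypothetical example distribution: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def expect(g, pmf):
    """E[g(X)] for a discrete pmf given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())

mu = expect(lambda x: x, die)

# Definition: Var(X) = E[(X - mu)^2]
var_definition = expect(lambda x: (x - mu) ** 2, die)
# Computational formula: Var(X) = E[X^2] - (E[X])^2
var_shortcut = expect(lambda x: x * x, die) - mu ** 2

print(var_definition, var_shortcut)  # 35/12 35/12
```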

Properties of Variance

Constant: $\text{Var}(c) = 0$
Scaling: $\text{Var}(aX) = a^2 \text{Var}(X)$
Shift: $\text{Var}(X + c) = \text{Var}(X)$
Sum of independent variables:

$$ \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) \quad \text{(if independent)} $$

General sum:

$$ \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\text{Cov}(X, Y) $$
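The difference between the independent and the general sum formula shows up clearly on two extreme cases. The sketch below assumes two hypothetical joint distributions over a pair of dice: one where the dice are independent, and one where the second die always copies the first.

```python
from fractions import Fraction
from itertools import product

def expect(g, joint):
    """E[g(X, Y)] for a joint pmf given as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def var(g, joint):
    """Var(g(X, Y)) from the definition E[(g - E[g])^2]."""
    m = expect(g, joint)
    return expect(lambda x, y: (g(x, y) - m) ** 2, joint)

def cov(joint):
    """Cov(X, Y) from the definition E[(X - mu_X)(Y - mu_Y)]."""
    mx = expect(lambda x, y: x, joint)
    my = expect(lambda x, y: y, joint)
    return expect(lambda x, y: (x - mx) * (y - my), joint)

faces = range(1, 7)
independent = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}
copied = {(x, x): Fraction(1, 6) for x in faces}  # Y always equals X

var_sum_independent = var(lambda x, y: x + y, independent)  # 35/6
var_sum_copied = var(lambda x, y: x + y, copied)            # 35/3

def general_sum(joint):
    """Right-hand side of the general formula: Var(X) + Var(Y) + 2 Cov(X, Y)."""
    return (var(lambda x, y: x, joint) + var(lambda x, y: y, joint)
            + 2 * cov(joint))

print(var_sum_independent == general_sum(independent))  # True
print(var_sum_copied == general_sum(copied))            # True
```

With $Y = X$ the covariance term doubles the variance of the sum, which also matches the scaling rule $\text{Var}(2X) = 4\,\text{Var}(X)$.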

Standard Deviation

Standard deviation is the square root of variance:

$$ \sigma = \sqrt{\text{Var}(X)} $$

It has the same units as $X$, which makes it easier to interpret than variance.

Covariance

Covariance measures how two variables vary together:

$$ \text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] $$

Computational formula:

$$ \text{Cov}(X, Y) = E[XY] - E[X]E[Y] $$

Properties:

$\text{Cov}(X, X) = \text{Var}(X)$
$\text{Cov}(X, Y) = \text{Cov}(Y, X)$ (symmetric)
$\text{Cov}(aX, bY) = ab \cdot \text{Cov}(X, Y)$
$\text{Cov}(X + Y, Z) = \text{Cov}(X, Z) + \text{Cov}(Y, Z)$
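These identities can be checked on a small joint pmf. A minimal sketch, assuming a made-up joint distribution of two binary variables (the probabilities are illustrative only):

```python
from fractions import Fraction

# Hypothetical joint pmf of (X, Y); values chosen only for illustration.
joint = {
    (0, 0): Fraction(2, 5),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(2, 5),
}

def expect(g, pmf):
    """E[g(X, Y)] for a joint pmf given as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

def cov(f, g, pmf):
    """Cov(F, G) = E[FG] - E[F]E[G], where F = f(X, Y) and G = g(X, Y)."""
    return (expect(lambda x, y: f(x, y) * g(x, y), pmf)
            - expect(f, pmf) * expect(g, pmf))

def proj_x(x, y):
    return x

def proj_y(x, y):
    return y

cov_xy = cov(proj_x, proj_y, joint)   # 3/20
var_x = cov(proj_x, proj_x, joint)    # Cov(X, X) = Var(X) = 1/4
cov_scaled = cov(lambda x, y: 2 * x, lambda x, y: -3 * y, joint)

print(cov_xy, var_x, cov_scaled)  # 3/20 1/4 -9/10
```

The last value equals $2 \cdot (-3) \cdot \text{Cov}(X, Y)$, as the scaling property predicts.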

Correlation

Correlation is covariance normalized by the two standard deviations (scale-free), defined when $\sigma_X > 0$ and $\sigma_Y > 0$:

$$ \rho(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y} $$

Properties:

$-1 \leq \rho \leq 1$
$\rho = 1$: perfect positive linear relationship
$\rho = -1$: perfect negative linear relationship
$\rho = 0$: no linear relationship (but may have nonlinear dependence)
Independence implies zero correlation, but zero correlation does NOT imply independence
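A standard counterexample makes the last point concrete. The sketch below assumes $X$ uniform on $\{-1, 0, 1\}$ and compares $Y = 2X + 1$ (linear) with $Y = X^2$ (fully determined by $X$, yet uncorrelated with it); the `correlation` helper is illustrative.

```python
import math
from fractions import Fraction

def correlation(triples):
    """Correlation of (X, Y) from a list of (x, y, probability) triples."""
    ex = sum(x * p for x, y, p in triples)
    ey = sum(y * p for x, y, p in triples)
    cov = sum((x - ex) * (y - ey) * p for x, y, p in triples)
    var_x = sum((x - ex) ** 2 * p for x, y, p in triples)
    var_y = sum((y - ey) ** 2 * p for x, y, p in triples)
    return float(cov) / math.sqrt(float(var_x) * float(var_y))

third = Fraction(1, 3)
linear = [(x, 2 * x + 1, third) for x in (-1, 0, 1)]
quadratic = [(x, x * x, third) for x in (-1, 0, 1)]

rho_linear = correlation(linear)        # approximately 1
rho_quadratic = correlation(quadratic)  # 0.0

# Dependence check for Y = X^2: P(X=0, Y=0) versus P(X=0) * P(Y=0).
p_joint = sum(p for x, y, p in quadratic if x == 0 and y == 0)
p_x0 = sum(p for x, y, p in quadratic if x == 0)
p_y0 = sum(p for x, y, p in quadratic if y == 0)
print(rho_quadratic, p_joint, p_x0 * p_y0)  # 0.0 1/3 1/9
```

The joint probability does not factor, so $X$ and $X^2$ are dependent even though $\rho = 0$.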

Sample Statistics

Given data $x_1, \ldots, x_n$:

Sample mean:

$$ \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i $$

Sample variance (unbiased estimator):

$$ s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 $$

Sample standard deviation:

$$ s = \sqrt{s^2} $$

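Sample covariance (for paired data $(x_1, y_1), \ldots, (x_n, y_n)$; written $s_{xy}$ to distinguish it from the population covariance):

$$ s_{xy} = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) $$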
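The sample formulas translate almost line by line into code. A minimal sketch with made-up data; the function names are illustrative, and the standard-library `statistics.variance` (which also divides by $n - 1$) is used only as a cross-check.

```python
import math
import statistics

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Unbiased sample variance: divides by n - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_covariance(xs, ys):
    """Sample covariance of paired data: divides by n - 1."""
    mx, my = sample_mean(xs), sample_mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical paired observations.
x = [2, 4, 4, 4, 5, 5, 7, 9]
y = [1, 3, 2, 5, 4, 6, 8, 9]

print(sample_mean(x))                    # 5.0
print(sample_variance(x))                # 32/7, about 4.571
print(math.sqrt(sample_variance(x)))     # sample standard deviation
print(sample_covariance(x, y))           # 39/7, about 5.571
print(math.isclose(sample_variance(x), statistics.variance(x)))  # True
```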

Moments

Raw moments: $\mu_k' = E[X^k]$

Central moments: $\mu_k = E[(X - \mu)^k]$

$\mu_1 = 0$ (the first central moment always vanishes; the mean is the first raw moment, $\mu_1' = \mu$)
$\mu_2 = \sigma^2$ (variance)
$\mu_3$ relates to skewness (asymmetry)
$\mu_4$ relates to kurtosis (tail heaviness)
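Raw and central moments of a discrete distribution can be computed directly from the pmf. The sketch below assumes a fair die and uses the common standardized definitions, skewness $= \mu_3 / \sigma^3$ and kurtosis $= \mu_4 / \sigma^4$, which these notes do not spell out.

```python
from fractions import Fraction

# Hypothetical example distribution: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def raw_moment(k, pmf):
    """k-th raw moment E[X^k]."""
    return sum(x ** k * p for x, p in pmf.items())

def central_moment(k, pmf):
    """k-th central moment E[(X - mu)^k]."""
    mu = raw_moment(1, pmf)
    return sum((x - mu) ** k * p for x, p in pmf.items())

mean = raw_moment(1, die)           # 7/2
variance = central_moment(2, die)   # 35/12
third = central_moment(3, die)      # 0, because the die is symmetric
fourth = central_moment(4, die)     # 707/48

# Standardized versions (assumed definitions, see the note above).
skewness = float(third) / float(variance) ** 1.5
kurtosis = fourth / variance ** 2

print(skewness, kurtosis)  # 0.0 303/175
```

A kurtosis of $303/175 \approx 1.73$ is well below the value 3 of a Normal distribution, consistent with the die having no tails at all.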

Moment Generating Function (MGF)

$$ M_X(t) = E[e^{tX}] $$

Key property: if $M_X(t)$ is finite in an open interval around $t = 0$, its $k$-th derivative at $t = 0$ gives the $k$-th raw moment:

$$ M_X^{(k)}(0) = E[X^k] $$

Uniqueness: If two distributions have MGFs that are finite and equal on an open interval around $0$, the distributions are identical.
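The derivative property can be checked numerically. A rough sketch, assuming a fair die (whose MGF is finite for every $t$ because the support is finite) and central finite differences with a hand-picked step `h`; a symbolic derivative would be exact, this is only an approximation.

```python
import math

# Hypothetical example distribution: a fair six-sided die.
die = {face: 1 / 6 for face in range(1, 7)}

def mgf(t, pmf):
    """M_X(t) = E[exp(t X)] for a discrete pmf."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

h = 1e-4  # finite-difference step, chosen by hand
# Central differences approximating M'(0) and M''(0).
first = (mgf(h, die) - mgf(-h, die)) / (2 * h)
second = (mgf(h, die) - 2 * mgf(0.0, die) + mgf(-h, die)) / h ** 2

exact_first = sum(x * p for x, p in die.items())        # E[X] = 3.5
exact_second = sum(x ** 2 * p for x, p in die.items())  # E[X^2] = 91/6

print(abs(first - exact_first) < 1e-4)    # True
print(abs(second - exact_second) < 1e-4)  # True
```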

Inequalities

Markov’s inequality (for non-negative $X$ and any $a > 0$):

$$ P(X \geq a) \leq \frac{E[X]}{a} $$

Chebyshev’s inequality (for $X$ with finite variance $\sigma^2$ and any $k > 0$):

$$ P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2} $$

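For any distribution with finite variance, at least $1 - 1/k^2$ of the probability mass lies within $k$ standard deviations of the mean: at least 75% within $2\sigma$ and about 89% within $3\sigma$.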
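Both bounds are usually loose, which a direct comparison shows. A small sketch, assuming a fair die and hand-picked thresholds `a = 5` and `k = 1.4`:

```python
import math

# Hypothetical example distribution: a fair six-sided die.
die = {face: 1 / 6 for face in range(1, 7)}
mu = sum(x * p for x, p in die.items())
sigma = math.sqrt(sum((x - mu) ** 2 * p for x, p in die.items()))

def prob(event, pmf):
    """Probability of the set of outcomes x for which event(x) is true."""
    return sum(p for x, p in pmf.items() if event(x))

# Markov: P(X >= a) <= E[X] / a, valid because the die is non-negative.
a = 5
markov_tail = prob(lambda x: x >= a, die)
markov_bound = mu / a

# Chebyshev: P(|X - mu| >= k * sigma) <= 1 / k^2.
k = 1.4
chebyshev_tail = prob(lambda x: abs(x - mu) >= k * sigma, die)
chebyshev_bound = 1 / k ** 2

print(markov_tail <= markov_bound)        # True
print(chebyshev_tail <= chebyshev_bound)  # True
```

Here both exact tail probabilities equal $1/3$, while the bounds are $0.7$ and about $0.51$.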

Law of Large Numbers

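Weak LLN: For i.i.d. $X_1, X_2, \ldots$ with finite mean $\mu$, the sample mean $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$ converges in probability to $\mu$:

$$ \bar{X}_n \xrightarrow{P} \mu \quad \text{as } n \to \infty $$

Strong LLN: Under the same assumptions, the sample mean converges almost surely: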

$$ \bar{X}_n \xrightarrow{a.s.} \mu \quad \text{as } n \to \infty $$

The LLN is the foundation of Monte Carlo methods and empirical risk minimization.
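A simulation illustrates the convergence, though it cannot prove it. The sketch below assumes simulated rolls of a fair die with a fixed seed for reproducibility; `running_means` is an illustrative helper.

```python
import random

def running_means(n, seed=0):
    """Return the sample means after 1, 2, ..., n simulated die rolls."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0
    means = []
    for i in range(1, n + 1):
        total += rng.randint(1, 6)  # one roll, uniform on 1..6
        means.append(total / i)
    return means

means = running_means(100_000)
for n in (10, 1_000, 100_000):
    # The gap to the true mean 3.5 tends to shrink as n grows.
    print(n, means[n - 1], abs(means[n - 1] - 3.5))
```

This is Monte Carlo estimation in its simplest form: the expected value is approximated by an average of random draws.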

Central Limit Theorem

For i.i.d. variables $X_1, \ldots, X_n$ with mean $\mu$ and finite variance $\sigma^2 > 0$:

$$ \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1) $$

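As $n \to \infty$, the standardized sample mean converges in distribution to a standard Normal regardless of the shape of the original distribution, provided its variance is finite. For large $n$, $\bar{X}_n$ is therefore approximately $\mathcal{N}(\mu, \sigma^2/n)$.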
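The effect is visible even for a strongly skewed distribution. A rough simulation sketch, assuming Exponential(1) draws (mean 1, standard deviation 1) with hand-picked `n = 50` and 5,000 repetitions; it checks how often the standardized mean falls within $\pm 1.96$, which would be about 95% for an exact standard Normal.

```python
import math
import random
import statistics

def standardized_means(n, reps, seed=0):
    """Simulate reps standardized sample means of n Exponential(1) draws."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    mu, sigma = 1.0, 1.0       # mean and standard deviation of Exponential(1)
    z = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        z.append((xbar - mu) / (sigma / math.sqrt(n)))
    return z

z = standardized_means(n=50, reps=5_000)
coverage = sum(abs(v) <= 1.96 for v in z) / len(z)

# Mean near 0, standard deviation near 1, coverage near 0.95.
print(statistics.mean(z), statistics.stdev(z), coverage)
```

The agreement is approximate at finite $n$; for skewed distributions like this one, the Normal approximation improves as $n$ increases.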