Expectation and Variance

Expected Value

The expected value (mean, expectation) is the long-run average value of a random variable.

For discrete variables:

\[E[X] = \sum_x x \cdot p(x)\]

For continuous variables:

\[E[X] = \int_{-\infty}^{\infty} x \cdot f(x) dx\]
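The two definitions above can be checked numerically. This is a minimal sketch, assuming a pmf stored as a `{value: probability}` dict and a density that is zero outside a known interval. The helpers `expectation_discrete` and `expectation_continuous` are hypothetical names, and the continuous case uses the midpoint rule as an illustrative stand-in for exact integration.

```python
from fractions import Fraction

def expectation_discrete(pmf):
    """Hypothetical helper: E[X] = sum over x of x * p(x), pmf given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def expectation_continuous(f, a, b, n=10_000):
    """Hypothetical helper: approximate E[X] = integral of x * f(x) dx by the midpoint rule.

    Assumes the density f is zero outside [a, b].
    """
    h = (b - a) / n
    midpoints = (a + (i + 0.5) * h for i in range(n))
    return sum(x * f(x) for x in midpoints) * h

# Fair six-sided die: p(x) = 1/6 for x = 1..6 (exact arithmetic with Fraction)
die = {x: Fraction(1, 6) for x in range(1, 7)}
print(expectation_discrete(die))  # 7/2

# Uniform(0, 2): f(x) = 1/2 on [0, 2]
print(expectation_continuous(lambda x: 0.5, 0.0, 2.0))  # close to 1.0
```

Exact fractions keep the discrete result free of rounding, which makes it easy to compare against hand calculations.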

Notation: $E[X]$, $\mathbb{E}[X]$, or $\mu_X$

Properties of Expectation

  • Linearity: $E[aX + bY] = aE[X] + bE[Y]$
  • Constant: $E[c] = c$
  • Independence: If $X$ and $Y$ are independent, $E[XY] = E[X]E[Y]$
  • Law of the unconscious statistician (LOTUS):

    \[E[g(X)] = \sum_x g(x) p(x) \quad \text{(discrete)}\]
    \[E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x) \, dx \quad \text{(continuous)}\]
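The properties above can be verified exactly on a small example. This sketch assumes a fair die as the illustrative distribution; `lotus` is a hypothetical helper name. It computes $E[X^2]$ directly from $p(x)$ via LOTUS, checks linearity, and checks $E[XY] = E[X]E[Y]$ for two independent dice using the product joint pmf.

```python
from fractions import Fraction

def lotus(g, pmf):
    """Hypothetical helper: E[g(X)] = sum over x of g(x) * p(x).

    The distribution of g(X) itself is never constructed.
    """
    return sum(g(x) * p for x, p in pmf.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}

e_x = lotus(lambda x: x, die)
e_x_squared = lotus(lambda x: x**2, die)  # (1 + 4 + 9 + 16 + 25 + 36) / 6
print(e_x_squared)  # 91/6

# Linearity: E[3X + 2] = 3 E[X] + 2
assert lotus(lambda x: 3 * x + 2, die) == 3 * e_x + 2

# Independence: joint probability of two independent dice is p(x) * p(y)
e_xy = sum(x * y * px * py for x, px in die.items() for y, py in die.items())
assert e_xy == e_x * e_x  # both equal 49/4
```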

Variance

Variance measures the spread of $X$ around its mean $\mu = E[X]$:

\[\text{Var}(X) = E[(X - \mu)^2]\]

Computational formula:

\[\text{Var}(X) = E[X^2] - (E[X])^2\]
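Both formulas give the same value, as this short check shows. It is a sketch using a fair die as an assumed example distribution, with exact fractions so the two results can be compared for equality rather than approximately.

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum(x * p for x, p in die.items())                          # E[X] = 7/2
var_definition = sum((x - mu) ** 2 * p for x, p in die.items())  # E[(X - mu)^2]
var_shortcut = sum(x**2 * p for x, p in die.items()) - mu**2     # E[X^2] - (E[X])^2

print(var_definition, var_shortcut)  # 35/12 35/12
```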

Notation: $\text{Var}(X)$, $\sigma_X^2$, or $\sigma^2$

Properties of Variance

  • Constant: $\text{Var}(c) = 0$
  • Scaling: $\text{Var}(aX) = a^2 \text{Var}(X)$
  • Shift: $\text{Var}(X + c) = \text{Var}(X)$
  • Sum of independent variables:

    \[\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) \quad \text{(if independent)}\]
  • General sum:

    \[\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\text{Cov}(X, Y)\]
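The variance rules above can be confirmed exactly by building the distribution of each transformed variable. This is a minimal sketch; `var` and `transform` are hypothetical helper names, and the fair die is an illustrative choice. The last check uses the fully dependent case $Y = X$, where $\text{Cov}(X, Y) = \text{Var}(X)$.

```python
from fractions import Fraction
from itertools import product

def var(pmf):
    """Hypothetical helper: Var(X) = E[(X - mu)^2] for a pmf {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def transform(pmf, g):
    """Hypothetical helper: pmf of g(X), accumulating probability per output value."""
    out = {}
    for x, p in pmf.items():
        out[g(x)] = out.get(g(x), 0) + p
    return out

die = {x: Fraction(1, 6) for x in range(1, 7)}

# Scaling and shift
assert var(transform(die, lambda x: 3 * x)) == 9 * var(die)
assert var(transform(die, lambda x: x + 10)) == var(die)

# Sum of two independent dice: joint probability is p(x) * p(y)
sum_pmf = {}
for (x, px), (y, py) in product(die.items(), die.items()):
    sum_pmf[x + y] = sum_pmf.get(x + y, 0) + px * py
print(var(sum_pmf))  # 35/6, which equals Var(X) + Var(Y)

# Dependent case Y = X: Var(X + X) = Var(X) + Var(X) + 2 Cov(X, X) = 4 Var(X)
assert var(transform(die, lambda x: 2 * x)) == var(die) + var(die) + 2 * var(die)
```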

Standard Deviation

Standard deviation is the square root of variance:

\[\sigma = \sqrt{\text{Var}(X)}\]

It has the same units as $X$ (variance has squared units), which makes it easier to interpret.

Covariance

Covariance measures how two variables vary together:

\[\text{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]\]

Computational formula:

\[\text{Cov}(X, Y) = E[XY] - E[X]E[Y]\]

Properties:

  • $\text{Cov}(X, X) = \text{Var}(X)$
  • $\text{Cov}(X, Y) = \text{Cov}(Y, X)$ (symmetric)
  • $\text{Cov}(aX, bY) = ab \cdot \text{Cov}(X, Y)$
  • $\text{Cov}(X + Y, Z) = \text{Cov}(X, Z) + \text{Cov}(Y, Z)$
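Both covariance formulas can be compared on an explicit joint pmf. This sketch assumes $X$ is a fair die and $Y = X + Z$ with $Z$ an independent second die; by the additivity and independence properties, $\text{Cov}(X, Y) = \text{Var}(X) + \text{Cov}(X, Z) = 35/12$. The helper `expect` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

die = {x: Fraction(1, 6) for x in range(1, 7)}

# Joint pmf of (X, Y) where Y = X + Z and Z is an independent die
joint = {}
for (x, px), (z, pz) in product(die.items(), die.items()):
    joint[(x, x + z)] = joint.get((x, x + z), 0) + px * pz

def expect(g, joint):
    """Hypothetical helper: E[g(X, Y)] under a joint pmf {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x, joint)
mu_y = expect(lambda x, y: y, joint)

cov_definition = expect(lambda x, y: (x - mu_x) * (y - mu_y), joint)  # E[(X - mu_X)(Y - mu_Y)]
cov_shortcut = expect(lambda x, y: x * y, joint) - mu_x * mu_y         # E[XY] - E[X]E[Y]

print(cov_definition, cov_shortcut)  # 35/12 35/12
```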

Correlation

Correlation is normalized covariance (scale-free):

\[\rho(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}\]

Properties:

  • $-1 \leq \rho \leq 1$
  • $\rho = 1$: perfect positive linear relationship
  • $\rho = -1$: perfect negative linear relationship
  • $\rho = 0$: no linear relationship (but may have nonlinear dependence)
  • Independence implies zero correlation, but zero correlation does NOT imply independence
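The last two bullets are easiest to see on a tiny example. This sketch assumes $X$ uniform on $\{-1, 0, 1\}$; `correlation` is a hypothetical helper. With $Y = 2X + 1$ the correlation is 1, while $Y = X^2$ is completely determined by $X$ yet has correlation 0, because $E[X^3] = E[X] = 0$ makes the covariance vanish.

```python
import math
from fractions import Fraction

def correlation(joint):
    """Hypothetical helper: rho = Cov(X, Y) / (sigma_X * sigma_Y) for a joint pmf {(x, y): prob}."""
    def e(g):
        return sum(g(x, y) * p for (x, y), p in joint.items())
    mx, my = e(lambda x, y: x), e(lambda x, y: y)
    cov = e(lambda x, y: (x - mx) * (y - my))
    var_x = e(lambda x, y: (x - mx) ** 2)
    var_y = e(lambda x, y: (y - my) ** 2)
    return float(cov) / math.sqrt(var_x * var_y)

third = Fraction(1, 3)

# Y = 2X + 1: perfect positive linear relationship
linear = {(x, 2 * x + 1): third for x in (-1, 0, 1)}
print(round(correlation(linear), 12))  # 1.0

# Y = X^2: fully dependent on X, yet uncorrelated
quadratic = {(x, x**2): third for x in (-1, 0, 1)}
print(correlation(quadratic))  # 0.0
```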

Sample Statistics

Given data $x_1, \ldots, x_n$:

Sample mean:

\[\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i\]

Sample variance (unbiased estimator):

\[s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2\]

Sample standard deviation:

\[s = \sqrt{s^2}\]

Sample covariance (an estimate of $\text{Cov}(X, Y)$):

\[s_{XY} = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})\]
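The sample formulas translate directly into code. This is a sketch with small made-up data chosen so that every intermediate value is exact in floating point; `sample_stats` is a hypothetical helper. The result is cross-checked against Python's `statistics.variance` and `statistics.stdev`, which also use the $n - 1$ divisor.

```python
import math
import statistics

def sample_stats(xs, ys):
    """Hypothetical helper: sample mean, variance, std dev of xs, and sample covariance (n - 1 divisor)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    return x_bar, s2, math.sqrt(s2), s_xy

# Illustrative data only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 10]
x_bar, s2, s, s_xy = sample_stats(xs, ys)
print(x_bar, s2, s_xy)  # 3.0 2.5 4.0

# Cross-check against the standard library
assert math.isclose(s2, statistics.variance(xs))
assert math.isclose(s, statistics.stdev(xs))
```

Dividing by $n$ instead of $n - 1$ would give $2.0$ here, the biased (population-style) estimate.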

Moments

Raw moments: $\mu_k' = E[X^k]$

Central moments: $\mu_k = E[(X - \mu)^k]$

  • $\mu_1' = \mu$ (mean), while the first central moment is always $\mu_1 = E[X - \mu] = 0$
  • $\mu_2 = \sigma^2$ (variance)
  • $\mu_3 / \sigma^3$ is the skewness (asymmetry)
  • $\mu_4 / \sigma^4$ is the kurtosis (tail heaviness)
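Raw and central moments can be computed side by side. This sketch assumes a fair die and hypothetical helpers `raw_moment` and `central_moment`. It shows that the first raw moment is the mean while the first central moment is 0, that the symmetric die has zero skewness, and that its kurtosis $\mu_4 / \sigma^4$ is $303/175 \approx 1.73$.

```python
from fractions import Fraction

def raw_moment(pmf, k):
    """Hypothetical helper: mu'_k = E[X^k]."""
    return sum(x**k * p for x, p in pmf.items())

def central_moment(pmf, k):
    """Hypothetical helper: mu_k = E[(X - mu)^k]."""
    mu = raw_moment(pmf, 1)
    return sum((x - mu) ** k * p for x, p in pmf.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}

print(raw_moment(die, 1), central_moment(die, 1))  # 7/2 0

var = central_moment(die, 2)                                  # 35/12
skewness = float(central_moment(die, 3)) / float(var) ** 1.5  # mu_3 / sigma^3
kurtosis = central_moment(die, 4) / var**2                    # mu_4 / sigma^4, exact

print(skewness, kurtosis)  # 0.0 303/175
```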

Moment Generating Function (MGF)

\[M_X(t) = E[e^{tX}]\]

Key property: if $M_X(t)$ is finite on an open interval around $t = 0$, its $k$-th derivative at $t = 0$ is the $k$-th raw moment:

\[M_X^{(k)}(0) = E[X^k]\]

Uniqueness: If the MGFs of two distributions exist and agree on an open interval around $t = 0$, the distributions are identical.
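The derivative property can be checked numerically. This sketch assumes a fair die, whose MGF is $M_X(t) = \frac{1}{6}\sum_{x=1}^{6} e^{tx}$, and approximates derivatives with central finite differences, a numerical technique swapped in here for symbolic differentiation. `mgf_die` is a hypothetical name.

```python
import math

def mgf_die(t):
    """Hypothetical helper: M_X(t) = E[e^{tX}] for a fair six-sided die."""
    return sum(math.exp(t * x) for x in range(1, 7)) / 6

h = 1e-4
first = (mgf_die(h) - mgf_die(-h)) / (2 * h)                    # approximates M'(0) = E[X]
second = (mgf_die(h) - 2 * mgf_die(0.0) + mgf_die(-h)) / h**2   # approximates M''(0) = E[X^2]

print(first, second)  # close to 3.5 and 91/6 (about 15.1667)
```

The step `h` trades truncation error (which grows with `h`) against floating-point cancellation (which grows as `h` shrinks); `1e-4` keeps both small for this function.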

Inequalities

Markov’s inequality (for non-negative $X$ and any $a > 0$):

\[P(X \geq a) \leq \frac{E[X]}{a}\]

Chebyshev’s inequality (for finite variance and any $k > 0$):

\[P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}\]

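For any distribution with finite variance, this guarantees that most of the probability mass lies within a few standard deviations of the mean: at least $75\%$ within $2\sigma$ and at least $8/9 \approx 89\%$ within $3\sigma$.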
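Comparing the bounds with exact probabilities shows how conservative they are. This sketch assumes $X \sim \text{Exponential}(1)$, an illustrative choice with $E[X] = 1$, $\text{Var}(X) = 1$, and exact tail $P(X \geq a) = e^{-a}$ for $a \geq 0$. The helper `tail` is a hypothetical name.

```python
import math

def tail(a):
    """Hypothetical helper: exact P(X >= a) = exp(-a) for X ~ Exponential(1), a >= 0."""
    return math.exp(-a)

# Markov: P(X >= a) <= E[X] / a, with E[X] = 1
for a in (0.5, 1.0, 2.0, 5.0):
    exact, bound = tail(a), 1.0 / a
    assert exact <= bound
    print(f"a = {a}: P(X >= a) = {exact:.4f}, Markov bound = {bound:.4f}")

# Chebyshev: P(|X - mu| >= k sigma) <= 1 / k^2, with mu = sigma = 1.
# For k > 1 the event X <= 1 - k is impossible (X >= 0), so only the upper tail counts.
for k in (1.5, 2.0, 3.0):
    exact, bound = tail(1.0 + k), 1.0 / k**2
    assert exact <= bound
    print(f"k = {k}: P(|X - 1| >= k) = {exact:.4f}, Chebyshev bound = {bound:.4f}")
```

The bounds are loose for this distribution, which is the price of holding for every distribution with the stated moments.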

Law of Large Numbers

Weak LLN: For i.i.d. $X_1, X_2, \ldots$ with finite mean $\mu$, the sample mean converges in probability to $\mu$:

\[\bar{X}_n \xrightarrow{P} \mu \quad \text{as } n \to \infty\]

Strong LLN: Under the same assumptions, the sample mean converges almost surely:

\[\bar{X}_n \xrightarrow{a.s.} \mu \quad \text{as } n \to \infty\]

This is the foundation of Monte Carlo methods and empirical risk minimization: averages over samples estimate expectations.
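A small Monte Carlo run illustrates the LLN. This is a sketch with simulated fair-die rolls (mean $3.5$) and a fixed seed for reproducibility; `running_means` is a hypothetical helper that records the sample mean after $10, 100, \ldots, 10^5$ rolls.

```python
import random

def running_means(n, seed=0):
    """Hypothetical helper: sample means of simulated die rolls at n = 10, 100, ..., up to n."""
    rng = random.Random(seed)
    checkpoints = {10**k for k in range(1, 6)}
    total = 0
    means = {}
    for i in range(1, n + 1):
        total += rng.randint(1, 6)  # one roll of a fair die
        if i in checkpoints:
            means[i] = total / i
    return means

for n, m in running_means(100_000).items():
    print(f"n = {n:>6}: sample mean = {m:.4f}")  # settles toward E[X] = 3.5
```

The standard deviation of the mean after $10^5$ rolls is about $\sqrt{(35/12)/10^5} \approx 0.005$, so the last printed value should be very close to $3.5$.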

Central Limit Theorem

For i.i.d. variables $X_1, \ldots, X_n$ with mean $\mu$ and finite variance $\sigma^2 > 0$:

\[\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1)\]

In practice, for large $n$ the sample mean is approximately $\mathcal{N}(\mu, \sigma^2/n)$, regardless of the shape of the original distribution (as long as its variance is finite).
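A simulation shows the CLT at work on a skewed distribution. This sketch assumes samples from $\text{Exponential}(1)$ (so $\mu = \sigma = 1$), with sample size $n = 50$ and $10{,}000$ replications chosen for illustration; `standardized_means` is a hypothetical helper. If the standardized means are close to $\mathcal{N}(0, 1)$, about $95\%$ of them should satisfy $|Z| \leq 1.96$.

```python
import math
import random

def standardized_means(n, reps, seed=0):
    """Hypothetical helper: standardized sample means (x_bar - mu) / (sigma / sqrt(n)) from Exponential(1)."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and std dev of Exponential(1)
    z = []
    for _ in range(reps):
        x_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        z.append((x_bar - mu) / (sigma / math.sqrt(n)))
    return z

z = standardized_means(n=50, reps=10_000)
coverage = sum(abs(v) <= 1.96 for v in z) / len(z)
print(coverage)  # near 0.95, the N(0, 1) probability of |Z| <= 1.96
```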