Calculus

Derivatives

The derivative of $f(x)$ measures the instantaneous rate of change:

\[f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}\]
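
The limit definition can be checked numerically by plugging in a small but finite $h$. The sketch below is only an illustration; the names `forward_difference` and `square` are made up for this example.

```python
def forward_difference(f, x, h=1e-6):
    """Approximate f'(x) with the difference quotient (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


def square(x):
    return x * x


# The exact derivative of x^2 at x = 3 is 6; the estimate approaches it as h shrinks.
for h in (1e-1, 1e-3, 1e-5):
    print(h, forward_difference(square, 3.0, h))
```

For very small $h$ floating-point cancellation starts to dominate, so in practice $h$ is not pushed all the way toward zero.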

Common Rules

| Rule | Formula |
|---|---|
| Power | $\frac{d}{dx} x^n = nx^{n-1}$ |
| Chain | $\frac{d}{dx} f(g(x)) = f'(g(x)) \cdot g'(x)$ |
| Product | $\frac{d}{dx} [f \cdot g] = f'g + fg'$ |
| Quotient | $\frac{d}{dx} [f/g] = (f'g - fg') / g^2$ |
| Sum | $\frac{d}{dx} [f + g] = f' + g'$ |
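
As a quick sanity check of the chain rule, the sketch below compares the analytic derivative of $\sin(x^2)$ with a central-difference estimate. The function names and the test point `x0` are chosen only for this illustration.

```python
import math


def central_difference(f, x, h=1e-5):
    """Symmetric difference quotient; more accurate than the one-sided version."""
    return (f(x + h) - f(x - h)) / (2 * h)


def composite(x):
    # f(g(x)) with f = sin and g(x) = x^2
    return math.sin(x ** 2)


def composite_derivative(x):
    # Chain rule: f'(g(x)) * g'(x) = cos(x^2) * 2x
    return math.cos(x ** 2) * 2 * x


x0 = 1.3
numeric = central_difference(composite, x0)
analytic = composite_derivative(x0)
print(numeric, analytic)
```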

Common Derivatives

| Function | Derivative |
|---|---|
| $e^x$ | $e^x$ |
| $\ln x$ | $1/x$ |
| $\sin x$ | $\cos x$ |
| $\cos x$ | $-\sin x$ |
| $\sigma(x) = \frac{1}{1+e^{-x}}$ | $\sigma(x)(1 - \sigma(x))$ |
| $\tanh(x)$ | $1 - \tanh^2(x)$ |
| $\text{ReLU}(x)$ | $0$ if $x<0$, $1$ if $x>0$ (undefined at $x=0$) |
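
The activation-function derivatives are the ones most often hand-coded. A minimal sketch follows, assuming the convention that the ReLU derivative is taken to be $0$ at $x = 0$; that choice is an assumption of this example, not something the definitions above fix.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    # tanh'(x) = 1 - tanh(x)^2
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x):
    # The derivative is undefined at x = 0; returning 0 there is an assumed convention.
    return 1.0 if x > 0 else 0.0


print(sigmoid_grad(0.0), tanh_grad(0.0), relu_grad(-2.0), relu_grad(3.0))
```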

Partial Derivatives

For $f(x_1, x_2, \ldots, x_n)$, the partial derivative with respect to $x_i$ treats all other variables as constants:

\[\frac{\partial f}{\partial x_i}\]
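
"Treating the other variables as constants" translates directly into code: perturb one coordinate and leave the rest alone. The sketch below uses a hypothetical example function $f(x_1, x_2) = x_1^2 x_2 + 3x_2$, whose partials are $2x_1x_2$ and $x_1^2 + 3$.

```python
def partial_derivative(f, point, i, h=1e-6):
    """Central-difference estimate of df/dx_i at `point`, other coordinates held fixed."""
    forward = list(point)
    backward = list(point)
    forward[i] += h
    backward[i] -= h
    return (f(forward) - f(backward)) / (2 * h)


def example_fn(x):
    # f(x1, x2) = x1^2 * x2 + 3 * x2
    return x[0] ** 2 * x[1] + 3 * x[1]


point = [2.0, 5.0]
d_dx1 = partial_derivative(example_fn, point, 0)  # analytic value: 2 * x1 * x2 = 20
d_dx2 = partial_derivative(example_fn, point, 1)  # analytic value: x1^2 + 3 = 7
print(d_dx1, d_dx2)
```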

Gradient

The gradient $\nabla f$ is the vector of all partial derivatives:

\[\nabla f(\mathbf{x}) = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right]^T\]
  • Points in the direction of steepest ascent
  • Gradient descent moves opposite to the gradient to minimize $f$: $\mathbf{x} \leftarrow \mathbf{x} - \eta \nabla f(\mathbf{x})$, with step size $\eta > 0$
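
A minimal gradient descent sketch, assuming a hand-picked convex function $f(x, y) = (x-1)^2 + 2(y+2)^2$ with minimum at $(1, -2)$. The learning rate and step count are arbitrary choices for this example.

```python
def grad_f(p):
    # Gradient of f(x, y) = (x - 1)^2 + 2 * (y + 2)^2
    x, y = p
    return [2.0 * (x - 1.0), 4.0 * (y + 2.0)]


def gradient_descent(grad, start, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient: p <- p - learning_rate * grad(p)."""
    p = list(start)
    for _ in range(steps):
        g = grad(p)
        p = [pi - learning_rate * gi for pi, gi in zip(p, g)]
    return p


minimum = gradient_descent(grad_f, [0.0, 0.0])
print(minimum)  # close to [1, -2]
```

With a learning rate that is too large for the curvature of $f$, the same loop diverges instead of converging.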

Jacobian

For a vector-valued function $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^m$, the Jacobian is the $m \times n$ matrix of all partial derivatives:

\[J_{ij} = \frac{\partial f_i}{\partial x_j}\]

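Used heavily in backpropagation: each layer contributes the Jacobian of its outputs with respect to its inputs, and gradients are propagated as vector-Jacobian products rather than by forming the full matrix.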
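
To make the $m \times n$ shape concrete, the sketch below builds a Jacobian column by column with central differences. The function `vector_fn` ($\mathbb{R}^2 \to \mathbb{R}^3$) is a made-up example; real frameworks compute Jacobians with automatic differentiation rather than finite differences.

```python
import math


def numerical_jacobian(f, x, h=1e-6):
    """Return J with J[i][j] approximating d f_i / d x_j at x."""
    m = len(f(x))
    n = len(x)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus = f(x_plus)
        f_minus = f(x_minus)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


def vector_fn(x):
    # f(x1, x2) = (x1 * x2, x1 + x2^2, sin(x1))
    return [x[0] * x[1], x[0] + x[1] ** 2, math.sin(x[0])]


J = numerical_jacobian(vector_fn, [1.0, 2.0])
# Analytic Jacobian at (1, 2): [[2, 1], [1, 4], [cos(1), 0]]
for row in J:
    print(row)
```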

Hessian

For $f: \mathbb{R}^n \to \mathbb{R}$, the Hessian $H$ is the $n \times n$ matrix of second-order partial derivatives:

\[H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}\]
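  • Symmetric if $f$ has continuous second partial derivatives
  • At a critical point ($\nabla f = \mathbf{0}$), positive definite Hessian → local minimum
  • At a critical point, negative definite Hessian → local maximum
  • At a critical point, indefinite Hessian → saddle point
  • Semidefinite Hessian with a zero eigenvalue → the second-order test is inconclusive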
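
For a symmetric $2 \times 2$ Hessian the definiteness test can be done in closed form from the eigenvalues. The sketch below assumes the point is already known to be a critical point, and it reports "inconclusive" when an eigenvalue is zero; the function names are illustrative.

```python
import math


def symmetric_2x2_eigenvalues(a, b, c):
    """Eigenvalues (smaller, larger) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return mean - radius, mean + radius


def classify_critical_point(a, b, c):
    """Classify a critical point from its Hessian [[a, b], [b, c]]."""
    low, high = symmetric_2x2_eigenvalues(a, b, c)
    if low > 0:
        return "local minimum"   # positive definite
    if high < 0:
        return "local maximum"   # negative definite
    if low < 0 < high:
        return "saddle point"    # indefinite
    return "inconclusive"        # at least one zero eigenvalue


print(classify_critical_point(2.0, 0.0, 2.0))    # Hessian of x^2 + y^2
print(classify_critical_point(2.0, 0.0, -2.0))   # Hessian of x^2 - y^2
```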

Chain Rule (Multivariable)

If $\mathbf{y} = f(\mathbf{x})$ and $z = g(\mathbf{y})$:

\[\frac{\partial z}{\partial x_i} = \sum_j \frac{\partial z}{\partial y_j} \frac{\partial y_j}{\partial x_i}\]

In matrix form, with gradients written as column vectors and $J_f$ the Jacobian of $f$: $\frac{\partial z}{\partial \mathbf{x}} = J_f^T \frac{\partial z}{\partial \mathbf{y}}$

This is the foundation of backpropagation.
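
A hand-rolled backward pass for a tiny composition shows the sum over $j$ at work. The functions here are assumptions made for the example: $\mathbf{y} = (x_1 x_2,\; x_1 + x_2)$ and $z = y_1^2 + 3y_2$.

```python
def forward(x):
    """Compute the intermediate y = f(x) and the scalar output z = g(y)."""
    x1, x2 = x
    y = [x1 * x2, x1 + x2]
    z = y[0] ** 2 + 3.0 * y[1]
    return y, z


def backward(x, y):
    """Apply dz/dx_i = sum_j (dz/dy_j) * (dy_j/dx_i)."""
    x1, x2 = x
    dz_dy = [2.0 * y[0], 3.0]
    # jacobian_f[j][i] = dy_j / dx_i
    jacobian_f = [[x2, x1], [1.0, 1.0]]
    return [
        sum(dz_dy[j] * jacobian_f[j][i] for j in range(len(y)))
        for i in range(len(x))
    ]


x = [2.0, 3.0]
y, z = forward(x)
grad = backward(x, y)
# Direct differentiation of z = (x1*x2)^2 + 3*(x1 + x2) gives
# dz/dx1 = 2*x1*x2^2 + 3 and dz/dx2 = 2*x1^2*x2 + 3.
print(z, grad)
```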

Taylor Series

Expands a sufficiently smooth function around a point $a$ (equality holds where the series converges to $f$):

\[f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots\]

First-order (linear) approximation:

\[f(\mathbf{x} + \delta) \approx f(\mathbf{x}) + \nabla f(\mathbf{x})^T \delta\]

Second-order approximation, with $H$ the Hessian of $f$ at $\mathbf{x}$:

\[f(\mathbf{x} + \delta) \approx f(\mathbf{x}) + \nabla f(\mathbf{x})^T \delta + \frac{1}{2} \delta^T H \delta\]

Used in Newton’s method and second-order optimizers.
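
In one dimension, minimizing the second-order approximation over $\delta$ gives the Newton step $\delta = -f'(x)/f''(x)$. The sketch below applies it to a hypothetical convex function $f(x) = e^x - 2x$, whose minimum is at $x = \ln 2$; the starting point and iteration count are arbitrary.

```python
import math


def f_prime(x):
    # f(x) = e^x - 2x, so f'(x) = e^x - 2
    return math.exp(x) - 2.0


def f_double_prime(x):
    # f''(x) = e^x, positive everywhere, so f is convex
    return math.exp(x)


def newton_minimize(d1, d2, x0, steps=20):
    """Iterate x <- x - f'(x) / f''(x), the minimizer of the local quadratic model."""
    x = x0
    for _ in range(steps):
        x = x - d1(x) / d2(x)
    return x


x_star = newton_minimize(f_prime, f_double_prime, 0.0)
print(x_star, math.log(2.0))
```

In $n$ dimensions the division becomes a linear solve with the Hessian, $\delta = -H^{-1} \nabla f(\mathbf{x})$.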

Integration

The definite integral of $f$ over $[a, b]$ gives the signed area under the curve:

\[\int_a^b f(x)\, dx\]

Fundamental Theorem of Calculus: for continuous $f$, $\frac{d}{dx} \int_a^x f(t)\, dt = f(x)$

Key integral identities:

  • $\int e^x\, dx = e^x + C$
  • $\int x^n\, dx = \frac{x^{n+1}}{n+1} + C$ (for $n \neq -1$)
  • $\int \frac{1}{x}\, dx = \ln\lvert x \rvert + C$

Gaussian integral: $\int_{-\infty}^{\infty} e^{-x^2}\, dx = \sqrt{\pi}$

Used extensively in probability distributions and variational inference.
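
The Gaussian integral can be confirmed numerically with a basic trapezoid rule. This sketch truncates the infinite range to $[-10, 10]$, which is an assumption justified only by how fast $e^{-x^2}$ decays; the names are illustrative.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n equal subintervals on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h


def gaussian(x):
    return math.exp(-x * x)


estimate = trapezoid(gaussian, -10.0, 10.0, 2000)
print(estimate, math.sqrt(math.pi))
```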

Multivariable Integration

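  • Change of variables $\mathbf{y} = \phi(\mathbf{x})$: introduces the Jacobian determinant, $d\mathbf{y} = \lvert \det J_\phi(\mathbf{x}) \rvert\, d\mathbf{x}$
  • Monte Carlo integration: approximate $\int f(x)\, p(x)\, dx \approx \frac{1}{N}\sum_{i=1}^{N} f(x_i)$, where $x_i \sim p$ are independent samples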
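
A minimal Monte Carlo sketch, assuming $p$ is the standard normal and $f(x) = x^2$, so the exact value of the integral is the variance, $1$. The seed and sample count are arbitrary choices for reproducibility.

```python
import random


def monte_carlo_expectation(f, sampler, n):
    """Average f over n independent draws from `sampler`."""
    return sum(f(sampler()) for _ in range(n)) / n


rng = random.Random(0)  # fixed seed so repeated runs give the same estimate
estimate = monte_carlo_expectation(
    lambda x: x * x,              # f(x) = x^2
    lambda: rng.gauss(0.0, 1.0),  # x_i ~ N(0, 1)
    100_000,
)
print(estimate)  # close to 1
```

The error of the estimate shrinks like $1/\sqrt{N}$, independent of the dimension of $x$.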