Calculus

Derivatives

The derivative of $f(x)$ measures the instantaneous rate of change:

\[f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}\]
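The limit can be illustrated numerically. The following is a minimal sketch, assuming a small fixed step `h` stands in for the limit; the helper name `forward_difference` is hypothetical and chosen only for this example.

```python
import math

def forward_difference(f, x, h=1e-6):
    """Approximate f'(x) with the difference quotient at a small, fixed h."""
    return (f(x + h) - f(x)) / h

# d/dx sin(x) = cos(x), so the two values should agree closely at x = 1.0.
approx = forward_difference(math.sin, 1.0)
exact = math.cos(1.0)
print(approx, exact)
```

Making `h` much smaller does not keep improving the estimate, because floating-point cancellation in `f(x + h) - f(x)` eventually dominates.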

For $f(x_1, x_2, \ldots, x_n)$, the partial derivative with respect to $x_i$ treats all other variables as constants:

\[\frac{\partial f}{\partial x_i}\]

The gradient $\nabla f$ is the vector of all partial derivatives:

\[\nabla f(\mathbf{x}) = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n}\right]^T\]
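One way to make the gradient concrete is to estimate each partial derivative by perturbing a single coordinate while holding the others fixed. This is a sketch only: the function `numerical_gradient` and the example function `f` are assumptions made for illustration.

```python
def numerical_gradient(f, x, h=1e-5):
    """Central-difference estimate of the gradient of f at the point x (a list)."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h   # perturb only coordinate i; the rest stay constant
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

def f(x):
    # Example function (assumed for illustration): f(x1, x2) = x1^2 + 3*x1*x2
    return x[0] ** 2 + 3 * x[0] * x[1]

grad = numerical_gradient(f, [1.0, 2.0])
# Analytic gradient is [2*x1 + 3*x2, 3*x1], which is [8, 3] at (1, 2).
print(grad)
```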

For a vector-valued function $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^m$, the Jacobian is the $m \times n$ matrix of all partial derivatives:

\[J_{ij} = \frac{\partial f_i}{\partial x_j}\]

The Jacobian is used heavily in backpropagation, where each layer contributes the Jacobian of its outputs with respect to its inputs.
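The index convention $J_{ij} = \partial f_i / \partial x_j$ (rows index outputs, columns index inputs) can be checked with a small finite-difference sketch. The helper `numerical_jacobian` and the map `f` below are hypothetical, introduced only for this example.

```python
def numerical_jacobian(f, x, h=1e-5):
    """Central-difference Jacobian: J[i][j] approximates d f_i / d x_j."""
    m = len(f(x))  # number of outputs
    n = len(x)     # number of inputs
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

def f(x):
    # Hypothetical map from R^2 to R^3, assumed for illustration
    return [x[0] * x[1], x[0] + x[1], x[0] ** 2]

J = numerical_jacobian(f, [2.0, 3.0])
# Analytic Jacobian is [[x2, x1], [1, 1], [2*x1, 0]], a 3 x 2 matrix.
print(J)
```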

For $f: \mathbb{R}^n \to \mathbb{R}$, the Hessian $H$ is the $n \times n$ matrix of second-order partial derivatives:

\[H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}\]
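As a sketch of what the entries mean, the Hessian can be estimated with a four-point central-difference stencil. The names `numerical_hessian` and `f` are assumptions for this example, and the step size `h` is a rough choice rather than a tuned value.

```python
def numerical_hessian(f, x, h=1e-4):
    """Finite-difference Hessian: H[i][j] approximates d^2 f / (dx_i dx_j)."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate f with coordinate i moved by di and coordinate j by dj.
                p = list(x)
                p[i] += di
                p[j] += dj
                return f(p)
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H

def f(x):
    # Example function (assumed): f(x1, x2) = x1^2 * x2 + x2^3
    return x[0] ** 2 * x[1] + x[1] ** 3

H = numerical_hessian(f, [1.0, 2.0])
# Analytic Hessian is [[2*x2, 2*x1], [2*x1, 6*x2]].
print(H)
```

For a function with continuous second partial derivatives the result is symmetric, so `H[0][1]` and `H[1][0]` should agree up to rounding.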

Chain rule: if $\mathbf{y} = f(\mathbf{x})$ and $z = g(\mathbf{y})$, then

\[\frac{\partial z}{\partial x_i} = \sum_j \frac{\partial z}{\partial y_j} \frac{\partial y_j}{\partial x_i}\]

In matrix form, with gradients written as column vectors and $J_f$ the Jacobian of $f$: $\frac{\partial z}{\partial \mathbf{x}} = J_f^T \frac{\partial z}{\partial \mathbf{y}}$

This is the foundation of backpropagation.
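The sum over $j$ can be written out directly and compared against a finite-difference estimate of the composed function. This is a minimal sketch; the functions `f`, `g`, `jacobian_f`, and `grad_g` are hypothetical examples with hand-derived derivatives, not part of any library.

```python
def f(x):
    # Hypothetical inner map from R^2 to R^2: y = f(x)
    return [x[0] * x[1], x[0] + x[1]]

def jacobian_f(x):
    # J[j][i] = d y_j / d x_i, derived by hand for the f above
    return [[x[1], x[0]], [1.0, 1.0]]

def g(y):
    # Hypothetical scalar outer function: z = g(y)
    return y[0] ** 2 + 3.0 * y[1]

def grad_g(y):
    # dz/dy, derived by hand for the g above
    return [2.0 * y[0], 3.0]

def chain_rule_gradient(x):
    y = f(x)
    J = jacobian_f(x)
    dz_dy = grad_g(y)
    # dz/dx_i = sum_j dz/dy_j * dy_j/dx_i, i.e. J^T applied to dz/dy
    return [sum(dz_dy[j] * J[j][i] for j in range(len(y))) for i in range(len(x))]

x = [2.0, 3.0]
analytic = chain_rule_gradient(x)

# Independent check: central differences on the composition g(f(x)).
h = 1e-6
numeric = []
for i in range(len(x)):
    x_plus = list(x)
    x_minus = list(x)
    x_plus[i] += h
    x_minus[i] -= h
    numeric.append((g(f(x_plus)) - g(f(x_minus))) / (2 * h))

print(analytic, numeric)
```

Reverse-mode automatic differentiation applies the same transposed-Jacobian product layer by layer, starting from the output.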

A Taylor series approximates a function around a point $a$:

\[f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots\]
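To see how adding terms improves the approximation, the sketch below truncates the series for $e^x$, chosen here only because every derivative of $e^x$ at $a$ equals $e^a$. The function name `taylor_exp` is hypothetical.

```python
import math

def taylor_exp(x, a=0.0, order=2):
    """Taylor polynomial of exp around a, truncated after the (x - a)^order term."""
    # Every derivative of exp evaluated at a equals exp(a).
    return sum(math.exp(a) * (x - a) ** k / math.factorial(k)
               for k in range(order + 1))

x = 0.5
# Absolute error of the truncated series for increasing order.
errors = [abs(math.exp(x) - taylor_exp(x, order=n)) for n in (1, 2, 4, 8)]
print(errors)
```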

First-order (linear) approximation:

\[f(\mathbf{x} + \delta) \approx f(\mathbf{x}) + \nabla f(\mathbf{x})^T \delta\]

Second-order approximation:

\[f(\mathbf{x} + \delta) \approx f(\mathbf{x}) + \nabla f(\mathbf{x})^T \delta + \frac{1}{2} \delta^T H \delta\]

The second-order approximation is used in Newton’s method and other second-order optimizers.
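Minimizing the second-order approximation over $\delta$ gives the Newton step $\delta = -H^{-1} \nabla f(\mathbf{x})$, which in one dimension reduces to $\delta = -f'(x) / f''(x)$. The sketch below applies that one-dimensional step to $f(x) = e^x - 2x$, an example function assumed for illustration whose minimum is at $x = \ln 2$; `newton_minimize` is a hypothetical helper.

```python
import math

def newton_minimize(df, d2f, x0, steps=20):
    """Repeatedly apply the 1-D Newton step x <- x - f'(x) / f''(x)."""
    x = x0
    for _ in range(steps):
        x -= df(x) / d2f(x)
    return x

# f(x) = exp(x) - 2x, so f'(x) = exp(x) - 2 and f''(x) = exp(x).
x_min = newton_minimize(lambda x: math.exp(x) - 2.0,
                        lambda x: math.exp(x),
                        x0=0.0)
print(x_min, math.log(2.0))
```

This sketch has no safeguards: a practical optimizer would also handle the case where the second derivative is zero or negative.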

The definite integral of $f$ over $[a, b]$ is written:

\[\int_a^b f(x)\, dx\]

Fundamental Theorem of Calculus: for continuous $f$, $\frac{d}{dx} \int_a^x f(t)\, dt = f(x)$
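The theorem can be checked numerically by integrating with a quadrature rule and then differentiating the result. This is a rough sketch, assuming the composite trapezoidal rule and a central difference are accurate enough for the chosen step sizes; `integrate` and `F` are hypothetical names.

```python
import math

def integrate(f, a, b, n=10_000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    step = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * step)
    return total * step

def F(x, a=0.0):
    # F(x) is the integral of cos(t) from a to x.
    return integrate(math.cos, a, x)

x, h = 1.2, 1e-3
# Differentiate the integral with respect to its upper limit.
dF_dx = (F(x + h) - F(x - h)) / (2 * h)
print(dF_dx, math.cos(x))
```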

Key integral identities:

Gaussian integral: $\int_{-\infty}^{\infty} e^{-x^2}\, dx = \sqrt{\pi}$

This identity is used extensively in probability distributions, where it gives the normalizing constant of the Gaussian density, and in variational inference.
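The Gaussian integral can be sanity-checked numerically. The sketch below assumes that truncating the infinite range to $[-10, 10]$ is harmless, since $e^{-x^2}$ is negligible beyond it; `midpoint_integral` is a hypothetical helper.

```python
import math

def midpoint_integral(f, a, b, n=20_000):
    """Midpoint-rule estimate of the integral of f over [a, b]."""
    step = (b - a) / n
    return step * sum(f(a + (k + 0.5) * step) for k in range(n))

# Truncate the infinite range to [-10, 10]; the tails contribute almost nothing.
estimate = midpoint_integral(lambda x: math.exp(-x * x), -10.0, 10.0)
print(estimate, math.sqrt(math.pi))
```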

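Change of variables: substituting $\mathbf{x} = g(\mathbf{u})$ introduces the Jacobian determinant $\lvert \det J \rvert$ of $g$: $\int f(\mathbf{x})\, d\mathbf{x} = \int f(g(\mathbf{u})) \lvert \det J(\mathbf{u}) \rvert\, d\mathbf{u}$

Monte Carlo integration: approximate $\int f(x) p(x)\, dx \approx \frac{1}{N}\sum_{i=1}^{N} f(x_i)$, where $x_i \sim p$ are independent samples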
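The Monte Carlo estimator can be sketched with the standard library alone. The example assumes $p$ is the standard normal density and $f(x) = x^2$, so the exact value of the integral is the variance, 1. The function name `monte_carlo_expectation` and the fixed seed are choices made for this illustration.

```python
import random

def monte_carlo_expectation(f, sampler, n):
    """Estimate the integral of f(x) p(x) dx by averaging f over n samples from p."""
    return sum(f(sampler()) for _ in range(n)) / n

random.seed(0)  # fixed seed so repeated runs give the same estimate

# p is the standard normal; f(x) = x^2, so the exact answer is 1.
estimate = monte_carlo_expectation(lambda x: x * x,
                                   lambda: random.gauss(0.0, 1.0),
                                   n=100_000)
print(estimate)
```

The error of this estimator shrinks at a rate proportional to $1/\sqrt{N}$, so reducing it tenfold takes roughly a hundred times more samples.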