Linear Algebra
Scalars, Vectors, Matrices, Tensors
- Scalar: single number, e.g. $x \in \mathbb{R}$
- Vector: 1-D array of numbers, $\mathbf{x} \in \mathbb{R}^n$
- Matrix: 2-D array, $\mathbf{A} \in \mathbb{R}^{m \times n}$
- Tensor: n-D generalization of matrices
Vector Operations
| Operation | Notation | Result |
|---|---|---|
| Dot product | $\mathbf{a} \cdot \mathbf{b} = \sum_i a_i b_i$ | scalar |
| Outer product | $\mathbf{a} \mathbf{b}^T$ | matrix |
| Element-wise (Hadamard) | $\mathbf{a} \odot \mathbf{b}$ | vector |
| Cross product | $\mathbf{a} \times \mathbf{b}$ | vector (3-D only) |
Matrix Operations
Transpose: $(A^T){ij} = A{ji}$
Matrix multiplication: $C = AB$ where $C_{ij} = \sum_k A_{ik} B_{kj}$
- Requires inner dimensions to match: $(m \times k)(k \times n) = (m \times n)$
- Not commutative: $AB \neq BA$ in general
Inverse: $A^{-1}$ exists iff $A$ is square and non-singular ($\det(A) \neq 0$). $A A^{-1} = I$
Special Matrices
| Type | Property |
|---|---|
| Identity $I$ | $AI = IA = A$ |
| Diagonal | Non-zero only on diagonal |
| Symmetric | $A = A^T$ |
| Orthogonal | $A^T A = I$, columns are orthonormal |
| Positive definite | $\mathbf{x}^T A \mathbf{x} > 0$ for all $\mathbf{x} \neq 0$ |
| Positive semi-definite | $\mathbf{x}^T A \mathbf{x} \geq 0$ |
Norms
Measure the “size” of a vector or matrix.
Vector norms:
- $L^1$ norm: $\lVert\mathbf{x}\rVert_1 = \sum_i \lvert x_i \rvert$
- $L^2$ norm (Euclidean): $\lVert\mathbf{x}\rVert_2 = \sqrt{\sum_i x_i^2}$
- $L^\infty$ norm: $\lVert\mathbf{x}\rVert_\infty = \max_i \lvert x_i \rvert$
- $L^p$ norm: $\lVert\mathbf{x}\rVert_p = \left(\sum_i \lvert x_i \rvert^p\right)^{1/p}$
Matrix norms:
- Frobenius norm: $\lVert A\rVert_F = \sqrt{\sum_{i,j} A_{ij}^2}$
Linear Independence and Rank
- Vectors are linearly independent if no vector can be written as a linear combination of the others.
- Rank of a matrix = number of linearly independent rows (or columns).
- A matrix $A \in \mathbb{R}^{m \times n}$ has rank at most $\min(m, n)$.
- Full rank means rank equals $\min(m, n)$.
Determinant
- Scalar measure of how much a matrix scales volume.
- $\det(AB) = \det(A)\det(B)$
- $\det(A^{-1}) = 1 / \det(A)$
- $\det(A) = 0$ iff $A$ is singular (non-invertible).
Linear Systems
$A\mathbf{x} = \mathbf{b}$
- Unique solution if $A$ is invertible: $\mathbf{x} = A^{-1}\mathbf{b}$
- No solution (overdetermined, inconsistent) or infinitely many (underdetermined)
- Solved in practice via Gaussian elimination, LU decomposition. Do not compute $A^{-1}$ directly.
Decompositions
LU Decomposition
$A = LU$ where $L$ is lower triangular, $U$ is upper triangular. Used for solving linear systems efficiently.
QR Decomposition
$A = QR$ where $Q$ is orthogonal, $R$ is upper triangular. Used in least squares, Gram-Schmidt.
Eigendecomposition
$A = Q \Lambda Q^{-1}$ where $\Lambda$ = diagonal matrix of eigenvalues. Only valid for diagonalizable square matrices.
Singular Value Decomposition (SVD)
$A = U \Sigma V^T$
- $U \in \mathbb{R}^{m \times m}$: left singular vectors (orthogonal)
- $\Sigma \in \mathbb{R}^{m \times n}$: diagonal, singular values $\sigma_i \geq 0$
- $V \in \mathbb{R}^{n \times n}$: right singular vectors (orthogonal)
- Works for any matrix (not just square)
- Singular values = $\sqrt{\text{eigenvalues of } A^T A}$
Key uses in ML:
- PCA (principal component analysis)
- Low-rank approximations (compress data, reduce noise)
- Pseudoinverse: $A^+ = V \Sigma^+ U^T$
Least Squares
Solve $A\mathbf{x} \approx \mathbf{b}$ when the system is overdetermined.
Normal equations: $A^T A \mathbf{x} = A^T \mathbf{b}$
Solution: $\mathbf{x} = (A^T A)^{-1} A^T \mathbf{b}$ (the pseudoinverse projection).
Used heavily in linear regression.
Projections
Project $\mathbf{b}$ onto the column space of $A$:
\[\mathbf{p} = A(A^T A)^{-1} A^T \mathbf{b}\]The projection matrix $P = A(A^T A)^{-1} A^T$ satisfies $P^2 = P$ (idempotent) and $P^T = P$.