Eigenvalues and Eigenvectors

Definition

For a square matrix $A \in \mathbb{R}^{n \times n}$, a non-zero vector $\mathbf{v}$ is an eigenvector with eigenvalue $\lambda$ if:

$$ A\mathbf{v} = \lambda\mathbf{v} $$

Geometrically: $A$ only scales $\mathbf{v}$ (does not rotate it). $\lambda$ tells you by how much.

Computing Eigenvalues

Rewrite $A\mathbf{v} = \lambda\mathbf{v}$ as $(A - \lambda I)\mathbf{v} = \mathbf{0}$.

This has a non-trivial solution iff $A - \lambda I$ is singular:

$$ \det(A - \lambda I) = 0 $$

This is the characteristic polynomial of $A$. Its roots are the eigenvalues.

Example for 2×2:

$$ \det\begin{pmatrix}a-\lambda & b \\ c & d-\lambda\end{pmatrix} = (a-\lambda)(d-\lambda) - bc = 0 $$

Properties

An $n \times n$ matrix has exactly $n$ eigenvalues (counting multiplicity, in $\mathbb{C}$)
$\text{tr}(A) = \sum_i \lambda_i$
$\det(A) = \prod_i \lambda_i$
Eigenvectors for distinct eigenvalues are linearly independent
Real symmetric matrices have real eigenvalues and orthogonal eigenvectors

Eigendecomposition

If $A$ has $n$ linearly independent eigenvectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$ with eigenvalues $\lambda_1, \ldots, \lambda_n$:

$$ A = Q \Lambda Q^{-1} $$

where $Q$ is the matrix with eigenvectors as columns and $\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n)$.

For symmetric matrices: $A = Q \Lambda Q^T$ (since $Q$ is orthogonal: $Q^{-1} = Q^T$)

When does it exist? Only for diagonalizable matrices (all $n$ eigenvectors linearly independent).

Spectral Theorem

Any real symmetric matrix $A$ can be decomposed as:

$$ A = Q \Lambda Q^T = \sum_i \lambda_i \mathbf{q}_i \mathbf{q}_i^T $$

where $Q$ is orthogonal and $\Lambda$ is real diagonal.

Covariance matrices and kernel matrices are symmetric positive semi-definite → always diagonalizable with real non-negative eigenvalues.

Powers and Functions of Matrices

Eigendecomposition makes matrix powers easy:

$$ A^k = Q \Lambda^k Q^{-1} $$

$$ f(A) = Q f(\Lambda) Q^{-1} $$

where $f(\Lambda) = \text{diag}(f(\lambda_1), \ldots, f(\lambda_n))$.

Used in: computing matrix exponential $e^A$ for ODEs, RNNs, state-space models.

Positive Definite Matrices

$A$ is positive definite (PD) iff all eigenvalues $\lambda_i > 0$.

$A$ is positive semi-definite (PSD) iff all $\lambda_i \geq 0$.

Equivalently: $A$ is PD iff $\mathbf{x}^T A \mathbf{x} > 0$ for all $\mathbf{x} \neq \mathbf{0}$.

In ML:

Covariance matrices are always PSD
Gram matrices $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$ must be PSD for valid kernel functions
The Hessian at a local minimum is PD

Relationship to SVD

For $A \in \mathbb{R}^{m \times n}$ with SVD $A = U\Sigma V^T$:

Left singular vectors $U$ = eigenvectors of $AA^T$
Right singular vectors $V$ = eigenvectors of $A^T A$
Singular values $\sigma_i = \sqrt{\lambda_i(A^T A)}$

For square symmetric matrices: singular values equal the absolute values of eigenvalues.

Principal Component Analysis (PCA)

PCA finds the directions of maximum variance in data.

Given centered data matrix $X \in \mathbb{R}^{n \times d}$:

Compute covariance matrix: $C = \frac{1}{n} X^T X$
Eigendecompose: $C = Q\Lambda Q^T$
Principal components = eigenvectors $\mathbf{q}_i$ (ordered by decreasing $\lambda_i$)
Project: $Z = XQ_k$ where $Q_k$ = top-$k$ eigenvectors

The eigenvalue $\lambda_i$ = variance explained along $\mathbf{q}_i$.

Spectral Methods in Graphs

For a graph with adjacency matrix $A$ or Laplacian $L = D - A$:

Eigenvalues of $L$ (spectrum) encode graph structure
Second smallest eigenvalue (Fiedler value) measures connectivity
Eigenvectors of $L$ = smooth graph signals, used in spectral GNNs

Eigenvalues in Optimization

The Hessian’s eigenvalues determine the loss landscape curvature:

All positive → convex local bowl → gradient descent converges
Mix of positive/negative → saddle point
Smallest eigenvalue → sharpest direction → sets gradient descent step size limit: $\eta < 2/\lambda_{\max}$

Condition number $\kappa = \lambda_{\max} / \lambda_{\min}$ determines convergence rate.

Power Iteration

Iterative algorithm to find the dominant eigenvector:

v = random unit vector
repeat:
    v = Av
    v = v / norm(v)

Converges to the eigenvector for $\lambda_{\max}$. Rate depends on $\lvert \lambda_1 / \lambda_2 \rvert$.

Used in: PageRank, spectral clustering initialization.