Code Optimization

Code optimization transforms a program to run faster, use less memory, or consume less energy, while preserving its semantics. Optimizations are applied to the IR before code generation.

Optimization Goals

Execution speed: reduce the number of instructions executed; reduce instruction latency.

Code size: smaller code fits in instruction cache better; important for embedded systems.

Memory usage: reduce stack and heap allocation; improve cache locality.

Power consumption: fewer operations = less power.

Optimizations must be semantics-preserving: the optimized program must produce the same observable outputs as the original for all valid inputs.

Local Optimizations (Within a Basic Block)

Constant Folding

Evaluate constant expressions at compile time.

t = 3 * 4 + 1    =>   t = 13
if 1 < 2 goto L  =>   goto L

Constant Propagation

If a variable is always assigned a constant, replace all uses with the constant.

x = 5
y = x + 3    =>   y = 8

Dead Code Elimination

Remove instructions whose results are never used.

t = a * b    # t never used after this
y = c + d
# Remove the first instruction

Common Subexpression Elimination (CSE)

If an expression is computed multiple times with the same operands, compute it once and reuse the result.

t1 = a + b
t2 = a + b    =>   t1 = a + b; t2 = t1

Algebraic Simplification

Apply mathematical identities.

x = a + 0    =>   x = a
x = a * 1    =>   x = a
x = a * 0    =>   x = 0
x = a * 2    =>   x = a + a  (or x = a << 1)
x = a ** 2   =>   x = a * a

Copy Propagation

Replace uses of a copy target with the copy source.

x = y
z = x + 1    =>   z = y + 1

Global Optimizations (Across Basic Blocks)

Data Flow Analysis

Compute properties of the program at each point using equations over the CFG.

Reaching definitions: which assignments to variable $x$ could have been the last assignment before a given point?

Live variable analysis: which variables hold values that will be used later? Dead assignments can be eliminated.

Available expressions: which expression values computed earlier are still valid (no operand has been reassigned)?

Data flow equations (live variables, backward analysis):

\[\text{LiveOut}(B) = \bigcup_{S \in \text{succ}(B)} \text{LiveIn}(S)\] \[\text{LiveIn}(B) = \text{Use}(B) \cup (\text{LiveOut}(B) \setminus \text{Def}(B))\]

Solve iteratively (worklist algorithm) until fixed point.

Global CSE

Eliminate expressions that are available at the current point (computed on all paths to this point without their operands being redefined).

Global Dead Code Elimination

Remove instructions whose defined variables are not live at the point of definition.

Loop Optimizations

Loops account for most execution time. Loop optimizations have the highest impact.

Loop Invariant Code Motion (LICM)

Move computations whose operands do not change in the loop to before the loop.

for i in range(n):
    y = x * 2      # x is loop-invariant
    a[i] = y + i

# After LICM:
y = x * 2
for i in range(n):
    a[i] = y + i

Strength Reduction

Replace expensive operations with cheaper ones inside loops.

# Original: multiplication each iteration
for i in range(n):
    a[i] = i * k

# After strength reduction: add instead of multiply
t = 0
for i in range(n):
    a[i] = t
    t += k

Induction Variable Elimination

Remove redundant induction variables that are linear functions of the primary induction variable.

Loop Unrolling

Replicate the loop body multiple times to reduce loop overhead and enable further optimizations.

# Original
for i in range(0, n, 1):
    a[i] = b[i] + c[i]

# After 4x unrolling
for i in range(0, n, 4):
    a[i]   = b[i]   + c[i]
    a[i+1] = b[i+1] + c[i+1]
    a[i+2] = b[i+2] + c[i+2]
    a[i+3] = b[i+3] + c[i+3]

Loop Fusion and Fission

Fusion: combine two adjacent loops with the same iteration space into one. Reduces loop overhead; improves locality.

Fission: split one loop into two. Improves cache behavior when the two operations use different arrays.

Interprocedural Optimizations

Inlining

Replace a function call with the body of the callee. Eliminates call overhead; enables further optimizations on the combined code (e.g., constant folding through the call).

int square(int x) { return x * x; }
y = square(a + 1);
// After inlining:
t = a + 1;
y = t * t;

Devirtualization

For virtual (polymorphic) function calls, determine the concrete type at compile time and replace with a direct call. Enables inlining of virtual calls.

Optimization Levels (GCC/Clang)

Flag	Optimizations enabled
`-O0`	None (default; fast compile)
`-O1`	Basic: dead code, CSE, constant folding
`-O2`	All standard: LICM, inlining, loop opts
`-O3`	Aggressive: vectorization, more inlining
`-Os`	Optimize for size
`-Oz`	Aggressively minimize size