Introduction to Computer Architecture
Computer architecture defines the structure and behavior of a computer system: how hardware components are organized, how they communicate, and how software interacts with hardware. Understanding architecture is fundamental to writing efficient software and designing systems.
Levels of Abstraction
Application software
-> High-level language
-> Assembly language
-> Machine code (ISA)
-> Microarchitecture (CPU implementation)
-> Logic gates and circuits
-> Transistors and devices
-> Physics
ISA (Instruction Set Architecture): the contract between hardware and software. It defines the instructions a CPU can execute, the registers it exposes, the memory model, and the I/O mechanism. Multiple microarchitectures can implement the same ISA; for example, Intel and AMD processors both implement x86-64.
Core Components
CPU (Central Processing Unit): fetches, decodes, and executes instructions. Contains:
- Registers: small, fast storage inside the CPU.
- ALU (Arithmetic Logic Unit): performs arithmetic and logic operations.
- Control unit: decodes instructions; generates control signals.
- Program counter (PC): holds the address of the next instruction.
Memory: stores instructions and data. Slower than registers; organized in a hierarchy (cache, DRAM, disk).
I/O devices: keyboard, display, storage, network adapters. Communicate with the CPU via I/O buses.
Interconnect (bus): shared communication medium connecting CPU, memory, and I/O.
The Von Neumann Architecture
The dominant model for general-purpose computers, proposed by John von Neumann in 1945.
Key idea: programs are stored in the same memory as data (stored-program computer). This allows programs to manipulate other programs.
Components:
- Memory: stores both instructions and data.
- CPU: contains ALU and control unit.
- Input/Output.
- Bus connecting them.
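The stored-program idea can be sketched with a toy machine whose single memory list holds both instructions and data. The opcodes (`LOAD`, `ADD`, `STORE`, `HALT`), the two-cell instruction encoding, the single accumulator, and the function name `run` are all assumptions made for illustration; they do not correspond to any real ISA.

```python
# Toy stored-program machine: one memory holds both code and data.
# Opcodes and encoding are hypothetical, chosen only for illustration.
LOAD, ADD, STORE, HALT = 1, 2, 3, 0


def run(memory):
    """Fetch-decode-execute loop over a single shared memory (list of ints)."""
    pc = 0   # program counter: address of the next instruction
    acc = 0  # single accumulator register
    while True:
        opcode, operand = memory[pc], memory[pc + 1]  # fetch
        pc += 2
        if opcode == LOAD:      # acc <- memory[operand]
            acc = memory[operand]
        elif opcode == ADD:     # acc <- acc + memory[operand]
            acc += memory[operand]
        elif opcode == STORE:   # memory[operand] <- acc
            memory[operand] = acc
        elif opcode == HALT:
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode} at address {pc - 2}")


# Addresses 0-7 hold the program; addresses 8-10 hold data.
program = [
    LOAD, 8,    # acc = memory[8]
    ADD, 9,     # acc += memory[9]
    STORE, 10,  # memory[10] = acc
    HALT, 0,
    2, 3, 0,    # data: two operands and a result slot
]
result = run(list(program))
print(result[10])  # prints 5
```

Because instructions are ordinary memory cells, a `STORE` aimed at addresses 0-7 would rewrite the program itself, which is the sense in which programs can manipulate other programs.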
Von Neumann bottleneck: the shared bus between CPU and memory limits throughput. The CPU is often faster than memory; it stalls waiting for data. Modern systems mitigate this with caches and out-of-order execution.
Harvard Architecture
Separate instruction and data memories (and buses), so an instruction fetch and a data access can proceed at the same time without competing for one bus. Used in many microcontrollers (AVR, PIC) and DSPs.
Modern CPUs use a modified Harvard architecture: unified DRAM but separate instruction and data L1 caches.
Performance Metrics
CPI (Cycles Per Instruction): average number of clock cycles per instruction.
\[\text{Execution time} = \text{Instruction count} \times \text{CPI} \times \text{Cycle time}\]
Clock frequency (Hz): number of cycles per second; cycle time is its reciprocal. Higher frequency means a shorter cycle; achievable frequency is limited by heat and power.
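As a rough worked example of the execution-time equation, the sketch below compares two CPI values at the same clock frequency. The function name `execution_time` and the workload numbers are hypothetical, chosen only to make the arithmetic concrete.

```python
def execution_time(instruction_count, cpi, clock_hz):
    """Execution time in seconds = instruction count * CPI * cycle time."""
    cycle_time = 1.0 / clock_hz  # seconds per cycle
    return instruction_count * cpi * cycle_time


# Hypothetical workload: 2 billion instructions on a 2 GHz clock.
base = execution_time(2_000_000_000, cpi=1.5, clock_hz=2_000_000_000)
tuned = execution_time(2_000_000_000, cpi=1.0, clock_hz=2_000_000_000)
print(f"base: {base:.2f} s, tuned: {tuned:.2f} s")
```

With instruction count and frequency held fixed, lowering CPI from 1.5 to 1.0 cuts execution time from about 1.5 s to about 1.0 s.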
MIPS (Millions of Instructions Per Second): simple throughput metric; misleading because instructions vary in complexity across ISAs.
FLOPS (Floating-Point Operations Per Second): relevant for scientific computing and ML.
Amdahl’s Law: if a fraction $f$ of execution time is sped up by a factor $S$:
\[\text{Overall speedup} = \frac{1}{(1-f) + f/S}\]
The speedup is limited by the fraction that is not improved. For example, parallelizing 90% of execution time ($f = 0.9$) across infinitely many cores ($S \to \infty$) gives at most $1/(1-0.9) = 10\times$ speedup.
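A minimal sketch of Amdahl's Law as a function, showing how the speedup for $f = 0.9$ approaches but never reaches 10 as the core count grows. The name `amdahl_speedup` and the chosen core counts are illustrative assumptions, and the model treats $S$ as equal to the number of cores, ignoring communication overhead.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of execution time is sped up by factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


# Idealized assumption: the parallel fraction speeds up linearly with cores.
for cores in (2, 8, 64, 1024):
    print(cores, round(amdahl_speedup(0.9, cores), 2))
```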
RISC vs. CISC
RISC (Reduced Instruction Set Computer): small, uniform instruction set; load/store architecture (arithmetic operates only on registers, and memory is accessed only through explicit load and store instructions); typically fixed instruction width; many registers. Examples: ARM, RISC-V, MIPS, SPARC.
CISC (Complex Instruction Set Computer): rich instruction set; arithmetic instructions that can take memory operands directly; variable-length instructions; complex addressing modes. Example: x86-64.
In practice: modern x86-64 CPUs internally translate CISC instructions to RISC-like micro-ops (µops). The ISA-level distinction is less meaningful for performance than microarchitecture implementation quality.
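To give a feel for what translating to micro-ops means, here is a toy sketch that splits a two-operand instruction with a memory operand into load, ALU, and store steps. Everything in it is an assumption for illustration: the bracket syntax for memory operands, the temporary names `tmp0` and `tmp1`, and the functions `is_mem` and `crack`. Real x86-64 decoders are proprietary and considerably more involved.

```python
def is_mem(operand):
    """Memory operands are written in brackets, e.g. "[rdi]" (toy syntax)."""
    return operand.startswith("[") and operand.endswith("]")


def crack(op, dst, src):
    """Split one CISC-style instruction into load/store-style micro-ops.

    Purely illustrative; this is not how any specific CPU decodes instructions.
    """
    uops = []
    a, b = dst, src
    if is_mem(src):
        uops.append(("load", "tmp0", src))  # tmp0 <- memory[src]
        b = "tmp0"
    if is_mem(dst):
        uops.append(("load", "tmp1", dst))  # tmp1 <- memory[dst]
        a = "tmp1"
    uops.append((op, a, b))                 # register-only ALU operation
    if is_mem(dst):
        uops.append(("store", dst, a))      # memory[dst] <- tmp1
    return uops


print(crack("add", "rax", "rbx"))    # register-register: a single micro-op
print(crack("add", "[rdi]", "rax"))  # read-modify-write: load, add, store
```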
Modern CPU Features
Superscalar: issue multiple instructions per cycle using multiple execution units. Modern high-performance cores typically issue on the order of 4-8 instructions per cycle.
Out-of-order execution: execute instructions as their operands become ready rather than in program order, so independent work can proceed while a stalled instruction waits; results are committed in program order.
Branch prediction: predict the outcome of conditional branches to keep the pipeline full. Modern predictors are often more than 95% accurate and can exceed 99% on regular code, though accuracy depends on the workload.
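One classic, very simple prediction scheme is a 2-bit saturating counter per branch; the sketch below simulates a single such counter. It is far simpler than the predictors in modern CPUs and is shown only to illustrate the idea. The function name `simulate_two_bit` and the loop-branch pattern are hypothetical.

```python
def simulate_two_bit(outcomes, state=2):
    """Return the prediction accuracy of one 2-bit saturating counter.

    outcomes: iterable of booleans (True = branch taken).
    state: 0-1 predict not taken, 2-3 predict taken.
    """
    correct = 0
    total = 0
    for taken in outcomes:
        prediction = state >= 2
        correct += prediction == taken
        total += 1
        # Move toward the observed outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / total if total else 0.0


# Hypothetical loop branch: taken 9 times, then not taken on exit, 100 times over.
loop_branch = ([True] * 9 + [False]) * 100
print(simulate_two_bit(loop_branch))  # prints 0.9
```

On this pattern the counter mispredicts only the loop exit, one branch in ten, because a single not-taken outcome moves it from strongly taken to weakly taken rather than flipping the prediction.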
Speculative execution: execute instructions before it is known whether they are needed (after a predicted branch). Roll back if the prediction is wrong.
SIMD (Single Instruction Multiple Data): execute the same operation on multiple data elements simultaneously. SSE, AVX (x86); NEON (ARM). Essential for multimedia and ML.