Introduction to Data Structures and Algorithms

Data structures and algorithms are the core of computer science. A data structure organizes and stores data to enable efficient access and modification. An algorithm is a step-by-step procedure for solving a problem.

Why They Matter

Every program uses data structures and algorithms, whether explicitly or not. Choosing the wrong data structure can turn a fast program into an unusably slow one. A program that handles 1000 elements well may fail on a million because of algorithmic complexity.

Understanding DSA enables you to:

  • Write programs that scale to large inputs.
  • Recognize standard problems and apply known solutions.
  • Reason about correctness and efficiency.
  • Pass technical interviews.

Asymptotic Complexity

We describe algorithm efficiency using Big O notation, which characterizes how runtime (or space) grows as input size $n$ grows, ignoring constant factors.

Common Complexity Classes

Notation Name Example
$O(1)$ Constant Hash table lookup
$O(\log n)$ Logarithmic Binary search
$O(n)$ Linear Linear scan
$O(n \log n)$ Linearithmic Merge sort
$O(n^2)$ Quadratic Bubble sort
$O(n^3)$ Cubic Naive matrix multiply
$O(2^n)$ Exponential Brute-force subsets
$O(n!)$ Factorial Brute-force permutations

Formal Definitions

Big O (upper bound): $f(n) = O(g(n))$ if there exist $c > 0$ and $n_0$ such that $f(n) \leq c \cdot g(n)$ for all $n \geq n_0$.

Big Omega (lower bound): $f(n) = \Omega(g(n))$ if $f(n) \geq c \cdot g(n)$.

Big Theta (tight bound): $f(n) = \Theta(g(n))$ if both $O$ and $\Omega$ hold.

Analyzing Code

Sequential statements: add complexities.

Loops: multiply the body complexity by the number of iterations.

Nested loops: multiply all loop counts.

Recursive algorithms: solve the recurrence relation.

Master Theorem: for $T(n) = aT(n/b) + f(n)$:

  • If $f(n) = O(n^{\log_b a - \epsilon})$: $T(n) = \Theta(n^{\log_b a})$.
  • If $f(n) = \Theta(n^{\log_b a})$: $T(n) = \Theta(n^{\log_b a} \log n)$.
  • If $f(n) = \Omega(n^{\log_b a + \epsilon})$: $T(n) = \Theta(f(n))$.

Merge sort: $T(n) = 2T(n/2) + O(n)$. Case 2: $T(n) = O(n \log n)$.

Binary search: $T(n) = T(n/2) + O(1)$. Case 1: $T(n) = O(\log n)$.

Space Complexity

Memory usage also matters. $O(1)$ space = constant extra memory (in-place). $O(n)$ space = proportional to input size.

Auxiliary space: space used beyond the input itself.

Stack space for recursion: each recursive call uses $O(1)$ stack space. $k$-level deep recursion: $O(k)$ stack space. Deep recursion can cause stack overflow.

Abstract Data Types vs. Data Structures

An Abstract Data Type (ADT) defines a type by its behavior (operations and their semantics), not its implementation.

Stack ADT: push, pop, peek, isEmpty. Queue ADT: enqueue, dequeue, peek, isEmpty. Dictionary/Map ADT: insert, delete, lookup.

A data structure is a concrete implementation of an ADT. A Stack can be implemented with an array or a linked list.

Correctness

An algorithm is correct if it produces the right output for every valid input.

Loop invariant: a property that holds before and after each iteration. Proving the invariant holds at initialization and is preserved by each iteration proves partial correctness; combined with termination, proves total correctness.

Example (binary search): invariant: if the target exists, it is within [lo, hi]. At each step, the range halves. Terminates when lo > hi.

Algorithm Design Paradigms

Brute force: try all possibilities. Correct but often exponential.

Divide and conquer: split the problem into smaller sub-problems; solve recursively; combine. Merge sort, quicksort, binary search.

Greedy: make the locally optimal choice at each step. Works when the greedy choice property and optimal substructure hold. Dijkstra’s shortest path, Kruskal’s MST.

Dynamic programming: solve overlapping sub-problems once and store results. Optimal substructure required. Fibonacci, longest common subsequence, knapsack.

Backtracking: enumerate candidates; abandon (backtrack) a candidate as soon as it cannot lead to a valid solution. N-queens, Sudoku, subset sum.

Randomized algorithms: use randomness. Quicksort with random pivot, Bloom filters, approximate nearest neighbor.