Data Models

A data model defines how data is represented, organized, and queried. The choice of data model determines what operations are natural and efficient.

Relational Model

Proposed by Edgar Codd in 1970. Data is organized as a collection of relations (tables).

Key concepts:

Relation (table): a set of rows (tuples), each with the same attributes (columns).
Attribute: a named column with a domain (type).
Tuple (row): a single record.
Schema: the name of the relation and its attributes with types.
Primary key: a set of attributes that uniquely identifies each tuple.
Foreign key: a set of attributes that references the primary key of another relation.

Example:

Student(id INT PK, name VARCHAR, dept_id INT FK -> Dept)
Dept(id INT PK, name VARCHAR)

Query language: SQL (Structured Query Language), based on relational algebra.

Strengths: strong theoretical foundation; declarative queries; enforced integrity constraints; powerful joins.

Used in: PostgreSQL, MySQL, SQLite, Oracle, SQL Server.

Document Model

Data is organized as documents (typically JSON or BSON).

{
  "_id": "student_123",
  "name": "Alice",
  "dept": {"name": "CS", "building": "Gates"},
  "courses": [
    {"code": "CS101", "grade": "A"},
    {"code": "CS201", "grade": "B+"}
  ]
}

Strengths: flexible schema (documents in the same collection can have different fields); natural fit for hierarchical or nested data; no need for joins (embed related data).

Weaknesses: no standard query language; joins are expensive; consistency guarantees vary.

Used in: MongoDB, Firestore, CouchDB, RavenDB.

Key-Value Model

The simplest model: a mapping from keys to values. The DBMS treats values as opaque blobs.

Operations: get(key), put(key, value), delete(key).

Strengths: extremely fast (O(1) for all operations); horizontally scalable; simple.

Weaknesses: no structure in values; no complex queries; no joins.

Used in: Redis (values can be strings, lists, sets, hashes, sorted sets), DynamoDB (structured key-value), Memcached (pure cache).

Column-Family Model

Data is organized as rows with sparse, dynamically typed columns grouped into column families.

Example (Cassandra):

Table: user_events
Partition key: user_id
Clustering key: event_timestamp
Columns: event_type, event_data

Strengths: excellent write throughput; horizontal scalability; efficient for queries on a subset of columns.

Weaknesses: data modeling must match query patterns; no joins; limited consistency options.

Used in: Apache Cassandra, HBase, Google Bigtable.

Graph Model

Data is organized as nodes (entities) and edges (relationships), both with properties.

(:Person {name: "Alice"})-[:KNOWS]->(:Person {name: "Bob"})

Strengths: natural for relationship-heavy data; efficient multi-hop traversal; expressive query languages (Cypher, Gremlin).

Weaknesses: harder to scale horizontally; less mature tooling than relational.

Used in: Neo4j, Amazon Neptune, JanusGraph.

Use cases: social networks, fraud detection, recommendation engines, knowledge graphs.

Time-Series Model

Optimized for sequences of timestamped measurements.

Key properties: immutable past data; write-heavy with sequential timestamps; queries typically involve time ranges and aggregations.

measurement: cpu_usage
tags: host=server1, datacenter=us-east
time: 2024-01-15T12:00:00Z, value: 72.3
time: 2024-01-15T12:00:05Z, value: 74.1

Used in: InfluxDB, TimescaleDB, Prometheus, Apache Druid.

Wide-Column / Columnar Storage

Row-oriented storage (OLTP): stores entire rows together. Fast for retrieving all columns of a row (point lookups, updates).

Column-oriented storage (OLAP): stores each column separately. Fast for aggregations over one or a few columns of many rows. Excellent compression (similar values together). Used in analytics databases.

Store	Row-based	Column-based
Storage	All cols of a row together	All rows of a col together
Read all cols of one row	1 I/O	$n$ I/Os
Aggregate over 1 col	Read all cols	Read 1 col
Best for	OLTP	OLAP
Examples	PostgreSQL, MySQL	Redshift, Parquet, DuckDB

Entity-Relationship Model

A conceptual data model used for database design. Entities (things), attributes (properties), and relationships (associations between entities).

ER diagram components:

Entity: rectangle.
Attribute: oval.
Relationship: diamond.
Cardinality: 1:1, 1:N, M:N.

ER diagrams are translated to relational schemas: entities become tables; M:N relationships become join tables.

Choosing a Data Model

Requirement	Best Model
Complex queries, joins, integrity	Relational
Flexible, nested, hierarchical data	Document
Simple caching, sessions	Key-value
Write-heavy, distributed, time data	Column-family
Relationship traversal	Graph
Timestamped metrics, events	Time-series
Analytical aggregations	Columnar