File Systems
File Concepts
A file is a named collection of related data stored persistently on a storage device.
File attributes: Name, identifier (inode number), type, location, size, protection, timestamps (created, modified, accessed).
File operations: create, write, read, seek, delete, truncate, open, close.
Open file table: OS maintains a table of open files per process (file descriptors). System-wide open file table also exists. Each entry tracks: file position, access mode, reference count.
File Types
| Type | Unix extension | Identified by |
|---|---|---|
| Regular | .txt, .c, .exe | Content |
| Directory | - | Inode type |
| Symbolic link | .lnk | Inode type |
| Special files (devices) | - | /dev/, inode type |
| Pipe | - | Created via mkfifo |
| Socket | .sock | Created via socket() |
Magic numbers: Unix identifies file type by first bytes (e.g., #! for scripts, ELF for executables, PK for ZIP).
File Access Methods
Sequential access: Read bytes in order, one after another. Natural for tapes; efficient for streaming.
Random (direct) access: Seek to any position, read/write at that offset. Required for databases.
Index-sequential: Index file points into sequential data file. Efficient range queries (B-trees are the modern form).
Directory Structure
Single-Level Directory
All files in one directory. Simple, but name conflicts across users.
Two-Level Directory
One directory per user. Separate namespaces but no grouping within a user.
Tree-Structured Directory
Hierarchical. Files have path names (absolute and relative). Most modern systems.
Acyclic Graph Directory
Allow shared subdirectories/files via hard links or symbolic links.
General Graph Directory
Allow cycles. Requires garbage collection for deletion.
Directory Implementation
Linear list: Array of (name, inode) pairs. Simple but O(n) search.
Hash table: Map file name to directory entry. O(1) average search. Collisions possible.
File System Implementation
Disk Layout
| Region | Contents |
|---|---|
| Boot block | Bootloader |
| Superblock | FS metadata (size, free block count, inode count) |
| Inode table | Array of inodes |
| Data blocks | File and directory contents |
Superblock stores critical metadata. Corrupting it destroys the entire filesystem. Typically replicated.
Inodes (Index Nodes)
Each file has an inode containing:
- File type, permissions, owner, group
- Size, timestamps
- Link count (number of hard links)
- Pointers to data blocks
Block pointers in Unix inodes:
- 12 direct block pointers (12 × 4KB = 48KB)
- 1 single indirect (4KB/4B = 1024 pointers → 4MB)
- 1 double indirect (1024² = 4GB)
- 1 triple indirect (1024³ = 4TB)
Directory Entries
A directory is a file whose content is a list of (name → inode number) mappings.
Free Space Management
- Bitmap: One bit per block (1=free, 0=used). Efficient for finding contiguous free space.
- Free list: Linked list of free blocks. Simple but slow to find contiguous space.
- Grouping/Counting: Store (first free block, count) pairs.
Common File Systems
| FS | OS | Notable features |
|---|---|---|
| ext4 | Linux | Journaling, extents, 1 exabyte max |
| XFS | Linux | High performance, 8 exabyte max |
| Btrfs | Linux | COW, snapshots, checksums, RAID |
| NTFS | Windows | Journaling, ACLs, compression |
| APFS | macOS | COW, snapshots, native encryption |
| FAT32 | Cross-platform | Simple, 4GB file limit |
| exFAT | Cross-platform | FAT32 successor, larger files |
| ZFS | Solaris/FreeBSD | COW, checksums, integrated RAID, snapshots |
Journaling
Log changes before applying them. On crash, replay journal to restore consistency.
Write-ahead logging:
- Write journal entry (transaction begin + changes)
- Commit journal entry
- Apply changes to filesystem
- Mark journal entry as done
Journal modes (ext3/ext4):
- Writeback: Only journal metadata. Fast but data may be lost.
- Ordered (default): Journal metadata, but write data before metadata commits.
- Journal: Journal both data and metadata. Slow but safest.
Virtual File System (VFS)
Abstraction layer allowing the kernel to support multiple filesystem types with a common interface.
User space: open(), read(), write(), close()
↓
VFS layer: file operations → inode operations
↓
Concrete FS: ext4, XFS, NFS, procfs...
VFS defines: inode, dentry (directory entry cache), file, super_block objects and their operation tables.
Buffer Cache / Page Cache
Disk reads/writes are cached in memory (page cache).
- Read: check cache first, read from disk on miss
- Write: write to cache (dirty), flush to disk later (writeback)
sync()/fsync()forces dirty pages to disk
Unified page cache: Modern Linux uses the same cache for file data and memory-mapped files.
RAID
Redundant Array of Independent Disks. Combines multiple disks for performance and/or reliability.
| Level | Description | Min Disks | Overhead |
|---|---|---|---|
| RAID 0 | Striping, no redundancy | 2 | 0% |
| RAID 1 | Mirroring | 2 | 50% |
| RAID 5 | Striping + distributed parity | 3 | 1/n |
| RAID 6 | Striping + 2 parity blocks | 4 | 2/n |
| RAID 10 | Mirror + stripe | 4 | 50% |
NFS (Network File System)
Allows a client to mount a remote filesystem over a network. Client VFS calls are translated to RPC calls to the NFS server.
- Stateless server (NFSv2/v3): Server doesn’t remember open files. Each request is self-contained.
- Stateful server (NFSv4): Better locking, better performance.
File System Consistency
fsck (file system check): Scans entire filesystem for inconsistencies after crash. Slow for large disks.
Replaced by journaling in modern filesystems (journal guarantees consistency without full scan).
Copy-on-Write filesystems (ZFS, Btrfs, APFS): Never overwrite data in place. Write new version, then atomically update pointers. No need for journaling; always consistent.