Transport Layer
The transport layer (Layer 4) provides end-to-end communication services between applications on different hosts. It uses port numbers to multiplex many application flows over the single network-layer path between two hosts, and provides either reliable, ordered delivery (TCP) or best-effort delivery (UDP).
Ports and Multiplexing
A single host runs many applications simultaneously. Port numbers distinguish them.
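Socket: a connected socket is identified by the 5-tuple (protocol, local IP, local port, remote IP, remote port). A listening or unconnected socket is identified by the protocol and local address/port alone.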
Well-known ports (0-1023): HTTP=80, HTTPS=443, SSH=22, FTP=20/21, SMTP=25, DNS=53, DHCP=67/68.
Registered ports (1024-49151): assigned by IANA to specific services.
Ephemeral ports (49152-65535): dynamically assigned to client sockets by the OS.
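Demultiplexing: the transport layer uses the destination port to deliver a received segment to the correct application process. UDP matches on the destination IP and port; TCP matches an established connection on the full 5-tuple, so many connections can share one server port.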
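As a rough illustration of the port ranges and of demultiplexing, the sketch below classifies a port number and looks up the receiving process in a table. The names `classify_port` and `DemuxTable`, and the string process labels, are hypothetical; a real kernel stores socket structures, not strings. It assumes that connected sockets match on the full 5-tuple and that listening or unconnected sockets match on protocol and local port.

```python
from typing import Dict, Optional, Tuple

# (protocol, local IP, local port, remote IP, remote port)
FiveTuple = Tuple[str, str, int, str, int]


def classify_port(port: int) -> str:
    """Return the IANA range a port number falls into."""
    if not 0 <= port <= 65535:
        raise ValueError("port must be in 0..65535")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "ephemeral"


class DemuxTable:
    """Toy demultiplexer: maps incoming segments to a process label."""

    def __init__(self) -> None:
        self.connected: Dict[FiveTuple, str] = {}
        self.listening: Dict[Tuple[str, int], str] = {}

    def lookup(self, protocol: str, local_ip: str, local_port: int,
               remote_ip: str, remote_port: int) -> Optional[str]:
        # Prefer an exact 5-tuple match (an established connection).
        key = (protocol, local_ip, local_port, remote_ip, remote_port)
        if key in self.connected:
            return self.connected[key]
        # Otherwise fall back to a socket bound to the local port.
        return self.listening.get((protocol, local_port))
```

Two clients talking to the same server port end up at different sockets because their remote IP or remote port differs, while a first packet from an unknown client falls through to the listener.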
TCP In Depth
Three-Way Handshake
Client                           Server
  |-------- SYN (seq=x) -------->|
  |<- SYN-ACK (seq=y, ack=x+1) --|
  |------- ACK (ack=y+1) ------->|
            ESTABLISHED
Initial Sequence Numbers (ISNs): chosen unpredictably, which makes blind spoofing harder and keeps stale segments from an earlier connection from being mistaken for current data. Each side has its own ISN.
SYN cookies: server encodes connection state into the ISN when under SYN-flood attack. Avoids maintaining state for half-open connections.
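The sequence and acknowledgment numbers exchanged in the handshake can be traced with a short sketch. `Segment` and `three_way_handshake` are hypothetical names, and `secrets.randbelow` only stands in for ISN selection; real stacks typically derive the ISN from a clock and a keyed hash, which is not modelled here.

```python
import secrets
from dataclasses import dataclass
from typing import List, Optional

SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: str
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn: Optional[int] = None,
                        server_isn: Optional[int] = None) -> List[Segment]:
    """Return the three handshake segments for the given (or random) ISNs."""
    x = secrets.randbelow(SEQ_MOD) if client_isn is None else client_isn
    y = secrets.randbelow(SEQ_MOD) if server_isn is None else server_isn
    # The SYN flag consumes one sequence number, hence the +1 in each ACK.
    syn = Segment("SYN", seq=x)
    syn_ack = Segment("SYN-ACK", seq=y, ack=(x + 1) % SEQ_MOD)
    ack = Segment("ACK", seq=(x + 1) % SEQ_MOD, ack=(y + 1) % SEQ_MOD)
    return [syn, syn_ack, ack]
```

Calling `three_way_handshake(100, 300)` yields a SYN with seq=100, a SYN-ACK with seq=300 and ack=101, and a final ACK with ack=301.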
Reliability: Sequence Numbers and Acknowledgments
Sequence number: the position in the byte stream, counted from the sender's ISN, of the first byte in the segment’s payload.
Acknowledgment number: the next byte the receiver expects. All bytes before this have been received.
Cumulative acknowledgment: one ACK covers all data up to the ACK number.
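Selective acknowledgment (SACK): the receiver can also acknowledge blocks that arrived out of order. It is carried as a TCP option; the 40-byte option space allows at most 4 SACK blocks, and typically 3 when the timestamp option is also in use. SACK enables efficient recovery when several segments are lost in one window.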
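To make the difference between cumulative and selective acknowledgment concrete, the following sketch computes both from the byte ranges a receiver holds. The helper names `merge_ranges` and `build_ack` are hypothetical, ranges are half-open `[start, end)`, and the blocks are simply listed in sequence order; a real receiver reports the most recently changed block first.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # half-open byte range [start, end)


def merge_ranges(ranges: List[Block]) -> List[Block]:
    """Merge overlapping or adjacent byte ranges."""
    merged: List[Block] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def build_ack(next_expected: int, received: List[Block],
              max_sack_blocks: int = 3) -> Tuple[int, List[Block]]:
    """Return (cumulative ACK number, SACK blocks) for the received data."""
    ack = next_expected
    sack: List[Block] = []
    for start, end in merge_ranges(received):
        if start <= ack:
            # Contiguous with what we already have: advance the cumulative ACK.
            ack = max(ack, end)
        else:
            # A hole precedes this block: report it selectively.
            sack.append((start, end))
    return ack, sack[:max_sack_blocks]
```

With bytes 1000-1999, 3000-3999 and 5000-5999 received, the cumulative ACK stops at 2000 and the two later blocks are reported via SACK, so the sender knows only 2000-2999 and 4000-4999 are missing.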
Retransmission
Retransmission timeout (RTO): if no ACK is received within RTO, retransmit the segment. RTO is estimated using RTT:
\[\text{SRTT} = (1-\alpha)\,\text{SRTT} + \alpha\,\text{RTT}_{\text{sample}}, \quad \alpha = 0.125\]
\[\text{DevRTT} = (1-\beta)\,\text{DevRTT} + \beta\,\left|\text{RTT}_{\text{sample}} - \text{SRTT}\right|, \quad \beta = 0.25\]
\[\text{RTO} = \text{SRTT} + 4 \times \text{DevRTT}\]
Fast retransmit: if the sender receives 3 duplicate ACKs, it retransmits the missing segment immediately (without waiting for RTO).
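The RTO estimator and the duplicate-ACK rule can be sketched as follows. `RtoEstimator` and `DupAckDetector` are hypothetical names. Two details are assumptions not stated above: the first sample initializes SRTT to the sample and DevRTT to half of it, and the deviation is updated against the previous SRTT before SRTT itself is updated (the order RFC 6298 uses). Minimum RTO clamping and exponential backoff are left out.

```python
from typing import Optional


class RtoEstimator:
    """Exponentially weighted RTT estimator producing a retransmission timeout."""

    def __init__(self, alpha: float = 0.125, beta: float = 0.25) -> None:
        self.alpha = alpha
        self.beta = beta
        self.srtt: Optional[float] = None
        self.devrtt: Optional[float] = None

    def update(self, sample: float) -> float:
        """Feed one RTT sample (seconds) and return the new RTO."""
        if self.srtt is None:
            # Assumed initialization for the first measurement.
            self.srtt = sample
            self.devrtt = sample / 2
        else:
            # Deviation uses the old SRTT, then SRTT is smoothed.
            self.devrtt = (1 - self.beta) * self.devrtt + self.beta * abs(sample - self.srtt)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample
        return self.srtt + 4 * self.devrtt


class DupAckDetector:
    """Signals fast retransmit on the third duplicate ACK."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.last_ack: Optional[int] = None
        self.dup_count = 0

    def on_ack(self, ack: int) -> bool:
        if ack == self.last_ack:
            self.dup_count += 1
            return self.dup_count == self.threshold
        self.last_ack = ack
        self.dup_count = 0
        return False
```

Feeding samples of 100 ms and then 200 ms gives an RTO of about 300 ms and then about 362.5 ms: the jump in the second sample raises the deviation term, which widens the timeout more than the smoothed mean alone would.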
Flow Control
Prevents the sender from overwhelming a slow receiver.
Receive window (rwnd): advertised in every TCP segment. The sender may not have more than rwnd unacknowledged bytes in flight.
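Combined with congestion control, the amount of unacknowledged data the sender may have outstanding is bounded by both windows:
\[\text{bytes in flight} \le \min(\text{rwnd}, \text{cwnd})\]
where cwnd is the congestion window (see below).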
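A one-line calculation shows how the two windows interact. The function name `usable_window` is hypothetical, and the rule it encodes, that the sender is limited by the smaller of rwnd and cwnd minus what is already in flight, is the standard combination of flow and congestion control rather than something specific to one implementation.

```python
def usable_window(rwnd: int, cwnd: int, bytes_in_flight: int) -> int:
    """Bytes the sender may still transmit right now (never negative)."""
    return max(0, min(rwnd, cwnd) - bytes_in_flight)
```

For example, with rwnd = 65535, cwnd = 14600 and 10000 bytes in flight, the sender may send 4600 more bytes; if the receiver advertises rwnd = 0, the result is 0 regardless of cwnd.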
Congestion Control
Prevents the sender from overwhelming the network.
Slow Start: begin with a small cwnd (classically 1 MSS; modern stacks commonly start at 10 MSS) and double cwnd every RTT until ssthresh (the slow start threshold) is reached.
Congestion Avoidance: after ssthresh, increase cwnd by 1 MSS per RTT (additive increase).
On loss (RTO expiry): ssthresh = cwnd/2; cwnd = 1 MSS; restart slow start.
On loss (3 dup ACKs, TCP Reno): ssthresh = cwnd/2; cwnd = ssthresh; enter congestion avoidance.
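The Reno-style rules above can be traced with a small per-RTT model. `RenoState` is a hypothetical name and cwnd is counted in MSS units. The sketch simplifies in three labelled ways: growth is applied once per RTT instead of per ACK, slow start is capped at ssthresh, and ssthresh is floored at 2 MSS (a floor taken from RFC 5681, not from the text above). Fast recovery's temporary window inflation is not modelled.

```python
from dataclasses import dataclass


@dataclass
class RenoState:
    cwnd: float = 1.0       # congestion window, in MSS
    ssthresh: float = 64.0  # hypothetical initial threshold, in MSS

    def on_rtt_without_loss(self) -> None:
        if self.cwnd < self.ssthresh:
            # Slow start: double per RTT (capped at ssthresh, a simplification).
            self.cwnd = min(self.cwnd * 2, self.ssthresh)
        else:
            # Congestion avoidance: additive increase of 1 MSS per RTT.
            self.cwnd += 1

    def on_timeout(self) -> None:
        # RTO expiry: halve the threshold and restart slow start.
        self.ssthresh = max(self.cwnd / 2, 2.0)
        self.cwnd = 1.0

    def on_triple_dup_ack(self) -> None:
        # Three duplicate ACKs: halve and continue in congestion avoidance.
        self.ssthresh = max(self.cwnd / 2, 2.0)
        self.cwnd = self.ssthresh
```

Starting from `RenoState(ssthresh=8)`, five loss-free RTTs give cwnd values 2, 4, 8, 9, 10; a triple duplicate ACK then drops cwnd to 5, while a timeout would drop it to 1.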
TCP CUBIC (Linux default): uses a cubic function of time since last congestion event to set cwnd. Better for high-bandwidth, long-delay networks.
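The cubic growth curve can be written down directly. The sketch below uses the window function and the constants C = 0.4 and beta = 0.7 from the CUBIC RFCs (RFC 8312 and its successor); the function names are hypothetical, the window is in MSS, and the TCP-friendly region and other details of a real implementation are omitted.

```python
def cubic_k(w_max: float, c: float = 0.4, beta: float = 0.7) -> float:
    """Time (seconds) for the window to grow back to w_max after a loss."""
    return (w_max * (1 - beta) / c) ** (1 / 3)


def cubic_window(t: float, w_max: float, c: float = 0.4, beta: float = 0.7) -> float:
    """Target cwnd at t seconds after the last congestion event.

    w_max is the window size just before that event.
    """
    k = cubic_k(w_max, c, beta)
    return c * (t - k) ** 3 + w_max
```

With `w_max = 100`, the window restarts at about 70 (that is, beta times w_max), flattens out as it approaches 100 at t = K (roughly 4.2 s), and then accelerates again to probe for more bandwidth. Because the curve depends on elapsed time rather than on the number of RTTs, growth does not slow down on long-delay paths.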
BBR (Bottleneck Bandwidth and Round-trip propagation time): estimates the actual bottleneck bandwidth and the minimum RTT, and paces sending to stay near the point where the bottleneck link is fully used but queues remain short. Used in Google’s infrastructure and YouTube.
ECN (Explicit Congestion Notification): routers mark packets (instead of dropping) when queues are building. Receivers echo the mark via the ECE flag; senders reduce cwnd without packet loss.
TCP Connection Termination
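Active close: the closing side sends FIN and enters FIN_WAIT_1; on receiving the ACK it moves to FIN_WAIT_2; on receiving the peer's FIN it sends an ACK and enters TIME_WAIT, where it stays for 2×MSL (typically 60-120 s in practice) before moving to CLOSED.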
TIME_WAIT: ensures the final ACK reaches the other side; prevents delayed packets from a closed connection from being interpreted as a new connection.
RST (Reset): abrupt termination. Sent when receiving a packet for a non-existent connection, or when an application wants to abort.
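The active-close sequence can be expressed as a small transition table. The event names (`app_close`, `recv_ack`, `recv_fin`, `2msl_timeout`, `recv_rst`) and the helpers `step` and `run` are hypothetical labels chosen for this sketch. Only the active closer's path described above is covered; the passive side and simultaneous close are left out.

```python
from typing import Dict, List, Optional, Tuple

# (current state, event) -> (next state, segment to send, if any)
ACTIVE_CLOSE: Dict[Tuple[str, str], Tuple[str, Optional[str]]] = {
    ("ESTABLISHED", "app_close"): ("FIN_WAIT_1", "FIN"),
    ("FIN_WAIT_1", "recv_ack"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_2", "recv_fin"): ("TIME_WAIT", "ACK"),
    ("TIME_WAIT", "2msl_timeout"): ("CLOSED", None),
}


def step(state: str, event: str) -> Tuple[str, Optional[str]]:
    """Apply one event and return (next state, segment to send)."""
    if event == "recv_rst":
        # A reset aborts the connection from any state.
        return "CLOSED", None
    try:
        return ACTIVE_CLOSE[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state}") from None


def run(events: List[str], state: str = "ESTABLISHED") -> List[str]:
    """Return the list of states visited while processing the events."""
    trace = [state]
    for event in events:
        state, _ = step(state, event)
        trace.append(state)
    return trace
```

Running the four events in order walks ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT, CLOSED, whereas a `recv_rst` at any point jumps straight to CLOSED and skips TIME_WAIT.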
UDP In Depth
Header: 8 bytes: source port (2B), destination port (2B), length (2B), checksum (2B).
Checksum: optional in IPv4; mandatory in IPv6. Covers pseudo-header (src IP, dst IP, protocol, UDP length) + UDP header + data.
No connection state: a server can handle requests from many clients without maintaining per-client state.
Use cases: DNS (fast single request-response), DHCP, media streaming (RTP), gaming (position updates), and QUIC (which carries HTTP/3 over UDP).
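The header layout and checksum coverage described above can be made concrete by building an IPv4 UDP datagram by hand. The function names are hypothetical and the addresses in the usage note are documentation examples. The checksum is the standard 16-bit one's-complement Internet checksum; the rule that a computed value of zero is sent as 0xFFFF applies because a zero field means "no checksum" in IPv4.

```python
import socket
import struct
from typing import Tuple

UDP_PROTOCOL_NUMBER = 17


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of 16-bit words, complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo-header: src IP, dst IP, zero byte, protocol, UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, UDP_PROTOCOL_NUMBER, udp_length))


def build_udp_datagram(src_ip: str, dst_ip: str, src_port: int,
                       dst_port: int, payload: bytes) -> bytes:
    """Return the 8-byte UDP header followed by the payload."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = internet_checksum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    if checksum == 0:
        checksum = 0xFFFF  # zero on the wire would mean "no checksum"
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


def parse_udp_header(datagram: bytes) -> Tuple[int, int, int, int]:
    """Return (source port, destination port, length, checksum)."""
    return struct.unpack("!HHHH", datagram[:8])
```

A receiver can verify a datagram by running `internet_checksum` over the pseudo-header plus the whole datagram, checksum field included; for an undamaged datagram the result is 0. For example, `build_udp_datagram("192.0.2.1", "192.0.2.2", 50000, 53, b"hello")` produces 13 bytes, 8 of header and 5 of payload.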
QUIC
QUIC is a modern transport protocol originally designed at Google and now standardized by the IETF (RFC 9000). It runs over UDP.
Key features:
- 1-RTT connection establishment for new connections and 0-RTT for resumed ones (vs. TCP’s 1 RTT plus TLS’s 1-2 RTTs).
- Built-in TLS 1.3 encryption (no separate TLS handshake).
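- Multiple independent streams within a connection: a lost packet stalls only the streams whose data it carried, avoiding the transport-level head-of-line blocking of multiplexing over a single TCP connection.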
- Connection migration: a connection can survive changing IP addresses (mobile users switching between Wi-Fi and cellular).
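- Flexible congestion control: QUIC is typically implemented in user space, so the congestion control algorithm can be chosen per connection and updated without kernel changes.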
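The handshake savings are easy to quantify. The sketch below counts only the handshake round trips that must complete before the first application request can be sent, taking TLS 1.2 as 2 RTTs and TLS 1.3 as 1 RTT on top of TCP's 1 RTT. The dictionary keys and the function name are hypothetical labels, and DNS lookups, server processing, and packet loss are ignored.

```python
from typing import Dict

# Handshake round trips before the first request can be sent.
SETUP_RTTS: Dict[str, int] = {
    "TCP + TLS 1.2": 1 + 2,
    "TCP + TLS 1.3": 1 + 1,
    "QUIC (1-RTT)": 1,
    "QUIC (0-RTT resumption)": 0,
}


def setup_delay_ms(rtt_ms: float) -> Dict[str, float]:
    """Handshake delay in milliseconds for each stack at the given RTT."""
    return {name: rtts * rtt_ms for name, rtts in SETUP_RTTS.items()}
```

At a 50 ms round-trip time this gives 150 ms for TCP with TLS 1.2, 100 ms for TCP with TLS 1.3, 50 ms for a fresh QUIC connection, and 0 ms for a resumed one, so the saving grows linearly with path latency.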
HTTP/3: uses QUIC as the transport. This can reduce page load times, especially on lossy or high-latency connections, where handshake round trips and head-of-line blocking cost the most.
TCP vs. UDP
| Property | TCP | UDP |
|---|---|---|
| Connection | Yes (3-way handshake) | No |
| Reliability | Reliable (lost data is retransmitted) | Best effort |
| Ordering | In-order delivery | No ordering |
| Flow control | Yes | No |
| Congestion control | Yes | No |
| Header size | 20-60 bytes | 8 bytes |
| Latency | Higher | Lower |
| Use cases | HTTP, SSH, email | DNS, streaming, gaming |