Denys Inhul

ch2: protocol on the wire

date: May 14 2026

1. DNS message format

A DNS message is one of the smallest useful packets on the internet: a 12-byte header followed by four variable-length sections. Query and response use the same format. The shape hasn’t changed since RFC 1035 (1987).

[diagram placeholder] 12-byte header laid out as six 16-bit fields (ID, FLAGS, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT), plus the four variable-length sections (Question, Answer, Authority, Additional) showing who fills them in and what they hold.

The 12-byte header

Six 16-bit fields. The ones worth knowing:

  • ID: a random number the client picks, echoed back by the server so the client can match the response to the question. This is the field a Kaminsky-style attacker tries to guess (Chapter 6).
  • Flags: packed into one 16-bit word: QR (query or response), OPCODE, AA (authoritative answer), TC (truncated, response too big for UDP), RD (recursion desired, set by client), RA (recursion available, set by server), and a 4-bit RCODE (0 = ok, 3 = NXDOMAIN, others).
  • QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT: how many records each of the four sections below holds.
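
Those six fields unpack mechanically. A minimal sketch (the function name and dict layout are mine, not from any standard library):

```python
import struct

def parse_header(msg: bytes) -> dict:
    """Unpack the six 16-bit fields of a DNS header (network byte order)."""
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", msg[:12])
    return {
        "id": ident,
        "qr": flags >> 15 & 1,        # 0 = query, 1 = response
        "opcode": flags >> 11 & 0xF,
        "aa": flags >> 10 & 1,        # authoritative answer
        "tc": flags >> 9 & 1,         # truncated
        "rd": flags >> 8 & 1,         # recursion desired
        "ra": flags >> 7 & 1,         # recursion available
        "rcode": flags & 0xF,         # 0 = ok, 3 = NXDOMAIN
        "qdcount": qd, "ancount": an, "nscount": ns, "arcount": ar,
    }
```

Twelve bytes in, every flag out; the whole header is one `struct.unpack` call.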

The four sections

The sections are Question, Answer, Authority, and Additional. A query carries one question and leaves the other three empty. A response echoes the question back and fills in whatever the server has to offer:

  • Question: the name + type + class being asked. “What is the A record for www.cloudflare.com?”
  • Answer: records that directly answer the question. A 104.16.132.229
  • Authority: NS records for the zone that owns the answer; used in referrals. “I don’t know, but ask ns1.example.com.”
  • Additional: helpful extras, e.g. the A record for an NS hostname mentioned in Authority, so the client doesn’t have to do a separate lookup.
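
The Question section is simple enough to build by hand: the wire-encoded name, then two 16-bit fields. A sketch (helper names are mine):

```python
import struct

def encode_qname(name: str) -> bytes:
    """Wire-encode a domain name: length-prefixed labels, zero terminator."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"

def encode_question(name: str, qtype: int, qclass: int = 1) -> bytes:
    """QNAME + QTYPE + QCLASS, exactly as it appears on the wire."""
    return encode_qname(name) + struct.pack("!HH", qtype, qclass)
```

So “what is the A record for www.cloudflare.com?” is `encode_question("www.cloudflare.com", 1)`: 24 bytes.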

The resource record

Every record, in every section, has the same shape:

NAME      (variable length, compressed)
TYPE      (2 bytes; A=1, AAAA=28, MX=15, NS=2, TXT=16, CNAME=5)
CLASS     (2 bytes; always IN=1 on the public internet)
TTL       (4 bytes, seconds; the cache budget from Chapter 1)
RDLENGTH  (2 bytes, length of the data below)
RDATA     (variable, format depends on TYPE)
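
Packing that shape is one format string. A sketch, assuming the NAME has already been wire-encoded (here a 2-byte compression pointer, covered below; `pack_rr` is my name, not a library call):

```python
import socket
import struct

def pack_rr(name: bytes, rtype: int, ttl: int, rdata: bytes, rclass: int = 1) -> bytes:
    """NAME + TYPE + CLASS + TTL + RDLENGTH + RDATA, as on the wire."""
    return name + struct.pack("!HHIH", rtype, rclass, ttl, len(rdata)) + rdata

# An A record (TYPE=1) whose NAME is a pointer to offset 12,
# TTL 300 seconds, RDATA the 4-byte address 104.16.132.229.
rr = pack_rr(b"\xc0\x0c", 1, 300, socket.inet_aton("104.16.132.229"))
```

Sixteen bytes total for a complete answer record; the 4-byte address is the only part that varies by TYPE.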

That’s it. Everything else in this book is either fancier ways to put records into one of the four sections, or fancier ways to deliver this packet. The data model is RRs in named sections, and that’s the entire semantic surface of DNS.

Name compression

If a single packet mentions www.example.com, mail.example.com, and example.com, the protocol can’t afford to repeat example.com three times. Compression lets later occurrences of a suffix become a 2-byte pointer back to the first one. RFC 1035 requires pointers to reference earlier bytes only, which keeps parsing linear and rules out loops. The implementation history is littered with parsers that failed to enforce that rule, but the rule itself works; a typical response shrinks by 30 to 50 percent.
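
A decoder that honors the backward-only rule looks roughly like this (a sketch; the function name is mine):

```python
def decode_name(msg: bytes, offset: int) -> tuple[str, int]:
    """Decode a possibly-compressed name; reject forward (looping) pointers.
    Returns the name and the offset just past it in the original stream."""
    labels, jumped, end = [], False, offset
    while True:
        length = msg[offset]
        if length & 0xC0 == 0xC0:                     # 2-byte compression pointer
            target = ((length & 0x3F) << 8) | msg[offset + 1]
            if target >= offset:                      # RFC 1035: earlier bytes only
                raise ValueError("forward pointer")
            if not jumped:
                end = offset + 2                      # resume here after the name
            offset, jumped = target, True
        elif length == 0:                             # root label: end of name
            if not jumped:
                end = offset + 1
            return ".".join(labels), end
        else:                                         # ordinary label
            labels.append(msg[offset + 1 : offset + 1 + length].decode("ascii"))
            offset += 1 + length
```

Because every pointer must go strictly backward, the loop is guaranteed to terminate; parsers that skipped that check are where the historical bugs lived.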

[diagram placeholder] Uncompressed packet (71 bytes) vs compressed (43 bytes, 40% savings). Shows the C0 0C 2-byte pointer convention, with arrows back to the first occurrence of example.com.

[demo placeholder] Live DoH packet inspector: type a name and record type, page fires a real DoH query to 1.1.1.1, shows the response as a color-coded hex+ASCII dump with hover-to-explain on each byte.

2. Why UDP, and TCP fallback

A DNS exchange is one tiny round-trip: client asks, server answers, done. Two packets.

Compare that to what TCP would charge. A TCP connection costs a 3-way handshake (SYN, SYN-ACK, ACK), then the actual data (PSH-ACK with the question, PSH-ACK with the response), then a 4-way teardown (FIN plus ACK each way). Nine packets, with a full round-trip of handshake before any data flows, all to deliver a 50-byte question and an 80-byte answer. The overhead dwarfs the payload.

UDP skips it. One packet out, one packet back, no setup, no teardown, no per-connection state on the server. That economy is why DNS runs on UDP.

TCP isn’t disabled though. The protocol falls back to it in three cases. First, when the response didn’t fit in a UDP packet: the server returns what it can with the TC bit set, and the client retries the whole query over TCP. Second, for zone transfers (AXFR / IXFR), which can be megabytes and need a reliable stream. Third, for the encrypted variants: DoT (DNS over TLS, port 853) and DoH (DNS over HTTPS, port 443) both ride TCP. DoQ (DNS over QUIC, RFC 9250) is the newer entrant: UDP underneath, QUIC’s encryption and reliability on top. Chapter 6 covers all three.
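
The first fallback case is mechanical: check one bit, then resend the same bytes with TCP’s framing. A sketch of both halves (helper names are mine; DNS-over-TCP length framing is per RFC 1035):

```python
import struct

def tc_bit_set(response: bytes) -> bool:
    """True if the server truncated this UDP response (bit 9 of the flags word),
    meaning the client should retry the whole query over TCP."""
    flags = struct.unpack("!H", response[2:4])[0]
    return bool(flags & 0x0200)

def tcp_frame(query: bytes) -> bytes:
    """DNS over TCP prefixes each message with its length as two bytes."""
    return struct.pack("!H", len(query)) + query
```

The retried query is byte-for-byte the UDP query; only the 2-byte length prefix is new.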

3. The 512-byte limit

RFC 1035 capped a UDP DNS message at 512 bytes. The number wasn’t arbitrary: IPv4’s minimum reassembly buffer is 576 bytes; subtract 20 for the IP header and 8 for UDP, you have 548; round down for safety, you get 512. Below that, your packet is guaranteed not to fragment on any 1980s-era IP network.
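
The arithmetic from that derivation, spelled out:

```python
IPV4_MIN_REASSEMBLY = 576  # every IPv4 host must reassemble at least this much
IPV4_HEADER = 20
UDP_HEADER = 8

room = IPV4_MIN_REASSEMBLY - IPV4_HEADER - UDP_HEADER  # 548 bytes of payload
DNS_UDP_CAP = 512                                      # 548 rounded down for safety
```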

In 1987 that was generous. By the late 1990s it was a wall. IPv6 records (AAAA, 16 bytes each), DNSSEC signatures (RRSIG records, hundreds of bytes each), and modern zones returning many A records for load distribution all push responses past 512 quickly. When the server can’t fit the answer, it sets the TC bit and the client has to retry over TCP, paying the connection-setup tax for every oversized response.

4. How EDNS(0) works

Renegotiating the cap without breaking every existing resolver was the trick. RFC 2671 (1999, revised as RFC 6891 in 2013) introduced an OPT pseudo-record that lives in the Additional section. It isn’t a real record; it’s a way to smuggle extension data inside the existing message format. Resolvers that don’t understand OPT ignore it. Resolvers that do understand it read out the fields they need.

The most important field is the UDP payload size the querier is willing to receive. 4096 bytes used to be the common value. The post-2020 conservative default is 1232, the IPv6 minimum MTU (1280) minus headers, which stays below the path-MTU floor and avoids IP-layer fragmentation.

Once that field is set, the server can return larger answers over UDP without triggering the TC-bit fallback. Every DNS feature added after 1999 (DNSSEC, EDNS Client Subnet, DNS Cookies, the signalling bits for DoH and DoT) lives in an OPT field. Without OPT, DNS would have ossified in 1987.
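
The OPT pseudo-record reuses the standard RR layout, repurposing two fields: CLASS becomes the advertised UDP payload size, and the 32-bit TTL becomes extended-RCODE, version, and flags (the top flag bit is DNSSEC’s DO). A sketch of the bytes (the helper name is mine):

```python
import struct

def opt_record(payload_size: int = 1232, do_bit: bool = False) -> bytes:
    """EDNS(0) OPT pseudo-record for the Additional section (RFC 6891).
    NAME is the root (one zero byte); CLASS carries the UDP payload size;
    TTL carries extended-RCODE | version | flags, with no option data."""
    ttl = 0x8000 if do_bit else 0  # DO bit: top bit of the 16-bit flags half
    return b"\x00" + struct.pack("!HHIH", 41, payload_size, ttl, 0)
```

Eleven bytes appended to the Additional section, and the 512-byte cap is renegotiated to whatever the querier advertises.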

5. Interesting bits

EDNS makes amplification cheap

A 50-byte query that elicits a 4000-byte response is an 80× amplifier. Spoof the source IP to your victim’s address, send the query to any open resolver, and the resolver dutifully sends a 4000-byte response to the victim. Across a botnet this saturates gigabit links. The 2013 Spamhaus attack hit ~300 Gbps using exactly this pattern. Chapter 6 covers the defenses.

Why 1232 specifically

Responses larger than the path MTU get split into IP fragments. Lots of middleboxes (home routers, corporate firewalls, NATs) silently drop fragmented UDP. The post-2020 1232-byte cap is the largest payload that fits inside an IPv6 minimum-MTU packet after headers, so it avoids fragmentation entirely. Below the cap, you stay in the safe zone.

DNS Flag Day

For years, resolver and authoritative-server vendors carried code that worked around middleboxes which dropped or mangled EDNS-marked queries. The workarounds were expensive and growing. In 2019 and again in 2020, all the major vendors coordinated a “Flag Day”: simultaneously, they stopped supporting the workarounds. Operators of broken servers were forced to upgrade. It worked, and it’s one of the rare examples of the DNS community doing successful coordinated deprecation.

6. QNAME minimization

When the resolver walks the hierarchy (root, TLD, second-level, leaf), each upstream only needs the part of the name it’s responsible for. The root needs the TLD. The TLD needs the second-level name. Only the leaf nameserver needs the full thing.

For three decades, the resolver sent the full name at every hop anyway. Looking up private-clinic-name.medical.org meant sending that entire string to the root, then again to .org, then again to medical.org. The protocol had no way to ask “who runs .org?” without revealing what you were about to ask underneath it.

That made every full query name visible to every upstream on its path. Three concrete consequences are worth knowing:

  • NSA’s MORECOWBELL. Documents released in 2015 described a passive DNS surveillance program built around exactly this leak, pulling query data off network taps near major resolvers and authoritative servers.
  • Compromised auth servers learning sensitive content. Internal subdomains often leak project codenames, infrastructure layout, or testbed locations. A resolver that asks secret-launch-q4-2024.internal.bigco.com reveals the project’s existence to every upstream on the way.
  • Tor correlation attacks. Tor exit nodes do DNS lookups for the destinations their users visit. A 2016 study (Greschbach et al., “The Effect of DNS on Tor’s Anonymity”) showed that an attacker watching DNS at the resolver and traffic at a guard node can correlate the two and deanonymize users. The leak that made it possible was QNAME’s verbosity.

RFC 7816 (2016, updated by RFC 9156 in 2021) defines QNAME minimization: send only the label the upstream needs. To the root, ask for .org. To .org, ask for medical.org. Only the final authoritative server ever sees the full name.
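
The name progression is easy to state precisely. A sketch (the function name is mine; real minimizing resolvers also choose query types and handle uncooperative servers, which this ignores):

```python
def minimized_qnames(full_name: str) -> list[str]:
    """What each upstream sees under QNAME minimization: the root gets one
    label, each later hop gets one more, and only the final authoritative
    server sees the full name."""
    labels = full_name.rstrip(".").split(".")
    return [".".join(labels[-n:]) for n in range(1, len(labels) + 1)]
```

For the clinic example, the hops see `org`, then `medical.org`, and only the last server sees `private-clinic-name.medical.org`.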

[diagram placeholder] Before/after of what each upstream sees. Pre-2016: full leaf name sent to root, TLD, and authoritative. Post-2016: each upstream gets only the labels it needs.

The fix took thirty years mostly because nobody was pushing for it. The 1987 design assumed an internet of trusted research hosts; DNS as a privacy surface wasn’t on anyone’s mind. By the time it should have been, the system was load-bearing and the change had no commercial sponsor. The Snowden disclosures in 2013 made the leak feel concrete, and within a few years implementations started shipping.

Rollout had its own friction. A handful of authoritative servers misbehaved when asked about partial names, returning errors or empty responses for anything shorter than the full query. Early QNAME-minimizing resolvers had to fall back to a “relaxed” variant that approximated the strict behavior without tripping those servers. By 2024 every major public resolver (Cloudflare, Google, Quad9, plus the default builds of Unbound and BIND) implements it, and most ISP resolvers have followed.

The resolver-to-upstream leak is largely closed. What the resolver itself sees, and what the network between you and the resolver sees, are separate problems. Chapter 6 covers both.

Next: ch3: who is in charge


Further reading