Aggressive cache use increases speed and efficiency of validating resolvers
Old bug in F5 software results in weeks of searching
RFC 8198 defines a mechanism for the reuse of NSEC(3) and wildcard records in the caches of validating resolvers. By first checking whether a domain name is within the range of a record already held in the cache, a resolver can avoid sending fresh DNS(SEC) queries.
However, resolver operators need to be aware of an old bug in the F5 software: older versions generate faulty NSEC records, which continue to cause problems when aggressive cache use is enabled. Familiarity with the bug can save you weeks of searching.
RFC 8198 defines a mechanism for the reuse of NSEC(3) and wildcard records in the caches of validating resolvers. By first checking whether a domain name is within the range of a record already held in the cache, a resolver can avoid sending fresh DNS(SEC) queries.
If, for example, your cache contains an NSEC record for the (alphabetical) interval bravo.example.nl–echo.example.nl as a result of previously querying the (non-existent) domain name charlie.example.nl, your resolver can reuse the record to immediately conclude that the domain name delta.example.nl doesn't exist either.
Similarly, the resolver can immediately give a positive response on the basis of a previously cached wildcard record.
[user@system ~]$ dig +norec +multi +dnssec TXT charlie.example.nl @anytest1.sidnlabs.nl
; <<>> DiG 9.16.30-RH <<>> +norec +multi +dnssec TXT charlie.example.nl @anytest1.sidnlabs.nl
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 61750
;; flags: qr aa; QUERY: 1, ANSWER: 0, AUTHORITY: 6, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
; COOKIE: 01ba1cdf2e880b520100000062df0af94fdf0eea786e7f97 (good)
;; QUESTION SECTION:
;charlie.example.nl. IN TXT
;; AUTHORITY SECTION:
example.nl. 300 IN SOA ex1.sidnlabs.nl. hostmaster.sidn.nl. (
1075 ; serial
14400 ; refresh (4 hours)
3600 ; retry (1 hour)
604800 ; expire (1 week)
300 ; minimum (5 minutes)
)
example.nl. 300 IN RRSIG SOA 13 2 3600 (
20220823233422 20220724223422 7104 example.nl.
llG5YyoIpZ/ubvNld3O6DSWUP8AvBO3+gmhN3VC13hRo
3yxh7JrDhVKLwQRPJ1PhC591PN38bcakxjJCtcMBYA== )
example.nl. 300 IN NSEC _33078fba9732a68a53d15a15566f9857.example.nl.
A NS SOA MX TXT AAAA RRSIG NSEC DNSKEY HTTPS CAA
example.nl. 300 IN RRSIG NSEC 13 2 300 (
20220806144230 20220714040659 7104 example.nl.
qf7rOQnHaLOqL+mqpAb497OsoHjd8WXTFMwycQJMro2H
xOynUDvLZvSLYzoKTc6iwT2MHEouOveSKo27W9f75w== )
bravo.example.nl. 300 IN NSEC echo.example.nl. TXT RRSIG NSEC
bravo.example.nl. 300 IN RRSIG NSEC 13 3 300 (
20220815214455 20220716205201 7104 example.nl.
aT1XN9fBDuVpXdltnpKEidYBkY9qPX8YQcSxNZXlVC7/
e9mJ15t9jDHeeQqb9+8qh4qDpQ2tGKhk0Qgf9jPcXw== )
;; Query time: 12 msec
;; SERVER: 2001:678:8::53#53(2001:678:8::53)
;; WHEN: Mon Jul 25 23:28:25 CEST 2022
;; MSG SIZE rcvd: 577
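The range check described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular resolver implements it: the function names `canonical_key` and `nsec_covers` are hypothetical, names are assumed to be plain ASCII without escaped dots, and the query name is assumed to lie inside the zone. The ordering follows the canonical DNS name order of RFC 4034, section 6.1 (labels compared from right to left, case-insensitively).

```python
def canonical_key(name: str) -> list[bytes]:
    """Sort key for canonical DNS name order (RFC 4034, section 6.1)."""
    labels = name.rstrip(".").lower().split(".")
    # Most significant label (the TLD) first; each label is compared as raw bytes.
    return [label.encode("ascii") for label in reversed(labels)]


def nsec_covers(owner: str, next_name: str, qname: str) -> bool:
    """Return True if the NSEC record owner -> next_name proves qname does not exist."""
    o, n, q = (canonical_key(x) for x in (owner, next_name, qname))
    if o < n:
        return o < q < n
    # The last NSEC record in a zone points back to the apex,
    # so it covers every in-zone name that sorts after its owner.
    return q > o


# The NSEC record from the dig output above: bravo.example.nl -> echo.example.nl
cached = ("bravo.example.nl.", "echo.example.nl.")
print(nsec_covers(*cached, "delta.example.nl."))    # True
print(nsec_covers(*cached, "foxtrot.example.nl."))  # False
```

With the record for bravo.example.nl to echo.example.nl in the cache, a query for delta.example.nl can be answered without contacting the authoritative server, whereas foxtrot.example.nl falls outside the interval and still needs a fresh query.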
Section 4.5 of RFC 4035 specified that, in order to maximise the current validity of its cache contents, a validating resolver should not reuse NSEC(3) and wildcard records, which the authors describe as blocking authoritative data. The section in question enforces that prohibition by the technical measure of requiring that a validating resolver's cache entries are based on complete domain names (FQDNs).
However, the update in RFC 8198 now recommends that, in the interests of speed and efficiency, NSEC(3) and wildcard records should be reused, so that, respectively, negative and positive responses can be given immediately. Retaining the records enables a validating resolver to synthesise its own response from the current (specific) query and the cached (more generic) response.
Such 'aggressive' cache use has since been implemented in popular validating resolver software, such as BIND, the PowerDNS Recursor and Unbound:
Software | Version | Support |
---|---|---|
BIND | | NSEC (the 'synth-from-dnssec' option was enabled by default from version 9.12.0, but then disabled again from 9.14.18 due to security and performance problems; since version 9.18.1, it has been re-enabled) |
Knot Resolver | 2.0 | NSEC from version 2.0.0, NSEC3 from version 2.4.0 |
PowerDNS Recursor | | NSEC/NSEC3 |
Unbound | 1.7.0 | NSEC (previously, the 'aggressive-nsec: yes' option had to be enabled manually; now it's enabled by default) |
According to the authors of RFC 8198 and the DNS resolver software developers, the efficiency gains offered by aggressive cache use should not be underestimated, further strengthening the argument for using DNSSEC.
The potential for efficiency gains is greatest on the root servers, where more than half the responses are NXDOMAIN responses [1, 2]. Hence, the new mechanism might help to reduce the impact of DoS attacks on the DNS system (although most such attacks are probably direct attacks that do not involve the use of caching resolvers). However, that spin-off benefit will not be realised until many more people start using DNSSEC.
In the .nl zone, NXDOMAIN responses account for a much smaller percentage of the total. In recent years, the figure initially rose to 10 to 15 per cent, before declining considerably in the last two years, and now stands at 5 to 10 per cent. However, that decline cannot be due to implementation of RFC 8198: until recently, the .nl zone was signed using the 'NSEC3 opt-out' method, implying that unsigned delegations (i.e. delegations that do have NS records, but don't have DS records) did not get their own NSEC3 records.
We disabled the 'NSEC3 opt-out' setting at the end of 2022, so that unsigned delegations also now have NSEC3 records. As well as improving the performance of validating resolvers that support aggressive caching, the change provides additional protection in the unlikely event of the .nl zone ever being hijacked, since the NS records cannot now be modified without also modifying the associated NSEC3 records.
We have not investigated the cause of the NXDOMAIN response peaks referred to above. The explanation may involve data collection problems, DoS attacks or (most likely) attempts to inventory the .nl zone by brute force.
Finally, we wish to highlight an old bug in network service provider F5's software that continues to cause problems when aggressive cache use is enabled. Older versions of the F5 software (based on BIND, with a bespoke front end that handles signing) generate faulty NSEC records, with some record types missing. Because the standard requires an NSEC record to list every record type that exists at its owner name, resolvers that support RFC 8198 can assume that any record type missing from the list does not exist. As a result, certain combinations of domain name and record type may be rendered intermittently unreachable (depending on what is held in the resolver's cache).
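To illustrate why a missing record type is harmful, the following sketch shows the inference that a resolver with aggressive caching may draw from a cached NSEC record. The data is invented for illustration: the name `mail.example.nl.` and the helper `synthesise_from_nsec` are hypothetical, and a real resolver would also validate signatures and honour TTLs before trusting the record.

```python
def synthesise_from_nsec(cached_nsec: dict[str, set[str]], qname: str, qtype: str) -> str:
    """Decide what a cached NSEC record owned by qname lets the resolver answer locally."""
    types = cached_nsec.get(qname.lower())
    if types is None:
        return "MISS"    # no NSEC record owned by this name is cached: query upstream
    if qtype.upper() in types:
        return "MISS"    # the type exists, so the actual records still have to be fetched
    return "NODATA"      # the type is not listed: answer 'no such record' from the cache


# A correct NSEC record lists MX; the faulty one (hypothetical data) omits it.
correct = {"mail.example.nl.": {"A", "MX", "RRSIG", "NSEC"}}
faulty = {"mail.example.nl.": {"A", "RRSIG", "NSEC"}}

print(synthesise_from_nsec(correct, "mail.example.nl.", "MX"))  # MISS
print(synthesise_from_nsec(faulty, "mail.example.nl.", "MX"))   # NODATA
```

The fault is intermittent because the NODATA answer is only synthesised while the faulty NSEC record happens to be in the cache, for instance after an earlier query for a record type that really is absent at the same name.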
Network and Systems Engineer Ruben van Staveren has first-hand experience of how difficult it can be to resolve an intermittent problem of that kind. One of his clients was sometimes unable to deliver mail to the domain 'minjenv.nl', because the MX record type was erroneously omitted from the NSEC record. While Van Staveren was investigating the problem, the client needed a way of continuing operations. A separate Unbound instance was therefore set up, alongside the dnsdist/PowerDNS system. However, that further complicated the task of tracking down the cause of the intermittent fault. It ultimately took Van Staveren weeks to discover exactly what was wrong, and (at the time of writing) the bug on the 'minjenv.nl' domain has still not been fixed.
Van Staveren is not the only person to have encountered the old bug. Senior PowerDNS Engineer Peter van Dijk says that he still comes across the problem every month or two. He 'resolves' it by configuring a negative trust anchor for the relevant domain name.
The persistence of the bug has even led some major DNS service providers and DNS resolver software developers to make their RFC 8198 implementations less aggressive (at the expense of performance) by not using NSEC(3) record boundary values for the generation of negative responses.
Although F5 fixed the bug in their software some time ago [1, 2], it seems that users consider updating their systems to be too challenging or expensive, with the result that resolver operators are now landed with the problem.
If you are an operator experiencing intermittent problems resolving a particular domain name, and you think that the bug described here may be the cause, check the NSEC record that the authoritative name server publishes for that domain name. The bug is present if the NSEC record has the domain name as its owner (its lower boundary) but does not list a record type that the name server does return for that name. In such circumstances, the most straightforward solution is indeed to define a negative trust anchor for the domain in question.
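The check described above can be sketched as follows. The snippet parses one NSEC record in the presentation format that dig prints, assuming the record is on a single line (so without the +multi option), and reports whether a record type you know to exist is missing from its type list. The helper names and the sample line are hypothetical.

```python
from typing import NamedTuple


class NsecRecord(NamedTuple):
    owner: str
    next_name: str
    types: frozenset[str]


def parse_nsec_line(line: str) -> NsecRecord:
    """Parse 'owner TTL IN NSEC next-name TYPE...' as printed by dig on a single line."""
    fields = line.split()
    if len(fields) < 5 or fields[2].upper() != "IN" or fields[3].upper() != "NSEC":
        raise ValueError(f"not an NSEC record line: {line!r}")
    types = frozenset(t.upper() for t in fields[5:])
    return NsecRecord(fields[0].lower(), fields[4].lower(), types)


def type_missing_from_nsec(nsec: NsecRecord, qname: str, existing_type: str) -> bool:
    """True if the NSEC record is owned by qname but omits a type that does exist there."""
    same_name = nsec.owner.rstrip(".") == qname.lower().rstrip(".")
    return same_name and existing_type.upper() not in nsec.types


# Hypothetical NSEC line for a name that is known to have an MX record.
line = "mail.example.nl. 300 IN NSEC www.example.nl. A RRSIG NSEC"
nsec = parse_nsec_line(line)
print(type_missing_from_nsec(nsec, "mail.example.nl", "MX"))  # True
print(type_missing_from_nsec(nsec, "mail.example.nl", "A"))   # False
```

A result of True for a record type that the authoritative server does return on a direct query indicates the inconsistency described here.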