DNS error reporting by resolvers to authoritative name servers

RFC 9567 defines a mechanism for automated reporting of issues


DNS information exchange has traditionally been a one-way flow, from authoritative name servers to caching resolvers and end users. In the event of a problem – an expired DNSSEC signature, for example – the only way to notify a DNS operator has been by mail, phone or social media.

Now, however, RFC 9567 provides a mechanism for resolvers to send error messages to name servers. The mechanism is based on ordinary DNS message exchange, but with error data incorporated into queries.

Until now, there has been no way for an authoritative name server to receive reports from the outside world in the event of a DNS error. When a DNSSEC-validating resolver encountered a bogus domain name, all it could do was block the end user's access to it. As its name suggests, an authoritative name server was regarded as having sole responsibility for the correct signing of its zone, independently of the downstream DNS infrastructure.

Ordinary DNS message exchange

RFC 9567 is intended to change that by providing a mechanism that resolvers can use to (automatically) report problems to authoritative name server operators. First, an authoritative name server that wishes to receive reports has to include the EDNS0 Report-Channel option (option code 18) in the OPT record of its responses. That option specifies an agent domain to which resolvers can send reports.
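As an illustration, the option an authoritative server attaches might be built roughly as follows. This is a minimal sketch, assuming the layout described in RFC 9567: a 2-byte option code (18), a 2-byte option length, and then the agent domain in uncompressed DNS wire format. The function names are hypothetical; in practice the name server software or a DNS library would produce these bytes.

```python
import struct

REPORT_CHANNEL = 18  # EDNS0 option code assigned to Report-Channel


def encode_name(name: str) -> bytes:
    """Encode a domain name in uncompressed DNS wire format."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        out.append(len(raw))  # each label is prefixed by its length
        out += raw
    out.append(0)  # the empty root label terminates the name
    return bytes(out)


def report_channel_option(agent_domain: str) -> bytes:
    """OPTION-CODE and OPTION-LENGTH, followed by the agent domain as option data."""
    data = encode_name(agent_domain)
    return struct.pack("!HH", REPORT_CHANNEL, len(data)) + data
```

For the example used below, `report_channel_option("agent.example.nl")` yields the option code 18, a length of 18 bytes, and the name `\x05agent\x07example\x02nl\x00`.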

The reporting mechanism makes use of the ordinary DNS message exchange. Suppose that the authoritative name server for the domain servfail.nl includes the name of the agent domain agent.example.nl when responding to queries about servfail.nl. A resolver that's unable to validate servfail.nl can then send a query for the TXT record type to this specially constructed domain name (QNAME):

_er.1.servfail.nl.7._er.agent.example.nl

In that name, the value 1 is the record type of the failed query, in this case A. The full list of record type values is available on the IANA website. The value 7 is an Extended DNS Error (EDE) code; these codes are defined in RFC 8914, and code 7 indicates an expired signature.
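Building such a report name is simple string assembly. The sketch below is a hypothetical helper, not taken from any resolver implementation; the only extra check it adds is the general DNS rule that a name may not exceed 255 octets in wire format, since an over-long report name could not be queried at all.

```python
QTYPE_A = 1                # RR type code for A
EDE_SIGNATURE_EXPIRED = 7  # Extended DNS Error code 7 (RFC 8914)


def report_qname(qtype: int, qname: str, ede: int, agent_domain: str) -> str:
    """Assemble _er.<qtype>.<qname>.<ede>._er.<agent domain>."""
    name = f"_er.{qtype}.{qname.rstrip('.')}.{ede}._er.{agent_domain.rstrip('.')}"
    # In wire format every dot becomes a length byte, plus one leading
    # length byte and the terminating root label: total = len(text) + 2.
    if len(name) + 2 > 255:
        raise ValueError("report QNAME exceeds the 255-octet DNS name limit")
    return name
```

Calling `report_qname(QTYPE_A, "servfail.nl", EDE_SIGNATURE_EXPIRED, "agent.example.nl")` reproduces the name shown above.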

Note that the record type value follows the first '_er' label, while the error code precedes the second one. Wrapping reports in '_er' labels in this way prevents them from interfering with the normal namespace. And, of course, the agent domain can't be a subdomain of the domain for which it accepts reports; otherwise any problem that makes the primary domain unreachable would also make the agent domain unreachable, and the reports would consequently be undeliverable.
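A resolver could guard against that situation with a label-wise suffix comparison. The sketch below uses hypothetical function names and, as an assumption, also treats an agent domain equal to the broken domain as unusable; comparing whole labels (not raw strings) keeps `notservfail.nl` from being mistaken for a subdomain of `servfail.nl`.

```python
def is_at_or_below(name: str, zone: str) -> bool:
    """True if `name` equals `zone` or lies beneath it (label-wise, case-insensitive)."""
    n = name.rstrip(".").lower().split(".")
    z = zone.rstrip(".").lower().split(".")
    return len(n) >= len(z) and n[len(n) - len(z):] == z


def may_report(agent_domain: str, broken_domain: str) -> bool:
    # Hypothetical helper: skip reporting when the agent domain sits inside
    # the broken domain, because the report would be undeliverable anyway.
    return not is_at_or_below(agent_domain, broken_domain)
```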

The agent

At the agent domain, there needs to be a special (authoritative) DNS server that receives incoming reports (in the form of queries, as described above) and processes them to generate statistics and, in appropriate cases, alerts.
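On the agent side, the first processing step is to split each incoming QNAME back into its parts. The following is a minimal sketch with hypothetical names (`Report`, `parse_report`, `handle_query`); it assumes the agent domain `agent.example.nl` from the example, lowercases names so that case variations of the same report are counted together, and silently drops anything that does not match the expected layout.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Report(NamedTuple):
    qtype: int
    qname: str
    ede: int


def _num(label: str) -> Optional[int]:
    # Accept plain ASCII digits only.
    return int(label) if label.isascii() and label.isdigit() else None


def parse_report(query_name: str, agent_domain: str) -> Optional[Report]:
    """Split an incoming report QNAME into its parts; None if it is malformed."""
    q = query_name.rstrip(".").lower()
    suffix = "." + agent_domain.rstrip(".").lower()
    if not q.endswith(suffix):
        return None
    labels = q[: -len(suffix)].split(".")
    # Expected layout: _er, qtype, <qname labels...>, ede, _er
    if len(labels) < 5 or labels[0] != "_er" or labels[-1] != "_er":
        return None
    qtype, ede = _num(labels[1]), _num(labels[-2])
    if qtype is None or ede is None:
        return None
    return Report(qtype, ".".join(labels[2:-2]), ede)


stats: Counter = Counter()


def handle_query(query_name: str, agent_domain: str = "agent.example.nl") -> None:
    report = parse_report(query_name, agent_domain)
    if report is not None:
        stats[report] += 1  # raw statistics; alerting would build on these counts
```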

The monitoring agent also sends DNS responses back to reporting resolvers, confirming receipt of their reports. Because all the information about a reported issue is contained within the query (QNAME) itself, the content (RDATA) of the associated response has no function beyond confirmation of receipt. What does matter is the stated TTL value, because the new reporting system makes use of the normal DNS caching mechanism to prevent resolvers swamping agents by reporting the same issue again and again.
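In a real resolver this suppression simply falls out of the normal cache: as long as the agent's answer for a given report QNAME is cached, there is nothing new to send. The sketch below isolates that logic in a hypothetical `ReportCache` class so it can be seen on its own; the injectable clock is only there to make the behaviour easy to test.

```python
import time
from typing import Callable, Dict


class ReportCache:
    """Suppress duplicate reports while the agent's answer is still cached."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires: Dict[str, float] = {}

    def should_send(self, report_qname: str) -> bool:
        expiry = self._expires.get(report_qname.lower())
        return expiry is None or self._clock() >= expiry

    def record_answer(self, report_qname: str, ttl: int) -> None:
        # The TTL in the agent's response decides how long the resolver stays quiet.
        self._expires[report_qname.lower()] = self._clock() + ttl
```

With a TTL of 3600 seconds, a resolver that keeps failing to validate `servfail.nl` sends at most one report per hour for that specific record type and error code.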

The idea is that agents are separate systems whose function is to gather incoming reports, generate statistics and, where appropriate, send alerts. An agent is therefore comparable to a (commercial) DMARC report processing service. "My view is that automated error processing doesn't really belong in the DNS protocol," says Roy Arends, Principal Research Scientist at ICANN and one of the editors of RFC 9567. "The agent can be incorporated within an authoritative name server, such as PowerDNS or NSD, but the evaluation of incoming reports and the rectification of errors are really quite separate."

No authentication mechanism

To prevent the spoofing of error reports sent to the agent, reporting resolvers are required to use either TCP (rather than the much more efficient UDP) or DNS Cookies (a lightweight mechanism that protects DNS transactions over UDP against off-path spoofing). In addition, QNAME minimisation is used to enhance privacy.
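For the TCP option, the report is an ordinary TXT query carried with the 2-byte length prefix that DNS uses over TCP. The sketch below only builds the final query message (RD bit clear, as a resolver talking to an authoritative server would send); it does not show the DNS Cookies alternative, the step-by-step QNAME-minimised resolution of the agent domain, or any actual network I/O. The function names are hypothetical.

```python
import secrets
import struct


def encode_name(name: str) -> bytes:
    """Uncompressed DNS wire-format name."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"invalid label: {label!r}")
        out.append(len(raw))
        out += raw
    return bytes(out) + b"\x00"


def tcp_report_query(report_qname: str) -> bytes:
    """A TXT/IN query for the report name, prefixed with the 2-byte TCP length."""
    header = struct.pack(
        "!HHHHHH",
        secrets.randbits(16),  # random message ID
        0x0000,                # flags: standard query, RD clear
        1, 0, 0, 0,            # QDCOUNT=1; no answer/authority/additional records
    )
    question = encode_name(report_qname) + struct.pack("!HH", 16, 1)  # QTYPE TXT, QCLASS IN
    message = header + question
    return struct.pack("!H", len(message)) + message
```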

The new DNS error reporting system does not feature an authentication mechanism comparable to the use of authorisation records in DMARC, which allows an error processor to explicitly communicate its willingness to receive messages for a given mail domain. With DNS error reporting, therefore, anyone can in principle send any kind of error message to an agent, or set up a bogus domain so that error messages are sent to an agent by others. However, Arends doesn't see that as necessarily being problematic. "In the DNS world, anyone can send anything to anyone else. There's no existing relationship between a resolver and an authoritative name server, on which you might base some kind of authorisation mechanism. And you wouldn't want to burden resolvers with additional cryptographic responsibilities anyway, because they already have enough to do. In practice, therefore, an agent will receive all kinds of rubbish. However, real errors will trigger consistent patterns of reporting, which the agent can respond to."
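One conceivable way for an agent to separate such patterns from noise is to alert only when several distinct resolvers report the same issue. This is an illustrative heuristic, not something RFC 9567 prescribes; the `PatternDetector` class and its `min_reporters` threshold are hypothetical. Counting source addresses is only meaningful because the TCP or DNS Cookies requirement makes those addresses hard to spoof.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

Issue = Tuple[str, int, int]  # (qname, qtype, ede)


class PatternDetector:
    """Raise an alert only when enough distinct resolvers report the same issue."""

    def __init__(self, min_reporters: int = 3) -> None:
        self.min_reporters = min_reporters  # hypothetical threshold
        self._reporters: DefaultDict[Issue, Set[str]] = defaultdict(set)
        self.alerts: List[Issue] = []

    def observe(self, issue: Issue, resolver_ip: str) -> None:
        seen = self._reporters[issue]
        if resolver_ip in seen:
            return  # repeats from the same source add no new evidence
        seen.add(resolver_ip)
        if len(seen) == self.min_reporters:
            self.alerts.append(issue)  # fires once, when the threshold is first reached
```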

Final safety net and troubleshooting

At the moment, it takes nearly half an hour to refresh the .nl zone for each new publication. Since the upgrade from DNSSEC algorithm 8 to algorithm 13 last summer, it takes just as long to perform all the tests and to validate the entire zone as it does to sign the zone.

Where the .nl zone is concerned, the use of RFC 9567 could never be a substitute for the existing checks, but it could be a useful supplement to them. "Even if we were ultimately to go over to dynamic updates – and we don't yet have any definite plans to do so – we wouldn't want to be less thorough than we currently are," says Marco Davids, Research Engineer at SIDN Labs. "Checking everything thoroughly prior to publication may be more complicated, but we can't be reliant on other people to tell us when something's wrong. Nevertheless, DNS error reporting could be used in addition to the existing checks, as a final safety net to catch errors that have somehow evaded detection."

"I could see RFC 9567 playing an important role with less essential second-level domain names. In that context, DNS error reporting could be an easily implemented addition that helps with prompt problem resolution."