Photo of a Cisco 7301 router and a Juniper M7i, part of the K root-server instance at AMS-IX

Measuring DNS Root Servers Under Change

By Florian Steurer
Guest Author | Max Planck Institute for Informatics
January 21, 2025
In short
  • Root server co-location is prevalent but mostly low, with ∼70% of vantage points observing co-location of at least two servers.
  • While most traffic is kept local, many requests are routed to remote replicas. Upstream providers and individual routing policies play a major role in this.
  • Diversifying last-hop infrastructure at certain sites may improve redundancy.

The Domain Name System (DNS) is an enormous, hierarchical, and distributed database. At the top of this hierarchy, the root zone, served by the root servers, provides the (logical) starting point for all name resolutions. As most Internet applications rely on the DNS, the resilience of these root servers is critical for the functioning of the Internet.

Luckily, the root server system (RSS) is a prime example of a resilient system. This resilience is achieved through diversity and redundancy measures, such as:

  • 13 root servers (identified by the letters ‘a’ to ‘m’) operated by 12 independent organizations.
  • Each root server has geographically distributed replicas. In total, the RSS contains over 1,900 server instances.
  • These instances are running different software stacks.

Before we examine the RSS’s resilience in more detail, it’s crucial to understand how to measure such a distributed system.

Measurement Setup

Root servers use IP Anycast to direct clients to server instances. For example, a client from Japan may be routed to a server instance in Tokyo, while a client from Europe may end up in Amsterdam, even though both clients contacted the same IP address.

This also means that the observed behavior may change depending on a client’s geographical or topological location. We need clients (or vantage points) in networks worldwide to capture those differences.

As part of a recent study, my colleagues and I at the Max Planck Institute for Informatics, Deutsche Commercial Internet Exchange (DE-CIX), and BENOCS GmbH used 675 vantage points from NLNOG RING in 523 networks and 62 countries (Figure 1).

Figure 1 — Locations of our 675 vantage points.

With more than 1,900 server instances, finding new deployment locations can be challenging. It is generally attractive to deploy at places with good (local) connectivity, such as data centers or Internet Exchange Points (IXPs). However, reusing the same last-hop infrastructure may reduce the redundancy of an anycast setup.

By collecting traceroutes from our vantage points to the 13 root servers, we can quantify the reuse of last-hop infrastructure. If two routes from a vantage point share the second-to-last hop (with the last hop being the root server itself), the two root servers are likely co-located. For each vantage point, we then define the ‘reduced redundancy’ as the total number of second-to-last hops minus the number of unique ones.
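The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the study: the function names, the input layout (a mapping from root-server letter to the list of hops, ending with the root server itself), and the sample addresses (taken from documentation ranges) are all assumptions.

```python
from typing import Dict, List, Optional


def penultimate_hop(path: List[str]) -> Optional[str]:
    """Return the second-to-last hop of a traceroute path, if it has one."""
    return path[-2] if len(path) >= 2 else None


def reduced_redundancy(paths: Dict[str, List[str]]) -> int:
    """Total number of second-to-last hops minus the number of unique ones."""
    hops = [penultimate_hop(path) for path in paths.values()]
    hops = [hop for hop in hops if hop is not None]
    return len(hops) - len(set(hops))


# Hypothetical traceroutes from a single vantage point (not measured data).
example_paths = {
    "a": ["192.0.2.1", "198.51.100.1", "203.0.113.10", "a-root"],
    "b": ["192.0.2.1", "198.51.100.2", "203.0.113.10", "b-root"],
    "c": ["192.0.2.1", "198.51.100.3", "203.0.113.20", "c-root"],
    "d": ["192.0.2.1", "198.51.100.4", "203.0.113.10", "d-root"],
}

# Four second-to-last hops, two of them unique: reduced redundancy of 2.
print(reduced_redundancy(example_paths))
```

Real traceroutes often contain unresponsive hops, which this sketch does not handle; a measurement pipeline would need a rule for such paths before counting.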

Table of bar charts showing the number of vantage points in each region with reduced redundancy
Figure 2 — Reduced Redundancy due to multiple root servers sharing the hop-before-the-last.

Our study found that some co-location is prevalent, with around 70% of vantage points observing the co-location of at least two servers. At some vantage points, for example in Oceania (Figure 2), we observed a high ‘reduced redundancy’ (n=6) for IPv6.
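A ‘reduced redundancy’ of at least 1 means that at least two root servers share a second-to-last hop, so the share of vantage points observing co-location follows directly from the per-vantage-point values. The sketch below assumes a hypothetical mapping from vantage point name to its ‘reduced redundancy’; the sample values are invented for illustration and are not results from the study.

```python
from typing import Dict


def share_with_colocation(reduced: Dict[str, int]) -> float:
    """Fraction of vantage points whose reduced redundancy is at least 1."""
    if not reduced:
        return 0.0
    affected = sum(1 for value in reduced.values() if value >= 1)
    return affected / len(reduced)


# Invented per-vantage-point values (not measured data).
sample = {"vp-1": 0, "vp-2": 1, "vp-3": 6, "vp-4": 0, "vp-5": 2}

share = share_with_colocation(sample)
print(f"{share:.0%} of vantage points observe co-location")
```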

However, a small ‘reduced redundancy’ is not necessarily better. For example, in Africa, we found that traffic is routed out-of-continent (increasing the number of unique paths), even though local replicas, such as l.root, are available. This underlines the role and importance of upstream providers when considering RSS resilience.

Keeping Traffic Local

As we have seen with Africa’s ‘reduced redundancy’, traffic is sometimes routed out of the continent.

However, keeping traffic local can improve resilience and is an explicit goal of the Internet Society.

Notably, raw performance (in terms of RTTs) is not a primary concern for the root servers, as their answers are typically cached in local resolvers.

Using special queries, we can get a root server to reveal the identifier of the instance that answered. Combined with a public map of server sites, this lets us calculate the distance between the querying vantage point and the answering server replica.
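The distance step can be sketched as a great-circle (haversine) calculation. The post does not name the queries or the distance formula; commonly used mechanisms for identifying an instance are CHAOS-class TXT queries such as hostname.bind or id.server (RFC 4892) and the NSID option (RFC 5001), so treat that part as an assumption. The site map, its identifiers, and the helper names below are hypothetical.

```python
import math
from typing import Dict, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Coordinate, b: Coordinate) -> float:
    """Great-circle distance between two coordinates in kilometers."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Hypothetical site map: instance identifier -> approximate coordinates.
SITE_MAP: Dict[str, Coordinate] = {
    "ams": (52.37, 4.90),      # Amsterdam
    "lax": (34.05, -118.24),   # Los Angeles
}


def distance_to_instance(vantage_point: Coordinate, instance_id: str,
                         site_map: Dict[str, Coordinate] = SITE_MAP) -> float:
    """Distance from a vantage point to the site that answered its query."""
    return haversine_km(vantage_point, site_map[instance_id])


frankfurt = (50.11, 8.68)  # example vantage point location
print(round(distance_to_instance(frankfurt, "ams")))
print(round(distance_to_instance(frankfurt, "lax")))
```

With these sample coordinates, the Amsterdam site is a few hundred kilometers from the vantage point, while the Los Angeles site is more than 9,000 kilometers away.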

Scatter plot showing the distance (in kilometers) per request from vantage point to b.root (IPv4), from 0 to 10,000 km
Figure 3 — Distance per request from vantage point to b.root (IPv4).

Figure 3 compares, for each request to b.root, the distance to the geographically closest global instance with the distance to the instance the request was actually routed to. 78% of requests are routed to their closest global replica and land on the diagonal.
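The share of requests on the diagonal can be computed from pairs of distances. The sketch below is an assumption about how such a summary might be derived, not the analysis code from the study: the `Request` layout, the function names, the 1 km tolerance, and the sample values are all made up for illustration.

```python
from typing import List, Tuple

# (distance to closest global instance, distance to answering instance), in km
Request = Tuple[float, float]


def extra_distance_km(request: Request) -> float:
    """Additional distance traveled beyond the closest global instance."""
    closest_km, actual_km = request
    return max(0.0, actual_km - closest_km)


def share_at_closest(requests: List[Request], tolerance_km: float = 1.0) -> float:
    """Fraction of requests answered by (roughly) the closest instance."""
    if not requests:
        return 0.0
    on_diagonal = sum(1 for r in requests if extra_distance_km(r) <= tolerance_km)
    return on_diagonal / len(requests)


# Invented sample: three requests stay local, one takes a long detour.
sample_requests = [(365.0, 365.0), (365.0, 9320.0), (120.0, 120.0), (800.0, 800.0)]

print(share_at_closest(sample_requests))
print([extra_distance_km(r) for r in sample_requests])
```

A small tolerance is used because coordinates from a public site map are approximate, so exact equality of the two distances would be too strict.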

We also observed a cluster of requests that travel an additional 5,000 to 10,000 kilometers. These requests are from vantage points located in Europe and routed to North America. Again, we find these effects are often caused by the routing decisions of upstream providers.

Learn More

If you found this interesting, check out our paper, where we provide a more in-depth analysis, examine an IP address change in the RSS, and evaluate the zone transfer mechanism in the context of the new ZONEMD record.

Florian Steurer is a PhD student at the Max Planck Institute for Informatics. His research focuses on measuring DNS and resilience.


Photo by Bas van Schaik via Wikimedia Commons