Lessons from the Cloudflare 1.1.1.1 Outage: A Resilience Perspective

22 July 2025

Internet Resilience Insights, Internet Society

In short

An hour-long outage of one of the most popular global Domain Name System (DNS) services highlighted the need for greater diversity of this critical Internet infrastructure.
Internet Service Providers should set up their networks to query multiple DNS services or run their own local recursive resolvers where possible.
Internet users can also improve their connectivity resilience by configuring their devices to query more than one DNS service.

Last week (14 July 2025), Cloudflare's popular public Domain Name System (DNS) resolver, 1.1.1.1, experienced a significant outage that affected users globally.

According to Cloudflare's post-incident report, the issue stemmed from a misconfiguration of legacy systems used to maintain the infrastructure that advertises Cloudflare's IP addresses to the Internet.

The incident, while relatively short-lived (62 minutes), exposed a fundamental weakness: how heavily the Internet ecosystem, particularly Internet Service Providers (ISPs), resolver operators, and end users, has come to rely on a few large, centralized DNS resolver services.

In this blog post, we unpack what this outage reveals about Internet resilience, especially about DNS resolution diversity, and what best practices ISPs and users should consider moving forward to improve DNS operational hygiene and resilience.

Why Did This Matter So Much?

At its core, the DNS is the Internet's address book. If DNS resolution fails, websites and apps become unreachable, even if the underlying network is otherwise functional.

The Cloudflare 1.1.1.1 service is one of the world's most widely used open DNS resolvers. As of early 2025, 1.1.1.1 handles approximately 1.9 trillion DNS queries daily, serving users across ~250 countries and territories.

Designed for speed and privacy, it has gained popularity with end users and ISPs who configure it as their default recursive resolver. According to W3Techs, Cloudflare's market share for DNS resolution is estimated to be around 14.6% of all websites.

Read: Is Big DNS Taking Over?

On 14 July, many of these users suddenly found themselves unable to access the Internet, not because their Internet connection was down, but because domain names were no longer being resolved. The effects were especially noticeable in countries where Cloudflare is often the go-to DNS provider (either because of performance or policy).
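The distinction between a dead connection and a dead resolver can be checked programmatically. The following is a minimal sketch, not a definitive diagnostic: the function name `diagnose`, the test hostname, and the choice of 9.9.9.9 on TCP port 53 as a reachability probe are all assumptions made for illustration. It relies on the fact that connecting to a literal IP address requires no DNS lookup.

```python
import socket


def diagnose(hostname="example.com", probe_ip="9.9.9.9", probe_port=53,
             resolve=socket.getaddrinfo, connect=socket.create_connection):
    """Classify the current state as 'ok', 'dns-failure', or 'network-failure'.

    `resolve` and `connect` are injectable so the logic can be tested offline.
    The hostname and probe address are illustrative assumptions.
    """
    try:
        resolve(hostname, 443)
        return "ok"
    except OSError:  # socket.gaierror is a subclass of OSError
        pass
    try:
        # A literal IP address bypasses DNS entirely, so success here means
        # the network path works and only name resolution is broken.
        connect((probe_ip, probe_port), timeout=2.0).close()
    except OSError:
        return "network-failure"
    return "dns-failure"
```

Note that a successful lookup may be served from a local cache, so an "ok" result does not prove that the configured resolver is reachable at that moment.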

Implications for ISPs and Resolver Operators

For ISPs and network operators, the outage offers a clear warning that depending on a single upstream DNS resolver, however reliable it may seem, introduces a single point of failure.

Many small or regional ISPs, especially in developing regions, have limited resources to run and maintain their own caching resolvers. Outsourcing to a public resolver like 1.1.1.1 or 8.8.8.8 (Google DNS) is common in such cases. However, this model only works reliably if the selected provider remains available. When it fails, so does DNS service for all customers, essentially breaking the Internet experience.

This situation underscores the need for resilience by design. One of its core principles is diversity, a practice recommended by KINDNS. Instead of relying on a single upstream DNS provider, operators should configure their systems to query multiple resolvers, such as Cloudflare (1.1.1.1), Google (8.8.8.8), Quad9 (9.9.9.9), or OpenDNS (208.67.222.222). With multiple providers in rotation, a failure in one service doesn't translate into a complete loss of DNS functionality: queries can continue to be resolved through alternative paths.
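To make the idea of querying multiple resolvers concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not how any particular ISP or resolver implements failover: the function names, the two-second timeout, and the strict in-order retry policy are all hypothetical choices. The resolver addresses are the ones named above.

```python
import secrets
import socket
import struct

# Resolver addresses named in the text; the ordering is an arbitrary assumption.
UPSTREAMS = ["1.1.1.1", "8.8.8.8", "9.9.9.9", "208.67.222.222"]


def build_query(name, qtype=1):
    """Encode a minimal DNS query (qtype 1 = A record) with recursion desired."""
    # Header: random ID, flags 0x0100 (RD bit set), one question, no other records.
    header = struct.pack("!HHHHHH", secrets.randbits(16), 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # Question: encoded name, root label, query type, class IN (1).
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def query_udp(server, packet, timeout=2.0):
    """Send one query over UDP port 53 and return the raw response bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 53))
        data, _ = sock.recvfrom(4096)
        return data


def resolve_with_fallback(name, upstreams=UPSTREAMS, send=query_udp):
    """Try each upstream in order; return (server, response) from the first that answers."""
    packet = build_query(name)
    errors = {}
    for server in upstreams:
        try:
            response = send(server, packet)
        except OSError as exc:  # covers timeouts and unreachable networks
            errors[server] = exc
            continue
        # Accept only a response whose transaction ID matches the query.
        if response[:2] == packet[:2]:
            return server, response
        errors[server] = ValueError("mismatched transaction ID")
    raise RuntimeError(f"all upstreams failed: {errors}")
```

Calling `resolve_with_fallback("example.com")` during an outage of the first upstream would time out after two seconds and move on to the next one. The response is returned as raw bytes; parsing it is out of scope for this sketch.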

ISPs should run their own local recursive resolvers where possible, ideally with caching and DNSSEC validation. This setup improves performance, security, and privacy for users, and gives ISPs greater operational control. These local resolvers can still forward queries to external providers, but with the added benefit of caching and fallback logic that helps ensure continuity of service, even when an upstream goes dark.
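The combination of caching and fallback logic can be sketched as a toy forwarder. This is a simplified model under stated assumptions, not a substitute for production resolver software: the class name `CachingForwarder`, the `lookup` callable, and the `max_stale` window are hypothetical. Serving an expired cache entry when every upstream is unreachable is one possible fallback policy (often called "serve stale"); the original text does not prescribe a specific mechanism, and DNSSEC validation is omitted here.

```python
import time


class CachingForwarder:
    """Toy forwarder: caches answers by name and serves stale data if every upstream fails."""

    def __init__(self, upstreams, lookup, clock=time.monotonic, max_stale=3600.0):
        self.upstreams = upstreams  # ordered list of upstream identifiers
        self.lookup = lookup        # callable(upstream, name) -> (answer, ttl_seconds)
        self.clock = clock          # injectable time source, useful for testing
        self.max_stale = max_stale  # seconds past expiry that a record may still be served
        self.cache = {}             # name -> (answer, expires_at)

    def resolve(self, name):
        """Return (answer, source), where source is 'cache', 'stale', or the upstream used."""
        now = self.clock()
        cached = self.cache.get(name)
        if cached and now < cached[1]:
            return cached[0], "cache"
        for upstream in self.upstreams:
            try:
                answer, ttl = self.lookup(upstream, name)
            except OSError:
                continue  # this upstream is unreachable; try the next one
            self.cache[name] = (answer, now + ttl)
            return answer, upstream
        # Every upstream failed: fall back to an expired entry if it is not too old.
        if cached and now < cached[1] + self.max_stale:
            return cached[0], "stale"
        raise RuntimeError(f"cannot resolve {name}: all upstreams failed, no usable cache")
```

In this model a fresh cache hit never touches the network, a miss walks the upstream list in order, and a total upstream failure degrades to stale answers for a bounded period rather than failing immediately.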

Lessons for End Users

End users who manually configure 1.1.1.1 on their devices (usually for better performance or privacy benefits) may also have felt the outage.

Most users do not know how to assess DNS reliability or realize the potential consequences of hard-coding a single DNS provider on their phones, routers, or laptops. Therefore, anyone who sets DNS manually should configure at least two DNS providers. Another recommendation is to use your ISP's resolver if it supports modern standards and privacy practices.
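One way to check the "at least two providers" advice on a Unix-like system is to inspect the `nameserver` lines of a resolv.conf-style file. The sketch below is illustrative only: the function names are hypothetical, the operator table is a small hand-written sample of well-known public resolver addresses, and any address not in the table is simply treated as its own operator. It counts operators rather than addresses, since two addresses run by the same operator do not add provider diversity.

```python
import ipaddress

# A small, hand-maintained sample of public resolver addresses and their operators.
KNOWN_OPERATORS = {
    "1.1.1.1": "Cloudflare", "1.0.0.1": "Cloudflare",
    "8.8.8.8": "Google", "8.8.4.4": "Google",
    "9.9.9.9": "Quad9", "149.112.112.112": "Quad9",
    "208.67.222.222": "OpenDNS", "208.67.220.220": "OpenDNS",
}


def parse_nameservers(resolv_conf_text):
    """Return the valid IP addresses listed on 'nameserver' lines, in order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(fields) >= 2 and fields[0] == "nameserver":
            try:
                servers.append(str(ipaddress.ip_address(fields[1])))
            except ValueError:
                pass  # ignore malformed addresses
    return servers


def distinct_operators(servers):
    """Map each address to its operator; unknown addresses count as their own operator."""
    return {KNOWN_OPERATORS.get(server, server) for server in servers}


def is_diverse(resolv_conf_text):
    """True if the configuration lists resolvers from at least two operators."""
    return len(distinct_operators(parse_nameservers(resolv_conf_text))) >= 2
```

For example, a file listing 1.1.1.1 and 1.0.0.1 names two addresses but only one operator, so `is_diverse` returns `False`; listing 1.1.1.1 and 9.9.9.9 returns `True`.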

The Cloudflare DNS incident reminds us that even well-engineered systems fail. The goal of a resilient Internet is not to eliminate all failures, but to mitigate their impact and ensure rapid recovery when they occur.

ISPs, end users, and resolver operators all have a role to play. By embracing redundancy, adopting best practices, and reducing structural dependencies, we can ensure a more resilient and inclusive Internet for all.

Photo by Nahil Naseer on Unsplash

Tags:

dns
outage