
Where Are Your Country’s Most Popular Websites Hosted?

Guest Author | Loughborough University and Pulse Research Fellow
June 21, 2024

Accessing Internet content stored on servers located in your country is faster, cheaper, and more reliable than fetching content from another country. Enabling infrastructure such as Internet Exchange Points (IXPs), data centers, and content caches facilitates this. However, this infrastructure is only widely available in some countries.

The Internet Society has been helping to establish and advocating for this infrastructure for many years. It recently set itself an ambitious plan to keep at least half of all Internet traffic in selected economies local by 2025. For more information, see the Internet Society’s 50/50 Vision Methodology.

To succeed in this effort, we must understand how much content is currently hosted locally and track changes.

Read: Measuring Internet Traffic Locality: First Step Towards 50/50 Vision

My work as a 2024 Pulse Research Fellow involves implementing and evaluating a platform for conducting locality measurements. This post provides an overview of the current state of my research at the halfway point of the fellowship.

Understanding Internet Traffic Patterns is Challenging 

Determining whether traffic is local starts with understanding content popularity. Generally, most Internet traffic is sourced from the most popular websites. Several Internet top list providers exist, such as Similarweb, Cloudflare, and Tranco. In this study, we used Google’s Chrome User Experience (CrUX) Report, which provides a monthly breakdown of the top 1,000 websites (and more, if required) accessed by Chrome users, split by country. Previous research shows that CrUX is more accurate than other top lists.
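As a rough illustration of this first step, the sketch below filters CrUX-style records down to one country's top domains. The field names (`country`, `origin`, `rank`) and the sample data are illustrative, not the actual CrUX schema.

```python
# Hypothetical sketch: select the top `limit` origins for one country
# from CrUX-style records. Field names are illustrative placeholders.

def top_domains(records, country, limit=1000):
    """Return up to `limit` origins for `country`, ordered by rank bucket."""
    rows = [r for r in records if r["country"] == country]
    rows.sort(key=lambda r: r["rank"])  # CrUX ranks are coarse buckets
    return [r["origin"] for r in rows[:limit]]

sample = [
    {"country": "gb", "origin": "https://example.co.uk", "rank": 1000},
    {"country": "gb", "origin": "https://example.com", "rank": 1000},
    {"country": "fr", "origin": "https://example.fr", "rank": 1000},
]
print(top_domains(sample, "gb"))
```

In practice, the per-country CrUX tables would be queried directly (for example, via the public BigQuery dataset) rather than filtered in memory like this.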

Ideally, traffic volume towards specific domains would have been a more appropriate metric to measure traffic locality. Unfortunately, getting access to traffic volume data on a per-country basis is very difficult, as this information is usually available only to Internet Service Providers (ISPs). We, therefore, decided to use the list of most popular websites as a proxy for measuring traffic locality. 

How is Content Hosted? 

Websites have three ways to host content: natively, using Content Delivery Networks (CDNs), or a mix of both. CDNs use geographically distributed servers to host and serve content close to end users, which considerably reduces the latency of access to the content. Some services, such as online streaming or social media platforms, operate their own caches globally (for example, Netflix Open Connect and the Facebook Content Distribution Network).

Once we have the list of websites by country, we determine whether each is natively hosted or hosted on a CDN platform. To do this, we perform a series of lookups per domain: a WHOIS query on its IP address, its DNS CNAME records, and its HTTP response headers. We use a modified version of the FindCDN tool to run our measurements.
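The CNAME part of this detection can be sketched as suffix matching against known CDN hostnames. The table below is a small illustrative sample, not the full signature set a tool like FindCDN ships with.

```python
# Minimal sketch of CNAME-based CDN detection. The suffix table is a
# small illustrative sample of publicly known CDN hostname patterns.

CDN_CNAME_SUFFIXES = {
    ".cloudfront.net": "Amazon CloudFront",
    ".akamaiedge.net": "Akamai",
    ".fastly.net": "Fastly",
    ".cdn.cloudflare.net": "Cloudflare",
}

def classify_by_cname(cname):
    """Map a domain's CNAME target to a CDN name, or None if no match."""
    cname = cname.rstrip(".").lower()  # drop trailing root dot
    for suffix, cdn in CDN_CNAME_SUFFIXES.items():
        if cname.endswith(suffix):
            return cdn
    return None  # likely native hosting; fall back to WHOIS/header checks

print(classify_by_cname("d111111abcdef8.cloudfront.net."))
```

A real pipeline would combine this with the WHOIS and HTTP-header evidence, since many CDN-fronted domains resolve without a telltale CNAME.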

The benefit of grouping websites by hosting platform (that is, by CDN) is that we only need to find out whether the CDN is local to a country. It is reasonable to assume that users will be directed to the same endpoints for all websites using a specific CDN, so testing each website individually becomes redundant.

Geolocating CDN Caches 

To determine the locality of the CDN caches, we analyze location hints (geo-hints) found while retrieving objects from each CDN. To do so, we must run these measurements from the country of interest. We use residential proxies and leverage their vast network to run measurements locally. 

Most CDN platforms tested provide geo-hints in their HTTP response headers, usually as an airport IATA code. We can then determine whether the point of presence is the same as our testing source.
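Extracting these geo-hints can be sketched as parsing a few well-known response headers. The header formats below are based on publicly documented examples (CloudFront's `x-amz-cf-pop`, Cloudflare's `cf-ray`, Fastly's `x-served-by`) and may vary in practice.

```python
import re

# Sketch: pull an airport (IATA-style) code out of CDN response headers.
# Header value formats follow publicly documented examples and may vary.

def extract_geohint(headers):
    """Return a three-letter IATA-style code from known CDN headers, or None."""
    headers = {k.lower(): v for k, v in headers.items()}
    if "x-amz-cf-pop" in headers:      # CloudFront, e.g. "LHR62-C1"
        return headers["x-amz-cf-pop"][:3]
    if "cf-ray" in headers:            # Cloudflare, e.g. "8a1b2c3d4e5f-LHR"
        return headers["cf-ray"].rsplit("-", 1)[-1]
    if "x-served-by" in headers:       # Fastly, e.g. "cache-lhr7325-LHR"
        m = re.search(r"-([A-Z]{3})$", headers["x-served-by"])
        if m:
            return m.group(1)
    return None

print(extract_geohint({"x-amz-cf-pop": "LHR62-C1"}))
```

Comparing the extracted code against the airport nearest our measurement vantage point then tells us whether the cache is in-country.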

Once we have collected the data per country, we can estimate the number of domains, out of the top 1,000, that are hosted locally.
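The final aggregation step can be sketched as joining the domain-to-CDN mapping with the per-country locality judgment for each CDN. All names and data below are hypothetical.

```python
# Illustrative aggregation: given a domain -> CDN mapping and a judgment
# of whether each platform serves locally in this country, estimate the
# share of top domains hosted locally. Data is entirely hypothetical.

def locality_share(domain_cdn, cdn_is_local):
    """Fraction of domains whose hosting platform is local to the country."""
    if not domain_cdn:
        return 0.0
    local = sum(1 for cdn in domain_cdn.values() if cdn_is_local.get(cdn, False))
    return local / len(domain_cdn)

domain_cdn = {
    "example.com": "Cloudflare",
    "example.org": "Fastly",
    "example.net": "Akamai",
    "example.edu": "Native",
}
cdn_is_local = {"Cloudflare": True, "Fastly": True, "Akamai": False, "Native": False}
print(f"{locality_share(domain_cdn, cdn_is_local):.0%}")
```

Repeating this for each country's top 1,000 list yields the locality estimates the project tracks.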

Filtering the results provides some interesting perspectives on the general state of traffic locality. For example, we can categorize websites and see whether the type of site affects whether it is likely to be local. We can also see which providers have the most local sites.

There are also more obvious ways to examine the data, such as determining whether a country’s continent or economic status affects locality. Over time, we can also understand how geopolitical issues affect traffic locality.

All This, and We’re Only Halfway! 

The measurement methodology is nearly complete, and results are now being gathered.

We will collect this data for all countries weekly so that we can track the evolution of content locality. Tracking this data will help guide our advocacy work. Additionally, we will explore the role of IXPs in bringing content closer to end users.

The next steps are to produce visualizations (see an example below) and formalize our process into a full-length paper.

So far, this project has been very fulfilling. Working towards a significant goal such as the 50/50 Vision can sometimes feel daunting, but the milestones are clear, and we know the next steps. I look forward to seeing what we have achieved by the end of this fellowship and what the work can lead to in the future.