The Internet Yellow Pages

Romain Fontugne
Guest Author | IIJ Research Lab
February 11, 2025
In short
  • The Internet Yellow Pages (IYP) integrates more than 50 measurement datasets to help users analyze Internet topologies.
  • Having all these datasets in one place is handy for conducting large-scale studies of how widely best current practices, such as routing security, are deployed.
  • The IYP is an open-source project.

Understanding how the Internet is performing requires skill in collating and analyzing data from multiple sources. Pulse is one project that is helping to do this by providing an overview of Internet accessibility, evolution, and resilience. For those wanting to dig further, we present the Internet Yellow Pages (IYP).

In a nutshell, the IYP integrates 50+ datasets to provide a unified database to study the Internet.

Figure 1 depicts an extract of IYP showing how the isoc.org website (green node, left) can be accessed via four IP addresses (pink nodes) that belong to two prefixes (blue nodes). Both prefixes are originated by AS13335 (red node), the content delivery network (CDN) Cloudflare (orange node). In addition, the prefixes are tagged (brown nodes) as Internet Routing Registry (IRR) Valid and Resource Public Key Infrastructure (RPKI) Valid, which indicates their routing security status, and as Anycast.

Figure 1 — Example showing how data is modeled in IYP. The green node represents the ‘isoc.org’ hostname, the pink nodes are IP addresses, the blue nodes are IP prefixes, the brown nodes are tags, the red node is an autonomous system (AS), and the orange node is the AS name.

This example combines data from six organizations: OpenINTEL for domain name system (DNS) resolutions; BGP.Tools for the 'Anycast' tag; the Internet Health Report (IHR) for IRR and RPKI status; BGPKIT for Border Gateway Protocol (BGP) data; RIPE NCC for RPKI data; and PeeringDB for the AS name.

You can reproduce this graphic for yourself using the following query:

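```cypher
// Hostname -> IP addresses -> prefixes -> origin AS -> AS name (as registered in PeeringDB)
MATCH p0 = (:HostName {name:'isoc.org'})-[:RESOLVES_TO]-(:IP)-[:PART_OF]-(pfx:Prefix)-[:ORIGINATE]-(orig:AS)-[:NAME {reference_org:'PeeringDB'}]-(:Name)
// Tags attached to the prefixes (IRR/RPKI status, Anycast), if any
OPTIONAL MATCH p1 = (pfx)-[:CATEGORIZED]-(:Tag)
// CDN tag attached to the origin AS, if any
OPTIONAL MATCH p2 = (orig)-[:CATEGORIZED]-(:Tag {label:'Content Delivery Network'})
RETURN p0, p1, p2
```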
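To run the same kind of query from a script, a minimal sketch could use the official Neo4j Python driver (version 5.8 or later, for execute_query). It assumes you have access to an IYP instance, for example a local copy, and it uses a tabular variant of the query above with the hostname passed as a parameter. The property names it returns (ip.ip, pfx.prefix, orig.asn, n.name) and the helper names fetch_rows and tags_by_prefix are assumptions for illustration, not part of IYP; check the property names against the IYP documentation.

```python
from collections import defaultdict

# Tabular variant of the Figure 1 query, with the hostname as a parameter.
# ASSUMPTION: IP, Prefix, AS and Name nodes expose the properties ip, prefix,
# asn and name; check these against the IYP documentation.
FIGURE1_QUERY = """
MATCH (:HostName {name: $hostname})-[:RESOLVES_TO]-(ip:IP)-[:PART_OF]-(pfx:Prefix)
      -[:ORIGINATE]-(orig:AS)-[:NAME {reference_org: 'PeeringDB'}]-(n:Name)
OPTIONAL MATCH (pfx)-[:CATEGORIZED]-(t:Tag)
RETURN ip.ip AS ip, pfx.prefix AS prefix, orig.asn AS asn,
       n.name AS as_name, collect(DISTINCT t.label) AS tags
"""


def fetch_rows(uri: str, auth: tuple[str, str], hostname: str) -> list[dict]:
    """Run FIGURE1_QUERY with the Neo4j Python driver (neo4j >= 5.8) and return plain dicts."""
    # Imported here so that tags_by_prefix() can be used without the driver installed.
    from neo4j import GraphDatabase, RoutingControl

    with GraphDatabase.driver(uri, auth=auth) as driver:
        records, _summary, _keys = driver.execute_query(
            FIGURE1_QUERY,
            parameters_={"hostname": hostname},
            routing_=RoutingControl.READ,  # the query only reads data
        )
    return [record.data() for record in records]


def tags_by_prefix(rows: list[dict]) -> dict[str, set[str]]:
    """Merge tag labels per prefix; a prefix appears once per IP address it contains."""
    grouped: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        grouped[row["prefix"]].update(row["tags"])
    return dict(grouped)
```

For example, tags_by_prefix(fetch_rows("bolt://localhost:7687", ("neo4j", "password"), "isoc.org")) would map each prefix serving isoc.org to its tag labels; the URI and credentials are placeholders for your own instance. Returning rows instead of paths makes the output easy to load into a table, at the cost of the graph view shown in Figure 1.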

Having all these datasets in one place is also handy for conducting large-scale studies. For example, instead of looking at a single hostname, we can extend the previous example by looking at Tranco’s top 1M popular hostnames and counting how many of these map to prefixes registered in RPKI.

Spoiler alert: 80% of the top 1M popular hostnames are covered by RPKI, thanks to CDNs following routing best practices.
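The aggregation step of such a study might look like the sketch below. It is not the authors' method (their own queries are shared in their notebook). The query assumes that Tranco ranks are attached to DomainName nodes linked to HostName nodes via PART_OF, and the hypothetical rpki_coverage function approximates "registered in RPKI" with the 'RPKI Valid' prefix tag from IHR. Both are assumptions to verify against the IYP documentation.

```python
from collections import defaultdict

# Hypothetical query for the Tranco study. ASSUMPTIONS: Tranco ranks hang off
# DomainName nodes, each ranked domain's own hostname is a HostName node linked
# to it by PART_OF, and prefix tags follow the same CATEGORIZED/Tag pattern as
# in Figure 1. Verify the pattern against the IYP documentation.
TRANCO_QUERY = """
MATCH (:Ranking {name: 'Tranco top 1M'})-[:RANK]-(dn:DomainName)-[:PART_OF]-(hn:HostName)
      -[:RESOLVES_TO]-(:IP)-[:PART_OF]-(pfx:Prefix)
WHERE hn.name = dn.name
OPTIONAL MATCH (pfx)-[:CATEGORIZED]-(t:Tag)
RETURN hn.name AS hostname, pfx.prefix AS prefix, collect(DISTINCT t.label) AS tags
"""


def rpki_coverage(rows: list[dict], label: str = "RPKI Valid", require_all: bool = False) -> float:
    """Return the share of hostnames whose prefixes carry the given tag label.

    With require_all=False, a hostname counts as covered if at least one of its
    prefixes has the tag; with require_all=True, all of its prefixes must have it.
    """
    flags_by_host: dict[str, list[bool]] = defaultdict(list)
    for row in rows:
        flags_by_host[row["hostname"]].append(label in row["tags"])
    if not flags_by_host:
        return 0.0
    check = all if require_all else any
    covered = sum(1 for flags in flags_by_host.values() if check(flags))
    return covered / len(flags_by_host)
```

The denominator here is the set of hostnames that resolve to at least one prefix, since unresolved hostnames produce no rows. Whether a hostname served from several prefixes counts as covered when any or all of them are tagged is a modeling choice the post does not spell out, so require_all leaves it open.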

A benefit of getting these results with IYP is that they can easily be shared and reproduced. For instance, you can run the query above, or reproduce the RPKI result mentioned above by executing the queries shared in this notebook. Anyone with the queries can therefore reproduce the results and update them with fresh IYP data. We also illustrate this in an APNIC blog post, where we analyze the Internet's topology in Japan and share the query for each result.

Getting Started with IYP

The easiest way to browse the IYP database is to visit the IHR website. You can search for an Internet resource (for example, an AS, a prefix, or a domain name) and get the IYP data related to that resource.

Figure 2 illustrates the 'routing' view for the Internet Initiative Japan network (AS2497), which shows, for example, its connected ASes and announced prefixes. All other views, except the 'monitoring' one, also present IYP data through various widgets.

Figure 2 — IYP routing data for AS2497. The IHR website is the simplest way to query IYP.

You’ll find the following tabs for each widget:

  • The Chart tab shows a visual representation of the data.
  • The Data tab gives the raw data in a table format you can download for further analysis.
  • The Cypher Query tab gives you the exact query we used to pull the data from IYP. You can reuse it to query the IYP database directly or to craft your own queries.
  • The Metadata tab links to the original datasets and shows how fresh the data is.

Directly querying IYP enables users to go beyond simple searches. The learning curve is quite steep, though. We recommend that interested readers first learn the basics of the Cypher language, then study the IYP documentation and the examples provided in the IYP gallery. Finally, to make the most of IYP, you can download and run the database locally, which is a good way to integrate your own data into IYP and analyze it alongside the other datasets.
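With a local copy, one way to add your own data is to attach your own tags to existing nodes. The sketch below is hypothetical and is not an official IYP import procedure: it assumes a CSV file with asn and label columns, that AS nodes store the AS number as an integer asn property, and it records a made-up reference_org value ('MyLab') on the new relationships so your additions can be told apart from IYP's own datasets. It mirrors the Tag/CATEGORIZED pattern used in the query above and relies on the Neo4j Python driver's execute_query method (neo4j 5.8 or later).

```python
import csv
import io

# Hypothetical import: attach your own tag labels to AS nodes in a local IYP copy.
# ASSUMPTIONS: AS nodes store the AS number as an integer `asn` property, and the
# value passed as $org is a made-up name identifying your own data source.
IMPORT_QUERY = """
UNWIND $rows AS row
MATCH (a:AS {asn: row.asn})
MERGE (t:Tag {label: row.label})
MERGE (a)-[:CATEGORIZED {reference_org: $org}]->(t)
"""


def rows_from_csv(text: str) -> list[dict]:
    """Turn 'asn,label' CSV text into query parameters, skipping rows with an empty label."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        label = record["label"].strip()
        if label:
            rows.append({"asn": int(record["asn"]), "label": label})
    return rows


def import_tags(uri: str, auth: tuple[str, str], rows: list[dict], org: str = "MyLab") -> int:
    """Write all rows in one batched query and return how many relationships were created."""
    # Imported here so that rows_from_csv() can be used without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=auth) as driver:
        result = driver.execute_query(IMPORT_QUERY, parameters_={"rows": rows, "org": org})
    return result.summary.counters.relationships_created
```

For example, import_tags("bolt://localhost:7687", ("neo4j", "password"), rows_from_csv(open("my_tags.csv").read())) would return the number of relationships created; the file name, URI, and credentials are placeholders. Sending all rows through a single UNWIND query avoids one round trip per row, and because of the MATCH clause, rows whose ASN is not in the database are silently skipped rather than creating new AS nodes.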

For more details about this work, see our research paper published in the ACM IMC’24 proceedings. You can also reach us on GitHub. The IYP is an open-source project; feedback and contributions are very welcome!

Romain Fontugne is a Deputy Director of IIJ Research Lab, Japan, who focuses on Internet measurements, traffic analysis and network security.


Photo by Katie, Via Flickr