
The Aleph (ℵ): Revealing the Internet’s Hidden Geography from the DNS

Guest Author | Northwestern University
November 27, 2025
In short
  • A new, large language model-guided system, The Aleph, can decode geographic data from DNS records.
  • The system indicates that 58% of operators encode location information; hints span over 6,000 cities in more than 200 countries.
  • An improved understanding of where Internet infrastructure is located supports research & measurement, operational standards, and public policy on infrastructure equity, resilience, and concentration.

Locating the physical infrastructure behind the Internet — routers, servers, and other devices — seems like a straightforward problem, but in practice, it’s surprisingly difficult. 

One often-overlooked source of geographic information is the set of reverse-lookup (PTR) records in the Domain Name System (DNS). Operators frequently embed location hints, including airport codes, city mnemonics, or internal site tags, into PTR hostnames, such as:

  • 108-71-80-115.lightspeed.chcgil.sbcglobal.net
  • et3-42-2.es02.ord001.ix.nflxvideo.net

These names are published in the global DNS, hiding in plain sight. The challenge is decoding them reliably and at scale.

Introducing The Aleph

The Aleph is a large language model-guided system for extracting geographic data from PTR records. It does two things:

  1. Learns the format of PTR hostnames per operator, for example, pop-<CITY>-rtrN.example.net.
  2. Learns the geo-coding scheme each operator uses, such as airport codes (ORD), city mnemonics (nyk), or custom tags.

From these, it builds precise regex-based rules and reusable mappings that enable large-scale geolocation of network devices using only PTR strings.
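
As a rough illustration of what such a rule and mapping pair could look like, the sketch below applies two hand-written patterns to the example hostnames shown earlier. The regexes, the `RULES` structure, the `geolocate` helper, and the reading of `chcgil` as Chicago are assumptions made for this example; they are not rules produced by The Aleph.

```python
import re
from typing import Optional

# Each entry pairs a hostname pattern with a hint-to-city mapping.
# Both patterns and mappings are hypothetical, written by hand for illustration.
RULES = [
    (
        # e.g. et3-42-2.es02.ord001.ix.nflxvideo.net -> hint "ord"
        re.compile(
            r"^[a-z0-9-]+\.[a-z]+\d+\.(?P<hint>[a-z]{3})\d+\.ix\.nflxvideo\.net$"
        ),
        {"ord": "Chicago, US"},
    ),
    (
        # e.g. 108-71-80-115.lightspeed.chcgil.sbcglobal.net -> hint "chcgil"
        re.compile(
            r"^\d+-\d+-\d+-\d+\.lightspeed\.(?P<hint>[a-z]{6})\.sbcglobal\.net$"
        ),
        {"chcgil": "Chicago, US"},  # assumed meaning of the tag
    ),
]


def geolocate(hostname: str) -> Optional[str]:
    """Return a city for the hostname, or None if no rule or mapping applies."""
    host = hostname.lower().rstrip(".")
    for pattern, mapping in RULES:
        match = pattern.match(host)
        if match:
            return mapping.get(match.group("hint"))
    return None


if __name__ == "__main__":
    for name in (
        "108-71-80-115.lightspeed.chcgil.sbcglobal.net",
        "et3-42-2.es02.ord001.ix.nflxvideo.net",
    ):
        print(name, "->", geolocate(name))
```

Keeping the pattern (where the hint sits) separate from the mapping (what the hint means) mirrors the two things the system learns, and lets either part be corrected without touching the other.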

Where traditional approaches rely on fragile, hand-tuned patterns or costly manual curation, The Aleph bootstraps rules automatically to:

  • Detect PTR formats by network
  • Learn how geo hints are embedded
  • Refine its patterns with more data
  • Output reusable rules that others can validate or extend

A Billion PTRs Later…

We ran The Aleph on a February 2024 snapshot of OpenINTEL’s PTR corpus, targeting 2,646 autonomous systems (ASes) covering ~90% of all PTR records and ~84% of the Internet population. The full run took two days and cost ~$500 using GPT-4 Turbo (via rate-limited personal accounts).

The results:

  • 1.16 billion PTR records parsed
  • 224 million records mapped to cities (~19% coverage)
  • 4,910 regexes generated for 1,551 networks (58% of the targeted ASes)
  • 16,108 unique geo-hints mapped to 6,025 cities in 206 countries

These mappings reveal a remarkable diversity in operator encoding styles. While airport codes like JFK or ORD are common, the majority of hints — around two-thirds — are custom tags, such as Arelion’s nyk (New York) or ffm (Frankfurt). Even within one city, multiple schemes may coexist: Tokyo has more than a dozen; Chicago appears as chi, chgo, northlake, and various ord* forms.

This is precisely why a learned, per-provider approach beats one-size-fits-all heuristics.
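
A small sketch can make the contrast concrete. It compares a single global airport-code table with per-operator tables that fall back to it. The tokens come from the examples above; the operator keys (`arelion`, `example-operator`), the grouping of the Chicago aliases under one operator, and both helper functions are hypothetical.

```python
from typing import Optional

# One-size-fits-all lookup: airport codes only.
AIRPORT_CODES = {"jfk": "New York", "ord": "Chicago"}

# Per-operator lookups. Operator keys are placeholders for this example;
# which operator uses which Chicago alias is an assumption.
PER_OPERATOR = {
    "arelion": {"nyk": "New York", "ffm": "Frankfurt"},
    "example-operator": {
        "chi": "Chicago",
        "chgo": "Chicago",
        "northlake": "Chicago",
    },
}


def resolve_global(token: str) -> Optional[str]:
    """Resolve a hint using the global airport-code table only."""
    return AIRPORT_CODES.get(token.lower())


def resolve_per_operator(operator: str, token: str) -> Optional[str]:
    """Try the operator's own table first, then fall back to airport codes."""
    table = PER_OPERATOR.get(operator, {})
    return table.get(token.lower()) or resolve_global(token)


if __name__ == "__main__":
    print(resolve_global("nyk"))                   # custom tag: not an airport code
    print(resolve_per_operator("arelion", "nyk"))  # resolved by the operator table
```

The global table misses every custom tag, while the per-operator table resolves them and still handles plain airport codes through the fallback.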

How Accurate Is It?

We validated The Aleph using two independent methods:

  1. Ground truth from operators across access, transit, and cloud networks.
  2. Round-trip time (RTT) probing to check that inferred locations are consistent with measured latency: a device cannot be farther from a probe than the RTT physically allows.

Both checks confirmed the system’s high accuracy, especially when patterns are learned per-provider and cross-validated.
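
To show the idea behind a latency check, here is a minimal sketch of a common speed-of-light feasibility test. It is not necessarily the procedure used in the paper. It assumes signals travel through fiber at roughly two-thirds the speed of light (about 200 km per millisecond); the function names and the approximate city coordinates are illustrative.

```python
import math

# Assumption: propagation in fiber at ~2/3 the speed of light, ~200 km per ms.
FIBER_KM_PER_MS = 200.0
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_feasible(probe: tuple, candidate: tuple, rtt_ms: float) -> bool:
    """True if the candidate location is reachable within the measured RTT.

    The one-way delay is at most rtt_ms / 2, which bounds the distance.
    """
    distance_km = haversine_km(probe[0], probe[1], candidate[0], candidate[1])
    max_distance_km = (rtt_ms / 2) * FIBER_KM_PER_MS
    return distance_km <= max_distance_km


if __name__ == "__main__":
    chicago = (41.88, -87.63)   # approximate coordinates
    new_york = (40.71, -74.01)  # approximate coordinates
    # A 5 ms RTT from a Chicago probe rules out a New York location.
    print(is_feasible(chicago, new_york, rtt_ms=5.0))
    # A 20 ms RTT is consistent with it, but does not prove it.
    print(is_feasible(chicago, new_york, rtt_ms=20.0))
```

A check like this can only reject implausible locations; a passing result means the inferred city is possible, not that it is confirmed, which is why it complements operator ground truth.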

Why This Matters

Better geolocation of Internet infrastructure enables:

  • Research & measurement: More accurate outage maps, routing analysis, and performance metrics.
  • Operations: Detection of naming inconsistencies, mislabels, and drift.
  • Public policy: Infrastructure equity, resilience, and concentration analysis, core themes for organizations like Internet Society Pulse.

We’re working to:

  • Release more open, provider-specific regexes and mappings
  • Expand coverage to long-tail ASes
  • Invite operators to publish or share their naming conventions or PTR samples

You can submit samples or try The Aleph’s API at thealeph.ai.

To learn more, read our paper, ‘The Aleph: Decoding Geographic Information from DNS PTR Records Using Large Language Models’ (ACM CoNEXT 2025).

Kedar Thiagarajan is a Ph.D. candidate at Northwestern University.

Contributors: Esteban Carisimo and Fabián E. Bustamante, Northwestern University. 

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of the Internet Society.


Image by Jan Vašek from Pixabay