Measuring Internet Censorship Without Volunteers or Vantage Points

Guest Author | University of Maryland, Internet Society Pulse Research Fellow (2023).


November 29, 2023

Authoritarian regimes often censor websites for those within their borders, threatening free and open communication on the Internet. Measuring what is blocked, how censors operate, and how censorship changes over time is essential to understanding and circumventing these censorship efforts.

Unfortunately, it is challenging to measure such things, particularly in countries that restrict researchers’ ability to find endpoints—either volunteers, vantage points, or live servers.

Through my Pulse Research Fellowship, I am building on research by my colleagues and me at the University of Maryland and the University of Chicago that aims to overcome several of these challenges and to improve how the measurement community can measure censorship longitudinally without requiring assistance from within the country.


Key points:

Measuring Internet censorship traditionally requires endpoints—either volunteers, vantage points, or live servers—within censored countries to test whether websites and/or services are accessible.
Our technique tricks censoring devices into censoring traffic we send from outside the country, as if we were testing within the censor’s borders.
Preliminary results show this technique can measure some countries that complementary measurement efforts cannot measure longitudinally, such as Brunei and Tajikistan.

Censorship is Traditionally Measured With Endpoints

There is currently a wide array of efforts, such as OONI and Censored Planet, that measure Internet censorship by requesting URLs and observing whether the corresponding websites are accessible. A limitation of these projects is that they rely on finding endpoints within censored countries to make these requests.

This limitation is compounded in countries with small populations, highly repressive regimes, low Internet penetration rates, and poor Internet infrastructure. Even when endpoints are available in such countries, they often yield only a small handful of measurements followed by long periods with little or no measurement, giving us only a glimpse of censorship at a single point in time.

Our Approach Measures Censorship From the Outside

To overcome this challenge, my colleagues and I developed a novel technique that can measure censorship longitudinally without requiring assistance from within the country. The technique takes advantage of two quirks in the way some countries censor.

Many countries deploy bidirectional censorship: the censor blocks censored traffic regardless of whether the request it sees originates inside or outside the country (Figure 1). This means our clients can measure censorship by sending requests from outside the censored country to servers inside it.

However, finding these public servers inside censored countries can be burdensome for the above reasons.

Figure 1 — Bidirectional censorship in Turkmenistan acting on traffic from outside the censoring regime.

Yet we don’t always need a responsive server to trigger censorship bidirectionally, because some censoring devices (middleboxes) exhibit TCP-noncompliant behavior.

A middlebox is a networking device that sits between clients and servers and can inspect, filter, transform, or otherwise manipulate Internet traffic. When a middlebox interferes with connections it deems restricted, whether for copyright enforcement, corporate network policy, or Internet censorship, this is known as connection tampering.

Tricking Censors Into Censoring

A regular HTTP(S) censorship event occurs when a client completes a TCP three-way handshake with a live server and then sends a PSH+ACK packet containing a request for a censored website. The censor sees this request and takes blocking action: it drops or throttles the client’s traffic, sends a blockpage back to the client, or sends a reset (RST) packet to the client to terminate the connection (Figure 2).

Figure 2 — Waterfall diagram of HTTP(S) censorship via an RST packet.
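To make these outcomes concrete, here is a minimal Python sketch of how a measurement client might label what it observed after sending a request. The `Observation` fields, the `Outcome` names, and the blockpage marker string are all hypothetical; real projects match responses against curated blockpage fingerprints, and this toy version does not attempt to detect throttling.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    OK = "ok"                # a normal response arrived
    RST = "rst"              # a TCP reset arrived instead of the response
    BLOCKPAGE = "blockpage"  # an HTTP response matching a known blockpage
    DROPPED = "dropped"      # nothing arrived before the timeout


@dataclass
class Observation:
    """What the client saw after sending its request (hypothetical fields)."""
    got_rst: bool
    http_body: Optional[str]  # None if no HTTP payload arrived before the timeout


# Hypothetical fingerprint; real projects maintain curated blockpage signatures.
BLOCKPAGE_MARKERS = ("access to this site is restricted",)


def classify(obs: Observation) -> Outcome:
    """Map one observation to a coarse outcome; throttling is not modeled."""
    if obs.got_rst:
        return Outcome.RST
    if obs.http_body is None:
        return Outcome.DROPPED
    body = obs.http_body.lower()
    if any(marker in body for marker in BLOCKPAGE_MARKERS):
        return Outcome.BLOCKPAGE
    return Outcome.OK


print(classify(Observation(got_rst=True, http_body=None)))  # Outcome.RST
```

Checking for an RST first reflects the case in Figure 2, where the injected reset is the signal of interest regardless of anything else that arrives.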

The TCP three-way handshake requires a response from the server—a SYN+ACK packet. However, our goal is to measure censorship in places where there isn’t a server at all.

To achieve this, we rely on the fact that censors must expect to miss some packets within a connection because of asymmetric routing, load balancing, and heavy traffic. For example, a censor may miss the ACK packet that the client sends to the server to complete the TCP three-way handshake.

When the client then sends a PSH+ACK packet containing a censored domain, we might expect the censor to disregard it: from the censor’s perspective there is no ongoing connection, since it never observed a complete TCP three-way handshake. Yet many censors still take blocking action, such as sending a RST packet back to the client (Figure 3). These censors are therefore not fully TCP-compliant: they block a censored request based on the presumption, not the confirmation, of an ongoing connection.

Figure 3 — Waterfall diagram of TCP-noncompliant behavior by a censor.
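The difference between a TCP-compliant and a noncompliant censor can be illustrated with two toy models in Python. Both are simplifications written for illustration, assuming a censor that only inspects flags and payload (real middleboxes also track sequence numbers, timeouts, and more), and `blocked.example` is a placeholder censored domain. The compliant model waits for the full handshake, while the noncompliant one presumes a connection as soon as it sees the client’s SYN.

```python
from typing import List, NamedTuple


class Packet(NamedTuple):
    sender: str        # "client" or "server"
    flags: str         # e.g. "SYN", "SYN+ACK", "ACK", "PSH+ACK"
    payload: str = ""


CENSORED = "blocked.example"  # placeholder censored domain
HANDSHAKE = [("client", "SYN"), ("server", "SYN+ACK"), ("client", "ACK")]


def is_censored_request(pkt: Packet) -> bool:
    return pkt.sender == "client" and pkt.flags == "PSH+ACK" and CENSORED in pkt.payload


def compliant_censor(seen: List[Packet]) -> bool:
    """Blocks only after it has observed the full three-way handshake."""
    progress = 0
    for pkt in seen:
        if progress < len(HANDSHAKE):
            if (pkt.sender, pkt.flags) == HANDSHAKE[progress]:
                progress += 1
        elif is_censored_request(pkt):
            return True
    return False


def noncompliant_censor(seen: List[Packet]) -> bool:
    """Presumes a connection exists as soon as it sees the client's SYN."""
    saw_syn = False
    for pkt in seen:
        if pkt.sender == "client" and pkt.flags == "SYN":
            saw_syn = True
        elif saw_syn and is_censored_request(pkt):
            return True
    return False


# The scenario from the text: the censor misses the client's final ACK.
missed_ack = [
    Packet("client", "SYN"),
    Packet("server", "SYN+ACK"),
    Packet("client", "PSH+ACK", "GET / HTTP/1.1\r\nHost: blocked.example\r\n\r\n"),
]
print(compliant_censor(missed_ack), noncompliant_censor(missed_ack))  # False True
```

The noncompliant model also fires when the server never answers at all, which is exactly the property the technique relies on.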

This means we can craft packet sequences that trigger the censor without needing a live server to complete the TCP handshake.

Continuing the previous example, the client can send a SYN packet followed by a PSH+ACK packet to a non-responsive IP address to trigger the censor (Figure 4).

Figure 4 — Waterfall diagram of triggering HTTP(S) censorship without live servers.
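As a sketch of what crafting these packets involves, the Python below builds the two raw IPv4/TCP packets from Figure 4 using only the standard library, with RFC 1071 checksums. The addresses come from the RFC 5737 documentation ranges, and the ports, sequence numbers, and `blocked.example` Host header are illustrative placeholders. For HTTPS, the PSH+ACK payload would instead be a TLS ClientHello naming the domain in its SNI extension. The code only builds the bytes; actually sending them would require a raw socket and administrator privileges, which this sketch deliberately leaves out.

```python
import socket
import struct

TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10}


def checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_packet(src: str, dst: str, sport: int, dport: int, seq: int,
                 ack: int, flags: str, payload: bytes = b"") -> bytes:
    """Return IPv4 + TCP header (no options) + payload, with valid checksums."""
    flag_bits = 0
    for name in flags.split("+"):
        flag_bits |= TCP_FLAGS[name]
    src_b, dst_b = socket.inet_aton(src), socket.inet_aton(dst)
    # TCP header: ports, seq, ack, data offset (5 words), flags, window, checksum=0, urgent.
    tcp = struct.pack("!HHIIBBHHH", sport, dport, seq, ack, 5 << 4, flag_bits, 65535, 0, 0)
    # The TCP checksum also covers a pseudo-header built from the IP addresses.
    pseudo = struct.pack("!4s4sBBH", src_b, dst_b, 0, socket.IPPROTO_TCP,
                         len(tcp) + len(payload))
    tcp = tcp[:16] + struct.pack("!H", checksum(pseudo + tcp + payload)) + tcp[18:]
    # IPv4 header: version 4 / IHL 5, total length, TTL 64, protocol TCP, checksum=0.
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp) + len(payload),
                     0, 0, 64, socket.IPPROTO_TCP, 0, src_b, dst_b)
    ip = ip[:10] + struct.pack("!H", checksum(ip)) + ip[12:]
    return ip + tcp + payload


SRC, DST = "192.0.2.10", "198.51.100.20"  # RFC 5737 documentation addresses
REQUEST = b"GET / HTTP/1.1\r\nHost: blocked.example\r\n\r\n"
syn = build_packet(SRC, DST, 40000, 80, seq=1000, ack=0, flags="SYN")
# No SYN+ACK ever arrives, so the acknowledgment number here is arbitrary.
psh_ack = build_packet(SRC, DST, 40000, 80, seq=1001, ack=1, flags="PSH+ACK", payload=REQUEST)
```

Computing correct checksums matters even though no server replies: a middlebox may discard malformed packets before it ever inspects the payload.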

This means we can now measure censorship in networks that have no participants. Due to bidirectional censorship, we can send these packet sequences from clients we control outside the censoring country. Moreover, we can direct our censorship measurements to non-responsive IP addresses with no users or machines behind them, mitigating potential user risks and ethical concerns regarding connections to live machines.

Automating the Process

The SYN followed by PSH+ACK sequence is one of many packet sequences that trigger some censors, but no single sequence triggers censorship in every censoring regime. We therefore need to discover which packet sequences trigger the censoring middleboxes in each regime.
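One naive way to discover triggers is to enumerate every short packet sequence and test each one against the censor. The Python sketch below does this against two toy censors that loosely mirror behaviors reported later in this post (a request following a SYN, and a request sent twice). Both censor functions are hypothetical stand-ins for sending real packets and watching for an injected RST.

```python
from itertools import product
from typing import Callable, List, Tuple

Sequence = Tuple[str, ...]
PACKET_TYPES = ("SYN", "ACK", "PSH+ACK")  # PSH+ACK carries the censored request


def syn_then_request(seq: Sequence) -> bool:
    """Toy censor: fires on a request that follows a SYN."""
    return "SYN" in seq and "PSH+ACK" in seq[seq.index("SYN") + 1:]


def request_twice(seq: Sequence) -> bool:
    """Toy censor: fires once it has seen the request twice."""
    return seq.count("PSH+ACK") >= 2


def find_triggers(censor: Callable[[Sequence], bool], max_len: int = 3) -> List[Sequence]:
    """Return every sequence up to max_len packets that triggers the censor, shortest first."""
    hits = []
    for length in range(1, max_len + 1):
        for seq in product(PACKET_TYPES, repeat=length):
            if censor(seq):
                hits.append(seq)
    return hits


print(find_triggers(syn_then_request)[0])  # ('SYN', 'PSH+ACK')
```

With three packet types there are 3 + 9 + 27 = 39 sequences of length at most three, and the count grows as 3^n, so exhaustive search quickly becomes impractical once sequences get longer or packets vary in more ways than their flags.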

In my initial attempt to apply this technique, my colleagues and I studied censorship in Turkmenistan, a notoriously difficult country to measure from within given its low Internet penetration and extremely harsh laws about Internet use. I manually crafted packet sequences to try to trigger the censoring middleboxes within the country, and discovered that sending a SYN followed by a PSH+ACK packet, twice, with 5 to 29 seconds between packets, successfully triggered censorship.
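Timing is part of the trigger here, so a measurement client has to schedule its packets, not just choose them. Below is a minimal Python sketch under an explicit assumption: we read the Turkmenistan finding as SYN, PSH+ACK, SYN, PSH+ACK with one fixed pause between every pair of consecutive packets. The `send` and `sleep` callables are hypothetical injected hooks, so the schedule can be exercised without waiting or touching the network.

```python
import time
from typing import Callable, Sequence

# Assumed reading of the Turkmenistan finding; the exact gap placement is an assumption.
TURKMENISTAN_SEQUENCE = ["SYN", "PSH+ACK", "SYN", "PSH+ACK"]
MIN_GAP_S, MAX_GAP_S = 5.0, 29.0  # range of pauses reported to work


def send_with_gaps(packets: Sequence[str], gap_s: float,
                   send: Callable[[str], None],
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Send packets in order, pausing gap_s seconds between consecutive packets."""
    if not MIN_GAP_S <= gap_s <= MAX_GAP_S:
        raise ValueError(f"gap of {gap_s}s is outside the reported {MIN_GAP_S}-{MAX_GAP_S}s range")
    for i, packet in enumerate(packets):
        if i > 0:
            sleep(gap_s)
        send(packet)


# Dry run: record what would be sent instead of sleeping or touching the network.
sent, pauses = [], []
send_with_gaps(TURKMENISTAN_SEQUENCE, 10.0, sent.append, pauses.append)
print(sent, pauses)  # ['SYN', 'PSH+ACK', 'SYN', 'PSH+ACK'] [10.0, 10.0, 10.0]
```

In a real client, `send` would transmit a crafted packet and `sleep` would be the default `time.sleep`.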

While encouraging, these results took considerable manual effort, an approach that will not scale to other countries, or even to different ISPs within the same country.

As part of my Pulse Research Fellowship, I am developing techniques that automate the discovery of censorship-triggering packet sequences, allowing us to measure censorship in many countries worldwide that are out of reach for traditional measurement techniques.

To do so, I plan to use Geneva, an open-source genetic algorithm that trains against live censors to discover packet sequences that evade censorship. Instead of having Geneva evade censorship, I plan to modify it to discover packet sequences that trigger censorship. This will require adding new capabilities to Geneva. For example, Geneva could not have found the packet sequence that triggered censorship in Turkmenistan, because it does not support pauses between sending packets.
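To illustrate why supporting pauses matters, here is a generic genetic-algorithm sketch in Python whose genes carry a delay alongside each packet type. It is not Geneva’s code or representation (Geneva uses its own strategy format); the toy censor, which fires only on a request sent at least 5 seconds after a SYN, and the fitness weights are hypothetical choices for illustration. Fitness rewards triggering the censor, and elitism carries the best candidates from one generation to the next.

```python
import random
from typing import List, Tuple

Gene = Tuple[str, int]   # (packet type, seconds to wait before sending it)
Genome = List[Gene]
PACKET_TYPES = ["SYN", "ACK", "PSH+ACK"]


def toy_censor(genome: Genome) -> bool:
    """Hypothetical censor: fires on a request sent at least 5 s after a SYN."""
    seen_syn, waited = False, 0
    for packet, delay in genome:
        waited += delay
        if packet == "SYN":
            seen_syn, waited = True, 0
        elif packet == "PSH+ACK" and seen_syn and waited >= 5:
            return True
    return False


def fitness(genome: Genome) -> float:
    """Reward triggering the censor; lightly penalize longer sequences."""
    return (1.0 if toy_censor(genome) else 0.0) - 0.01 * len(genome)


def mutate(genome: Genome, rng: random.Random) -> Genome:
    """Insert, delete, or replace one (packet, delay) gene in a copy of the genome."""
    child = list(genome)
    op = rng.choice(["insert", "delete", "replace"])
    new_gene = (rng.choice(PACKET_TYPES), rng.randint(0, 10))
    if op == "insert" or not child:
        child.insert(rng.randrange(len(child) + 1), new_gene)
    elif op == "delete":
        child.pop(rng.randrange(len(child)))
    else:
        child[rng.randrange(len(child))] = new_gene
    return child


def evolve(generations: int = 30, size: int = 20,
           seed: int = 0) -> Tuple[Genome, List[float]]:
    """Simple elitist GA; returns the best genome and the best fitness per generation."""
    rng = random.Random(seed)
    population: List[Genome] = [[] for _ in range(size)]
    history: List[float] = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        elites = population[: size // 2]  # the top half survives unchanged
        children = [mutate(rng.choice(elites), rng) for _ in range(size - len(elites))]
        population = elites + children
    population.sort(key=fitness, reverse=True)
    return population[0], history


best, history = evolve()
print(best, round(fitness(best), 2))
```

Because the top half of each generation survives unchanged, the best fitness never decreases; whether the search finds a trigger within a given budget depends on the seed and parameters.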

New Method Will Allow Us to Study Censorship in Overlooked Countries

For this censorship measurement technique to work, we need both bidirectional censorship and a censoring device that can be tricked into censoring with specially crafted packet sequences. So far, we’ve found:

Belarus, Brunei, China, Iran, Libya, Russia, Tajikistan, and Uzbekistan are censoring bidirectionally.
Sending a PSH+ACK packet twice successfully triggers censorship in Tajikistan, while the SYN followed by a PSH+ACK packet sequence is sufficient for the other countries.
Burundi, Equatorial Guinea, Kyrgyzstan, and Myanmar are not censoring bidirectionally, so we cannot study them with this technique.
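The two requirements above combine into a simple rule: a country can be measured with this technique only if its censorship is bidirectional and a triggering sequence is known. A small Python lookup encoding just the findings listed above might look like this; the trigger strings are informal shorthand, not an API, and countries not listed are treated as unknown.

```python
from typing import Dict, Optional

# Findings listed above; True means censorship was observed to be bidirectional.
BIDIRECTIONAL: Dict[str, bool] = {
    "Belarus": True, "Brunei": True, "China": True, "Iran": True,
    "Libya": True, "Russia": True, "Tajikistan": True, "Uzbekistan": True,
    "Burundi": False, "Equatorial Guinea": False, "Kyrgyzstan": False, "Myanmar": False,
}


def trigger_for(country: str) -> Optional[str]:
    """Return the reported trigger sequence, or None if the technique cannot be used."""
    if not BIDIRECTIONAL.get(country, False):
        return None  # not bidirectional, or not studied yet
    if country == "Tajikistan":
        return "PSH+ACK, PSH+ACK"
    return "SYN, PSH+ACK"


measurable = sorted(c for c in BIDIRECTIONAL if trigger_for(c) is not None)
print(len(measurable))  # 8
```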

We are in the process of studying more countries that have long been overlooked, to understand which domains are censored, how uniform censorship policies are within a given country, how censorship policies differ across regions of the world, and how censorship changes over time.

If you’d like to learn more, read our extended abstract and paper on our study of Turkmenistan.

And stay up to date with future developments via our website.

Sadia Nourin is a Computer Science master’s student at the University of Maryland and a Pulse Research Fellow.

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of the Internet Society.

Photo by Bob Shand via Flickr.
