A Case For Using Large Language Models to Automate Internet Measurement Research

26 May 2026

Guest Author | University of California Irvine

Categories:

Resilience

In short:

Measuring the impact of Internet disruptions requires using several tools and methods that have traditionally required intensive human analysis.
A new system, ArachNet, uses Large Language Model (LLM) agents to automate the process, reducing the time needed to analyze the causes and impact of disruptions.
Scenario testing showed that ArachNet identified the same number of affected IPs as the expert system across multiple countries.

When a critical Internet disruption occurs, every minute counts. The September 2025 multi-cable cuts in the Red Sea, which severed 15 separate cable systems and disrupted an estimated 25% of all data traffic between Europe and Asia, are a stark reminder of how much is at stake.

During such events, network operators and researchers must rapidly identify dark paths, evaluate lost connectivity, and build workflows to assess the damage. This is hard: it requires integrating multiple specialized measurement tools, each with its own data formats and interfaces, and building such workflows from scratch demands deep domain expertise.

At the University of California, Irvine (UCI), my colleagues and I have been working to change this. In this post, I introduce ArachNet, the first system to demonstrate that Large Language Model (LLM) agents can independently generate Internet measurement workflows that capture expert reasoning patterns, transforming a natural language question into a complete, executable analysis.

ArachNet: Automating the Reasoning Process

ArachNet is a first step toward democratizing network intelligence. It helps:

New researchers tackle sophisticated analyses without deep specialization in each tool.
Experienced researchers offload integration complexity so they can focus on novel insights.
Network operators rapidly compose diagnostic workflows during critical incidents.

The system treats Internet measurement as a compositional problem. Our key insight is that measurement workflow development (Figure 1) follows a predictable sequence of phases: problem analysis, solution design, implementation, and adaptation.

Figure 1: ArachNet design with four specialized agents supported by the Registry.

We execute these phases through four specialized agents supported by the Registry:

The Registry (Foundation): The backbone of ArachNet is a curated catalog of measurement tool capabilities. Rather than exposing raw code to the agents (which often leads to hallucinations or implementation errors), the Registry provides a compact "measurement API." Each entry specifies a tool's capabilities, required inputs, expected outputs, and constraints. This allows agents to understand what a tool can do without being overwhelmed by how it does it.
QueryMind (The Architect): This agent decomposes natural language queries into structured sub-problems, surfacing hidden complexities such as infrastructure mapping or temporal correlations that experts instinctively identify.
WorkflowScout (The Designer): This agent explores the Registry to identify optimal tool combinations, comparing trade-offs in data requirements, computational complexity, and reliability to produce a coherent end-to-end workflow architecture.
SolutionWeaver (The Coder): This agent converts the architecture into executable Python code, handling format translation between heterogeneous tools and embedding automated quality checks, including consistency verification, result sanity checking, and uncertainty quantification, to ensure research-quality outputs.
RegistryCurator (The Historian): As successful workflows accumulate, this agent identifies reusable patterns and adds them back to the Registry, ensuring ArachNet's capabilities grow organically over time without constant manual intervention.
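To make the Registry idea concrete, here is a minimal sketch of what a Registry entry and a tool-chaining step might look like. The `ToolSpec` fields mirror the description above (capabilities, inputs, outputs, constraints), but the tool names (`cable_links`, `link_endpoints`, `geolocate`) and the `plan_workflow` function are hypothetical. WorkflowScout is an LLM agent that weighs trade-offs; the breadth-first search here is a deliberately simple, deterministic stand-in that only checks whether declared inputs and outputs can be chained.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """One hypothetical Registry entry: what a tool does, not how it does it."""
    name: str
    capability: str
    inputs: frozenset[str]
    outputs: frozenset[str]
    constraints: tuple[str, ...] = ()


# Hypothetical entries; the real Registry's tools and schema may differ.
REGISTRY = [
    ToolSpec("cable_links", "map a submarine cable to the IP links that cross it",
             frozenset({"cable_name"}), frozenset({"ip_links"}),
             ("inferred mapping, not ground truth",)),
    ToolSpec("link_endpoints", "extract IP addresses from IP links",
             frozenset({"ip_links"}), frozenset({"ip_addresses"})),
    ToolSpec("geolocate", "map IP addresses to countries",
             frozenset({"ip_addresses"}), frozenset({"ip_country"})),
]


def plan_workflow(registry, available, goal):
    """Breadth-first search for a shortest tool chain that turns `available` data into `goal`."""
    start = frozenset(available)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        have, chain = queue.popleft()
        if goal in have:
            return chain
        for tool in registry:
            # A tool is usable once all of its declared inputs are available.
            if tool.inputs <= have:
                nxt = have | tool.outputs
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, chain + [tool.name]))
    return None  # no chain of registered tools reaches the goal


print(plan_workflow(REGISTRY, {"cable_name"}, "ip_country"))
# → ['cable_links', 'link_endpoints', 'geolocate']
```

Because each entry exposes only its interface, the planner never needs to read tool source code, which is the property the Registry is meant to provide.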

By default, ArachNet runs in "standard" mode for fully automated workflows. For researchers who prefer more control, an "expert" mode allows domain specialists to review and refine agent outputs between stages.
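The difference between the two modes can be sketched as a linear pipeline with an optional human checkpoint between stages. This is an illustration under assumptions, not ArachNet's code: the phase functions are placeholder lambdas standing in for the agents, and `run_workflow` and the reviewer callback are hypothetical names.

```python
from typing import Callable, Optional

# Each phase maps the previous phase's artifact to the next one.
Phase = Callable[[str], str]
# A reviewer receives the phase name and its output, and returns a (possibly edited) output.
Reviewer = Callable[[str, str], str]


def run_workflow(query: str, phases: list[tuple[str, Phase]],
                 reviewer: Optional[Reviewer] = None) -> str:
    """Run phases in order; in 'expert' mode a reviewer may edit each intermediate output."""
    artifact = query
    for name, phase in phases:
        artifact = phase(artifact)
        if reviewer is not None:      # expert mode: human checkpoint between stages
            artifact = reviewer(name, artifact)
    return artifact                   # standard mode skips every checkpoint


# Hypothetical stand-ins for the LLM agents; each just wraps its input.
phases = [
    ("QueryMind", lambda q: f"subproblems({q})"),
    ("WorkflowScout", lambda s: f"workflow({s})"),
    ("SolutionWeaver", lambda w: f"code({w})"),
]

standard = run_workflow("impact of a cable cut?", phases)
expert = run_workflow(
    "impact of a cable cut?", phases,
    reviewer=lambda name, out: out + "+reviewed" if name == "WorkflowScout" else out,
)
```

RegistryCurator is left out of the chain because it acts after a workflow succeeds rather than as a step in answering a single query.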

Validation Through Case Studies

To validate ArachNet, we built a prototype on Claude Sonnet 4 with specialized prompts and evaluated it against progressively more challenging Internet resilience scenarios, using two systems developed at UCI, Nautilus and Xaminer, as expert benchmarks.

Level 1: Replicating Expert Analysis

Tasked with identifying the country-level impact of a SeaMeWe-5 failure, ArachNet independently developed a processing pipeline of approximately 250 lines of code, closely following the Xaminer workflow without any prior architectural guidance.

Remarkably, ArachNet identified the same number of affected IPs as the expert system across multiple countries.

The final impact percentages differed slightly because ArachNet used an IP-based metric rather than a link-based one. Even so, it successfully executed the core data transformations designed by human experts (cable dependency identification, IP extraction, geographic mapping, and country-level aggregation), demonstrating that complex measurement reasoning can be systematically automated.
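The following toy example, using hypothetical documentation-range addresses rather than real data, walks through the four transformations and shows how the choice of metric alone can change the reported impact. The link-based metric here (share of a country's links that cross the cable) is an assumed definition for illustration; Xaminer's exact formulation may differ.

```python
from collections import Counter

# Hypothetical toy inputs (documentation address ranges), not data from the paper.
IP_COUNTRY = {
    "192.0.2.1": "IN", "192.0.2.2": "IN", "192.0.2.3": "IN",
    "198.51.100.1": "FR", "198.51.100.2": "FR",
}
LINKS = [
    ("192.0.2.1", "198.51.100.1"),
    ("192.0.2.1", "192.0.2.2"),
    ("192.0.2.2", "192.0.2.3"),
    ("192.0.2.3", "198.51.100.2"),
    ("198.51.100.1", "198.51.100.2"),
]
# Step 1 (cable dependency identification): links assumed to traverse the failed cable.
CABLE_LINKS = {LINKS[0], LINKS[3]}


def country_impact(ip_country, links, cable_links):
    """Return {country: (affected IPs, IP-based %, link-based %)}."""
    # Step 2 (IP extraction): every endpoint of an affected link.
    affected_ips = {ip for link in cable_links for ip in link}
    # Steps 3 and 4 (geographic mapping, country-level aggregation).
    ips_total = Counter(ip_country.values())
    ips_hit = Counter(ip_country[ip] for ip in affected_ips)
    links_total, links_hit = Counter(), Counter()
    for link in links:
        for country in {ip_country[ip] for ip in link}:  # count each link once per country
            links_total[country] += 1
            links_hit[country] += int(link in cable_links)
    report = {}
    for country in ips_total:
        ip_pct = 100 * ips_hit[country] / ips_total[country]
        link_pct = 100 * links_hit[country] / links_total[country]
        # Simple sanity check: both shares must be valid percentages.
        assert 0 <= ip_pct <= 100 and 0 <= link_pct <= 100
        report[country] = (ips_hit[country], round(ip_pct, 1), round(link_pct, 1))
    return report


print(country_impact(IP_COUNTRY, LINKS, CABLE_LINKS))
# → {'IN': (2, 66.7, 50.0), 'FR': (2, 100.0, 66.7)}
```

Both countries lose the same number of IPs (2), yet the percentages diverge depending on whether IPs or links form the denominator.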

Level 2: Multi-Framework Orchestration

When asked to analyze the cascading effects of submarine cable failures between Europe and Asia, ArachNet orchestrated integration across four frameworks spanning infrastructure, topology, and temporal domains. It generated approximately 525 lines of code to connect cable mappings with impact analysis, implementing geographic filtering, leveraging AS dependency graphs for cascade propagation modeling, and integrating BGP and traceroute data for temporal evolution analysis. What would typically require days of manual coordination was automatically assembled into a seamless workflow.
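Cascade propagation over an AS dependency graph can be illustrated with a simple fixpoint rule: an AS is treated as cut off once every upstream it depends on has failed. This is an assumed, simplified model for illustration only; the graph uses documentation-range ASNs, `PROVIDERS` and `cascade` are hypothetical names, and the BGP and traceroute temporal analysis is not modeled.

```python
# Hypothetical AS dependency graph: each AS maps to the upstream ASes it relies on.
PROVIDERS = {
    64496: set(),            # no upstream in this toy graph
    64497: {64496},
    64498: {64496, 64497},
    64499: {64498},
    64500: {64499, 64496},   # multi-homed
    64501: {64499},
}


def cascade(providers, initially_failed):
    """Return every AS that loses all of its upstreams, directly or transitively."""
    failed = set(initially_failed)
    changed = True
    while changed:                       # repeat until no new AS fails (fixpoint)
        changed = False
        for asn, ups in providers.items():
            # An AS with at least one upstream fails once all of its upstreams have failed.
            if asn not in failed and ups and ups <= failed:
                failed.add(asn)
                changed = True
    return failed


print(sorted(cascade(PROVIDERS, {64498})))
# → [64498, 64499, 64501]
```

Seeding the failure at AS64498 takes down its single-homed customers AS64499 and AS64501, while multi-homed AS64500 survives through AS64496.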

Level 3: Forensic Root Cause Analysis

Given only the observation that European probes had seen a sudden latency increase to Asian destinations starting three days earlier, ArachNet implemented a comprehensive forensic investigation, including statistical anomaly detection and suspect cable scoring, in approximately 750 lines of code. While final verification still benefits from human oversight, the system provided a robust analytical foundation and forensic starting point that matches the methodology and rigor of domain specialists.
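One common way to implement the two steps named above is a robust z-score (median and median absolute deviation) for anomaly detection, followed by scoring each cable by the fraction of the paths it carries that turned anomalous. This is a sketch of a generic technique, not ArachNet's generated code: the path names, cable names, RTT values, the threshold of 5, and the scoring rule are all hypothetical.

```python
from statistics import median

# Hypothetical inputs: per probe->destination path, baseline RTT samples (ms),
# the RTT observed after the event, and the cables the path is believed to cross.
PATHS = {
    "ams->sin": ([180, 182, 179, 181, 183], 240, {"CableA", "CableB"}),
    "fra->bom": ([110, 112, 111, 109, 110], 165, {"CableA"}),
    "par->hkg": ([200, 198, 202, 201, 199], 201, {"CableB", "CableC"}),
    "lon->tyo": ([230, 231, 229, 232, 230], 231, {"CableC"}),
}


def robust_z(baseline, value):
    """Robust z-score: distance from the median in units of scaled MAD."""
    med = median(baseline)
    # Guard against a zero MAD (e.g. a constant baseline) to avoid division by zero.
    mad = median([abs(x - med) for x in baseline]) or 1e-9
    return (value - med) / (1.4826 * mad)  # 1.4826 scales MAD to a std dev under normality


def score_cables(paths, z_threshold=5.0):
    """Flag anomalous paths, then score each cable by the share of its paths that are anomalous."""
    anomalous = {p for p, (base, now, _) in paths.items()
                 if robust_z(base, now) > z_threshold}
    scores = {}
    for cable in sorted({c for _, _, cables in paths.values() for c in cables}):
        carried = [p for p, (_, _, cables) in paths.items() if cable in cables]
        scores[cable] = sum(p in anomalous for p in carried) / len(carried)
    return anomalous, scores


anomalous, scores = score_cables(PATHS)
print(scores)
# → {'CableA': 1.0, 'CableB': 0.5, 'CableC': 0.0}
```

Here CableA carries both anomalous paths and none of the healthy ones, so it ranks as the top suspect.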

Looking Ahead

While our case studies demonstrate functional equivalence to expert solutions in specific scenarios, a major open challenge lies in the verification and validation of generated workflows for novel queries where no expert ground truth exists.

A related challenge is how the system should handle conflicting outputs from different measurement tools.

The emergence of agent communication protocols such as the Model Context Protocol (MCP) and the Agent2Agent (A2A) protocol also presents exciting opportunities to standardize how AI agents interact with measurement tools. By enabling seamless, plug-and-play interoperability, these protocols could allow ArachNet to integrate any new measurement framework without requiring developers to manually curate the Registry, potentially transforming ArachNet from a custom implementation into a highly scalable, standards-based ecosystem.

Ultimately, ArachNet demonstrates that the barriers to Internet measurement are not fundamental; they are compositional. The reasoning that experts apply is systematic, and systematic reasoning can be automated.

If you want to learn more about ArachNet, please read our paper presented at HotNets'25. ArachNet's prompts and case studies are open source. Stay tuned to my website for future updates about ArachNet.

Alagappan Ramanathan is a PhD candidate at the University of California, Irvine, and a 2023 Internet Society Pulse Research Fellow.

The views expressed by the authors of this blog post are their own and do not necessarily reflect the views of the Internet Society.

Image by ClaudiaWollesen from Pixabay
