Internet Resilience Index Methodology

Introduction

About the Index

The Internet plays a critical role in society today. Unfortunately, not all countries are on a level playing field with regard to resilient Internet infrastructure. Many low-income countries have under-provisioned networks and cable infrastructure, or they lack redundant interconnection systems. In these countries (or regions), the likelihood of Internet outages is much higher than elsewhere.

Measuring Internet resilience is not an easy task, as several building blocks underpin the Internet’s complex infrastructure. The Internet landscape also varies considerably around the world. To compare countries on common ground, there needs to be an objective set of metrics that track and record the different components that contribute to the resilience of the Internet.

To achieve this task, the Internet Society created the Pulse Internet Resilience Index (IRI). This document outlines the approach used to build the Index, the selection of indicators and the underlying data sources, the weighting scheme, and the aggregation and imputation methods used.

The Four Pillars of a Resilient Internet Ecosystem

To grasp the multi-faceted nature of the Internet, the Index is built on four main pillars, which together contribute to the smooth operation of the Internet. The pillars are:

  1. Infrastructure: The existence and availability of physical infrastructure that provides Internet connectivity.
  2. Performance: The ability of the network to provide end-users with seamless and reliable access to Internet services.
  3. Security: The ability of the network to resist intentional or unintentional disruptions through the adoption of security technologies and best practices.
  4. Market Readiness: The ability of the market to self-regulate and provide affordable services to end-users as part of a diverse and competitive market.

The Internet Society Pulse IRI is built using existing best practices according to the Handbook on Constructing Composite Indicators of the European Commission Joint Research Centre and the OECD. The Pulse IRI adopts a similar methodology to other extant indices such as the GSMA Mobile Connectivity Index, the Facebook/EIU Inclusive Internet Index and the Web Foundation Web Index.

Data Sourcing

Selecting Indicators

Building a robust composite indicator requires careful selection of the underlying indicators. To date, there are no direct and readily available metrics that provide information about the Internet resilience of a network or a country. In the Internet Society Pulse IRI framework, the indicators selected are reflective of a specific aspect of resilience that needs to be quantified. The OECD/JRC handbook provides some guidance on the main characteristics to consider when selecting the indicators. In essence, they should be accurate, timely, and should cover as many countries as possible. Additionally, the Internet Society Pulse IRI relies exclusively on quantitative indicators as opposed to qualitative ones such as perception of Service Quality. This is to ensure that there is an objective set of metrics that can be used to make comparisons between countries.

Selection Criteria

The following criteria were used when selecting the indicators:

  • Relevance: The indicator should work towards showing an increase or decrease in the resilience of the Internet in a selected country.
  • Accuracy: The indicator should correctly estimate or describe the quantity or characteristic it is designed to measure.
  • Coverage: The data should cover as many countries as possible, as the Index is intended to be a global index.
  • Freshness: Any dataset should be at most two years old. Some datasets such as performance or network coverage should be recent. Some other datasets such as EGDI do not change much from one year to the next, so it is acceptable to use these datasets even when a year or two old.
  • Continuity: To objectively compare the index over the years, it is important to work with a stable list of indicators, which will provide data consistently over time.

Types of Indicators

There are three main types of indicators that have been used to calculate the Internet Society Pulse IRI:

  1. Direct indicator: A direct indicator is a direct measure of an aspect of resilience, e.g., percentage of HTTPS adoption, latency, bandwidth, etc. Direct indicators have a specific unit of measurement, and the raw values can be on different scales depending on what is being measured.
  2. Composite indicator: A composite indicator provides a score, which itself has been derived from multiple other variables. Examples are the MANRS score, EGDI index, etc. The scale of a composite indicator is usually between 0 and 100.
  3. Proxy indicator: A proxy is used where it is difficult to find a specific metric to measure an aspect of resilience. Proxies can be either direct or composite indicators. For example, the IRI uses “Number of IXPs” and “Number of datacenters” as proxy indicators for the robustness of the local infrastructure.

Orientation of Indicators

An indicator can either be positive or negative. In the Internet Society Pulse IRI framework, both positive and negative indicators are used either individually or in combination with other indicators to characterise overall levels of resilience. An example of a positive indicator is "Number of secure Internet servers" as the higher the number the more secure the network will be. Conversely, "% of spam infections" is a negative indicator, as the higher the percentage, the less secure the underlying networks are.

Details of Some Indicators

Network Performance

Network performance data relating to bandwidth, latency and jitter is collected from the monthly Ookla Speedtest Global Index. It contains measurements about fixed and mobile network performance around the world. The median download, upload, latency and jitter values are calculated by country.

Upstream Redundancy

Upstream Redundancy is the average number of IPv4 upstream providers per active Autonomous System (AS) in the country. The higher the number of upstream providers per AS, the more resilient the overall ecosystem is. The CAIDA AS-Relationship dataset is used to infer provider-to-customer relationships.
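
The averaging step described above can be sketched as follows. This is a minimal illustration, not the Index's actual pipeline: the function name `upstream_redundancy`, the input shape (a list of provider/customer pairs already extracted from a relationship dataset), and the choice to count ASes with no inferred provider as zero are all assumptions.

```python
from collections import defaultdict


def upstream_redundancy(provider_customer_pairs, country_asns):
    """Average number of distinct upstream providers per AS in country_asns."""
    providers = defaultdict(set)
    for provider, customer in provider_customer_pairs:
        if customer in country_asns:
            providers[customer].add(provider)
    if not country_asns:
        return 0.0
    # Assumption: ASes with no inferred provider count as zero upstreams.
    return sum(len(providers[asn]) for asn in country_asns) / len(country_asns)


# Hypothetical provider-to-customer links (illustrative ASNs, not real data).
links = [
    (64500, 64510), (64501, 64510),
    (64500, 64511),
    (64500, 64512), (64501, 64512), (64502, 64512),
]
local_asns = {64510, 64511, 64512, 64513}
print(upstream_redundancy(links, local_asns))  # 6 provider links / 4 ASes = 1.5
```

In this toy input, one AS has no provider at all, which pulls the country average down; whether such ASes are included in the denominator is a design choice the methodology does not spell out.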

Peering Efficiency

The Peering Efficiency score of a country is calculated by taking the number of local networks peering at IXPs in that country and dividing it by the number of local and active (seen on the global routing table) networks in that country. PeeringDB provides data about IXP peers and RIPEstat provides data about active networks.

$$PE_c= \frac{\sum P_i}{A}$$

Where:

$$PE_c = \text{Peering Efficiency of country c}$$
$$P_i = \text{Local ASes peering at IXP i}$$
$$A = \text{Number of active ASes for country c}$$
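
As a rough sketch of the ratio above, the snippet below computes Peering Efficiency from in-memory sets. The function name `peering_efficiency` and the input shapes are hypothetical. One assumption is worth flagging: an AS that peers at several IXPs is counted once here, so that the ratio stays between 0 and 1; the formula as written does not say how such overlaps are handled.

```python
def peering_efficiency(ixp_members, active_asns):
    """Share of locally active ASes that peer at one or more IXPs in the country.

    ixp_members: dict mapping an IXP name to the set of ASNs peering there.
    active_asns: set of local ASNs seen in the global routing table.
    """
    if not active_asns:
        return 0.0
    local_peers = set()
    for members in ixp_members.values():
        # Keep only members that are local and active (assumption: dedupe).
        local_peers |= members & active_asns
    return len(local_peers) / len(active_asns)


# Hypothetical data: two IXPs, five locally active ASes.
ixps = {"IXP-A": {64510, 64511, 64999}, "IXP-B": {64511, 64512}}
active = {64510, 64511, 64512, 64513, 64514}
print(peering_efficiency(ixps, active))  # 3 of 5 local ASes peer locally
```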

Market Concentration

The Internet Society Pulse IRI uses the Herfindahl-Hirschman Index (HHI) to calculate the market concentration score. APNIC ASPOP statistics provide market share information by AS and by country. We aggregate this data by organisation using as2org+. The HHI has a range between 0 and 10,000 where 0 means no concentration (a competitive market) and 10,000 means only one ASN is present i.e., with 100% market share.

$$HHI_c = s_1^2 + s_2^2 + s_3^2 + \dots + s_n^2$$

Where:

$$HHI_c = \text{HHI of country c}$$
$$s_n = \text{market share (\%) of }ASN_n \; \text{of country c}$$
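
The HHI calculation can be illustrated with a short sketch. The helper names `market_shares` and `hhi`, and the user counts per organisation, are hypothetical; the only thing taken from the text is that shares are expressed in percent and squared before summing.

```python
def market_shares(users_by_org):
    """Convert estimated user counts per organisation into shares in percent."""
    total = sum(users_by_org.values())
    return [100 * users / total for users in users_by_org.values()]


def hhi(shares_percent):
    """Herfindahl-Hirschman Index: sum of squared market shares (in %)."""
    return sum(share ** 2 for share in shares_percent)


# Hypothetical market: three organisations with 50%, 30% and 20% of users.
example = {"Org A": 500, "Org B": 300, "Org C": 200}
print(hhi(market_shares(example)))          # 50^2 + 30^2 + 20^2 = 3800
print(hhi(market_shares({"Only": 1000})))   # monopoly: 100^2 = 10000
```

Because shares are squared, a single dominant operator raises the score far more than several mid-sized ones, which is why the index is sensitive to concentration rather than to the raw number of operators.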

Upstream Provider Diversity

Diversity of upstream providers is an important element to measure as it indicates the extent to which the relationships of a given network are concentrated on a single network or group of networks. At a country-level, there are specific network operators providing international access and the more diverse the number of upstream Internet providers, the more resilient the country is in terms of network dependency.

The notion of network dependency can be proxied using AS Hegemony which is a score given to a network to quantify its centrality as observed by BGP monitors. AS hegemony ranges between 0 and 1 and can be interpreted as the average fraction of paths crossing a node. The higher the AS Hegemony score, the higher the dependency on that specific network.

Each network in a country has an AS Hegemony score based on how central it is for other networks in the same country. To calculate the diversity of the upstream provider distribution at a country-level, we use the HHI again. In a perfectly diverse scenario (HHI = 0), all networks would have the same AS Hegemony score. A high HHI value means that a small number of providers are dominant in the market for upstream Internet connectivity.

List of Indicators

Table 1 shows the list of indicators, the unit of measure and the source of the information.

Table 1. List of Indicators
| Indicator | Description | Unit | Source |
| --- | --- | --- | --- |
| Network Coverage | Mobile network coverage (2G/3G/4G), as a composite score provided by the GSMA | Score (0 - 100) | GSMA |
| Spectrum Allocation | Spectrum allocation (composite score) | Score (0 - 100) | GSMA |
| Number of IXPs | Number of IXPs per city, where a city has a population > 300,000 for countries with a population of <= 20,000,000, and a population > 1,000,000 otherwise | # of IXPs per city | PeeringDB |
| Datacenters | Number of datacenters | # of datacenters per 10 million population | PeeringDB |
| Mobile / Fixed Latency | Median latency observed to the nearest Ookla server | ms | Ookla |
| Mobile / Fixed Jitter | Median jitter observed to the nearest Ookla server | ms | Ookla |
| Mobile / Fixed Upload Speed | Median upload throughput measured to the nearest Ookla server | Mbps | Ookla |
| Mobile / Fixed Download Speed | Median download throughput measured to the nearest Ookla server | Mbps | Ookla |
| IPv6 | IPv6 enabled end users | % of IPv6 adoption | Akamai, Facebook, Google, APNIC |
| HTTPS | Page loads using HTTPS | % of page loads using HTTPS | Mozilla |
| DNSSEC Validation | Users validating DNSSEC | % of users validating DNSSEC | APNIC |
| DNSSEC Adoption | Is the ccTLD DNSSEC signed? | True or False | DNS |
| MANRS Readiness | MANRS score (filtering, global coordination, IRR, RPKI) | Score (0 - 100) | MANRS Observatory |
| Upstream Redundancy | Average number of upstream IPv4 providers for a country's routed ASNs | Score (0 - 100) | CAIDA, NRO, RIPEstat |
| Secure Internet Servers | Number of secure Internet servers detected on the country's networks | # of secure servers per 1000 population | World Bank |
| Global Cybersecurity Index | Global Cybersecurity Index (composite score) | Score (0 - 100) | ITU |
| DDoS Potential | Potential DDoS threat a country represents | Percentage | Cybergreen |
| Affordability | Mobile data and voice low-consumption basket. The basket is based on a monthly usage of a minimum of 70 voice minutes, 20 SMSs and 500 MB of data using at least 3G technology. | % of GNI per capita | ITU DataHub |
| Market Concentration | Herfindahl-Hirschman Index (HHI), which calculates the market concentration based on market share information per network | Score (0 - 10000) | APNIC, PeeringDB, CAIDA |
| Upstream Provider Diversity | Herfindahl-Hirschman Index (HHI) calculated over the market share of transit networks with a market share greater than 1% | Score (0 - 10000) | IIJ |
| Peering Efficiency | Ratio of networks peering at IXPs vs routed ASes in a country | Percentage | PeeringDB, RIPEstat |
| Domain Count | Domains registered by ccTLD | # of domains per ccTLD per 1000 population | DomainTools |
| EGDI | E-Government Development Index | Index (0 - 100) | UN |

Data Processing

Raw data comes in different forms and shapes and usually carries several artifacts: some datasets are normally distributed, while others are skewed. Before running any calculation or aggregation, we need to impute missing data and to identify and handle outliers.

Missing Data

The following techniques have been used to impute missing data:

Table 2. Data imputation
| Indicator | Technique | Details |
| --- | --- | --- |
| Affordability | Substitution | We replace missing values with data from adjacent years |
| Fixed / Mobile Internet Performance | Substitution | We substitute mobile data for fixed data and vice-versa where values are otherwise unavailable |
| Market Concentration | Backward fill | Initial gaps in data are filled with first available datapoints |
| Fixed / Mobile Internet Performance, HTTPS Adoption, Market Concentration, Secure Internet Servers | Forward fill | Gaps in data are filled with most recent earlier datapoints |
| IPv6 | Substitution | We impute a value of 0 where datapoints are otherwise unavailable |
| Spectrum Allocation, Network Coverage | Substitution | Replacement by data from a country from the same region with similar GDP per capita |
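
The forward-fill and backward-fill techniques listed in Table 2 can be sketched on a single time series as below. The function names and the quarterly values are hypothetical, and applying the backward fill before the forward fill is an assumption made for illustration; the table does not state the order.

```python
def backward_fill_leading(series):
    """Fill gaps at the start of the series with the first available datapoint."""
    first = next((v for v in series if v is not None), None)
    filled = list(series)
    for i, value in enumerate(filled):
        if value is not None:
            break
        filled[i] = first
    return filled


def forward_fill(series):
    """Fill each gap with the most recent earlier datapoint."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical quarterly values for one country; None marks a missing quarter.
quarterly = [None, None, 42.0, None, 45.0, None]
print(forward_fill(backward_fill_leading(quarterly)))
# [42.0, 42.0, 42.0, 42.0, 45.0, 45.0]
```

A forward fill alone leaves leading gaps untouched, which is why an indicator such as Market Concentration appears under both techniques in the table.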

Re-scaling and Treating Outliers

The scales used by the indicators are also different e.g., latency can range between 0 – 500ms, while domain count for a ccTLD can range between 0 – 2,000,000. It is important to scale the data so that indicators are comparable to one another, and to avoid the issue of the size of the country (i.e., larger countries in terms of population or GDP tend to have more networks, IXPs, datacenters, etc.).

Outliers, meanwhile, tend to skew the data and can therefore affect the overall score calculation, especially because the Internet Society Pulse IRI uses the min-max normalization method to scale the data (see section on Min-Max Normalization below). If an indicator has a very high or very low value, that value sets the maximum or minimum used in the min-max calculation and compresses the scores of all other countries.

The following transformations have been applied to the listed indicators as part of the framework:

  1. Denomination by population size: Number of datacenters, Number of domains
  2. Denomination by number of cities: Number of IXPs
  3. Log transformation*: Secure Internet Servers, Fixed/Mobile Internet performance

* A logarithmic transformation is useful for treating skewed datasets because it compresses extreme values. It not only rescales the data but also reduces the influence of outliers. Because the logarithm is monotonic, the transformation preserves the ordering of the values.
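
A brief illustration of the effect of a log transformation is given below. The base of the logarithm (natural log here) and the sample speeds are assumptions; the text does not specify which base is used or how zero values are handled, so this sketch requires strictly positive inputs.

```python
import math


def log_transform(values):
    """Natural-log transform; assumes every value is strictly positive."""
    return [math.log(v) for v in values]


# Hypothetical download speeds in Mbps, each ten times the previous one.
speeds = [2.0, 20.0, 200.0]
logged = log_transform(speeds)

# Equal ratios in the raw data become equal gaps after the transform,
# so the largest value no longer dominates the range.
print([round(b - a, 4) for a, b in zip(logged, logged[1:])])
```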

After scaling and transforming the above indicators, we run a check on the skewness and kurtosis values of the remaining indicators. For those having a skewness > 2 or kurtosis > 3.5 (general thresholds for outlier detection), the IRI makes use of the IQR (Interquartile Range: Q3 - Q1) method to trim down outliers. The following rules are applied:

  • Any value greater than Q3 + 1.5*IQR is replaced by Q3 + 1.5*IQR
  • Any value less than Q1 – 1.5*IQR is replaced by Q1 – 1.5*IQR
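
The two capping rules above can be sketched with the Python standard library. The function name `cap_outliers_iqr` is hypothetical, and the quartile definition (the "inclusive" method of `statistics.quantiles`) is an assumption, since different quartile conventions give slightly different bounds.

```python
import statistics


def cap_outliers_iqr(values):
    """Replace values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] with the nearest bound."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [min(max(v, lower), upper) for v in values]


# Hypothetical indicator values with one extreme observation.
raw = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(cap_outliers_iqr(raw))
# Q1 = 3, Q3 = 7, IQR = 4, so 100 is capped at 7 + 1.5 * 4 = 13
```

Capping keeps the country in the dataset while limiting how far a single extreme value can stretch the range used in the subsequent normalization.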

Min-Max Normalization

The next step, after cleaning and transforming the data, is normalization. Normalization is important because indicators are collected using different units of measurement (percentage, ms, Mbps, count, etc.). It is therefore important to rebase them to a common unit such as a 0 - 100 scale, where 100 usually refers to the strongest and 0 to the weakest value.

The method chosen was min-max normalization, a common technique used by multiple well-known indices. Unlike other techniques such as ranking and categorical scales, min-max normalization preserves the relative intervals between countries.

Below are the formulas the Internet Society Pulse IRI uses to calculate the value of an indicator, depending on whether it is positive or negative:

$$\text{Positive indicator:}\,\,\,I_{k,c} = \frac{x_{k,c} - Min(x_k)}{Max(x_k) - Min(x_k)}$$

$$\text{Negative indicator:}\,\,\,I_{k,c} = 1 - \frac{x_{k,c} - Min(x_k)}{Max(x_k) - Min(x_k)}$$

Where:

$$x_{k,c} = \text{raw value of indicator k for country c}$$
$$I_{k,c} = \text{normalized value of indicator k for country c}$$
$$Min(x_k), \; Max(x_k) = \text{minimum and maximum of indicator k across all countries}$$

Positive indicators contribute towards increasing the index, while negative indicators contribute to a decrease in the score, which is why the scaled value of a negative indicator is subtracted from 1:

$$1 - \frac{x_{k,c} - Min(x_k)}{Max(x_k) - Min(x_k)}$$
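
The positive and negative cases of min-max normalization can be sketched in a few lines. The function name `min_max_normalize` and the sample values are hypothetical, and raising an error when all countries share the same value is an assumption; the methodology does not say how that edge case is treated.

```python
def min_max_normalize(values, positive=True):
    """Scale values to [0, 1]; invert the scale for negative indicators."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant indicator cannot be ranked, so refuse it.
        raise ValueError("indicator has no spread across countries")
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if positive else [1 - s for s in scaled]


# Hypothetical raw values for three countries.
download_mbps = [10, 55, 100]   # positive indicator: higher is better
latency_ms = [20, 60, 100]      # negative indicator: lower is better

print(min_max_normalize(download_mbps))                # [0.0, 0.5, 1.0]
print(min_max_normalize(latency_ms, positive=False))   # [1.0, 0.5, 0.0]
```

The values come out on a 0 to 1 scale; multiplying by 100 gives the 0 to 100 scale mentioned earlier.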

We chose not to use the z-score standardization technique (this technique centres values on the mean and scales them by the standard deviation, so the result is not bounded to a fixed range) as not all the indicators followed a normal distribution.

Finally, the IRI only includes countries for which we have data (after imputation etc.) for all indicators and for every quarter since 2019 Q1.

Weighting and Aggregation

Assigning Weights

There are two main ways to assign the weights used to aggregate the normalized indicators into a final score:

  1. An ad-hoc weighting scheme.
  2. Statistical (optimization) techniques.

The Internet Society Pulse IRI uses an ad-hoc weighting scheme, as it is the simpler technique of the two and relies on input that the Internet Society gathered through surveys and discussions with subject matter experts.

During the weighting process, the importance of the indicator was also considered using a lifecycle approach. For example, for the Performance pillar, the following weights were assigned to the underlying dimensions: Fixed networks (40%) and Mobile networks (60%). Higher importance was given to mobile networks as they are more widely relied upon for Internet access from a global perspective.

In the Internet Society Pulse IRI framework, the indicators are grouped into different dimensions, and the dimensions into pillars, which provide their own quantitative measures of a specific aspect of Internet resilience. Below is a table showing the indicators, dimensions and pillars and their associated weights, used for the calculation of the Internet Society Pulse IRI.

The weights are revisited on an annual basis.

Table 3. Indicators, dimensions and pillars and associated weights
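| Pillar | Weight (%) | Dimension | Weight (%) | Indicator | Weight (%) |
| --- | --- | --- | --- | --- | --- |
| Infrastructure | 25 | Mobile connectivity | 50 | Network Coverage | 70 |
| | | | | Spectrum Allocation | 30 |
| | | Enabling infrastructure | 50 | Number of IXPs | 50 |
| | | | | Datacenters | 50 |
| Performance | 25 | Fixed networks | 40 | Latency | 20 |
| | | | | Upload | 30 |
| | | | | Download | 30 |
| | | | | Jitter | 20 |
| | | Mobile networks | 60 | Latency | 20 |
| | | | | Upload | 30 |
| | | | | Download | 30 |
| | | | | Jitter | 20 |
| Enabling technologies and security | 25 | Enabling technologies | 20 | IPv6 | 30 |
| | | | | HTTPS | 70 |
| | | DNS ecosystem | 30 | DNSSEC Validation | 50 |
| | | | | DNSSEC Adoption | 50 |
| | | Routing hygiene | 30 | MANRS Readiness | 50 |
| | | | | Upstream Redundancy | 50 |
| | | Security threat | 20 | Secure Internet Servers | 30 |
| | | | | Global Cybersecurity Index | 40 |
| | | | | DDoS Potential | 30 |
| Local ecosystem & Market readiness | 25 | Market structure | 50 | Affordability | 40 |
| | | | | Market concentration | 30 |
| | | | | Upstream provider diversity | 30 |
| | | Traffic localization | 50 | Peering efficiency | 40 |
| | | | | Domain count | 30 |
| | | | | EGDI | 30 |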

Aggregation

The Internet Society Pulse IRI uses a weighted sum formula at each level (indicator, dimension, and pillar) to aggregate the data into a composite score. The following formula was used:

$$IRI_c = \sum_{p}(w_p \cdot P_{p,c})$$

Where:

$$P_{p,c}=\sum_{d \in p}(w_d \cdot D_{d,c})$$

And where:

$$D_{d,c}=\sum_{k \in d}(w_k \cdot I_{k,c})$$

In simple terms, the final index IRI of country "c" is the sum of the weighted pillars "P". A pillar "p" is the weighted sum of its underlying dimensions "D", and a dimension "d" is the weighted sum of its normalized indicators "I", all of country "c". The weight "w" at each level is the one assigned to that pillar, dimension, or indicator.
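
The nested weighted sums can be illustrated for a single pillar. The sketch below uses the Performance weights from Table 3 (Fixed networks 40%, Mobile networks 60%; Latency 20%, Upload 30%, Download 30%, Jitter 20%); the helper name `weighted_sum` and the normalized indicator values are hypothetical.

```python
def weighted_sum(scores, weights_percent):
    """Weighted sum of scores, with weights given in percent and summing to 100."""
    if abs(sum(weights_percent.values()) - 100) > 1e-9:
        raise ValueError("weights must sum to 100")
    return sum(weights_percent[name] * scores[name] for name in weights_percent) / 100


# Indicator weights within each Performance dimension (from Table 3).
indicator_weights = {"latency": 20, "upload": 30, "download": 30, "jitter": 20}
# Dimension weights within the Performance pillar (from Table 3).
dimension_weights = {"fixed": 40, "mobile": 60}

# Hypothetical normalized indicator values (0 to 1) for one country.
fixed = {"latency": 0.8, "upload": 0.5, "download": 0.6, "jitter": 0.7}
mobile = {"latency": 0.6, "upload": 0.4, "download": 0.5, "jitter": 0.5}

dimensions = {
    "fixed": weighted_sum(fixed, indicator_weights),    # about 0.63
    "mobile": weighted_sum(mobile, indicator_weights),  # about 0.49
}
performance = weighted_sum(dimensions, dimension_weights)
print(round(performance, 3))  # 0.4 * 0.63 + 0.6 * 0.49, about 0.546
```

The same helper would be applied once more across the four pillar scores, each weighted at 25%, to obtain the final index for the country.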

Feedback

For any questions, comments, and feedback on the Internet Society Pulse IRI, please contact the Internet Society Pulse team ([email protected]).

Acknowledgements

The Internet Society would like to thank the following contributors for their valuable input to the conception of the Internet Society Pulse Internet Resilience Index (IRI): Amreesh Phokeer (Internet Society), Kevin Chege (Internet Society), Assane Gueye (Carnegie Mellon University-Africa), Josiah Chavula (University of Cape Town), and Ahmed Elmokashfi (Simula Research Lab).