Internet Resilience Index Methodology
Introduction
About the Index
The Internet plays a critical role in society today. Unfortunately, not all countries are on a level playing field with regard to resilient Internet infrastructure. Many low-income countries have under-provisioned networks and cable infrastructure, or lack redundant interconnection systems. In these countries (or regions), Internet outages are much more likely to occur than elsewhere.
Measuring Internet resilience is not an easy task, as several building blocks underpin the Internet's complex infrastructure. Additionally, the Internet landscape varies considerably around the world. To compare countries objectively, on common ground, there needs to be an objective set of metrics that track and record the different components contributing to the resilience of the Internet.
To achieve this task, the Internet Society created the Pulse Internet Resilience Index (IRI). This document outlines the approach used to build the Index, the selection of indicators and the underlying data sources, the weighting scheme, and the aggregation and imputation methods used.
The Four Pillars of a Resilient Internet Ecosystem
To grasp the multi-faceted nature of the Internet, the Index is built on four main pillars, which together contribute to the smooth operation of the Internet. The pillars are:
- Infrastructure: The existence and availability of physical infrastructure that provides Internet connectivity.
- Performance: The ability of the network to provide end-users with seamless and reliable access to Internet services.
- Security: The ability of the network to resist intentional or unintentional disruptions through the adoption of security technologies and best practices.
- Market Readiness: The ability of the market to self-regulate and provide affordable services to end-users as part of a diverse and competitive market.
The Internet Society Pulse IRI is built using existing best practices according to the Handbook on Constructing Composite Indicators of the European Commission Joint Research Centre and the OECD. The Pulse IRI adopts a similar methodology to other extant indices such as the GSMA Mobile Connectivity Index, the Facebook/EIU Inclusive Internet Index and the Web Foundation Web Index.
Data Sourcing
Selecting Indicators
Building a robust composite indicator requires careful selection of the underlying indicators. To date, there are no direct and readily available metrics that provide information about the Internet resilience of a network or a country. In the Internet Society Pulse IRI framework, the indicators selected are reflective of a specific aspect of resilience that needs to be quantified. The OECD/JRC handbook provides some guidance on the main characteristics to consider when selecting the indicators. In essence, they should be accurate, timely, and should cover as many countries as possible. Additionally, the Internet Society Pulse IRI relies exclusively on quantitative indicators as opposed to qualitative ones such as perception of Service Quality. This is to ensure that there is an objective set of metrics that can be used to make comparisons between countries.
Selection Criteria
The following criteria were used when selecting the indicators:
- Relevance: The indicator should work towards showing an increase or decrease in the resilience of the Internet in a selected country.
- Accuracy: The indicator should correctly estimate or describe the quantities or characteristics it is designed to measure.
- Coverage: The data should cover as many countries as possible, as the Index is intended to be a global index.
- Freshness: Any dataset should be at most two years old. Some datasets, such as performance or network coverage, should be as recent as possible. Others, such as the EGDI, do not change much from one year to the next, so it is acceptable to use them even when a year or two old.
- Continuity: To objectively compare the index over the years, it is important to work with a stable list of indicators, which will provide data consistently over time.
Types of Indicators
There are three main types of indicators that have been used to calculate the Internet Society Pulse IRI:
- Direct indicator: A direct indicator is a direct measure of an aspect of resilience e.g., percentage of HTTPS adoption, latency, bandwidth, etc. They have a specific unit of measurement, and the raw value can be on different scales depending on what is being measured.
- Composite indicator: A composite indicator provides a score, which itself has been derived from multiple other variables. Examples are the MANRS score, EGDI index, etc. The scale of a composite indicator is usually between 0 and 100.
- Proxy indicator: A proxy is used where it is difficult to find a specific metric to measure an aspect of resilience. Proxies can be either direct or composite indicators. For example, the IRI uses “Number of IXPs” and “Number of datacenters” as proxy indicators for the robustness of the local infrastructure.
Orientation of Indicators
An indicator can either be positive or negative. In the Internet Society Pulse IRI framework, both positive and negative indicators are used either individually or in combination with other indicators to characterise overall levels of resilience. An example of a positive indicator is "Number of secure Internet servers" as the higher the number the more secure the network will be. Conversely, "% of spam infections" is a negative indicator, as the higher the percentage, the less secure the underlying networks are.
Details of Some Indicators
Network Performance
Network performance data relating to bandwidth, latency and jitter is collected from the monthly Ookla Speedtest Global Index. It contains measurements about fixed and mobile network performance around the world. The median download, upload, latency and jitter values are calculated by country.
Upstream Redundancy
Upstream Redundancy is the average number of IPv4 upstream providers per active Autonomous System (AS) in the country. The higher the number of upstream providers per AS, the more resilient the overall ecosystem. The CAIDA AS-Relationship dataset is used to infer provider-to-customer relationships.
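The computation described above can be sketched as follows. This is a minimal illustration, not the production pipeline: the function name is ours, and it assumes CAIDA's published AS-relationship format, where provider-to-customer links appear as `<provider>|<customer>|-1` and comment lines start with `#`.

```python
from collections import defaultdict
from statistics import mean

def upstream_redundancy(as_rel_lines, country_asns):
    """Average number of IPv4 upstream providers per AS in a country.

    as_rel_lines: iterable of CAIDA AS-relationship records
                  ("<provider>|<customer>|-1" for p2c links).
    country_asns: set of the country's active (routed) ASNs.
    """
    providers = defaultdict(set)
    for line in as_rel_lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        a, b, rel = line.strip().split("|")[:3]
        if rel == "-1":  # a is a provider of b
            providers[int(b)].add(int(a))
    counts = [len(providers[asn]) for asn in country_asns]
    return mean(counts) if counts else 0.0
```

Peer-to-peer links (relationship `0`) are deliberately ignored, since only provider relationships count towards upstream redundancy.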
Peering Efficiency
The Peering Efficiency score of a country is calculated by taking the number of local networks peering at IXPs in that country and dividing it by the number of local and active (seen on the global routing table) networks in that country. PeeringDB provides data about IXP peers and RIPEstat provides data about active networks.
PE_c = (N_ixp,c / N_active,c) × 100

Where:

- PE_c is the Peering Efficiency score of country c
- N_ixp,c is the number of local networks peering at IXPs in country c
- N_active,c is the number of active local networks in country c
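As a sketch of this ratio, treating the PeeringDB and RIPEstat inputs as sets of ASNs (the function name and inputs are illustrative, not the actual pipeline code):

```python
def peering_efficiency(ixp_peers, active_asns):
    """Share (%) of a country's active ASes that peer at a local IXP.

    ixp_peers:   set of ASNs seen peering at the country's IXPs (PeeringDB).
    active_asns: set of local ASNs seen on the global routing table (RIPEstat).
    """
    if not active_asns:
        return 0.0
    # Intersect first so ASNs listed in PeeringDB but no longer routed
    # cannot push the score above 100%.
    return 100.0 * len(ixp_peers & active_asns) / len(active_asns)
```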
Market Concentration
The Internet Society Pulse IRI uses the Herfindahl-Hirschman Index (HHI) to calculate the market concentration score. APNIC ASPOP statistics provide market share information by AS and by country. We aggregate this data by organisation using as2org+. The HHI has a range between 0 and 10,000 where 0 means no concentration (a competitive market) and 10,000 means only one ASN is present i.e., with 100% market share.
HHI_c = Σ_i (s_i,c)²

Where:

- s_i,c is the market share (as a percentage) of organisation i in country c
- the sum runs over all organisations providing Internet access in country c
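The HHI itself is a one-line computation over percentage market shares. A minimal sketch (function name is ours):

```python
def hhi(market_shares):
    """Herfindahl-Hirschman Index over market shares given in percent.

    Shares should sum to ~100; the result ranges from near 0
    (many small players) to 10,000 (a single 100% monopoly).
    """
    return sum(s * s for s in market_shares)
```

For example, a monopoly scores 10,000, while four equal players at 25% each score 2,500.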
Upstream Provider Diversity
Diversity of upstream providers is an important element to measure as it indicates the extent to which the relationships of a given network are concentrated on a single network or group of networks. At a country-level, there are specific network operators providing international access and the more diverse the number of upstream Internet providers, the more resilient the country is in terms of network dependency.
The notion of network dependency can be proxied using AS Hegemony which is a score given to a network to quantify its centrality as observed by BGP monitors. AS hegemony ranges between 0 and 1 and can be interpreted as the average fraction of paths crossing a node. The higher the AS Hegemony score, the higher the dependency on that specific network.
Each network in a country has an AS Hegemony score based on how central it is for other networks in the same country. To calculate the diversity of the upstream provider distribution at a country-level, we use the HHI again. In a perfectly diverse scenario (HHI = 0), all networks would have the same AS Hegemony score. A high HHI value means that a small number of providers are dominant in the market for upstream Internet connectivity.
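One plausible sketch of this step, assuming the per-network AS Hegemony scores are first normalised into percentage shares before the HHI is applied (the function name and normalisation detail are our assumptions, not a statement of the exact pipeline):

```python
def upstream_diversity(hegemony_scores):
    """HHI over a country's AS Hegemony distribution.

    hegemony_scores: one AS Hegemony value (0-1) per upstream network.
    Scores are normalised into percentage shares, then squared and
    summed, so equal scores -> low HHI, one dominant network -> 10,000.
    """
    total = sum(hegemony_scores)
    if total == 0:
        return 0.0
    shares = [100.0 * h / total for h in hegemony_scores]
    return sum(s * s for s in shares)
```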
List of Indicators
Table 1 shows the list of indicators, the unit of measure and the source of the information.
| Indicator | Description | Unit | Source |
|---|---|---|---|
| Network Coverage | Mobile network coverage includes 2G/3G/4G with a composite score provided by the GSMA | Score (0 - 100) | GSMA |
| Spectrum Allocation | Spectrum allocation (composite score) | Score (0 - 100) | GSMA |
| Number of IXPs | Number of IXPs per city where city has population > 300,000 for countries with population of <=20,000,000 and city has population > 1,000,000 otherwise. | # of IXPs per city | PeeringDB |
| Datacenters | Number of datacenters | # of datacenters per 10 million population | PeeringDB |
| Mobile / Fixed Latency | Median latency observed to the nearest Ookla server | ms | Ookla |
| Mobile / Fixed Jitter | Median jitter observed to the nearest Ookla server | ms | Ookla |
| Mobile / Fixed Upload Speed | Median upload throughput measured to the nearest Ookla server | Mbps | Ookla |
| Mobile / Fixed Download Speed | Median download throughput measured to the nearest Ookla server | Mbps | Ookla |
| IPv6 | IPv6 enabled end users | % of IPv6 adoption | Akamai, Facebook, Google, APNIC |
| HTTPS | Pageloads using HTTPS | % of page loads using HTTPS | Mozilla |
| DNSSEC Validation | Users validating DNSSEC | % of users validating DNSSEC | APNIC |
| DNSSEC Adoption | Is the ccTLD DNSSEC signed? | True or False | DNS |
| MANRS Readiness | MANRS score (filtering, global coordination, IRR, RPKI) | Score (0 - 100) | MANRS Observatory |
| Upstream Redundancy | Average number of upstream IPv4 providers for a country's routed ASNs | Score (0 - 100) | CAIDA, NRO, RIPEstat |
| Secure Internet Servers | Number of secure Internet servers detected on the country's networks | # of secure servers per 1000 population | World Bank |
| Global Cybersecurity Index | Global Cybersecurity Index (Composite score) | Score (0 - 100) | ITU |
| DDoS Potential | Potential DDoS threat a country represents | Percentage | Cybergreen |
| Affordability | Mobile data and voice low-consumption basket. The basket is based on a monthly usage of a minimum of 70 voice minutes, 20 SMSs and 500 MB of data using at least 3G technology. | % of GNI per capita | ITU DataHub |
| Market Concentration | Herfindahl-Hirschman Index (HHI) calculates the market concentration based on market share information per network | Score (0 - 10000) | APNIC, PeeringDB, CAIDA |
| Upstream Provider Diversity | Herfindahl-Hirschman Index (HHI) calculated over the marketshare of transit networks with marketshare greater than 1% | Score (0 - 10000) | IIJ |
| Peering Efficiency | Ratio of networks peering at IXPs vs routed ASes in a country | Percentage | PeeringDB, RIPEstat |
| Domain Count | Domains registered by ccTLD | # of domains per ccTLD per 1000 population | DomainTools |
| EGDI | E-Government Development Index | Index (0 - 100) | UN |
Data Processing
Raw data comes in different shapes and forms and often contains artifacts - some datasets are normally distributed, while others are skewed. Before running any calculation or aggregation, we need to impute missing data and identify and handle outliers.
Missing Data
The following techniques have been used to impute missing data:
| Indicator | Technique | Details |
|---|---|---|
| Affordability | Substitution | We replace missing values with data from adjacent years |
| Fixed / Mobile Internet Performance | Substitution | We substitute mobile data for fixed data and vice-versa where values are otherwise unavailable |
| Market Concentration | Backward fill | Initial gaps in data are filled with first available datapoints |
| Fixed / Mobile Internet Performance, HTTPS Adoption, Market Concentration, Secure Internet Servers | Forward fill | Gaps in data are filled with most recent earlier datapoints |
| IPv6 | Substitution | We impute a value of 0 where datapoints are otherwise unavailable |
| Spectrum Allocation, Network Coverage | Substitution | Replacement by data from a country from the same region with similar GDP per capita |
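The forward-fill and backward-fill techniques from the table above can be sketched with pandas. This is an illustration with hypothetical values, not the IRI's actual pipeline; country codes and quarters are made up:

```python
import pandas as pd

# Quarterly series for one indicator, one row per country (hypothetical data).
df = pd.DataFrame(
    {"2023Q1": [None, 40.0], "2023Q2": [55.0, None], "2023Q3": [None, 42.0]},
    index=["AA", "BB"],
)

# Forward fill: gaps take the most recent earlier datapoint.
filled = df.ffill(axis=1)
# Backward fill: initial gaps take the first available datapoint.
filled = filled.bfill(axis=1)
```

After both passes, country "AA" (which only has a 2023Q2 value) carries 55.0 across all quarters, while "BB" keeps its observed endpoints and fills the middle quarter forward.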
Re-scaling and Treating Outliers
The scales used by the indicators are also different e.g., latency can range between 0 – 500ms, while domain count for a ccTLD can range between 0 – 2,000,000. It is important to scale the data so that indicators are comparable to one another, and to avoid the issue of the size of the country (i.e., larger countries in terms of population or GDP tend to have more networks, IXPs, datacenters, etc.).
Outliers, on the other hand, tend to skew the data and can therefore affect the overall score calculation, especially because the Internet Society Pulse IRI uses the min-max normalization method to scale the data (see Min-Max Normalization below). If an indicator has a very high or very low value, this is reflected directly in the min-max calculation.
The following transformations have been applied to the listed indicators as part of the framework:
- Denomination by population size: Number of datacenters, Number of domains
- Denomination by number of cities: Number of IXPs
- Log transformation*: Secure Internet Servers, Fixed/Mobile Internet performance
* A logarithmic transformation is useful for treating skewed datasets and compressing extreme values. Not only does it scale the data, it also reduces the influence of outliers while preserving the relative differences between values.
After scaling and transforming the above indicators, we run a check on the skewness and kurtosis values of the remaining indicators. For those having a skewness > 2 or kurtosis > 3.5 (general thresholds for outlier detection), the IRI makes use of the IQR (Interquartile Range: Q3 - Q1) method to trim down outliers. The following rules are applied:
- Any value greater than Q3 + 1.5*IQR is replaced by Q3 + 1.5*IQR
- Any value less than Q1 – 1.5*IQR is replaced by Q1 – 1.5*IQR
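The two outlier treatments above can be sketched with NumPy. The function names are ours; `iqr_clip` implements exactly the two replacement rules listed, and `log_scale` uses `log1p` so zero-valued datapoints remain valid:

```python
import numpy as np

def iqr_clip(values):
    """Replace values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] with the bound."""
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    return np.clip(v, q1 - 1.5 * iqr, q3 + 1.5 * iqr)

def log_scale(values):
    """Log transformation for heavily skewed indicators (log1p handles zeros)."""
    return np.log1p(np.asarray(values, dtype=float))
```

For the sample `[1, 2, 3, 4, 100]`, Q1 = 2 and Q3 = 4, so the upper fence is 4 + 1.5 × 2 = 7 and the extreme value 100 is replaced by 7; the in-range values are untouched.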
Min-Max Normalization
The next step, after cleaning and transforming the data, is normalization. Normalization is important because indicators are collected using different units of measurement (percentage, ms, Mbps, count, etc.). It is therefore important to rebase them to a common unit such as a 0 - 100 scale, where 100 usually refers to the strongest and 0 to the weakest value.
The method chosen was the min-max normalization which is a common technique used by multiple known indices and as opposed to other techniques such as ranking and categorical scales, min-max keeps the interval between the countries consistent.
Below are the formulas the Internet Society Pulse IRI uses to calculate the normalized value of an indicator, depending on whether it is positive or negative. For a positive indicator:

x' = 100 × (x - min(x)) / (max(x) - min(x))

Positive indicators contribute towards increasing the index; negative indicators contribute to a decrease in the score, which is why we take the complement:

x' = 100 × (max(x) - x) / (max(x) - min(x))
We chose not to use the z-score standardization technique (which centres values on the mean with unit standard deviation) as not all the indicators followed a normal distribution.
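Both min-max variants can be captured in one small function (a sketch; the function name and the choice to return 0 for a constant series are our assumptions):

```python
def min_max(values, positive=True):
    """Min-max normalize a series to a 0-100 scale.

    positive=True:  higher raw value -> higher score.
    positive=False: higher raw value -> lower score (negative indicator).
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # degenerate case: all values equal
    if positive:
        return [100.0 * (v - lo) / span for v in values]
    return [100.0 * (hi - v) / span for v in values]
```

For example, latency (a negative indicator) of `[0, 5, 10]` ms maps to `[100, 50, 0]`, while the same series treated as a positive indicator maps to `[0, 50, 100]`.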
Finally, the IRI only includes countries for which we have data (after imputation etc.) for all indicators and for every quarter since 2019 Q1.
Weighting and Aggregation
Assigning Weights
There are two main ways to aggregate the normalized indicators into a final score:
- An ad-hoc weighting scheme.
- Statistical (optimization) techniques.
The Internet Society Pulse IRI uses a weighting scheme as it is the simpler of the two techniques and relies on input that the Internet Society gathered through surveys and discussions with subject matter experts.
During the weighting process, the importance of each indicator was also considered using a lifecycle approach. For example, in the Performance pillar, the following weights were assigned to the underlying dimensions: Fixed networks (40%) and Mobile networks (60%). Higher weight was given to mobile networks as, globally, they are more widely relied upon for Internet access.
In the Internet Society Pulse IRI framework, the indicators are grouped into different dimensions, and the dimensions into pillars, which provide their own quantitative measures of a specific aspect of Internet resilience. Below is a table showing the indicators, dimensions and pillars and their associated weights, used for the calculation of the Internet Society Pulse IRI.
The weights are revisited on an annual basis.
| Pillar | Weight (%) | Dimension | Weight (%) | Indicator | Weight (%) |
|---|---|---|---|---|---|
| Infrastructure | 25 | Mobile connectivity | 50 | Network Coverage | 70 |
| | | | | Spectrum Allocation | 30 |
| | | Enabling infrastructure | 50 | Number of IXPs | 50 |
| | | | | Datacenters | 50 |
| Performance | 25 | Fixed networks | 40 | Latency | 20 |
| | | | | Upload | 30 |
| | | | | Download | 30 |
| | | | | Jitter | 20 |
| | | Mobile networks | 60 | Latency | 20 |
| | | | | Upload | 30 |
| | | | | Download | 30 |
| | | | | Jitter | 20 |
| Enabling technologies and security | 25 | Enabling technologies | 20 | IPv6 | 30 |
| | | | | HTTPS | 70 |
| | | DNS ecosystem | 30 | DNSSEC Validation | 50 |
| | | | | DNSSEC Adoption | 50 |
| | | Routing hygiene | 30 | MANRS Readiness | 50 |
| | | | | Upstream Redundancy | 50 |
| | | Security threat | 20 | Secure Internet Servers | 30 |
| | | | | Global Cybersecurity Index | 40 |
| | | | | DDoS Potential | 30 |
| Local ecosystem & Market readiness | 25 | Market structure | 50 | Affordability | 40 |
| | | | | Market concentration | 30 |
| | | | | Upstream provider diversity | 30 |
| | | Traffic localization | 50 | Peering efficiency | 40 |
| | | | | Domain count | 30 |
| | | | | EGDI | 30 |
Aggregation
The Internet Society Pulse IRI uses a weighted sum formula at each level (indicator, dimension, and pillar) to aggregate the data into a composite score. The following formula was used:
IRI_c = Σ_p (w_p × P_p,c)

Where:

P_p,c = Σ_d (w_d × D_d,c)

And where:

D_d,c = Σ_i (w_i × I_i,c)
In simple terms, the final index 𝐼𝑅𝐼 of country "c" is the sum of the weighted pillars "P". A pillar is the weighted sum of the underlying dimensions "D" and a dimension is the weighted sum of the indicators "I", all of country "c".
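The three-level roll-up can be sketched with a single weighted-sum helper applied at each level. The scores and weights below are hypothetical, purely to show the mechanics:

```python
def weighted_sum(scores, weights):
    """Weighted sum of child scores; weights are percentages summing to 100."""
    return sum(s * w for s, w in zip(scores, weights)) / 100.0

# Hypothetical roll-up: indicators -> dimension -> pillar -> index.
dimension = weighted_sum([80.0, 60.0], [70, 30])                   # two indicators
pillar = weighted_sum([dimension, 50.0], [50, 50])                 # two dimensions
iri = weighted_sum([pillar, 70.0, 65.0, 55.0], [25, 25, 25, 25])   # four pillars
```

Because every level normalizes by the weight total, the final score stays on the same 0-100 scale as the normalized indicators.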
Feedback
For any questions, comments, and feedback on the Internet Society Pulse IRI, please contact the Internet Society Pulse team ([email protected]).
Acknowledgements
The Internet Society would like to thank the following contributors for their valuable input to the conception of the Internet Society Pulse Internet Resilience Index (IRI). Amreesh Phokeer (Internet Society), Kevin Chege (Internet Society), Assane Gueye (Carnegie Mellon University-Africa), Josiah Chavula (University of Cape Town), and Ahmed Elmokashfi (Simula Research Lab).
