On Sunday 5 February, Italy’s largest Internet Service Provider (ISP), TIM (formerly Telecom Italia), suffered a major network outage that affected more than one in three of Italy’s Internet users for nearly five hours.
Now that the dust has settled, let’s look at exactly what happened and, in doing so, discuss why it is important for a country’s networks to have appropriate redundancy and to peer with local partners, so that outages of this scale don’t happen.
Why Did One in Three Internet Users in Italy Lose Internet Connectivity?
As shown by the IIJ Network Dependency list (Figure 1), three of Italy’s eight largest networks (by user base) are managed by the Telecom Italia group: AS6762 SEABONE-NET (Sparkle), AS3269 ASM-IBSNAZ (the national landline network) and AS16232 ASN-TIM (TIM’s mobile network).
Sparkle is an international carrier that serves as a transit provider for many networks in Italy but does not directly serve any end users. It operates as a separate entity but is wholly owned by Telecom Italia. TIM’s landline network (AS3269) directly serves 20.8% of the population, and because other networks use it for transit, it indirectly serves an estimated further 8.5%. Adding the 8.4% of the population served by TIM’s mobile network (AS16232) brings the total to around 38% of Italy’s inhabitants. This means Sunday’s outage affected more than one third of the country.
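The 38% figure is a simple sum of the three shares quoted above. A minimal Python sketch of that arithmetic, assuming (as the text implicitly does) that the three groups of users don’t overlap, so their shares can be added directly:

```python
# Population shares quoted above for the Telecom Italia group's networks, in percent.
# Assumption: the three groups do not overlap, so the shares can simply be added.
shares = {
    "AS3269 landline, direct users": 20.8,
    "AS3269 landline, via transit": 8.5,
    "AS16232 mobile": 8.4,
}

total = sum(shares.values())
print(f"Estimated share of population affected: {total:.1f}%")
print(f"More than one in three: {total > 100 / 3}")
```

If landline and mobile users overlap in practice, the real share of distinct people affected would be somewhat lower than this sum.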
The Italian Network Operators Group (ITNOG) runs a Telegram group that was very active during the outage, with many participants trying to understand what was happening, and Sparkle representatives providing updates on how they were resolving the issue.
As of the time of writing, we don’t have a detailed description of what happened, only that:
- The issue was related to international connectivity, which TIM buys from Sparkle.
- The issue also affected DNS resolvers on TIM’s network, and apparently some of its PPPoE servers.
- Connectivity between some locations in the country was also disrupted around the same time.
- Although this domestic disruption affected different networks and was reported by only a handful of people, it is still worth noting, as faulty cables might have triggered issues on routers or other networking equipment.
While we wait for postmortems from both TIM and Sparkle, we can speculate that these issues could have been avoided if TIM had greater redundancy and more interconnection.
The Importance of Redundancy
If you are a network operator, fault handling should be a priority, as should designing your network so that it has no single point of failure. TIM is not a good example of either: it relies solely on Sparkle for international connectivity, as can be seen on the Hurricane Electric BGP Toolkit (Figure 2).
It is common practice for operators to be multihomed, which means buying so-called ‘transit’ to the rest of the Internet from more than one upstream provider.
For example, Figure 3 shows that Vodafone (AS30722), one of TIM’s competitors in Italy, has five upstream providers: AS3356 (Lumen, formerly CenturyLink and Level 3), AS1299 (Arelion, formerly Telia Carrier, also known as Twelve99), AS1273 (Vodafone Global Network), AS5396 (IRIDEOS SpA) and AS6939 (Hurricane Electric). This setup provides plenty of redundancy if one of these upstreams has an outage.
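To make the single-point-of-failure argument concrete, here is a minimal Python sketch that takes the upstream lists mentioned above (Sparkle as TIM’s only upstream, and Vodafone’s five upstreams) and checks whether each network would keep some transit if any one upstream went down. It is a static snapshot for illustration: the dictionary and function names are hypothetical, and it only counts BGP-level upstreams, so it cannot detect shared physical infrastructure such as a common cable or data centre.

```python
# Upstream (transit) providers per network, as listed in the text above.
# Static snapshot for illustration only, not live BGP data.
UPSTREAMS: dict[str, set[int]] = {
    "AS3269 (TIM)": {6762},  # Sparkle is the only upstream
    "AS30722 (Vodafone)": {3356, 1299, 1273, 5396, 6939},
}


def has_transit_after(upstreams: set[int], failed: int) -> bool:
    """True if at least one upstream remains when the `failed` upstream goes down."""
    return len(upstreams - {failed}) > 0


def survives_any_single_outage(upstreams: set[int]) -> bool:
    """True if no single upstream outage removes all transit (i.e. the network is multihomed)."""
    # A network with no upstreams at all has no transit to lose, so treat it as failing.
    return bool(upstreams) and all(has_transit_after(upstreams, asn) for asn in upstreams)


for name, ups in UPSTREAMS.items():
    verdict = "survives" if survives_any_single_outage(ups) else "does NOT survive"
    print(f"{name}: {len(ups)} upstream(s), {verdict} a single upstream outage")
```

With a single upstream, that upstream is by definition a single point of failure; with five, any one of them can fail and four remain.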
A Case for Why Peering with Other Local Networks Is Important
Before 2013, TIM, as Italy’s incumbent telco, was required by law to peer with every Italian operator. This made its peering matrix very complex but also very diverse. However, the rules changed in 2013, and almost all of TIM’s peering relationships were subsequently discontinued. You can learn more from Marco d’Itri’s presentation at Salottino MIX 2014 and from the presentation by Luca Cicchelli (TOP-IX) and Mauro Magrassi (MIX) at the 2013 European Peering Forum.
What this means is that, in many cases, if local Internet users want to reach users on one of TIM’s networks, their traffic has to transit via Frankfurt, Germany, where it enters the Sparkle network (TIM’s only upstream provider), before being delivered back to TIM in Milan. This increases the latency and cost of reaching TIM’s networks, and could be avoided if TIM peered locally at one of the many Internet Exchange Points (IXPs) in Italy.
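The detour described above can be sketched as a breadth-first search over a tiny, hypothetical AS-level graph. AS64500 (a local Italian network) and AS64501 (its transit provider) are made-up documentation-range ASNs, and the handoff cities are assumptions that follow the path described in the text; only AS6762 (Sparkle) and AS3269 (TIM) come from this article. Choosing the path with the fewest AS hops is a simplification of real BGP route selection, but under the usual convention of preferring routes learned from peers over routes learned from providers, the direct peering link would be chosen here too.

```python
from collections import deque

# Hypothetical topology for illustration. AS64500 and AS64501 are made-up
# documentation ASNs; AS6762 (Sparkle) and AS3269 (TIM) are from the text.
# Each link records the (assumed) city where the two networks hand over traffic.
LINKS = [
    (64500, 64501, "Milan"),      # local network buys transit from AS64501
    (64501, 6762, "Frankfurt"),   # transit provider meets Sparkle in Frankfurt
    (6762, 3269, "Milan"),        # Sparkle delivers traffic to TIM back in Milan
]
LOCAL_PEERING = (64500, 3269, "Milan (MIX)")  # hypothetical direct peering at an IXP


def shortest_as_path(links, src, dst):
    """Breadth-first search for the path with the fewest AS hops.

    Returns (list of ASNs, list of handoff cities), or (None, None) if unreachable.
    """
    graph = {}
    for a, b, city in links:
        graph.setdefault(a, []).append((b, city))
        graph.setdefault(b, []).append((a, city))
    queue = deque([(src, [src], [])])
    seen = {src}
    while queue:
        node, path, cities = queue.popleft()
        if node == dst:
            return path, cities
        for nxt, city in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt], cities + [city]))
    return None, None


for label, links in [("transit only", LINKS), ("with local peering", LINKS + [LOCAL_PEERING])]:
    path, cities = shortest_as_path(links, 64500, 3269)
    hops = " -> ".join(f"AS{a}" for a in path)
    print(f"{label}: AS path {hops}; handoffs in {', '.join(cities)}")
```

With only transit links, the path crosses Frankfurt before returning to Milan; adding a single peering session at an Italian IXP keeps the traffic in Milan.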
This situation is TIM’s own choice: it has no technical basis and is instead a political stance. Peering with most of the networks at the two peering points where TIM has an active port (NAMEX in Rome and MIX in Milan) would require minimal effort.
In 2020 and 2021, TIM temporarily set up peering sessions with any network that requested one, to “absorb” the traffic of people working from home during lockdowns. TIM had also set up a peering agreement with TOP-IX, but all of this was discontinued in late 2021.
This outage clearly shows how important IXPs and local content are. This is an area of work the Internet Society has been focusing on. We work to promote IXPs and interconnection, and we are helping countries keep traffic local through our 50/50 vision, an ambitious plan to keep at least half of all Internet traffic in emerging economies local by 2025.