Mapping Local Internet Control

from the Berkman Center for Internet & Society at Harvard University

RESEARCH
Overview

There are over 1.5 billion users on the Internet, but all of those users connect through only about 30,000 autonomous systems (ASs). ASs are generally Internet service providers but can also be large companies, universities, and other organizations that act as independent entities on the Internet. These ASs are responsible for assigning individual IP addresses and for routing traffic between individual machines and the wider Internet. So controlling the traffic (filtering, surveilling, blocking, etc.) of those 1.5 billion users only requires controlling those 30,000 ASs. But the vast majority of those ASs are small organizations that rely on one or more larger ASs for access to the wider Internet, so the vast majority of traffic flows through a much smaller slice of fewer than a thousand large ASs.

It is well understood now that governments exert various kinds of control over their local zones of the Internet, including filtering offensive sites, surveilling the activities of users, and controlling which users can access the Internet at all. When broken down into individual countries, the number of ASs that have access to almost all of the traffic within a given country is at most a few dozen ASs and often only a few ASs, even for the biggest countries. For example, we have found that in China over 90% of the country's approximately 190 million IP addresses ultimately connect to the wider Internet through one of only 4 ASs, while in Russia 90% of the country's approximately 20 million IP addresses connect through one of 36 ASs.

The maps on this site are a rough attempt to map how the ASs in each given country connect to one another and to the rest of the world, with a particular eye toward developing metrics for the relative costs of controlling the Internet in each country. For each country, we provide a visual map of the network of ASs, as well as the following core metrics based on the country AS network data:

  • Connected IPs: the total number of IP addresses connected to the Internet through this AS, including the IP addresses connected through any child ASs
  • Points of Control: the minimum set of ASs required to connect to 90% of the IP addresses in each country
  • IPs per Point of Control: the total number of IP addresses in each country divided by the number of points of control
  • Complexity: the overall complexity of the network connecting the ASs in each country

We provide lists of all countries ranked by each of the Complexity and IPs per Point of Control metrics.

We intend the IPs per Point of Control metric as a measure of the relative ease of centrally controlling the Internet traffic within the country. China, with 4 points of control and over 190 million IP addresses, has nearly 50 million IPs per point of control. This means that China can operate a highly centralized control structure for its Internet, controlling an average of nearly 50 million IP addresses at each of its core points of control, and indeed China has installed extensive filtering machinery at these points of control. Russia, by contrast, has 36 points of control for only 20 million IP addresses, meaning that each of those points of control reaches fewer than a million IP addresses, so control requires a much more complex, distributed system. And as expected, despite its otherwise authoritarian government, no evidence has been found of widespread Internet filtering by Russia.
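The arithmetic behind the two examples above can be reproduced with a minimal sketch. The function name is hypothetical, and the inputs are the rounded figures quoted in the text rather than the underlying data.

```python
def ips_per_point_of_control(total_ips: int, points_of_control: int) -> float:
    """Total IP addresses in a country divided by its number of points of control."""
    if points_of_control <= 0:
        raise ValueError("a country needs at least one point of control")
    return total_ips / points_of_control


# Rounded figures quoted in the overview (assumed inputs, not the raw data set).
china = ips_per_point_of_control(190_000_000, 4)    # 47.5 million per point
russia = ips_per_point_of_control(20_000_000, 36)   # a little over half a million

print(f"China:  {china:,.0f} IPs per point of control")
print(f"Russia: {russia:,.0f} IPs per point of control")
```

The ratio between the two results (roughly 85 to 1) is what the metric is meant to surface: far fewer, far larger choke points in one country than in the other.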

A network that is strongly controlled at the center may nonetheless have a great deal of complexity at the edges. This complexity may make it difficult to control who connects to the network and how, while still allowing for control of the centralized flow of data. We intend the Complexity metric to measure the degree to which the network itself is complex, meaning that it has lots of connections between ASs and that most of its IPs are located near the edges rather than in or near the core.

There are a number of important limitations to both the data sets and analyses used for this project, explained in detail in the caveats section below. The two most important caveats are that we do not include peer relationships in the metrics and that we assume that child ASs with more than one parent share their routes equally among all parents.

Data

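We generate all of the maps and metrics on this site from four sources of data, each named in the methods below: the CAIDA AS Relationships data set, the Team Cymru service, the CAIDA AS Taxonomy data set, and the CAIDA Routeviews Prefix to AS Mappings data set.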

Methods

To generate the individual country maps, we:

  • Generate a list of all traced ASs in the CAIDA AS Relationships data set.
  • Look up the name and country of each AS using the Team Cymru service.
  • Look up the type of each AS using the CAIDA AS Taxonomy data set.
  • Divide the CAIDA AS Relationships data into countries using the Cymru country data. For each country, reassign all foreign provider ASs to the single, virtual 'Rest of World' AS and drop all other foreign relationships.
  • Generate a network graph for each country's ASs and relationships using the Flare Visualization library.
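The country-splitting step above might be sketched as follows. This is an illustration under stated assumptions, not the project's code: relationships are taken to be (provider, customer) pairs of AS numbers, `country_of` is a hypothetical lookup table built from the Cymru country data, and a provider whose country is unknown is treated as foreign.

```python
from collections import defaultdict

REST_OF_WORLD = "ROW"  # hypothetical label for the single virtual 'Rest of World' AS


def split_by_country(relationships, country_of):
    """Group customer-provider links by the customer's country.

    Foreign providers collapse into REST_OF_WORLD. Links whose customer is
    foreign to the provider's country are dropped from that country's map,
    although they still appear in the customer's own country.
    """
    edges = defaultdict(set)
    for provider, customer in relationships:
        customer_country = country_of.get(customer)
        if customer_country is None:
            continue  # no country data for this AS: skip it (assumption)
        if country_of.get(provider) == customer_country:
            edges[customer_country].add((provider, customer))
        else:
            # A set keeps one Rest of World link per customer, however many
            # foreign providers that customer actually has.
            edges[customer_country].add((REST_OF_WORLD, customer))
    return dict(edges)


# Toy data: AS 1 and AS 2 are in CN; AS 1 buys transit from two foreign ASs.
links = [(1, 2), (100, 1), (200, 1), (1, 300)]
countries = {1: "CN", 2: "CN", 100: "US", 200: "JP", 300: "US"}
print(split_by_country(links, countries))
```

In the toy data, the link from AS 1 down to its foreign customer AS 300 does not appear in the CN map, matching the caveat that links from the rest of the world into a country are omitted.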

To generate the Connected IPs, we:

  • Look up the number of allocated IP addresses for each ASN in the CAIDA Routeviews Prefix to AS Mappings data set.
  • Recursively traverse up the tree of ASs for the country, adding the number of IPs of each AS both to its own connected IPs count and to its parent's. If an AS has more than one parent, add to each parent the connected IPs of the child divided by the number of parents. This is only a rough approximation of multiple-parent relationships; see caveats below.
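The Connected IPs calculation above can be sketched as a memoized recursion over each AS's children. The data shapes are assumptions for illustration: `own_ips` maps each AS to its allocated address count, `parents` maps each child AS to the list of its in-country parents, and the graph is assumed to contain no cycles.

```python
from collections import defaultdict
from functools import lru_cache


def connected_ips(own_ips, parents):
    """Connected IPs per AS: own IPs plus an equal share of each child's total."""
    children = defaultdict(list)
    for child, parent_list in parents.items():
        for parent in parent_list:
            children[parent].append(child)

    @lru_cache(maxsize=None)
    def total(asn):
        # Each child's connected IPs are divided equally among its parents.
        shares = sum(total(c) / len(parents[c]) for c in children[asn])
        return own_ips.get(asn, 0) + shares

    return {asn: total(asn) for asn in own_ips}


# Toy network: D has two parents (B and C), which both hang off A.
own = {"A": 100, "B": 50, "C": 40, "D": 20}
parent_map = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(connected_ips(own, parent_map))
```

Here D's 20 addresses are split 10 and 10 between B and C, so the top-level AS A ends up with all 210 addresses in the toy network.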

To generate the Points of Control for each country, we use a simple greedy algorithm:

  • Start with the AS with the most connected IP addresses as a point of control.
  • Add up the connected IPs represented by all of the point of control ASs together, taking care to count child ASs appropriately (do not count ASs in the points of control list as children, and include only a proportional share of any child AS whose parents are not all in the points of control list).
  • Find the AS that would most increase the number of connected IP addresses of the points of control.
  • Repeatedly add the AS that would most increase the connected IPs to the points of control ASs until the sum of the connected IPs of all points of control ASs is greater than or equal to 90% of all IPs in the country.
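A minimal sketch of the greedy selection above follows. The function names and data shapes are hypothetical, and the proportional counting is one interpretation of the text: an AS counts in full if it is a point of control, and otherwise counts by the average covered fraction of its parents. The graph is assumed to be acyclic.

```python
def covered_ips(own_ips, parents, chosen):
    """IPs reachable through the chosen ASs, splitting multi-parent children equally."""
    memo = {}

    def fraction(asn):
        if asn in chosen:
            return 1.0  # a point of control covers all of its own IPs
        if asn not in memo:
            ps = parents.get(asn, [])
            # Average of the parents' covered fractions; 0 with no in-country parent.
            memo[asn] = sum(fraction(p) for p in ps) / len(ps) if ps else 0.0
        return memo[asn]

    return sum(ips * fraction(asn) for asn, ips in own_ips.items())


def points_of_control(own_ips, parents, threshold=0.9):
    """Greedily add the AS that most increases coverage until the threshold is met."""
    target = threshold * sum(own_ips.values())
    chosen = []
    while covered_ips(own_ips, parents, set(chosen)) < target:
        candidates = [a for a in own_ips if a not in chosen]
        best = max(
            candidates,
            key=lambda a: covered_ips(own_ips, parents, set(chosen) | {a}),
        )
        chosen.append(best)
    return chosen


# Toy network: A is the main provider; E connects only to the rest of the world.
own = {"A": 100, "B": 50, "C": 40, "D": 20, "E": 30}
parent_map = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(points_of_control(own, parent_map))
```

In the toy network, A alone reaches 210 of 240 addresses (87.5%), so the loop adds E to cross the 90% line. Like any greedy method, this is not guaranteed to find the true minimum set.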

To generate the Complexity for each country, we:

  • Use the equation C = ( AS * ∑( CI(a) / I ) ) / I, where
    • C = Complexity
    • AS = total number of ASs for country
    • ∑ = sum for each AS in country
    • CI(a) = connected IPs for AS
    • I = total IPs in country
  • Using this definition of complexity, a network is more complex if it:
    • has more ASs per IP address,
    • has more connections per AS, and
    • has more of its IP addresses located away from the core of the network.
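The Complexity equation above can be evaluated with a short sketch. It reads the formula as C = (AS * ∑(CI(a) / I)) / I; the function name and the toy inputs are hypothetical, and the connected IPs are assumed to have been computed already.

```python
def complexity(connected, total_ips):
    """C = (AS * sum(CI(a) / I)) / I for one country.

    connected: maps each AS to its connected IPs, CI(a)
    total_ips: total IP addresses in the country, I
    """
    if total_ips <= 0:
        raise ValueError("total_ips must be positive")
    n_as = len(connected)  # AS: total number of ASs in the country
    return n_as * sum(ci / total_ips for ci in connected.values()) / total_ips


# Two toy countries with the same 4 ASs and the same 210 IPs in total.
# 'deep': addresses sit below other ASs, so they are counted at several levels.
deep = complexity({"A": 210.0, "B": 60.0, "C": 50.0, "D": 20.0}, 210)
# 'flat': every AS connects straight to the rest of the world.
flat = complexity({"A": 100.0, "B": 50.0, "C": 40.0, "D": 20.0}, 210)

print(f"deep: {deep:.5f}  flat: {flat:.5f}")
```

The deeper toy network scores higher because addresses far from the core are counted once for each AS above them, which is how the sum rewards IPs located away from the core.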

Caveats

There are a number of important limitations to this project:

  • The numbers of IP addresses for each country refer to allocated rather than actively used addresses. Even to the degree that the IP address numbers represent IP addresses used, many users now connect to the Internet through network address translation, allowing many users per public IP address. So the number of IP addresses is only a loose approximation of the number of Internet users.
  • The core CAIDA AS Relationships data set infers the list of ASs and relationships from data gathered from a large set of traceroutes run from a distributed set of servers. This process of collection and inference finds only 86.7% of customer-provider relationships on the Internet, so not all customer links nor all ASs are represented.
  • The CAIDA route data is particularly sparse for small countries, where missing a few ASs may drastically impact the mapping. To avoid misleading results for under-sampled countries, we only include countries with observed ASs including a total of at least 25,000 IP addresses. Accordingly, we omit the following countries: Afghanistan; Aland Islands; American Samoa; Anguilla; Antigua and Barbuda; Benin; Bhutan; Botswana; British Indian Ocean Territory; Comoros; Congo, The Democratic Republic of the; Cook Islands; Djibouti; Ethiopia; French Guiana; Gambia; Greenland; Grenada; Guadeloupe; Guinea; Lesotho; Liberia; Malawi; Marshall Islands; Mauritania; Micronesia, Federated States of; Myanmar; Norfolk Island; Northern Mariana Islands; Palau; Reunion; Saint Kitts and Nevis; Saint Martin; Saint Pierre and Miquelon; Saint Vincent and the Grenadines; Samoa; Sierra Leone; Swaziland; Timor-Leste; Togo; Tonga; Turkmenistan; Vanuatu; Virgin Islands, British; Wallis and Futuna; Zimbabwe. We also eliminate the following regions: the African Regional Intellectual Property Organization and the European Union.
  • The AS Relationships data set only provides AS level relationships, which does not allow us to determine how traffic is routed among the many parents of a single child. So for each child with multiple parents, we assign to each parent the connected IPs of the child divided by the number of parents. Using more detailed, block level routing relationships would allow us to determine the real proportion of IP addresses routed to each parent.
  • The AS Relationships data set only finds 38.7% of peer-to-peer relationships. Because of the lack of peer relationships in the data set, we ignore them completely for this analysis. If most of the traffic we care about is served from within one of the ASs we categorize as points of control, or if most of the traffic is in the Rest of World for the country, then the peer relationships will not strongly affect the set of points of control (since the traffic will ultimately need to flow through the points of control in any case). But whatever portion of traffic for the country is served by ASs near the edges of the network could potentially avoid the points of control by flowing through peer connections. Likewise, a higher or lower proportion of peer connections than customer-provider connections within a country network could increase or lower the complexity of the country in relation to other countries.
  • We omit the United States because the U.S. has a large number of sparse Class A networks, vastly inflating the IP address numbers (1.2 billion addresses compared to 190 million for China, even though China and the U.S. have about the same number of Internet users).
  • We omit from the maps all customer -> provider relationships from the rest of the world to an AS within a country in order to make the links out to the rest of the world (the black node in the maps) much clearer.

Paper

Mapping Local Internet Control, 2011.