Jonathan Zittrain* and Benjamin Edelman**
A variety of organizations, institutions, companies, and countries seek to restrict Internet access from within their premises and territories. For example, companies may seek to improve employee productivity by restricting access to leisure sites; libraries and schools may seek to avoid exposing children to sexually-explicit content, or be required to do so; countries may seek to control the information received by their citizens generally. Common among nearly all these applications is the public unavailability of the filtering lists -- that is, by the design of filtering systems, users cannot and do not know the set of specific sites blocked. In some cases users might ask for a specific site and be told of its unavailability due to filtering, but in other cases such unavailability may be conflated with unremarkable network blockages -- a Web site might be unreachable for any number of reasons, and the failure to view it at a particular moment cannot reliably be attributed to active filtering.
With this project we seek to document and analyze a large number of Web pages blocked by various types of filtering regimes, and ultimately create a distributed tool enabling Internet users worldwide to gather and relay such data from their respective locations on the Internet. We can thus start to assemble a picture not of a single hypothetical World Wide Web comprising all pages currently served upon it, but rather a mosaic of webs as viewed from respective locations, each bearing its own limitations on access. As various countries, companies and other entities employ or consider employing filtering software, documentation of the specific details, successes, and in some instances flaws of existing filtering efforts may prove helpful. (See European Union Internet Action Plan - Filtering & Rating, among other entities considering filtering.)
In general we seek to focus on those blocked pages likely to be most controversial -- for example, those blocked pages seemingly inconsistent with the criteria of the blocking regimes that respectively restrict their retrieval, as well as blocked pages that are well-known or frequently requested. However, to provide a more complete sense of the state of Internet filtering efforts, we also seek to investigate all blocking, including the restriction of access to web pages consistent with censors' category definitions.
Previous efforts to document the extent of government filtering have been made by researchers at the Soros Foundation's Internet Censorship Project. In that work, the ICP sent correspondents around the world to collect anecdotal data about filtering efforts worldwide by manually searching for some well-known Web pages; we build on their work by invoking automated methods to test and document thousands of pages blocked by each country or other blocking system studied. We wish to similarly augment the efforts described in such ventures as Shanthi Kalathil and Taylor Boas's "The Internet and State Control in Authoritarian Regimes" and Radio Free Europe's "20 Enemies of the Internet." Finally, our work follows a series of projects intended to document sites blocked -- and in many instances arguably wrongly blocked -- by major commercial Internet filtering applications; such projects include Bennett Haselton's Peacefire and Seth Finkelstein's Anticensorware Investigations as well as one author's Sites Blocked by Internet Filtering Programs.
In future work, the authors intend to expand analysis to Internet filtering systems in additional countries. To date, our methodology is limited to obtaining Internet access through a given country and testing a set of URLs for blockages; however, our ultimate aim is to develop a distributed software application for use in testing, analyzing, and documenting Internet filtering regimes worldwide. This application will enable Internet users to easily test what is and is not filtered from their respective locations on the network, relaying results back for analysis and documentation.

Country-Specific Studies
From May 14 to 27, 2002, the authors investigated Internet filtering in Saudi Arabia. We found a total of 2,038 distinct web pages blocked, including over a thousand pages containing no sexually-explicit content. In testing a sample of 795 well-known sexually-explicit sites, we found 86.2% blocked.
From March to November 2002, the authors tested more than two hundred thousand sites through Internet access in China, finding nearly twenty thousand to be blocked. Blocked sites included sites operated by world governments, non-governmental organizations, and educational institutions, as well as sites with health, entertainment, and political content.
Real-Time Testing of Internet Filtering in China (August 2002) - An experiment in "open research," this system allows anyone interested to suggest a site for testing in China and to find out immediately whether that site is accessible from within China.
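The per-URL check underlying such testing can be sketched in a few lines. The code below is our own illustrative reconstruction, not the authors' actual tool: it probes a URL from one vantage point, classifies the failure mode, and flags a URL as suspected-filtered only when it fails from the filtered network but succeeds from an unfiltered one -- reflecting the caution, noted above, that a single failed retrieval cannot reliably be attributed to active filtering.

```python
import socket
import urllib.request
import urllib.error

def probe(url, timeout=10):
    """Fetch a URL and classify the outcome as one of:
    'ok', 'http_error', 'timeout', 'dns_failure', or 'unreachable'.
    A single failure is not proof of filtering."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 'ok' if resp.status == 200 else 'http_error'
    except urllib.error.HTTPError:
        return 'http_error'
    except urllib.error.URLError as e:
        reason = e.reason
        if isinstance(reason, socket.timeout):
            return 'timeout'
        if isinstance(reason, socket.gaierror):
            return 'dns_failure'
        return 'unreachable'
    except socket.timeout:
        return 'timeout'

def suspected_filtered(inside_result, outside_result):
    """Flag a URL only when it fails from the filtered vantage point
    but succeeds from an unfiltered control vantage point."""
    return inside_result != 'ok' and outside_result == 'ok'
```

In practice a real study would repeat each probe over time and compare several control vantage points before drawing conclusions, since transient outages mimic filtering.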
Screenshots documenting the replacement of requests for google.com with the pages of other search engines. (September 2002)
Edelman writes an op-ed column for the South China Morning Post. He argues that an important, but often unrecognized, aspect of China's Internet filtering is the extreme secrecy that surrounds the practice.
"Blocked Sites Will Return, but with Limited Access." The South China Morning Post. (January 2003)
Edelman notes blocking by China of more than a million user-designed sites hosted on Blogspot.com. He argues that as China begins to use more granular methods of blocking, its efforts may encounter less resistance and cause less controversy.
Although Google's SafeSearch feature is intended to omit only sexually-explicit content from Google's search results, it actually blocks considerably more, including at least tens of thousands of web pages without any sexually-explicit content. Edelman finds that blocked results include sites operated by educational institutions, non-profits, news media, and national and local governments.
When filtering operates at the level of a web server's IP address, all content on that web server is typically blocked if any of its content is to be blocked. In this project, Edelman determines the rate at which web servers host multiple web sites -- finding that more than 87% of active domain names share their IP addresses with one or more additional domains, while more than two thirds of domain names share their IPs with fifty or more domains.
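The measurement behind such a finding can be illustrated with a short sketch -- a hypothetical simplification of our own (Edelman's study drew on large-scale reverse-IP data, not this code): resolve each domain in a sample to an IP address, then count how many domains share their address with at least one other. The `resolve` parameter accepts a stub lookup table so the logic can be exercised offline; the domain names in the test are invented.

```python
import socket
from collections import Counter

def ip_sharing(domains, resolve=None):
    """Return (shared, total): the number of resolvable domains whose IP
    is shared with at least one other domain in the sample, and the
    total number of resolvable domains. `resolve` defaults to a real
    DNS lookup but may be replaced with a stub for offline testing."""
    resolve = resolve or socket.gethostbyname
    ip_of = {}
    for d in domains:
        try:
            ip_of[d] = resolve(d)
        except OSError:
            continue  # unresolvable domains are excluded from the sample
    per_ip = Counter(ip_of.values())
    shared = sum(1 for ip in ip_of.values() if per_ip[ip] > 1)
    return shared, len(ip_of)
```

On a sample where two of four domains share one address, the function reports that half the sample would be collaterally blocked by an IP-level filter aimed at either of them.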
This project documents exclusions from Google's French and German sites, in apparent efforts to comply with local content restrictions. A further experiment in open research, this system also allows anyone interested to suggest a site for immediate testing.
In testing of February 2001 to April 2002, as part of expert work in the case of American Library Association v. United States, 201 F.Supp.2d 401 (E.D.Pa., 2002), Edelman investigated Internet blocking by leading commercial filtering packages N2H2, Smartfilter, Surfcontrol, and Websense. Edelman reports a total of 6,777 distinct web pages found likely to be wrongly blocked, judged against filtering companies' own category definitions.
Other Analyses and Projects
Represented by the ACLU, Edelman seeks a declaratory judgment permitting him to extract N2H2's block list, analyze it, report its content, and share the tools used to do so -- without fear of suit by N2H2. The lawsuit, filed in Massachusetts federal court, is pending.
Zittrain writes a "Perspective" column for CNET News.com. Zittrain argues that filtering of Internet content is on the upswing, a trend that -- if left unchecked -- threatens to undo a basic underpinning of the global cybernetwork.
* Jack N. and Lillian R. Berkman Assistant Professor of Entrepreneurial Legal Studies, Harvard Law School.
** J.D. Candidate, Harvard Law School, 2005.
Last updated February 17, 2008