Analysis of Overbroad Mail Filtering

Overview

In an attempt to reduce unsolicited commercial email ("spam"), certain blocklist services have come to provide lists of IP addresses and networks thought to distribute such messages. This project analyzes these services in an attempt to document overbroad classifications -- the misclassification of legitimate sources of mail as sources of spam.

 

The Implementation of Mail Blocklist Services and the Potential for Error

A variety of blocklist services have come to designate those IP addresses and networks thought to distribute unsolicited email messages. Such blocklist services include the Mail Abuse Protection System (MAPS), Spamcop, and the Open Relay Database (ORDB); Computerized Horizons maintains a list of all such services. These services rely on a common design: When a mail server receives an incoming email message from a remote host, it checks the IP address of that host for listing in one or more of these services, and if the host's IP address is listed, the mail server discards or bounces the message at issue. That is, message categorization is made on the basis of the email host used to transmit or retransmit a message -- not on the basis of the contents of the message itself.
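The lookup described above can be sketched in a few lines. Blocklists of this kind are commonly published through DNS: the receiving server reverses the octets of the connecting host's IPv4 address, appends the blocklist's zone name, and treats any answer as a listing. The sketch below assumes that convention. The zone name "blocklist.example", the sample address, and both function names are hypothetical, and real services differ in their zones and return codes.

```python
import ipaddress
import socket


def blocklist_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a mail server would query for an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the zone answers for this address (needs network access)."""
    try:
        socket.gethostbyname(blocklist_query_name(ip, zone))
        return True
    except OSError:
        # No answer (or a failed lookup) is treated as "not listed".
        return False


# Hypothetical zone; only the name construction is shown here, no lookup is made.
print(blocklist_query_name("209.10.150.97", "blocklist.example"))
```

Note that nothing in this check examines the message: the decision depends only on the address of the connecting host, which is the property the rest of this section turns on.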

The design of this filtering system gives rise to at least three distinct classes of errors. First, the blocklist service may simply make a mistake -- categorizing a host as a source of unsolicited email when in fact no such unsolicited email has ever been sent through that host. Second, the blocklist may refuse a message simply for originating at a blocked host previously found to send unsolicited messages. If the host is blocked by a blocklist service, then all messages sent through it will fail to reach destinations using that blocklist service -- even though many of the specific messages blocked are themselves unobjectionable, and even though the objectionable use of the host may have been by an independent user of that host. Third, the blocklist may refuse a message for originating near a host previously found to send unsolicited messages -- for blocklists are known to categorize entire networks as origins of unsolicited email even when only a few hosts in those networks have produced unsolicited messages. ("Network blocking" by Spamcop.)

Blocklist services have also been alleged to lack procedural safeguards sufficient to reduce the potential for error. Appeals processes are informal ... In this context, lawsuits have been filed to challenge blocklist services' decisions as to site categorizations.

Blocklist services categorize sources of unsolicited email on the basis of IP address rather than domain name, introducing a further possibility of error. There is good reason to do so: the analysis proceeds on the basis of hosts known to send unsolicited email. But when many domains share one or several servers -- as is typical in commercial Internet hosting environments -- blocklist services risk particularly frequent and far-reaching errors. The remainder of this document considers these errors ...

In general, the specific contents of mail filtering blocklists are not publicly available for inspection in full. Instead, interested users can check the categorization of particular hosts on a host-by-host basis. (MAPS and Spamcop each offer a testing interface.) However, blocklist services intentionally keep the full list confidential. MAPS explains that "it is very important to our legal and civil position that no one ever acquire a full copy of the MAPS RBL without indemnifying us" and therefore requires consent to a license agreement before allowing full access to the list. Spamcop's blocklist access page mentions no legal or contractual requirements, but Spamcop does require payment of a $1,000 fee to obtain access to its blocklist.

As a result, to date the author knows of no comprehensive examination of the specific sites filtered by blocklist services.

 

Methodology & Results

The author learned from Justin Greene of Clicvu that blocklist service Spamcop has categorized Clicvu's ISP, Globix, as a source of unsolicited email. Apparently one Globix customer had sent unsolicited email from IP addresses within 209.10.150.96/27 (a block of 32 IP addresses), but Spamcop has categorized the entire network block 209.10.150.0/24 (a block of 256 addresses) as a source of unsolicited email. According to another affected Globix customer (and consistent with prior reports of methods of blocklists), Spamcop blocked a broad range of IPs in order to "apply pressure" to Globix to take action against the apparent source of unsolicited messages. Meanwhile, other customers in this portion of the Globix network are miscategorized by Spamcop as senders of unsolicited email -- notwithstanding never having sent such email.
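The block sizes above follow directly from the prefix lengths, and the arithmetic can be checked with Python's standard ipaddress module. This is a small sketch for illustration only; the function name collateral_addresses is hypothetical, and the count is of addresses, not of customers or domains.

```python
import ipaddress

# The two ranges named in the text.
offending = ipaddress.ip_network("209.10.150.96/27")
blocked = ipaddress.ip_network("209.10.150.0/24")


def collateral_addresses(offending, blocked) -> int:
    """Count addresses in the blocked range that lie outside the offending range."""
    if not offending.subnet_of(blocked):
        raise ValueError("offending range is not inside the blocked range")
    return blocked.num_addresses - offending.num_addresses


print(offending.num_addresses)                   # 32
print(blocked.num_addresses)                     # 256
print(collateral_addresses(offending, blocked))  # 224
```

On these figures, 224 of the 256 blocked addresses fall outside the range from which the unsolicited email was apparently sent.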

The author has developed methods to determine most or all domain names that are hosted in a given IP range. This information is not typically available; the DNS's reverse-lookup system ("in-addr.arpa") is notoriously unreliable, and ... .
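For context on the reverse-lookup system mentioned above, the following sketch shows how the "in-addr.arpa" names for a range of addresses are formed. It is not the author's method, which the text does not describe; it only illustrates the standard reverse-lookup naming, and the function name reverse_lookup_names is hypothetical. No DNS queries are made.

```python
import ipaddress


def reverse_lookup_names(network: str) -> list:
    """Return the in-addr.arpa name for every address in an IPv4 network."""
    return [addr.reverse_pointer for addr in ipaddress.ip_network(network)]


names = reverse_lookup_names("209.10.150.96/27")
print(len(names))   # 32
print(names[0])     # 96.150.10.209.in-addr.arpa
```

A reverse lookup on one of these names typically yields at most a single hostname chosen by whoever controls the address block, so it cannot by itself enumerate the many domains that may share one server.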

 

Future Work

Additional network blocks and additional domains remain to be examined. Please send suggestions.

 


Ben Edelman
Last Updated: December 20, 2002

This page is hosted on a server operated by the Berkman Center for Internet & Society at Harvard Law School, using space made available to me in my capacity as a Berkman Center affiliate for academic and other scholarly work. The work is my own, and the Berkman Center neither funded it nor expresses a position on its contents.