Technology Impact Assessment: Fingerprinting versus Bayesian Filtering

Since the early 2000s, the battle between spammers and providers of antispam systems has produced a dramatic growth in the volume of “noise” on the internet. In response to the costs imposed by massive volumes of spam on email systems, and increasingly on mobile messaging, internet service providers (ISPs) and large organizations have adopted two general classes of antispam technologies. Fingerprinting mechanisms exploit the bulk nature of spam and employ a centralized architecture to identify it. Bayesian (statistical) filters, in contrast, learn patterns of desirable (ham) and undesirable (spam) communications from the preferences of end users, and use artificial intelligence (AI) techniques to automate the rest of the filtering process.
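The difference between the two classes can be illustrated with a minimal sketch. It is not the method of any product named in this paper: the class names, the SHA-256 digest over normalized tokens, and the Laplace-smoothed naive Bayes scoring are all assumptions chosen for brevity. The fingerprint filter matches a message against a shared set of digests of reported bulk messages, while the Bayesian filter scores tokens against what it has learned from user-labeled ham and spam.

```python
import hashlib
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def fingerprint(text):
    """Digest of the normalized tokens (hypothetical scheme, for illustration)."""
    normalized = " ".join(tokenize(text))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class FingerprintFilter:
    """Stands in for a centralized database of known bulk-message digests."""

    def __init__(self):
        self.known_spam = set()

    def report(self, text):
        self.known_spam.add(fingerprint(text))

    def is_spam(self, text):
        return fingerprint(text) in self.known_spam


class BayesFilter:
    """Naive Bayes over tokens, trained on user-labeled ham and spam."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def _log_score(self, tokens, label, vocab_size):
        total_docs = self.docs["spam"] + self.docs["ham"]
        # Smoothed class prior plus smoothed per-token likelihoods.
        score = math.log((self.docs[label] + 1) / (total_docs + 2))
        total_tokens = sum(self.counts[label].values())
        for tok in tokens:
            score += math.log(
                (self.counts[label][tok] + 1) / (total_tokens + vocab_size)
            )
        return score

    def spam_probability(self, text):
        tokens = tokenize(text)
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        vocab_size = len(vocab) + 1  # +1 reserves mass for unseen tokens
        diff = (self._log_score(tokens, "ham", vocab_size)
                - self._log_score(tokens, "spam", vocab_size))
        # Clamp to keep math.exp within floating-point range.
        diff = max(min(diff, 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

In this sketch, a message that differs from a reported one by a single altered token produces a different digest and is missed by the fingerprint lookup, whereas the Bayesian score still reflects the remaining spam-like tokens. Production systems of both classes use far more elaborate normalization and scoring than is shown here.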

This paper reviews the performance and other criteria that network administrators and managers should weigh when choosing between the two general classes of technologies, drawing on leading open source and commercial applications. Specifically, the analysis compares the features of three leading fingerprint systems (eXpurgate®, commtouch®, and Mailshell) with two filters that employ Bayesian methods (SpamAssassin and COMDOM® Antispam). The analysis suggests that the optimized Bayesian filtering in COMDOM® Antispam and its Tachyon Core® scanning engine achieves throughput at least five times higher than that of the leading fingerprint bundles, while maintaining the high accuracy rates expected of content filters. Consequently, COMDOM® Antispam significantly lowers the infrastructure costs associated with spam, relative both to some of the most efficient commercial fingerprint systems and to the numerous commercial front-end security software products and appliances based on earlier Bayesian filters, by:

  • A. Reducing expenditures on hardware, software, and administration of mail servers in the short to medium term, and
  • B. Enabling the construction, in the longer term, of a decentralized system of antispam protection that is more robust to smart spam and to the BGP spectrum agility techniques used by spammers.

All data used in the assessment are compiled from publicly available information. Hence, the results can be independently verified and used in internal technology decisions by ISPs, in OEM integration, and by organizations that must process the large volumes of spam that persist on the internet.