Nathan @ the Computer Lab

Part II / Diploma Project Ideas

Unfortunately, I am now nearing the end of my PhD and will therefore not be supervising any projects in 2004/2005. I am leaving this page here for historical reasons, but please don't e-mail me to ask for suggestions of alternative supervisors, as I do not know of anyone.

Trust-Based Collaborative SPAM Filtering

Spam is an irritating (and growing!) problem that all internet users face. The current primary weapons against it are complex keyword filtering tools that are prone to tagging legitimate mail as spam. Vipul's Razor is a relatively new idea which replaces keyword filtering with a collaborative detection and filtering network. The collaboration model utilises a user's reputation to assess spam reports and spot false ones.

A current research project of the Opera group is a trust-based security paradigm, called SECURE, that allows a system to reason about the trustworthiness of a user in a manner similar to (but much more complex than) the reputation system used in Vipul's Razor. Although SECURE's trust model is not yet substantially complete, I believe it would be an interesting Part II or Diploma project to apply the ideas behind the model (or a simplified version of it) to a distributed spam filtering application.

Use Case

Alice receives an unsolicited bulk e-mail. She marks it as such and her mail program generates a rule that would block this particular spam. The rule is signed by Alice and published in a distributed database.

Bob (actually an agent acting on his behalf) observes that a new rule is available from Alice and evaluates whether to apply it to incoming mail using his view of her trustworthiness. The action taken upon receipt of an e-mail matching this rule also depends on his trust in Alice. Bob's feedback on how "correct" the action taken was will cause an adjustment to Alice's trustworthiness rating.
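
The feedback loop in this use case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a simple beta-style reputation score rather than SECURE's trust model, and the names `TrustRecord`, `Agent`, `action_for`, `feedback` and both thresholds are hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TrustRecord:
    """Bob's local view of one rule publisher (hypothetical structure)."""
    correct: int = 0    # actions Bob later confirmed as right
    incorrect: int = 0  # actions Bob later reported as wrong

    def score(self) -> float:
        # Beta-style estimate with a uniform prior, so an unknown
        # publisher starts at 0.5. This is NOT SECURE's trust model.
        return (self.correct + 1) / (self.correct + self.incorrect + 2)


@dataclass
class Agent:
    """Agent acting on Bob's behalf; the thresholds are illustrative."""
    trust: dict = field(default_factory=dict)
    tag_threshold: float = 0.4
    delete_threshold: float = 0.8

    def action_for(self, publisher: str) -> str:
        """Choose what to do with mail matching a rule from `publisher`."""
        score = self.trust.setdefault(publisher, TrustRecord()).score()
        if score >= self.delete_threshold:
            return "delete"
        if score >= self.tag_threshold:
            return "tag"
        return "ignore"

    def feedback(self, publisher: str, was_correct: bool) -> None:
        """Adjust the publisher's rating after Bob judges the action taken."""
        record = self.trust.setdefault(publisher, TrustRecord())
        if was_correct:
            record.correct += 1
        else:
            record.incorrect += 1


bob = Agent()
print(bob.action_for("alice"))   # unknown publisher: mail is only tagged
for _ in range(4):
    bob.feedback("alice", was_correct=True)
print(bob.action_for("alice"))   # after repeated confirmation: deleted
```

Signature checking and the distributed rule database are left out here; a real agent would verify that a rule was actually signed by Alice before consulting her rating.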


Possible Extensions and Related Ideas

Another new way to fight spam that has gained much press recently is the use of Bayesian statistical methods. An extension to the project would be to also implement a Bayesian statistical algorithm and compare its efficiency with the distributed network. Trust could also be used in the Bayesian algorithm: the majority of people are unlikely to have a large enough corpus of spam to produce useful statistics, so sharing a corpus of spam would be a good way of implementing the Bayesian algorithm on top of the collaborative network.
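
As a rough illustration of the shared-corpus idea, the sketch below implements a small naive Bayes classifier whose token counts can be merged with another user's corpus, scaled by a trust weight. Everything here is an assumption made for the example: the `Corpus` class, its `merge` method, the whitespace tokeniser and the smoothing constants are hypothetical and not taken from any existing filter.

```python
import math
from collections import Counter


class Corpus:
    """Per-user token statistics for a naive Bayes spam filter (sketch)."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of legitimate messages containing it
        self.n_spam = 0.0
        self.n_ham = 0.0

    def add(self, text: str, is_spam: bool) -> None:
        tokens = set(text.lower().split())  # naive whitespace tokeniser
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def merge(self, other: "Corpus", weight: float = 1.0) -> None:
        """Fold in another user's counts, scaled by our trust in them."""
        for token, count in other.spam.items():
            self.spam[token] += weight * count
        for token, count in other.ham.items():
            self.ham[token] += weight * count
        self.n_spam += weight * other.n_spam
        self.n_ham += weight * other.n_ham

    def spam_probability(self, text: str) -> float:
        # Log-odds with add-one smoothing; only tokens present in the
        # message are considered, which is a simplification.
        log_odds = math.log((self.n_spam + 1) / (self.n_ham + 1))
        for token in set(text.lower().split()):
            p_spam = (self.spam[token] + 1) / (self.n_spam + 2)
            p_ham = (self.ham[token] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-30.0, min(30.0, log_odds))  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-log_odds))


bob = Corpus()
bob.add("meeting agenda for tuesday", is_spam=False)

alice = Corpus()
alice.add("cheap pills buy now", is_spam=True)
alice.add("buy cheap watches now", is_spam=True)

before = bob.spam_probability("buy cheap pills")
bob.merge(alice, weight=1.0)  # weight would come from Bob's trust in Alice
after = bob.spam_probability("buy cheap pills")
print(before < after)
```

With a weight below 1.0, a less trusted user's corpus shifts the statistics less, which is one simple way trust could feed into the Bayesian algorithm.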
