===== About =====

The world needs a phishing corpus. How hard can it be to gather up a few hundred phishing emails and assemble them into an unaltered mailbox for study? Oddly, it's easy to do, yet no one has done it.

I collect my spam. I always have; it's a compulsion by now. Between November 2004 and June 2005 I collected over 32,000 spam messages, yet only about 415 of them were phishing emails. This corpus is hand-selected from those messages and contains nothing but good old-fashioned phishing emails.

To the best of my knowledge, this is the first publicly available phishing corpus of its kind.

===== Purposes of the corpus =====

===== The files =====

414 messages from November 27, 2004, until June 13, 2005, covering a variety of common phishing schemes. 3119972 bytes. Format: UNIX mbox (plain text; processable with procmail, Python, Perl, etc.; imports into Eudora, Mail.app, and others).

434 messages from June 14, 2005, until November 14, 2005, covering all of the common phishing schemes. 4118879 bytes. Format: UNIX mbox.

1423 messages from November 15, 2005, until August 7, 2006, covering many of the newer phishing schemes. 10420294 bytes. Format: UNIX mbox.

2279 messages from August 7, 2006, until August 7, 2007 (a year!), covering a great many phishing schemes and targets. 20067215 bytes. Format: UNIX mbox.
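Since every file is a plain UNIX mbox, Python's standard ''mailbox'' module should be enough to walk through one. The sketch below is only an illustration, not part of the corpus: the function name ''summarize_mbox'' and the file name ''phishing.mbox'' are made up for the example, so substitute the path of whichever file you downloaded. It counts the messages and tallies the domain claimed in each ''From:'' header.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr


def summarize_mbox(path):
    """Return (message_count, Counter of claimed sender domains) for an mbox file.

    Hypothetical helper for illustration; it is not shipped with the corpus.
    """
    # create=False: raise an error instead of silently creating a missing file.
    box = mailbox.mbox(path, create=False)
    domains = Counter()
    count = 0
    try:
        for msg in box:
            count += 1
            # The From: header is forged in most phish, so this is the
            # *claimed* sender, not the real origin of the message.
            _, addr = parseaddr(str(msg.get("From", "")))
            if "@" in addr:
                domain = addr.rpartition("@")[2].lower()
            else:
                domain = "(unknown)"
            domains[domain] += 1
    finally:
        box.close()
    return count, domains


if __name__ == "__main__":
    # "phishing.mbox" is a placeholder path, assumed for this example.
    total, by_domain = summarize_mbox("phishing.mbox")
    print(total, "messages")
    for domain, n in by_domain.most_common(10):
        print(n, domain)
```

The message count it reports can be checked against the counts listed above. The domain tally is only a rough view of which organizations are being impersonated, because the sender address is whatever the phisher chose to put there.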

===== Existing Research Citing the Phishing Corpus =====

Ian Fette, Norman Sadeh, and Anthony Tomasic, "Learning to Detect Phishing Emails", WWW 2007, Banff, Alberta, Canada.

S. Abu-Nimeh, D. Nappa, X. Wang, S. Nair, "A Comparison of Machine Learning Techniques for Phishing Detection", In Proc. of the 2nd APWG eCrime Researchers Summit, 2007.

Lina Zhou, Yongmei Shi, Dongsong Zhang, "A Statistical Language Modeling Approach to Online Deception Detection", IEEE Transactions on Knowledge and Data Engineering, 08 June 2007. IEEE Computer Society Digital Library. IEEE Computer Society, 24 March 2008.

Ahmed Obied, Reda Alhajj, "Fraudulent and Malicious Sites on the Web".

Troy Ronda, Stefan Saroiu, Alec Wolman, "iTrustPage: A User-Assisted Anti-Phishing Tool", in EuroSys’08, April 1–4, 2008, Glasgow, Scotland, UK.

Zhenghui Zhu, "Deconstruction and Analysis of Email Messages", M.Sc. Thesis, Florida State University.

Madhusudhanan Chandrasekaran, Vidyaraman Shankaranarayanan, Shambhu Upadhyaya, "CUSP: Customizable and Usable Spam Filters for Detecting Phishing Emails", NYS Symposium, Albany, NY, 2008.

Changwei Liu, "Fighting Unicode-Obfuscated Spam", Proceedings of E-Crime Research, 2007. http://www.ecrimeresearch.org/2007/proceedings/p45_liu.pdf

===== Related =====

Things that are slightly related will go here.