me, 2.0: jose nazario
beauty and the street
phishing corpus

a shot of a fishing fly i recall purchasing in scotland about ten years
ago. this is a wet fly, and on a heavier hook than i normally use. i don't
think i've used it, as i haven't crimped down the barb (i catch and
release).
while talking with someone recently, we discovered that there aren't
any good phishing corpora available for the general public to study and
analyze. some researchers from birmingham, uk, want to gather phishing
mails for a corpus they are building, but they aren't sharing one yet.
since i collect my spam, and since i've manually classified it, i have
a bunch of viruses and also a bunch of phishing emails in my spam corpus.
hence, a long-term phishing corpus was easy to construct.
what can you do with such a beast?
- train a bayesian mail classifier (like ifile, spambayes, etc.) to
identify these and classify them as phishing rather than as spam.
- deconstruct them and learn how phishers operate.
- add specific rules to procmail recipes
- anything else you want to do ...
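to make the first idea concrete, here is a minimal sketch of a three-way naive bayes classifier (ham, spam, phishing) in python. it is not how ifile or spambayes work internally; the class name, the tokenizer, and the tiny training messages are all made up for illustration, and a real run would train on messages from the corpus instead.

```python
import math
import re
from collections import Counter, defaultdict

# hypothetical tokenizer: lowercase runs of letters and digits
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class NaiveBayes:
    """multinomial naive bayes with add-one (laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token -> count
        self.doc_counts = Counter()              # label -> messages seen
        self.vocab = set()                       # every token seen in training

    def train(self, label, text):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # log prior: fraction of training messages with this label
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                # smoothed log likelihood of each token under this label
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# made-up training messages, standing in for a real corpus
nb = NaiveBayes()
nb.train("phishing", "dear customer, verify your account password at this link")
nb.train("phishing", "your bank account is suspended, confirm your login now")
nb.train("spam", "cheap pills, buy now, huge discount on watches")
nb.train("spam", "discount mortgage rates, buy cheap software")
nb.train("ham", "lunch tomorrow? the patch for the build is attached")

print(nb.classify("please verify your bank account login"))
```

the point of the third label is that phishing gets its own word statistics instead of being lumped in with spam, so the two can be filed separately.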
you can download the phishing corpus from the PhishingCorpus wiki page
here on my website.
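once you have a copy, one way to start deconstructing the messages is to count which hosts they link to. the sketch below assumes the corpus is saved as a unix mbox file; that format and the file name phishing.mbox are assumptions on my part for this example, and the helper names are made up. it only uses python's standard mailbox, re, and urllib.parse modules.

```python
import mailbox
import re
from collections import Counter
from urllib.parse import urlsplit

# rough url matcher: stops at whitespace, quotes, and angle brackets
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def message_text(msg):
    """join the decoded bodies of every non-multipart part of a message."""
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # latin-1 never fails to decode; good enough for pulling out urls
        chunks.append(payload.decode("latin-1"))
    return "\n".join(chunks)


def linked_hosts(path):
    """count the hostnames linked from every message in an mbox file."""
    hosts = Counter()
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            for url in URL_RE.findall(message_text(msg)):
                try:
                    host = urlsplit(url).hostname
                except ValueError:
                    continue  # skip urls that cannot be parsed
                if host:
                    hosts[host] += 1
    finally:
        box.close()
    return hosts


if __name__ == "__main__":
    # phishing.mbox is a hypothetical local file name
    for host, count in linked_hosts("phishing.mbox").most_common(10):
        print(count, host)
```

hosts that show up as bare ip addresses, or that don't match the brand named in the message, are usually the interesting ones to look at first.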
Last modified: Monday, Jun 13, 2005 @ 09:54am
copyright © 2002-2005 jose nazario, all rights reserved.