As you've seen, it's often cited in the literature (search Google Scholar for "nazario phishingcorpus" for some examples). Some notes to help you interpret it correctly: it's not meant to be exhaustive but rather representative, since it comes only from my personal inbox. It's all hand-classified, so it could contain errors (if you spot any, please let me know). Earlier mailboxes were anonymized (destination IPs and domain names), but later ones are not. It also shouldn't contain malware (e.g. malicious executable attachments). Thanks for your interest, and all the best with your research. I'd love to get a peek at it when you publish; I'm always interested in how people are tackling this problem. I'd be happy to answer any additional questions you may have.