A month of spam

So, my month of analyzing how well the spam detectors for Gmail, Yahoo Mail and Hotmail work is over. The results are a little hard to interpret, but they are interesting anyway. Before you look at the table, here is some nomenclature:

True negatives: Email that I’ve received on that account that was not classified as spam and was not spam
True positives: Email that I’ve received on that account that was correctly classified as spam
False positives: Email that was classified as spam but was not spam
False negatives: Email that was spam but was not classified as spam

Provider   True negatives   True positives   False positives   False negatives
Gmail      251              555              6                 2
Yahoo      72               71               1                 31
Hotmail    20               25               1                 2

So, what is the conclusion? Well, first, that Yahoo's spam classifier isn't very good at catching spam. About 1/3 of the spam I received there (31 of 102 messages) ended up in my inbox. Only one real email ended up in the spam folder (which is statistically the same as Gmail or Hotmail). Gmail seems to do a very good job of classifying spam, but it does err more toward throwing real email into my spam folder than toward letting spam into my inbox, and that annoys me a lot. As you can see, it's the account I use the most and the one that receives the most spam. If I don't check the spam folder for a couple of days, it can be hard to sift through 70 spam emails to find the one non-spam email in there. Unfortunately I don't have much to say about Hotmail because that account is mostly dead.
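To make the comparison a bit more concrete, here is a minimal sketch that turns the counts from the table into two rates per provider: the share of spam that reached the inbox, and the share of real email that landed in the spam folder. The names `counts` and `rates` are just illustrative, and with so few false positives per account the second rate is only a rough indication.

```python
# Counts copied from the table above, as (TN, TP, FP, FN) per provider.
counts = {
    "Gmail": (251, 555, 6, 2),
    "Yahoo": (72, 71, 1, 31),
    "Hotmail": (20, 25, 1, 2),
}


def rates(tn, tp, fp, fn):
    """Return (share of spam that reached the inbox,
    share of real email that went to the spam folder)."""
    missed_spam = fn / (tp + fn)   # false negatives over all spam
    lost_email = fp / (fp + tn)    # false positives over all real email
    return missed_spam, lost_email


for provider, c in counts.items():
    missed_spam, lost_email = rates(*c)
    print(f"{provider}: {missed_spam:.1%} of spam reached the inbox, "
          f"{lost_email:.1%} of real email went to the spam folder")
```

For Yahoo this gives roughly 30% of spam reaching the inbox, which is the "about 1/3" mentioned above; for Gmail it is well under 1%.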

What else can we say about this? Well, we can look for day-of-week trends in the spam. Does it arrive more often on weekdays or on weekends? (Unfortunately my data is not split by time of day: the date on an email often doesn't make much sense, and I haven't checked my email often enough to note the time myself.) Let's look only at Gmail, where there was enough data to make it interesting:

Sunday 56
Monday 94
Tuesday 70
Wednesday 76
Thursday 106
Friday 87
Saturday 66

Or as a graph:

I wish there were more to show there. I'll probably need to look at more than a month of data to get a better trend. Here is the raw day-by-day data for the month for Gmail:

Two of the spikes you see are Mondays and one is a Thursday. The interesting trend I see right now is that I seem to be getting significantly less spam in the last few days. Let's see if this trend continues.
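Coming back to the weekday-versus-weekend question, a rough way to put a number on it is to average the Gmail day-of-week totals listed earlier. This is only a sketch: the variable names are made up for illustration, and it assumes every day of the week occurred equally often in the month, which is not quite true.

```python
# Gmail spam totals per day of the week, copied from the list above.
spam_by_day = {
    "Sunday": 56,
    "Monday": 94,
    "Tuesday": 70,
    "Wednesday": 76,
    "Thursday": 106,
    "Friday": 87,
    "Saturday": 66,
}

weekend = ("Saturday", "Sunday")

# Average total per weekend day and per weekday.
weekend_avg = sum(spam_by_day[d] for d in weekend) / len(weekend)
weekday_totals = [n for d, n in spam_by_day.items() if d not in weekend]
weekday_avg = sum(weekday_totals) / len(weekday_totals)

print(f"Average per weekend day: {weekend_avg:.1f}")
print(f"Average per weekday:     {weekday_avg:.1f}")
```

That comes out to 61.0 per weekend day against 86.6 per weekday, so weekdays do look busier, though one month is too little data to be sure.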

Well, I guess that’s it. It was fun! I should do things like this more often. Now it’s time to start my day.
