SPAM: now with real meat
- quantus
- Tenth Dan Procrastinator
- Posts: 4891
- Joined: Fri Jul 18, 2003 3:09 am
- Location: San Jose, CA
So, how much spam have you filtered over the last year or so? I've been filtering my email since December 23rd or so last year, and I've caught 2375 pieces of spam with my sieve script. SpamAssassin support was added around March to help capture all this spam. How much spam have you caught?
- Dwindlehop
- Grand Pooh-Bah
- Posts: 6722
- Joined: Tue Sep 19, 2006 8:45 pm
- Location: Portland, OR
Unfortunately, I haven't got continuity in my data. I changed email addresses in that time frame. Also, dwindlehop@cmu.edu expired, so I ceased receiving all email directed to that address. Also, I changed spam filters from SpamAssassin plus Bogofilter to Mozilla Mail's Bayesian filter. Also, I read all my email on my Hiptop, anyway, which has no filter support. Uh, yeah. So basically, I have no idea.
I will say that keeping your email address a moving target does wonders for your spam intake. I also believe that having a weird TLD is beneficial, spamwise, because many scripts don't recognize .name as a TLD.
How much legitimate email did you acquire in the same timeframe? What was the ratio of spam to ham? What was the positive result percentage (spams tagged as spam)? What was the percentage of spam that got through? What was the percentage of good email that got tagged as spam?
Are you deleting spam, directing it to a separate file or directory, or something else? I would like to bounce, but sometimes I think that I will just wind up sending junk to whoever's email address got spoofed.
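For anyone who wants to answer those questions with numbers, here is a rough Python sketch of the four figures asked about above. The class name, field names, and the example counts are all made up for illustration; they are not anyone's real mail statistics.

```python
from dataclasses import dataclass


@dataclass
class FilterCounts:
    """Hypothetical tallies from one mailbox over some period."""
    spam_caught: int     # spam correctly tagged as spam
    spam_missed: int     # spam that reached the inbox
    ham_delivered: int   # legitimate mail left alone
    ham_flagged: int     # legitimate mail wrongly tagged as spam


def _fraction(part, whole):
    # Return None instead of dividing by zero when a category is empty.
    return part / whole if whole else None


def filter_stats(c):
    total_spam = c.spam_caught + c.spam_missed
    total_ham = c.ham_delivered + c.ham_flagged
    return {
        "spam_to_ham_ratio": _fraction(total_spam, total_ham),
        "catch_rate": _fraction(c.spam_caught, total_spam),         # spams tagged as spam
        "miss_rate": _fraction(c.spam_missed, total_spam),          # spam that got through
        "false_positive_rate": _fraction(c.ham_flagged, total_ham)  # good mail tagged as spam
    }


# Made-up example counts, purely for illustration.
example = FilterCounts(spam_caught=95, spam_missed=5, ham_delivered=198, ham_flagged=2)
stats = filter_stats(example)
```

The catch rate and miss rate always add up to 1, so in practice only one of the two needs tracking alongside the false positive rate.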
- quantus
- Tenth Dan Procrastinator
- Posts: 4891
- Joined: Fri Jul 18, 2003 3:09 am
- Location: San Jose, CA
The problem with changing your email address often is much like the problem of changing your cell phone number or even address often. You need to tell everyone the new address or number. Yes, this will keep you from getting spam. An interesting side note would be that changing your telephone number might actually cause you to get more telemarketers.
The strange TLD is also a pain when trying to enter your email address into some forms because they'll reject it saying that it's an invalid email address.
I haven't gone through to count the amount of legitimate mail in my junk folder, but it's quite low. I'll also say that the amount of real mail making it to my spam folder lately is almost nonexistent. Most of it got in there nearer the beginning when I was still tuning the script. Pretty much all of it is because someone was sending mail from Hotmail or some other known spam domain and I didn't have their address to snatch it away from the meat grinder.
I am not deleting spam, just filing it into a junk folder for brief manual inspection. The problem with bouncing or rejecting is that you may sometimes lose mail that was actually meant for you. I learned that rejected mail is not guaranteed to be delivered to the rejectee as part of the general mail handling spec. Also, as you said, you'll likely just spam the person whose address got spoofed. Finally, if someone rejects the rejected mail back to you, it will bypass your script and be delivered anyway.
The way I file email from certain people to special folders with sieve makes counting how much legitimate mail I received a difficult statistic to extract. I can tell you that so far this month, only one piece of mail has evaded my filter and that was on October 1st. Also, to put this in context, I get about 15-20 pieces of spam a day. So, this month so far, less than 0.5% of spam has beaten my filter.
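As a quick check of that percentage, here is the arithmetic as a small Python sketch. The post doesn't say how far into the month it was written, so `DAYS_SO_FAR` is an assumed value, and the function name is just for illustration.

```python
def miss_rate_percent(missed, spam_per_day, days):
    """Percentage of spam that got past the filter over the given number of days."""
    return 100.0 * missed / (spam_per_day * days)


DAYS_SO_FAR = 15  # assumption: the post does not state the day of the month

# One missed message against 15-20 pieces of spam a day.
best_case = miss_rate_percent(1, 20, DAYS_SO_FAR)   # busier days, lower miss rate
worst_case = miss_rate_percent(1, 15, DAYS_SO_FAR)  # quieter days, higher miss rate
```

Under that assumption both ends of the range come out below 0.5%. The figure depends on the assumed day count: with one miss, the rate only drops under 0.5% once more than 200 spams have arrived.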
SpamAssassin by itself is certainly not a good enough filter, so I have several other rules that look into the headers to find spam. My most recent addition actually looks at the mail servers that the piece of mail went through. If it hits a server in another country, it's filtered as spam.
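The actual rule lives in a sieve script that isn't shown here, so the following is only a Python sketch of the same idea: pull the relay hostnames out of the Received headers and flag the message if any of them has a TLD outside a "home" set. The `HOME_TLDS` set, the regex, and the function names are assumptions for illustration, and a TLD is only a crude stand-in for a server's country.

```python
import re
from email import message_from_string

# Assumption: TLDs treated as "this country" for the purpose of the sketch.
HOME_TLDS = {"com", "net", "org", "edu", "gov", "mil", "us"}

# Matches the hostname after "from" or "by" in a Received header.
# Bare IP addresses are skipped because the last label must be alphabetic.
HOST_RE = re.compile(r"\b(?:from|by)\s+([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def relay_hosts(msg):
    """Collect the hostnames named in every Received header of a parsed message."""
    hosts = []
    for value in msg.get_all("Received", []):
        hosts.extend(h.lower() for h in HOST_RE.findall(value))
    return hosts


def passed_through_foreign_relay(raw_message):
    """True if any relay hostname ends in a TLD outside HOME_TLDS."""
    msg = message_from_string(raw_message)
    return any(h.rsplit(".", 1)[-1] not in HOME_TLDS for h in relay_hosts(msg))


# Hypothetical messages using reserved example domains.
FOREIGN = (
    "Received: from mail.example.de (mail.example.de [192.0.2.1])\n"
    "\tby mx.example.edu with ESMTP; Wed, 1 Oct 2003 12:00:00 -0400\n"
    "From: someone@example.com\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
DOMESTIC = (
    "Received: from smtp.example.com (smtp.example.com [192.0.2.2])\n"
    "\tby mx.example.edu with ESMTP; Wed, 1 Oct 2003 12:00:00 -0400\n"
    "From: someone@example.com\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
```

Received headers are easy to forge, and plenty of foreign servers sit on .com or .net, so a check like this is better treated as one signal among several than as a verdict on its own.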
I'm starting to move away from the huge list of domains that spam has come from in the past, in favor of smarter rules like the one above. However, I still maintain most of this list since there are quite a lot of spammers in this country as well. To help limit the size of this list, I've started looking for common words that spammers like to include in their domains and filtering on those in place of the full domains containing those words. Every once in a while I come back to this problem of too many domains to see if I can come up with a better solution.
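To illustrate the word-based approach, here is a minimal Python sketch (the real rules are in sieve and aren't shown in this thread). The blocked domain, the word list, and the function names are hypothetical placeholders.

```python
from email.utils import parseaddr

# Hypothetical entries; the actual list from the sieve script is not shown.
BLOCKED_DOMAINS = {"spam-central.example"}
SPAMMY_WORDS = ("deals", "offer", "mortgage")


def sender_domain(from_header):
    """Extract the lowercased domain part of a From header, or '' if there is none."""
    _, addr = parseaddr(from_header)
    if "@" not in addr:
        return ""
    return addr.rpartition("@")[2].lower()


def looks_spammy(from_header):
    """True if the sender's domain is on the exact list or contains a listed word."""
    domain = sender_domain(from_header)
    if not domain:
        return False
    if domain in BLOCKED_DOMAINS:
        return True
    # One word can stand in for many full domains that contain it.
    return any(word in domain for word in SPAMMY_WORDS)
```

For example, `looks_spammy("Hot Deals <promo@best-deals-now.example>")` matches on the word "deals" without that domain ever being listed. The trade-off is that plain substring matching can also catch legitimate domains that happen to contain one of the words.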
- Dwindlehop
- Grand Pooh-Bah
- Posts: 6722
- Joined: Tue Sep 19, 2006 8:45 pm
- Location: Portland, OR
quantus wrote: "The problem with changing your email address often is much like the problem of changing your cell phone number or even address often. You need to tell everyone the new address or number. Yes, this will keep you from getting spam. An interesting side note would be that changing your telephone number might actually cause you to get more telemarketers. The strange TLD is also a pain when trying to enter your email address into some forms because they'll reject it saying that it's an invalid email address."
I was being facetious, but I'll allow it wasn't obvious enough. Changing your email address is bad, not having a TLD recognized by most scripts is bad. Using a forwarding address (jonathan@pearce.name) mitigates this somewhat.
It sounds like you should be tuning SpamAssassin, not adding generic patterns to sieve scripts. SpamAssassin has all the pattern recognition going on already. If stuff is still getting through it, then you should increase the weights of the specific tests which your spam hits. Or you should increase the sensitivity of SpamAssassin to the point where it starts marking your spam as spam. Or you should contribute heuristics to their test list if you have something clever that they don't do.
Heh, Kerry Wood just hit a 3 run home run to tie the game.
- quantus
- Tenth Dan Procrastinator
- Posts: 4891
- Joined: Fri Jul 18, 2003 3:09 am
- Location: San Jose, CA
Dwindlehop wrote: "It sounds like you should be tuning SpamAssassin, not adding generic patterns to sieve scripts. SpamAssassin has all the pattern recognition going on already. If stuff is still getting through it, then you should increase the weights of the specific tests which your spam hits. Or you should increase the sensitivity of SpamAssassin to the point where it starts marking your spam as spam. Or you should contribute heuristics to their test list if you have something clever that they don't do."
You know, I would do this, but I'm pretty sure I don't have access to SpamAssassin's config to do this. I only have access to sieve.
- Dwindlehop
- Grand Pooh-Bah
- Posts: 6722
- Joined: Tue Sep 19, 2006 8:45 pm
- Location: Portland, OR
Dwindlehop wrote: "It sounds like you should be tuning SpamAssassin, not adding generic patterns to sieve scripts. SpamAssassin has all the pattern recognition going on already. If stuff is still getting through it, then you should increase the weights of the specific tests which your spam hits. Or you should increase the sensitivity of SpamAssassin to the point where it starts marking your spam as spam. Or you should contribute heuristics to their test list if you have something clever that they don't do."
quantus wrote: "You know, I would do this, but I'm pretty sure I don't have access to SpamAssassin's config to do this. I only have access to sieve."
Ha!
This is the current list of tests SpamAssassin(tm) performs on mail messages to determine if they're spam or not. If you wish to change the score from the default, add a line like this to your ~/.spamassassin/user_prefs:
score NAME_OF_TEST 3.0
- quantus
- Tenth Dan Procrastinator
- Posts: 4891
- Joined: Fri Jul 18, 2003 3:09 am
- Location: San Jose, CA
Ok, the only configuration that's allowed is similar to what's available on Yahoo! Mail, which is the ability to specify addresses and domains which definitely are/are not spammers. Of course sieve allows me to do the same thing in a much more powerful manner. Yes, I could just forward my mail to my own personal machine with SpamAssassin installed and tuned to my needs, but that's too much effort for now. The band-aid approach I'm using is doing quite well.
- Dwindlehop
- Grand Pooh-Bah
- Posts: 6722
- Joined: Tue Sep 19, 2006 8:45 pm
- Location: Portland, OR
quantus wrote: "Ok, the only configuration that's allowed is similar to what's available on Yahoo! Mail, which is the ability to specify addresses and domains which definitely are/are not spammers. Of course sieve allows me to do the same thing in a much more powerful manner. Yes, I could just forward my mail to my own personal machine with SpamAssassin installed and tuned to my needs, but that's too much effort for now. The band-aid approach I'm using is doing quite well."
Huh? I'm no longer arguing that you should stop tuning Sieve. I'm just curious as to what you're saying.
You can change the weight of any of the SpamAssassin tests. Like "Character set indicates a foreign language"; "Razor2 gives confidence between 51 and 100"; "Message-Id is fake (in Outlook Express format)"; "Talks about millions of dollars"; or any of the other bazillion SpamAssassin tests. Click on the link and read the list.
- quantus
- Tenth Dan Procrastinator
- Posts: 4891
- Joined: Fri Jul 18, 2003 3:09 am
- Location: San Jose, CA
quantus wrote: "Ok, the only configuration that's allowed is similar to what's available on Yahoo! Mail, which is the ability to specify addresses and domains which definitely are/are not spammers. Of course sieve allows me to do the same thing in a much more powerful manner. Yes, I could just forward my mail to my own personal machine with SpamAssassin installed and tuned to my needs, but that's too much effort for now. The band-aid approach I'm using is doing quite well."
Dwindlehop wrote: "Huh? I'm no longer arguing that you should stop tuning Sieve. I'm just curious as to what you're saying. You can change the weight of any of the SpamAssassin tests. Like 'Character set indicates a foreign language'; 'Razor2 gives confidence between 51 and 100'; 'Message-Id is fake (in Outlook Express format)'; 'Talks about millions of dollars'; or any of the other bazillion SpamAssassin tests. Click on the link and read the list."
I saw the link and have read the different tests. There's no way I can get to that config to edit the values used by Cyrus. The values are set for everyone.
- quantus
- Tenth Dan Procrastinator
- Posts: 4891
- Joined: Fri Jul 18, 2003 3:09 am
- Location: San Jose, CA
Dwindlehop wrote: "You change ~/.spamassassin/user_prefs. That's a per-user config file to override the server config. I don't see the problem."
Agreed, that is the file, but I'm 99% sure that it would have to reside on cyrus.andrew.cmu.edu and not in my andrew account. This is especially true since there's no info on how to configure SpamAssassin on the computing info page while there is info about Sieve there. I will try it though just because you want me to. I'll try upping the score for ROT13 email address to 5 and see if that generates an X-SpamWarning.