November 16, 2003

Terminator 4 - Inauguration Day

Just released for television - Terminator 4 - Inauguration Day begins tomorrow. The series is a made-for-television movie that will be shown in clips on the local news.

The plot - the future is altered by a political coup at the beginning of the 21st century. An actor seized control of the California governorship, which leads to the eventual takeover by the machines. To honor the causal event, the machines build themselves in the actor's image. The movie will be directed by Dick Cheney and produced by Kenneth Lay.

The star of the movie - Arnold Schwarzenegger - plays the part of the governor and of the machines made in his image. When asked about the making of the movie, Arnold replied, "I don't understand why this movie is going to take three years to make." Arnold also commented that it was highly unusual to show the movie on television while it was being filmed. The director told Arnold not to worry about it and handed him his script for tomorrow's series premiere.

Posted by marc at 05:15 PM | Comments (3) | TrackBack

Bin Laden Still Free

It's been 796 days since 9-11. Bin Laden is still free - are you?

Posted by marc at 02:31 PM | Comments (1) | TrackBack

Advanced Spam Filtering using Spamassassin and Exim

I have the most advanced spam filtering system on the planet. I feel like I've actually beaten the spam problem. More details can be found on Computer Tyme Hosting.

How do I do it? What is the magic? Well - there is no magic. I'm using a combination of the Exim MTA and Spamassassin with a bunch of my own custom rules and tricks.

Two Spam Piles

Spamassassin is very good by itself - but not good enough. One thing that the Spamassassin folks haven't quite grasped is sorting spam into two piles: high-scoring spam and low-scoring spam.

The high spam is almost surely spam. The low spam is probably spam - but if there is a false positive, it will be low scoring, so it is easy to find. By using this system the high spam can be ignored or trashed without losing anything. I get about 300-400 spams a day, and almost all of them are caught as high spam.
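Here is a minimal sketch of the two-pile sort in Python. The post doesn't give the exact cutoffs, so the numbers are assumptions: 5.0 is SpamAssassin's default spam threshold, and the 10.0 line between low and high spam is made up for illustration.

```python
# Assumed thresholds: SPAM_SCORE is SpamAssassin's default spam threshold;
# HIGH_SPAM_SCORE is a hypothetical cutoff, not the author's real setting.
SPAM_SCORE = 5.0
HIGH_SPAM_SCORE = 10.0


def spam_pile(score: float) -> str:
    """Return which pile a message with this SpamAssassin score goes into."""
    if score >= HIGH_SPAM_SCORE:
        return "spam-high"  # almost surely spam; safe to ignore or trash
    if score >= SPAM_SCORE:
        return "spam-low"   # probably spam; any false positives end up here
    return "inbox"          # not tagged as spam
```

With these numbers a 23.5-point message lands in spam-high, a 6.2-point one in spam-low, and anything under 5.0 stays in the inbox.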

Direct IMAP folder delivery

Once the spam is tagged - if the user is using IMAP and has folders named spam-high and spam-low - the Exim MTA delivers the spam directly into those folders rather than the inbox. This way the inbox stays spam-free, and it can be downloaded without also downloading the spam, which is left on the server. This makes downloading much quicker.
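The delivery decision itself is simple. In Exim it would live in a router that checks the spam tag and whether the folder exists; the Python below only models that decision, and the function name is hypothetical. In this sketch each spam folder is checked on its own, and anything without a matching folder falls back to the INBOX.

```python
from typing import Optional, Set

SPAM_FOLDERS = ("spam-high", "spam-low")  # folder names from the post


def delivery_folder(tag: Optional[str], user_folders: Set[str]) -> str:
    """Return the IMAP folder a message should be delivered to.

    tag is the pile the message was sorted into ("spam-high" or "spam-low"),
    or None for mail that wasn't tagged as spam.
    """
    if tag in SPAM_FOLDERS and tag in user_folders:
        return tag
    # No spam tag, or the user never created that folder: use the inbox.
    return "INBOX"
```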

The spam folders are still accessible, so you can look at the spam you are missing and check spam-low for false positives. IMAP also lets you create more server-side folders for other important information. With a Squirrelmail interface, you can access your email from any browser.

Making the Spam Filter Smarter

Spamassassin uses a Bayesian filter that allows it to learn from spam and nonspam and get smarter. Very high scoring mail (+15 points and up) is autolearned as spam, and very low scoring mail (-2 points and below) is autolearned as nonspam. But I also provide two other IMAP folders to train the filter on missed spam. Just drag any real spam out of spam-low, along with any spam that got through to the inbox, into the spam-missed folder, and every 15 minutes the learn bot comes along and learns it. Next time that spam comes in, it is caught.
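A simplified sketch of both halves of the learning loop, using the thresholds above. In a real install the autolearn cutoffs are the SpamAssassin settings bayes_auto_learn_threshold_spam and bayes_auto_learn_threshold_nonspam, and SpamAssassin applies extra conditions before it autolearns. The learn bot is assumed here to be a cron job that runs sa-learn --spam over the spam-missed folder; the Maildir path is hypothetical.

```python
import subprocess
from typing import Optional

# Thresholds from the post (simplified: real autolearning has extra checks).
AUTOLEARN_SPAM = 15.0
AUTOLEARN_HAM = -2.0

# Hypothetical location of a user's spam-missed folder (Maildir assumed).
SPAM_MISSED_DIR = "/home/user/Maildir/.spam-missed/cur"


def autolearn_as(score: float) -> Optional[str]:
    """Return 'spam', 'ham', or None (score is in the middle: don't learn)."""
    if score >= AUTOLEARN_SPAM:
        return "spam"
    if score <= AUTOLEARN_HAM:
        return "ham"
    return None


def learn_missed_spam(folder: str = SPAM_MISSED_DIR) -> None:
    """Learn bot: feed everything in the spam-missed folder to sa-learn as spam.

    Intended to be run from cron every 15 minutes.
    """
    subprocess.run(["sa-learn", "--spam", folder], check=True)
```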

Exim Rules for Blacklisting

One of the major advances I made over Spamassassin is adding blacklists to Exim. These lists - just text files - add headers to a message if there is a match. One of the things I list is the sites that spam links to. Spam wants you to do something, and often that means clicking on a link. I have a list of about 400 sites, and if spam links to any of them, I flag it. I then add Spamassassin rules to score these extra headers. This trick has proved extremely effective.
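A rough Python model of the link blacklist: pull the hostnames out of links in the body, match them (and their parent domains) against a plain-text list, and produce a header on a hit. The header name X-Link-Blacklist and the list format (one domain per line, # for comments) are assumptions; the post doesn't say what the real files or headers look like. A Spamassassin header rule on that header would then add the score.

```python
import re
from typing import List, Optional, Set

# Hypothetical header name; the post doesn't say which header Exim adds.
HEADER_NAME = "X-Link-Blacklist"
# Hostname part of http:// and https:// links.
URL_HOST_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def load_list(text: str) -> Set[str]:
    """Parse a plain-text list: one domain per line, '#' starts a comment."""
    domains = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip().lower()
        if entry:
            domains.add(entry)
    return domains


def listed_hosts(body: str, blacklist: Set[str]) -> List[str]:
    """Return linked hosts that are on the list, directly or as a subdomain."""
    hits: List[str] = []
    for host in URL_HOST_RE.findall(body):
        host = host.lower().rstrip(".")
        labels = host.split(".")
        # Try www.spam.example, then spam.example, then example.
        if any(".".join(labels[i:]) in blacklist for i in range(len(labels))):
            if host not in hits:
                hits.append(host)
    return hits


def blacklist_header(body: str, blacklist: Set[str]) -> Optional[str]:
    """Return the header line to add, or None if nothing matched."""
    hits = listed_hosts(body, blacklist)
    return f"{HEADER_NAME}: {' '.join(hits)}" if hits else None
```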

I have other lists too. I blacklist based on strings in the Received headers so that mail from certain sending hosts is blocked. I have a list of misspelled words like p0rn that spammers use to get around spam filters. I also have a blacklist of dead email addresses that no one is really mailing to. If a spam CCs any of these nonexistent people, it gets flagged.
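The other lists work the same way, just against different parts of the message. This sketch uses Python's standard email package; the function names and sample data are hypothetical, and the misspelled-word check is a simple whole-word, case-insensitive search.

```python
import re
from email import message_from_string
from email.utils import getaddresses
from typing import Iterable, List, Set


def dead_recipients(raw_message: str, dead: Set[str]) -> List[str]:
    """Return To/Cc addresses that are on the dead-address list."""
    msg = message_from_string(raw_message)
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    addresses = [addr.lower() for _, addr in getaddresses(fields)]
    return [a for a in addresses if a in dead]


def received_hits(raw_message: str, patterns: Iterable[str]) -> List[str]:
    """Return blacklisted strings found in any Received: header."""
    msg = message_from_string(raw_message)
    received = " ".join(msg.get_all("Received", [])).lower()
    return [p for p in patterns if p.lower() in received]


def obfuscated_words(text: str, words: Iterable[str]) -> List[str]:
    """Return listed misspellings (e.g. "p0rn") that appear as whole words."""
    return [
        w for w in words
        if re.search(r"\b" + re.escape(w) + r"\b", text, re.IGNORECASE)
    ]
```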

I also have whitelists for various hosts, newsgroups, words, etc. A whitelist match adds a negative score, bringing the spam score below 0. This creates a good stream of nonspam for the autolearning system so that it knows what both spam and nonspam look like.
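Whitelisting is just scoring with a negative number. The rule names and scores below are invented for illustration; in Spamassassin each would be an ordinary rule with its own score line. With a -10 whitelist hit, a message that would otherwise score 3.5 ends up at -6.5, which is under the -2 autolearn cutoff, so it gets learned as nonspam.

```python
from typing import Dict, Iterable

# Hypothetical rule names and scores, for illustration only.
RULE_SCORES: Dict[str, float] = {
    "LOCAL_LINK_BLACKLIST": 5.0,
    "LOCAL_OBFUSCATED_WORD": 3.0,
    "LOCAL_DEAD_RECIPIENT": 4.0,
    "LOCAL_WHITELIST_HOST": -10.0,
    "LOCAL_WHITELIST_NEWSGROUP": -10.0,
}
AUTOLEARN_HAM = -2.0  # nonspam autolearn cutoff from the post


def total_score(base: float, hits: Iterable[str]) -> float:
    """Add the score of every rule that matched; whitelist rules are negative."""
    return base + sum(RULE_SCORES.get(name, 0.0) for name in hits)


score = total_score(3.5, ["LOCAL_WHITELIST_HOST"])
print(score, score <= AUTOLEARN_HAM)  # -6.5 True
```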

Taking out the Trash

The spam does not accumulate on the server forever. Once a week the trash bot comes along and empties out old messages from the spam and trash folders. Anything over 15 days old is gone. So you don't even have to delete your spam. Just leave it on the server and the trash bot will clean it up for you.
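A minimal sketch of the trash bot using Python's imaplib, meant to be run weekly from cron. The folder list (including the name "Trash"), the SSL connection, and the login details are assumptions. IMAP's SEARCH BEFORE matches on each message's internal date, so this deletes anything that arrived before the 15-day cutoff. The month names are spelled out by hand because strftime's %b depends on the system locale.

```python
import imaplib
from datetime import date, timedelta
from typing import Iterable, Optional

MAX_AGE_DAYS = 15
# spam-high/spam-low come from the post; "Trash" is an assumed folder name.
CLEANUP_FOLDERS = ("spam-high", "spam-low", "Trash")
# IMAP dates need English month names regardless of the system locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_cutoff(today: date, max_age_days: int = MAX_AGE_DAYS) -> str:
    """Return the cutoff as an IMAP date string, e.g. '01-Nov-2003'."""
    d = today - timedelta(days=max_age_days)
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year}"


def take_out_trash(host: str, user: str, password: str,
                   folders: Iterable[str] = CLEANUP_FOLDERS,
                   today: Optional[date] = None) -> None:
    """Delete messages older than the cutoff from each cleanup folder."""
    cutoff = imap_cutoff(today or date.today())
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        for folder in folders:
            status, _ = conn.select(folder)
            if status != "OK":
                continue  # this user doesn't have that folder
            status, data = conn.search(None, "BEFORE", cutoff)
            message_ids = data[0].split()
            if message_ids:
                conn.store(b",".join(message_ids).decode(), "+FLAGS", "\\Deleted")
                conn.expunge()
            conn.close()
    finally:
        conn.logout()
```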

Summary of Enhancements:


  • Two levels of Spam Tagging
  • Direct Delivery to IMAP Folders
  • Learning System for User Feedback
  • Multiple Exim Blacklist Front End
  • Server side Trash Collection

How well does it work?

When I started spam filtering I thought that 75% would be really good and that 80% was a theoretical maximum. I am now running at about 99% accuracy, so of the 300-400 spams I get every day, only 3 or 4 get through. This saves me a hell of a lot of time. I don't have a lot of hours to devote to deleting spam, and without this filtering I wouldn't be able to get nearly as much done.

Where can I get this?

Well - I do email hosting as well as web hosting. So - if you have a domain and you want this - I can fix you up. If I like your cause - I might even host it for free.

Posted by marc at 12:52 PM | Comments (1) | TrackBack