
Training the Spam Filter

Now that the spam filter is working, the next step is to set up a process that teaches the Bayesian spam filter what is, and is not, spam. An email client does this automatically, but the server has no built-in functionality for the task.

Thanks to topicdesk.com, however, it is easy to set up the server to train the SpamAssassin Bayesian spam filter.

The system is based on two email accounts: the first stores known junk mail (SPAM), and the second stores mail that the Bayesian filter incorrectly marked as junk (HAM).

On the server

Create the two accounts, junkmail and notjunkmail.

Turn off all services except email for these accounts.

Hide these two new users from the login window:

$ sudo dscl . create /Users/junkmail    IsHidden 1 
$ sudo dscl . create /Users/notjunkmail IsHidden 1 
    

To make them visible again, re-run the commands with 0 instead of 1. Further information is available on the Apple website.
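
The hide and unhide steps can be wrapped in a small helper. This is only a sketch: the function name `set_hidden` and the `DRY_RUN` variable are made up for illustration and are not part of any Apple tooling. The `dscl . create` call is the one used above, and `dscl . read` is added to show the stored value afterwards.

```shell
#!/bin/sh
# Hypothetical helper: set the IsHidden attribute on a local account.
# Usage: set_hidden <user> <0|1>
# With DRY_RUN=1 the command is printed instead of executed.
set_hidden() {
    user=$1
    value=$2
    case $value in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 1 ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "sudo dscl . create /Users/$user IsHidden $value"
    else
        sudo dscl . create "/Users/$user" IsHidden "$value" &&
            dscl . read "/Users/$user" IsHidden   # show the stored value
    fi
}
```

For example, `DRY_RUN=1 set_hidden notjunkmail 0` prints the command that would make the notjunkmail account visible again without changing anything.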

On YOUR email client

Now, when junk mail comes in that your email client correctly identifies, it will be automatically placed in the “junkmail” account’s inbox. And when you manually mark a message as junk mail, it will also be moved there.

Any miscategorized messages (i.e., not junk mail) can be manually moved into the “notjunkmail” inbox.

But this is only half of the solution. We also need TopicDesk’s spam trainer to take the data from these two accounts and use it to train the Bayesian spam filter.

Launch the Spam Trainer Installer and follow the prompts.

Next, run Spam Trainer for the first time from the command line to check that everything is working properly:

$ sudo /usr/sbin/spamtrainer 

 +--------------------------------------------------------------------+ 
 |                                                                    | 
 |                            spamtrainer                             | 
 |                                                                    | 
 |                           Version 2.1.0                            | 
 |                                                                    | 
 |                      Copyright (c) 2005 - 2014                     | 
 |            Athanasios Alexandrides [tools@topicdesk.com]           | 
 |                                                                    | 
 +--------------------------------------------------------------------+ 

Starting spamtrainer...

Training from user folders
Learning SPAM...
Learned new SPAM (junk mail)
Learning HAM...
Learned new HAM (not junk mail)
Syncing SpamAssassin Database
Displaying SpamAssassin Database Stats
0.000          0          0          0  non-token data: spam
0.000          0        261          0  non-token data: ham
Done!
Output produced by spamtrainer Version 2.1.0


You are using spamtrainer to train and maintain your content filter.
If you find the software useful, please spread the word. Thank you!
See: http://topicdesk.com/
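
The two "non-token data" lines in the stats are worth a second look: the third column is the number of messages learned, so the run above shows 0 spam and 261 ham. As a rough sketch, the helper below reads such lines and reports whether both counts have reached a minimum. The function name `bayes_counts` and the `MIN` variable are hypothetical. The default of 200 is an assumption based on SpamAssassin's documented defaults for `bayes_min_spam_num` and `bayes_min_ham_num`, below which the Bayes rules are not applied. The pattern accepts both the `spam`/`ham` labels shown above and the `nspam`/`nham` labels that `sa-learn --dump magic` prints.

```shell
#!/bin/sh
# Hypothetical helper: read Bayes stats lines on stdin and report the
# learned message counts. The third column holds the count.
# MIN is an assumption: SpamAssassin's documented default for
# bayes_min_spam_num and bayes_min_ham_num is 200.
bayes_counts() {
    awk -v min="${MIN:-200}" '
        /non-token data: n?spam$/ { spam = $3 }
        /non-token data: n?ham$/  { ham  = $3 }
        END {
            printf "spam=%d ham=%d\n", spam, ham
            if (spam >= min && ham >= min)
                print "enough mail learned for Bayes scoring"
            else
                print "keep training: need at least " min " of each"
        }'
}
```

It could be fed with something like `sudo sa-learn --dump magic | bayes_counts`, although depending on where your server keeps its Bayes database, `sa-learn` may need extra options to find it.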
    

Finally, set it up to run every night.

This is done by running Spam Trainer in install/check mode and answering the questions. My answers were as follows:

$ sudo /usr/sbin/spamtrainer -i

Checking if there is a startup item for 'learn_junk_mail' or 'spamtrainer'

There IS NO plist for learn_junk_mail

There IS NO plist for spamtrainer
if you want to use 'spamtrainer' it is recommended that this be added
Would you like me to enable it for you (yes/no)
yes
What time would you like spamtrainer to run (24-hour format)?
Please enter the hour (1)

Please enter the minutes (0)

Would you like SPAM/HAM messages to be deleted after the learn process? (n)

Would you like to log bayes stats into /var/log/spamtrainer.log? (n)
y
If you would you like to have spamtrainer mail you a report after it runs, then please enter an e-mail address. Enter for no (n)
admin@cougar.eu.com
Enter name of mail store with SPAM/HAM mailboxes? Enter for default (default)

Enter name of mailbox with SPAM? Enter for default (junkmail)

Enter name of mailbox with HAM? Enter for default (notjunkmail)

A plist with the following parameters for 'spamtrainer' has been prepared
/usr/sbin/spamtrainer -m admin@cougar.eu.com -l
It will run each day at 1:0
Would you like to add and enable it? (yes/no)
yes
The following launchd plist item for 'spamtrainer' has been enabled'
/usr/sbin/spamtrainer -m admin@cougar.eu.com -l
It will run each day at 1:0
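
To confirm that the nightly job (hour 1, minute 0, so 01:00) was really loaded, the output of `launchctl list` can be searched for it. This is a sketch under assumptions: the installer output above does not show the label of the launchd job, so the hypothetical helper `find_spamtrainer_job` only looks for the substring "spamtrainer" in the label column.

```shell
#!/bin/sh
# Hypothetical helper: read `launchctl list` output (PID, Status, Label
# columns) on stdin and report any job whose label mentions spamtrainer.
# The exact label the installer uses is not shown above, so this
# matches on the substring only.
find_spamtrainer_job() {
    awk '
        tolower($3) ~ /spamtrainer/ { print "loaded: " $3; found = 1 }
        END { if (!found) { print "no spamtrainer job loaded"; exit 1 } }'
}
```

A typical check would be `sudo launchctl list | find_spamtrainer_job`, where `sudo` is used on the assumption that the job was installed system-wide. Since logging was enabled in the answers above, `/var/log/spamtrainer.log` should also start to fill after the first nightly run.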
    

And that is it. I chose not to delete the messages automatically because the mail system is used by everyone in my family, and I need to give them the chance to retrieve incorrectly flagged junk mail.