Email recovery

Starting on January 23, 2006, an email recovery facility has been set up. This system allows you to select and recover old emails that are no longer in your mailbox.

Every email received in CSIM is copied to the back-up machine, except mail that is quarantined because it was detected as a virus or as spam.

The mail back-up machine has a 25 GB hard disk to store incoming emails. When the disk is 90% full, the oldest emails are deleted until 5% of the disk space has been freed. Emails are deleted on a first-in, first-out basis, across the mailboxes of all users. No warning is issued. At the current rate, emails are kept for 114 weeks (over two years) before they are removed from the back-up system; each clean-up then deletes the oldest 6 to 7 weeks of email.
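The figure of six to seven weeks per clean-up can be checked from the other numbers. The short calculation below is only a sanity check, assuming that mail arrives at a constant rate and that the 90% mark corresponds to exactly 114 weeks of mail; it is not part of the back-up scripts.

```python
# Rough check of the retention figures, assuming a constant mail inflow.
DISK_GB = 25.0          # size of the back-up disk (from the text)
HIGH_WATER = 0.90       # deletion starts when the disk is 90% full
FREED = 0.05            # deletion stops once 5% of the disk has been freed
RETENTION_WEEKS = 114   # retention quoted in the text

# Assumption: the space below the 90% mark holds RETENTION_WEEKS of mail.
gb_per_week = DISK_GB * HIGH_WATER / RETENTION_WEEKS
weeks_deleted = DISK_GB * FREED / gb_per_week

print(f"incoming mail: about {gb_per_week:.2f} GB per week")
print(f"one clean-up removes about {weeks_deleted:.1f} weeks of mail")
```

Under these assumptions one clean-up removes a little over six weeks of mail, which agrees with the range given above.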

As a user, you can recover your old email with the home-made web interface. Recovered messages are delivered to your regular mailbox.

Note that this interface has limited capabilities: while it looks like a web mail system, it will not allow you to save attachments, for example. Only image attachments can be viewed. You also have to specify search criteria before you can view the list of old emails: the list grows very long over time (some users receive 50,000 messages per year), so displaying it without any filtering would be useless. The search criteria are limited to those supported by the IMAP server running on the mail back-up machine, as defined in RFC 1176 (tag SEARCH search_criteria); criteria can only be combined with a boolean AND.
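To illustrate what an AND-only search looks like, here is a minimal sketch that builds a SEARCH criteria string from a few optional fields. The function name and its parameters are made up for this example and are not taken from the recovery scripts; only the search keys FROM, SUBJECT, SINCE, BEFORE and ALL come from the IMAP specification. Keys listed one after the other are implicitly combined with AND.

```python
from datetime import date

def build_search(sender=None, subject=None, since=None, before=None):
    """Return an IMAP SEARCH criteria string; all given keys are ANDed."""
    def quote(text):
        # Quoted strings must escape backslash and double quote.
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    def imap_date(d):
        # IMAP dates look like 23-Jan-2006; month names are always English.
        months = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()
        return f"{d.day}-{months[d.month - 1]}-{d.year}"

    keys = []
    if sender:
        keys.append("FROM " + quote(sender))
    if subject:
        keys.append("SUBJECT " + quote(subject))
    if since:
        keys.append("SINCE " + imap_date(since))
    if before:
        keys.append("BEFORE " + imap_date(before))
    # With no criterion at all, ALL matches every message.
    return " ".join(keys) if keys else "ALL"

print(build_search(sender="alice", since=date(2006, 1, 23)))
# prints: FROM "alice" SINCE 23-Jan-2006
```

There is no way to express "from alice OR from bob" in a single request with this scheme; two separate searches would be needed.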

Messages can take up to five minutes before they are visible in the recovery system. After you have asked for a specific recovery, it can also take up to five minutes before the messages get into your regular mailbox.

How does it work?

Architecture

The system is based on the architecture depicted below. The file server and the mail server are the two major components of the system: if the file server is down, the mail server will not work. This was already the case before the mail back-up was installed.

The mail server runs procmail as mail delivery agent; incoming emails are delivered to the system INBOX in $HOME/Maildir/new/.

The file server offers files via NFS. Every user is allocated a home directory in /home/sub_dir/user; /home is physically located on the file server. There is also an existing web server running Apache, OpenSSL and PHP with the IMAP module; the web server also NFS-mounts /home from the file server. The web server could run on the same machine as the mail back-up.

I have added one machine with a 40 GB IDE/ATA disk, running FreeBSD 6.4. The disk has been partitioned to keep 27 GB for a filesystem mounted as /home. In this directory I created a structure very similar to the one used for the users' home directories, with directories of the form /home/sub_dir/user. The regular /home directory from the file server is NFS-mounted on /transit; it is used to save temporary files. I have installed courier-imap from the ports. Like on every machine in the system, authentication is done via LDAP.

  /dev/ad0s1a        /         ufs     rw  1 1
  /dev/ad0s1g        /home     ufs     rw  2 2
  /dev/ad0s1f        /usr      ufs     rw  2 2
  /dev/ad0s1e        /var      ufs     rw  2 2
  procfs             /proc     procfs  rw  0 0
  file server:/home  /transit  nfs     rw  0 0
Filesystems mounted on the mail back-up server

Up to four machines are used by this system, so a central configuration file is not possible. Most of the pathnames are hardcoded in the various scripts.

Duplicate the messages

The mail server makes a duplicate of every email that is delivered to a user (this includes neither viruses nor spam) with the following procmail recipe. This recipe is located toward the end of the system procmail rcfile, so all previous processing has been done. When the variable NOBACKUP is set to 1 (one), the email is not duplicated; this avoids duplicating and backing up again a message that is being recovered.

As the filename ends with a slash (/), procmail makes the delivery in maildir format, in /home/mail-temp/save/user_name/new/timestamp.processID.name.of.the.machine. The subdirectories user_name and new are created by procmail when needed, with proper file mode and ownership to preserve confidentiality.

The directory /home/mail-temp/save/ has been created manually and belongs to root with file mode set to 0700.

  # keep a copy for back-up in /home/mail-temp/save/user_name,
  # unless NOBACKUP = 1
  :0c
  * ! NOBACKUP ?? 1
  /home/mail-temp/save/$LOGNAME/
Procmail recipe to copy the email before delivery

Move to final location

The mail back-up server runs a shell script that moves the duplicate messages from the transit directory /transit/mail-temp/save/user_name/new/timestamp.processID.name.of.the.machine to the final back-up directory /home/sub_dir/user_name/Maildir/cur/timestamp.processID.name.of.the.machine. The final back-up directory is the location where courier-imap expects to find the user's incoming mailbox. The directories /home/sub_dir were created manually with no specific restriction on the ownership; other directories are created by the script when needed, with appropriate file mode and ownership.
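The actual script is a shell script that is not reproduced here. As a rough illustration of the move step only, the following sketch walks a transit area and moves each message to the matching Maildir/cur/ directory. The function name, its parameters and the sub_dir_for callback are assumptions made for this example, and setting the ownership of the created directories is left out.

```python
import os
import shutil

def move_batch(transit_save, backup_home, sub_dir_for):
    """Move every message from <transit_save>/<user>/new/ to
    <backup_home>/<sub_dir>/<user>/Maildir/cur/ and return the count moved."""
    moved = 0
    for user in sorted(os.listdir(transit_save)):
        new_dir = os.path.join(transit_save, user, "new")
        if not os.path.isdir(new_dir):
            continue
        cur_dir = os.path.join(backup_home, sub_dir_for(user), user,
                               "Maildir", "cur")
        # Create missing directories; the final cur/ directory is requested
        # with mode 0700 (subject to the umask).
        os.makedirs(cur_dir, mode=0o700, exist_ok=True)
        for name in sorted(os.listdir(new_dir)):
            # shutil.move copies then deletes when source and target are on
            # different filesystems, as with an NFS-mounted transit area.
            shutil.move(os.path.join(new_dir, name),
                        os.path.join(cur_dir, name))
            moved += 1
    return moved
```

With the paths from the text, a call would look like move_batch("/transit/mail-temp/save", "/home", lookup), where lookup is a hypothetical function returning the sub_dir of a given user.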

Because the final back-up directories must follow what courier-imap expects, users whose home directory is outside of the /home space cannot have their messages preserved (this concerns the user root, among others).

This script is run at start-up and pauses for 5 minutes after moving each batch of messages. It is also in charge of deleting older backed-up messages when the disk becomes too full. When all the backed-up messages of a given user have been deleted, the directories for that user are removed too: as back-up messages are kept for three years, a user with no message left has not received a single new message within the last three years. Most probably that user is no longer registered, and it is safe to remove the directories that are now empty.
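The deletion policy can be summarised in a few lines. The sketch below is not the script itself; it only selects which messages would be deleted, given a list of messages with their timestamps and sizes. The function name and its arguments are chosen for this example, and the thresholds default to the 90% and 5% figures quoted earlier.

```python
def select_for_deletion(messages, disk_size, used, high=0.90, free_target=0.05):
    """messages: iterable of (path, mtime, size), sizes in the same unit
    as disk_size. Return the paths to delete, oldest first, or an empty
    list if the disk is still below the high-water mark."""
    if used < high * disk_size:
        return []
    to_free = free_target * disk_size
    freed = 0
    doomed = []
    # First in, first out: oldest messages go first, whatever the user.
    for path, mtime, size in sorted(messages, key=lambda m: m[1]):
        if freed >= to_free:
            break
        doomed.append(path)
        freed += size
    return doomed
```

Because the messages of all users are sorted together, a user who receives little mail may lose a few old messages while a user with heavy traffic loses many; the cut is made by age only.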

At that point, email messages are kept in the back-up server for a certain (long) period of time and can be recovered with the web interface.

Web interface for email recovery

A series of PHP scripts and web pages allows a user to mark certain emails for recovery. The proper way to encrypt the password in a PHP session has been removed from the scripts included here.

When a user marks a message for recovery, a copy of that message is saved in /home/mail-temp/restore/user_name/cur/timestamp.recover.randomstring on the filesystem NFS mounted from the file server. This is also a transit storage.

Messages in transit must be protected for confidentiality: the short utility giveroot changes the ownership of a file to user 2 (daemon) and group 1000 (httpd). This utility has to run setuid root in order to be allowed to call chown(2).

  $ ll /usr/local/sbin/giveroot
  895944 6 -rwsr-xr-x  1 root  www  5018 Jan 31 13:41 /usr/local/sbin/giveroot
File mode for giveroot
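For readers not familiar with the notation, the s in -rwsr-xr-x is the setuid bit. The following lines, using only Python's standard stat module, show how that string corresponds to the octal mode 4755; they are an illustration of the listing and not part of the system.

```python
import stat

# Regular file with the setuid bit set and permissions rwxr-xr-x.
mode = stat.S_IFREG | 0o4755

print(stat.filemode(mode))          # -rwsr-xr-x
print(bool(mode & stat.S_ISUID))    # True: the program runs with its owner's uid
```

Since the file belongs to root, any user allowed to execute it runs it with root privileges, which is what makes the call to chown(2) possible.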

Calls to the courier-imap server on the mail back-up server are made in such a way that the messages on the back-up are never modified; they are not flagged as seen, for instance. Tape archiving of the mail back-up machine can therefore be done cleanly: a message is archived on tape only once, and there will never be an incremental archive for a given message, as a message never changes.
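The web interface itself is written in PHP and its code is not shown here. As an illustration of the same idea with Python's standard imaplib module, the sketch below opens the mailbox read-only and fetches a message with BODY.PEEK[], so that no flag is changed on the server. The function name is made up for this example, and the connection is assumed to be already authenticated.

```python
def fetch_untouched(conn, msg_num, mailbox="INBOX"):
    """Fetch one message without changing any flag on the server.

    conn is an already authenticated imaplib.IMAP4 (or IMAP4_SSL) object.
    """
    # readonly=True makes imaplib issue EXAMINE instead of SELECT.
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError("cannot open mailbox read-only")
    # BODY.PEEK[] returns the whole message without setting the Seen flag.
    status, data = conn.fetch(str(msg_num), "(BODY.PEEK[])")
    if status != "OK":
        raise RuntimeError("fetch failed")
    # data[0] is a (header, payload) pair; the payload is the raw message.
    return data[0][1]
```

In PHP, the imap extension offers the OP_READONLY option of imap_open and the FT_PEEK option of imap_fetchbody for the same purpose; whether the recovery scripts use exactly these is not stated here.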

The new version of the web interface (September 2009) includes a few improvements:

Delivery of the recovered message

The main mail server is in charge of delivering the recovered messages to the user's mailbox. A shell script lists all the messages in /home/mail-temp/restore/user_name/cur/ and calls procmail to deliver them to the user.

Messages are pushed with the envelope-from set to recovery@cs.ait.ac.th; this address is used by the procmail recipe to make sure that recovered messages are not filtered by the anti-spam, and that they are not backed up a second time.

  # email pushed from recovery
  :0
  * ^From recovery@cs.ait.ac.th
  {
    EXCEPTION=1   # do not filter for spam
    NOBACKUP=1    # do not back-up
  }
Procmail recipe used when pushing a recovered message to the user mailbox
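The delivery script is not included on this page. The sketch below only shows how the procmail command line for each waiting message could be assembled, assuming the script uses procmail's -f option to set the sender of the leading "From " line and -d to deliver to a local user. The function name and its parameter are hypothetical.

```python
import os

RECOVERY_SENDER = "recovery@cs.ait.ac.th"   # envelope-from named in the text

def delivery_commands(restore_root):
    """Yield (message_path, argv) for each message waiting in
    <restore_root>/<user>/cur/; argv is the procmail call that would
    deliver it when the message is fed on standard input."""
    for user in sorted(os.listdir(restore_root)):
        cur_dir = os.path.join(restore_root, user, "cur")
        if not os.path.isdir(cur_dir):
            continue
        for name in sorted(os.listdir(cur_dir)):
            # -f sets the sender on the leading "From " line,
            # -d delivers to the named local user.
            argv = ["procmail", "-f", RECOVERY_SENDER, "-d", user]
            yield os.path.join(cur_dir, name), argv
```

A caller would run each argv with the message file as standard input and remove the file from the transit directory once procmail has returned successfully.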

Cleaning of the transit disk

The transit disk keeps directories for the users in /home/mail-temp/save/user_name and /home/mail-temp/restore/user_name. Each directory contains the three subdirectories tmp, new and cur, which should be empty most of the time (unless there is a message in transit for back-up or for recovery). The disk space used therefore remains small (about 4 MB for 200 users).

The scripts in charge of moving the transit messages cannot delete the transit user directories because a new email could be incoming while the directory is being deleted; the only safe time when the user transit directories can be removed is when the user account is removed from the system.

The directories for a user are cleaned when, and only when, the user account is deleted from the system; this is part of the general procedure that removes a user from the system.

Contact us: Olivier Nicole, CSIM, SET, AIT. Last update: Sep 2009