[Dovecot] Integrating Dovecot with Amazon Web Services

Gary Mort garyamort at gmail.com
Thu Jun 28 17:43:29 EEST 2012


I did some searching in the mail archives and didn't see any discussion of
integration with AWS, so I wanted to throw out my thoughts/plans and see
if it has been done before.

I am setting up my own personal website on EC2 along with an email server,
and I really don't like the idea of using the disk drive as permanent mail
storage.  EBS is too small, and instance storage is ephemeral.

Looking over the docs, the dbox format seems most easily copied for my
needs.
http://wiki2.dovecot.org/MailboxFormat/dbox

To make life easy, I'll stick with just single-dbox as a start, however
multi-dbox would be doable.

With dbox, the only thing that I need to change is the alternate storage
model:
"An upshot of the way alternate storage works is that any given storage
file (mailboxes/<folder>/dbox-Mails/u.* (sdbox) or storage/m.* (mdbox)) can
only appear *either* in the primary storage area *or* the alternate storage
area but not both — if the corresponding file appears in both areas then
there is an inconsistency."

First I want to add AWS S3 as a storage option for alternate storage.

Then instead of the above model, the new model would be that email is
always stored in alternate storage, and may be in primary storage.  So,
when mail comes in, I'd have Dovecot save the email to the alternate
storage S3 bucket and update the indexes and other information [ideally, for
convenience purposes, a few bits of relevant indexing information can be
stored as metadata in the S3 object, sufficient so that instead of
retrieving the entire S3 object, just the metadata can be pulled to build
indexes].
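
A rough sketch of that save path, to make the metadata idea concrete. This is
not Dovecot code: FakeObjectStore is a hypothetical in-memory stand-in for an
S3 bucket, the key layout just echoes the sdbox u.* naming, and the chosen
metadata fields are assumptions. With a real bucket, put/head would map to an
object PUT with user metadata and a HEAD request.

```python
import hashlib
from email import message_from_bytes


class FakeObjectStore:
    """Hypothetical in-memory stand-in for an S3 bucket."""

    def __init__(self):
        self._objects = {}

    def put(self, key, body, metadata):
        self._objects[key] = (bytes(body), dict(metadata))

    def head(self, key):
        # Returns metadata only, like an S3 HEAD request (no body transfer).
        return dict(self._objects[key][1])

    def get(self, key):
        return self._objects[key][0]

    def keys(self):
        return sorted(self._objects)


def index_metadata(raw):
    """Pick a few index-relevant fields; S3 user metadata values are strings."""
    msg = message_from_bytes(raw)
    return {
        "size": str(len(raw)),
        "message-id": str(msg.get("Message-ID", "")),
        "date": str(msg.get("Date", "")),
        "sha1": hashlib.sha1(raw).hexdigest(),
    }


def save_message(store, folder, uid, raw):
    """Always write the message to the object store, with index metadata."""
    key = f"mailboxes/{folder}/dbox-Mails/u.{uid}"  # assumed key layout
    store.put(key, raw, index_metadata(raw))
    return key


def rebuild_index(store):
    """Rebuild a minimal index from metadata alone, never fetching bodies."""
    return {key: store.head(key) for key in store.keys()}
```

The point of rebuild_index is that a fresh server could reconstruct its
indexes with one metadata request per message instead of downloading mail.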

When a client attempts to retrieve an email message, Dovecot would check
primary storage as it does now; if the message is not found, then it will
retrieve it from the alternate storage system AND store a copy in the
primary storage.
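
The retrieval step is a plain read-through cache. A minimal sketch, assuming
the primary storage is a local directory and `remote` is any dict-like
stand-in for the S3 bucket (both hypothetical; the real change would live in
Dovecot's dbox storage code):

```python
import os
import tempfile


def fetch_message(cache_dir, remote, key):
    """Return message bytes, preferring the local copy over the remote store."""
    path = os.path.join(cache_dir, key)
    try:
        with open(path, "rb") as fh:
            return fh.read()  # cache hit: served from primary storage
    except FileNotFoundError:
        pass

    body = remote[key]  # cache miss: fetch from alternate storage

    # Write to a temporary file and rename it into place, so a reader never
    # sees a half-written message in primary storage.
    directory = os.path.dirname(path)
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as fh:
        fh.write(body)
    os.replace(tmp, path)
    return body
```

After the first fetch the message is served locally, so only the first read
of a given message pays the S3 round trip.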

Primary storage can be periodically purged, have quotas to keep it from
growing too large, etc.
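
One possible purge policy, sketched under assumptions: the quota is a byte
limit on the cache directory, and the least recently accessed files go first.
The function name and the use of access time are illustrative choices; on
filesystems mounted with noatime some other recency marker would be needed.
Deleting is only safe here because every message also lives in S3.

```python
import os


def purge_cache(cache_dir, max_bytes):
    """Delete least-recently-accessed files until the cache fits max_bytes.

    Returns the list of removed paths.
    """
    entries = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            st = os.stat(path)
            entries.append((st.st_atime, st.st_size, path))

    total = sum(size for _atime, size, _path in entries)
    removed = []
    # Oldest access time first.
    for _atime, size, path in sorted(entries):
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size
        removed.append(path)
    return removed
```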

In this way, primary storage can be viewed as a message cache, just keeping
the messages that are currently of interest, while S3 is the real data.

[Ideally, this can be expanded so that when a message comes in, in addition
to storing a copy in S3, an AWS SNS notification can be issued so if
multiple IMAP servers are running, they can all subscribe to the same SNS
channel and update themselves as needed].
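
A sketch of what such a notification and its handler might look like. The
payload fields and the LocalIndex class are hypothetical, and the SNS
publish/subscribe calls themselves are left out. The handler is written to
tolerate the same notice arriving more than once, which seems a sensible
assumption for any fan-out channel.

```python
import json


def new_mail_notification(folder, uid, key):
    """Build the (assumed) JSON payload announcing a newly stored message."""
    return json.dumps(
        {"event": "new-mail", "folder": folder, "uid": uid, "key": key},
        sort_keys=True,
    )


class LocalIndex:
    """Per-server view of which message UIDs exist in each folder."""

    def __init__(self):
        self.uids = {}

    def handle(self, payload):
        """Apply one notification; return True only if it changed the index."""
        note = json.loads(payload)
        if note.get("event") != "new-mail":
            return False
        known = self.uids.setdefault(note["folder"], set())
        if note["uid"] in known:
            return False  # duplicate delivery, nothing to do
        known.add(note["uid"])
        return True
```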

This gives me unlimited disk storage at S3 prices.  I would even like to be
able to set a few options based on the folder, so I can enable versioning
on important message folders, use the even cheaper reduced redundancy
storage for archives, and set expiration dates on email in the trash and
spam folders so S3 will automatically purge the messages after a month.
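
The per-folder options could be kept in a small table and turned into S3
settings. The sketch below only builds the data; FOLDER_POLICY, the folder
names and the mailboxes/ prefix are assumptions. The rule dictionaries follow
the shape of an S3 lifecycle configuration (prefix filter plus expiration in
days), and the storage class would be passed along when each object is
uploaded. Note that S3 versioning is switched on per bucket rather than per
prefix, so versioned folders would probably need a bucket of their own.

```python
# Hypothetical per-folder policy table.
FOLDER_POLICY = {
    "Archive": {"storage_class": "REDUCED_REDUNDANCY"},
    "Trash": {"expire_days": 30},
    "Spam": {"expire_days": 30},
}


def lifecycle_rules(policy, prefix="mailboxes/"):
    """Build an S3-style lifecycle configuration from the folder policy."""
    rules = []
    for folder, opts in sorted(policy.items()):
        days = opts.get("expire_days")
        if days is None:
            continue  # no expiration requested for this folder
        rules.append(
            {
                "ID": f"expire-{folder}",
                "Filter": {"Prefix": f"{prefix}{folder}/"},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        )
    return {"Rules": rules}


def storage_class_for(policy, folder):
    """Storage class to request when uploading a message to this folder."""
    return policy.get(folder, {}).get("storage_class", "STANDARD")
```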


Secondly, I'd like to replace the MySQL database usage with a SimpleDB
database.  While SimpleDB lacks much of MySQL's sophistication, it doesn't
seem that Dovecot is really using any of that, so SimpleDB can be
functionally equivalent.

The primary purpose of using SimpleDB is that this way the entire Dovecot
system can be ephemeral.   When a properly configured Dovecot AMI is
launched, it will start up, pull its config data from an S3 bucket,
subscribe to the SNS channel for new updates, and then start the Dovecot
server.  It won't care if it is the only Dovecot server, or if there are
500 other servers running.  They all share the same SimpleDB database.
Whenever any change is made that is relevant to server configuration, a
notice is generated to SNS, and all the email is stored in S3.


As a starting point, I'm thinking the best place for me to start coding is
the single-s3-dbox message store, as it has the fewest moving parts [mainly
just fix up the save function to run the way I need it to, and the retrieve
function to make a local copy of any incoming email... additional metadata
functionality can be added later].

Has anyone else been working on something similar?

-Gary

