I did some searching in the mail archives and didn't see any discussion of integration with AWS, so I wanted to throw out my thoughts/plans and see if it has been done before.
I am setting up my own personal website on EC2 along with an email server, and I really don't like the idea of using the disk drive as permanent mail storage. EBS is too small, and instance storage is ephemeral.
Looking over the docs, the dbox format seems the easiest to adapt for my needs. http://wiki2.dovecot.org/MailboxFormat/dbox
To keep things simple, I'll start with just single-dbox (sdbox), though multi-dbox (mdbox) would be doable.
With dbox, the only thing that I need to change is the alternate storage model: "An upshot of the way alternate storage works is that any given storage file (mailboxes/<folder>/dbox-Mails/u.* (sdbox) or storage/m.* (mdbox)) can only appear *either* in the primary storage area *or* the alternate storage area but not both — if the corresponding file appears in both areas then there is an inconsistency."
First I want to add AWS S3 as a storage option for alternate storage.
Then, instead of the above model, the new model would be that email is always stored in alternate storage, and may also be in primary storage. So, when mail comes in, I'd have Dovecot save the email to the alternate storage S3 bucket and update the indexes and other information. [Ideally, for convenience, a few bits of relevant indexing information can be stored as metadata on the S3 object, enough that the indexes can be built by pulling just the metadata instead of retrieving the entire S3 object.]
When a client attempts to retrieve an email message, Dovecot would check primary storage as it does now. If the message is not found, then it will retrieve it from the alternate storage system AND store a copy in primary storage.
Primary storage can be periodically purged, have quotas to keep it from growing too large, etc.
In this way, primary storage can be viewed as a message cache, just keeping the messages that are currently of interest, while S3 is the real data.
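A rough sketch of this cache model, written in Python for brevity rather than in Dovecot's C. None of it is Dovecot or AWS code: `DictBackend` is an in-memory stand-in for the S3 bucket, and `CachingStore`, `save`, `fetch`, and `purge` are hypothetical names used only to illustrate the flow.

```python
import os
import tempfile


class DictBackend:
    """In-memory stand-in for the S3 bucket (assumption, not a real S3 client)."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]


class CachingStore:
    """Alternate storage always holds the message; primary storage is a cache."""

    def __init__(self, primary_dir, alternate):
        self.primary_dir = primary_dir
        self.alternate = alternate

    def _path(self, key):
        return os.path.join(self.primary_dir, key)

    def save(self, key, data):
        # Incoming mail goes to the alternate store, which is the real copy.
        self.alternate.put(key, data)

    def fetch(self, key):
        path = self._path(key)
        if os.path.exists(path):
            # Cache hit: serve the local copy.
            with open(path, "rb") as f:
                return f.read()
        # Cache miss: read from alternate storage and keep a local copy.
        data = self.alternate.get(key)
        with open(path, "wb") as f:
            f.write(data)
        return data

    def purge(self):
        # Safe at any time, since every message also lives in alternate storage.
        for name in os.listdir(self.primary_dir):
            os.remove(os.path.join(self.primary_dir, name))


primary = tempfile.mkdtemp()
store = CachingStore(primary, DictBackend())
store.save("u.1", b"Subject: hello\r\n\r\nbody")
first = store.fetch("u.1")    # miss: pulled from the stand-in bucket, then cached
cached = os.listdir(primary)  # the message file now exists locally
store.purge()
second = store.fetch("u.1")   # still retrievable after the purge
```

The point of the sketch is that `purge` never needs to check anything: losing the primary copy only costs one extra round trip to the alternate store on the next fetch.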
[Ideally, this can be expanded so that when a message comes in, in addition to storing a copy in S3, an AWS SNS notification can be issued so if multiple IMAP servers are running, they can all subscribe to the same SNS channel and update themselves as needed].
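The fan-out idea could look roughly like the following. This is only an in-memory illustration in Python: `Topic` stands in for an SNS topic and is not the AWS API, and `ImapServer`, `on_new_mail`, and the message fields `folder` and `key` are hypothetical.

```python
class Topic:
    """In-memory stand-in for an SNS topic (assumption, not the AWS API)."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        # Deliver the same notification to every subscriber.
        for callback in self.subscribers:
            callback(message)


class ImapServer:
    """Hypothetical server that tracks which message keys it knows about."""

    def __init__(self, name, topic):
        self.name = name
        self.known = {}  # folder -> list of message keys
        topic.subscribe(self.on_new_mail)

    def on_new_mail(self, message):
        # Only the index entry is updated; the body stays in S3 until requested.
        self.known.setdefault(message["folder"], []).append(message["key"])


topic = Topic()
servers = [ImapServer("imap-%d" % i, topic) for i in range(3)]
# Whichever process stores the message in S3 also announces it.
topic.publish({"folder": "INBOX", "key": "u.1"})
```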
This gives me unlimited disk storage at S3 prices. I would even like to be able to set a few options per folder, so I can enable versioning on important message folders, use the even cheaper reduced redundancy storage for archives, and set expiration dates on email in the trash and spam folders so S3 will automatically purge those messages after a month.
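One way the per-folder options might be expressed is a small policy table. This is a Python sketch only: the folder patterns, the `policy_for` helper, and the dictionary keys are all hypothetical. `STANDARD` and `REDUCED_REDUNDANCY` are real S3 storage class names. Note that, as far as I know, S3 versioning is switched on per bucket rather than per key prefix, so a versioned folder would probably need a bucket of its own.

```python
from fnmatch import fnmatchcase

# Hypothetical per-folder policy table; the first matching pattern wins.
FOLDER_POLICIES = [
    ("Important*", {"versioning": True,
                    "storage_class": "STANDARD",
                    "expire_days": None}),
    ("Archive*", {"versioning": False,
                  "storage_class": "REDUCED_REDUNDANCY",
                  "expire_days": None}),
    ("Trash", {"versioning": False,
               "storage_class": "STANDARD",
               "expire_days": 30}),
    ("Spam", {"versioning": False,
              "storage_class": "STANDARD",
              "expire_days": 30}),
]

# Applied to any folder that matches none of the patterns above.
DEFAULT_POLICY = {"versioning": False,
                  "storage_class": "STANDARD",
                  "expire_days": None}


def policy_for(folder):
    """Return the storage policy for a mail folder name."""
    for pattern, policy in FOLDER_POLICIES:
        if fnmatchcase(folder, pattern):
            return policy
    return DEFAULT_POLICY
```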
Secondly, I'd like to replace the MySQL database usage with a SimpleDB database. While SimpleDB lacks much of MySQL's sophistication, it doesn't seem that Dovecot is really using any of that, so SimpleDB can be functionally equivalent.
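To illustrate why a simple key/attribute store might be enough, here is a Python sketch of a user lookup with no joins or SQL. The `USERS_DOMAIN` dictionary is a stand-in for a SimpleDB domain (item name mapped to string attributes), and `lookup_user` plus every attribute name and value are hypothetical, not Dovecot's actual schema.

```python
# Stand-in for a SimpleDB domain: item name -> attribute dict (assumption).
# SimpleDB attribute values are strings, so uid/gid are kept as strings here.
USERS_DOMAIN = {
    "gary@example.com": {
        "password_hash": "hypothetical-hash",
        "home": "/var/mail/gary",
        "uid": "5000",
        "gid": "5000",
    },
}


def lookup_user(domain, username):
    """Return the user's attributes with numeric ids, or None if unknown."""
    item = domain.get(username)
    if item is None:
        return None
    return {
        "user": username,
        "password": item["password_hash"],
        "home": item["home"],
        "uid": int(item["uid"]),
        "gid": int(item["gid"]),
    }
```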
The primary purpose of using SimpleDB is that this way the entire Dovecot system can be ephemeral. When a properly configured Dovecot AMI is launched, it will start up, pull its config data from an S3 bucket, subscribe to the SNS channel for new updates, and then start the Dovecot server. It won't care if it is the only Dovecot server, or if there are 500 other servers running. They all share the same SimpleDB database. Whenever a change relevant to server configuration is made, a notice is published to SNS, and all the email is stored in S3.
As a starting point, I'm thinking the best place for me to start coding is the single-s3-dbox message store, as it has the fewest moving parts [mainly just fix up the save function to run the way I need it to, and the retrieve function to make a local copy of any message it fetches from S3... additional metadata functionality can be added later].
Has anyone else been working on something similar?
-Gary