Re: [Dovecot] Integrating Dovecot with Amazon Web Services

28 Jun 2012


      On Thu, Jun 28, 2012 at 1:14 PM, Timo Sirainen <tss@iki.fi> wrote:
...
On 28.6.2012, at 17.43, Gary Mort wrote:
...
First I want to add AWS S3 as a storage option for alternate storage.
Then instead of the above model, the new model would be that email is
always stored in alternate storage, and may be in primary storage.  So,
when mail comes in, I'd have Dovecot save the email to the alternate
storage S3 bucket and update the indexes and other information. (Ideally,
for convenience purposes, a few bits of relevant indexing information can
be stored as metadata in the S3 object: sufficient so that instead of
retrieving the entire S3 object, just the metadata can be pulled to build
indexes.)
The indexes have to be in primary storage.
True, but I'm assuming the data they are based on does not include the full
email message, just a few key pieces:
uniqueid, subject, from, to, etc.
For an always running server, the indexes are always up to date in primary.
A server starting up with no index data (or a second server running when
new email has been delivered) will need to rebuild the index information.
As such, rather than download every single email message just for a few
bits of key info, I can run a re-index process to pull just the metadata
and grab the data from there.
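To make the re-index idea concrete, here is a minimal sketch in Python (used only for brevity; Dovecot itself is C). The metadata key names, the `IndexEntry` type and the two callables are all hypothetical. With a real S3 client, `head_metadata` would presumably map to a HEAD request on the object, which returns the user-defined metadata without the message body; here an in-memory dict stands in for the bucket.

```python
from dataclasses import dataclass

# Hypothetical metadata keys; the thread only names
# "uniqueid, subject, from, to, etc."
INDEX_KEYS = ("uniqueid", "subject", "from", "to")


@dataclass
class IndexEntry:
    object_key: str
    fields: dict


def rebuild_index(list_keys, head_metadata):
    """Build index entries from per-object metadata only.

    list_keys:     callable returning an iterable of object keys.
    head_metadata: callable(key) -> dict of metadata for that object,
                   fetched without downloading the message body.
    """
    entries = []
    for key in list_keys():
        meta = head_metadata(key)
        # Missing keys become empty strings rather than raising.
        fields = {name: meta.get(name, "") for name in INDEX_KEYS}
        entries.append(IndexEntry(object_key=key, fields=fields))
    return entries


# In-memory stand-in for an S3 bucket: object key -> metadata.
FAKE_BUCKET = {
    "msg-0001": {"uniqueid": "1", "subject": "hello",
                 "from": "a@example.com", "to": "b@example.com"},
    "msg-0002": {"uniqueid": "2", "subject": "re: hello",
                 "from": "b@example.com", "to": "a@example.com"},
}

index = rebuild_index(lambda: sorted(FAKE_BUCKET),
                      lambda key: FAKE_BUCKET[key])
```

The message bodies are never touched in this pass, which is the point of keeping the key fields in the object metadata.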
...
...
When a client attempts to retrieve an email message, Dovecot would check
primary storage as it does now. If the message is not found, then it will
retrieve it from the alternate storage system AND store a copy in the
primary storage.
I think the storing wouldn't be very useful. Most clients download the
message once. There's no reason to cache it if it doesn't get downloaded
again. The way it should work is that new mails are immediately delivered to
both primary and alt storage.
I've got tons of space - so I don't mind having 750MB or so for primary
email message storage.   If I can track how many times a message was
actually read, over time I can get an idea of how I use it and set up the
primary storage purge rules accordingly.
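The model described here (always in alternate storage, copied into primary on first read, with a read counter to guide purging) could look roughly like the sketch below. It is only an illustration: `TieredStore`, `purge_candidates` and the `max_reads` threshold are made-up names, and plain dicts stand in for the local mail store and the S3 bucket.

```python
from collections import Counter


class TieredStore:
    """Alternate storage is authoritative; primary is a local cache."""

    def __init__(self, primary, alternate):
        self.primary = primary      # dict stand-in for local storage
        self.alternate = alternate  # dict stand-in for the S3 bucket
        self.reads = Counter()      # msg_id -> number of fetches

    def save(self, msg_id, body):
        # New mail always goes to alternate storage.
        self.alternate[msg_id] = body

    def fetch(self, msg_id):
        self.reads[msg_id] += 1
        if msg_id in self.primary:
            return self.primary[msg_id]
        body = self.alternate[msg_id]  # KeyError if unknown everywhere
        self.primary[msg_id] = body    # keep a copy for later reads
        return body

    def purge_candidates(self, max_reads=1):
        # Messages cached locally but read at most max_reads times.
        return sorted(m for m in self.primary if self.reads[m] <= max_reads)


store = TieredStore(primary={}, alternate={})
store.save("m1", b"first message")
store.save("m2", b"second message")
store.fetch("m1")
store.fetch("m1")
store.fetch("m2")
candidates = store.purge_candidates()
```

After this run `m1` has been read twice and `m2` once, so only `m2` is offered for purging from primary storage. Real purge rules would likely also weigh message age and size.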
...
...
Secondly, I'd like to replace the MySQL database usage with a SimpleDB
database.  While SimpleDB lacks much of MySQL's sophistication, it doesn't
seem that Dovecot is really using any of that, so SimpleDB can be
functionally equivalent.
Dovecot will probably get a Redis and/or memcache backend for passdb+userdb.
If SimpleDB is a similar key-value database, I guess the same code could be
partially reused.
SimpleDB is more like SQLite:
"Amazon SimpleDB is a highly available and flexible non-relational data
store that offloads the work of database administration. Developers simply
store and query data items via web services requests and Amazon SimpleDB
does the rest."
http://aws.amazon.com/simpledb/
Data model:
http://docs.amazonwebservices.com/AmazonSimpleDB/latest/DeveloperGuide/DataM...
Domain == Table
Item == row
ItemName == primary key
Attributes == column
Value == data in column (multi-value, so there can be multiple values for
an attribute of an item)
There is no built-in key relationship between data, it's just one big flat
table.   Columns/Attributes have only one type: every value is stored as a
string.
You query the data like an SQL table:
http://docs.amazonwebservices.com/AmazonSimpleDB/latest/DeveloperGuide/Using...
Because there are no dates, and comparisons are done on strings, it's best
to store dates as UTC timestamps, zero-padded to a fixed width so that
they sort and compare in numeric order.
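As an illustration of the SQL-like querying, the sketch below builds a select expression for a user lookup and formats a timestamp for storage. The domain name `users`, the attribute name `username` and all helper names are hypothetical. As far as I know SimpleDB compares attribute values as strings, so the sketch zero-pads timestamps to keep string order and numeric order the same; the quoting rules (backticks for names, doubled quotes inside values) follow my reading of the select documentation and should be checked against it.

```python
def quote_name(name):
    # Domain and attribute names: wrap in backticks, double any backtick.
    return "`" + name.replace("`", "``") + "`"


def quote_value(value):
    # Attribute values: wrap in single quotes, double any single quote.
    return "'" + value.replace("'", "''") + "'"


def pad_timestamp(ts, width=10):
    # Fixed-width decimal string so that string comparison matches
    # numeric comparison for non-negative UTC timestamps.
    return str(int(ts)).zfill(width)


def user_lookup(domain, username):
    # Hypothetical passdb-style lookup by an assumed "username" attribute.
    return "select * from {} where {} = {}".format(
        quote_name(domain), quote_name("username"), quote_value(username))


query = user_lookup("users", "o'brien")
stamp = pad_timestamp(1340889240)
```

Without the padding, a value such as "999" would sort after "1000" in a string comparison.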
The datastore is spread over multiple Amazon data servers and can take up
to a second to sync, so there are two methods of querying the data.
Default: eventually consistent read: get the data quickly
Optional: consistent read: check /all/ datastores and get the latest data
Since the data in SimpleDB may not be updated frequently, a simple hack
using the notification system could be:
1. Before updating SimpleDB, send an SNS notice that the data is being
   updated and where (domain, user, config)
2. Update the data
3. After updating SimpleDB, send an SNS notice that the update is complete
Other running servers can record these update notices in memory and expire
them in about 15 seconds.   For any queries they want to make for that type
of data in the next 15 seconds, they will use consistent read.
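The receiving side of that hack can be sketched as a small in-memory tracker. Everything here is illustrative: `UpdateTracker`, the scope tuple and the injectable clock are made-up names, and the SNS delivery itself is left out. The boolean it returns is what a server would pass as the consistent-read option on its next SimpleDB query.

```python
import time


class UpdateTracker:
    """Remember recent update notices and expire them after ttl seconds."""

    def __init__(self, ttl=15.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock   # injectable so the expiry can be tested
        self._pending = {}   # scope -> time at which the notice expires

    def on_notice(self, scope):
        # Called for both "being updated" and "update complete" notices;
        # each one restarts the 15-second window for that scope.
        self._pending[scope] = self.clock() + self.ttl

    def needs_consistent_read(self, scope):
        expiry = self._pending.get(scope)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._pending[scope]  # notice has expired, forget it
            return False
        return True


# Fake clock so the example does not have to sleep.
now = [0.0]
tracker = UpdateTracker(ttl=15.0, clock=lambda: now[0])
scope = ("example.com", "gary", "passdb")

tracker.on_notice(scope)
fresh = tracker.needs_consistent_read(scope)   # inside the window
now[0] = 20.0
stale = tracker.needs_consistent_read(scope)   # window has passed
```

Queries for scopes with no recent notice keep using the cheaper eventually consistent read.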
The nice thing about using S3 and SimpleDB is that I can completely skip a
lot of steps in replication/distributed services as it is all handled
already.  And one can always take one set of API calls and substitute
another for a different notification system, distributed database, and
cloud file storage.