[Dovecot] Using a namespace for providing access to mail snapshots for user based on-demand restoration of email backups
Hi all,
I'm planning on implementing this in my new upcoming dovecot instance, and would like to hear thoughts on how best to accomplish this. We will be paying Timo's support company to do the work, but obviously, the less work in the form of coding he has to do to get this working (I'm hoping it won't be a lot), the more money it will save us... ;)
First - I currently use rsnapshot to back up emails, so that is the use-case I'm most interested in getting working. It is rsync-based, and like other rsync-based backup programs it uses hardlinks to save storage space - so you can have a *lot* of backups (going back months, or even years), where each snapshot only adds a little more to the total disk space being used.
The snapshots are stored with the following filesystem layout:
/path/to/snapshotsdir/hourly.0
...
/path/to/snapshotsdir/hourly.4
/path/to/snapshotsdir/daily.0
...
/path/to/snapshotsdir/daily.7
/path/to/snapshotsdir/weekly.0
...
/path/to/snapshotsdir/weekly.4
/path/to/snapshotsdir/monthly.0
...
/path/to/snapshotsdir/monthly.12
/path/to/snapshotsdir/yearly.0
...
/path/to/snapshotsdir/yearly.5
The 'names' (hourly, daily, weekly, monthly, yearly) are arbitrary (this is a bit confusing to people new to rsnapshot), and would *not* be used for displaying the mail folders to the users - it is the Date/Time stamps of each of the snapshot dirs above that would be used to display the folder names under the 'Time Machine' namespace. This is, I imagine, the part that will need some actual coding by Timo to get working - maybe just some new config variables added to the namespace code for mapping the date/time stamps of the directories to user friendly folder names in the namespace.
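A minimal sketch of the mapping described above, in Python rather than Dovecot code. It assumes each snapshot directory's modification time reflects when the snapshot was taken; the function name and the timestamp format are hypothetical, not existing Dovecot settings.

```python
from datetime import datetime
from pathlib import Path


def snapshot_folder_names(snapshots_dir, fmt="%Y-%m-%d %H:%M"):
    """Return (folder_name, path) pairs, oldest snapshot first.

    The rsnapshot directory names (hourly.0, daily.3, ...) are ignored;
    only each directory's mtime is used to build the displayed name.
    """
    stamped = []
    for entry in Path(snapshots_dir).iterdir():
        if entry.is_dir():
            # Assumption: the directory mtime is the snapshot time.
            stamp = datetime.fromtimestamp(entry.stat().st_mtime)
            stamped.append((stamp, entry))
    stamped.sort(key=lambda pair: pair[0])
    return [(stamp.strftime(fmt), path) for stamp, path in stamped]
```

The default format avoids "/" in the displayed name, since "/" is commonly used as the IMAP hierarchy separator.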
That said, I'd like to design this and have it coded such that it will work with almost any type of backup storage that stores snapshots as date/time stamped directories like this (there must be others, right?).
Also, it goes without saying that this code will be (if Timo is ok with it) part of the core dovecot code going forward, so anyone else will be able to benefit from it.
What I'm envisioning is something like this...
Define a namespace - for this example we'll call it 'Time Machine'
Under this namespace, each user will see their own snapshots, and *only* their own
So, each user would see something like this:
My Mail Account
  Inbox
  Drafts
  Templates
  Sent
  Time Machine (sorted above user created folders if possible)
    -4/3/12, 8:00am (first subfolder)
      Inbox
      Drafts
      etc... (all other folders and sub-folders shown here)
    +4/3/12, 12:00pm (first subfolder)
    etc...
  Other User Folders
  ...
Or even better, I'm thinking some magical code that can group them by Date, like:
-4/3/12 (first subfolder)
  -8:00am (next sub-folder)
    Inbox
    Drafts
    Etc... (all folders and sub-folders shown here)
  +12:00pm
  +4:00pm
  +8:00pm
+4/4/12
etc...
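The grouping by date could be sketched roughly as follows. This is only an illustration in Python of the idea, not Dovecot code; the function name and the date/time formats are assumptions.

```python
from datetime import datetime


def group_by_date(stamps):
    """Group snapshot timestamps into {date: [time, ...]}, both in ascending order."""
    tree = {}
    for stamp in sorted(stamps):
        day = stamp.strftime("%Y-%m-%d")   # first-level folder
        time = stamp.strftime("%H:%M")     # second-level folder
        tree.setdefault(day, []).append(time)
    return tree


stamps = [
    datetime(2012, 4, 3, 12, 0),
    datetime(2012, 4, 3, 8, 0),
    datetime(2012, 4, 4, 8, 0),
]
print(group_by_date(stamps))
# {'2012-04-03': ['08:00', '12:00'], '2012-04-04': ['08:00']}
```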
Comments? Suggestions? Flames?
--
Best regards,
Charles
On 05-04-12 17:28, Charles Marcus wrote:
> Hi all,
> I'm planning on implementing this in my new upcoming dovecot instance, and would like to hear thoughts on how best to accomplish this. We will be paying Timo's support company to do the work, but obviously, the less work in the form of coding he has to do to get this working (I'm hoping it won't be a lot), the more money it will save us... ;)
> First - I currently use rsnapshot to backup emails, so that is the use-case I'm most interested in getting working. It is rsync based, and like other rsync based backup programs it uses hardlinks to save storage space - so you can have a *lot* of backups (going back months, or even years), where each snapshot only adds a little more to the total disk space being used.
> <snip>
> What I'm envisioning is something like this...
> Define a namespace - for this example we'll call it 'Time Machine'
> Under this namespace, each user will see their, and *only* their snapshots
> So, each user would see something like this:
> My Mail Account
>   Inbox
>   Drafts
>   Templates
>   Sent
>   Time Machine (sorted above user created folders if possible)
>     -4/3/12, 8:00am (first subfolder)
>       Inbox
>       Drafts
>       etc... (all other folders and sub-folders shown here)
>     +4/3/12, 12:00pm (first subfolder)
>     etc...
>   Other User Folders
>   ...
> Or even better, I'm thinking some magical code that can group them by Date, like:
> -4/3/12 (first subfolder)
>   -8:00am (next sub-folder)
>     Inbox
>     Drafts
>     Etc... (all folders and sub-folders shown here)
>   +12:00pm
>   +4:00pm
>   +8:00pm
> +4/4/12
> etc...
> Comments? Suggestions? Flames?
The first interesting point I'd see with this is that you supply the mail client with a near-endless supply of folders, which would take a lot of caching space on the client's end, either (depending on the client and its configuration) from the moment that you enable this for them, or after someone starts searching in their 'time machine' for some old mail.
I see my mail client on a new install working quite hard to download mail headers for 2 years of postfix/dovecot/etc mailing lists, so what happens if you provide a 'time machine' namespace going 1 month back, with 4 snapshots a day (i.e. 31x4 =~ 120 times more headers to download/index)?
-- Tom
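To put rough numbers on the estimate above, here is a small back-of-the-envelope sketch in Python. It assumes every snapshot holds about as many messages as the live mailbox and that the client syncs every snapshot folder; the 10,000-message mailbox is a made-up figure for illustration.

```python
def headers_to_fetch(live_messages, days, snapshots_per_day):
    """Return (snapshot_count, total_headers) as an upper-bound estimate."""
    snapshots = days * snapshots_per_day
    return snapshots, live_messages * snapshots


snapshots, headers = headers_to_fetch(live_messages=10_000, days=31, snapshots_per_day=4)
print(snapshots, headers)  # 124 1240000
```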
On 2012-04-05 12:37 PM, Tom Hendrikx tom@whyscream.net wrote:
> The first interesting point I'd see with this is that you supply the mail client with a near-endless supply of folders, which would take a lot of caching space on the client's end, either (depending on the client and its configuration) from the moment that you enable this for them, or after someone starts searching in their 'time machine' for some old mail.
> I see my mail client on a new install working quite hard to download mail headers for 2 years of postfix/dovecot/etc mailing lists, so what happens if you provide a 'time machine' namespace going 1 month back, with 4 snapshots a day (i.e. 31x4 =~ 120 times more headers to download/index)?
Interesting and valid point... hmmmm.....
First, these folders would be read-only - a user could copy something from there back to one of his other folders, but couldn't write anything in them - so nothing would be changing under this namespace, except new snapshots magically appearing, which means that once they are indexed, the indexes would never need to be rebuilt (unless they got corrupted somehow).
But, yeah, I can imagine some problems especially if someone has a ton of email. And while these would probably only be accessed rarely, in those cases where someone would want to access them, they would very likely want to be able to search, so disabling indexes wouldn't be a good idea...
Since we use Thunderbird, I can of course disable offline mode for everyone, so the only time headers would be downloaded would be when the user selects (or performs a search on) one (or more) of the folders.
Maybe Timo can think of something creative to minimize this problem...
--
Best regards,
Charles
On 5.4.2012, at 20.02, Charles Marcus wrote:
> On 2012-04-05 12:37 PM, Tom Hendrikx tom@whyscream.net wrote:
>> The first interesting point I'd see with this is that you supply the mail client with a near-endless supply of folders, which would take a lot of caching space on the client's end, either (depending on the client and its configuration) from the moment that you enable this for them, or after someone starts searching in their 'time machine' for some old mail.
> Since we use Thunderbird, I can of course disable offline mode for everyone, so the only time headers would be downloaded would be when the user selects (or performs a search on) one (or more) of the folders.
Do they need to be accessible via Thunderbird, or maybe only via a webmail? Or perhaps a secondary (normally disabled?) TB account where you've specified a "backup/" namespace prefix (which is normally hidden)?
On 5.4.2012, at 18.28, Charles Marcus wrote:
> The snapshots are stored with the following filesystem layout:
> /path/to/snapshotsdir/hourly.0
> ...
> /path/to/snapshotsdir/hourly.4
> /path/to/snapshotsdir/daily.0
> ..
> The 'names' (hourly, daily, weekly, monthly, yearly) are arbitrary (this is a bit confusing to people new to rsnapshot), and would *not* be used for displaying the mail folders to the users - it is the Date/Time stamps of each of the snapshot dirs above that would be used to display the folder names under the 'Time Machine' namespace. This is, I imagine, the part that will need some actual coding by Timo to get working - maybe just some new config variables added to the namespace code for mapping the date/time stamps of the directories to user friendly folder names in the namespace.
I guess there could be kind of a "filter fs layout" that modifies the filesystem layout a bit and lets the underlying layout handle the rest:
namespace {
  location = maildir:/path/to/snapshotsdir:LAYOUT=timestamp
}
Although it's annoying that it's not possible to have per-layout settings currently. But I guess if this was implemented as a plugin it would be enough to have:
plugin {
  timestamp_layout = maildir++
}
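The "filter fs layout" idea can be illustrated with a short sketch. This is Python, not Dovecot code, and `LAYOUT=timestamp` and `timestamp_layout` above are proposals in this thread rather than existing settings. The function names and the `snapshots` mapping are hypothetical: the filter strips the leading timestamp component from the mailbox name, picks the matching snapshot directory, and hands the rest of the name to the underlying layout (Maildir++ here).

```python
from pathlib import Path


def maildirpp_path(root, mailbox, sep="/"):
    """Underlying Maildir++ layout: INBOX is the root, other folders are '.A.B' dirs."""
    if mailbox in ("", "INBOX"):
        return root
    return root / ("." + mailbox.replace(sep, "."))


def timestamp_layout_path(mailbox, snapshots, underlying=maildirpp_path, sep="/"):
    """Resolve '<timestamp>/<folder>' by delegating '<folder>' to the underlying layout.

    `snapshots` maps a displayed timestamp name to a snapshot root directory
    (a hypothetical structure for this sketch).
    """
    stamp, _, rest = mailbox.partition(sep)
    root = Path(snapshots[stamp])
    return underlying(root, rest, sep)


snapshots = {"2012-04-03 08:00": "/path/to/snapshotsdir/hourly.2"}
print(timestamp_layout_path("2012-04-03 08:00/Lists/dovecot", snapshots))
```

Keeping the timestamp handling separate from the underlying layout means the same filter could sit in front of other layouts without changes.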
participants (3)
- Charles Marcus
- Timo Sirainen
- Tom Hendrikx