What's a Reasonable Inbox Size?
Greetings,
I have several users who have inboxes that are over 20 GB.
Lately I have noticed Dovecot logs say it's taking over 30 seconds to sync their mailboxes.
As email admins, how do you handle inboxes that are so large? Do you use mailbox types that have better performance like dbox? We're using maildir.
What's a reasonable inbox size? Is 20+ GB reasonable and nothing to worry about?
Thanks for any insight here.
-- Asai7
On 5/7/2020 11:39 AM, Asai wrote:
What's a reasonable inbox size? Is 20+ GB reasonable and nothing to worry about?
Great question.
At my firm, we wrote rotation tools that work for mbox format to rotate inboxes monthly if they are over a certain size. We also do the sent items folders.
We find that large inboxes are bad for the server and bad for the client because the MUAs just don't handle them well either. At 1 or 2 GB you start to see issues.
After a little bit of user training, they like it. Part of the routine maintenance they need.
Regards, KAM
Thanks for your response,
So, how do those rotation scripts work in concept?
People are still able to access their old inboxes, but it just moves them to an archived state?
On 5/7/2020 12:43 PM, Asai wrote:
Thanks for your response,
So, how do those rotation scripts work in concept?
People are still able to access their old inboxes, but it just moves them to an archived state?
We rotate the folder to another name with the date, like INBOX-2020-05-07, with instructions on how to refresh their folder list (or we even modify the .subscription file for them).
We also cull Trash, deleted items, and spam folders automatically as well.
Regards, KAM
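A minimal sketch of the rotation idea described above, not the actual tool from the thread. It assumes the inbox is a single mbox file, and the function name, the size threshold, and the empty-file replacement are all illustrative choices. A production tool would also need to take the mbox lock before renaming, which this sketch omits.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def rotate_if_large(inbox: Path, max_bytes: int, today: date) -> Optional[Path]:
    """Rename an mbox file to NAME-YYYY-MM-DD when it exceeds max_bytes.

    Returns the archive path if a rotation happened, otherwise None.
    Hypothetical helper: no locking, no subscription-file update.
    """
    if not inbox.exists() or inbox.stat().st_size <= max_bytes:
        return None
    archive = inbox.with_name(f"{inbox.name}-{today.isoformat()}")
    if archive.exists():
        # Never overwrite an archive created earlier the same day.
        raise FileExistsError(str(archive))
    inbox.rename(archive)
    inbox.touch()  # leave a fresh, empty inbox for new deliveries
    return archive
```

Called as rotate_if_large(Path("INBOX"), 2 * 1024**3, date.today()) from a monthly job, this would leave the old mail reachable under the dated folder name.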
That makes sense. So you're saying that very large inboxes are generally bad for mobile devices? How are they bad for the servers?
Asai
On 2020-05-07 18:56, Asai wrote:
That makes sense. So you're saying that very large inboxes are generally bad for mobile devices? How are they bad for the servers?
Aquamail has a local cache of up to the last 10,000 emails and does not fetch more than that, so basically if all Android users keep all their mail in the inbox and use Aquamail, it will work. I have over 300,000 emails, around 7 GB in total. If all of that were in the inbox it would bring down Thunderbird, since by default it fetches all 300,000 emails for offline use. This also happens with Outlook, sadly.
In Roundcube I sort all mail into monthly folders, which reduces the number of emails in each subdirectory on Dovecot, and that makes it speedy.
It would be nice to know how to speed it up in general to Google standards. The virtual folder trick?
On 5/7/2020 12:56 PM, Asai wrote:
That makes sense. So you're saying that very large inboxes are generally bad for mobile devices? How are they bad for the servers?
I'm saying that large inboxes can be generally bad for MUAs, including mobile and desktop. They're just downloading and constantly grinding through data that isn't really being used. Same thing for the server.
If you have an inbox with 10K emails, are you reading them all? No, but you are likely checking the inbox every 5 mins.
And if you have a 20GB INBOX and you download it on mobile, depending on the MUA and settings, you might be just using a ton of cell data and storage on your phone.
Dovecot, of course, has intelligent caching and that helps a ton but overall it's an efficiency thing. Though not that long ago, it was also a way of keeping some MUAs like Outlook from crashing when the PST / OST file got too large.
Regards,
KAM
On 07 May 2020, at 10:40, Kevin A. McGrail KMcGrail@pccc.com wrote:
We find that large inboxes are bad for the server and bad for the client because the MUAs just don't handle them well either. At 1 or 2 GB you start to see issues.
Which is a good reason to move off mbox. There are several other much superior choices and as far as I know there is no reason to use mbox.
-- Minds are like parachutes, they only work when they are open.
On 7. May 2020, at 18.39, Asai asai@globalchangemusic.org wrote:
Greetings,
I have several users who have inboxes that are over 20 GB.
Lately I have noticed Dovecot logs say it's taking over 30 seconds to sync their mailboxes.
As email admins, how do you handle inboxes that are so large? Do you use mailbox types that have better performance like dbox? We're using maildir.
What's a reasonable inbox size? Is 20+ GB reasonable and nothing to worry about?
Thanks for any insight here.
It's more about the huge number of files in a single folder than the combined size of the mails. For filesystem-based storage formats, the number of files is always a challenge. In your case mdbox would probably work better, as it stores multiple mails in a single file. The only downside is that it requires periodic purge operations to remove deleted mails from the middle of the mail bundles.
Sami
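To check whether file count rather than total size is the problem, one could count the message files in a Maildir folder. This is only a sketch: it assumes the standard Maildir layout with "new" and "cur" subdirectories, and the function name is made up for illustration.

```python
from pathlib import Path
from typing import Tuple


def maildir_stats(maildir: Path) -> Tuple[int, int]:
    """Return (message_count, total_bytes) for one Maildir folder.

    Only the "new" and "cur" subdirectories are counted; "tmp" holds
    deliveries in progress and is skipped.
    """
    count = 0
    total = 0
    for sub in ("new", "cur"):
        subdir = maildir / sub
        if not subdir.is_dir():
            continue
        for entry in subdir.iterdir():
            if entry.is_file():
                count += 1
                total += entry.stat().st_size
    return count, total
```

A folder reporting hundreds of thousands of files is the case where a multi-message format such as mdbox is more likely to help than it would for a folder with a few very large messages.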
On Thu, 7 May 2020, Asai wrote:
I have several users who have inboxes that are over 20 GB.
As email admins, how do you handle inboxes that are so large? Do you use mailbox types that have better performance like dbox? We're using maildir.
What's a reasonable inbox size? Is 20+ GB reasonable and nothing to worry about?
It depends on what you consider reasonable.
The processing time of a file operation that iterates through a mailbox will generally go up proportionately with size. If you do a text search without some indexing system like Solr, it will take a very long time.
If the mailbox is just some archive that you pile up and forget about it except for once in a blue moon retrieval, then it might be reasonable.
If it's an active mailbox, it will be a pain to navigate, in the same way a single folder with 100K files or a file cabinet with huge stacks of envelopes.
I would guess some partitioning of the large mailboxes into smaller mailboxes would help with active mailboxes. Most people spend most of their time on new/recent messages, so making time-, size-, or subject-based volumes wouldn't be a bad idea.
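As an illustration of time-based volumes, here is a small sketch that groups messages into per-month folder names. The message IDs, the (id, received date) input shape, and the "INBOX-YYYY-MM" naming are assumptions made for the example, not something Dovecot provides.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def partition_by_month(
    messages: Iterable[Tuple[str, datetime]], prefix: str = "INBOX"
) -> Dict[str, List[str]]:
    """Group message IDs into volumes named PREFIX-YYYY-MM by received date."""
    volumes: Dict[str, List[str]] = defaultdict(list)
    for msg_id, received in messages:
        volumes[f"{prefix}-{received:%Y-%m}"].append(msg_id)
    return dict(volumes)
```

The resulting mapping could drive whatever tool actually moves the mail; the point is only that each volume stays small while recent messages remain together.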
If the bulk of the size is redundant copies of attachments, then Dovecot's *dbox formats support de-duping, which would also help.
Joseph Tam jtam.home@gmail.com
So, generally speaking, you don't want to have inboxes that just sync all day long, due to massive amounts of small files in the inbox. This may be OK in the case of a rarely accessed archive folder, but not good for regularly accessed inboxes, etc.?
On Fri, 8 May 2020, Joseph Tam wrote:
It depends on what you consider reasonable.
Whoops. Editing error. What I wanted to send.
On Fri, 8 May 2020, asai@globalchangemusic.org wrote:
So, generally speaking, you don't want to have inboxes that just sync all day long, due to massive amounts of small files in the inbox.
I don't know enough about what is involved when your client tries to sync to comment on your particular situation. If the exchange of information involves only delta changes (e.g. list data that has been added/removed since the last sync), and if this information is readily available in Dovecot's caches, then this operation might be optimized to take minimal time.
If, however, it involves exchanging entire lists of many message IDs, or worse, involves Dovecot accessing each message, it will result in large amounts of time spent in I/O (network, disk, or both). With Maildir (many small messages in a folder), this causes seeking all over the disk. Some filesystems (XFS?) may be better at this than others.
The description of your problem seems to suggest the latter, so breaking up gigantic mailboxes into manageable volumes will help.
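The difference between a delta exchange and a full listing can be sketched as follows. This is a conceptual illustration only, not how any particular IMAP client or Dovecot performs a sync; the function name and the use of plain integer IDs are assumptions.

```python
from typing import List, Set, Tuple


def sync_delta(client_ids: Set[int], server_ids: Set[int]) -> Tuple[List[int], List[int]]:
    """Return (added, removed): what the client must fetch and what it must drop."""
    added = sorted(server_ids - client_ids)
    removed = sorted(client_ids - server_ids)
    return added, removed
```

With a mailbox of 300,000 messages and three new arrivals, a delta-style exchange carries three IDs, while a full listing carries all 300,003 every time. That gap is what grows with mailbox size.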
If you really want to see what's going on when a client syncs, you can network trace, process trace, or use Dovecot's rawlog feature
https://wiki.dovecot.org/Debugging/Rawlog
to directly observe the interaction between a server and client.
This may be OK in the case of a rarely accessed archive folder, but not good for regularly accessed inboxes, etc.?
This is not really so much technical advice as a rule of thumb: there's not a lot of payoff to optimizing rare operations.
Joseph Tam jtam.home@gmail.com
On 08 May 2020, at 12:54, asai@globalchangemusic.org wrote:
So, generally speaking, you don't want to have inboxes that just sync all day long, due to massive amounts of small files in the inbox. This may be OK in the case of a rarely accessed archive folder, but not good for regularly accessed inboxes, etc.?
Not really, since most GUI clients keep all the folders synced, so moving files to different, smaller-count mailboxes doesn’t reduce the number of files accessed.
The issue is if you have a folder with millions of files in it, most file systems don’t deal well with this.
But with mbox, each “folder” is a single file, and making a single multi-GB text file that has to be parsed is definitely an issue on any file system.
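To see why a large mbox is costly, note that even counting its messages means reading the file from start to end. The sketch below assumes the common mbox convention that each message begins with a line starting with "From " and that body lines starting that way have been escaped (for example as ">From "); the function name is illustrative.

```python
def count_mbox_messages(path: str) -> int:
    """Count messages in an mbox file by scanning for "From " separator lines.

    The whole file is read sequentially, so the cost grows with file size
    no matter how few messages the user actually wants to look at.
    """
    count = 0
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b"From "):
                count += 1
    return count
```

Index and cache files let a server avoid repeating this scan on every access, but any operation that falls back to the raw file pays for every byte in it.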
-- ALL WORK AND NO PLAY MAKES BART A DULL BOY ALL WORK AND NO PLAY MAKES BART A DULL BOY ALL WORK AND NO PLAY MAKES BART A DULL BOY Bart chalkboard Ep. 1F07