[Dovecot] Quota handling on NFS Maildir
Hi
Throughout the v. 1.1 betas I have been receiving casual reports that quota isn't handled correctly (running Maildir).
At the moment I'm running 1.1beta14, but this has been occurring in all other 1.1 betas as well (I don't know for sure whether this was also an issue in 1.0, but I think so). In some earlier betas the issue was worse than it is now (due to improved NFS locking, I guess).
Apparently this occurs when the user switches between POP3 and IMAP. maildirsize gets updated whenever mail comes in, no problems there.
But removed mails aren't always counted as removed. As a result, some users' quota usage keeps growing until it reaches the limit and e-mails start getting rejected.
I have been doing some testing and it appears that the updating of the maildirsize file is sometimes delayed. I figure this is because Dovecot collects these changes in the transactions or index file and then once in a while writes them to maildirsize.
Could there be a weak spot in the code somewhere that allows for this kind of error when the use of POP3 and IMAP is mixed (I think this happens even when POP3 and IMAP access aren't concurrent)?
Both my index files and mail storage are located on NFS.
Regards, Mikkel
This is the output of dovecot -n:
# 1.1.beta14: /local/config/dovecot.conf
Warning: fd limit 256 is lower than what Dovecot can use under full load (more than 768). Either grow the limit or change login_max_processes_count and max_mail_processes settings
log_path: /local/log/dovecot.run
info_log_path: /local/log/dovecot.run
protocols: imap pop3
ssl_disable: yes
disable_plaintext_auth: no
login_dir: /opt/freeware/dovecot-1.1b14/var/run/dovecot/login
login_executable(default): /opt/freeware/dovecot-1.1b14/libexec/dovecot/imap-login
login_executable(imap): /opt/freeware/dovecot-1.1b14/libexec/dovecot/imap-login
login_executable(pop3): /opt/freeware/dovecot-1.1b14/libexec/dovecot/pop3-login
login_process_per_connection: no
verbose_proctitle: yes
first_valid_uid: 105
first_valid_gid: 105
mmap_disable: yes
mail_nfs_storage: yes
mail_nfs_index: yes
mail_executable(default): /opt/freeware/dovecot-1.1b14/libexec/dovecot/imap
mail_executable(imap): /opt/freeware/dovecot-1.1b14/libexec/dovecot/imap
mail_executable(pop3): /opt/freeware/dovecot-1.1b14/libexec/dovecot/pop3
mail_plugins(default): quota imap_quota trash
mail_plugins(imap): quota imap_quota trash
mail_plugins(pop3): quota
mail_plugin_dir(default): /opt/freeware/dovecot-1.1b14/lib/dovecot/imap
mail_plugin_dir(imap): /opt/freeware/dovecot-1.1b14/lib/dovecot/imap
mail_plugin_dir(pop3): /opt/freeware/dovecot-1.1b14/lib/dovecot/pop3
imap_client_workarounds(default): outlook-idle delay-newmail tb-extra-mailbox-sep
imap_client_workarounds(imap): outlook-idle delay-newmail tb-extra-mailbox-sep
imap_client_workarounds(pop3):
pop3_client_workarounds(default):
pop3_client_workarounds(imap):
pop3_client_workarounds(pop3): outlook-no-nuls oe-ns-eoh
auth default:
  mechanisms: plain login digest-md5 cram-md5 apop ntlm
  cache_size: 4096
  cache_ttl: 120
  passdb:
    driver: sql
    args: /local/config/dovecot-sql.conf
  userdb:
    driver: prefetch
  userdb:
    driver: sql
    args: /local/config/dovecot-sql.conf
  socket:
    type: listen
    client:
      path: /var/spool/postfix/private/auth
      mode: 432
      user: postfix
      group: postfix
    master:
      path: /var/run/dovecot/auth-master
      mode: 384
      user: vmail
plugin:
  quota: maildir
  quota_rule2: Trash:storage=10M:messages=100
  trash: /local/config/dovecot-trash.conf
On 2/21/2008, mikkel@euro123.dk (mikkel@euro123.dk) wrote:
At the moment I'm running 1.1beta14
I'm sure Timo appreciates any/all help he can get, but if you are going to go through the trouble of running the development beta versions, don't you think it would make sense to always use the latest - and especially, don't report problems on anything but the latest?
--
Best regards,
Charles
On 2/21/2008, mikkel@euro123.dk (mikkel@euro123.dk) wrote:
At the moment I'm running 1.1beta14
I'm sure Timo appreciates any/all help he can get, but if you are going to go through the trouble of running the development beta versions, don't you think it would make sense to always use the latest - and especially, don't report problems on anything but the latest?
Please!
Look at what I'm writing - this appears to have nothing to do with a specific beta. Also, nothing relating to this issue appears to have been committed to HG since beta14.
... going to go through the trouble of running the development beta versions ...
No trouble in that regard; most of the 1.1 betas have been stable enough for production and a lot better than 1.0 if performance is also considered an important parameter. But installing the latest one without waiting a little to see if anything comes up seems like a bad idea.
Regards, Mikkel
On Thu, 2008-02-21 at 11:03 +0100, mikkel@euro123.dk wrote:
I have been doing some testing and it appears that the updating of the file maildirsize is sometimes delayed.
When do you see this delaying?
I figure this is because Dovecot collects these changes in the transactions or index file and then once in a while writes them to maildirsize.
POP3 is run in a single transaction, but the mails are expunged and maildirsize is updated only when QUIT is run. So you shouldn't be able to notice a situation where maildirsize doesn't match the maildir contents.
Could there be a weak spot in the code somewhere that allows for this kind of error when the use of POP3 and IMAP is mixed (I think this happens even when POP3 and IMAP access aren't concurrent)?
Both IMAP and POP3 use the same code to access mailboxes and update quota, so I can't really think of anything.
Although it's of course possible to use different settings for POP3/IMAP.
mail_plugins(imap): quota imap_quota trash mail_plugins(pop3): quota
OK, both use quota plugin..
plugin: quota: maildir quota_rule2: Trash:storage=10M:messages=100
I guess quota_rule comes from userdb? Is it the same for both imap/pop3? Although that shouldn't matter since the maildirsize should be updated in any case..
On Thu, 2008-02-21 at 11:03 +0100, mikkel@euro123.dk wrote:
I have been doing some testing and it appears that the updating of the file maildirsize is sometimes delayed.
When do you see this delaying?
When I telnet manually to pop3 and imap and type commands to delete certain messages, while keeping an eye on the maildirsize file. This isn't always consistent, but it appears that the updates to maildirsize are grouped together before being committed.
I figure this is because Dovecot collects these changes in the transactions or index file and then once in a while writes them to maildirsize.
POP3 is run in a single transaction, but the mails are expunged and maildirsize is updated only when QUIT is run. So you shouldn't be able to notice a situation where maildirsize doesn't match the maildir contents.
So what happens if QUIT is never run? If the connection is broken before ending properly? Does the IMAP connection also have to be terminated properly before the updates are written?
This may be the cause of the issue since some users in a production environment will always break the connection (loss of internet connectivity, the client program crashing or just generally badly behaving e-mail clients).
Both IMAP and POP3 use the same code to access mailboxes and update quota, so I can't really think of anything.
Although it's of course possible to use different settings for POP3/IMAP.
mail_plugins(imap): quota imap_quota trash mail_plugins(pop3): quota
OK, both use quota plugin..
Yes and I know it works. When executing the pop3 remove commands myself I can see that changes are actually written to maildirsize.
plugin: quota: maildir quota_rule2: Trash:storage=10M:messages=100
I guess quota_rule comes from userdb? Is it the same for both imap/pop3? Although that shouldn't matter since the maildirsize should be updated in any case..
The queries are exactly alike for POP3 and IMAP.
Thanks for looking into this.
Regards, Mikkel
On Feb 22, 2008, at 11:26 AM, mikkel@euro123.dk wrote:
On Thu, 2008-02-21 at 11:03 +0100, mikkel@euro123.dk wrote:
I have been doing some testing and it appears that the updating of the file maildirsize is sometimes delayed.
When do you see this delaying?
When I telnet manually to pop3 and imap and type commands to delete certain messages, while keeping an eye on the maildirsize file. This isn't always consistent, but it appears that the updates to maildirsize are grouped together before being committed.
DELE with POP3 only updates the process's internal state. QUIT expunges and updates maildirsize.
STORE +FLAGS \Deleted updates message flags. EXPUNGE expunges and updates maildirsize.
Do you mean that you can see a situation with the QUIT or EXPUNGE command where the maildir's contents don't match the maildirsize file? Or if you mean only setting the deleted flags, that's normal because nothing really gets deleted yet anyway.
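The POP3 behaviour described above can be illustrated with a small toy model. This is only a simulation of the stated semantics, not Dovecot's code: the class and method names (`Mailbox`, `Pop3Session`, `dele`, `quit`, `drop`) are invented for the example, and `quota_bytes` merely stands in for the total that maildirsize tracks.

```python
class Mailbox:
    """Toy mailbox: message sizes on 'disk' plus a running quota total."""

    def __init__(self, sizes):
        self.messages = dict(enumerate(sizes, start=1))  # msg number -> size
        self.quota_bytes = sum(sizes)  # stands in for maildirsize

    def expunge(self, numbers):
        # Removing a message and adjusting the quota total happen together.
        for n in numbers:
            self.quota_bytes -= self.messages.pop(n)


class Pop3Session:
    """Hypothetical POP3 session holding deletion marks in memory only."""

    def __init__(self, mailbox):
        self.mailbox = mailbox
        self.marked = set()

    def dele(self, n):
        # DELE only touches in-process state.
        self.marked.add(n)

    def quit(self):
        # A clean QUIT expunges the marked mails and updates the quota total.
        self.mailbox.expunge(self.marked)
        self.marked.clear()

    def drop(self):
        # Connection lost before QUIT: marks are forgotten, nothing is deleted.
        self.marked.clear()


box = Mailbox([1000, 2000, 3000])

lost = Pop3Session(box)
lost.dele(1)
lost.drop()
print(box.quota_bytes, len(box.messages))  # 6000 3

clean = Pop3Session(box)
clean.dele(1)
clean.quit()
print(box.quota_bytes, len(box.messages))  # 5000 2
```

In this model a dropped connection leaves both the mailbox and the quota total unchanged, so the two cannot drift apart through DELE alone.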
POP3 is run in a single transaction, but the mails are expunged and maildirsize is updated only when QUIT is run. So you shouldn't be able to notice a situation where maildirsize doesn't match the maildir contents.
So what happens if QUIT is never run? If the connection is broken before ending properly?
Nothing gets deleted.
Does the IMAP connection also have to be terminated properly before
the updates are written?
No, but only EXPUNGE and CLOSE commands remove the messages marked as
deleted.
This may be the cause of the issue since some users in a production environment will always break the connection (loss of internet connectivity, the client program crashing or just generally badly
behaving e-mail clients).
For POP3 maybe, but they'd probably get duplicate messages downloaded
then too, unless the client is smart with UIDLs.
With IMAP could it be just that users have marked messages deleted and
their client hides them, but the messages are never expunged? Or that
the messages have been moved to Trash mailbox..
On Feb 22, 2008, at 11:26 AM, mikkel@euro123.dk wrote:
Do you mean that you can see a situation with QUIT or EXPUNGE command when maildir's contents don't match maildirsize file? Or if you mean only setting the deleted flags, that's normal then because nothing really gets deleted yet anyway.
The delay wasn't long enough for me to physically test whether the quota was out of sync in the meantime. My estimate is 5-10 seconds and it could be the system waiting for I/O or something. I just figured that maybe this was due to grouping of the transactions and thought that if so, then maybe it was possible that this code could fail in some situations.
This may be the cause of the issue since some users in a production environment will always break the connection (loss of internet connectivity, the client program crashing or just generally badly behaving e-mail clients).
For POP3 maybe, but they'd probably get duplicate messages downloaded then too, unless the client is smart with UIDLs.
I can see your point. If your code does not allow for situations where Dovecot gets interrupted between doing the actual delete and updating maildirsize, then I have no idea what happens. I just figured that maybe there was a weak spot somewhere that could break in certain situations (like NFS locking troubles or something) :)
With IMAP could it be just that users have marked messages deleted and their client hides them, but the messages are never expunged? Or that the messages have been moved to Trash mailbox..
When I "du" the user's storage I can see that there is a lot less data than maildirsize claims (so this isn't due to hidden mails or mails in Trash).
Apparently the large majority of users have no problems whatsoever, while a few specific users experience this every two weeks on average (when they finally reach the upper quota limit and report the error). The solution for now is to remove the maildirsize file so the quota will be recounted.
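The manual check described above (comparing the data on disk with what maildirsize claims) could be scripted along these lines. This is a rough sketch that assumes the usual Maildir++ layout (`cur/` and `new/` under the maildir and under each `.Folder` subdirectory) and a maildirsize file whose first line is the quota definition, followed by `bytes count` lines that are added up. The function names are invented for the example, it counts every folder regardless of quota rules, and `du` will report somewhat different numbers because it counts disk blocks.

```python
import os


def usage_on_disk(maildir):
    """Sum size and count of message files in cur/ and new/ of every folder."""
    total_bytes = total_count = 0
    folders = [maildir] + [
        os.path.join(maildir, name)
        for name in os.listdir(maildir)
        if name.startswith(".") and os.path.isdir(os.path.join(maildir, name))
    ]
    for folder in folders:
        for sub in ("cur", "new"):
            path = os.path.join(folder, sub)
            if not os.path.isdir(path):
                continue
            for name in os.listdir(path):
                full = os.path.join(path, name)
                if os.path.isfile(full):
                    total_bytes += os.path.getsize(full)
                    total_count += 1
    return total_bytes, total_count


def usage_in_maildirsize(maildir):
    """Add up the 'bytes count' lines that follow the quota definition line."""
    total_bytes = total_count = 0
    with open(os.path.join(maildir, "maildirsize")) as f:
        next(f, None)  # skip the quota definition line
        for line in f:
            fields = line.split()
            if len(fields) == 2:
                total_bytes += int(fields[0])
                total_count += int(fields[1])
    return total_bytes, total_count
```

Calling both functions on an affected user's maildir and comparing the two tuples would show how far the recorded usage has drifted from what is actually stored.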
The way this happens it seems to me that it's not just a random NFS locking error but actually a bug somewhere and that some users manage to always trigger the bug (which seems to be triggered sometimes when pop3 is used) while others do not. But it's pretty difficult to get close to the cause.
Does Dovecot actually check whether updating the maildirsize is successful or not after calling the operations (e.g. what happens if the code is unable to read from or write to maildirsize)?
Regards, Mikkel
On Fri, 2008-02-22 at 11:52 +0100, mikkel@euro123.dk wrote:
The way this happens it seems to me that it's not just a random NFS locking error but actually a bug somewhere and that some users manage to always trigger the bug (which seems to be triggered sometimes when pop3 is used) while others do not. But it's pretty difficult to get close to the cause.
Does Dovecot actually check whether updating the maildirsize is successful or not after calling the operations (e.g. what happens if the code is unable to read from or write to maildirsize)?
maildirsize updating works by just appending a value to it. This isn't entirely NFS safe, and Dovecot doesn't bother verifying that two processes didn't write to it at the same time, overwriting each other's data. Perhaps it should, and if it detected that, it would rebuild the file.
But this shouldn't normally be a problem. If a user is over quota and either
a) the maildirsize file contains more than just the summary line, or
b) the file is older than 15 minutes,
the maildirsize gets recalculated. So even if Dovecot completely screws up updating the maildirsize file, users shouldn't see "quota exceeded" errors unless the recalculation is also broken.
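The recalculation rule described above could be expressed roughly as follows. This is a sketch of the rule as stated in this message, not Dovecot's actual code: `needs_recalc`, its parameters and `MAX_AGE_SECONDS` are made-up names, and "more than just the summary line" is interpreted here (an assumption) as more than one usage line after the quota definition line.

```python
import os
import time

MAX_AGE_SECONDS = 15 * 60  # the 15-minute threshold mentioned above


def needs_recalc(path, limit_bytes, now=None):
    """Return True if an over-quota maildirsize should be rebuilt from disk."""
    now = time.time() if now is None else now
    with open(path) as f:
        lines = [line for line in f.read().splitlines() if line.strip()]
    usage_lines = lines[1:]  # lines[0] is the quota definition
    used = sum(int(line.split()[0]) for line in usage_lines)
    if used <= limit_bytes:
        return False  # not over quota: the file is used as-is
    has_appended_updates = len(usage_lines) > 1  # condition a)
    is_stale = now - os.path.getmtime(path) > MAX_AGE_SECONDS  # condition b)
    return has_appended_updates or is_stale
```

Under this reading, a user can only be rejected on the basis of a freshly recalculated file, which is why a persistent "quota exceeded" state would point at the recalculation itself.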
Is anything other than Dovecot delivering mails to the maildir?
On Fri, 2008-02-22 at 11:52 +0100, mikkel@euro123.dk wrote:
The maildirsize gets recalculated. So even if Dovecot completely screws up updating the maildirsize file, users shouldn't see "quota exceeded" errors unless the recalculation is also broken.
Is anything other than Dovecot delivering mails to the maildir?
Nothing else is accessing the Maildir.
A few days ago I changed the LDA conf to use INDEX=MEMORY in order to make some performance tests. Besides actually working faster in my setup this also made the quota problems go away.
I only made this change for the LDA. IMAP and POP3 are still using index files normally (on NFS).
The quota issue was turning into quite an annoyance since some users had a lot of trouble with this. So I'm glad this works for me even though the problem isn't really solved.
Maybe the LDA was locking for a long time while updating indexes. And maybe this resulted in the POP3 process being unable to access the Maildir file if the user was checking mail at the same time as a delivery? Now that I think of it, I was regularly seeing files like ".nfsC164601" left in the maildir and I assume these are related to locking?
Just guessing here but it's kind of odd why it works now if it's not locking related.
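On the ".nfs" files: NFS clients generally create such names when a file is deleted while some process on that client still has it open (the so-called silly rename), so they indicate deletion of open files rather than locks as such. A small sketch for listing them under a maildir follows; `find_nfs_leftovers` and its parameters are hypothetical names chosen for this example.

```python
import os
import time


def find_nfs_leftovers(maildir, min_age_seconds=0, now=None):
    """Return (path, size) for every .nfs* file at least min_age_seconds old."""
    now = time.time() if now is None else now
    found = []
    for dirpath, _dirnames, filenames in os.walk(maildir):
        for name in filenames:
            if not name.startswith(".nfs"):
                continue
            full = os.path.join(dirpath, name)
            age = now - os.path.getmtime(full)
            if age >= min_age_seconds:
                found.append((full, os.path.getsize(full)))
    return sorted(found)
```

Running this with a `min_age_seconds` of a few minutes would separate files that are merely in transit from ones that linger, which may help tell whether they coincide with the quota drift.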
Regards, Mikkel
participants (3)
- Charles Marcus
- mikkel@euro123.dk
- Timo Sirainen