[Dovecot] Zero-sized Maildir files and index corruption on over-quota

newer
[Dovecot] Issues when compiling...

Edgar Fuß

5 Oct 2010 5 Oct '10

4:29 p.m.

We are using Postfix as an MTA delivering via Dovecot's LDA (with sieve). We also use Dovecot as a POP/IMAP server. Mail storage is Maildir on NFS, indexes are stored locally. Quotas are FS quotas enforced by the NFS server. The Dovecot version is 1.2.11.

Recently, for one user being over quota, some attempts to deliver mail bounced as expected while some others resulted in zero-sized Maildir files. I can verify zero-sized files in cur, while I cannot tell whether they were already zero-sized in new. However, since nobody except Dovecot is accessing the mail storage, it's either Dovecot's LDA or Dovecot's IMAP producing the zero-sized files.

Logs show successful delivery (deliver: stored mail into mailbox 'INBOX') but ,,Corrupted index cache file /var/db/dovecot/indexes/XXX/.INBOX/dovecot.index.cache: Corrupted virtual size for uid=YYY: 0 != NNNN'' alerts on IMAP access.

It looks like all mails which correctly got bounced are >64KB while those resulting in zero-sized files are <64KB (there is one >32KB). My first guess would be that maybe for large files, write() fails, while for small ones, close() fails or there is some buffering involved for large files.

Show replies by date

Eric Shubert

5 Oct 5 Oct

7:02 p.m.

Edgar Fuß wrote:

...

We are using Postfix as an MTA delivering via Dovecot's LDA (with sieve). We also use Dovecot as a POP/IMAP server. Mail storage is Maildir on NFS, indexes are stored locally. Quotas are FS quotas enforced by the NFS server. The Dovecot version is 1.2.11.

Recently, for one user being over quota, some attempts to deliver mail bounced as expected while some others resulted in zero-sized Maildir files. I can verify zero-sized files in cur, while I cannot tell whether they were already zero-sized in new. However, since nobody except Dovecot is accessing the mail storage, it's either Dovecot's LDA or Dovecot's IMAP producing the zero-sized files.

Logs show successful delivery (deliver: stored mail into mailbox 'INBOX') but ,,Corrupted index cache file /var/db/dovecot/indexes/XXX/.INBOX/dovecot.index.cache: Corrupted virtual size for uid=YYY: 0 != NNNN'' alerts on IMAP access.

It looks like all mails which correctly got bounced are >64KB while those resulting in zero-sized files are <64KB (there is one >32KB). My first guess would be that maybe for large files, write() fails, while for small ones, close() fails or there is some buffering involved for large files.

Are your NFS Maildirs mounted on more than one server? Check the list archive for "nfs director". There has been a good bit of discussion about this recently. Cor Bosman explain the situation nicely here: http://www.mail-archive.com/dovecot@dovecot.org/msg31713.html

-- -Eric 'shubes'

Edgar Fuß

7:16 p.m.

New subject: [Dovecot] Zero-sized Maildir files and index corruption on over-quota

...

Are your NFS Maildirs mounted on more than one server? Yes.

...

Check the list archive for "nfs director". I do read the list.

...

There has been a good bit of discussion about this recently. As I recall, this is about index corruption. What we face is not index corruption, but zero-length data files. Of course, for dovecot's program logic, this is an index corruption in the sense that the length recorded in the index (which is in fact correct) doesn't match the file size (because the file has been truncated to zero length).

...

Cor Bosman explain the situation nicely here: http://www.mail-archive.com/dovecot@dovecot.org/msg31713.html Yes, thanks, I already read that when it came up on the list.

Timo Sirainen

14 Oct 14 Oct

7:48 p.m.

New subject: [Dovecot] Zero-sized Maildir files and index corruption on over-quota

On Tue, 2010-10-05 at 15:29 +0200, Edgar Fuß wrote:

...

It looks like all mails which correctly got bounced are >64KB while those resulting in zero-sized files are <64KB (there is one >32KB). My first guess would be that maybe for large files, write() fails, while for small ones, close() fails or there is some buffering involved for large files.

Even if close() fails, the file is still in tmp/ and Dovecot aborts the save. The only reason I can think of is that you've done async NFS mount and the failed write() is noticed way too late by the kernel.

Edgar Fuß

15 Oct 15 Oct

12:29 p.m.

New subject: [Dovecot] Zero-sized Maildir files (and alledged index corruption) on over-quota

...

Even if close() fails, the file is still in tmp/ and Dovecot aborts the save. I don't know whether the NFS server behaves that way (I would have to try out), but how would dovecot's LDA behave if, for a small message, upon the write (to tmp) the write() succeeds and the close() (which would actually try to flush the buffered data to disc) fails?

...

The only reason I can think of is that you've done async NFS mount and the failed write() is noticed way too late by the kernel. Would you consider "upon close()" as "way too late"?

Timo Sirainen

2:05 p.m.

New subject: [Dovecot] Zero-sized Maildir files (and alledged index corruption) on over-quota

On 15.10.2010, at 10.29, Edgar Fuß wrote:

...

...
Even if close() fails, the file is still in tmp/ and Dovecot aborts the save. I don't know whether the NFS server behaves that way (I would have to try out), but how would dovecot's LDA behave if, for a small message, upon the write (to tmp) the write() succeeds and the close() (which would actually try to flush the buffered data to disc) fails?

It would log an error and delete the tmp file.

...

...
The only reason I can think of is that you've done async NFS mount and the failed write() is noticed way too late by the kernel. Would you consider "upon close()" as "way too late"?

Nope. I mean if it fails after close().

Edgar Fuß

7:03 p.m.

New subject: [Dovecot] Zero-sized Maildir files (and alledged index corruption) on over-quota

...

It would log an error and delete the tmp file. Excellent. Does this also apply if sieve is involved?

Timo Sirainen

8:10 p.m.

New subject: [Dovecot] Zero-sized Maildir files (and alledged index corruption) on over-quota

On Fri, 2010-10-15 at 18:03 +0200, Edgar Fuß wrote:

...

...
It would log an error and delete the tmp file. Excellent. Does this also apply if sieve is involved?

Yes. Sieve uses the same functions for saving mails as the rest of Dovecot.

Edgar Fuß

19 Oct 19 Oct

2:42 p.m.

New subject: [Dovecot] Zero-sized Maildir files on over-quota

...

Yes. Sieve uses the same functions for saving mails as the rest of Dovecot. It happened again, so I have more input:

I have several of these: Oct 9 00:41:22 xxx dovecot: deliver(xxxxxx): fdatasync(/import/mail/xxxxxx/dovecot-uidlist) failed: Disc quota exceeded Oct 9 00:41:22 xxx dovecot: deliver(xxxxxx): sieve: msgid=<XXXXX>: stored mail into mailbox 'INBOX'

I also have this one: Oct 6 18:14:47 xxxx dovecot: IMAP(xxxxxx): Broken file /import/mail/xxxxxx/.Trash/dovecot-uidlist line 4: UIDs not ordered (8171 >= 8172) Oct 6 18:14:47 xxxx dovecot: IMAP(xxxxxx): fdatasync(/import/mail/xxxxxx/.Trash/dovecot-uidlist.tmp) failed: Disc quota exceeded Oct 6 18:14:47 xxxx dovecot: IMAP(xxxxxx): Panic: file maildir-save.c: line 653 (maildir_transaction_save_commit_pre_sync): assertion failed: (first_uid != 0) Oct 6 18:14:47 xxxx dovecot: dovecot: child 15337 (imap) killed with signal 6 (core not dumped - set mail_drop_priv_before_exec=yes)

Hope that helps tracking down the bug.

Timo Sirainen

3:59 p.m.

New subject: [Dovecot] Zero-sized Maildir files on over-quota

On Tue, 2010-10-19 at 13:42 +0200, Edgar Fuß wrote:

...

...
Yes. Sieve uses the same functions for saving mails as the rest of Dovecot. It happened again, so I have more input:

I have several of these: Oct 9 00:41:22 xxx dovecot: deliver(xxxxxx): fdatasync(/import/mail/xxxxxx/dovecot-uidlist) failed: Disc quota exceeded

These are about errors writing to uidlist. It's not very good to keep these files in quota tracked filesystem, but they're not causing the zero byte files either.

...

Hope that helps tracking down the bug.

I'm still thinking this has more to do with your system than anything in Dovecot. But you could try if the attached patch catches those zero byte files by logging about them.

Edgar Fuß

28 Jun 28 Jun

3:51 p.m.

New subject: [Dovecot] Zero-sized Maildir files on over-quota

...

I'm still thinking this has more to do with your system than anything in Dovecot. Yes. We were finally able to reliably reproduce this. It looks like a strange bug in NetBSD's NFS: If you are over quota and write a small amount (probably less tha an NFS block) of data, the write() call will succeed and a following fsync() (or close()) will fail. However, when you insert a utimes() or futimes() call after the write(), the fsync()/close() will report success. I didn't dig into when Dovecot does that utimes() call, but it sometimes does and sometimes doesn't, explaining our problems to reproduce the bug.

4403

Age (days ago)

5400

Last active (days ago)

List overview

10 comments

3 participants

participants (3)

Edgar Fuß
Eric Shubert
Timo Sirainen