[Dovecot] Assertion Failure in mbox-sync.c
I've been getting a few of these errors on a couple different mboxes. This is using the CVS version as of May 14.
So far, these are the only errors and it looks like most/all of the older ones are gone. (maybe I shouldn't say that ;-)
dovecot: May 16 17:41:07 Error: 20973 IMAP(todd.bluegenesis.com): file mbox-sync.c: line 1165 (mbox_sync_handle_eof_updates): assertion failed: (file_size >= sync_ctx->file_input->v_offset) dovecot: May 16 17:41:10 Error: child 20973 (imap) killed with signal 6
(gdb) bt full #0 0xffffe410 in ?? () No symbol table info available. #1 0xbffff518 in ?? () No symbol table info available. #2 0x00000006 in ?? () No symbol table info available. #3 0x000051ed in ?? () No symbol table info available. #4 0xb7ef1d41 in raise () from /lib/tls/libc.so.6 No symbol table info available. #5 0xb7ef3529 in abort () from /lib/tls/libc.so.6 No symbol table info available. #6 0x0809911a in i_internal_panic_handler ( fmt=0x6
, args=0x51ed ) at failures.c:374 No locals. #7 0x08099534 in i_panic ( format=0x80a5ccc "file %s: line %d (%s): assertion failed: (%s)") at failures.c:173 No locals. #8 0x0806ef4b in mbox_sync (mbox=0x80c3ba8, flags=Variable "flags" is not available. ) at mbox-sync.c:1165 index_sync_ctx = Variable "index_sync_ctx" is not available.(gdb) bt #0 0xffffe410 in ?? () #1 0xbffff518 in ?? () #2 0x00000006 in ?? () #3 0x000051ed in ?? () #4 0xb7ef1d41 in raise () from /lib/tls/libc.so.6 #5 0xb7ef3529 in abort () from /lib/tls/libc.so.6 #6 0x0809911a in i_internal_panic_handler ( fmt=0x6
, args=0x51ed ) at failures.c:374 #7 0x08099534 in i_panic ( format=0x80a5ccc "file %s: line %d (%s): assertion failed: (%s)") at failures.c:173 #8 0x0806ef4b in mbox_sync (mbox=0x80c3ba8, flags=Variable "flags" is not available. ) at mbox-sync.c:1165 #9 0x08068891 in mbox_transaction_commit (_t=0x80cf430, flags=Variable "flags" is not available. ) at mbox-transaction.c:56 #10 0x0808d333 in mailbox_transaction_commit (t=0x0, flags=0) at mail-storage.c:418 #11 0x08058b56 in cmd_store (cmd=0x80bcb98) at cmd-store.c:122 #12 0x08059685 in _client_input (context=0x80bcb58) at client.c:334 #13 0x0809e5da in io_loop_handler_run (ioloop=0x80bb9b0) at ioloop-poll.c:184 #14 0x0809dcf9 in io_loop_run (ioloop=0x80bb9b0) at ioloop.c:218 #15 0x08060587 in main (argc=1, argv=0xbffffa44, envp=0xbffffa4c) at main.c:228I'm not sure about the variables here, the core file's still here if any other info will help:
(gdb) p sync_ctx->input->real_stream->statbuf $5 = {st_dev = 0, __pad1 = 0, __st_ino = 0, st_mode = 0, st_nlink = 0, st_uid = 0, st_gid = 0, st_rdev = 0, __pad2 = 0, st_size = 4294967295, st_blksize = 0, st_blocks = 0, st_atim = {tv_sec = 1116279667, tv_nsec = 0}, st_mtim = {tv_sec = 1116279667, tv_nsec = 0}, st_ctim = { tv_sec = 1116279667, tv_nsec = 0}, st_ino = 0} (gdb) p sync_ctx->file_input->v_offset $8 = 142294527
The mbox size is larger than 142294527
I did notice this, but I'm not sure if I'm looking at the correct variable and don't think the size is large enough to matter, anyway.
(gdb) pt sync_ctx->input->real_stream->statbuf->st_size type = long long int (gdb) pt sync_ctx->file_input->v_offset type = long long unsigned int
Todd
On Mon, 2005-05-16 at 19:39 -0400, Todd Burroughs wrote:
dovecot: May 16 17:41:07 Error: 20973 IMAP(todd.bluegenesis.com): file mbox-sync.c: line 1165 (mbox_sync_handle_eof_updates): assertion failed: (file_size >= sync_ctx->file_input->v_offset) .. (gdb) p sync_ctx->input->real_stream->statbuf $5 = {st_dev = 0, __pad1 = 0, __st_ino = 0, st_mode = 0, st_nlink = 0, st_uid = 0, st_gid = 0, st_rdev = 0, __pad2 = 0, st_size = 4294967295, st_blksize = 0, st_blocks = 0, st_atim = {tv_sec = 1116279667, tv_nsec = 0}, st_mtim = {tv_sec = 1116279667, tv_nsec = 0}, st_ctim = { tv_sec = 1116279667, tv_nsec = 0}, st_ino = 0}
sync_ctx->input is different than sync_ctx->file_input, where the statbuf was taken from (took a while to notice this :). Could you do the same for it?
(gdb) p sync_ctx->file_input->v_offset $8 = 142294527
The mbox size is larger than 142294527
But it crashes because fstat() returns a value smaller than that as the mbox's size. Maybe NFS returns some old size from cache? I don't really know how NFS caching usually works. If someone has any pointers (other than kernel sources :) it'd be helpful.
What OS are you using on NFS client side?
How about if you try running this little program, does it work:
/* gcc writetest.c -o writetest -Wall */
#include
int main(void) { int fd = open("writetest.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600); char buf[1024]; struct stat st; int x, ret;
if (fd < 0) {
perror("open()");
return 1;
}
fstat(fd, &st);
for (x = 0; x < 100; x++) {
ret = write(fd, buf, sizeof(buf));
if (ret != sizeof(buf)) {
perror("write()");
return 1;
}
}
fstat(fd, &st);
printf("size again: %ld vs. %d\n", st.st_size, 100*sizeof(buf));
ret = write(fd, buf, sizeof(buf));
if (ret != sizeof(buf)) {
perror("write()");
return 1;
}
fstat(fd, &st);
printf("size finally: %ld vs. %d\n", st.st_size, 101*sizeof(buf));
return 0;
}
On Tue, 17 May 2005, Timo Sirainen wrote:
On Mon, 2005-05-16 at 19:39 -0400, Todd Burroughs wrote:
dovecot: May 16 17:41:07 Error: 20973 IMAP(todd.bluegenesis.com): file mbox-sync.c: line 1165 (mbox_sync_handle_eof_updates): assertion failed: (file_size >= sync_ctx->file_input->v_offset) .. (gdb) p sync_ctx->input->real_stream->statbuf $5 = {st_dev = 0, __pad1 = 0, __st_ino = 0, st_mode = 0, st_nlink = 0, st_uid = 0, st_gid = 0, st_rdev = 0, __pad2 = 0, st_size = 4294967295, st_blksize = 0, st_blocks = 0, st_atim = {tv_sec = 1116279667, tv_nsec = 0}, st_mtim = {tv_sec = 1116279667, tv_nsec = 0}, st_ctim = { tv_sec = 1116279667, tv_nsec = 0}, st_ino = 0}
sync_ctx->input is different than sync_ctx->file_input, where the statbuf was taken from (took a while to notice this :). Could you do the same for it?
There's an odd value in here... I checked and the values look correct, other than st_size.
(gdb) p sync_ctx->file_input->real_stream->statbuf $2 = {st_dev = 43, __pad1 = 0, __st_ino = 23252848, st_mode = 33152, st_nlink = 1, st_uid = 1022, st_gid = 60, st_rdev = 0, __pad2 = 0, st_size = 140737488355328, st_blksize = 32768, st_blocks = 278184, st_atim = {tv_sec = 1114557761, tv_nsec = 0}, st_mtim = { tv_sec = 1116279661, tv_nsec = 95645000}, st_ctim = {tv_sec = 1116279661, tv_nsec = 95645000}, st_ino = 23252848}
(gdb) p sync_ctx->file_input->v_offset $8 = 142294527
The mbox size is larger than 142294527
But it crashes because fstat() returns a value smaller than that as the mbox's size. Maybe NFS returns some old size from cache? I don't really know how NFS caching usually works. If someone has any pointers (other than kernel sources :) it'd be helpful.
What OS are you using on NFS client side?
We're using Linux with a 2.6.x kernel for the client and NetApp filers for the server.
How about if you try running this little program, does it work:
/* gcc writetest.c -o writetest -Wall */
Works perfectly. I tried looking at the file size from a different server while running the program and it also worked.
My mailbox should never get smaller, unless dovecot deletes some mail. It will grow with mail being added from different servers.
I don't think that NFS would return an older value, as there is one system wide cache per machine. It is possible to turn caching off, so I could test this. My understanding is that the client will cache attributes for some period of time, 3 to 60 seconds are the default min/max for Linux. I don't know how it determines how long it actually caches them for.
We have hundreds of NFS clients here and we don't run into many problems with file corruption, etc. Also, changes seem to get noticed very quickly. If I change something on a web site, it shows up immediately on other servers. Overall, NFS works quite well with our setup. It's more stable than I would expect ;-)
Todd
On 18.5.2005, at 03:00, Todd Burroughs wrote:
sync_ctx->input is different than sync_ctx->file_input, where the statbuf was taken from (took a while to notice this :). Could you do the same for it?
There's an odd value in here... I checked and the values look correct, other than st_size.
That's a gdb bug (or gcc debug information writer bug or something). 64bit numbers seem to be quite often broken with gdb. Usually they can be fixed by just discarding the highest 32 bits, but with struct stat the whole st_size seems to points to wrong location.
I don't think that NFS would return an older value, as there is one system wide cache per machine. It is possible to turn caching off, so I could test this. My understanding is that the client will cache attributes for some period of time, 3 to 60 seconds are the default min/max for Linux. I don't know how it determines how long it actually caches them for.
Right, I thought about this too after sending the mail. The caching problem could happen only if another computer modified the file, so the test program couldn't really trigger it.
But included a patch to Dovecot, it should give some more information now instead of just crashing.
On Tue, 24 May 2005, Timo Sirainen wrote:
On 18.5.2005, at 03:00, Todd Burroughs wrote:
That's a gdb bug (or gcc debug information writer bug or something). 64bit numbers seem to be quite often broken with gdb. Usually they can be fixed by just discarding the highest 32 bits, but with struct stat the whole st_size seems to points to wrong location.
Thanks, the gdb bug is good to know...
Right, I thought about this too after sending the mail. The caching problem could happen only if another computer modified the file, so the test program couldn't really trigger it.
I actually ran it on one machine and looked at the output from another. i.e., 'ls -l' on the empty file, then again just after running the program, all within a second. I don't know how the cache refresh timing works, but it did not cache the old attributes for the minumum of three seconds. Probably a good thing.
But included a patch to Dovecot, it should give some more information now instead of just crashing.
Found a small problem with the patch, but I think I've fixed it and we'll see what happens:
Compile errors:
mbox-sync.c: In function mbox_sync_handle_eof_updates': mbox-sync.c:1168: error: structure has no member named
ibox'
mbox-sync.c:1170: error: structure has no member named ibox' mbox-sync.c:1174: error: structure has no member named
ibox'
mbox-sync.c:1176: error: structure has no member named `ibox'
if (fstat(sync_ctx->ibox->mbox_fd, &st1) < 0)
i_error("fstat(mbox) failed: %m");
I substituted 'mbox' for 'ibox'. I think that's what you meant?
Todd
On Tue, 24 May 2005, Timo Sirainen wrote:
But included a patch to Dovecot, it should give some more information now instead of just crashing.
I've seen the error happen on a few different mailboxes since running the patched version. The last value is always one less than the first two. Here's an example:
dovecot: May 27 15:55:23 Error: 8039 IMAP(dirk.hostopia.com): MBOX /mailhome/new/d/h/dirk.hostopia.com/mbox: sizes differ: 400781206 vs 400781206 vs. 400781205
This is from this code in src/lib-storage/index/mbox/mbox-sync.c
i_error("MBOX %s: sizes differ: %"PRIuUOFF_T" vs %"
PRIuUOFF_T" vs. %"PRIuUOFF_T, sync_ctx->mbox->path,
st1.st_size, st2.st_size,
sync_ctx->file_input->v_offset);
Todd
On Fri, 2005-05-27 at 17:03 -0400, Todd Burroughs wrote:
On Tue, 24 May 2005, Timo Sirainen wrote:
But included a patch to Dovecot, it should give some more information now instead of just crashing.
I've seen the error happen on a few different mailboxes since running the patched version. The last value is always one less than the first two. Here's an example:
dovecot: May 27 15:55:23 Error: 8039 IMAP(dirk.hostopia.com): MBOX /mailhome/new/d/h/dirk.hostopia.com/mbox: sizes differ: 400781206 vs 400781206 vs. 400781205
Those values are correct. I think I figured out the problem, it was caching the file size internally in some situations after it had done modifications to mbox. Fixed in CVS.
participants (2)
-
Timo Sirainen
-
Todd Burroughs