[Dovecot] Dovecot-1.0.beta1: New mail sometimes fails to appear
Hi there...
Having updated to 1.0.beta1, I for a while thought a bad bug with IDLE handling that caused new mail in INBOX not to appear when using Mozilla Thunderbird has been solved. Although I can not reproduce one kind of problem anymore, after a short while it again appears that one does not always see one's new mail with Thunderbird.
I can not consistently reproduce the bug, but from some network captures it indeed appears that sometimes Thunderbird has (in one of its connections to the server) INBOX SELECTed and IDLE on it, new mail arrives, but 10 minutes afterwards it was still not sent an EXISTS message.
Initially the symptoms appeared to suggest that something really bad happened to the index files, but I tried INDEX=MEMORY and that did not help much. Thunderbird using multiple connections to the server does of course only make debugging more difficult. But it appears that without multiple connections, one has to constantly cycle around SELECTing to get notified on more than one mailbox, so that may justified.
I am a bit lost on this now, and would appreciate some ideas. Although there were several related bugs I noticed in Thunderbird 1.0, it appears that they are gone in 1.5, and yet now it really seems Dovecot is doing something wrong here.
Attached is the configuration file I use for the testing installation of Dovecot 1.0.beta1 (somebody tell me if you see something wrong there). The mailbox arrangement is as follows, right now (until we have a better way):
- INBOX in /var/mail/%u
- most mailboxes in ~/imap.maildir/... as maildirs.
- mboxes in ~/imap.mbox/... accessible as #imap.mbox/...
- the mbox:~/mail/ is for when namespaces are not used - now there are so it should not be relevant
Except for myself, and one test account I have, I have two guinea-pigs using it (who are well aware that it is still under development and that it will break once in a while) and reporting problems to me.
Vaguely (since I can't figure the exact reasons), the symptom can be described as follows: When using Mozilla Thunderbird to access mail, new mail arrives to INBOX (mbox:/var/mail/%u), but is sometimes not seen for a long while after it arrives. Sometimes the first (or first few) message arriving appears, but then no new messages appear, and sometimes everything works right and immediately new mail is seen. However whenever (according to Thunderbird) something changes in the mailbox, the mailbox is seen up-to-date as it really was at that time.
Thanks, any help appreciated, -- Tom
-- Tom Alsberg - hacker (being the best description fitting this space) Web page: http://www.cs.huji.ac.il/~alsbergt/ DISCLAIMER: The above message does not even necessarily represent what my fingers have typed on the keyboard, save anything further.
# Dovecot IMAP server configuration at CS.HUJI ############################################## # # $Id$
# GENERAL PARAMETERS: #####################
# Base directory where to store runtime data: base_dir = /var/run/dovecot
# Protocols we want to be serving: protocols = imap pop3
# IP or host address where to listen in for connections: listen = *
ssl_disable = yes
# LOGIN PROCESSES: ##################
# Directory for authentication sockets: #login_dir = /var/run/dovecot/login
# chroot login process to the login_dir: #login_chroot = yes
# User to use for the login process: login_user = nobody
# Maximum process size in megabytes: #login_process_size = 16
# Have each login be processed in its own process: #login_process_per_connection = yes
# Number of login processes to create: #login_processes_count = 3
# Maximum number of extra login processes to create: #login_max_processes_count = 128
# Maximum number of connections allowed in login state: #login_max_logging_users = 256
# Greeting message for clients: login_greeting = Dovecot mail access server at CS.HUJI ready.
# MAIL PROCESSES: #################
# Maximum number of running mail processes: #max_mail_processes = 1024
# Show more verbose process title (in ps): verbose_proctitle = yes
# Valid UID range for users: first_valid_uid = 10 #last_valid_uid = 0
# Valid GID range for users: #first_valid_gid = 1 #last_valid_gid = 0
# Extra groups for mail processes: #mail_extra_groups =
# ':' separated list of directories under which chrooting is allowed: #valid_chroot_dirs =
# Default chroot directory for mail processes: #mail_chroot =
# Default MAIL environment to use when it's not set: default_mail_env = mbox:~/mail/:INBOX=/var/mail/%u
# NAMESPACES: #############
namespace private { separator = / prefix = "" location = maildir:~/imap.maildir/ }
namespace private { separator = / prefix = "#imap.mbox/" location = mbox:~/imap.mbox/:INBOX=/var/mail/%u inbox = yes }
# MAIL STORE: #############
# Space-separated list of fields to initially save into cache file: #mail_cache_fields =
# Space-separated list of fields to never save into cache file: #mail_never_cache_fields =
# Like mailbox_check_interval, but used for IDLE command: #mailbox_idle_check_interval = 30
# Whether to allow full filesystem access to clients: #mail_full_filesystem_access = no
# Maximum allowed length for mail keyword name: #mail_max_keyword_length = 50
# Whether to save mails with CR+LF instead of plain LF: #mail_save_crlf = no
# Whether to use mmap to access mail files: #mail_read_mmaped = no
# Whether to disable mmap at all: mmap_disable = yes
# Whether to disable write to mmap: #mmap_no_write = no
# Locking method for index files: lock_method = dotlock
# Whether to list only directories in LIST command: maildir_stat_dirs = yes
# Whether to copy mail to other folders using hard links: #maildir_copy_with_hardlinks = no
# Locking methods to use for locking mbox: mbox_read_locks = dotlock mbox_write_locks = dotlock
# Maximum time in seconds to wait for lock (all of them): mbox_lock_timeout = 300
# Timeout to overwrite dotlock files if mailbox isn't modified: mbox_dotlock_change_timeout = 30
# Whether to read only appended data on change (unless corrupt): #mbox_dirty_syncs = yes
# Like mbox_dirty_syncs, but no full syncs with SELECT, EXAMINE, # EXPUNGE, or CHECK: #mbox_very_dirty_syncs = no
# Whether to delay writing mbox headers until doing a full write sync: #mbox_lazy_writes = yes
# umask to use for mail files and directories: umask = 0077
# Whether drop all privileges before exec of mail process: #mail_drop_priv_before_exec = no
# Maximum mail process size in megabytes: #mail_process_size = 256
# Log prefix for mail processes: mail_log_prefix = "%Us(%u): "
# IMAP SPECIFIC SETTINGS: #########################
protocol imap { # Login executable location: #login_executable = /usr/local/dovecot/libexec/dovecot/imap-login
# IMAP executable location:
#mail_executable = /usr/local/dovecot/libexec/dovecot/imap
# Maximum IMAP command line length in bytes:
#imap_max_line_length = 65536
# Suypport for dynamically loadable modules:
#mail_use_modules = no
#mail_modules = /usr/local/dovecot/lib/dovecot/imap
# Whether to send IMAP capabilities in greeting message:
login_greeting_capability = yes
# Workarounds for various client bugs:
imap_client_workarounds = outlook-idle netscape-eoh
}
# POP3 SPECIFIC SETTINGS: #########################
protocol pop3 { # Login executable location: #login_executable = /usr/local/dovecot/libexec/dovecot/pop3-login
# POP3 executable location:
#mail_executable = /usr/local/dovecot/libexec/dovecot/pop3
# Whether to not update recent/seen flags in POP3 sessions:
#pop3_no_flag_updates = no
# Whether to support deprecated LAST command:
#pop3_enable_last = no
# POP3 UIDL format to use:
#pop3_uidl_format = %08Xu%08Xv
# Support for dynamically loadable modules
#mail_user_modules = no
#mail_modules = /usr/local/dovecot/lib/dovecot/pop3
# Workarounds for various client bugs:
pop3_client_workarounds = outlook-no-nuls oe-ns-eoh
}
# AUTHENTICATION PROCESSES: ###########################
# Authentication executable location: #auth_executable = /usr/local/dovecot/libexec/dovecot/dovecot-auth
# Maximum authentication process size in megabytes: #auth_process_size = 256
# Authentication cache size in kilobytes: #auth_cache_size = 0
# Time to live in seconds for cached data: #auth_cache_ttl = 3600
# Space separated list of realms for SASL authentication mechanisms: #auth_realms =
# Default realm/domain to use if none was specified: #auth_default_realm =
# List of allowed characters in username: #auth_username_chars = abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-_@
# Username character translation before it's looked up in database: #auth_username_translation =
# Whether to disable plaintext authentication disable_plaintext_auth = no
# More verbose authentication logging: auth_verbose = yes
# Even more verbose authentication logging for debugging purposes: auth_debug = yes
auth default { # Space separated list of wanted authentication mechanisms mechanisms = plain
# Password database specification:
passdb pam {
}
# User database specification:
userdb passwd {
}
# User to use for the authentication process:
user = root
# Directory where to chroot the process:
#chroot =
# Number of authentication processes to create:
#count = 1
# Whether to require a valid SSL client certificate for
# authentication:
#ssl_require_client_cert = no
}
Hi Tom,
On 23/01/2006 1:05 a.m., Tom Alsberg wrote:
Hi there...
Having updated to 1.0.beta1, I for a while thought a bad bug with IDLE handling that caused new mail in INBOX not to appear when using Mozilla Thunderbird has been solved. Although I can not reproduce one kind of problem anymore, after a short while it again appears that one does not always see one's new mail with Thunderbird.
I can not consistently reproduce the bug, but from some network captures it indeed appears that sometimes Thunderbird has (in one of its connections to the server) INBOX SELECTed and IDLE on it, new mail arrives, but 10 minutes afterwards it was still not sent an EXISTS message.
I am seeing something like this too and am using Thunderbird, but I'm not sure that this is actually a Thunderbird problem. I'm unsure when it first appeared but it's pre -beta1 anyway.
What seems to happen for me is that if the new mail is delivered into Maildir/new/ or a file copied in there, I don't often see it until TB does it's 2 minute "check-for-new-mail-in-all-folders" scan. Then it is picked up as a new mail. There is no network activity from the server to the client.
However if I put a file by hand, say for example "cp /etc/hosts /home/reuben/Maildir/cur" into Maildir/cur/ then Thunderbird sees it immediately. Dovecot sends this to the client and breaks the IDLE straight away:
367 53.927740 192.168.0.5 -> 192.168.0.8 IMAP Response: * 21 EXISTS
It's a bit strange but I can reproduce it on demand, Tom can you reproduce it that way too or is this a different bug? [I think it's the same one..]
reuben
Reuben Farrelly wrote:
Tom can you reproduce it that way too or is this a different bug? [I think it's the same one..]
While it may be related to the same cause, I think it is not exactly the same bug - not the same issue anyway.
- Here this happens with INBOX being an mbox, not a maildir.
- 2 minutes do not help. In fact, it seems that when Thunderbird (1.5) is set to use IDLE, it does nothing (in the network capture I see nothing sent from it) when clicking "Get Mail".
This basically means that if I have Thunderbird set to use the IDLE extension, given this bug, sometimes the only way to see my new mail is to close Thunderbird and reopen it, since "Get Mail" does not appear to do anything (and the IDLE notify does not work).
If you find out something about your problem, though, I'd be interested to know.
reuben
-- Tom
-- Tom Alsberg - hacker (being the best description fitting this space) Web page: http://www.cs.huji.ac.il/~alsbergt/ DISCLAIMER: The above message does not even necessarily represent what my fingers have typed on the keyboard, save anything further.
On Mon, 2006-01-23 at 01:16 +1300, Reuben Farrelly wrote:
What seems to happen for me is that if the new mail is delivered into Maildir/new/ or a file copied in there, I don't often see it until TB does it's 2 minute "check-for-new-mail-in-all-folders" scan. Then it is picked up as a new mail. There is no network activity from the server to the client.
Are you using dnotify, inotify, kqueue or nothing? And select, poll, epoll, or kqueue for ioloop?
On 23/01/2006 1:44 a.m., Timo Sirainen wrote:
On Mon, 2006-01-23 at 01:16 +1300, Reuben Farrelly wrote:
What seems to happen for me is that if the new mail is delivered into Maildir/new/ or a file copied in there, I don't often see it until TB does it's 2 minute "check-for-new-mail-in-all-folders" scan. Then it is picked up as a new mail. There is no network activity from the server to the client.
Are you using dnotify, inotify, kqueue or nothing? And select, poll, epoll, or kqueue for ioloop?
I have in the past used inotify, but now using dnotify for a while because inotify won't build/link.
Same problem observed with both notification mechanisms :(
reuben
Previously I wrote:
- INBOX in /var/mail/%u
- most mailboxes in ~/imap.maildir/... as maildirs.
- mboxes in ~/imap.mbox/... accessible as #imap.mbox/...
- the mbox:~/mail/ is for when namespaces are not used - now there are so it should not be relevant
A few related details which are probably highly relevant I forgot to add:
Dovecot is running on FreeBSD 5.4
both home directories and /var/mail are on NFS - stored on another server (NetApp)
- This means that the only way to get new mail is to do continuously check (poll/busy-wait) whether there are any changes in the mailbox, which the IMAP process does. No epoll/inotify or anything like that.
- We have and had many different operating systems, and generally the situation here is that NFS locking (fcntl/flock) always breaks some way or another since we can't get all hosts to work the same way with it, so we don't try at all. All mail accesses (deliveries using Exim, and access using all mail clients we have around here) are using dotlocks (creating a file with the same name ending with ".lock") to avoid concurrency issues. Dotlocking is configured in dovecot.conf (attached in the referred message) as follows:
lock_method = dotlock mbox_read_locks = dotlock mbox_write_locks = dotlock # Timeout to overwrite dotlock files if mailbox isn't modified: mbox_dotlock_change_timeout = 30
-- Tom
-- Tom Alsberg - hacker (being the best description fitting this space) Web page: http://www.cs.huji.ac.il/~alsbergt/ DISCLAIMER: The above message does not even necessarily represent what my fingers have typed on the keyboard, save anything further.
On Sun, 2006-01-22 at 14:30 +0200, Tom Alsberg wrote:
- both home directories and /var/mail are on NFS - stored on another server (NetApp)
- This means that the only way to get new mail is to do continuously check (poll/busy-wait) whether there are any changes in the mailbox, which the IMAP process does. No epoll/inotify or anything like that.
Maybe NFS is caching Dovecot's stat() calls which check if the mbox has changed?
Timo Sirainen wrote:
Maybe NFS is caching Dovecot's stat() calls which check if the mbox has changed?
I will try to check whether this is the case. You mean the NFS client (FreeBSD kernel) caching, not the server (NetApp), right? Something like this happening on the server seems not to make any sense to me.
However I doubt that, as I've never had anything like that happening (we use FreeBSD and NFS all the time) - I'm using FreeBSD 5.4 on my workstation too, and there when reading mail (with mutt) I always see new mail.
-- Tom
-- Tom Alsberg - hacker (being the best description fitting this space) Web page: http://www.cs.huji.ac.il/~alsbergt/ DISCLAIMER: The above message does not even necessarily represent what my fingers have typed on the keyboard, save anything further.
On Sun, 2006-01-22 at 14:50 +0200, Tom Alsberg wrote:
Timo Sirainen wrote:
Maybe NFS is caching Dovecot's stat() calls which check if the mbox has changed?
I will try to check whether this is the case. You mean the NFS client (FreeBSD kernel) caching, not the server (NetApp), right? Something like this happening on the server seems not to make any sense to me.
Right, in client side.
However I doubt that, as I've never had anything like that happening (we use FreeBSD and NFS all the time) - I'm using FreeBSD 5.4 on my workstation too, and there when reading mail (with mutt) I always see new mail.
Well, that's pretty much all I can think of currently.. In non-NFS environment I haven't heard of problems with new mails not being found.
It's also possible that mutt does something slightly differently which causes NFS cache to be reset. You could maybe try this with Dovecot too. Although I'm not sure how that would be done.. chmod() apparently at least resets it.
Hmm. From mutt's configure:
AC_ARG_ENABLE(nfs-fix, [ --enable-nfs-fix Work around an NFS with broken attributes caching ], [if test x$enableval = xyes; then AC_DEFINE(NFS_ATTRIBUTE_HACK) fi])
Which does after each stat():
#ifdef NFS_ATTRIBUTE_HACK if (sb.st_mtime > sb.st_atime) { newtime.modtime = sb.st_mtime; newtime.actime = time (NULL); utime (ctx->path, &newtime); } #endif
I have reports from users stating the same problem. They are also using NFS for their homedirectories. We use maildir there and no /var/mail or so. dovecot is 0.99.10.6 and no special configuration; locks are default (commented out). Up to now, I could not find out anything. If you want me to investigate, I would be very willing but would need help on how to debug the IMAP conversation.
Yours, Jakob Curdes
Jakob Curdes wrote:
I have reports from users stating the same problem. They are also using NFS for their homedirectories. We use maildir there and no /var/mail or so. dovecot is 0.99.10.6 and no special configuration; locks are default (commented out). Up to now, I could not find out anything. If you want me to investigate, I would be very willing but would need help on how to debug the IMAP conversation.
Wow.. and I thought _I_ was using an old version (0.99.14)... Seriously, update now. Between just 0.99.10 and 0.99.14 there was a considerable number of important fixes. And between 0.99 and 1.0beta1, there's reportedly large performance increases, improved indexing, improved mbox handling, and a large number of added features.
I'm hoping your reason for using such an old version is that's what your distro has packaged.
-- Curtis Maloney cmaloney@cardgate.net
Wow.. and I thought _I_ was using an old version (0.99.14)... Seriously, update now. Between just 0.99.10 and 0.99.14 there was a considerable number of important fixes. And between 0.99 and 1.0beta1, there's reportedly large performance increases, improved indexing, improved mbox handling, and a large number of added features.
I'm hoping your reason for using such an old version is that's what your distro has packaged.
My reason for using such an old version is that up to the recent report of mail not appearing there were no problems whatsoever and it is running in a protected environment - so I stuck to "never change a runnning system". I have other machines running newer versions, but could never make out a difference in stability. Performance might not be such an issue in small environments with <15 users. As for the 1.0 jump, I am waiting for the noise on this list to settle before upgrading.
Yours, Jakob Curdes
Jakob Curdes wrote:
My reason for using such an old version is that up to the recent report of mail not appearing there were no problems whatsoever and it is running in a protected environment - so I stuck to "never change a runnning system". I have other machines running newer versions, but could never make out a difference in stability. Performance might not be such an issue in small environments with <15 users. As for the 1.0 jump, I am waiting for the noise on this list to settle before upgrading.
After I'd send my post, I realised I should have elaborated somewhat. I currently run 0.99.14, and recommend you at least move up to it. I understand entirely the move to 1.0 is non-trivial, and have not had the time to do it myself.
And yes, "if it aint broke..." is also another good reason to not upgrade :)
... must remember not to post before my coffee...
-- Curtis Maloney cmaloney@cardgate.net
Sorry for the late reply onthe subject. However I thought I would check some more before replying.
Basically, I reached the conclusion that it is not NFS caching in fault - below are the explanations.
I'm trying to debug the notify callbacks, however the system seems a bit complex for now, and I'm still attempting to find heads and tails in it. Could you perhaps write something explaining how the notify callbacks are called - what functions are involved, when the mailbox is updated (stat shows a newer mtime) what the procedure is?
Following are answers to your last reply on the subject claiming it is NFS caching that ruins it here. However, the final evidence is in this:
I added a debug logging to the stat loop in the file src/lib-storage/index/index-mailbox-check.c (line 44, in the sources of .beta2):
for (file = ibox->notify_files; file != NULL; file = file->next) {
if (stat(file->path, &st) == 0 &&
file->last_stamp != st.st_mtime) {
i_info("CSDEBUG: activity on file \"%s\","
" last noted at %d, modified at %d",
file->path, file->last_stamp,
st.st_mtime);
file->last_stamp = st.st_mtime;
notify = TRUE;
}
}
This means I see in the log a line whenever a file (or directory) of a mailbox SELECTED (IDLE on) is updated. I indeed saw many such lines, and waited until it happens again that new mail arrives but is not seen by Thunderbird. And indeed at some point new mail arrived (to my inbox - /var/mail/alsbergt, which I had selected and open in Thunderbird), but I did not see it. In the log, I had:
dovecot: Jan 31 10:40:53 Info: imap(alsbergt): CSDEBUG: activity on file "/var/mail/alsbergt", last noted at 1138696544, modified at 1138696850
At the same time I had ethereal capturing the connection; I saw no EXISTS sent to the client (Thunderbird at that time).
Another message arrived after 10 minutes, again this debugging line appeared (with different timestamps, of course). Then after a while another message arrived, but this time an EXISTS with all three messages was sent to the client.
So obviously the results of stat in this case do show that the file is updated. However somewhere later where Dovecot should notify (send an EXISTS) to the client, it does not for some reason. It appears to me that there is a bug somewhere in the callbacks reached by the following code:
if (notify) {
ibox->notify_last_sent = ioloop_time;
ibox->notify_pending = FALSE;
ibox->notify_callback(&ibox->box, ibox->notify_context);
}
I'd love some assistance in debugging this. If you could explain exactly what happens when notify_callback is called (I've not got it all sorted out yet), where it is set, and what should by design be the course, I may understand better what is going on. Or anything else I could do (add some more debugging prints? send a core or some network capture?) that would enable you to trace this?
Anyway, to the answers for your last mail:
On Sun, Jan 22, 2006 at 03:01:47PM +0200, Timo Sirainen wrote:
<snip />
I will try to check whether this is the case. You mean the NFS client (FreeBSD kernel) caching, not the server (NetApp), right? Something like this happening on the server seems not to make any sense to me.
Right, in client side.
Well, to begin with, doing "ls" on the file basically does stat on it, so if stat is cached I should not see the file's size or mtime changed in ls, which I do (trying to do ls - on the same host as the Dovecot server is running on - on the mailbox when this happens that new mail arrives and is not seen).
<snip /> It's also possible that mutt does something slightly differently which causes NFS cache to be reset. You could maybe try this with Dovecot too. Although I'm not sure how that would be done.. chmod() apparently at least resets it.
I'm not sure exactly where to do that, but if you tell me where I could add some chmod or something and see if this makes a difference.
Hmm. From mutt's configure:
AC_ARG_ENABLE(nfs-fix, [ --enable-nfs-fix Work around an NFS with broken attributes caching ], [if test x$enableval = xyes; then AC_DEFINE(NFS_ATTRIBUTE_HACK) fi])
However doing mutt -v, it appears that the mutt on our FreeBSD systems is built without --enable-nfs-fix. I also read a bit in the mutt mailing list and as I understood this is a different issue with some systems.
Which does after each stat():
<snip />
I tried also running a small C program which does stat every second and prints out if there is a difference in mtime from the last result, and ran it on one of the mailboxes, and it always showed a difference when new mail arrived, even when Dovecot would not send an EXISTS as response to the IDLE command.
But anyway this is all irrelevant in the light of what I wrote above - that Dovecot itself does notice a change in the mtime as returned by stat while the client does not see new mail in IDLE.
Thanks, help appreciated, -- Tom
-- Tom Alsberg - hacker (being the best description fitting this space) Web page: http://www.cs.huji.ac.il/~alsbergt/ DISCLAIMER: The above message does not even necessarily represent what my fingers have typed on the keyboard, save anything further.
Hi there...
I am still awaiting replies to my last (rather long) message on this subject. Some assistance in debugging would be appreciated.
In the meanwhile, getting deeper into the sequence of calls when new mail arrives, I have traced the issue to be related the following in imap_sync_deinit(...):
if (status.messages != ctx->messages_count) {
client_send_line(ctx->client,
t_strdup_printf("* %u EXISTS", status.messages));
}
As I understand, at this point, status.messages should be the number of messages in the mailbox, and ctx->message_count the number of messages last reported to the client.
Putting some debugging prints in there, it appears that sometimes when new mail arrives to the mbox, while Dovecot did detect that the mtime of the mailbox has changed, at this point, status.messages does not reflect that (for some reason - although there is new mail in the mailbox), and therefore no EXISTS message is sent to the client.
Any insights on that?
On another note, shouldn't the EXISTS message be on IDLE sent whenever there is a change in the mailbox? After all, a message can be replaced or changed, in which case the number of messages will not be updated, but one would want the client informed on the change...
Thanks, help appreciated, -- Tom
-- Tom Alsberg - hacker (being the best description fitting this space) Web page: http://www.cs.huji.ac.il/~alsbergt/ DISCLAIMER: The above message does not even necessarily represent what my fingers have typed on the keyboard, save anything further.
On Sun, 2006-02-05 at 12:06 +0200, Tom Alsberg wrote:
In the meanwhile, getting deeper into the sequence of calls when new mail arrives, I have traced the issue to be related the following in imap_sync_deinit(...):
if (status.messages != ctx->messages_count) { client_send_line(ctx->client, t_strdup_printf("* %u EXISTS", status.messages)); }
As I understand, at this point, status.messages should be the number of messages in the mailbox, and ctx->message_count the number of messages last reported to the client.
Right. But have you checked the code in lib-storage/index/mbox/mbox-sync.c? That's where it really does the mbox syncing and also checks again if mbox has changed before doing anything. Probably mbox_sync_has_changed() would be a good place to put some checks.
On another note, shouldn't the EXISTS message be on IDLE sent whenever there is a change in the mailbox? After all, a message can be replaced or changed, in which case the number of messages will not be updated, but one would want the client informed on the change...
ctx->messages_count is decreased whenever expunges happen (which is the only way to change/replace a mail), so the above checks works correctly.
On Sun, Feb 05, 2006 at 01:46:16PM +0200, Timo Sirainen wrote:
As I understand, at this point, status.messages should be the number of messages in the mailbox, and ctx->message_count the number of messages last reported to the client.
Right. But have you checked the code in lib-storage/index/mbox/mbox-sync.c?
I'm checking it now. it seems to be reached much earlier through imap_sync_init -> mailbox_sync_init -> mbox_storage_sync_init. I don't quite understand the code there, yet, but I'm learning somewhat on Dovecot's working from it.
That's where it really does the mbox syncing and also checks again if mbox has changed before doing anything. Probably mbox_sync_has_changed() would be a good place to put some checks.
Will do that.
Any ideas on what the problem may be related to, though? Some further assistance on how to check and resolve this would be appreciated.
I'm quite sure this is a bug in Dovecot now... I can confirm it is not in the client as I see Thunderbird is really IDLE on the mailbox, and I can confirm it is not caused by any NFS caching, as I see that it does get the correct results from stat.
One thing that could help me - is there some documentation on the inner architecture of Dovecot? I.E. What all the structs and their fields are conceptually for, and all the abstractions defined and used?
Can you explain the role of the {imap,mailbox}_sync_init/{imap,mailbox}_sync_deinit functions? Some clear notion of that could help me understand what is going on.
On another note, shouldn't the EXISTS message be on IDLE sent whenever there is a change in the mailbox? After all, a message can be replaced or changed, in which case the number of messages will not be updated, but one would want the client informed on the change...
(This is not related to my problem here - which is evident when new mail arrives and thus the message count does increase, but anyway)
ctx->messages_count is decreased whenever expunges happen (which is the only way to change/replace a mail), so the above checks works correctly.
What about when a message is changed/replaced from outside of IMAP (i.e. a user has a mailer accessing the mailboxes through the filesystem)? I thought Dovecot was supposed to be safe to use concurrent with local mail access...
Thanks, cheers, -- Tom
-- Tom Alsberg - hacker (being the best description fitting this space) Web page: http://www.cs.huji.ac.il/~alsbergt/ DISCLAIMER: The above message does not even necessarily represent what my fingers have typed on the keyboard, save anything further.
On Mon, 2006-02-06 at 15:31 +0200, Tom Alsberg wrote:
Can you explain the role of the {imap,mailbox}_sync_init/{imap,mailbox}_sync_deinit functions? Some clear notion of that could help me understand what is going on.
I'll reply more later this evening, but here's a quick explanation:
lib-storage/ code is generic mailbox handling code while imap/ contains imap-specific code.
lib-storage/index/mbox/mbox-sync.c sync mbox <-> dovecot's index. lib-storage/index/index-sync.c syncs "client's view of mailbox" <-> dovecot index.
imap/imap-sync.c then just maps lib-storage/index/index-sync.c to IMAP events.
IDLE works by waiting for some file's mtime to change. When that happens, it starts the mbox <-> index syncing. There are also some checks that it doesn't do this if it has reported changes within last 10 seconds or something (point is that if there's IDLE going on and Dovecot reports new mails to client, but client doesn't react to it, there's no point in reporting more new mails at least immediately). That's where I thought might be some bug..
On another note, shouldn't the EXISTS message be on IDLE sent whenever there is a change in the mailbox? After all, a message can be replaced or changed, in which case the number of messages will not be updated, but one would want the client informed on the change...
(This is not related to my problem here - which is evident when new mail arrives and thus the message count does increase, but anyway)
ctx->messages_count is decreased whenever expunges happen (which is the only way to change/replace a mail), so the above checks works correctly.
What about when a message is changed/replaced from outside of IMAP (i.e. a user has a mailer accessing the mailboxes through the filesystem)? I thought Dovecot was supposed to be safe to use concurrent with local mail access...
Mails shouldn't be externally modified. Most mail clients that I know of modify mails using expunge + append method, so I haven't tried to bother handling that in any nice way. I'm not sure if there even is any good way to handle it..
P.S. I have now set index files to sit on a local disk, and the problem still occurs.
-- Tom
On Mon, Feb 06, 2006 at 03:46:00PM +0200, Timo Sirainen wrote:
On Mon, 2006-02-06 at 15:31 +0200, Tom Alsberg wrote:
Can you explain the role of the {imap,mailbox}_sync_init/{imap,mailbox}_sync_deinit functions? Some clear notion of that could help me understand what is going on.
I'll reply more later this evening, but here's a quick explanation: <snip />
-- Tom Alsberg - hacker (being the best description fitting this space) Web page: http://www.cs.huji.ac.il/~alsbergt/ DISCLAIMER: The above message does not even necessarily represent what my fingers have typed on the keyboard, save anything further.
On Sun, Feb 05, 2006 at 01:46:16PM +0200, Timo Sirainen wrote:
Probably mbox_sync_has_changed() would be a good place to put some checks.
OK, I put some checks there. Now here is something interesting:
It appears this function is called several times after activity on the mbox.
The first time it is called, is through imap_sync_init->mailbox_sync_init->mbox_storage_sync_init->mbox_sync, the second is through imap_sync_deinit->mailbox_sync_deinit->mailbox_transaction_commit->mbox_transaction_commit->mbox_sync, the third again from imap_sync_init->... and the fourth again from imap_sync_deinit.
Normally, when new mail arrives, the first three times it returns 1, and from the fourth on, it returns 0. EXISTS is sent from imap_sync_deinit after the second time it returned 1.
However, it happens that on the first call mbox_sync_has_changed returns 1, but on the second call it returns 0. Now, the first time this happens, EXISTS is still sent from imap_sync_deinit to the client (after the second call to mbox_sync_has_changed).
However after that, when new mail arrives, mbox_sync_has_changed returns 1 the first time and 0 the next times again, but in imap_sync_deinit, status.messages and ctx->messages_count are equal, and thus no EXISTS is sent to the client.
This means, that by looking at the logs including my debug messages with the values of status.messages and ctx->messages_count and returns value of mbox_sync_has_changed(...), I can tell whether the next time new mail arrives to the same user the problem will occur or not.
This is odd... I'll continue to investigate. Any idea what this behaviour may mean?
Besides further clarifications on the operation of sync, could you describe the path (in terms of functions called) from mbox_sync to the updating of status.messages? Perhaps I could put some debugging messages in there and see what is going on...
Thanks, regards, -- Tom
-- Tom Alsberg - hacker (being the best description fitting this space) Web page: http://www.cs.huji.ac.il/~alsbergt/ DISCLAIMER: The above message does not even necessarily represent what my fingers have typed on the keyboard, save anything further.
Some more new information...
First, I was wrong in the last noted observation (probably my eyes got confused between the lines in the log). When the first call to mbox_sync_has_changed returns 1 and the next ones return 0, Thunderbird is not notified of the new mail that has arrived (no EXISTS is sent), even the first time. So I can't tell in advance what will be with the next message that arrives.
Also, I put some more debugging into mbox_sync_has_changed, and can say the following: When everything is right, mbox_sync_has_changed returns 1 (the second and third times too) because mbox->mbox_dirty_stamp is different from st->st_mtime, in the last statement of mbox_sync_has_changed:
return st->st_mtime != mbox->mbox_dirty_stamp || st->st_size != mbox->mbox_dirty_size;
While when it does not work, mbox->mbox_dirty_stamp equals st->st_mtime, and it returns 0. In both cases, hdr->sync_stamp is different from st->st_mtime.
(The same applies to the sizes as well, that is, they are different where the timestamps are different and equal where the timestamps are equal).
A more interesting revalation: It is not only that EXISTS is not sent, and not only to do with IDLE. When new mail arrives, even when Thunderbird issues a SELECT (by doing a check for new mail), it does not see the new messages.
Furthermore: If the last message seen from within Thunderbird was not yet read, and the problem occurs (new mail arrives that is not seen), this last message will then seem corrupted from Thunderbird the next message in the mailbox.
- it will contain at the end some garbage filling, and then part of
This is while the mailbox itself is completely valid, has no such garbage in it, and the messages are validly separated by an empty line and a valid "From " line. Perhaps this means there's some race-condition bug in the mbox reading/index syncing code? That's the code I least understand as of now...
Attached is an example of one such last message saved from Thunderbird after new mail arrived that was not seen. The filler characters are bytes of 0x80, and then the next message from "n-Path: " is present.
Any hints?
Cheers, -- Tom
On Mon, Feb 06, 2006 at 05:03:49PM +0200, I wrote:
<snip /> However, it happens that on the first call mbox_sync_has_changed returns 1, but on the second call it returns 0. Now, the first time this happens, EXISTS is still sent from imap_sync_deinit to the client (after the second call to mbox_sync_has_changed).
However after that, when new mail arrives, mbox_sync_has_changed returns 1 the first time and 0 the next times again, but in imap_sync_deinit, status.messages and ctx->messages_count are equal, and thus no EXISTS is sent to the client. <snip />
-- Tom Alsberg - hacker (being the best description fitting this space) Web page: http://www.cs.huji.ac.il/~alsbergt/ DISCLAIMER: The above message does not even necessarily represent what my fingers have typed on the keyboard, save anything further.
Sorry, forgot the attachment...
Here it is.
-- Tom
On Tue, Feb 07, 2006 at 10:13:03AM +0200, I wrote:
<snip /> 4. Furthermore: If the last message seen from within Thunderbird was not yet read, and the problem occurs (new mail arrives that is not seen), this last message will then seem corrupted from Thunderbird the next message in the mailbox.
- it will contain at the end some garbage filling, and then part of
This is while the mailbox itself is completely valid, has no such garbage in it, and the messages are validly separated by an empty line and a valid "From " line. Perhaps this means there's some race-condition bug in the mbox reading/index syncing code? That's the code I least understand as of now...
Attached is an example of one such last message saved from Thunderbird after new mail arrived that was not seen. The filler characters are bytes of 0x80, and then the next message from "n-Path: " is present.
Any hints?
Cheers, -- Tom
-- Tom Alsberg - hacker (being the best description fitting this space) Web page: http://www.cs.huji.ac.il/~alsbergt/ DISCLAIMER: The above message does not even necessarily represent what my fingers have typed on the keyboard, save anything further.
Hi, Timo.
Have you been looking into what I described in my last few messages about this problem? Some more information about the problem in them, but I could not yet see it exactly, and I got no replies from you...
Some assistance in overcoming this problem would be greatly appreciated.
Here's some more information I gathered in the meanwhile:
It appears this does not necessarily have to do with the IDLE extension. Even when Thunderbird is set not to use it, but poll once in a while, still at some point it does not see new messages on SELECT. Pressing "Get Mail" sends a SELECT but that doesn't do much either.
The abovementioned point combined with the corrupted last message seen in new mail that arrives (point 4 in my quoted message) makes me believe this is probably a bug in the code syncing between mbox and index.
This is interesting: Having Thunderbird open, I wait until the problem occurs (speeding it up by sending a test message every minute - after in average about 10 messages suddenly a new message arriving will not be seen), then send a few more test messages that Thunderbird does not see arriving either. Now I open another IMAP client (say, mutt on another computer) to the same account on Dovecot. On connection of the second IMAP client, suddenly it will see all mail and also the old open Thunderbird will suddenly get all the messages it lost.
Any ideas?
Thanks, help appreciated, -- Tom
On Tue, Feb 07, 2006 at 10:13:03AM +0200, Tom Alsberg wrote:
Some more new information...
First, I was wrong in the last noted observation (probably my eyes got confused between the lines in the log). When the first call to mbox_sync_has_changed returns 1 and the next ones return 0, Thunderbird is not notified of the new mail that has arrived (no EXISTS is sent), even the first time. So I can't tell in advance what will be with the next message that arrives.
Also, I put some more debugging into mbox_sync_has_changed, and can say the following: When everything is right, mbox_sync_has_changed returns 1 (the second and third times too) because mbox->mbox_dirty_stamp is different from st->st_mtime, in the last statement of mbox_sync_has_changed:
return st->st_mtime != mbox->mbox_dirty_stamp || st->st_size != mbox->mbox_dirty_size;
While when it does not work, mbox->mbox_dirty_stamp equals st->st_mtime, and it returns 0. In both cases, hdr->sync_stamp is different from st->st_mtime.
(The same applies to the sizes as well, that is, they are different where the timestamps are different and equal where the timestamps are equal).
A more interesting revalation: It is not only that EXISTS is not sent, and not only to do with IDLE. When new mail arrives, even when Thunderbird issues a SELECT (by doing a check for new mail), it does not see the new messages.
Furthermore: If the last message seen from within Thunderbird was not yet read, and the problem occurs (new mail arrives that is not seen), this last message will then seem corrupted from Thunderbird the next message in the mailbox.
- it will contain at the end some garbage filling, and then part of
This is while the mailbox itself is completely valid, has no such garbage in it, and the messages are validly separated by an empty line and a valid "From " line. Perhaps this means there's some race-condition bug in the mbox reading/index syncing code? That's the code I least understand as of now...
Attached is an example of one such last message saved from Thunderbird after new mail arrived that was not seen. The filler characters are bytes of 0x80, and then the next message from "n-Path: " is present.
Any hints?
Cheers, -- Tom
-- Tom Alsberg - hacker (being the best description fitting this space) Web page: http://www.cs.huji.ac.il/~alsbergt/ DISCLAIMER: The above message does not even necessarily represent what my fingers have typed on the keyboard, save anything further.
On Tue, 2006-02-07 at 10:13 +0200, Tom Alsberg wrote:
Furthermore: If the last message seen from within Thunderbird was not yet read, and the problem occurs (new mail arrives that is not seen), this last message will then seem corrupted from Thunderbird the next message in the mailbox.
- it will contain at the end some garbage filling, and then part of
This is while the mailbox itself is completely valid, has no such garbage in it, and the messages are validly separated by an empty line and a valid "From " line. Perhaps this means there's some race-condition bug in the mbox reading/index syncing code? That's the code I least understand as of now...
Attached is an example of one such last message saved from Thunderbird after new mail arrived that was not seen. The filler characters are bytes of 0x80, and then the next message from "n-Path: " is present.
They're actually 0x00 inside the mbox file, but Dovecot just translates them to 0x80 since IMAP doesn't allow sending 0x00 characters.
Now, these 0x00 characters are added to mbox file when it's rewriting the headers and it figures out that it needs more space. So it first grows the file by writing 0x00 characters to the end of the file. Then it starts moving message data towards the end of file.
So you mean that this corruption is only seen by Thunderbird, and it won't stay in the mbox file itself? In that case it means that mbox locking isn't working properly, ie. a Dovecot process is reading the mbox while another Dovecot process is modifying it. I don't remember having tested mbox_read_locks=dotlock, but I'd have thought it would have worked..
On Tue, Feb 14, 2006 at 10:59:49AM +0200, Timo Sirainen wrote:
Attached is an example of one such last message saved from Thunderbird after new mail arrived that was not seen. The filler characters are bytes of 0x80, and then the next message from "n-Path: " is present.
They're actually 0x00 inside the mbox file, but Dovecot just translates them to 0x80 since IMAP doesn't allow sending 0x00 characters.
Oh, I see...
BTW, why 0x80, of all characters?
Now, these 0x00 characters are added to mbox file when it's rewriting the headers and it figures out that it needs more space. So it first grows the file by writing 0x00 characters to the end of the file. Then it starts moving message data towards the end of file.
I'll look for the code that does that, and put some debugging prints there to see if I can verify it.
So you mean that this corruption is only seen by Thunderbird, and it won't stay in the mbox file itself?
Indeed. The corruption is as seen by Thunderbird through IMAP. Accessing the mbox file directly, there is no corruption in it.
In that case it means that mbox locking isn't working properly, ie. a Dovecot process is reading the mbox while another Dovecot process is modifying it.
May very well be the case. Especially considering that Thunderbird has multiple connections to the server open. I'm not sure what causes Dovecot to rewrite headers at each such point (given no client initiated write or even read), but I can think of some...
I don't remember having tested mbox_read_locks=dotlock, but I'd have thought it would have worked..
I'll try to investigate more soon. If you can provide some ideas on how to check the locking it could help (given that Dovecot locks for a short period of time only, I can't easily fabricate concurrent locking from outside, and it appears just holding a read lock all the time will not do much).
Perhaps some of the other details of the problem I have described can shed some more light (i.e. the tracing of mbox_sync, the fact that opening a new IMAP connection corrects the problem also for the old one, etc.)?
I will look some more and probably follow up to you with some further details.
Cheers, -- Tom
-- Tom Alsberg - hacker (being the best description fitting this space) Web page: http://www.cs.huji.ac.il/~alsbergt/ DISCLAIMER: The above message does not even necessarily represent what my fingers have typed on the keyboard, save anything further.
On Tue, 2006-02-14 at 11:40 +0200, Tom Alsberg wrote:
On Tue, Feb 14, 2006 at 10:59:49AM +0200, Timo Sirainen wrote:
Attached is an example of one such last message saved from Thunderbird after new mail arrived that was not seen. The filler characters are bytes of 0x80, and then the next message from "n-Path: " is present.
They're actually 0x00 inside the mbox file, but Dovecot just translates them to 0x80 since IMAP doesn't allow sending 0x00 characters.
Oh, I see...
BTW, why 0x80, of all characters?
Just copying UW-IMAP's behavior.
I don't remember having tested mbox_read_locks=dotlock, but I'd have thought it would have worked..
I'll try to investigate more soon. If you can provide some ideas on how to check the locking it could help (given that Dovecot locks for a short period of time only, I can't easily fabricate concurrent locking from outside, and it appears just holding a read lock all the time will not do much).
Well, I noticed that at least the default dotlock timeout isn't necessarily big enough. You could see if just changing mbox_dotlock_change_timeout = 120 helps.
Also I think I'll have to make Dovecot somehow touch the lock file once in a while so that if it's held for reading for a long time it's not overridden by other processes which think it's a stale lock file.
Anyway one possible way to test locking is to set breakpoints into end of mbox_lock_dotlock() function and while it has stopped, run another process to see what it thinks of the dotlock.
BTW. your hostnames aren't same in the different machines, right? And there's no other weirdness that prevents one process from seeing another? Because if Dovecot thinks that the process which has created the dotlock doesn't exist, it overrides the dotlock immediately.
On Tue, Feb 14, 2006 at 03:19:23PM +0200, Timo Sirainen wrote:
Well, I noticed that at least the default dotlock timeout isn't necessarily big enough. You could see if just changing mbox_dotlock_change_timeout = 120 helps.
No, doesn't help. Can reproduce the problem just as easily after having set that. that.
Also I think I'll have to make Dovecot somehow touch the lock file once in a while so that if it's held for reading for a long time it's not overridden by other processes which think it's a stale lock file.
Would be good (although I'm not sure, how much time can a read normally take?). But this does not seem to be the problem here. It's short timeframes (less than 10 seconds), and Exim respects the locks for much longer.
Anyway one possible way to test locking is to set breakpoints into end of mbox_lock_dotlock() function and while it has stopped, run another process to see what it thinks of the dotlock.
I'm not sure what mean by running another process and seeing what it thinks of the dotlock (another IMAP process? how? what to check?), but I've set a break after the line confirming the dotlock:
mbox->mbox_dotlocked = TRUE;
and at this point, a .lock file really exists. So at least the locking works, it appears also that dovecot properly takes care if it is already locked.
BTW. your hostnames aren't same in the different machines, right? And there's no other weirdness that prevents one process from seeing another? Because if Dovecot thinks that the process which has created the dotlock doesn't exist, it overrides the dotlock immediately.
No, there is nothing special of this kind. All hosts have unique hostnames, and all processes see all other processes on the server.
Dovecot and the MTA (Exim) delivery run on different hosts, though, but that is natural and shouldn't be a problem, right?
I will continue investigating, but I'm no longer sure it is locking related. Further advice is appreciated, of course.
Cheers, -- Tom
-- Tom Alsberg - hacker (being the best description fitting this space) Web page: http://www.cs.huji.ac.il/~alsbergt/ DISCLAIMER: The above message does not even necessarily represent what my fingers have typed on the keyboard, save anything further.
On Tue, 2006-02-14 at 18:06 +0200, Tom Alsberg wrote:
BTW. your hostnames aren't same in the different machines, right? And there's no other weirdness that prevents one process from seeing another? Because if Dovecot thinks that the process which has created the dotlock doesn't exist, it overrides the dotlock immediately.
No, there is nothing special of this kind. All hosts have unique hostnames, and all processes see all other processes on the server.
What kind of dotlock files does exim create? Dovecot stores inside them "pid:hostname". I guess it's possible that if Exim also writes something inside them, Dovecot might interpret it wrong..
Dovecot and the MTA (Exim) delivery run on different hosts, though, but that is natural and shouldn't be a problem, right?
Right.
On Tue, 2006-02-14 at 19:04 +0200, Timo Sirainen wrote:
On Tue, 2006-02-14 at 18:06 +0200, Tom Alsberg wrote:
BTW. your hostnames aren't same in the different machines, right? And there's no other weirdness that prevents one process from seeing another? Because if Dovecot thinks that the process which has created the dotlock doesn't exist, it overrides the dotlock immediately.
No, there is nothing special of this kind. All hosts have unique hostnames, and all processes see all other processes on the server.
What kind of dotlock files does exim create? Dovecot stores inside them "pid:hostname". I guess it's possible that if Exim also writes something inside them, Dovecot might interpret it wrong..
Hmm. I just looked Exim's sources. Looks like it creates the file with empty contents. Also Dovecot has pretty strict checks in its contents, so probably isn't related to this.
Anyway, if the lock files do work properly (and I tested it today that it works at least with me with Linux and NFS), then I can't really think of anything else than some file caching issue.. Is it possible for you to try this with all NFS caching disabled? I'm not exactly sure how FreeBSD's NFS caching works though..
participants (5)
-
Curtis Maloney
-
Jakob Curdes
-
Reuben Farrelly
-
Timo Sirainen
-
Tom Alsberg