Dovecot's transactions often write-lock dovecot.index.log twice (without unlocking in between), and then unlock it twice. Is that intended, or is that a bug? If I read the man pages correctly, double write-locking is pointless but safe. But double unlocking could cause data corruption during the period between the first and second unlocks, because the locks don't nest (are not refcounted). The first unlock leaves the file totally unlocked. Could this be the source of some index corruption errors?
For example:
- Process A acquires a F_WRLCK on a file.
- Through a different code path, process A again acquires a F_WRLCK on the same file. Since these locks don't nest this is essentially a no-op.
- Process A modifies the file.
- Process A unlocks the file to undo the lock in step 2. Now the file has no lock on it at all. 5a. Process B locks, modifies, and unlocks the file. 5b. Process A also modifies the file, thinking that it still has a write lock.
- Process A unlocks the file to undo the lock in step 1, but this is another no-op. Data corruption is possible if steps 5a and 5b conflict, or if 5b uses stale data.
Here is a code path in dovecot-1.2.11 which exhibits the above behavior.
"Outer" write lock: maildir_transaction_save_commit_pre -> maildir_transaction_save_commit_pre_sync -> maildir_sync_index_begin -> mail_index_sync_begin -> mail_index_sync_begin_to -> mail_index_sync_begin_init -> mail_transaction_log_sync_lock -> mail_transaction_log_lock_head -> mail_transaction_log_file_lock -> mail_index_lock_fd -> file_wait_lock -> file_lock_do
"Inner" write lock: maildir_transaction_save_commit_pre -> maildir_sync_index_finish -> mail_index_sync_commit -> mail_index_write -> mail_transaction_log_rotate -> mail_transaction_log_file_create -> mail_transaction_log_file_create2 -> mail_transaction_log_file_lock -> mail_index_lock_fd -> file_wait_lock -> file_lock_do
The file being locked is the same, dovecot.index.log.
In another case, maildir_sync_index_begin is reached this way (outer lock): mailbox_sync -> mailbox_sync_init -> maildir_storage_sync_init -> maildir_sync_context -> maildir_sync_index_begin -> mail_index_sync_begin -> etc. And maildir_sync_index_finish is reached this way (inner lock): mailbox_sync -> mailbox_sync_init -> maildir_storage_sync_init -> maildir_sync_context -> maildir_sync_index_finish -> etc.
Could these lead to data corruption?
Please let me know if you need any more information. Thanks.