Timo Sirainen put forth on 10/20/2010 11:53 AM:
On Tue, 2010-10-19 at 21:55 -0500, Stan Hoeppner wrote:
Any chance the mbox/mdbox writer code could be modified to do physical preallocation on files to help avoid file(system) fragmentation?
I've been thinking about that before.
"What you want is _physical_ preallocation, not speculative preallocation. I.e., look up XFS_IOC_RESVSP or FIEMAP so your application does a _permanent_ preallocation past EOF."
Oh, interesting. I didn't know that was possible. Even better, Linux has fallocate(), which can do it for filesystems other than XFS. Or rather, it looks like only XFS and ext4 support it (ext3 doesn't). I don't know if other OSes support this. Maybe in the future I could make mdbox support writing to files whose size has been preallocated by actually writing NUL bytes, but that requires some extra code.
http://hg.dovecot.org/dovecot-2.0/rev/22c81f884032 http://hg.dovecot.org/dovecot-2.0/rev/b884441a713f
There is also posix_fallocate(), Timo, which would widen the set of platforms that support this. You may also want to look at posix_fadvise() (if you're not using it already), which might improve Dovecot's overall disk performance a bit.
NOTE: I don't believe fallocate(), in either its POSIX or Linux-only form, will actually reduce m[d]box file fragmentation. I don't believe it actually increases the file's size on disk, i.e. physically allocates additional free extents at the tail of the file. fallocate() is _speculative_ preallocation, which isn't what you want. mbox and mdbox files _will_ grow, so you'd want _physical_ preallocation. I'm not sure whether physical preallocation requires writing a bunch of zeros to the end of the file. I don't "think" it does. I think you can extend the file's size past EOF to grow it, and the remainder is just left as nulls or something. Again, I'm not a dev. I know just barely enough about this stuff to get myself into real trouble. ;)
See these comments:
On Tue, Oct 19, 2010 at 10:03:19PM -0500, Stan Hoeppner wrote:
Dave Chinner put forth on 10/19/2010 6:42 PM:
I've explained how allocsize works, and that speculative allocation gets truncated away when the file is closed. Hence if the application is doing:
open()
seek(EOF)
write()
close()
I don't know if it changes anything in the sequence above, but Dovecot uses mmap I/O. As I've said, I'm not a dev; I just thought this might be relevant. Would using mmap be compatible with physical preallocation?
mmap() can't write beyond EOF or extend the file. Hence it would have to be:
open()
mmap()
ftruncate(new_size)
<write via mmap>
In this method, there is no speculative preallocation because there is never a delayed allocation that extends the file size. It simply doesn't matter where the close() occurs. Hence, if you use mmap() writes like this, the only way you can avoid fragmentation is to use physical preallocation beyond EOF before you start any writes...
It would be beneficial, I think, if you'd subscribe to the XFS list, Timo, and pick some brains. All the devs there are Linux devs, but they have experience with many platforms, including IRIX and other UNIX variants. Most, if not all, of them have spent their entire careers developing on UNIX systems, and only UNIX. They could answer any question you have about the Linux I/O subsystem, not just XFS-specific stuff. Some are current SGI employees, some former, some at Red Hat, etc. They could probably answer any POSIX call questions you might have as well.
http://oss.sgi.com/mailman/listinfo/xfs
-- Stan