On Sep 24, 2008, at 10:03 PM, Allen Belletti wrote:
As best I can determine, the worst problems occur when certain users with very large Inboxes (~10k messages) receive new mail and their client looks up information about that message. GFS doesn't seem to efficiently handle the large directories that contain folders like this. As a result, lots of I/O ops are generated and performance suffers for everyone.
I am beginning to wonder if it might be more efficient to revert to the old mbox format, with one file per folder (plus whatever indices are created). It seems that this ought to work better with GFS, which is geared toward smaller numbers of larger files. Is anyone on the list currently doing that? Alternatively, any thoughts regarding tuning or other options would be appreciated.
One possibility would be to use the dbox format with hashed directories,
so that for each mailbox it could create n directories in which to store
the messages. Two problems here though:
 - dbox code hasn't been tested all that much yet in the real world (but
   it works well in my stress tests)
 - dbox doesn't yet support directory hashing, but it would be pretty
   easy to implement.
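
To make the hashed-directory idea concrete, here is a rough sketch of how a message could be mapped to one of n subdirectories. This is not dbox code: the function name, the two-digit hex directory names, the use of CRC32, and the default of 16 directories are all assumptions made for illustration only.

```python
import os
import zlib


def hashed_message_path(mailbox_dir: str, message_name: str, n_dirs: int = 16) -> str:
    """Return the path a message would get under a hashed layout.

    Hypothetical helper, not part of Dovecot. The message name is hashed
    with CRC32 and reduced modulo n_dirs to pick a subdirectory, so the
    same name always maps to the same directory.
    """
    if n_dirs < 1:
        raise ValueError("n_dirs must be at least 1")
    bucket = zlib.crc32(message_name.encode("utf-8")) % n_dirs
    # Subdirectories are named by the bucket number in hex: 00, 01, ..., 0f
    return os.path.join(mailbox_dir, f"{bucket:02x}", message_name)


if __name__ == "__main__":
    # Example: where a few messages of one mailbox would end up.
    for name in ("msg.1", "msg.2", "msg.3"):
        print(hashed_message_path("/var/mail/user/INBOX", name))
```

With a layout like this, a 10k-message Inbox would be spread over n smaller directories instead of one large one, at the cost of the client code having to compute the subdirectory for every lookup.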