"Miloslav" == Miloslav Hůla <miloslav.hula@gmail.com> writes:
Miloslav> On 10.09.2020 at 17:40, John Stoffel wrote:
So why not run the backend storage on the Netapp, and just keep the indexes and such local to the system? I've run Netapps for many years and they work really well. And then you'd get automatic backups using scheduled snapshots.
Keep the index files local on disk/SSDs and put the maildirs out to NFSv3 volume(s) on the Netapp(s). Should do wonders. And you'll stop needing to do rsync at night.
Miloslav> It's an option we have in mind. As you wrote, NetApp is very solid. The main reason for local storage is that the IMAP server is completely isolated from the network. But maybe one day we will use it.
It's not completely isolated, it can rsync data to another host that has access to the Netapp. *grin*
Miloslav> :o)
Miloslav> Unfortunately, to quickly fix the problem and make the server usable again, we already added SSDs and moved the indexes onto them. So we have no measurements from the old state.
That's ok, if it's better, then it's better. How is the load now? Looking at the output of 'iostat -x 30' might be a good thing.
Miloslav> Load is between 1 and 2. We can live with that for now.
Has IMAP access gotten faster or more consistent under load? That's the key takeaway, not system load, since the LoadAvg isn't really a good measure on Linux.
Basically, has your IO pattern or IO wait times improved?
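If you want a quick number for IO wait without reading through iostat output, something like the following might do. It is only a minimal Python sketch, assuming the usual Linux /proc/stat layout where the fifth counter on the aggregate "cpu" line is iowait; the function names are made up for illustration.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before_text, after_text):
    """Percentage of CPU time spent in iowait between two samples."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    # Sum only the first 8 counters (user..steal); guest time is already
    # included in user/nice, so adding it would count it twice.
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def sample_iowait(interval=30, path="/proc/stat"):
    """Take two samples 'interval' seconds apart and return iowait percent."""
    with open(path) as f:
        first = f.read()
    time.sleep(interval)
    with open(path) as f:
        second = f.read()
    return iowait_percent(first, second)
```

Calling sample_iowait(30) during a busy period and again during a quiet one gives you two figures to compare, which says more about the disks than the load average does.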
Miloslav> The situation is better, but I guess the problem still exists. It takes some time for the load to grow. We will see.
Hmm... how did you set up the new indexes volume? Did you just use btrfs again? Did you mirror your SSDs as well?
Miloslav> Yes. We just put two SSDs into free slots, presented them to the OS as two RAID-0 volumes, and used btrfs RAID-1 on top.
Miloslav> It is nasty, I know, but it needed no outage. It is just a quick attempt to improve the situation. Our next plan is to buy more controllers, schedule an outage on a weekend, and do it properly.
That is a good plan in any case.
Do the indexes fill the SSD, or is there 20-30% free space? When an SSD gets fragmented, its performance can drop quite a bit. Did you put the SSDs onto a separate controller? Probably not. So now you've just increased the load on the single controller, when you really should be spreading it out more to improve things.
Miloslav> The SSDs are almost empty; 2.4GB of 93GB is used after running 'doveadm index' on all mailboxes.
Interesting. I wonder if there are other dovecot files that could be moved over to increase speed because they're still IOPS- or IO-bound?
Another possible hack would be to move some stuff to a RAM disk, assuming your server is on a UPS/generator in case of power loss. But that's an unsafe hack.
Also, do you have quotas turned on? That's a performance hit for sure.
Miloslav> No, we are running without quotas.
By quotas, I mean btrfs quotas, just to be clear.
Miloslav> Thank you for the fio tip. I'll definitely try that.
Please do! Getting some numbers from there will let you at least document your changes in performance.
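One way to keep those numbers comparable is to have fio write JSON (--output-format=json) and diff the runs with a small script. The following is a rough Python sketch, not a finished tool; it assumes the field names I remember from fio 3.x JSON output ("jobs", "jobname", "read"/"write" with "iops" and "clat_ns"), so check them against your own output first. The function names are hypothetical.

```python
import json


def load_fio_json(path):
    """Load a result file written by fio with --output-format=json."""
    with open(path) as f:
        return json.load(f)


def summarize_fio(result):
    """Map job name -> (read IOPS, write IOPS, mean read latency in ms)."""
    summary = {}
    for job in result.get("jobs", []):
        read = job.get("read", {})
        write = job.get("write", {})
        # Assumption: completion latency is reported in nanoseconds.
        clat_ns = read.get("clat_ns", {}).get("mean", 0.0)
        summary[job.get("jobname", "?")] = (
            read.get("iops", 0.0),
            write.get("iops", 0.0),
            clat_ns / 1e6,
        )
    return summary


def compare_runs(before, after):
    """Per-job ratio of read IOPS (after / before) for jobs in both runs."""
    old, new = summarize_fio(before), summarize_fio(after)
    ratios = {}
    for name in old.keys() & new.keys():
        if old[name][0] > 0:
            ratios[name] = new[name][0] / old[name][0]
    return ratios
```

Run the same fio job file before and after each change (new controllers, different layout), keep the JSON files, and compare_runs() gives you a simple ratio to put in your notes.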
But overall, it sounds like you've made some progress and gotten better performance.