Stan Hoeppner stan@hardwarefreak.com wrote:
It's highly likely your problems can be solved without the drastic architecture change, and new problems it will introduce, that you describe below.
The main reason is that I need to replace the hardware: its service contract ends this year and I cannot extend it any further.
So far the box is fine; during normal operation there are no problems with speed or responsiveness for the end users.
Higher peak loads do strain the system a bit at times, however, and this is happening more and more often.
My first thought was to move this setup into our VMware cluster (yeah, I know, spare me the screams), since the hardware there is far more powerful than what we use now, and I wouldn't have to buy new servers for my mail system (which is rather painful in a university environment, especially in Germany, once the investment exceeds a certain amount).
But then I thought about the problems with VMs of this size and came up with the idea of a distributed setup, splitting the single server into 4 or 6 backend servers.
As I said: "idea". Other ideas making my life easier are more than welcome.
Ideas? Suggestions? Nudges in the right direction?
Yes. We need more real information. Please provide:
- Mailbox count, total maildir file count and size
about 10,000 Maildir++ boxes
900GB of 1300GB used; "df -i" reports 11 million inodes used
I know, this is very _tiny_ compared to the systems ISPs are using.
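Figures like these can be gathered with a single walk over the mail store. A minimal sketch in Python, assuming a Maildir++ layout in which each user's mailbox directory contains cur/, new/ and tmp/ and subfolders are dot-prefixed directories (.Sent etc.); the function name maildir_stats and the script name are hypothetical, and only files in cur/ and new/ are counted as messages:

```python
import os
import sys

def maildir_stats(root):
    """Count Maildir++ user mailboxes, message files and total bytes under root.

    A directory counts as a user mailbox if it contains a 'cur' subdirectory
    and its name does not start with '.' (Maildir++ subfolders are '.Name').
    Only files inside cur/ and new/ are counted as messages; tmp/ is skipped.
    """
    mailboxes = messages = total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(os.path.abspath(root)):
        name = os.path.basename(dirpath)
        if "cur" in dirnames and not name.startswith("."):
            mailboxes += 1
        if name in ("cur", "new"):
            for fn in filenames:
                try:
                    st = os.lstat(os.path.join(dirpath, fn))
                except FileNotFoundError:
                    continue  # message moved or expunged during the walk
                messages += 1
                total_bytes += st.st_size
    return mailboxes, messages, total_bytes

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: maildir_stats.py MAILROOT", file=sys.stderr)
    else:
        boxes, msgs, size = maildir_stats(sys.argv[1])
        print(f"mailboxes: {boxes}  messages: {msgs}  size: {size / 1024**3:.1f} GiB")
```

Run it against the mail root (the path is a placeholder), e.g. python3 maildir_stats.py /path/to/mail.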
- Average/peak concurrent user connections
IMAP: Average 800 concurrent user connections, peaking at about 1400. POP3: Average 300 concurrent user connections, peaking at about 600.
- CPU type/speed/total core count, total RAM, free RAM (incl buffers)
Currently dual-core AMD Opteron 2210, 1.8GHz.
Right now, in the middle of the night (2:30 AM here) on a Sunday, thus a low point in the usage pattern:
             total       used       free     shared    buffers     cached
Mem:      12335820    9720252    2615568          0      53112     680424
-/+ buffers/cache:    8986716    3349104
Swap:      5855676      10916    5844760
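The "-/+ buffers/cache" row is derived from the Mem: row: buffers and page cache are reclaimable, so they are subtracted from "used" and added to "free". A small Python check of that arithmetic with the numbers from the output above (all values in KiB; the function name is made up for illustration):

```python
def buffers_cache_adjusted(used_kib, free_kib, buffers_kib, cached_kib):
    """Recompute the '-/+ buffers/cache' row of the old procps 'free' output.

    Buffers and page cache can be reclaimed, so they are subtracted from
    'used' and added to 'free'. All values are in KiB.
    """
    return used_kib - buffers_kib - cached_kib, free_kib + buffers_kib + cached_kib

used, avail = buffers_cache_adjusted(9720252, 2615568, 53112, 680424)
print(used, avail)  # 8986716 3349104
print(f"{used / 1024**2:.1f} GiB held by processes, {avail / 1024**2:.1f} GiB free or reclaimable")
```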
The system turns 7 years old this summer, which is also when its service contract ends.
- Storage configuration--total spindles, RAID level, hard or soft RAID
RAID 6 with 12 SATA 1.5 Gbit/s disks in an external array, attached via 4Gbit FC
Back in 2005, a SAS enclosure was way too expensive for us.
- Filesystem type
XFS on LVM, to allow snapshots for backup
Of course I aligned the partitions on the RAID correctly and created the filesystem with the correct parameters w.r.t. spindles, chunk size, etc.
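For reference, that alignment comes down to simple arithmetic: in RAID 6 two disks' worth of every stripe hold parity, so a 12-disk array has 10 data disks, and mkfs.xfs takes the per-disk chunk size as su and the number of data disks as sw. A sketch that builds those options; the 64 KiB chunk size in the example is a hypothetical value, since the actual chunk size isn't stated here, and the function name is made up:

```python
def xfs_stripe_options(total_disks, parity_disks, chunk_kib):
    """Build mkfs.xfs data-section alignment options for a parity RAID.

    su (stripe unit) is the per-disk chunk size, sw (stripe width) is the
    number of data-bearing disks, i.e. total disks minus parity disks.
    Returns the option string and the full stripe width in KiB.
    """
    data_disks = total_disks - parity_disks
    if data_disks < 1:
        raise ValueError("need at least one data disk")
    return f"-d su={chunk_kib}k,sw={data_disks}", chunk_kib * data_disks

# RAID 6 over 12 disks: 2 parity disks, 10 data disks.
# 64 KiB is a HYPOTHETICAL chunk size; substitute the array's real value.
opts, full_stripe_kib = xfs_stripe_options(12, 2, 64)
print(opts)             # -d su=64k,sw=10
print(full_stripe_kib)  # 640
```

With LVM in between, the physical volume's data area should also start on a full-stripe boundary (pvcreate has a --dataalignment option for this), which under the same assumed chunk size would be 640 KiB.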
- Backup software/method
Full backup with Bacula, which currently takes about 24 hours. Because of this, I switched to virtual full backups: I only ever run incremental and differential backups of the live system and let Bacula create synthetic full backups from them. This works fine; an incremental takes 2 hours, a differential about 4 hours.
The main reason for the long backup time is Maildir++. In a test, I copied the mail storage to a spare box, converted it to mdbox (50MB file size), and the backup was lightning fast compared to Maildir++.
Additionally, compressing the mails inside the mdbox instead of having Bacula compress them reduced the backup time further (and sped up access through IMAP and POP3 as well).
So this is the way to go, I think, regardless of which way I implement the backend mail server.
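A rough way to see why mdbox backs up so much faster: the backup has to open and stat every file, and with mdbox the messages are packed into storage files that are rotated once they reach roughly the configured size (50MB here, presumably set via Dovecot's mdbox_rotate_size). Each user has a separate storage directory, so every user adds at most one partially filled file. A sketch of that estimate, ignoring index files; the function name is hypothetical, and 900 GiB and 10,000 users are the figures given above:

```python
def mdbox_file_estimate(total_bytes, rotate_bytes, users):
    """Rough upper estimate of mdbox m.* storage files (index files ignored).

    Each user's mails are split into files of about rotate_bytes, so a user
    with size s needs about ceil(s / rotate_bytes) files, which is at most
    s / rotate_bytes + 1. Summed over all users this gives
    total_bytes / rotate_bytes + users.
    """
    full_files = -(-total_bytes // rotate_bytes)  # integer ceiling division
    return full_files + users

GiB, MiB = 1024**3, 1024**2
estimate = mdbox_file_estimate(900 * GiB, 50 * MiB, 10_000)
print(estimate)                      # 28432
print(round(11_000_000 / estimate))  # 387
```

So instead of about 11 million inodes, the backup would walk something in the order of tens of thousands of storage files, a few hundred times fewer.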
- Operating system
Debian Linux Lenny, currently with kernel 2.6.39
Instead of telling us what you think the solution to your unidentified bottleneck is and then asking "yeah or nay", tell us what the problem is and allow us to recommend solutions.
I am not asking for "yea or nay"; I just outlined my idea, and I am open to other suggestions.
If the general recommendation is to buy a new, big single storage system, I am more than happy to do just that, because it avoids any problems a distributed setup might have before they can even occur.
Maybe two HP DL180s (one for production and one as a test/standby system) with a SAS-attached enclosure for storage?
Keeping in mind that the new system has to last a long time (again 5 to 7 years), I need to be able to extend the storage space without too much hassle.
Regards, S°
-- Sigmentation fault. Core dumped.