Btrfs RAID-10 performance
John Stoffel
john at stoffel.org
Wed Sep 9 04:15:00 EEST 2020
>>>>> "Miloslav" == Miloslav Hůla <miloslav.hula at gmail.com> writes:
Miloslav> Hello,
Miloslav> I sent this into the Linux Kernel Btrfs mailing list and I got reply:
Miloslav> "RAID-1 would be preferable"
Miloslav> (https://lore.kernel.org/linux-btrfs/7b364356-7041-7d18-bd77-f60e0e2e2112@lechevalier.se/T/).
Miloslav> May I ask you for the comments as from people around the Dovecot?
Miloslav> We are using btrfs RAID-10 (/data, 4.7TB) on a physical Supermicro
Miloslav> server with Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz and 125GB of RAM.
Miloslav> We run 'btrfs scrub start -B -d /data' every Sunday as a cron task. It
Miloslav> takes about 50 minutes to finish.
Miloslav> # uname -a
Miloslav> Linux imap 4.9.0-12-amd64 #1 SMP Debian 4.9.210-1 (2020-01-20) x86_64
Miloslav> GNU/Linux
Miloslav> RAID is a composition of 16 harddrives. Harddrives are connected via
Miloslav> AVAGO MegaRAID SAS 9361-8i as RAID-0 devices. All harddrives are SAS
Miloslav> 2.5" 15k drives.
Can you post the output of "cat /proc/mdstat"? Or, since you say you're
using btrfs, are you using its own RAID10 setup? If so, please post
the output of 'btrfs fi show' and 'btrfs device stats' so we can see
the layout info.
Miloslav> Server serves as an IMAP server with Dovecot 2.2.27-3+deb9u6, 4104 accounts,
Miloslav> Mailbox format, LMTP delivery.
How often are these accounts hitting the server?
Miloslav> We run 'rsync' to remote NAS daily. It takes about 6.5 hours to finish,
Miloslav> 12'265'387 files last night.
That's... painful. So basically you're hitting the drives hard with
random IOPS, and you're probably running out of performance. How much
space are you using on the filesystem?
And why not use btrfs send to ship off snapshots instead of using
rsync? I'm sure that would be an improvement...
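A minimal sketch of what that could look like, assuming (hypothetically) a btrfs subvolume at /data, snapshots kept under /data/.snapshots, and a remote host "nas" receiving into a btrfs filesystem at /backup — adjust all names to your real layout:

```shell
# Sketch: incremental btrfs replication instead of rsync.
# Hypothetical names: subvolume /data, snapshot dir /data/.snapshots,
# remote host "nas" with a btrfs filesystem mounted at /backup.
replicate_data() {
    today=$(date +%F)
    # Only read-only snapshots can be sent, hence -r.
    btrfs subvolume snapshot -r /data "/data/.snapshots/$today"
    if [ -n "$1" ]; then
        # Incremental send: only the delta since the parent snapshot "$1".
        btrfs send -p "/data/.snapshots/$1" "/data/.snapshots/$today" \
            | ssh nas "btrfs receive /backup"
    else
        # First run: full send of the whole snapshot.
        btrfs send "/data/.snapshots/$today" | ssh nas "btrfs receive /backup"
    fi
}
# Example: replicate_data 2020-09-08   # name of yesterday's snapshot
```

Since send/receive walks btrfs metadata instead of stat()ing 12 million files, the nightly window should shrink dramatically.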
Miloslav> In the last half year, we have run into performance
Miloslav> troubles. Server load grows up to 30 in rush hours, due to
Miloslav> IO waits. We tried to attach more harddrives (the 838G ones
Miloslav> in the list below) and increase free space by rebalance. I
Miloslav> think it helped a little bit, but not so much.
If you're IOPS bound, but not space bound, then you *really* want to
get an SSD in there for the indexes and such. Basically the stuff
that gets written/read from all the time no matter what, but which
isn't large in terms of space.
Also, adding in another controller card or two would also probably
help spread the load across more PCI channels, and reduce contention
on the SATA/SAS bus as well.
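One low-risk way to move just the hot, small data is Dovecot's INDEX parameter on mail_location, which keeps mail bodies on the big array but puts index files on an SSD volume. A sketch, with hypothetical paths:

```
# dovecot.conf fragment (sketch; paths are examples, not your layout).
# Mail bodies stay on the btrfs array, indexes go to a fast SSD mirror.
mail_location = maildir:~/Maildir:INDEX=/ssd/indexes/%u
```

Indexes are rewritten on nearly every IMAP session, so relocating them takes a large share of the random writes off the spinning disks.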
Miloslav> Is this a reasonable setup and use case for btrfs RAID-10?
Miloslav> If so, are there some recommendations to achieve better
Miloslav> performance?
1. Move HOT data to an SSD-based RAID-1 pair, on a separate
controller.
2. Add more controllers, which also means you're more redundant in
case one controller fails.
3. Clone the system and put Dovecot IMAP director in front of the
setup.
4. Stop using rsync for copying to your DR site; use btrfs snapshots
with send/receive instead.
5. Check which Dovecot backend you're using and think about moving to
one which doesn't involve nearly as many files.
6. Find out who your biggest users are, in terms of emails, and move
them to SSDs if step 1 is too hard to do at first.
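For finding the biggest users, a quick du-based sketch works if your mailboxes live under one base directory (the path here is hypothetical; point it at your real mail store):

```shell
# Sketch: list the ten largest per-user mail directories by disk usage.
# "$1" is the base directory holding one subdirectory per user,
# e.g. /var/mail/vhosts (hypothetical path).
biggest_mail_users() {
    du -s "$1"/* 2>/dev/null | sort -rn | head -n 10
}
# Example: biggest_mail_users /var/mail/vhosts
```

Note du counts space, not message counts; for IOPS the message count can matter more, so `find "$dir" -type f | wc -l` per user is a useful second pass.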
Can you also grab some 'iostat -dhm 30 60' output, which is 30
minutes of data at 30-second intervals? That should help you narrow
down which disk (if any) is your hotspot.
It's not clear to me if you have one big btrfs filesystem, or a bunch
of smaller ones stitched together. In any case, it should be very easy
to get better performance here.
I think someone else mentioned that you should look at your dovecot
backend, and you should move to the fastest one you can find.
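If you do switch backends — e.g. Maildir to mdbox, which packs many messages per file and cuts the file count dramatically — doveadm sync (dsync) can do the per-user conversion. A sketch; the target location and user pattern are illustrative:

```shell
# Sketch: convert one user's mail store to mdbox with dsync.
# "mdbox:~/mdbox" is an example target; update mail_location for
# converted users afterwards.
convert_user() {
    # doveadm sync mirrors the user's mail into the new backend location.
    doveadm sync -u "$1" mdbox:~/mdbox
}
# Example: for u in $(doveadm user '*'); do convert_user "$u"; done
```

Fewer, larger files also helps the nightly backup pass, whichever replication method you end up with.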
Good luck!
John
Miloslav> # megaclisas-status
Miloslav> -- Controller information --
Miloslav> -- ID | H/W Model | RAM | Temp | BBU | Firmware
Miloslav> c0 | AVAGO MegaRAID SAS 9361-8i | 1024MB | 72C | Good | FW:
Miloslav> 24.16.0-0082
Miloslav> -- Array information --
Miloslav> -- ID | Type | Size | Strpsz | Flags | DskCache | Status | OS
Miloslav> Path | CacheCade |InProgress
Miloslav> c0u0 | RAID-0 | 838G | 256 KB | RA,WB | Enabled | Optimal |
Miloslav> /dev/sdq | None |None
Miloslav> c0u1 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal |
Miloslav> /dev/sda | None |None
Miloslav> c0u2 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal |
Miloslav> /dev/sdb | None |None
Miloslav> c0u3 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal |
Miloslav> /dev/sdc | None |None
Miloslav> c0u4 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal |
Miloslav> /dev/sdd | None |None
Miloslav> c0u5 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal |
Miloslav> /dev/sde | None |None
Miloslav> c0u6 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal |
Miloslav> /dev/sdf | None |None
Miloslav> c0u7 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal |
Miloslav> /dev/sdg | None |None
Miloslav> c0u8 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal |
Miloslav> /dev/sdh | None |None
Miloslav> c0u9 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal |
Miloslav> /dev/sdi | None |None
Miloslav> c0u10 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal |
Miloslav> /dev/sdj | None |None
Miloslav> c0u11 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal |
Miloslav> /dev/sdk | None |None
Miloslav> c0u12 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal |
Miloslav> /dev/sdl | None |None
Miloslav> c0u13 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal |
Miloslav> /dev/sdm | None |None
Miloslav> c0u14 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal |
Miloslav> /dev/sdn | None |None
Miloslav> c0u15 | RAID-0 | 838G | 256 KB | RA,WB | Enabled | Optimal |
Miloslav> /dev/sdr | None |None
Miloslav> -- Disk information --
Miloslav> -- ID | Type | Drive Model | Size | Status
Miloslav> | Speed | Temp | Slot ID | LSI ID
Miloslav> c0u0p0 | HDD | SEAGATE ST900MP0006 N003WAG0Q3S3 | 837.8 Gb | Online,
Miloslav> Spun Up | 12.0Gb/s | 53C | [8:14] | 32
Miloslav> c0u1p0 | HDD | HGST HUC156060CSS200 A3800XV250TJ | 558.4 Gb | Online,
Miloslav> Spun Up | 12.0Gb/s | 38C | [8:0] | 12
Miloslav> c0u2p0 | HDD | HGST HUC156060CSS200 A3800XV3XT4J | 558.4 Gb | Online,
Miloslav> Spun Up | 12.0Gb/s | 43C | [8:1] | 11
Miloslav> c0u3p0 | HDD | HGST HUC156060CSS200 ADB05ZG4XLZU | 558.4 Gb | Online,
Miloslav> Spun Up | 12.0Gb/s | 46C | [8:2] | 25
Miloslav> c0u4p0 | HDD | HGST HUC156060CSS200 A3800XV3DWRL | 558.4 Gb | Online,
Miloslav> Spun Up | 12.0Gb/s | 48C | [8:3] | 14
Miloslav> c0u5p0 | HDD | HGST HUC156060CSS200 A3800XV3XZTL | 558.4 Gb | Online,
Miloslav> Spun Up | 12.0Gb/s | 52C | [8:4] | 18
Miloslav> c0u6p0 | HDD | HGST HUC156060CSS200 A3800XV3VSKJ | 558.4 Gb | Online,
Miloslav> Spun Up | 12.0Gb/s | 55C | [8:5] | 15
Miloslav> c0u7p0 | HDD | SEAGATE ST600MP0006 N003WAF1LWKE | 558.4 Gb | Online,
Miloslav> Spun Up | 12.0Gb/s | 56C | [8:6] | 28
Miloslav> c0u8p0 | HDD | HGST HUC156060CSS200 A3800XV3XTDJ | 558.4 Gb | Online,
Miloslav> Spun Up | 12.0Gb/s | 55C | [8:7] | 20
Miloslav> c0u9p0 | HDD | HGST HUC156060CSS200 A3800XV3T8XL | 558.4 Gb | Online,
Miloslav> Spun Up | 12.0Gb/s | 57C | [8:8] | 19
Miloslav> c0u10p0 | HDD | HGST HUC156060CSS200 A7030XHL0ZYP | 558.4 Gb | Online,
Miloslav> Spun Up | 12.0Gb/s | 61C | [8:9] | 23
Miloslav> c0u11p0 | HDD | HGST HUC156060CSS200 ADB05ZG4VR3P | 558.4 Gb | Online,
Miloslav> Spun Up | 12.0Gb/s | 60C | [8:10] | 24
Miloslav> c0u12p0 | HDD | SEAGATE ST600MP0006 N003WAF195KA | 558.4 Gb | Online,
Miloslav> Spun Up | 12.0Gb/s | 60C | [8:11] | 29
Miloslav> c0u13p0 | HDD | SEAGATE ST600MP0006 N003WAF1LTZW | 558.4 Gb | Online,
Miloslav> Spun Up | 12.0Gb/s | 56C | [8:12] | 26
Miloslav> c0u14p0 | HDD | SEAGATE ST600MP0006 N003WAF1LWH6 | 558.4 Gb | Online,
Miloslav> Spun Up | 12.0Gb/s | 55C | [8:13] | 27
Miloslav> c0u15p0 | HDD | SEAGATE ST900MP0006 N003WAG0Q414 | 837.8 Gb | Online,
Miloslav> Spun Up | 12.0Gb/s | 47C | [8:15] | 33
Miloslav> # btrfs --version
Miloslav> btrfs-progs v4.7.3
Miloslav> # btrfs fi show
Miloslav> Label: 'DATA' uuid: 5b285a46-e55d-4191-924f-0884fa06edd8
Miloslav> Total devices 16 FS bytes used 3.49TiB
Miloslav> devid 1 size 558.41GiB used 448.66GiB path /dev/sda
Miloslav> devid 2 size 558.41GiB used 448.66GiB path /dev/sdb
Miloslav> devid 4 size 558.41GiB used 448.66GiB path /dev/sdd
Miloslav> devid 5 size 558.41GiB used 448.66GiB path /dev/sde
Miloslav> devid 7 size 558.41GiB used 448.66GiB path /dev/sdg
Miloslav> devid 8 size 558.41GiB used 448.66GiB path /dev/sdh
Miloslav> devid 9 size 558.41GiB used 448.66GiB path /dev/sdf
Miloslav> devid 10 size 558.41GiB used 448.66GiB path /dev/sdi
Miloslav> devid 11 size 558.41GiB used 448.66GiB path /dev/sdj
Miloslav> devid 13 size 558.41GiB used 448.66GiB path /dev/sdk
Miloslav> devid 14 size 558.41GiB used 448.66GiB path /dev/sdc
Miloslav> devid 15 size 558.41GiB used 448.66GiB path /dev/sdl
Miloslav> devid 16 size 558.41GiB used 448.66GiB path /dev/sdm
Miloslav> devid 17 size 558.41GiB used 448.66GiB path /dev/sdn
Miloslav> devid 18 size 837.84GiB used 448.66GiB path /dev/sdr
Miloslav> devid 19 size 837.84GiB used 448.66GiB path /dev/sdq
Miloslav> # btrfs fi df /data/
Miloslav> Data, RAID10: total=3.48TiB, used=3.47TiB
Miloslav> System, RAID10: total=256.00MiB, used=320.00KiB
Miloslav> Metadata, RAID10: total=21.00GiB, used=18.17GiB
Miloslav> GlobalReserve, single: total=512.00MiB, used=0.00B
Miloslav> I do not attach whole dmesg log. It is almost empty, without errors.
Miloslav> Only lines about BTRFS are about relocations, like:
Miloslav> BTRFS info (device sda): relocating block group 29435663220736 flags 65
Miloslav> BTRFS info (device sda): found 54460 extents
Miloslav> BTRFS info (device sda): found 54459 extents