<html>
<head>
<style type="text/css">
body,p,td,div,span{
font-size:14px;font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Helvetica, Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol";
}
body p{
margin:0px;
}
</style>
</head>
<body>The 9361-8i does support passthrough (JBOD mode). Make sure you have the latest firmware.<div><br><br>On Wednesday, 09/09/2020 at 03:55 Miloslav Hůla wrote:<br><blockquote style="border:0;border-left: 2px solid #22437f; padding:0px; margin:0px; padding-left:5px; margin-left: 5px; ">Hi, thank you for your reply. I'll continue inline...<br>
<br>
On 09.09.2020 at 3:15, John Stoffel wrote:<br>
> Miloslav> Hello,<br>
> Miloslav> I sent this to the Linux Kernel Btrfs mailing list and I got the reply:<br>
> Miloslav> "RAID-1 would be preferable"<br>
> Miloslav> (<a href="https://lore.kernel.org/linux-btrfs/7b364356-7041-7d18-bd77-f60e0e2e2112@lechevalier.se/T" target="_blank" class="normal-link">https://lore.kernel.org/linux-btrfs/7b364356-7041-7d18-bd77-f60e0e2e2112@lechevalier.se/T</a>/).<br>
> Miloslav> May I ask for your comments, as people from the Dovecot community?<br>
> <br>
> <br>
> Miloslav> We are using btrfs RAID-10 (/data, 4.7TB) on a physical Supermicro<br>
> Miloslav> server with Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz and 125GB of RAM.<br>
> Miloslav> We run 'btrfs scrub start -B -d /data' every Sunday as a cron task. It<br>
> Miloslav> takes about 50 minutes to finish.<br>
> <br>
> Miloslav> # uname -a<br>
> Miloslav> Linux imap 4.9.0-12-amd64 #1 SMP Debian 4.9.210-1 (2020-01-20) x86_64<br>
> Miloslav> GNU/Linux<br>
> <br>
> Miloslav> The RAID is composed of 16 hard drives. The drives are connected via an<br>
> Miloslav> AVAGO MegaRAID SAS 9361-8i as RAID-0 devices. All drives are SAS<br>
> Miloslav> 2.5" 15k drives.<br>
> <br>
> Can you post the output of "cat /proc/mdstat" or since you say you're<br>
> using btrfs, are you using their own RAID0 setup? If so, please post<br>
> the output of 'btrfs stats' or whatever the command is you use to view<br>
> layout info?<br>
<br>
There is one PCIe RAID controller in the chassis, an AVAGO MegaRAID SAS <br>
9361-8i, with 16x SAS 15k drives connected to it. Because the controller <br>
does not support pass-through for the drives, we use 16x single-drive <br>
RAID-0 on the controller. So we get /dev/sda ... /dev/sdp (roughly) in the <br>
OS, and over that we have a single btrfs RAID-10, composed of 16 devices, <br>
mounted as /data.<br>
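<br>
To make the layout concrete, the filesystem creation looks roughly like this <br>
(a sketch, not the exact command we ran; the device list stands for the 16 <br>
RAID-0 volumes the controller exposes):<br>
<br>
# mkfs.btrfs -L DATA -d raid10 -m raid10 /dev/sd[a-p]<br>
# mount /dev/sda /data<br>
<br>
Mounting any one member device brings up the whole 16-device filesystem.<br>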
<br>
We have chosen this setup for several reasons:<br>
- easy to increase capacity<br>
- easy to replace drives with larger ones (see the example after this list)<br>
- due to checksumming, btrfs does not need fsck in case of a power failure<br>
- btrfs scrub discovers a failing drive sooner than S.M.A.R.T. or the RAID <br>
controller<br>
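<br>
For example, replacing a drive with a larger one is roughly (device names <br>
and the devid are only an illustration):<br>
<br>
# btrfs replace start /dev/sdX /dev/sdY /data<br>
# btrfs replace status /data<br>
# btrfs filesystem resize 1:max /data<br>
<br>
where 1 is the devid of the replaced device as shown by 'btrfs fi show', so <br>
the new, larger capacity becomes usable.<br>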
<br>
<br>
> Miloslav> The server is an IMAP server with Dovecot 2.2.27-3+deb9u6, 4104 accounts,<br>
> Miloslav> Maildir mailbox format, LMTP delivery.<br>
> <br>
> How often are these accounts hitting the server?<br>
<br>
The IMAP server serves a university, so there are typical rush hours from <br>
7 AM to 3 PM. Load lowers during the evening, and the server is almost <br>
unused during the night.<br>
<br>
<br>
> Miloslav> We run 'rsync' to a remote NAS daily. It takes about 6.5 hours to finish;<br>
> Miloslav> 12'265'387 files last night.<br>
> <br>
> That's.... sucky. So basically you're hitting the drives hard with<br>
> random IOPs and you're probably running out of performance. How much<br>
> space are you using on the filesystem?<br>
<br>
It's not as bad as it seems. rsync runs during the night, and even though <br>
read load is high, the server load stays low. Our problems are with writes.<br>
<br>
<br>
> And why not use btrfs send to ship off snapshots instead of using<br>
> rsync? I'm sure that would be an improvement...<br>
<br>
We back up to an external NAS (NetApp) for a disaster recovery scenario. <br>
Moreover, the NAS is spread across multiple locations. On top of that we <br>
create NAS snapshots, going ten days back. All snapshots are easily <br>
available via an NFS mount. And NAS capacity is cheaper.<br>
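<br>
For completeness, the send/receive way would look roughly like this (host <br>
name and paths are made up):<br>
<br>
# mkdir /data/.snapshots<br>
# btrfs subvolume snapshot -r /data /data/.snapshots/2020-09-09<br>
# btrfs send /data/.snapshots/2020-09-09 | ssh backuphost btrfs receive /backup<br>
<br>
But receive needs a btrfs filesystem on the target side, and the NetApp we <br>
write to over NFS is not one, which is another reason we stay with rsync.<br>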
<br>
<br>
> Miloslav> In the last half year, we have run into performance<br>
> Miloslav> troubles. Server load grows up to 30 in rush hours, due to<br>
> Miloslav> IO waits. We tried to attach additional hard drives (the 838G ones<br>
> Miloslav> in the list below) and increase the free space by rebalancing. I<br>
> Miloslav> think it helped a little bit, but not dramatically.<br>
> <br>
> If you're IOPs bound, but not space bound, then you *really* want to<br>
> get an SSD in there for the indexes and such. Basically the stuff<br>
> that gets written/read from all the time no matter what, but which<br>
> isn't large in terms of space.<br>
<br>
Yes. We are now at 66% capacity. Adding an SSD for the indexes is our next step.<br>
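<br>
If I read the Dovecot docs correctly, moving only the indexes is a one-line <br>
change in mail_location (the /ssd path is just a placeholder):<br>
<br>
mail_location = maildir:~/Maildir:INDEX=/ssd/dovecot-index/%u<br>
<br>
plus creating that directory with the right ownership; the mail data itself <br>
stays on the btrfs volume.<br>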
<br>
<br>
> Also, adding in another controller card or two would also probably<br>
> help spread the load across more PCI channels, and reduce contention<br>
> on the SATA/SAS bus as well.<br>
<br>
We will probably first wait and see how the SSD helps, but as you wrote, it <br>
is a possible next step.<br>
<br>
> Miloslav> Is this a reasonable setup and use case for btrfs RAID-10?<br>
> Miloslav> If so, are there some recommendations to achieve better<br>
> Miloslav> performance?<br>
> <br>
> 1. Move HOT data to an SSD-based RAID-1 volume pair, on a separate<br>
> controller.<br>
<br>
OK<br>
<br>
> 2. add more controllers, which also means you're more redundant in<br>
> case one controller fails.<br>
<br>
OK<br>
<br>
> 3. Clone the system and put the Dovecot IMAP director in front of the<br>
> setup.<br>
<br>
I still hope that one server can handle 4105 accounts.<br>
<br>
> 4. Stop using rsync for copying to your DR site, use the btrfs snap<br>
> send, or whatever the commands are.<br>
<br>
I hope it is not needed in our scenario.<br>
<br>
> 5. check which dovecot backend you're using and think about moving to<br>
> one which doesn't involve nearly as many files.<br>
<br>
Maildir is comfortable for us. From time to time, users call us saying "I <br>
accidentally deleted a folder", and it is super easy to copy it back from <br>
the backup.<br>
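<br>
Such a restore is roughly (user, folder and snapshot paths are only <br>
illustrative):<br>
<br>
# cp -a /mnt/nas-snapshot/2020-09-01/vmail/user/Maildir/.Lost.Folder /data/vmail/user/Maildir/<br>
# doveadm force-resync -u user '*'<br>
<br>
The force-resync is usually not even needed with Maildir; Dovecot picks the <br>
restored folder up on the next mailbox listing.<br>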
<br>
> 6. Find out who your biggest users are, in terms of emails and move<br>
> them to SSDs if step 1 is too hard to do at first.<br>
<br>
OK<br>
<br>
> Can you also grab some 'iostat -dhm 30 60' output, which is 30<br>
> minutes of data over 30 second intervals? That should help you narrow<br>
> down which (if any) disk is your hotspot.<br>
<br>
OK, thanks for the tip.<br>
<br>
> It's not clear to me if you have one big btrfs filesystem, or a bunch<br>
> of smaller ones stitched together. In any case, it should be very easy<br>
> to get better performance here.<br>
<br>
I hope I've made it clear above.<br>
<br>
> I think someone else mentioned that you should look at your dovecot<br>
> backend, and you should move to the fastest one you can find.<br>
> <br>
> Good luck!<br>
> John<br>
<br>
Thank you for your time and advice!<br>
<br>
Kind regards<br>
Milo<br>
<br>
<br>
> Miloslav> # megaclisas-status<br>
> Miloslav> -- Controller information --<br>
> Miloslav> -- ID | H/W Model | RAM | Temp | BBU | Firmware<br>
> Miloslav> c0 | AVAGO MegaRAID SAS 9361-8i | 1024MB | 72C | Good | FW: 24.16.0-0082<br>
> <br>
> Miloslav> -- Array information --<br>
> Miloslav> -- ID | Type | Size | Strpsz | Flags | DskCache | Status | OS Path | CacheCade | InProgress<br>
> Miloslav> c0u0 | RAID-0 | 838G | 256 KB | RA,WB | Enabled | Optimal | /dev/sdq | None | None<br>
> Miloslav> c0u1 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal | /dev/sda | None | None<br>
> Miloslav> c0u2 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal | /dev/sdb | None | None<br>
> Miloslav> c0u3 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal | /dev/sdc | None | None<br>
> Miloslav> c0u4 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal | /dev/sdd | None | None<br>
> Miloslav> c0u5 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal | /dev/sde | None | None<br>
> Miloslav> c0u6 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal | /dev/sdf | None | None<br>
> Miloslav> c0u7 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal | /dev/sdg | None | None<br>
> Miloslav> c0u8 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal | /dev/sdh | None | None<br>
> Miloslav> c0u9 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal | /dev/sdi | None | None<br>
> Miloslav> c0u10 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal | /dev/sdj | None | None<br>
> Miloslav> c0u11 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal | /dev/sdk | None | None<br>
> Miloslav> c0u12 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal | /dev/sdl | None | None<br>
> Miloslav> c0u13 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal | /dev/sdm | None | None<br>
> Miloslav> c0u14 | RAID-0 | 558G | 256 KB | RA,WB | Enabled | Optimal | /dev/sdn | None | None<br>
> Miloslav> c0u15 | RAID-0 | 838G | 256 KB | RA,WB | Enabled | Optimal | /dev/sdr | None | None<br>
> <br>
> Miloslav> -- Disk information --<br>
> Miloslav> -- ID | Type | Drive Model | Size | Status | Speed | Temp | Slot ID | LSI ID<br>
> Miloslav> c0u0p0 | HDD | SEAGATE ST900MP0006 N003WAG0Q3S3 | 837.8 Gb | Online, Spun Up | 12.0Gb/s | 53C | [8:14] | 32<br>
> Miloslav> c0u1p0 | HDD | HGST HUC156060CSS200 A3800XV250TJ | 558.4 Gb | Online, Spun Up | 12.0Gb/s | 38C | [8:0] | 12<br>
> Miloslav> c0u2p0 | HDD | HGST HUC156060CSS200 A3800XV3XT4J | 558.4 Gb | Online, Spun Up | 12.0Gb/s | 43C | [8:1] | 11<br>
> Miloslav> c0u3p0 | HDD | HGST HUC156060CSS200 ADB05ZG4XLZU | 558.4 Gb | Online, Spun Up | 12.0Gb/s | 46C | [8:2] | 25<br>
> Miloslav> c0u4p0 | HDD | HGST HUC156060CSS200 A3800XV3DWRL | 558.4 Gb | Online, Spun Up | 12.0Gb/s | 48C | [8:3] | 14<br>
> Miloslav> c0u5p0 | HDD | HGST HUC156060CSS200 A3800XV3XZTL | 558.4 Gb | Online, Spun Up | 12.0Gb/s | 52C | [8:4] | 18<br>
> Miloslav> c0u6p0 | HDD | HGST HUC156060CSS200 A3800XV3VSKJ | 558.4 Gb | Online, Spun Up | 12.0Gb/s | 55C | [8:5] | 15<br>
> Miloslav> c0u7p0 | HDD | SEAGATE ST600MP0006 N003WAF1LWKE | 558.4 Gb | Online, Spun Up | 12.0Gb/s | 56C | [8:6] | 28<br>
> Miloslav> c0u8p0 | HDD | HGST HUC156060CSS200 A3800XV3XTDJ | 558.4 Gb | Online, Spun Up | 12.0Gb/s | 55C | [8:7] | 20<br>
> Miloslav> c0u9p0 | HDD | HGST HUC156060CSS200 A3800XV3T8XL | 558.4 Gb | Online, Spun Up | 12.0Gb/s | 57C | [8:8] | 19<br>
> Miloslav> c0u10p0 | HDD | HGST HUC156060CSS200 A7030XHL0ZYP | 558.4 Gb | Online, Spun Up | 12.0Gb/s | 61C | [8:9] | 23<br>
> Miloslav> c0u11p0 | HDD | HGST HUC156060CSS200 ADB05ZG4VR3P | 558.4 Gb | Online, Spun Up | 12.0Gb/s | 60C | [8:10] | 24<br>
> Miloslav> c0u12p0 | HDD | SEAGATE ST600MP0006 N003WAF195KA | 558.4 Gb | Online, Spun Up | 12.0Gb/s | 60C | [8:11] | 29<br>
> Miloslav> c0u13p0 | HDD | SEAGATE ST600MP0006 N003WAF1LTZW | 558.4 Gb | Online, Spun Up | 12.0Gb/s | 56C | [8:12] | 26<br>
> Miloslav> c0u14p0 | HDD | SEAGATE ST600MP0006 N003WAF1LWH6 | 558.4 Gb | Online, Spun Up | 12.0Gb/s | 55C | [8:13] | 27<br>
> Miloslav> c0u15p0 | HDD | SEAGATE ST900MP0006 N003WAG0Q414 | 837.8 Gb | Online, Spun Up | 12.0Gb/s | 47C | [8:15] | 33<br>
> <br>
> <br>
> <br>
> Miloslav> # btrfs --version<br>
> Miloslav> btrfs-progs v4.7.3<br>
> <br>
> <br>
> <br>
> Miloslav> # btrfs fi show<br>
> Miloslav> Label: 'DATA' uuid: 5b285a46-e55d-4191-924f-0884fa06edd8<br>
> Miloslav> Total devices 16 FS bytes used 3.49TiB<br>
> Miloslav> devid 1 size 558.41GiB used 448.66GiB path /dev/sda<br>
> Miloslav> devid 2 size 558.41GiB used 448.66GiB path /dev/sdb<br>
> Miloslav> devid 4 size 558.41GiB used 448.66GiB path /dev/sdd<br>
> Miloslav> devid 5 size 558.41GiB used 448.66GiB path /dev/sde<br>
> Miloslav> devid 7 size 558.41GiB used 448.66GiB path /dev/sdg<br>
> Miloslav> devid 8 size 558.41GiB used 448.66GiB path /dev/sdh<br>
> Miloslav> devid 9 size 558.41GiB used 448.66GiB path /dev/sdf<br>
> Miloslav> devid 10 size 558.41GiB used 448.66GiB path /dev/sdi<br>
> Miloslav> devid 11 size 558.41GiB used 448.66GiB path /dev/sdj<br>
> Miloslav> devid 13 size 558.41GiB used 448.66GiB path /dev/sdk<br>
> Miloslav> devid 14 size 558.41GiB used 448.66GiB path /dev/sdc<br>
> Miloslav> devid 15 size 558.41GiB used 448.66GiB path /dev/sdl<br>
> Miloslav> devid 16 size 558.41GiB used 448.66GiB path /dev/sdm<br>
> Miloslav> devid 17 size 558.41GiB used 448.66GiB path /dev/sdn<br>
> Miloslav> devid 18 size 837.84GiB used 448.66GiB path /dev/sdr<br>
> Miloslav> devid 19 size 837.84GiB used 448.66GiB path /dev/sdq<br>
> <br>
> <br>
> <br>
> Miloslav> # btrfs fi df /data/<br>
> Miloslav> Data, RAID10: total=3.48TiB, used=3.47TiB<br>
> Miloslav> System, RAID10: total=256.00MiB, used=320.00KiB<br>
> Miloslav> Metadata, RAID10: total=21.00GiB, used=18.17GiB<br>
> Miloslav> GlobalReserve, single: total=512.00MiB, used=0.00B<br>
> <br>
> <br>
> <br>
> Miloslav> I do not attach the whole dmesg log. It is almost empty, without errors.<br>
> Miloslav> The only BTRFS lines are about relocations, like:<br>
> <br>
> Miloslav> BTRFS info (device sda): relocating block group 29435663220736 flags 65<br>
> Miloslav> BTRFS info (device sda): found 54460 extents<br>
> Miloslav> BTRFS info (device sda): found 54459 extents<br>
></blockquote></div></body></html>