Definitive guide to running Dovecot on top of a Clustered File System?

Wed Jun 2 18:09:06 EEST 2021

Hey there,

I am working on migrating a Dovecot system with 50,000+ users / 5 TB of
mail data to a new set-up that scales better. I found it difficult to find
a good solution, so in this email I lay out what I learned, in the hope
that others can profit from it and can help me improve this design before
I actually start migrating.

*My aim is to get a clustered file system setup for Dovecot, with the
additional constraint of not having to pay some vendor on a per-mailbox
basis.*

Why a clustered file system? I believe this yields the best solution: the
machines become stateless consumers of shared services, so scaling and
recovery become relatively easy.

Why the constraint of not paying per mailbox? If you paid a fee per
mailbox, you would probably be better off just using a managed mail service
altogether.

High-level
========

We are going to use ObjectiveFS as our clustered file system (I am not
involved with them in any way). It uses FUSE to behave like a POSIX
filesystem. Because of the way it compacts files that belong together, it
handles a large number of small files relatively well. The great thing is:
we can mount this filesystem on multiple machines. This piece of software
is proprietary, but its price is very reasonable, and it is a fixed
monthly cost. We can use S3, Google Cloud Storage, or others as the actual
storage system.

We store metadata about the mailboxes in a database that is available to
all machines. We can use any managed database service for this. We
explicitly set the proxy_host for each user in this database, so every
connection for a given user gets routed to the same machine. We make sure
to distribute the users evenly among the available machines.
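
To make the distribution step concrete, here is a minimal Python sketch of how a proxy_host could be chosen for a new user: pick the machine that currently has the fewest users. The function name, the dictionary of existing assignments, and the host addresses are all hypothetical; in the real set-up the assignments would live in the shared database.

```python
from __future__ import annotations

from collections import Counter


def pick_backend(current_assignments: dict[str, str], backends: list[str]) -> str:
    """Return the backend with the fewest assigned users.

    current_assignments maps username -> proxy_host (hypothetical shape).
    Ties are broken by host name so the result is deterministic.
    """
    if not backends:
        raise ValueError("at least one backend is required")
    counts = Counter({backend: 0 for backend in backends})
    for host in current_assignments.values():
        if host in counts:  # ignore users pointing at hosts no longer in the pool
            counts[host] += 1
    return min(backends, key=lambda backend: (counts[backend], backend))


# Hypothetical example data.
assignments = {
    "alice@example.com": "10.0.0.11",
    "bob@example.com": "10.0.0.11",
    "carol@example.com": "10.0.0.12",
}
new_host = pick_backend(assignments, ["10.0.0.11", "10.0.0.12", "10.0.0.13"])
```

Because the result is stored explicitly per user, adding a machine later does not move existing users; only new users (or users you reassign on purpose) land on it.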

All traffic is received by a TCP load balancer, which forwards it to the
underlying machines. Ideally this is a managed cloud service. I think there
are two valid choices, of which I prefer the first:

   - Use a TCP pass-through load balancer. This way the source IP is
   retained without requiring any change to the protocol (an example of this
   is Google Cloud external TCP/UDP Network Load Balancing
   <https://cloud.google.com/load-balancing/docs/network>).
   - Use an SSL-terminating TCP load balancer, and make it forward in
   HAProxy's PROXY format so we retain the source IP.
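
For the second option, it may help to see what the receiving side gets. In the text (version 1) form of the PROXY protocol, the load balancer prepends one line such as "PROXY TCP4 <source ip> <destination ip> <source port> <destination port>" followed by CRLF to each connection. Below is a minimal Python sketch of parsing that line. It only covers the v1 TCP4/TCP6 form (not "PROXY UNKNOWN" and not the binary v2 format), the function name is my own, and in practice Dovecot would do this parsing itself.

```python
from __future__ import annotations


def parse_proxy_v1(line: bytes) -> tuple[str, int, str, int]:
    """Parse a PROXY protocol v1 header line.

    Returns (source_ip, source_port, destination_ip, destination_port).
    """
    if not line.endswith(b"\r\n"):
        raise ValueError("header must end with CRLF")
    parts = line[:-2].decode("ascii").split(" ")
    if len(parts) != 6 or parts[0] != "PROXY" or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("unsupported or malformed PROXY v1 header")
    _, _, src_ip, dst_ip, src_port, dst_port = parts
    return src_ip, int(src_port), dst_ip, int(dst_port)


# Example header as a load balancer might send it (documentation addresses).
header = b"PROXY TCP4 203.0.113.7 198.51.100.10 56324 993\r\n"
client_ip, client_port, _, _ = parse_proxy_v1(header)
```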

That traffic is now routed to N machines, each of which runs Dovecot.
Dovecot is configured as a Dovecot proxy (not to be confused with
HAProxy's PROXY protocol), so that it proxies the connection to another
machine if required (or continues on this machine if you are already on
the correct proxy_host). I believe this can be done using proxy_maybe
<https://doc.dovecot.org/configuration_manual/authentication/proxies/>.
Depending on the choice of load balancer you may need to configure Dovecot
to understand HAProxy's PROXY format.
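
The routing decision described above can be sketched in a few lines of Python. This only models the behaviour I expect from proxy_maybe (serve locally when the user's proxy_host is this machine, otherwise proxy); it is not Dovecot's implementation. The function name and return strings are made up, and I assume proxy_host holds an IP address.

```python
from __future__ import annotations


def route_connection(user_proxy_host: str, local_addresses: set[str]) -> str:
    """Decide what a node does with a freshly authenticated connection.

    user_proxy_host: the proxy_host stored for this user in the database.
    local_addresses: IP addresses that belong to the node handling the login.
    """
    if user_proxy_host in local_addresses:
        # Already on the right machine: no proxying needed.
        return "serve-locally"
    # Otherwise forward the session to the machine that owns this user.
    return f"proxy-to:{user_proxy_host}"


# Hypothetical node with a single address.
this_node = {"10.0.0.12"}
decision_local = route_connection("10.0.0.12", this_node)
decision_remote = route_connection("10.0.0.13", this_node)
```

With this model the load balancer can send a connection to any node, and at most one extra hop is needed to reach the user's own machine.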

We are going to use the maildir format and store those files on the
ObjectiveFS filesystem. Dovecot's index files are stored on a local SSD.
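
As a rough illustration of that split, the small Python helper below builds a Dovecot 2.3-style mail_location value with the Maildir on the shared mount and the index on local disk. Both directory paths are hypothetical, and the per-user layout using %d (domain) and %n (user part) is just one possible choice.

```python
def build_mail_location(maildir_root: str, index_root: str) -> str:
    """Build a mail_location value: Maildir on shared storage, INDEX on local disk."""
    return f"maildir:{maildir_root}/%d/%n/Maildir:INDEX={index_root}/%d/%n"


# Hypothetical mount points: ObjectiveFS for mail, a local SSD for indexes.
mail_location = build_mail_location("/mnt/objectivefs/mail", "/var/lib/dovecot/index")
```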

Choice motivations
===============

ObjectiveFS:

   - Another clustered file system could probably be used as well.
   - But ObjectiveFS works quite well because it has support for handling
   many small files, and it can be configured with a local SSD disk.
   - Using this software, it is quite easy to get a clustered file system
   running.

Regarding the mailbox format:

   - dbox is said to be more performant, and having fewer files is an
   advantage in general. But there is one big disadvantage: because the
   index contains crucial information, you need to place this index on the
   clustered file system as well. Maildir does not have this problem, and
   that is why we prefer it.
   - obox is only available as part of Dovecot Pro, and solves a lot of
   problems. But you really can only get Dovecot Pro if you pay a fee per
   mailbox. We don't want that!

Why not use Dovecot Director as well?

   - You cannot run Dovecot Director on the same machine as your Dovecot
   backend / proxy. Thus we would need to introduce 2 additional hosts just
   to accept and route the traffic. This would increase the complexity of
   our solution.
   - But it could be built on top of this set-up fairly easily.
   - However, I currently don't see many advantages to Director.
   - One big advantage *could* be if it automatically removed failed
      nodes. However, this only works if you use Dovemon
      <https://doc.dovecot.org/configuration_manual/dovemon/>, and that is
      packaged with OX/Dovecot and thus requires paying per user.
      - So in case you want to take a node down, you get a slightly easier
      API to do so, namely *doveadm director add <backend server ip> 0*.
      However, we can simply imitate this by updating our database and
      changing the proxy_host to another destination.
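
Taking a node out of rotation by updating the database could look roughly like the Python sketch below. It uses an in-memory SQLite database only to stay self-contained; the table name, column names and addresses are hypothetical, and a real deployment would run the equivalent statements against the managed database.

```python
from __future__ import annotations

import sqlite3


def drain_backend(conn: sqlite3.Connection, drained: str, remaining: list[str]) -> int:
    """Move every user on `drained` to the remaining hosts, round-robin.

    Returns the number of users that were reassigned.
    """
    if not remaining:
        raise ValueError("need at least one remaining backend")
    rows = conn.execute(
        "SELECT username FROM users WHERE proxy_host = ? ORDER BY username",
        (drained,),
    ).fetchall()
    for i, (username,) in enumerate(rows):
        conn.execute(
            "UPDATE users SET proxy_host = ? WHERE username = ?",
            (remaining[i % len(remaining)], username),
        )
    conn.commit()
    return len(rows)


# Hypothetical schema and data for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, proxy_host TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [
        ("alice@example.com", "10.0.0.11"),
        ("bob@example.com", "10.0.0.11"),
        ("carol@example.com", "10.0.0.12"),
    ],
)
moved = drain_backend(conn, "10.0.0.11", ["10.0.0.12", "10.0.0.13"])
```

Since the mail itself lives on the shared file system, I would expect the new machine only to have to rebuild its local index for each moved user.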

Load balancer:

   - You could also set up your MX records to point directly to your
   Dovecot nodes. This should work as well, I guess.
   - But I think the load balancer gives better control than changing
   and managing DNS.

Questions
========

In general I'd like to know whether this is a good idea!

Some other questions I have at this point, which could make or break this
setup:

   - Can a Dovecot backend instance be configured to *also* run as a
   Dovecot proxy?
   - Can a Dovecot proxy node receive HAProxy PROXY traffic?

Details
=====

Once I am sure I am on the right track I'd like to post the settings I use
for the software as well. But let's first discuss the stuff above!

Best,
Roel van Duijnhoven