Replicator: Panic: data stack: Out of memory

Paul Kudla (SCOM.CA Internet Services Inc.) paul at scom.ca
Sat Jun 4 12:39:58 UTC 2022


actually the suggestion below is a good idea

run

ps -axww (or top)

to list the active processes; this will give you some hints

top is better for overall memory

i also have a perl script that shows actual memory usage, free memory etc

utilities like this are handy to have
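
A minimal sketch of that kind of helper, in Python rather than Perl: it parses ps output and lists the processes holding the most resident memory. The function names (parse_ps, top_by_rss, run_ps) are made up for illustration. The column list pid,rss,vsz,command assumes a ps that accepts -o with those keywords, and rss/vsz are assumed to be reported in kilobytes; check ps(1) on your system.

```python
import subprocess


def parse_ps(text):
    """Parse `ps -o pid,rss,vsz,command` style output into a list of dicts."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 3)     # command may contain spaces
        if len(parts) < 4:
            continue
        pid, rss, vsz, command = parts
        if not (pid.isdigit() and rss.isdigit() and vsz.isdigit()):
            continue                    # ignore lines that are not process rows
        rows.append({
            "pid": int(pid),
            "rss_kb": int(rss),         # assumed to be kilobytes
            "vsz_kb": int(vsz),         # assumed to be kilobytes
            "command": command,
        })
    return rows


def top_by_rss(rows, n=10):
    """Return the n rows with the largest resident set size."""
    return sorted(rows, key=lambda r: r["rss_kb"], reverse=True)[:n]


def run_ps():
    """Run ps and return its text output (assumes the -o keywords exist)."""
    result = subprocess.run(
        ["ps", "-axww", "-o", "pid,rss,vsz,command"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling top_by_rss(parse_ps(run_ps())) gives a quick list of the biggest memory users, which is a reasonable first look before blaming dovecot.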

also i found i had to set this in dovecot.conf

default_process_limit = 16384

also are you running with debug on ?

auth_debug = no
auth_debug_passwords = no

mail_debug = no

ie try setting these to yes ?

that might give more detail if this is really a dovecot issue.

other background processes can eat memory too

I run mailscanner for example and every once in a while someone tries to
crash it!

it recovers but lord knows



mem outputs :

# mem

SYSTEM MEMORY SUMMARY:
mem_used:           16GB [ 12%] Logically used memory
mem_avail:   +     111GB [ 87%] Logically available memory
-------------- ------------ ----------- ------
mem_total:   =     128GB [100%] Logically total memory

SYSTEM MEMORY INFORMATION:
mem_wire:           13GB [ 10%] Wired: disabled for paging out
mem_active:  +       0GB [  0%] Active: recently referenced
mem_inactive:+      71GB [ 57%] Inactive: recently not referenced
mem_cache:   +       0GB [  0%] Cached: almost avail. for allocation
mem_free:    +      40GB [ 32%] Free: fully available for allocation
mem_gap_vm:  +       0GB [  0%] Memory gap: UNKNOWN
-------------- ------------ ----------- ------
mem_all:     =     124GB [100%] Total real memory managed
mem_gap_sys: +       3GB        Memory gap: Kernel?!
-------------- ------------ -----------
mem_phys:    =     127GB        Total real memory available
mem_gap_hw:  +       0GB        Memory gap: Segment Mappings?!
-------------- ------------ -----------
mem_hw:      =     128GB        Total real memory installed


-----------------------------------------------------------------------
# cat /programs/common/mem
#!/usr/local/bin/perl
##
##  freebsd-memory -- List Total System Memory Usage
##  Copyright (c) 2003-2004 Ralf S. Engelschall <rse at engelschall.com>
##
##  Redistribution and use in source and binary forms, with or without
##  modification, are permitted provided that the following conditions
##  are met:
##  1. Redistributions of source code must retain the above copyright
##     notice, this list of conditions and the following disclaimer.
##  2. Redistributions in binary form must reproduce the above copyright
##     notice, this list of conditions and the following disclaimer in the
##     documentation and/or other materials provided with the distribution.
##
##  THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND
##  ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
##  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
##  ARE DISCLAIMED.  IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE
##  FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
##  DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
##  OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
##  HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
##  LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
##  OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
##  SUCH DAMAGE.
##

#   query the system through the generic sysctl(8) interface
#   (this does not require special privileges)
my $sysctl = {};
my $sysctl_output = `/sbin/sysctl -a`;
foreach my $line (split(/\n/, $sysctl_output)) {
     if ($line =~ m/^([^:]+):\s+(.+)\s*$/s) {
         $sysctl->{$1} = $2;
     }
}

#   round the physical memory size to the next power of two which is
#   reasonable for memory cards. We do this by first determining the
#   guessed memory card size under the assumption that usual computer
#   hardware has an average of a maximally eight memory cards installed
#   and those are usually of equal size.
sub mem_rounded {
     my ($mem_size) = @_;
     my $chip_size  = 1;
     my $chip_guess = ($mem_size / 8) - 1;
     while ($chip_guess != 0) {
         $chip_guess >>= 1;
         $chip_size  <<= 1;
     }
     my $mem_round = (int($mem_size / $chip_size) + 1) * $chip_size;
     return $mem_round;
}

#   determine the individual known information
#   NOTICE: forget hw.usermem, it is just (hw.physmem - vm.stats.vm.v_wire_count).
#   NOTICE: forget vm.stats.misc.zero_page_count, it is just the subset of
#           vm.stats.vm.v_free_count which is already pre-zeroed.
my $mem_hw        = &mem_rounded($sysctl->{"hw.physmem"});
my $mem_phys      = $sysctl->{"hw.physmem"};
my $mem_all       = $sysctl->{"vm.stats.vm.v_page_count"}      * $sysctl->{"hw.pagesize"};
my $mem_wire      = $sysctl->{"vm.stats.vm.v_wire_count"}      * $sysctl->{"hw.pagesize"};
my $mem_active    = $sysctl->{"vm.stats.vm.v_active_count"}    * $sysctl->{"hw.pagesize"};
my $mem_inactive  = $sysctl->{"vm.stats.vm.v_inactive_count"}  * $sysctl->{"hw.pagesize"};
my $mem_cache     = $sysctl->{"vm.stats.vm.v_cache_count"}     * $sysctl->{"hw.pagesize"};
my $mem_free      = $sysctl->{"vm.stats.vm.v_free_count"}      * $sysctl->{"hw.pagesize"};

#   determine the individual unknown information
my $mem_gap_vm    = $mem_all - ($mem_wire + $mem_active + $mem_inactive + $mem_cache + $mem_free);
my $mem_gap_sys   = $mem_phys - $mem_all;
my $mem_gap_hw    = $mem_hw   - $mem_phys;

#   determine logical summary information
my $mem_total = $mem_hw;
my $mem_avail = $mem_inactive + $mem_cache + $mem_free;
my $mem_used  = $mem_total - $mem_avail;

#   information annotations
my $info = {
     "mem_wire"     => 'Wired: disabled for paging out',
     "mem_active"   => 'Active: recently referenced',
     "mem_inactive" => 'Inactive: recently not referenced',
     "mem_cache"    => 'Cached: almost avail. for allocation',
     "mem_free"     => 'Free: fully available for allocation',
     "mem_gap_vm"   => 'Memory gap: UNKNOWN',
     "mem_all"      => 'Total real memory managed',
     "mem_gap_sys"  => 'Memory gap: Kernel?!',
     "mem_phys"     => 'Total real memory available',
     "mem_gap_hw"   => 'Memory gap: Segment Mappings?!',
     "mem_hw"       => 'Total real memory installed',
     "mem_used"     => 'Logically used memory',
     "mem_avail"    => 'Logically available memory',
     "mem_total"    => 'Logically total memory',
};

#   print system results
printf("\n");

printf("SYSTEM MEMORY SUMMARY:\n");
printf("mem_used:      %7dGB [%3d%%] %s\n", $mem_used  / (1024*1024*1024), ($mem_used  / $mem_total) * 100, $info->{"mem_used"});
printf("mem_avail:   + %7dGB [%3d%%] %s\n", $mem_avail / (1024*1024*1024), ($mem_avail / $mem_total) * 100, $info->{"mem_avail"});
printf("-------------- ------------ ----------- ------\n");
printf("mem_total:   = %7dGB [100%%] %s\n", $mem_total / (1024*1024*1024), $info->{"mem_total"});

printf("\n");

printf("SYSTEM MEMORY INFORMATION:\n");
printf("mem_wire:      %7dGB [%3d%%] %s\n", $mem_wire     / (1024*1024*1024), ($mem_wire     / $mem_all) * 100, $info->{"mem_wire"});
printf("mem_active:  + %7dGB [%3d%%] %s\n", $mem_active   / (1024*1024*1024), ($mem_active   / $mem_all) * 100, $info->{"mem_active"});
printf("mem_inactive:+ %7dGB [%3d%%] %s\n", $mem_inactive / (1024*1024*1024), ($mem_inactive / $mem_all) * 100, $info->{"mem_inactive"});
printf("mem_cache:   + %7dGB [%3d%%] %s\n", $mem_cache    / (1024*1024*1024), ($mem_cache    / $mem_all) * 100, $info->{"mem_cache"});
printf("mem_free:    + %7dGB [%3d%%] %s\n", $mem_free     / (1024*1024*1024), ($mem_free     / $mem_all) * 100, $info->{"mem_free"});
printf("mem_gap_vm:  + %7dGB [%3d%%] %s\n", $mem_gap_vm   / (1024*1024*1024), ($mem_gap_vm   / $mem_all) * 100, $info->{"mem_gap_vm"});
printf("-------------- ------------ ----------- ------\n");
printf("mem_all:     = %7dGB [100%%] %s\n", $mem_all      / (1024*1024*1024), $info->{"mem_all"});
printf("mem_gap_sys: + %7dGB        %s\n",  $mem_gap_sys  / (1024*1024*1024), $info->{"mem_gap_sys"});
printf("-------------- ------------ -----------\n");
printf("mem_phys:    = %7dGB        %s\n",  $mem_phys     / (1024*1024*1024), $info->{"mem_phys"});
printf("mem_gap_hw:  + %7dGB        %s\n",  $mem_gap_hw   / (1024*1024*1024), $info->{"mem_gap_hw"});
printf("-------------- ------------ -----------\n");
printf("mem_hw:      = %7dGB        %s\n",  $mem_hw       / (1024*1024*1024), $info->{"mem_hw"});

#   print logical results
------------------------------------------------------------------
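
For anyone not on Perl, the summary arithmetic of the script above can be sketched in Python. This is only a rough port under assumptions, not the original author's code: the names mem_rounded and summarize mirror the script, the input is assumed to be a dict of sysctl names to values (as the script builds from sysctl -a), and a missing counter such as vm.stats.vm.v_cache_count, which may be absent on newer FreeBSD releases, is treated as zero.

```python
def mem_rounded(mem_size):
    """Round physical memory up to a multiple of a guessed chip size.

    Same idea as the Perl version: assume at most eight equally sized
    memory modules and round the module size up to a power of two.
    As in the original, an exact multiple is rounded up one further chip.
    """
    chip_size = 1
    chip_guess = mem_size // 8 - 1
    while chip_guess > 0:          # guard against tiny or negative inputs
        chip_guess >>= 1
        chip_size <<= 1
    return (mem_size // chip_size + 1) * chip_size


def summarize(sysctl):
    """Compute the 'SYSTEM MEMORY SUMMARY' figures in bytes."""
    page = int(sysctl["hw.pagesize"])

    def pages(name):
        # missing counters are treated as zero (an assumption)
        return int(sysctl.get("vm.stats.vm." + name, 0)) * page

    mem_total = mem_rounded(int(sysctl["hw.physmem"]))
    mem_avail = (pages("v_inactive_count")
                 + pages("v_cache_count")
                 + pages("v_free_count"))
    return {
        "mem_total": mem_total,
        "mem_avail": mem_avail,
        "mem_used": mem_total - mem_avail,
    }
```

Note that "used" here is simply total minus (inactive + cache + free), so wired memory, ZFS ARC included, counts as used.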






top will display something like this:

last pid: 85373;  load averages:  0.71,  0.48,  0.38    up 72+04:51:41  08:22:11
207 processes: 1 running, 206 sleeping
CPU:  1.5% user,  0.0% nice,  0.4% system,  0.0% interrupt, 98.0% idle
Mem: 336M Active, 71G Inact, 139M Laundry, 13G Wired, 770M Buf, 40G Free
ARC: 4319M Total, 1346M MFU, 501M MRU, 2368K Anon, 55M Header, 2414M Other
      383M Compressed, 1469M Uncompressed, 3.83:1 Ratio
Swap: 16G Total, 16G Free

   PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
  1986 pgsql         1  26    0   195M    46M select  14 426:27  11.47% postgres
 83810 pgsql         1  27    0   200M   171M select  13   3:15   9.19% postgres
  1882 root        128  20    0    11M  3732K rpcsvc  15  29.7H   2.26% nfsd
  1987 pgsql         1  20    0   195M    47M select   5  33:21   1.84% postgres
  1985 root         34  21    0   141M    88M sigwai  14  72:22   1.32% named
  1937 root          1  20    0    27M    15M select  15 491:36   0.90% python3.8
 99555 root          1  20    0    28M    18M select  10 634:23   0.88% python3.8
  1939 root          1  20    0    27M    15M select   1 939:47   0.87% python3.8
  1988 pgsql         1  20    0   195M    47M select   7   6:58   0.28% postgres
  1989 pgsql         1  20    0   195M    47M select   8   2:14   0.17% postgres
  1964 pgsql         1  20    0   194M   164M select   9  10:02   0.08% postgres
 85373 root          1  20    0    14M  3644K CPU0     0   0:00   0.07% top
  3150 pgsql         1  20    0   195M    42M select   6  39:21   0.06% postgres

ps -axw or ps -axww on freebsd

# ps -axww
   PID TT  STAT          TIME COMMAND
     0  -  DLs     3788:48.94 [kernel]
     1  -  ILs        0:05.38 /sbin/init --
     2  -  DL         0:00.00 [crypto]
     3  -  DL         0:00.00 [crypto returns 0]
     4  -  DL         0:00.00 [crypto returns 1]
     5  -  DL         0:00.00 [crypto returns 2]
     6  -  DL         0:00.00 [crypto returns 3]
     7  -  DL         0:00.00 [crypto returns 4]
     8  -  DL         0:00.00 [crypto returns 5]
     9  -  DL         0:00.00 [crypto returns 6]
    10  -  DL         0:00.00 [audit]
    11  -  RNL  1629112:33.34 [idle]
    12  -  WL       180:00.70 [intr]
    13  -  DL       123:57.70 [geom]
    14  -  DL         0:00.00 [crypto returns 7]
    15  -  DL         0:00.00 [crypto returns 8]
    16  -  DL         0:00.00 [crypto returns 9]
    17  -  DL         0:00.00 [crypto returns 10]
    18  -  DL         0:00.00 [crypto returns 11]
    19  -  DL         0:00.00 [crypto returns 12]
    20  -  DL         0:00.00 [crypto returns 13]
    21  -  DL         0:00.00 [crypto returns 14]
    22  -  DL         0:00.00 [crypto returns 15]
    23  -  DL         0:00.00 [sequencer 00]
    24  -  DL         0:00.00 [cam]
    25  -  DL         5:42.32 [usb]
    26  -  DL         0:00.47 [soaiod1]
    27  -  DL         0:00.47 [soaiod2]
    28  -  DL         0:00.46 [soaiod3]
    29  -  DL         0:00.47 [soaiod4]
    30  -  DL      1714:58.15 [zfskern]
    31  -  DL         0:00.00 [sctp_iterator]
    32  -  DL        12:50.77 [pf purge]
    33  -  DL         2:16.82 [rand_harvestq]
    34  -  DL        29:00.62 [pagedaemon]
    35  -  DL         0:00.00 [vmdaemon]
    36  -  DL         5:25.68 [bufdaemon]
    37  -  DL         1:44.98 [vnlru]
    38  -  DL      2040:33.82 [syncer]
  1657  -  Is         0:01.21 /sbin/devd
  1863  -  Ss         0:03.44 /usr/sbin/rpcbind
  1878  -  Is         0:00.08 /usr/sbin/mountd -r -S
  1880  -  Is         0:00.27 nfsd: master (nfsd)
  1882  -  S       1780:23.16 nfsd: server (nfsd)
  1907  -  Ss        10:01.06 /usr/sbin/syslogd -s
  1909  -  Is         0:00.55 /usr/sbin/inetd -wW -C 50 -s 500
  1911  -  Is         0:00.25 /usr/sbin/sshd
  1955  -  Is        24:50.70 /usr/local/sbin/clamd
  1964  -  Ss        10:02.28 postmaster: checkpointer    (postgres)
  1965  -  Ss         1:38.52 postmaster: background writer    (postgres)
  1966  -  Ss         3:48.60 postmaster: walwriter    (postgres)
  1967  -  Ss         2:03.84 postmaster: autovacuum launcher    (postgres)
  1968  -  Ss        12:41.60 postmaster: stats collector    (postgres)
  1969  -  Is         0:01.82 postmaster: logical replication launcher    (postgres)
  1974  -  Ss        37:19.26 postmaster: walsender pgsql 10.221.0.16(30421)  (postgres)
  1976  -  Ss        39:37.29 postmaster: walsender pgsql 10.221.0.10(64872)  (postgres)
  1985  -  Is        72:21.96 /usr/local/sbin/named -d 0 -4
  1986  -  Ss       426:29.15 postmaster: pgsql scom_billing 10.221.0.18(52852)  (postgres)
  1987  -  Ss        33:21.50 postmaster: pgsql scom_billing 10.221.0.18(60830)  (postgres)
  1988  -  Ss         6:57.70 postmaster: pgsql scom_billing 10.221.0.18(34255)  (postgres)
  1989  -  Ss         2:13.52 postmaster: pgsql scom_billing 10.221.0.18(17265)  (postgres)
  2073  -  Ss        10:12.46 /usr/local/libexec/postfix/master -w
  2076  -  I          0:07.82 qmgr -l -t fifo -u
  2166  -  Is         1:53.61 /usr/local/libexec/postfix/master -w
  2168  -  I          0:55.23 qmgr -l -t fifo -u
  2238  -  Is         1:49.77 /usr/local/libexec/postfix/master -w
  2240  -  I          1:01.17 qmgr -l -t fifo -u
  2253  -  I          0:39.34 tlsmgr -l -t unix -u
  2397  -  Is         0:05.58 MailScanner: starting child (perl)
  2513  -  Is         0:20.43 /usr/sbin/cron -s
  3150  -  Rs        39:21.01 postmaster: walsender pgsql 10.221.0.6(10000)  (postgres)
  3175  -  Is         0:00.35 postmaster: pgsql scom_billing 10.221.0.6(10017)  (postgres)
  3176  -  Is         0:10.80 postmaster: pgsql scom_billing 10.221.0.6(10018)  (postgres)
  3177  -  Ss         1:10.22 postmaster: pgsql scom_billing 10.221.0.6(10019)  (postgres)



Happy Saturday !!!
Thanks - paul

Paul Kudla


Scom.ca Internet Services <http://www.scom.ca>
004-1009 Byron Street South
Whitby, Ontario - Canada
L1N 4S3

Toronto 416.642.7266
Main 1.866.411.7266
Fax 1.888.892.7266
Email paul at scom.ca

On 6/4/2022 5:15 AM, dovecot-bounces at dovecot.org wrote:
> 
> On 2022-06-04 02:46, Ivan Jurišić wrote:
>>> Ok a little more help :
>>> vsz_limit = 0 --> means unlimited ram for allocation, change
>>> this/try 2g etc pending available ram.
>>
>> I tried with 524M, 1G, 2G, 4G and 8G but in every case the replicator
>> process crashed.
> 
> Maybe there is another service process causing the OOM? e.g. check clamd; 
> antivirus DBs tend to be quite big and, while updating, can for some time 
> become double the size due to reloading.
> 
> Also, sometimes the httpd service, when using the event worker and not tuned 
> properly, will cause an OOM crash in other services along with itself.
> 
> Good luck.
> 
> Zakaria.
> 

