Need help deduplicating messages fetched with getmail into dovecot mailbox
Hello everyone,
I'm using getmail to fetch external mail via POP3 into a dovecot mailbox (which provides IMAP). Getmail seems to have serious problems keeping track of the mails it has already fetched: every mail ends up exactly twice in the mailbox delivered by dovecot. The duplicates are not present in the original external POP3 inbox.
I hope I'm in the right place here. I think the problem is not related to dovecot but to getmail. As there are a huge number of user complaints about getmail (and apparently no alternative?!), I hope to find some help here with a workaround.
I noticed that getmail can be configured with some kind of filtering to sort out mails before local delivery to dovecot: http://pyropus.ca/software/getmail/configuration.html#filter-examples
Does anyone know how to query the message-id (given by mail-headers) with doveadm?
My first test was this:
doveadm fetch -u my-mailbox@domain.net "mailbox date.sent" message-id "369408722.286104911.1547114312259.Foo.root@someone.org"
But then I noticed that this is only meant to be used with the dovecot-internal GUID, I think. Is it possible to query for the mail's Message-ID as well? Is it possible to reconfigure indexing to include this property from the original message file? I wrote a small Perl script that extracts it using the Perl library Email::Simple.
To: my-mailbox@domain.net
*Message-ID: 369408722.286104911.1547114312259.Foo.root@someone.org*
Subject: Some test
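The same extraction can be sketched in plain shell instead of Perl. This is an illustration only, not the script mentioned above: `extract_message_id` is a hypothetical name, and the sketch assumes the raw message is supplied on standard input with its header block first.

```shell
#!/bin/sh
# extract_message_id: read one raw message on stdin and print the value of
# its Message-ID header without angle brackets. Prints nothing if the header
# is absent. Only the header block is examined; the body is never read.
extract_message_id() {
    awk '
        /^\r?$/  { exit }                               # blank line ends the headers
        /^[ \t]/ { if (in_hdr) val = val $0; next }     # folded continuation line
        {
            if (in_hdr) exit                            # next header starts: done
            if (tolower($0) ~ /^message-id:/) {
                in_hdr = 1
                val = substr($0, 12)                    # text after "Message-ID:"
            }
        }
        END {
            gsub(/[ \t\r<>]/, "", val)                  # drop whitespace and brackets
            if (val != "") print val
        }
    '
}
```

Called as `extract_message_id < message-file`, it prints the bare ID, which can then be handed to whatever lookup is used to decide whether the message was already delivered.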
I have also tried doveadm deduplicate, but that doesn't work either, because it is likewise based on dovecot's internal GUID and every duplicate message seems to get a new, unique GUID.
doveadm deduplicate -u gabriel.kaufmann@gmx.net mailbox-guid
Maybe someone knows a better solution.
-- Best regards
Gabriel Kaufmann
On Thu, 10 Jan 2019, Gabriel Kaufmann wrote:
> Does anyone know how to query the message-id (given by mail-headers) with doveadm?
> My first test was this:
> doveadm fetch -u my-mailbox@domain.net "mailbox date.sent" message-id "369408722.286104911.1547114312259.Foo.root@someone.org"
doveadm fetch -u my-mailbox@domain.net 'guid hdr.message-id' ...
You're on your own for everything else.
Joseph Tam jtam.home@gmail.com
Hello Joseph,
thanks for your reply.
> doveadm fetch -u my-mailbox@domain.net 'guid hdr.message-id' ...
> You're on your own for everything else.
That works, and I may be able to build on it with a shell script that getmail calls as a filter. But it fetches ALL message-ids. It would be perfect if I could run a search query on 'guid hdr.message-id' that returns a result only if there is a message matching a given message-id (and nothing otherwise).
I've tried to adapt your suggestion into a doveadm search query, but it's not working:

doveadm -f table search -u 'my-mailbox@domain.net' 'guid hdr.Message-ID' '1546519978.5428@foo.com'
Fatal: Unknown argument GUID HDR.MESSAGE-ID

or

doveadm search -u 'gabriel.kaufmann@gmx.net' 'guid hdr.Message-ID' '1546519978.5428@paypal.com'
Fatal: Unknown argument GUID HDR.MESSAGE-ID

Either it doesn't work at all or I've done something wrong. Do you know if that is possible?
Normally getmail should already notice which messages have been fetched; it keeps some kind of simple file database of message-ids for that. But for some reason this doesn't work reliably (a Google search turns up many complaints about it). Ending up querying my whole inbox every 5 minutes (using doveadm fetch) would be a performance killer, I think. Maintaining my own dedupe database in addition to getmail's is overhead, and I think it would end with me either developing my own 'getmail' or extending the existing code myself (whichever is less time-consuming).
Do you have any idea if it's possible to use doveadm search for single message-id without having to query over all messages?
Best regards
Gabriel Kaufmann
On Fri, 11 Jan 2019, Gabriel Kaufmann wrote:
> Hello Joseph,
> thanks for your reply.
> > doveadm fetch -u my-mailbox@domain.net 'guid hdr.message-id' ...
> > You're on your own for everything else.
> That works, and I may be able to build on it with a shell script that getmail calls as a filter. But it fetches ALL message-ids. It would be perfect if I could run a search query on 'guid hdr.message-id' that returns a result only if there is a message matching a given message-id (and nothing otherwise).
Whether this is good depends on how much duplication there is. If you're adding a small number of messages to a large corpus, it *may* be better to loop through message-ids. If you're merging in a large mailbox, it's probably better to do bulk dumps of both boxes, then process them.
I'm not sure whether dovecot's caches are sequential O(n) or hashed O(1), but each query has overhead, so you may be better off doing a dump of message-IDs, then cross-referencing.
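The dump-then-cross-reference idea can be sketched roughly as follows. This is a minimal sketch under stated assumptions: both dumps have already been reduced to plain text files with one Message-ID per line (with or without angle brackets), and the names `normalize_ids` and `common_ids` are hypothetical helpers, not part of doveadm or getmail.

```shell
#!/bin/sh
# normalize_ids FILE: strip surrounding whitespace and angle brackets,
# drop empty lines, and print the IDs sorted and de-duplicated.
normalize_ids() {
    sed -e 's/^[[:space:]]*<\{0,1\}//' \
        -e 's/>\{0,1\}[[:space:]]*$//' \
        -e '/^$/d' "$1" | LC_ALL=C sort -u
}

# common_ids FILE_A FILE_B: print every Message-ID present in both files.
# IDs are compared as opaque, case-sensitive strings (an assumption).
common_ids() {
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
    normalize_ids "$1" > "$tmp_a"
    normalize_ids "$2" > "$tmp_b"
    LC_ALL=C comm -12 "$tmp_a" "$tmp_b"   # lines common to both sorted lists
    rm -f "$tmp_a" "$tmp_b"
}
```

With this shape there is one sort per dump rather than one mailbox query per incoming message, which is the trade-off described above. Producing the two ID lists from the `doveadm fetch` output is left out here because it depends on the formatter used.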
> Do you have any idea if it's possible to use doveadm search for single message-id without having to query over all messages?
"-ftable" is just to make it easier to parse.
doveadm -ftable fetch -u my-mailbox@domain.net \
'guid hdr.message-id' \
HEADER message-id '<1546519978.5428@paypal.com>'
Keep in mind that the search matches case-insensitive fragments, so a pattern such as '1546519978.5428@PAYPAL.COM' matches a superset of the above.
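Because the search matches fragments, a script that needs an exact hit may want to re-check the returned header values itself. The following is a minimal sketch of such a post-filter, assuming the candidate Message-ID values have already been extracted from the doveadm output onto standard input, one per line. `has_exact_id` is a hypothetical helper name, and treating IDs as case-sensitive strings is an assumption of this sketch.

```shell
#!/bin/sh
# has_exact_id TARGET: read candidate Message-ID values on stdin (one per
# line) and succeed only if one of them equals TARGET exactly. Angle
# brackets and surrounding whitespace are ignored on both sides.
has_exact_id() {
    target=$(printf '%s' "$1" | sed -e 's/^<//' -e 's/>$//')
    sed -e 's/^[[:space:]]*<\{0,1\}//' \
        -e 's/>\{0,1\}[[:space:]]*$//' |
        grep -Fqx -- "$target"     # -F literal string, -x whole line, -q quiet
}
```

A caller would use it in a condition such as `if has_exact_id "$id" < candidates.txt; then ...`, where `candidates.txt` is a placeholder for the extracted search results.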
Joseph Tam jtam.home@gmail.com
Hello again,
I was able to work around the getmail issue that produces duplicates by using a Sieve filter to find and discard them. I used the following rule set (which I found elsewhere on the Internet). I still get duplicates occasionally, but it is nowhere near as bad as before (every mail twice!).
# Track duplicate mail deliveries
require ["duplicate", "imap4flags"];
if duplicate :header "message-id" {
    discard;
    stop;
}
keep;
Best regards
Gabriel Kaufmann