charset-specific searches, and continuation lines

Fri Sep 5 02:13:17 UTC 2014

Michael M Slusarz <slusarz at curecanti.org> writes:

> Quoting Eric Abrahamsen <eric at ericabrahamsen.net>:
>
>> Hi there,
>>
>> I'm looking into improving IMAP search support for the Gnus Emacs mail
>> client, and trying to add the ability to search non-ascii characters. So
>> far as I know, I start this invocation with something like:
>>
>> . UID SEARCH CHARSET UTF-8 TEXT {NNN}
>>
>> Where NNN is the number of bytes in my search string. Dovecot then
>> responds with:
>>
>> + OK
>>
>> So... what do I do then? I don't actually know what the next statement
>> is, to provide the actual search string itself. Googling has proved
>> unhelpful, as most of the examples online don't actually show this "+
>> OK" response. Can someone just briefly outline what's meant to happen
>> next? I've tried including the search string immediately after the
>> byte-size, separated by various combinations of \n\r, but that always
>> gives me a "Missing LF after literal size" error.
>
> Your example, assuming your search text is "aéb":
>
> . UID SEARCH CHARSET UTF-8 TEXT {4}
> + OK
> aéb[CRLF]
> * SEARCH XXX
> . OK
>
> Literal length is the number of octets in the string - not the number
> of characters - so not sure if that was tripping you up.

Hi Michael,

Well, that's embarrassing: I could have sworn that was the first thing I
tried. I knew about the octets, and had tried inputting a\303\251b as
the search string, but was sure I'd also tried the plain old search
string. Thanks!
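
For anyone who finds this thread later, here is a minimal Python sketch of
the framing described above. It only builds the bytes to send and does not
talk to a server. The helper name build_literal_search is hypothetical, and
the sketch assumes the client waits for the server's "+" continuation
line between sending the two pieces.

```python
def build_literal_search(tag, text, charset="UTF-8"):
    """Return (command_line, literal_payload) as bytes for a TEXT search.

    Hypothetical helper for illustration; not part of any library.
    """
    payload = text.encode(charset)
    # The literal size in braces counts octets, not characters.
    command = (
        f"{tag} UID SEARCH CHARSET {charset} TEXT {{{len(payload)}}}\r\n"
    ).encode("ascii")
    # Sent only after the server's "+" continuation: the raw octets,
    # then CRLF to terminate the command.
    return command, payload + b"\r\n"


# "a\u00e9b" is the example string "aéb" with a precomposed e-acute.
command, payload = build_literal_search(".", "a\u00e9b")
print(command)
print(payload)
```

The first value is sent as-is, including its trailing CRLF; the second is
sent once the continuation line arrives.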

While I've got you here, I hope you'll answer one more question: what's
the format for searching multiple terms with non-ascii strings? Is it
possible in one run to find a utf-8 encoded subject, and a utf-8 encoded
body?

Thanks again,
Eric