[dovecot/core] 3cb867: lib-fts/fts-tokeniser-generic: move to container_o...

GitHub noreply at github.com
Wed Oct 10 09:30:07 EEST 2018


  Branch: refs/heads/master
  Home:   https://github.com/dovecot/core
  Commit: 3cb8678634b6567304c4f24e462974a78529fb7c
      https://github.com/dovecot/core/commit/3cb8678634b6567304c4f24e462974a78529fb7c
  Author: Phil Carmody <phil at dovecot.fi>
  Date:   2018-10-10 (Wed, 10 Oct 2018)

  Changed paths:
    M src/lib-fts/fts-tokenizer-generic.c

  Log Message:
  -----------
  lib-fts/fts-tokeniser-generic: move to container_of() for type-safety

Signed-off-by: Phil Carmody <phil at dovecot.fi>
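
For readers unfamiliar with the idiom, here is a minimal sketch of why
container_of() is safer than a plain cast. The macro below is a simplified
stand-in and not Dovecot's own definition, and the struct and function names
are hypothetical. A plain cast from the embedded base to the outer struct
only works if the base is the first member; container_of() subtracts the
member's real offset instead.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a container_of() macro (an assumption: the
   project's real macro may differ and may add compile-time type checks). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical types, named for illustration only. */
struct base_tokenizer {
	int version;
};

struct derived_tokenizer {
	unsigned int max_length;
	struct base_tokenizer base; /* deliberately not the first member */
};

/* Recover the enclosing structure from a pointer to its embedded base. */
static struct derived_tokenizer *derived_from_base(struct base_tokenizer *tok)
{
	assert(tok != NULL);
	return container_of(tok, struct derived_tokenizer, base);
}
```

With this layout, casting &d.base directly to struct derived_tokenizer *
would point at the wrong address, while derived_from_base(&d.base) returns &d.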


  Commit: a5550d93dd532c387be3c2f019f810b47b4117c1
      https://github.com/dovecot/core/commit/a5550d93dd532c387be3c2f019f810b47b4117c1
  Author: Phil Carmody <phil at dovecot.fi>
  Date:   2018-10-10 (Wed, 10 Oct 2018)

  Changed paths:
    M src/lib-fts/fts-tokenizer-generic-private.h
    M src/lib-fts/fts-tokenizer-generic.c

  Log Message:
  -----------
  lib-fts/fts-tokenizer-generic - rename state variables - cosmetic

These contain types, not letters. No functional changes.

Signed-off-by: Phil Carmody <phil at dovecot.fi>


  Commit: fbc8ab8aef235b30bc76540d8e85db399406a02e
      https://github.com/dovecot/core/commit/fbc8ab8aef235b30bc76540d8e85db399406a02e
  Author: Phil Carmody <phil at dovecot.fi>
  Date:   2018-10-10 (Wed, 10 Oct 2018)

  Changed paths:
    M src/lib-fts/fts-tokenizer-generic-private.h
    M src/lib-fts/fts-tokenizer-generic.c

  Log Message:
  -----------
  lib-fts/fts-tokenizer-generic - rename more state variables - cosmetic

No need for a suffix now that we've renamed the type variables.
Patch best viewed with: git show --color-words='[[:alnum:]_]+'

Unfortunately this is very churny, but there are no functional changes.

Signed-off-by: Phil Carmody <phil at dovecot.fi>


  Commit: a664a335e314f4a6aa9f4e179e862059c1f8a878
      https://github.com/dovecot/core/commit/a664a335e314f4a6aa9f4e179e862059c1f8a878
  Author: Phil Carmody <phil at dovecot.fi>
  Date:   2018-10-10 (Wed, 10 Oct 2018)

  Changed paths:
    M src/lib-fts/fts-tokenizer-generic.c

  Log Message:
  -----------
  lib-fts: tokenizer-generic - move state history setting into helper

We can read the value directly, but for encapsulation it's best to do
the shifting of the token type history in a helper, in the same way
as is done for tr29 tokenising.

Signed-off-by: Phil Carmody <phil at dovecot.fi>
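
A minimal sketch of such a history-shifting helper, assuming a two-deep
history. All type, field and function names here are hypothetical and are
not taken from the patch: callers still read the fields directly, but only
the helper writes them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token types, for illustration only. */
enum sketch_type {
	SKETCH_TYPE_NONE = 0,
	SKETCH_TYPE_LETTER,
	SKETCH_TYPE_SINGLE_QUOTE,
	SKETCH_TYPE_OTHER
};

/* Hypothetical tokenizer state holding the last two types seen. */
struct sketch_state {
	enum sketch_type prev_type;
	enum sketch_type prev_prev_type;
};

/* Shift the history by one step and record the newest type. */
static void sketch_add_type(struct sketch_state *st, enum sketch_type t)
{
	assert(st != NULL);
	st->prev_prev_type = st->prev_type;
	st->prev_type = t;
}
```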


  Commit: eeb61afbfde2dd5eed9c5afd8886e58544b604ff
      https://github.com/dovecot/core/commit/eeb61afbfde2dd5eed9c5afd8886e58544b604ff
  Author: Phil Carmody <phil at dovecot.fi>
  Date:   2018-10-10 (Wed, 10 Oct 2018)

  Changed paths:
    M src/lib-fts/fts-tokenizer-generic.c

  Log Message:
  -----------
  lib-fts: tokenizer-generic - move related helpers together in file

They're logically related, and not specific to either the simple or the
tr29 tokeniser, so keep them together where either can use them.
Cosmetic only, no functional changes.

Signed-off-by: Phil Carmody <phil at dovecot.fi>


  Commit: d2a3c68552206e3c02f13a04f4c41c143565d023
      https://github.com/dovecot/core/commit/d2a3c68552206e3c02f13a04f4c41c143565d023
  Author: Phil Carmody <phil at dovecot.fi>
  Date:   2018-10-10 (Wed, 10 Oct 2018)

  Changed paths:
    M src/lib-fts/fts-tokenizer-generic.c

  Log Message:
  -----------
  lib-fts: generic simple tokeniser - distinguish "letters" from non-"letters"

prev_type is only compared against SINGLE_QUOTE, so there will be no
behavioural differences. However, maintaining the state that we've just
seen something we are prepared to search for (very loosely, a "letter")
rather than something that we threw away (word breaks) will be important
when it comes to explicit prefix query parsing.

Signed-off-by: Phil Carmody <phil at dovecot.fi>


  Commit: 64599a8260313203adf3878ab82f466001dc5ba4
      https://github.com/dovecot/core/commit/64599a8260313203adf3878ab82f466001dc5ba4
  Author: Phil Carmody <phil at dovecot.fi>
  Date:   2018-10-10 (Wed, 10 Oct 2018)

  Changed paths:
    M src/lib-fts/test-fts-tokenizer.c

  Log Message:
  -----------
  lib-fts/test-fts-tokenizer - have different possible test inputs

Signed-off-by: Phil Carmody <phil at dovecot.fi>


  Commit: 336ddc224efa86d9b2f51243a134f25857d53521
      https://github.com/dovecot/core/commit/336ddc224efa86d9b2f51243a134f25857d53521
  Author: Phil Carmody <phil at dovecot.fi>
  Date:   2018-10-10 (Wed, 10 Oct 2018)

  Changed paths:
    M src/lib-fts/fts-tokenizer-generic-private.h
    M src/lib-fts/fts-tokenizer-generic.c

  Log Message:
  -----------
  lib-fts: tokenizer-generic - recognise request for explicit prefix searching

Just store a flag in the tokenizer when the setting is seen, nothing more.

Signed-off-by: Phil Carmody <phil at dovecot.fi>


  Commit: c59424ddcb5447c9b8f1709d10fd0d3419e73aaf
      https://github.com/dovecot/core/commit/c59424ddcb5447c9b8f1709d10fd0d3419e73aaf
  Author: Phil Carmody <phil at dovecot.fi>
  Date:   2018-10-10 (Wed, 10 Oct 2018)

  Changed paths:
    M src/lib-fts/fts-tokenizer-generic.c

  Log Message:
  -----------
  lib-fts: tokenizer-generic - add more history to break detection

For example, going from non-word to non-word is a different type
of break (not really a break) from the transition from a word to
a non-word. Presently, that distinction isn't needed, but it will
be for explicit prefix searches.

Make the tok parameter const too, whilst there.

Signed-off-by: Phil Carmody <phil at dovecot.fi>
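
To illustrate the distinction, here is a rough sketch that classifies each
step by both the previous and the current character. It assumes "word"
simply means an ASCII alphanumeric, which is far cruder than the real
tokenizer, and every name in it is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

enum transition {
	TRANSITION_IN_WORD,      /* word -> word: token continues */
	TRANSITION_WORD_START,   /* non-word -> word: token begins */
	TRANSITION_WORD_END,     /* word -> non-word: a real break */
	TRANSITION_OUTSIDE_WORD  /* non-word -> non-word: nothing to end */
};

/* Classify a step using one item of history plus the current item. */
static enum transition classify_transition(bool prev_is_word, bool cur_is_word)
{
	if (cur_is_word)
		return prev_is_word ? TRANSITION_IN_WORD : TRANSITION_WORD_START;
	return prev_is_word ? TRANSITION_WORD_END : TRANSITION_OUTSIDE_WORD;
}

/* Count only the real breaks, i.e. word -> non-word transitions. */
static unsigned int count_word_ends(const char *text)
{
	unsigned int ends = 0;
	bool prev_is_word = false;

	assert(text != NULL);
	for (const char *p = text; *p != '\0'; p++) {
		bool cur_is_word = isalnum((unsigned char)*p) != 0;

		if (classify_transition(prev_is_word, cur_is_word) ==
		    TRANSITION_WORD_END)
			ends++;
		prev_is_word = cur_is_word;
	}
	return ends;
}
```

In "foo  bar " the first space ends a word, while the second space is a
non-word following a non-word and ends nothing.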


  Commit: e6a17422de488f4b21bad5b16cb50421119378ee
      https://github.com/dovecot/core/commit/e6a17422de488f4b21bad5b16cb50421119378ee
  Author: Phil Carmody <phil at dovecot.fi>
  Date:   2018-10-10 (Wed, 10 Oct 2018)

  Changed paths:
    M src/lib-fts/fts-common.h
    M src/lib-fts/fts-tokenizer-generic-private.h
    M src/lib-fts/fts-tokenizer-generic.c

  Log Message:
  -----------
  lib-fts: tokenizer-generic - simple explicit prefix search logic

Logic is that words followed by a '*' create a prefix search token.
A new token is begun immediately after that. So "foo*bar" is 2 tokens
"foo*" and "bar", when in explicit prefix search tokenisation mode.

Only active in 'simple', not 'tr29'.

Signed-off-by: Phil Carmody <phil at dovecot.fi>
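
The rule can be sketched with a toy tokenizer. This is not the code from
the patch: it assumes ASCII input, treats only alphanumerics as word
characters, silently truncates over-long tokens, and uses hypothetical
names throughout. A '*' directly after word characters is appended to the
token and ends it; a '*' anywhere else is an ordinary break.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 8
#define MAX_TOKEN_LEN 32

struct token_list {
	char tokens[MAX_TOKENS][MAX_TOKEN_LEN];
	size_t count;
};

/* Emit the pending token, if any, and reset its length. */
static void finish_token(struct token_list *out, const char *cur, size_t *len)
{
	if (*len == 0)
		return;
	assert(out->count < MAX_TOKENS);
	memcpy(out->tokens[out->count], cur, *len);
	out->tokens[out->count][*len] = '\0';
	out->count++;
	*len = 0;
}

static void sketch_tokenize(const char *input, bool explicit_prefix,
			    struct token_list *out)
{
	char cur[MAX_TOKEN_LEN];
	size_t len = 0;

	out->count = 0;
	for (const char *p = input; *p != '\0'; p++) {
		if (isalnum((unsigned char)*p)) {
			/* leave room for a trailing '*' and the NUL */
			if (len < MAX_TOKEN_LEN - 2)
				cur[len++] = *p;
			continue;
		}
		/* A '*' right after a word marks it as a prefix token. */
		if (*p == '*' && explicit_prefix && len > 0)
			cur[len++] = '*';
		/* Any non-word character ends the current token. */
		finish_token(out, cur, &len);
	}
	finish_token(out, cur, &len);
}
```

With explicit_prefix set, "foo*bar" yields the two tokens "foo*" and "bar";
with it unset, the same input yields "foo" and "bar".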


  Commit: a55dc1dff1892f88b3c355d0727bbf90cf7f3db1
      https://github.com/dovecot/core/commit/a55dc1dff1892f88b3c355d0727bbf90cf7f3db1
  Author: Phil Carmody <phil at dovecot.fi>
  Date:   2018-10-10 (Wed, 10 Oct 2018)

  Changed paths:
    M src/lib-fts/test-fts-tokenizer.c

  Log Message:
  -----------
  lib-fts: test-fts-tokenizer - explicit-prefix tests

Note that the special handling of '*' only kicks in when in
"search" and "explicitprefix" mode (as passed in through the
settings), and currently, only for the simple mode, not tr29.

Signed-off-by: Phil Carmody <phil at dovecot.fi>


  Commit: 001bcbdabcd9112f913167ba3b900e628cb87247
      https://github.com/dovecot/core/commit/001bcbdabcd9112f913167ba3b900e628cb87247
  Author: Phil Carmody <phil at dovecot.fi>
  Date:   2018-10-10 (Wed, 10 Oct 2018)

  Changed paths:
    M src/lib-fts/fts-tokenizer-generic.c
    M src/lib-fts/test-fts-tokenizer.c

  Log Message:
  -----------
  lib-fts: tokenizer-generic - tr29 explicit-prefix parsing

Similar logic to before - any wordlike sequence that ends with a * is
considered a prefix search, and immediately begins a new token.

Signed-off-by: Phil Carmody <phil at dovecot.fi>


  Commit: 94a30fa2a478367b9be138e3f09148bfa815e371
      https://github.com/dovecot/core/commit/94a30fa2a478367b9be138e3f09148bfa815e371
  Author: Phil Carmody <phil at dovecot.fi>
  Date:   2018-10-10 (Wed, 10 Oct 2018)

  Changed paths:
    M src/lib-fts/test-fts-tokenizer.c

  Log Message:
  -----------
  lib-fts: replace repeated explicit hex utf8 with cleaner macro in tokeniser test

Explicit hex utf8 is too line-noisy; the macro improves readability.

Signed-off-by: Phil Carmody <phil at dovecot.fi>


Compare: https://github.com/dovecot/core/compare/694365517b13...94a30fa2a478