Update elastic analyzer for autocomplete search #3

Open
opened 2019-06-02 12:12:31 +00:00 by KevinMidboe · 0 comments
KevinMidboe commented 2019-06-02 12:12:31 +00:00 (Migrated from github.com)

Our mapping and search using match_phrase_prefix works very well, but there are some improvements that can be made.

Currently we don't support queries that fail to match the beginning of the result exactly.
E.g.

  • "interste" would match "Interstellar", because both start the same
  • "interset" would not match "Interstellar", because the last two characters are swapped
  • "terstellar" would not match "Interstellar", because the start is missing from the query

What looks like a good solution is to analyze the field as ngrams. An ngram analyzer creates many more terms for each input, all pointing to the same document. Taking into account multiple matching terms that point to the same document could improve the hit rate for slightly misspelled queries.
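To illustrate why ngrams could help, here is a minimal Python sketch that compares a plain prefix check with counting shared character trigrams. It is not Elasticsearch code: `char_ngrams`, `shared_ngrams` and `prefix_match` are hypothetical helpers, the gram size of 3 is an assumption, and `prefix_match` only approximates what `match_phrase_prefix` does for a single-word title.

```python
def char_ngrams(text, n=3):
    """Return the set of lowercase character ngrams of length n in text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def shared_ngrams(query, title, n=3):
    """Count the distinct ngrams that query and title have in common."""
    return len(char_ngrams(query, n) & char_ngrams(title, n))


def prefix_match(query, title):
    """Rough stand-in for prefix matching on a single-word title (assumption)."""
    return title.lower().startswith(query.lower())


if __name__ == "__main__":
    title = "Interstellar"
    for query in ("interste", "interset", "terstellar"):
        total = len(char_ngrams(query))
        shared = shared_ngrams(query, title)
        print(f"{query}: prefix={prefix_match(query, title)}, "
              f"shared trigrams={shared}/{total}")
```

With trigrams, "interste" shares 6 of its 6 trigrams with "Interstellar", "interset" shares 4 of 6, and "terstellar" shares 8 of 8, while only "interste" passes the prefix check. A real ngram analyzer would also need sensible minimum and maximum gram sizes, which this sketch does not try to pick.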

Research:

  • https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch