elasticsearch
Xokare228, 2021-03-02 18:39:51

How to add a character to the whitelist of a tokenizer?

There is a text field with a custom analyzer assigned to it, using the standard tokenizer. This tokenizer strips all punctuation marks, which is what I need, but it also splits numbers separated by a slash into two tokens: for example, 23/45 becomes the two tokens "23" and "45". I need it to be kept as a single token, i.e. "23/45"; otherwise the tokenizer's behavior suits me. How can this behavior be changed? I tried replacing "/" with a word using a filter, but then I can't turn it back. Thanks in advance.
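For reference, the splitting described above can be reproduced with the `_analyze` API (a sketch in Kibana console syntax):

```json
POST _analyze
{
  "tokenizer": "standard",
  "text": "23/45"
}
```

The standard tokenizer follows the Unicode word-boundary rules, which treat "/" as a break character, so the response contains the two tokens "23" and "45".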


1 answer(s)
siri0s, 2021-03-19
@siri0s

You replaced "/" with a "word" and think it must be turned back. But why turn the "word" in the token back into "/"? The token is a search-only object, so let the "word" stay there. At search time, the query phrase goes through the same analysis, produces the same "word", and is compared against the value in the token.
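A minimal sketch of this idea, assuming a `mapping` character filter that rewrites "/" to "_" before the standard tokenizer runs (the index name `my-index` and the analyzer/filter names are illustrative). The underscore is a joining character under the Unicode word-boundary rules, so "23/45" is indexed as the single token "23_45", and a query containing "23/45" is rewritten the same way and matches:

```json
PUT my-index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "slash_to_underscore": {
          "type": "mapping",
          "mappings": ["/ => _"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["slash_to_underscore"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
```

You can verify the result with `POST my-index/_analyze` using `"analyzer": "my_analyzer"` and `"text": "23/45"`.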
