elasticsearch
Xokare228, 2021-03-02 18:39:51

How to add a character to the whitelist of a tokenizer?

There is a text field with a custom analyzer assigned to it, using the standard tokenizer. This tokenizer strips all punctuation marks, which is what I need, but it also splits numbers separated by a slash into two tokens: for example, 23/45 becomes the two tokens "23" and "45". I need it to be kept as a single token, i.e. "23/45"; otherwise the tokenizer's behavior suits me. How can this behavior be changed? I tried replacing "/" with a word using a filter, but then I can't turn it back. Thanks in advance.
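For reference, the splitting described above can be reproduced with the `_analyze` API (a sketch in Kibana console syntax):

```json
POST _analyze
{
  "tokenizer": "standard",
  "text": "23/45"
}
```

The standard tokenizer follows the Unicode word-boundary rules, which treat "/" as a break character, so the response contains the two tokens "23" and "45".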


1 answer(s)
siri0s, 2021-03-19
@siri0s

You replaced "/" with a "word" and think it must be turned back. But why turn the "word" in the token back into "/"? The token is a search-only object, so let the "word" stay there. At search time, the query phrase goes through the same analysis, produces the same "word", and is compared against the value in the token.
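A minimal sketch of this idea, assuming a `mapping` character filter that rewrites "/" to "_" before the standard tokenizer runs (the index name `my-index` and the analyzer/filter names are illustrative). The underscore is a joining character under the Unicode word-boundary rules, so "23/45" is indexed as the single token "23_45", and a query containing "23/45" is rewritten the same way and matches:

```json
PUT my-index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "slash_to_underscore": {
          "type": "mapping",
          "mappings": ["/ => _"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["slash_to_underscore"],
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
```

You can verify the result with `POST my-index/_analyze` using `"analyzer": "my_analyzer"` and `"text": "23/45"`.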
