BonBon Slick, 2021-07-06 22:55:59
Database

Why does the Trigram (or Trigraph) concept use exactly 3 and not 2 or 4+?

Example
https://www.postgresql.org/docs/current/pgtrgm.html
The text could just as well be split into pieces of 4 or more characters, or of 2.
Intuition suggests the reason is accuracy: with pieces that are too long or too short, accuracy drops. But is that really so?
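
To make the question concrete, here is a minimal Python sketch of splitting one word into pieces of 2, 3 and 4 characters. For n = 3 the padding follows the rule described in the pg_trgm documentation (two spaces before the word, one after). Extending that rule to other values of n, and the helper name `ngrams`, are assumptions made only for this illustration; pg_trgm itself only does trigrams.

```python
def ngrams(word: str, n: int) -> set:
    """Return the set of character n-grams of a single word.

    Assumption: the padding generalizes the pg_trgm rule for n = 3
    (two leading spaces, one trailing space) to n - 1 leading spaces
    and one trailing space.
    """
    padded = " " * (n - 1) + word.lower() + " "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


if __name__ == "__main__":
    # Print the pieces of the same word for each piece length.
    for n in (2, 3, 4):
        print(n, sorted(ngrams("cat", n)))
```

For n = 3 this yields the same four trigrams that the pg_trgm documentation lists for 'cat'.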

1 answer(s)
hint000, 2021-07-07
@BonBonSlick

The value 3 is an empirically found "golden mean". Natural languages differ: for English, with its typically short words, 2 might be fine, while for German 4 or more would probably work better. It was found experimentally that, averaged over different languages, 3 gives good results.
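
The trade-off can be tried out with a small Python sketch. It is not how pg_trgm is implemented: the helpers `ngrams` and `similarity` are hypothetical, the padding rule for n other than 3 is an assumption, and the score is a plain Jaccard ratio (shared pieces divided by all distinct pieces), which is only similar in spirit to the `similarity()` function of pg_trgm.

```python
def ngrams(word: str, n: int) -> set:
    # Assumption: n - 1 leading spaces and one trailing space,
    # which matches the pg_trgm padding rule when n = 3.
    padded = " " * (n - 1) + word.lower() + " "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a: str, b: str, n: int) -> float:
    """Jaccard ratio of the two n-gram sets: shared / all distinct."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return len(ga & gb) / len(ga | gb)


if __name__ == "__main__":
    # One-letter typo in the middle of a word, scored for each n.
    for n in (2, 3, 4):
        print(n, round(similarity("database", "datebase", n), 3))
```

With this pair the score for the one-letter typo drops as n grows, because a single wrong character spoils up to n pieces. Shorter pieces tolerate typos better but are shared by more unrelated words, which is the balance the answer describes. One word pair is of course only an illustration, not evidence for any particular n.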
