Oracle
Artem, 2016-01-14 10:20:59

How do I use Russian morphology in Oracle Text (Oracle Database 11g R2)?

Hello, colleagues.
The official Oracle Text 11g Release 2 (11.2) documentation states that Russian morphology is supported at the word-stem level (the stemmer). I tried this in practice:
Table:

select * from docs;
 
        ID TEXT
---------- --------------------
         1 читать
         2 читаю
         3 читал
         4 чтение
         5 sing
         6 sang
         7 singing
         8 sung

Create a lexer:
exec ctx_ddl.create_preference('MYLEXER', 'world_lexer');

Create an index:
create index i_docs on docs (text) indextype is ctxsys.context
   parameters ('LEXER MYLEXER stoplist CTXSYS.EMPTY_STOPLIST');
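To see what the index actually stored for the Russian rows, one can look at its token table. This is a diagnostic sketch assuming the default Oracle Text naming, where a CONTEXT index named I_DOCS keeps its tokens in the internal table DR$I_DOCS$I; that table is an implementation detail, not a public API.

```sql
-- Diagnostic sketch: list the tokens Oracle Text extracted for index I_DOCS.
-- DR$<index>$I is an internal table; its layout may change between versions.
SELECT token_text, token_type, token_count
  FROM dr$i_docs$i
 ORDER BY token_text;
```

If all four Russian words show up as separate tokens, the text itself was indexed and the problem is more likely in how the `$` operator expands the query term.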

This query on the English words returns the correct result (all 4 rows):
SELECT SCORE ( 1 ), text
  FROM docs
  WHERE CONTAINS (text, '$sing', 1 ) > 0
  ORDER BY SCORE ( 1 ) DESC;

But the same stem query on the Russian words returns only one row:
SELECT SCORE ( 1 ), text
  FROM docs
  WHERE CONTAINS (text, '$читать', 1 ) > 0
  ORDER BY SCORE ( 1 ) DESC;
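As a baseline for comparison, the Russian forms can be spelled out explicitly with OR. This is only a check, not a fix: if it returns all four Russian rows, the words are indexed and searchable, and it is specifically the `$` stem expansion that fails to link them.

```sql
-- Baseline: match the four Russian forms explicitly instead of via $ stemming.
SELECT SCORE(1), text
  FROM docs
 WHERE CONTAINS(text, 'читать OR читаю OR читал OR чтение', 1) > 0
 ORDER BY SCORE(1) DESC;
```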

On the sql.ru forum I read that the database character set itself is probably to blame. My settings:
NLS_CHARACTERSET     CL8ISO8859P5
NLS_NCHAR_CHARACTERSET    AL16UTF16
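For reference, these values can be read on any instance from the standard NLS_DATABASE_PARAMETERS view (CL8ISO8859P5 is Oracle's name for ISO 8859-5, a single-byte Cyrillic character set):

```sql
-- Show the database and national character sets of the current instance.
SELECT parameter, value
  FROM nls_database_parameters
 WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
```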

I also tried Oracle Database 12c (12.1.0.2, with a UTF8 character set), and the result was the same.
Has anyone run into this? Did you manage to get Russian morphology working?
