L
L
Luhashan2022-02-04 19:16:38
PostgreSQL
Luhashan, 2022-02-04 19:16:38

PostgreSQL full text search inside JSON, index inside JSON?

There is a table in postgres - table.
The table has a data (jsonb) field.
Data contains json objects of the type:
{
"obj": [
{
"TEXT": "any text here",
"key1": any value,
"key2": any value
},
{
" TEXT ": " any text here " ,
"key1": any value,
"key2": any value
},
...
]
}
Trying to do a full text search on " TEXT ".
Question:
1. Is it possible to create a gin-index (tsvector) for " TEXT " without unpacking json into separate tables/fields and how?
2. Is it possible in principle to do a full-text search on " TEXT ", which will not slow down too much (the base is about 5 gigs) without unpacking json into separate tables/fields?
Many thanks in advance to everyone!

Answer the question

In order to leave comments, you need to log in

2 answer(s)
G
galaxy, 2022-02-05
@luhashan

It is possible to do something. But it's better to have at least a separate field for tsvector.

create table t (col jsonb);
create index ix_t_col using gin(to_tsvector('pg_catalog.english', jsonb_path_query_array(col, '$.obj.TEXT')));

insert into t values('{
"obj": [
{
"TEXT": "These functions act like their counterparts described above without the _tz suffix, except that these functions support comparisons of date/time values that require timezone-aware conversions.",
"key1": 77,
"key2": "a"
},
{
"TEXT": "Returns target with new_value inserted. If the item designated by the path is an array element, new_value will be inserted before that item if insert_after is false (which is the default), or after it if insert_after is true. If the item designated by the path is an object field, new_value will be inserted only if the object does not already contain that key",
"key1": 99,
"key2": false
}
]
}');


select ts_headline(col, 'function'::tsquery),
       ts_headline(arr, 'function'::tsquery)
  from (
    select col, jsonb_path_query_array(col, '$.obj.TEXT') arr,
           to_tsvector('pg_catalog.english', jsonb_path_query_array(col, '$.obj.TEXT')) tsv 
      from t
  ) q
 where tsv @@ 'function'::tsquery;

jsonb_path_query_array(col, '$.obj.TEXT')- selects everything that you have by the TEXT key into an array. It is then indexed

L
Luhashan, 2022-02-05
@Luhashan

Well, who cares, I found this https://www.depesz.com/2017/04/04/waiting-for-post...
As far as I understand, you can only index the entire json as text, it seems that you can’t do an index by keys will be:
"Nice. It found all words in all values ​​in the json (it didn't index keys of objects)".
Although it could be done, because postgres makes it possible to get to the keys in the request, which means that some kind of json tree is being built, especially considering that a very high percentage of developers often do not take a steam bath and hack semi-crazy json objects into one field, even though relational the base is used :( :)

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question