N
N
nurzhannogerbek2019-06-27 11:12:24
PostgreSQL
nurzhannogerbek, 2019-06-27 11:12:24

How to find all similar records in a table?

Hello! Please help me figure it out.
The PostgreSQL database has a table with a list of questions and the table looks like this:

| QUESTION_ID | QUESTION_TEXT                                    |
|-------------|--------------------------------------------------|
| 1           | What is your favorite movie, cartoon and series? |
| 2           | What is your favorite movie cartoon and series   |
| 3           | what is your favorite Movie, Cartoon and Series  |
| 4           | Do you like apple?                               |
| 5           | do you like Apple                                |
| 6           | What is your favorite city?                      |

As you can see, there are similar entries in the table. For example, the text in records 1, 2, and 3 are similar to the human eye, but these records are different to the machine.
If you execute this query, it returns only 2 records with an exact match.
select
    *
from
    questions
where
    question_text in (
        'What is your favorite movie, cartoon and series?',
        'Do you like apple?'
    )

I need to find all similar records for these two questions that are mentioned in the above request. Similarity should be 90 percent. It seems that there is a pg_trgm module for such purposes, in which there is a similarity function . It returns a similarity value from 0 to 1.
What I did:
1) I enabled the pg_trgm extension:
CREATE EXTENSION pg_trgm;
2) Created an index on the question_text column:
CREATE INDEX questions_trgm_idx ON questions
  USING gin (question_text gin_trgm_ops);

3) This query returns all records for one sentence. How to make a similar search for several sentences?
select
  question_text,
  similarity(
    question_text,
    'What is your favorite movie, cartoon and series?'
  )
from
  answers
where
  question_text % 'What is your favorite movie, cartoon and series?'
  and similarity(
    question_text,
    'What is your favorite movie, cartoon and series?'
  ) >= 0.9;

Answer the question

In order to leave comments, you need to log in

2 answer(s)
V
Valery, 2019-06-27
@Desay

where ....like '%your_text%'

Z
zhaar, 2019-06-27
@zhaar

And you can also remove all punctuation marks, because they will also create unnecessary problems

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question