PHP
Barakud, 2015-11-19 15:50:24

How to find a list of exact matches of phrases from a database in a given text using PHP?

There is a database with 100k keywords, such as:

большой дом
кафельная плитка
зеленая машина в саду

We receive a text for processing, for example:
В ноябре в Лондоне большой человек купил большой дом для своей большой семьи. В доме была кафельная панель.

It is necessary to find in this text all full matches from the database, that is, in our case:
большой дом
I tried a FULLTEXT index with a query like:
SELECT * FROM `phrases` WHERE MATCH(`phrase`) AGAINST('В ноябре в Лондоне большой человек купил большой дом для своей большой семьи. В доме была кафельная панель.')

It returns the exact match plus a phrase that matches only partially:
большой дом
кафельная плитка

I tried using IN BOOLEAN MODE and comparing the number of words in the stored phrase with the number of matches found, but then I get no results at all.
Can this be done with an indexed search engine (MySQL or Sphinx), and if so, how?
UPD. Judging by the answers, the question was unclear: I have a database of short phrases (100,000 phrases of 1-2 words each) and I receive a text (about 1000 words) as input. I am searching the text for the phrases, not the other way round. The text is not in the database and is not indexed. I receive it from outside and cannot control it. Having received the text, I need to return the phrases from the database that occur in it.


5 answer(s)
Barakud, 2015-11-19
@Barakud

For now I have settled on a solution of the following form:
Functionally it does exactly what is needed, but the query performs poorly and it looks more like a hack. I will try again with SphinxQL.

Dimonchik, 2015-11-19
@dimonchik2013

Try a Bloom filter: sort the stored phrases into 2-, 3- and 4-word sets and put each set into its own filter, then cut the incoming text into 4-, 3- and 2-word chunks and run them through the filters.
That said, I think it would be easier to quickly push the incoming text into a Sphinx RT index and query it for every phrase with SPH_MATCH_PHRASE:
php.net/manual/en/sphinxclient.setmatchmode.php
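
The "cut the text into word chunks and look each one up" idea can be sketched roughly as follows. This is only an illustration in Python, not code from this thread: a plain in-memory set stands in for the Bloom filters (a Bloom filter would just be a memory-saving pre-check in front of such a set), and tokenize, find_phrases and max_words are hypothetical names. It assumes phrases are compared case-insensitively and that punctuation is ignored.

```python
import re

def tokenize(text):
    # Lowercase and keep runs of letters/digits; \w is Unicode-aware,
    # so Cyrillic words are handled too.
    return re.findall(r"\w+", text.lower())

def find_phrases(text, phrases, max_words=4):
    # Assumption: every stored phrase has at most max_words words.
    # Each phrase is normalised to a tuple of words for O(1) lookup.
    known = {tuple(tokenize(p)) for p in phrases}
    words = tokenize(text)
    found = set()
    # Slide a window of 1..max_words words over the text.
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            if gram in known:
                found.add(" ".join(gram))
    return found
```

Note that in this sketch the windows run across sentence boundaries, since punctuation is dropped during tokenizing. For a 1000-word text it does only a few thousand set lookups, regardless of how many phrases are stored.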

Immortal_pony, 2015-11-19
@Immortal_pony

It is necessary to find in this text all full matches from the database.

Then use a simple search, not a smart one:
SELECT * 
FROM `phrases`
WHERE 
    'В ноябре в Лондоне большой человек купил большой дом для своей большой семьи. В доме была кафельная панель.' 
    LIKE CONCAT('%', phrases.`phrase` , '%')
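
One caveat with plain substring matching: LIKE ignores word boundaries, so a one-word phrase such as дом would also match inside доме, and any % or _ inside a stored phrase acts as a wildcard. If that matters, the candidates returned by the query could be re-checked in application code. A minimal sketch in Python, assuming case-insensitive matching is wanted (contains_phrase is a hypothetical helper, not part of the answer above):

```python
import re

def contains_phrase(text, phrase):
    # \b on both sides rejects matches that start or end inside a longer word.
    # re.escape keeps any special characters in the phrase literal.
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```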

xmoonlight, 2015-11-19
@xmoonlight

dev.mysql.com/doc/refman/5.7/en/fulltext-query-exp...

Adamos, 2015-11-19
@Adamos

Tools for this kind of task, as far as I know, work roughly like this:
- all phrases are split into words, and each word is reduced to its morphological stem
- every word is stored in the database together with the phrases it occurs in (more precisely, the id of the word in the word table and the id of the phrase in the phrase store; the phrase itself does not have to be stored there, but it can be restored from there)
- you do the same for the incoming text: split it into words and find the phrases in the database that contain those words.
You can then apply an arbitrarily sophisticated search to this fairly limited sample.
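
The steps above can be sketched as a small in-memory inverted index. This is a rough Python illustration only: the morphological stemming step is skipped (words are merely lowercased), a dict replaces the database tables, and all function names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split into words; no stemming in this sketch.
    return re.findall(r"\w+", text.lower())

def build_index(phrases):
    # word -> set of ids of the phrases that contain it
    index = defaultdict(set)
    for pid, phrase in enumerate(phrases):
        for word in tokenize(phrase):
            index[word].add(pid)
    return index

def candidates(text, phrases, index):
    # Count, per phrase, how many of its distinct words occur in the text.
    hits = defaultdict(int)
    for word in set(tokenize(text)):
        for pid in index.get(word, ()):
            hits[pid] += 1
    # Keep only phrases whose every distinct word occurs somewhere in the text.
    return [phrases[pid] for pid, n in sorted(hits.items())
            if n == len(set(tokenize(phrases[pid])))]

def exact_matches(text, phrases, index):
    # Second pass on the small candidate list: require the words to be adjacent
    # and in order, compared on space-padded normalised strings.
    normalized = " " + " ".join(tokenize(text)) + " "
    return [p for p in candidates(text, phrases, index)
            if " " + " ".join(tokenize(p)) + " " in normalized]
```

The index is built once for all phrases. Each incoming text then touches only the phrases that share at least one word with it, and the stricter adjacency check runs on that short list.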
