A
A
Anton Medvedev2013-07-04 18:01:19
SQL
Anton Medvedev, 2013-07-04 18:01:19

Organization of a database of sentences and keywords

It is necessary to organize a relational database for storage and search in it of the following.
There are applications with keywords:
Sentence1(keyword1, ks2, ks3)
Sentence2(ks2, ks3, ks4)
...
Next, you need to select those sentences by the set of keywords where the most matches were found.
For example, on request: ks2, ks4:
Sentence2, matches: 2
Sentence1, matches 1

How do you think it is better to organize such a structure? What is the best and fastest way to search?

Answer the question

In order to leave comments, you need to log in

2 answer(s)
U
Urvin, 2013-07-04
@Elfet

offers
offer_id
offer_name
words
word_id
word_name
matching
matching_id
offer_id
word_id

SELECT
  offer_name,
  matching_count
FROM
  (
    SELECT
      offer_id,
      offer_name,
      COUNT(*) AS matching_count
    FROM
      words
      JOIN matching USING(word_id)
      JOIN offers USING(offer_id)
    WHERE
      word_name = 'KC1'
      OR word_name = 'KC2'
    GROUP BY
      offer_id,offer_name
  ) AS mt
ORDER BY
  matching_count DESC

If you want very quickly - you can try the sphinx

A
AgentSIB, 2013-07-04
@AgentSIB

I'll try to chew.
You have an offer, each offer is one record in the offers table
set of minimum fields
offer_id - identifier
offer_name - name
Next, you can add one word field and list from with a separator. But in this case, you will have to look for sentences, taking into account the separators in the words text field. This is clearly not correct. Therefore, it is best to make another table in which the words are associated with the sentence words
word_id
offer_id
word_name
It’s already better and it’s quite simple to count the number, the sample will be faster. But the word_name field is redundant. That is, we will store extra information (the same words). Therefore, it is best to create two tables - a list of words (tags) and the correspondence of sentences and tags.
words
word_id
word_name
matching
matching_id
offer_id
word_id
Thus, we have come to the very first option. So it's clearer why it was advised that way?)

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question