M
M
Marat Dolotov2020-01-24 08:26:25
Sphinx
Marat Dolotov, 2020-01-24 08:26:25

How does facet work when grouping data?

There are tables vacancy, prof_area, company and company_address - a company can have many vacancies and addresses i.e. this table is one-to-many related to the vacancy and company_address tables. And thus, a vacancy can have several addresses. company_address has two float coordinate fields latitude and longitude. Also, a vacancy can have several professional areas, respectively, the vacancy table is linked to the prof_area table one-to-many.
The task is to index all vacancies in sphinx along with professional areas, addresses and coordinates. So that by coordinates you can implement a search by radius using the GEODIST method, also display FACET data for all fields on the filter form. Since the vacancy has many professional areas and addresses (MVA), I originally planned to use sql_attr_multi. Everything went fine with the prof_area_ids field BUT there was a problem with the coordinates, sql_attr_multi does not support float data. Therefore, I decided to index all the data simply through JOIN and got the following data in the index:

mysql> select id,vacancy_id,latitude,longitude,prof_area_ids from vacancyIndex;
+------+------------+----------+-----------+---------------+
| id   | vacancy_id | latitude | longitude | prof_area_ids |
+------+------------+----------+-----------+---------------+
|    1 |        917 | 0.973178 |  0.743566 | 11,199,202    |
|    2 |        916 | 0.973178 |  0.743566 | 17,283,288    |
|    3 |        915 | 0.973178 |  0.743566 | 17,288        |
|    4 |        914 | 0.973178 |  0.743566 | 30,482        |
|    5 |        919 | 0.825153 |  0.692837 | 15,243        |
|    6 |        919 | 0.825162 |  0.692828 | 15,243        |
|    7 |        918 | 0.825153 |  0.692837 | 8,154         |
|    8 |        918 | 0.825162 |  0.692828 | 8,154         |
|    9 |        920 | 0.958914 |  1.282161 | 17,283,288    |
|   10 |        920 | 0.958915 |  1.282215 | 17,283,288    |
|   11 |        924 |  0.97333 |  0.658246 | 12,208        |
|   12 |        924 | 0.973336 |  0.658237 | 12,208        |
|   13 |        923 |  0.97333 |  0.658246 | 21,365        |
|   14 |        923 | 0.973336 |  0.658237 | 21,365        |
|   15 |        922 |  0.97333 |  0.658246 | 20,359        |
|   16 |        922 | 0.973336 |  0.658237 | 20,359        |
|   17 |        921 |  0.97333 |  0.658246 | 19,346        |
|   18 |        921 | 0.973336 |  0.658237 | 19,346        |
|   19 |        926 |  0.88396 |  2.389868 | 12,17,208,292 |
|   20 |        925 |  0.88396 |  2.389868 | 12,208        |
+------+------------+----------+-----------+---------------+
20 rows in set (0.00 sec)

As you can see, some vacancies are repeated several times, differing only in coordinates and id (ID actually does not apply to the vacancy as it is generated by "row_number() OVER () AS id" , the vacancy has a separate vacancy_id field).
For data search, everything works fine, if I need to select all vacancies in professional area 199, I simply group the data by vacancy_id:
mysql> select id,vacancy_id,latitude,longitude,prof_area_ids from jobVacancy where prof_area_ids=199 group by vacancy_id;
+------+------------+----------+-----------+-----------------+
| id   | vacancy_id | latitude | longitude | prof_area_ids   |
+------+------------+----------+-----------+-----------------+
|    1 |        917 | 0.973178 |  0.743566 | 11,199,202      |
|  191 |       1004 | 0.925335 |  2.768874 | 11,196,199      |
|  313 |       1072 | 0.963968 |  1.070624 | 1,11,60,197,199 |
|  318 |       1136 |  0.96071 |  1.448998 | 11,196,199      |
|  374 |       1097 | 0.785255 |  0.678504 | 11,199          |
+------+------------+----------+-----------+-----------------+
5 rows in set, 1 warning (0.01 sec)

Everything is great! BUT the problem occurs when I add FACET to the end:
mysql> select id,vacancy_id,latitude,longitude,prof_area_ids from jobVacancy where prof_area_ids=199 group by vacancy_id facet prof_area_ids;
+------+------------+----------+-----------+-----------------+
| id   | vacancy_id | latitude | longitude | prof_area_ids   |
+------+------------+----------+-----------+-----------------+
|    1 |        917 | 0.973178 |  0.743566 | 11,199,202      |
|  191 |       1004 | 0.925335 |  2.768874 | 11,196,199      |
|  313 |       1072 | 0.963968 |  1.070624 | 1,11,60,197,199 |
|  318 |       1136 |  0.96071 |  1.448998 | 11,196,199      |
|  374 |       1097 | 0.785255 |  0.678504 | 11,199          |
+------+------------+----------+-----------+-----------------+
5 rows in set (0.00 sec)

+---------------+----------+
| prof_area_ids | count(*) |
+---------------+----------+
|           202 |        1 |
|           199 |       12 |
|            11 |       12 |
|           196 |        5 |
|           197 |        3 |
|            60 |        3 |
|             1 |        3 |
+---------------+----------+
7 rows in set (0.02 sec)

As you can see in the results of the search for vacancies where prof_area_ids=199 there are only 5 data and in FACET 12 it shows. That is, if I displayed the faceted data in the form of a filter, the client sees that the professional area with the identifier 199 in base 12, he selects and in fact the system shows him only 5.
Can SPHINX be used to ensure that the faceted data is displayed correctly? Maybe I indexed incorrectly? In general, tell me how such issues are actually resolved. I don't think I'm the only one with these questions.

Answer the question

In order to leave comments, you need to log in

1 answer(s)
M
Marat Dolotov, 2020-01-28
@bemulima

It turned out that when grouping data by some field in FACET, there is no way to group, so you need to make an additional request:

select id,vacancy_id,prof_area_ids,latitude,longitude from jobVacancy
    where prof_area_ids=199 group by vacancy_id;

SELECT GROUPBY() AS prof_area_id, COUNT(DISTINCT vacancy_id) FROM jobVacancy
    WHERE prof_area_ids=199 GROUP BY prof_area_id;

Only it is necessary to send these requests at once and not each one separately. multi request.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question