JavaScript
Kizzeon, 2020-05-08 10:28:27

What is the fastest way to search JSON objects as of 2020?

I have a fairly large file of JSON data (150 thousand JSON objects, about 50 MB in total) and I need to search it quickly (in under 150 ms), ranking the results by the number of matching characters in the text. Only the top 10 matching objects should be displayed.

The JSON also contains custom characters (Chinese characters and other symbols).

The JSON structure looks like this:

{"item": 643, "name": "Уникальное название товара `кастомный символ здесь`", "type": "Кастомный символ"}


Every object has the same properties (item, name, type...); only the values differ.

Which high-performance search options do you recommend for JSON that contains both regular (English/Russian) and custom (for example, Chinese) characters?


6 answers
ThunderCat, 2020-05-08
@ThunderCat

Sphinx is the classic choice for searching large files. However, what you have is structured data rather than free text, so I think it would make more sense to define a data model and move everything into a database, even just SQLite (your data model looks fairly simple). There are nuances there too, and it may simply not suit you for a number of reasons. In that case Sphinx is your best option. A 50 MB file is on the small side, since Sphinx is usually set up for gigabyte-sized files, but that leaves room for growth.

Dimonchik, 2020-05-08
@dimonchik2013

There are no miracles.
Every fast search relies on pre-built indexes,
explicit or implicit.
In your case, if we are talking about the front end, everything comes down to the JSON parser.
In Python the fastest is ujson, a binding over C; in Go there are also a couple of options.
In JS there is unlikely to be anything faster than the native parser,
but with a local database, even just LevelDB, things already get better.
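As a rough illustration of a pre-built index in plain JavaScript, here is a minimal sketch of an inverted index over character bigrams. Bigrams do not depend on word boundaries, so the same index handles Chinese text as well as English or Russian. The function names (`buildBigramIndex`, `searchBigrams`), the choice of the `name` field, and the sample records are assumptions made for this example; the score is simply the number of distinct query bigrams a record contains.

```javascript
// Split a string into lower-cased two-character substrings (bigrams).
// Array.from iterates by code point, so astral characters are not cut in half.
function bigrams(text) {
  const chars = Array.from(text.toLowerCase());
  const out = [];
  for (let i = 0; i + 1 < chars.length; i++) out.push(chars[i] + chars[i + 1]);
  return out;
}

// Build the index once: bigram -> Set of positions in `items`.
function buildBigramIndex(items, field) {
  const index = new Map();
  items.forEach((item, pos) => {
    for (const gram of new Set(bigrams(String(item[field])))) {
      if (!index.has(gram)) index.set(gram, new Set());
      index.get(gram).add(pos);
    }
  });
  return index;
}

// Score each candidate by how many distinct query bigrams it contains,
// then return the `limit` best records (ties keep the original order).
function searchBigrams(index, items, query, limit = 10) {
  const scores = new Map();
  for (const gram of new Set(bigrams(query))) {
    for (const pos of index.get(gram) || []) {
      scores.set(pos, (scores.get(pos) || 0) + 1);
    }
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1] || a[0] - b[0])
    .slice(0, limit)
    .map(([pos]) => items[pos]);
}

// Hypothetical sample data in the shape shown in the question.
const sampleItems = [
  { item: 1, name: "Красный чайник", type: "Посуда" },
  { item: 2, name: "Red kettle 红色水壶", type: "Kitchen" },
  { item: 3, name: "Blue cup", type: "Kitchen" },
];
const bigramIndex = buildBigramIndex(sampleItems, "name");
console.log(searchBigrams(bigramIndex, sampleItems, "水壶").map((o) => o.item));
```

Queries shorter than two characters produce no bigrams and therefore no results in this sketch. Whether this fits the 150 ms budget on 150 thousand records would have to be measured.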

xmoonlight, 2020-05-08
@xmoonlight

You need an index (like the one at the back of a book).
Build a "tree" from it: A, B, C, ..., then AA, AB, AC, ..., and under each of those ABA, ABB, ..., and so on, over every sequence of letters found in the indexed words.
To search, descend the "tree" for each word of the query.
The more query words land on the same record, the more relevant that record is to the query.
This will definitely take less than 150 ms.
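The tree described above is essentially a prefix tree (trie). A minimal sketch follows, assuming words are split on anything that is not a Unicode letter or digit. The names `tokenize`, `buildTrie`, and `searchTrie` and the sample records are hypothetical. Chinese text written without spaces becomes one long "word" under this tokenizer, so it can only be found by its beginning.

```javascript
// Lower-case the text and split it on anything that is not a letter or digit.
function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Build the trie once. Each node stores the positions of all records
// that contain a word starting with the prefix leading to that node.
function buildTrie(items, field) {
  const root = { children: new Map(), ids: new Set() };
  items.forEach((item, pos) => {
    for (const word of tokenize(String(item[field]))) {
      let node = root;
      for (const ch of word) {
        if (!node.children.has(ch)) {
          node.children.set(ch, { children: new Map(), ids: new Set() });
        }
        node = node.children.get(ch);
        node.ids.add(pos);
      }
    }
  });
  return root;
}

// Descend the trie for each query word; a record scores one point
// per query word that is a prefix of one of its words.
function searchTrie(root, items, query, limit = 10) {
  const scores = new Map();
  for (const word of new Set(tokenize(query))) {
    let node = root;
    for (const ch of word) {
      node = node.children.get(ch);
      if (!node) break; // no indexed word starts with this prefix
    }
    if (!node) continue;
    for (const pos of node.ids) scores.set(pos, (scores.get(pos) || 0) + 1);
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1] || a[0] - b[0])
    .slice(0, limit)
    .map(([pos]) => items[pos]);
}

// Hypothetical sample data.
const trieItems = [
  { item: 1, name: "Красный чайник", type: "Посуда" },
  { item: 2, name: "Красный стул", type: "Мебель" },
  { item: 3, name: "Синий чайник", type: "Посуда" },
];
const trie = buildTrie(trieItems, "name");
console.log(searchTrie(trie, trieItems, "крас чай").map((o) => o.item));
```

Storing record positions at every node keeps lookups simple but costs memory; for 150 thousand records it may be worth keeping positions only at word ends and collecting them from the subtree instead.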

profesor08, 2020-05-08
@profesor08

You could skip parsing the JSON, creating objects, and so on altogether and treat the data as a plain string. Write a regular expression that returns the relevant pieces of the string, then convert those pieces into objects. You can sort them if needed. I can't say anything about performance; you would have to measure it.
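One way this could look, as a sketch under strong assumptions: the objects are flat (no nested objects), no string value contains `{` or `}`, and non-ASCII text is stored literally rather than as `\uXXXX` escapes. The names `escapeRegExp` and `searchRaw` and the sample data are hypothetical. Note that the pattern matches the query anywhere inside an object, including property names.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Find flat {...} fragments of the raw text that contain the query,
// and parse only those fragments into objects.
function searchRaw(rawJson, query, limit = 10) {
  const pattern = new RegExp(
    "\\{[^{}]*" + escapeRegExp(query) + "[^{}]*\\}",
    "giu"
  );
  const found = [];
  for (const match of rawJson.matchAll(pattern)) {
    found.push(JSON.parse(match[0]));
    if (found.length >= limit) break;
  }
  return found;
}

// Hypothetical sample: the raw text as it would be read from the file.
const rawText = JSON.stringify([
  { item: 1, name: "Красный чайник", type: "Посуда" },
  { item: 2, name: "Red kettle", type: "Kitchen" },
  { item: 3, name: "Синий чайник", type: "Посуда" },
]);
console.log(searchRaw(rawText, "чайник").map((o) => o.item));
```

This returns matches in file order without any relevance ranking, so sorting would be a separate step over the parsed results.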

hzzzzl, 2020-05-09
@hzzzzl

"I have a fairly large JSON file": is there only one, or are there many of them, with more to be added?
Maybe it would be easier to put everything into a database, especially since the objects have a fixed structure; searching would then be instant and flexible.

Alex, 2020-05-09
@mr_ko

I recently solved a similar problem as a test assignment. As the others above say, it won't be fast without additional indexes. In my case the search covered only two fields, so I rebuilt the array with just those keys and searched that, which was enough. Along the way I came across this package: https://www.npmjs.com/package/json-index (there are similar ones). It may help, but I haven't tried it myself.
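A minimal sketch of that idea, assuming the two searched fields are `name` and `type` (the answer does not say which fields were used) and that a plain case-insensitive substring match is enough. The names `buildSearchKeys` and `searchKeys` and the sample data are hypothetical.

```javascript
// Precompute one lower-cased search string per record from the chosen fields.
// Fields are joined with a newline so a query cannot match across two fields.
function buildSearchKeys(items, fields) {
  return items.map((item) =>
    fields.map((field) => String(item[field]).toLowerCase()).join("\n")
  );
}

// Linear scan over the compact key array; stops once `limit` records are found.
function searchKeys(keys, items, query, limit = 10) {
  const needle = query.toLowerCase();
  const found = [];
  for (let i = 0; i < keys.length && found.length < limit; i++) {
    if (keys[i].includes(needle)) found.push(items[i]);
  }
  return found;
}

// Hypothetical sample data.
const keyItems = [
  { item: 1, name: "Красный чайник", type: "Посуда" },
  { item: 2, name: "Red kettle", type: "Kitchen" },
  { item: 3, name: "Синий чайник", type: "Посуда" },
];
const searchKeyArray = buildSearchKeys(keyItems, ["name", "type"]);
console.log(searchKeys(searchKeyArray, keyItems, "ЧАЙНИК").map((o) => o.item));
```

The keys are built once at load time, so each query only pays for the substring scan and not for lower-casing or property access on every record.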
