Database design
transcend, 2020-09-11 10:59:01

How to quickly search for text in a large database?

Good afternoon!

There is a database of domain names; the master records are the domain names themselves. The database is supplied as a CSV file and updated daily: deleted domains are removed and newly registered ones are added.

Volume ~160 million records.

The task is to organize fast and stable search over this database. For example, find all domains containing "google" (the word can appear anywhere in the domain name). To keep the search tractable, the minimum search string length can be set to 4 characters.

Questions:
1) What technologies should be used to organize the search? Specifically, which database or other tools?
2) What server configuration is required?


2 answers
Dr. Bacon, 2020-09-11
@bacon

1. Install PostgreSQL on an available machine and load the domains into it.
2. Index the domain name column. Because "the word can occur anywhere in the domain name", regular B-tree indexes will not work, but pg_trgm (trigram indexing) will.
3. Use EXPLAIN ANALYZE to see how the search performs; tweak the query and database settings as needed.
4. Carry out load testing and, based on the results, decide on the server configuration.
5. If performance is still poor, test how it works with the index moved to dedicated search software such as Sphinx or Elasticsearch.
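Steps 1–3 above can be sketched roughly as follows. The table and column names here are illustrative assumptions; only the pg_trgm extension, the `gin_trgm_ops` operator class, and `EXPLAIN ANALYZE` are actual PostgreSQL features:

```sql
-- Enable trigram support (ships with PostgreSQL as a contrib extension).
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical table for the imported CSV.
CREATE TABLE domains (
    name text PRIMARY KEY
);

-- A GIN trigram index lets LIKE '%...%' use an index instead of a
-- sequential scan over all ~160M rows.
CREATE INDEX domains_name_trgm_idx
    ON domains USING gin (name gin_trgm_ops);

-- Substring search anywhere in the domain name; EXPLAIN ANALYZE shows
-- whether the planner actually uses the trigram index.
EXPLAIN ANALYZE
SELECT name FROM domains WHERE name LIKE '%google%';
```

Note that trigram indexes work on 3-character fragments, so the 4-character minimum from the question fits naturally: any pattern of 4+ characters contains at least two trigrams the index can filter on.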

Dimonchik, 2020-09-11
@dimonchik2013

As a friendly suggestion: try ClickHouse. Yes, loading the data into it is awkward, but for row-by-row substring search it holds its own against the well-known Sphinx/Manticore, and even more so against Elasticsearch, and its memory usage is reasonable.
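A minimal sketch of the ClickHouse approach, assuming the same illustrative table layout as above. `ngrambf_v1` is ClickHouse's n-gram Bloom-filter skipping index; the n-gram size of 4 matches the question's 4-character search minimum, while the Bloom-filter size, hash count, and granularity values here are starting-point guesses to be tuned:

```sql
CREATE TABLE domains
(
    name String,
    -- ngrambf_v1(n, bloom_filter_bytes, hash_functions, seed):
    -- lets ClickHouse skip granules that cannot contain the pattern.
    INDEX name_ngram name TYPE ngrambf_v1(4, 1024, 3, 0) GRANULARITY 4
)
ENGINE = MergeTree
ORDER BY name;

-- Same substring query shape as in PostgreSQL.
SELECT name FROM domains WHERE name LIKE '%google%';
```

Unlike the pg_trgm GIN index, a skipping index only prunes blocks of rows rather than locating matches directly, so its benefit depends on how the matching rows are distributed.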
