Database
Mikhail Yurievich, 2017-04-13 15:50:53

Ideal database for storing large numbers of unique rows?

Continuing to develop our project https://spyserp.com/ru/, we ran into an interesting task: storing a large number of unique rows (links), with expected growth in the range of 5-10 billion.
- currently stored in PostgreSQL as id (primary key) and link bytea (btree index); the data occupies 22 GB and the index 32 GB
Task:
- choose better-suited storage, reduce the physical size of the index, and improve performance
Requirements for the new database:
- ideally, storage that is as specialized and optimized as possible for this type of data (a unique string + its id)
- fast lookup by key (link) as well as retrieval of a link by its id
- possibility of horizontal scaling
- on-disk storage (clearly everything would perform best held in memory, but that option is not being considered at the moment)
What has been tried:
- all the key/value stores (LevelDB, RocksDB, etc.) - there is no lookup by value (in this case the key is the link and the value is the id)
- an extended round of googling, which unfortunately turned up no suitable solution.
Discuss? I'd love to hear from those who have faced a similar problem and how they solved it.


8 answers
lega, 2017-04-13
@lega

1) Use a hash of the link as the id; then the index on the link is not needed
2) Instead of btree, use a hash index - you don't need sorting there
3) Variable-length data is stored inefficiently in tables: it gets split up and partially over-allocated, so in total it takes more space and works more slowly.
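
A minimal sketch of points 1 and 2 (my illustration, not from the answer): derive a 64-bit id from the link itself, so that a lookup by link becomes a lookup by id. One caveat: at 5-10 billion rows a 64-bit hash starts to collide, so in practice a collision check or a wider hash would be needed.

import hashlib

def link_id(link: str) -> int:
    # Stable 64-bit id derived from the link itself, so a lookup
    # by link is just a hash computation plus a lookup by id.
    digest = hashlib.sha1(link.encode("utf-8")).digest()
    # First 8 bytes as a signed 64-bit integer (fits PostgreSQL BIGINT).
    return int.from_bytes(digest[:8], "big", signed=True)

# Matching schema sketch (point 2: a hash index instead of btree,
# since no sorted order over links is ever needed):
#   CREATE TABLE links (id BIGINT PRIMARY KEY, link BYTEA NOT NULL);
#   CREATE INDEX links_link_hash ON links USING hash (link);  -- optional

print(link_id("https://spyserp.com/ru/"))  # same link always gives the same id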

LORiO, 2017-04-20
@LORIO

ClickHouse from Yandex. The database handles URLs well, since it was originally developed for Yandex.Metrica.

Eugene Khrustalev, 2017-04-13
@eugenehr

CouchDB

xfg, 2017-04-14
@xfg

You can also see https://ru.wikipedia.org/wiki/HBase

#, 2017-04-20
@mindtester

see this https://habrahabr.ru/company/yandex/blog/303282/

Philipp, 2017-04-13
@zoonman

Store in MongoDB as a document:
{_id: 'http://your/url'}
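
A short usage sketch of this approach (my illustration; the connection string and the spyserp/links names are assumptions): the URL itself is the primary key, so the mandatory _id index covers lookups by link for free. To also get a link back by a numeric id, you would store that id as an extra field with a secondary index.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed deployment
links = client["spyserp"]["links"]                 # hypothetical names

# The URL is the _id: the built-in _id index covers lookups by link,
# with no extra index to maintain. The upsert keeps rows unique.
links.replace_one({"_id": "http://your/url"}, {"num_id": 1}, upsert=True)
links.create_index("num_id")  # only needed for link-by-id lookups

print(links.find_one({"_id": "http://your/url"}))  # by link
print(links.find_one({"num_id": 1}))               # by id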

Sergey, 2017-04-14
@begemot_sun

If you have links, you can compress them very well using prefix matching: find the longest string that can be referenced and replace that entire string (the prefix) with an id. That way you can save significant resources.
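
A minimal sketch of the idea (my illustration; the prefix table is a made-up example): each link is stored as a (prefix id, suffix) pair, and the prefix dictionary is checked longest-first.

# Hypothetical prefix dictionary, checked in order (longest first).
PREFIXES = ["https://spyserp.com/", "https://", "http://"]

def compress(link):
    # Replace the longest known prefix with its id.
    for pid, prefix in enumerate(PREFIXES):
        if link.startswith(prefix):
            return pid, link[len(prefix):]
    return -1, link  # no known prefix; store the link as-is

def decompress(pid, suffix):
    return suffix if pid == -1 else PREFIXES[pid] + suffix

pid, rest = compress("https://spyserp.com/ru/")
assert decompress(pid, rest) == "https://spyserp.com/ru/"
print(pid, rest)  # 0 'ru/' - only the suffix needs full storage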

dummyman, 2017-04-14
@dummyman

I used to use the Pastukhov keyword database a lot.
These days I have enough keywords collected through my own work.
But the storage principle has not changed - a plain text file rules!
Later I added indexes on the first 6 bytes of each line and on the first 2 bytes of each word (cp1251 encoding).
In short, it is convenient to store, copy, and use on different computers, running directly from a USB flash drive - maximum speed!
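
A loose sketch of that kind of scheme (my reconstruction; the answer gives no code, so every detail here is an assumption): keep the lines sorted and binary-search them, which is roughly what a fixed-width prefix index over a sorted file provides.

import bisect

# Assumed: links.txt exists, one link per line, already sorted
# (the answer mentions cp1251 encoding).
with open("links.txt", encoding="cp1251") as f:
    lines = f.read().splitlines()

def contains(link):
    # Binary search over the sorted lines; a fixed-prefix index
    # would narrow the search range the same way before scanning.
    i = bisect.bisect_left(lines, link)
    return i < len(lines) and lines[i] == link

print(contains("https://spyserp.com/ru/"))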
