MySQL

konchober, 2016-03-10 16:06:47

In which database is it better to store relationship data of this kind entity_id + [another_entity_id, ..., another_entity_id]?

Hello! I'm currently storing entity relations in a MySQL table with the MyISAM engine.
The structure is as simple as it gets: entity_id, another_entity_id,
plus 2 indexes: (entity_id, another_entity_id) (unique) and (another_entity_id, entity_id).
The following queries run against the table:
1) Select all another_entity_id where entity_id = X
2) Select all entity_id where another_entity_id = Y
3) COUNT over the above selections
4) Possibly some analytical queries on the intersection of ID sets
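To make the table and the four query types concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the MySQL/MyISAM table. SQLite is an assumption chosen only because it needs no server; the sample rows and the helper names (others_of, entities_of, count_others, shared_others) are hypothetical, and "intersection" is read here as "another_entity_id values linked to both of two entities", which is only one possible interpretation of query 4.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL/MyISAM table
# (assumption: only the query shapes matter here, not engine internals).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pairs ("
    " entity_id INTEGER NOT NULL,"
    " another_entity_id INTEGER NOT NULL)"
)
# The two composite indexes from the question, one per lookup direction.
conn.execute("CREATE UNIQUE INDEX ux_entity_other ON pairs (entity_id, another_entity_id)")
conn.execute("CREATE INDEX ix_other_entity ON pairs (another_entity_id, entity_id)")

# Hypothetical sample data: entity 1 -> {10, 11, 12}, entity 2 -> {11, 12, 13}.
conn.executemany(
    "INSERT INTO pairs (entity_id, another_entity_id) VALUES (?, ?)",
    [(1, 10), (1, 11), (1, 12), (2, 11), (2, 12), (2, 13)],
)


def others_of(entity_id):
    """Query 1: every another_entity_id linked to entity_id."""
    rows = conn.execute(
        "SELECT another_entity_id FROM pairs WHERE entity_id = ? ORDER BY 1",
        (entity_id,),
    )
    return [r[0] for r in rows]


def entities_of(another_entity_id):
    """Query 2: every entity_id linked to another_entity_id."""
    rows = conn.execute(
        "SELECT entity_id FROM pairs WHERE another_entity_id = ? ORDER BY 1",
        (another_entity_id,),
    )
    return [r[0] for r in rows]


def count_others(entity_id):
    """Query 3: COUNT over query 1."""
    return conn.execute(
        "SELECT COUNT(*) FROM pairs WHERE entity_id = ?", (entity_id,)
    ).fetchone()[0]


def shared_others(a, b):
    """Query 4 (one possible reading): ids linked to both a and b."""
    rows = conn.execute(
        "SELECT another_entity_id FROM pairs WHERE entity_id = ? "
        "INTERSECT "
        "SELECT another_entity_id FROM pairs WHERE entity_id = ? "
        "ORDER BY 1",
        (a, b),
    )
    return [r[0] for r in rows]


print(others_of(1), entities_of(11), count_others(2), shared_others(1, 2))
# prints: [10, 11, 12] [1, 2] 3 [11, 12]
```

Each lookup direction has a composite index whose leading column matches its WHERE clause, which is why both query 1 and query 2 stay fast in this layout.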
Each entity_id can have up to 10 million another_entity_id values.
In principle, queries run quickly, but with 300 million records the data itself takes up 4 GB and the indexes (!!!) 11 GB.
The goal is to reduce disk usage while keeping, or even improving, performance.
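To put the reported sizes in perspective, the per-row arithmetic can be sketched as follows. Two assumptions are labeled in the code: that "GB" means binary gigabytes (2^30 bytes), and that both ids are 4-byte INT columns, so the raw payload of one pair is 8 bytes.

```python
ROWS = 300_000_000          # records reported in the question
GIB = 2 ** 30               # assumption: "GB" means binary gigabytes
RAW_PAIR_BYTES = 2 * 4      # assumption: both ids are 4-byte INT columns


def bytes_per_row(total_gib, rows=ROWS):
    """Average on-disk bytes per (entity_id, another_entity_id) record."""
    return total_gib * GIB / rows


data = bytes_per_row(4)
indexes = bytes_per_row(11)
overhead = (data + indexes) / RAW_PAIR_BYTES

print(f"data:    {data:.1f} B/row")
print(f"indexes: {indexes:.1f} B/row (~{indexes / 2:.1f} B per index entry)")
print(f"total is ~{overhead:.1f}x the {RAW_PAIR_BYTES} B of raw id payload")
```

Under these assumptions the data costs about 14.3 bytes per row and the two indexes about 39.4 bytes per row (roughly 19.7 bytes per entry in each), so the table as a whole uses about 6.7 times the raw id payload. That gap is the space any alternative layout would be trying to win back.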
The first thing that came to mind was a document-oriented database.
1) MongoDB was ruled out immediately because it has a hard 16 MB limit on the size of a single BSON document, which is only enough for a document of the form entity_id + [an array of about 1 million another_entity_id]. So I couldn't evaluate the performance or data size of such a setup.
2) PostgreSQL with the jsonb type can store documents far larger than 16 MB and can search inside them, and GIN indexes take up much less space, but jsonb search performance is painfully slow. Perhaps I need to wait for index types like VODKA to be released. Inserts are also very slow, and the disk savings are small.
2.1) PostgreSQL with an integer array (integer[]) gives search performance comparable to jsonb, but inserts are faster.
So far, MyISAM beats everything else on performance. What would you advise, comrades?
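Why MongoDB tops out at around a million ids per document can be checked with a little arithmetic on the BSON encoding. The sketch below computes the encoded size of a bare BSON array of int32 values; it assumes the ids fit in int32 (int64 would add 4 bytes per element) and ignores the few extra bytes for `_id`, the field name and the enclosing document. The helper names are hypothetical.

```python
MAX_BSON_BYTES = 16 * 1024 * 1024  # MongoDB's maximum BSON document size (16 MiB)


def key_digits(n):
    """Total decimal digits in the array keys "0", "1", ..., str(n - 1)."""
    total, low, width = 0, 0, 1
    while low < n:
        high = min(n, 10 ** width)      # keys with `width` digits: [low, high)
        total += (high - low) * width
        low, width = high, width + 1
    return total


def bson_int32_array_bytes(n):
    """Encoded size of a bare BSON array holding n int32 values.

    BSON stores an array as a document keyed "0", "1", ...; each element is
    a 1-byte type tag, the key as a NUL-terminated string and a 4-byte int32.
    The document adds a 4-byte length prefix and a trailing NUL byte.
    """
    per_element_fixed = 1 + 1 + 4       # type tag + key terminator + int32
    return 4 + n * per_element_fixed + key_digits(n) + 1


for n in (1_000_000, 10_000_000):
    size = bson_int32_array_bytes(n)
    print(f"{n:>10,} ids -> {size / 2**20:6.1f} MiB, fits: {size <= MAX_BSON_BYTES}")
```

Because every element also carries its own decimal key, each id costs 7 to 13 bytes rather than 4: one million ids come to about 11.3 MiB (under the limit), while the 10 million mentioned above would need about 122.9 MiB, several times the limit.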
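The trade-off behind the array approach in 2.1) can be modeled in a few lines. Storing one integer array per entity_id stores each link once and keeps queries 1 and 3 trivial, but query 2 then needs an inverted index that maps each another_entity_id back to its entities. That is the role a GIN (Generalized Inverted Index) plays for array columns in PostgreSQL, and that inverted structure costs space again. This is a pure-Python sketch with hypothetical names (forward, inverted, build_inverted) and toy data; it says nothing about PostgreSQL's actual on-disk sizes.

```python
from array import array
from collections import defaultdict

# Hypothetical data in the "one row per entity" layout: each entity_id maps
# to a sorted array of C ints, roughly like an integer[] column per row.
forward = {
    1: array("i", [10, 11, 12]),
    2: array("i", [11, 12, 13]),
}


def build_inverted(forward):
    """Map every another_entity_id back to the entity_ids that contain it.

    An inverted index (such as PostgreSQL's GIN) does this job for array
    columns; it is what makes query 2 fast, and it costs extra space.
    """
    inverted = defaultdict(list)
    for entity_id, others in forward.items():
        for other in others:
            inverted[other].append(entity_id)
    return inverted


inverted = build_inverted(forward)

print(list(forward[1]))                           # query 1: direct lookup
print(inverted[11])                               # query 2: needs the inverted index
print(len(forward[2]))                            # query 3: COUNT is just the length
print(sorted(set(forward[1]) & set(forward[2])))  # query 4: intersection of two sets

# Payload of the forward arrays: one C int (typically 4 bytes) per link.
payload = sum(len(ids) * ids.itemsize for ids in forward.values())
print(payload)
```

With a typical 4-byte C int, the forward arrays cost about 4 bytes per link, compared with the per-row figures of the MyISAM table; the inverted side has to be paid for on top of that.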

res2001, 2016-03-10

Use PostgreSQL and keep the same structure.
Something is off with your indexes: they are essentially two identical indexes. Do you really think the DBMS won't figure out on its own to swap the columns when needed?
I would create a unique clustered index on (entity_id, another_entity_id) plus separate single-column indexes on each field.
For a start, try this index setup on MySQL and see whether it frees up disk space. Maybe that will be enough.
To improve performance, move the database to an SSD.
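One way to check which index a given lookup actually uses is to ask the query planner. The sketch below does this for a layout in the spirit of the suggestion above, using Python's built-in sqlite3 as a stand-in (an assumption: SQLite is neither MySQL nor PostgreSQL, and its WITHOUT ROWID table is only an analogue of a clustered index). It adds only the another_entity_id index, since in SQLite the primary key's leading column already serves lookups by entity_id; whether an extra single-column index on entity_id pays off in MySQL or PostgreSQL is something to measure there with their own EXPLAIN. The index name idx_other and the helpers are hypothetical.

```python
import sqlite3


def build_schema(conn):
    """Create the suggested layout (SQLite analogue, illustrative only)."""
    # WITHOUT ROWID keeps rows ordered by the primary key, the closest SQLite
    # analogue of a unique clustered index on (entity_id, another_entity_id).
    conn.execute(
        "CREATE TABLE pairs ("
        " entity_id INTEGER NOT NULL,"
        " another_entity_id INTEGER NOT NULL,"
        " PRIMARY KEY (entity_id, another_entity_id)"
        ") WITHOUT ROWID"
    )
    # Separate single-column index for lookups by another_entity_id.
    conn.execute("CREATE INDEX idx_other ON pairs (another_entity_id)")


def plan(conn, sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
build_schema(conn)

forward_plan = plan(conn, "SELECT another_entity_id FROM pairs WHERE entity_id = ?", (1,))
reverse_plan = plan(conn, "SELECT entity_id FROM pairs WHERE another_entity_id = ?", (11,))
print(forward_plan)  # should show a SEARCH on the primary key
print(reverse_plan)  # should show a SEARCH using idx_other
```

The exact wording of the plan text differs between SQLite versions, but in both cases the lookup should be a SEARCH rather than a full SCAN of the table.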