How to effectively store "connectivity" between users?

E

Enuriru2015-03-01 02:39:51

Database

Enuriru, 2015-03-01 02:39:51

Greetings!
There is an algorithm that calculates an indicator (number) that characterizes the relationship of user A to user B.
The algorithm is transitive, symmetric, i.e. A->B = B->A.
As the number of users grows, it will be necessary to store the results and recalculate them (for example, if some parameters of user A change, it is necessary to recalculate its relation to all others).
The question arises, how to efficiently store, quickly update and read such data? SQL, NoSQL? Which engine/base is best? After all, there will already be 1,000,000 records for 1,000 users.
Thank you!

Reply

Answer the question

In order to leave comments, you need to log in

5 answer(s)

U

uvelichitel, 2015-03-01
@uvelichitel

Transitivity in my understanding means
A->B + B->C = A->C
Among thousands of users, hardly everyone is connected.

In such premises, we are talking about the connectedness of the graph. Hierarchical data structures (tree, graph) are foreign to SQL relational algebra. Known solutions Adjacency list and Nested sets . Adjacency list requires WITH RECURSIVE. Nested sets involves a lot of recalculation during operations (especially INSERT).
It would be more convenient to implement the data model you need in a hierarchical DB (for example, just hierarchical key value, Redis or levelDB will do) or graph DB or documentary DB (for example, just in XML which is inherently hierarchical)

O

OnYourLips, 2015-03-01
@OnYourLips

NoSQL?

You just said that this setting has two dependencies. NoSQL is not a choice.

After all, there will already be 1,000,000 records for 1,000 users.

First, it won't: not all of these users will have connections.
Secondly, these are small numbers.

S

SeptiM, 2015-03-01
@SeptiM

If the algorithm specifies a metric, and are willing to sacrifice accuracy, it is possible to map all users into an O(log n)-dimensional space. In the worst case, the distances will deteriorate into O(log n), but only O(n log n) bits of information can be stored.

M

Mikhail Vasilyev, 2015-03-01
@mickvav

Do you really need to keep this number for everyone? If only for those who have it less / more than a certain threshold, then this is O (N) or O (NlogN), which will already fit into regular SQL, and not O (N ^ 2), which is really bad.

Y

YV, 2015-03-28
@targetjump

I think neo4j is exactly what you need