Clustering
Enuriru, 2015-04-10 16:50:02

How to perform clustering?

There is a fairly large database that includes documents, users, and multiple document->user relationships (that is, several users "work" on each document).
I need to identify microgroups of users who work together most actively.
In this case, the same user can be in several microgroups at once.
What algorithms can be used? Perhaps there are ready-made solutions or libraries?
I would also like some control over the sizes of the groups found, so that both large and small ones can be detected.
Example:
Document1 - Vasya, Kolya, Sasha
Document2 - Vasya, Kolya, Nina
Document3 - Nina, Sasha, Vasya
In this case, the groups Vasya + Kolya (worked on 1 and 2) and Vasya + Nina (worked on 2 and 3) should be selected.
(Are there any libraries for this?)
Looking ahead: documents have categories, and when visualizing I want to display groups working in the same category side by side.
I will be grateful for your help!
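
The example in the question can be checked with a short Python sketch. It simply counts, for every combination of users, the documents they share, and keeps the combinations that share enough of them. The function name `co_working_groups` and the parameters `group_size` and `min_shared` are illustrative assumptions, not part of any library; they are one possible way to get the requested control over group sizes. By the same criterion the pair Vasya + Sasha (documents 1 and 3) also qualifies.

```python
from collections import defaultdict
from itertools import combinations

# Example data from the question: document -> users who worked on it.
docs = {
    "Document1": ["Vasya", "Kolya", "Sasha"],
    "Document2": ["Vasya", "Kolya", "Nina"],
    "Document3": ["Nina", "Sasha", "Vasya"],
}


def co_working_groups(docs, group_size=2, min_shared=2):
    """Return {group: shared documents} for groups sharing >= min_shared documents.

    group_size and min_shared are hypothetical tuning knobs for this sketch.
    A user may appear in several groups, as the question requires.
    """
    shared = defaultdict(set)
    for doc, members in docs.items():
        # Every combination of co-authors of this document gets credit for it.
        for group in combinations(sorted(members), group_size):
            shared[group].add(doc)
    return {g: sorted(d) for g, d in shared.items() if len(d) >= min_shared}


for group, shared_docs in sorted(co_working_groups(docs).items()):
    print(" + ".join(group), "->", ", ".join(shared_docs))
```

Enumerating all combinations grows quickly with `group_size` and with the number of users per document, so on a large database this brute-force count is only a starting point.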

2 answer(s)
zaplokee, 2015-04-14
@zaplokee

In R, you can import data from MySQL, CSV, XLS, JSON, or TXT. From there you can fairly quickly (depending on the amount of data, of course) build a dendrogram for hierarchical clustering, with distances computed by the Euclidean method by default (or another one if you prefer). It seems the dendrograms can even be colored. R can export the plots, though not very elegantly (the number of formats is limited); you can also find another tool for that. Hierarchical clustering will let you display connections at different levels, regardless of the number of people. If, for example, 3 people work with each other most often, they will end up on the same level.
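
To make the idea concrete, here is a minimal sketch in plain Python rather than R. It is not the R workflow itself: it assumes each user is described by a 0/1 vector over documents, uses Euclidean distance between those vectors, and merges clusters by average linkage (the linkage choice is my assumption). The helper names `agglomerate` and `cut` are hypothetical. Note that cutting a dendrogram yields non-overlapping groups, so each user lands in exactly one group per cut height.

```python
from itertools import combinations
from math import dist

# Example data from the question: document -> users who worked on it.
docs = {
    "Document1": ["Vasya", "Kolya", "Sasha"],
    "Document2": ["Vasya", "Kolya", "Nina"],
    "Document3": ["Nina", "Sasha", "Vasya"],
}
users = sorted({u for members in docs.values() for u in members})
doc_names = sorted(docs)

# One 0/1 row per user: 1 if the user worked on that document.
vectors = {u: [1 if u in docs[d] else 0 for d in doc_names] for u in users}


def linkage_distance(a, b):
    """Average Euclidean distance between members of clusters a and b."""
    pairs = [(x, y) for x in a for y in b]
    return sum(dist(vectors[x], vectors[y]) for x, y in pairs) / len(pairs)


def agglomerate(items):
    """Return the merge history [(height, cluster_a, cluster_b), ...]."""
    clusters = [frozenset([i]) for i in items]
    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters and record the height of the merge.
        a, b = min(combinations(clusters, 2), key=lambda p: linkage_distance(*p))
        merges.append((linkage_distance(a, b), a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


def cut(items, merges, max_height):
    """Groups obtained by applying only the merges at or below max_height."""
    clusters = {frozenset([i]) for i in items}
    for height, a, b in merges:
        if height <= max_height:
            clusters -= {a, b}
            clusters.add(a | b)
    return clusters


merges = agglomerate(users)
for height, a, b in merges:
    print(f"{height:.3f}: {sorted(a)} + {sorted(b)}")
print([sorted(g) for g in cut(users, merges, max_height=1.0)])
```

The `max_height` argument plays the role of the size control asked for in the question: a low cut gives small, tight groups, and a higher cut gives larger ones.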

uvelichitel, 2015-04-11
@uvelichitel

Kruskal's algorithm over a Union-Find (disjoint-set) structure, with the links kept in a heap ordered by link weight. If the weight depends on several criteria, use the Hamming distance for it. And I'm not joking, I implemented this in code for a real problem.
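
One possible reading of this suggestion, sketched in Python: treat every pair of users as a link whose weight is the Hamming distance between their document sets, pop links from a heap in order of increasing weight, and join users with union-find, as in Kruskal's algorithm. Stopping at a weight threshold and reading off the connected components as groups is my assumption about how the clusters are obtained; the name `kruskal_clusters` and the parameter `max_weight` are hypothetical.

```python
import heapq
from itertools import combinations

# Example data from the question, inverted to user -> set of documents.
user_docs = {
    "Vasya": {"Document1", "Document2", "Document3"},
    "Kolya": {"Document1", "Document2"},
    "Sasha": {"Document1", "Document3"},
    "Nina": {"Document2", "Document3"},
}


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def kruskal_clusters(user_docs, max_weight):
    """Join users along the lightest links, stopping above max_weight.

    The link weight is the Hamming distance between the users' 0/1 document
    vectors, i.e. the size of the symmetric difference of their document sets.
    """
    parent = {u: u for u in user_docs}
    heap = [
        (len(user_docs[a] ^ user_docs[b]), a, b)
        for a, b in combinations(sorted(user_docs), 2)
    ]
    heapq.heapify(heap)
    tree = []  # links accepted into the spanning forest
    while heap:
        weight, a, b = heapq.heappop(heap)
        if weight > max_weight:
            break  # every remaining link is heavier still
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b
            tree.append((weight, a, b))
    groups = {}
    for u in user_docs:
        groups.setdefault(find(parent, u), set()).add(u)
    return sorted(sorted(g) for g in groups.values()), tree


groups, tree = kruskal_clusters(user_docs, max_weight=1)
print(groups)
print(tree)
```

On the question's tiny example every user is at Hamming distance 1 from Vasya, so a threshold of 1 chains everyone into a single group. Components built this way never overlap, and on real data the threshold would need tuning.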
