U
U
un1t2013-06-18 09:32:13
Algorithms
un1t, 2013-06-18 09:32:13

Multiple problem, please help

There is a set of

set(1, 2)
set(3)
set(4, 5)
set(3, 2, 6)
set(6)
set(7, 8)
set(9, 8)


We need to merge all the sets that intersect.

So for example set(1, 2) needs to be combined with set(3, 2, 6) and with set(3). Do the same for all sets.

The result should look like this:

set(1, 2, 3, 6)
set(4,5)
set(7, 8, 9)


Additional terms.
The number of sets is 5 million.
The sets are chosen in such a way that one set is combined with 1-5 other sets on average. We will assume that the number of sets with which any set from this set can be combined never exceeds 10.

I can think of some algorithms, but they all have complexity n^2. And this takes a very long time. I would like to get a linear algorithm.

Answer the question

In order to leave comments, you need to log in

4 answer(s)
[
[email protected]><e, 2013-06-18
@un1t

Denote: n is the number of sets, m is the total number of elements in the sets, k is the number of unique elements in the union of all sets.
Solution in O(m):
Create k sets from one element. Next, we go through the original sets, considering each set as a union request. Namely, we go through the elements of the set and combine the set corresponding to the current element with the set corresponding to the previous one. To do this quickly, the structure of the disjoint set system will come in handy . This structure allows you to perform the necessary operations on average so quickly that you can consider them O (1) (although, strictly speaking, the asymptotics there is not constant, see the article).
As a result of such a pass, we will process each of the m elements once, spending ≈O(1) time on it. This yields an O(m) estimate.
In the worst case, such an algorithm will work in O(n*k) (it is possible to get rid of repetitions in each set, so there will be no more than k elements in it).
It seems to me that the proposed algorithm is optimal: in any case, you need to consider each element of each set, which gives an estimate of at least O (m) operations.

T
turboNOMAD, 2013-06-18
@turboNOMAD

Build a graph where each element is a vertex, and the edges indicate belonging to the same set. Moreover, if several sets contain the same pair of elements, one edge is sufficient. Such a graph is built in one pass.
The desired sets are connected subgraphs of the resulting graph.
By the way, for working with graphs, I can advise the appropriate library from boost, if C ++ is meant.

M
Mephi1984, 2013-06-18
@Mephi1984

Here is my idea.
We take all numbers from all sets - we get N numbers, and we make a graph with N vertices out of them.
Then for each set, we connect the vertices. That is, for example, the set (7,8,9) connects three vertices with two edges: 7 - 8 - 9.
Then, from the resulting graph, we calculate the connectivity components. Wikipedia writes that the execution time is linear.

U
Urvin, 2013-06-18
@Urvin

1. Write out unique numbers, put the indices of the sets in front of them.

1 - 0
2 - 0, 3
3 - 1, 3
4 - 2
5 - 2
6 - 3, 4
7 - 5
8 - 5, 6
9 - 6

2. Omit unique digits where the number of set indices is less than 2
2 - 0, 3
3 - 1, 3
6 - 3, 4
8 - 5, 6

3. Match indexes of shortlisted sets
0, 3, 1, 4
5.6

4. Find out the index of missing sets
2

5. Combine sets in response. Ready.
Not very linear, yes. Maybe push...

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question