C
C
chonduhvan2014-05-23 12:49:06
Python
chonduhvan, 2014-05-23 12:49:06

How, using Python, to merge lists when at least one value matches?

The input has a set of unique ID collections. You need to combine them so that one identifier is in only one collection. Within a collection, all identifiers are unique. Within a set, all collections are unique. The set size is quite large (20000+) so speed matters. Need a Python solution.
Example:

changeset = set([               
   frozenset([101,102]),        
   frozenset([103,104]),        
   frozenset([101,105]),        
   frozenset([106,107,109,103]),
   frozenset([110,135]),        
   frozenset([111,136]), ])

The output should work.
changeset = set([               
   frozenset([101,102,105]),        
   frozenset([103,104,106,107,109]),
   frozenset([110,135]),        
   frozenset([111,136]), ])

Answer the question

In order to leave comments, you need to log in

2 answer(s)
0
0re1, 2014-06-03
@chonduhvan

For O(N * log(n)), where n is the number of unique identifiers, N is the sum of the cardinalities of the sets.
Disjoint set system :

def union(superset):
    parent = {}
    
    def make_set(a):
        if a not in parent: parent[a] = a
    
    def find_set(a):
        if parent[a] == a: return a
        parent[a] = find_set(parent[a])
        return parent[a]
        
    def union_sets(a, b):
        a = find_set(a)
        b = find_set(b)
        if a != b: parent[a] = b
    
    for subset in superset:
        prev = None
        for v in subset:
            make_set(v)
            if prev: union_sets(prev, v)
            prev = v

    lists = {}
    for uid in parent.keys():
        pr = find_set(uid)
        if pr not in lists: lists[pr] = []
        lists[pr].append(uid)

    return set([ frozenset(l) for l in lists.values() ])

>> union(changeset)
{frozenset({111, 136}),
 frozenset({101, 102, 105}),
 frozenset({110, 135}),
 frozenset({103, 104, 106, 107, 109})}

M
mrstrictly, 2014-05-23
@mrstrictly

Sorry, I'm not a Python developer. Below is my head-to-head solution, which may prompt thoughts about optimization. :)

def join_intersections(aSet):
    aSetCopy = aSet.copy()
    for l, r in [[l,r] for l in changeset for r in changeset]:
        if not l.isdisjoint(r) and l != r:
            aSetCopy.discard(l)
            aSetCopy.discard(r)
            yield l | r
    for rest in aSetCopy:
        yield rest

>>> target_changeset = set(join_intersections(changeset))
>>> target_changeset
set([frozenset([136, 111]), frozenset([105, 101, 102]), frozenset([110, 135]), frozenset([103, 104, 106, 107, 109])])

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question