Answer the question
In order to leave comments, you need to log in
How, using Python, to merge lists when at least one value matches?
The input has a set of unique ID collections. You need to combine them so that one identifier is in only one collection. Within a collection, all identifiers are unique. Within a set, all collections are unique. The set size is quite large (20000+) so speed matters. Need a Python solution.
Example:
changeset = set([
frozenset([101,102]),
frozenset([103,104]),
frozenset([101,105]),
frozenset([106,107,109,103]),
frozenset([110,135]),
frozenset([111,136]), ])
changeset = set([
frozenset([101,102,105]),
frozenset([103,104,106,107,109]),
frozenset([110,135]),
frozenset([111,136]), ])
Answer the question
In order to leave comments, you need to log in
For O(N * log(n)), where n is the number of unique identifiers, N is the sum of the cardinalities of the sets.
Disjoint set system :
def union(superset):
parent = {}
def make_set(a):
if a not in parent: parent[a] = a
def find_set(a):
if parent[a] == a: return a
parent[a] = find_set(parent[a])
return parent[a]
def union_sets(a, b):
a = find_set(a)
b = find_set(b)
if a != b: parent[a] = b
for subset in superset:
prev = None
for v in subset:
make_set(v)
if prev: union_sets(prev, v)
prev = v
lists = {}
for uid in parent.keys():
pr = find_set(uid)
if pr not in lists: lists[pr] = []
lists[pr].append(uid)
return set([ frozenset(l) for l in lists.values() ])
>> union(changeset)
{frozenset({111, 136}),
frozenset({101, 102, 105}),
frozenset({110, 135}),
frozenset({103, 104, 106, 107, 109})}
Sorry, I'm not a Python developer. Below is my head-to-head solution, which may prompt thoughts about optimization. :)
def join_intersections(aSet):
aSetCopy = aSet.copy()
for l, r in [[l,r] for l in changeset for r in changeset]:
if not l.isdisjoint(r) and l != r:
aSetCopy.discard(l)
aSetCopy.discard(r)
yield l | r
for rest in aSetCopy:
yield rest
>>> target_changeset = set(join_intersections(changeset))
>>> target_changeset
set([frozenset([136, 111]), frozenset([105, 101, 102]), frozenset([110, 135]), frozenset([103, 104, 106, 107, 109])])
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question