Python
pcdesign, 2021-01-19 09:21:13

How to calculate the "similarity" of two dictionaries?

For example, here are two dictionaries:

Апельсин 1
Яблоко 2
Банан 3


Апельсин 2
Яблоко 2
Инжир 1


How can I measure how "similar" one invoice is to the other and get a similarity coefficient?
I understand that I most likely need to solve this with the Jaccard coefficient: https://ru.wikipedia.org/wiki/Jaccard_Coefficient
What would the calculation look like with just pen and paper, without a computer?


3 answer(s)
o5a, 2021-01-19
@pcdesign

If the values stored under the keys do not matter for the comparison, it is enough to use keys():

d1 = {'Апельсин': 1,
      'Яблоко': 2,
      'Банан': 3
      }

d2 = {'Апельсин': 2,
      'Яблоко': 2,
      'Инжир': 1
      }

# all keys of the first dictionary
print(d1.keys())
# keys common to both dictionaries
print(d1.keys() & d2.keys())

Judging by the link, this is enough to calculate your coefficient.
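
To make that concrete, here is a minimal sketch of the Jaccard coefficient over the keys, |A ∩ B| / |A ∪ B|. The function name `jaccard` and the convention that two empty dictionaries count as identical are my own assumptions, not something from the question. By hand it is the same arithmetic: 2 common keys out of 4 distinct keys gives 0.5.

```python
def jaccard(a, b):
    """Jaccard coefficient of the key sets of two dicts: |A & B| / |A | B|."""
    keys_a, keys_b = a.keys(), b.keys()
    union = keys_a | keys_b
    if not union:
        return 1.0  # assumption: two empty dicts are treated as identical
    return len(keys_a & keys_b) / len(union)


d1 = {'Апельсин': 1, 'Яблоко': 2, 'Банан': 3}
d2 = {'Апельсин': 2, 'Яблоко': 2, 'Инжир': 1}

# 2 common keys / 4 distinct keys
print(jaccard(d1, d2))  # 0.5
```

Note that this ignores the weights entirely; it only looks at which keys are present.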

isol, 2021-01-19
@isol

Take a look at Bloom filters, they may be useful here. If you build a Bloom filter for each set of keys, you can compare how similar the filters are.
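
A rough sketch of how that could look, assuming a tiny 64-bit filter stored as an int and 3 hash functions derived from SHA-256. The sizes and the names `bloom` and `bit_similarity` are illustrative assumptions. The result is only an estimate of the key overlap, since different keys can land on the same bit.

```python
import hashlib

M = 64  # number of bits per filter (assumed; far too small for real data)
K = 3   # number of hash functions (assumed)


def bloom(keys, m=M, k=K):
    """Build a Bloom filter for the given keys, stored as an int bitmask."""
    bits = 0
    for key in keys:
        for i in range(k):
            # Derive the i-th hash by salting the key with the index i.
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            bits |= 1 << (int.from_bytes(digest[:8], "big") % m)
    return bits


def bit_similarity(f1, f2):
    """Jaccard coefficient of the set bits of two filters."""
    union = f1 | f2
    if union == 0:
        return 1.0  # assumption: two empty filters count as identical
    return bin(f1 & f2).count("1") / bin(union).count("1")


d1 = {'Апельсин': 1, 'Яблоко': 2, 'Банан': 3}
d2 = {'Апельсин': 2, 'Яблоко': 2, 'Инжир': 1}

# Iterating over a dict yields its keys, so the weights are ignored here.
print(bit_similarity(bloom(d1), bloom(d2)))
```

For dictionaries this small, comparing the key sets directly is simpler and exact; filters start to pay off only when the sets are too large to keep or exchange in full.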

Wataru, 2021-01-19
@wataru

If the absence of a word in a dictionary is equivalent to that word having a weight of 0, you can use any distance measure defined on vectors of numbers, for instance the root of the sum of squared differences over all words (the Euclidean distance).
In your example the sum of squares is (1-2)^2+(2-2)^2+(3-0)^2+(0-1)^2 = 11, so the distance is sqrt(11), about 3.32.
The smaller this number, the more similar the dictionaries. You can also normalize it, for example by dividing by the number of unique keys across both dictionaries, or by the total number of possible words.
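
A minimal sketch of this measure, treating a missing key as weight 0. The helper name `squared_distance` is just an assumption for illustration, and the normalization shown is the first of the options mentioned above (dividing by the number of unique keys).

```python
import math


def squared_distance(a, b):
    """Sum of squared weight differences; a missing key counts as weight 0."""
    return sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in a.keys() | b.keys())


d1 = {'Апельсин': 1, 'Яблоко': 2, 'Банан': 3}
d2 = {'Апельсин': 2, 'Яблоко': 2, 'Инжир': 1}

sq = squared_distance(d1, d2)
print(sq)                               # 11
print(math.sqrt(sq))                    # the root of the sum, about 3.32
print(sq / len(d1.keys() | d2.keys()))  # normalized by 4 unique keys: 2.75
```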
If your language or data structure lets you traverse a dictionary in lexicographic key order, you can compute such a measure in linear time with something like the merge step for sorted lists. Start with two pointers, one at the smallest key of each dictionary. If both pointers are at the same key, take the difference of the two weights and advance both pointers. Otherwise, take the difference between the weight at the smaller key and 0, and advance only that pointer. If one of the dictionaries is already exhausted, proceed as in the second case.
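In Python, keep in mind that OrderedDict (like a plain dict) preserves insertion order, not lexicographic order, so unless the keys were inserted in sorted order you have to sort them first, e.g. with sorted(d.items()).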
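
The two-pointer walk described above could be sketched like this, assuming each dictionary is given as a list of (key, weight) pairs already sorted by key. The function name `merge_squared_distance` is hypothetical. The merge itself is linear; in the usage below the sorted() calls add an O(n log n) step that goes away if the data is already stored in key order.

```python
def merge_squared_distance(items_a, items_b):
    """Sum of squared weight differences over two key-sorted (key, weight) lists."""
    i = j = 0
    total = 0
    while i < len(items_a) or j < len(items_b):
        if j == len(items_b) or (i < len(items_a) and items_a[i][0] < items_b[j][0]):
            # Key present only in a (or b is exhausted): difference with 0.
            total += items_a[i][1] ** 2
            i += 1
        elif i == len(items_a) or items_b[j][0] < items_a[i][0]:
            # Key present only in b (or a is exhausted): difference with 0.
            total += items_b[j][1] ** 2
            j += 1
        else:
            # Same key in both: difference of the two weights, advance both.
            total += (items_a[i][1] - items_b[j][1]) ** 2
            i += 1
            j += 1
    return total


d1 = {'Апельсин': 1, 'Яблоко': 2, 'Банан': 3}
d2 = {'Апельсин': 2, 'Яблоко': 2, 'Инжир': 1}

print(merge_squared_distance(sorted(d1.items()), sorted(d2.items())))  # 11
```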
