kicherov_maxim, 2020-04-25 09:21:54
Python

How to calculate statistics on data from Django database?

Good afternoon. I have a database of IP addresses with a "fingerprint" (the tlsh field) for each IP. I need to count how many times the same fingerprint was received from different IP addresses. Rows with a repeated IP should be discarded, and the remaining rows should be split into samples of the first 100, 200, 300, etc. elements.
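The statistic described above can be illustrated on toy data. This is a minimal sketch, assuming plain lists of dicts as a stand-in for the rows returned by `.values('ip', 'tlsh')`; the sample IPs and fingerprints are made up, and `one_row_per_ip` is a hypothetical helper. It counts collisions with `collections.Counter` (sum of occurrences minus one per fingerprint), which gives the same number as the set-based `len(items) - len(set(...))` formula.

```python
from collections import Counter

# Hypothetical sample rows, shaped like FingerPrint.objects.values('ip', 'tlsh')
rows = [
    {"ip": "10.0.0.1", "tlsh": "A"},
    {"ip": "10.0.0.1", "tlsh": "B"},  # repeated IP: only one row per IP is kept
    {"ip": "10.0.0.2", "tlsh": "A"},
    {"ip": "10.0.0.3", "tlsh": "A"},
    {"ip": "10.0.0.4", "tlsh": "C"},
]

def one_row_per_ip(rows):
    """Keep the first row seen for each IP, preserving input order."""
    seen = {}
    for row in rows:
        seen.setdefault(row["ip"], row)
    return list(seen.values())

def collisions(items):
    """Number of items whose fingerprint already occurred earlier in the list."""
    counts = Counter(item["tlsh"] for item in items)
    return sum(c - 1 for c in counts.values())

unique_items = one_row_per_ip(rows)
print(len(unique_items))         # 4
print(collisions(unique_items))  # 2: fingerprint "A" is shared by three IPs
```

Note that `one_row_per_ip` keeps whichever row comes first, so its result already depends on the order of the input rows.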

The problem is that running the code several times gives different results each time.

from .models import FingerPrint  # assumed import path; adjust to your app

def collisions(items):
    # Number of repeated fingerprints: item count minus distinct 'tlsh' values
    return len(items) - len(set(x['tlsh'] for x in items))

rows = FingerPrint.objects.all().values('ip', 'tlsh')
print("Total:", len(rows))

unique_ip = list(set(x['ip'] for x in rows))
print("Unique IPs:", len(unique_ip))

unique_items = []
for ip in unique_ip:
    unique_items.append(rows.filter(ip=ip)[0])  # take one row for this IP

# For the chart
x = [100, 200, 300, 400, 500, 600, 700, 738]
y = [collisions(unique_items[:n]) for n in x]
print(y)
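To see why the printed list can change between runs, here is a self-contained sketch with made-up data (the IPs and fingerprints are hypothetical). The same unique items in two different orders give different collision counts for the same prefix, while the count over the whole list stays the same. So any step that reorders `unique_items`, such as iterating over a set, changes the values for 100, 200, and so on, but not the value for the full list.

```python
def collisions(items):
    # Same formula as in the question: item count minus distinct fingerprints
    return len(items) - len(set(x["tlsh"] for x in items))

# Hypothetical items, already reduced to one row per IP
items = [
    {"ip": "10.0.0.1", "tlsh": "A"},
    {"ip": "10.0.0.2", "tlsh": "A"},
    {"ip": "10.0.0.3", "tlsh": "B"},
    {"ip": "10.0.0.4", "tlsh": "C"},
]
# The same items in a different order
reordered = [items[2], items[3], items[0], items[1]]

print(collisions(items[:2]), collisions(reordered[:2]))  # 1 0
print(collisions(items), collisions(reordered))          # 1 1
```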


Output:
[screenshot: 5ea3d6e49af53672627340.png]


1 answer
Dr. Bacon, 2020-04-25
@kicherov_maxim

A set does not preserve order, so the simplest fix is probably to sort unique_ip.
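A minimal sketch of that fix, again on plain lists of dicts standing in for the queryset rows. It removes two sources of nondeterminism: iteration over a set of strings (string hashes are randomized per interpreter process unless PYTHONHASHSEED is fixed, so set order can differ between runs) is replaced by `sorted()`, and the row kept for each IP is chosen by an explicit key rather than by whatever order the database happens to return. The `id` key and the helper names are assumptions for illustration; IPs are sorted as strings, which is not numeric order but is reproducible.

```python
def deterministic_unique_items(rows):
    """One row per IP: keep the row with the smallest 'id' for each IP,
    then return the rows sorted by IP so that slicing is reproducible."""
    first_by_ip = {}
    for row in rows:
        kept = first_by_ip.get(row["ip"])
        if kept is None or row["id"] < kept["id"]:
            first_by_ip[row["ip"]] = row
    return [first_by_ip[ip] for ip in sorted(first_by_ip)]

def collisions(items):
    # Item count minus distinct fingerprints, as in the question
    return len(items) - len(set(x["tlsh"] for x in items))

def prefix_collisions(items, sizes):
    # Collisions in the first n items, for each n in sizes
    return [collisions(items[:n]) for n in sizes]

# Hypothetical rows; 'id' is assumed to be the model's primary key
rows = [
    {"id": 3, "ip": "10.0.0.2", "tlsh": "A"},
    {"id": 1, "ip": "10.0.0.1", "tlsh": "A"},
    {"id": 4, "ip": "10.0.0.1", "tlsh": "B"},
    {"id": 2, "ip": "10.0.0.3", "tlsh": "C"},
]

unique_items = deterministic_unique_items(rows)
sizes = [2, len(unique_items)]
print(prefix_collisions(unique_items, sizes))  # [1, 1]
```

Shuffling `rows` does not change the result. In the question's Django code, the corresponding change would be to loop over `sorted(unique_ip)` and to give the per-IP query an explicit ordering before taking `[0]`, e.g. `FingerPrint.objects.filter(ip=ip).order_by('id').values('ip', 'tlsh')[0]`, assuming the model has the default `id` primary key; a queryset without `order_by` (and without a default ordering in `Meta`) has no guaranteed row order.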
