Answer the question
In order to leave comments, you need to log in
How to count the number of unique rows greater than 1 million while typing?
I can not solve the problem in any way, maybe someone knows. The input is N < 1m (one million) number and then N lines of length less than 1k.
I need to output the number of unique rows as a result. But an important limitation is that you can only use the numpy library, the time is no more than 5s and 5 MB of memory (!)
The script below took me 97ms and 12 MB of memory:
import numpy as np
N = int(input())
a = np.array([])
for i in range(N):
x = input()
if not np.any(a == x):
a = np.append(a, x)
print(len(a))
N = int(input())
results = np.empty(N, dtype=object)
for i in range(N):
results[i] = input()
print(len(np.unique(results)))
Answer the question
In order to leave comments, you need to log in
N = int(input())
s = set()
for i in range(N):
s.add(input())
print(len(s))
N = int(input())
s = set()
for i in range(N):
s.add(hash(input()))
print(len(s))
The problem has no solution within the stated constraints. If someone's solutions are rolled, then the editors are cheating with test data. Here's a demo for that. You can finish it by throwing out the excess and replacing it randrange
with hash(input())
, and try to shove it as a solution.
from numpy import zeros, uint32
from random import randrange
from sys import getsizeof
N = 10 ** 6
hashes = zeros(N, uint32)
print(f'hashes занимает {getsizeof(hashes)} байт')
control = set() # здесь считаем по-честному
for i in range(N):
# вместо строк я использую большие случайные числа
r = randrange(0x4000000000000000)
control.add(r)
# сохраняем последние 4 байта r - больше не лезет
hashes[i] = r & 0xffffffff
hashes.sort()
a, cnt = hashes[0], 1
for b in hashes:
if a != b:
a = b
cnt += 1
print(f'control - целых {getsizeof(control)} байт (для строк длиной до 1к было бы больше)')
print(f'{cnt:8} разных хэшей\n{len(control):8} разных чисел')
bitmap, cnt = bytearray(0x400000), 0
for _ in range(int(input())):
h, f = hash(input()), 0
for _ in range(16):
m = b'\x01\x02\x04\x08\x10\x20\x40\x80'[h & 7]
h = ((h >> 4) ^ i) | ((h & 15) << 60)
if not bitmap[h & 0x3fffff] & m:
bitmap[h & 0x3fffff] |= m
f = 1
cnt += f
print(cnt)
Have you tried using multiples? Do you have an example input?
If the main thing is the use of memory and time is not important, then you can hash the strings and store the hashes in the set. Memory must be less than 5 MB. By time 2 - 3 seconds
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question