S
S
Shabunoff2021-09-19 10:20:56
Python
Shabunoff, 2021-09-19 10:20:56

How to speed up iteration of list elements?

There is a set of sequences of 15 digits. These numbers are 1, 2 or 3. For example:
121231321223212

You need to compare each of the sequences with each and "remember" the number of matches between them.

For example, the sequences:
111111111111121
111111111111112
have 13 matches.

Now it's implemented like this:

diff=2
matrix = [[0] * len(list) for i in range(len(list))]
for i in range(len(list)):
for j in range(i, len (list)):
dis = distance.hamming(list[i], list[j])
if dis < diff:
matrix[i][j] = 1
matrix[j][i] = 1


The piece of code compares each line of the list list of type '111132123123213' with each other, and builds a square matrix,
at the intersection of rows and columns of which 1 is written (if the rows differ by less than a given number (diff) of differences), or 0 (if vice versa).

The task is to speed up this piece of code.
With this implementation, with a list of 1340 lines, it runs on my PC in 2.5 seconds, but I would like to work with lists from 20,000 to 100,000 lines

Answer the question

In order to leave comments, you need to log in

[[+comments_count]] answer(s)
W
Wataru, 2021-09-20
@Shabunoff

You can optimize it several times by rewriting it in C ++ with vectorization (SSE is all there). Then the comparison of two strings will be 1-2 instructions instead of 15. You can also parallelize the calculations. But all these optimizations will still not allow you to quickly count for 100,000 rows. What can we say, the matrix itself will take at least 10 gigabytes, if you write carefully in C or use typed numpy arrays. And so it will easily take all 20 and 40 gigabytes.
And nothing much faster will work here, if you need the entire matrix - you will have to calculate it for a couple of minutes at best. Now, if you use it for some other calculations (for example, you count for each row how many close to it are in the list), then you can fundamentally speed up the solution without calculating the matrix, but by counting the desired number at once.

D
dmshar, 2021-09-19
@dmshar

Those. It's been over a year and you haven't made any progress?
https://qna.habr.com/q/740387
And most likely, they didn’t even read the answers that were given to you there.
This perfectly characterizes the author of the question and tells everyone that there is no point in answering it.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question