PHP
codercat, 2015-10-07 15:40:14

How to work with large arrays?

What is the right way to, for example, find the common values of two huge arrays (100-500k values each) without consuming a lot of resources?
Done the usual way, it turns into a very heavy process. Maybe I should pick another language (I don't know how well PHP is suited to such tasks) or design it differently, for example by splitting the data into many small arrays?

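$arr1 = [];
$arr2 = [];

// Fill both arrays with 400,000 random integers.
for ($i = 0; $i < 400000; ++$i) {
  $arr1[] = rand(1000000, 100000000);
  $arr2[] = rand(1000000, 100000000);
}

// Values of $arr1 that also occur in $arr2 (keys from $arr1 are preserved).
$out = array_intersect($arr1, $arr2);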

6 answer(s)
lyeskin, 2015-10-07
@codercat

A different data structure fits this case well: sets. Python has them, for example. They are much faster here because they are unordered and hold only unique values, so membership can be checked by hash instead of by scanning.

yuras666, 2015-10-07
@yuras666

PHP has SplFixedArray, which uses less memory. IMHO you are trying to make PHP do something it is not meant for. I would implement this logic on the side of whatever stores the data, for example MySQL. If you need set intersections, Redis makes them very convenient (the SINTER command).
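
A minimal sketch of the "let the storage side do it" idea, in Python. It uses an in-memory SQLite database from the standard library purely as a stand-in for MySQL, and the table names `arr1` and `arr2` and the function name `intersect_in_database` are made up for illustration. The exact query would depend on the real database and schema.

```python
import sqlite3


def intersect_in_database(values1, values2):
    """Load two lists into tables and let the database compute the common values."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real database server
    try:
        conn.execute("CREATE TABLE arr1 (value INTEGER)")
        conn.execute("CREATE TABLE arr2 (value INTEGER)")
        conn.executemany("INSERT INTO arr1 (value) VALUES (?)", ((v,) for v in values1))
        conn.executemany("INSERT INTO arr2 (value) VALUES (?)", ((v,) for v in values2))
        # INTERSECT returns each common value once; ORDER BY makes the output stable.
        rows = conn.execute(
            "SELECT value FROM arr1 INTERSECT SELECT value FROM arr2 ORDER BY value"
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


print(intersect_in_database([1, 2, 2, 3], [2, 3, 4]))
```

In practice the data would already live in the database, so only the query itself would run, not the inserts.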

Alexander Melekhovets, 2015-10-07
@Blast

Store the values as keys, like $arr[$value] = true, and do a key-based intersection; I believe the function is array_intersect_key. In effect, this is an implementation of a set.
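
The same trick can be sketched in Python, with a dict standing in for the PHP array. This is only an analogue of the idea above, not PHP's array_intersect_key itself, and the function name `intersect_by_key` is hypothetical.

```python
def intersect_by_key(values1, values2):
    """Intersect two lists by storing their values as dict keys."""
    # Each value becomes a key with a dummy True, mirroring $arr[$value] = true.
    seen1 = dict.fromkeys(values1, True)
    seen2 = dict.fromkeys(values2, True)
    # Keep the keys of the first dict that also exist in the second.
    # Each "in" check is a hash lookup, not a scan of the whole collection.
    return [key for key in seen1 if key in seen2]


print(intersect_by_key([3, 1, 2, 3], [2, 3, 4]))
```

Duplicates disappear because a key can exist only once, which is exactly the set behaviour the answer describes.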

shagguboy, 2015-10-07
@shagguboy

Write it in any language (or with a library, for example Theano for Python) that supports threaded processor operations, and take into account how those operations actually work.

codercat, 2015-10-07
@codercat

I tried the same thing in Python on lyeskin's advice, though maybe I did it wrong. The result is better, of course, but I still would not call it satisfactory.

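import random

set1 = set()
set2 = set()

# Make 400,000 random draws per set (duplicates collapse, so the sets may end up smaller).
# Note: the upper bound here is 10,000,000, not 100,000,000 as in the PHP snippet.
for _ in range(400000):
    set1.add(random.randint(1000000, 10000000))
    set2.add(random.randint(1000000, 10000000))

print(set1.intersection(set2))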

SeptiM, 2015-10-08
@SeptiM

Sort both arrays and write a linear algorithm to intersect the sorted arrays. array_intersect most likely runs in quadratic time, which is why it loads everything.
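
A minimal sketch of that approach in Python: sort both lists, then walk them with two indices. The function name `intersect_sorted` is made up for illustration. It returns each common value once, which differs from PHP's array_intersect, where repeated values from the first array are kept.

```python
import random


def intersect_sorted(a, b):
    """Return the common values of two ascending-sorted lists, each value once."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1  # a[i] is too small to match anything left in b
        elif a[i] > b[j]:
            j += 1  # b[j] is too small to match anything left in a
        else:
            # Equal values: record once, then skip repeats of that value in both lists.
            value = a[i]
            result.append(value)
            while i < len(a) and a[i] == value:
                i += 1
            while j < len(b) and b[j] == value:
                j += 1
    return result


# Same sizes and value range as the PHP snippet in the question.
arr1 = [random.randint(1000000, 100000000) for _ in range(400000)]
arr2 = [random.randint(1000000, 100000000) for _ in range(400000)]
common = intersect_sorted(sorted(arr1), sorted(arr2))
print(len(common))
```

The merge step touches each element at most once, so the total cost is dominated by the two sorts.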
