How to speed up updating 700K records from a file?

K

kaxa32012019-06-08 10:12:57

PHP

kaxa3201, 2019-06-08 10:12:57

There is a file with about 700,000 records, I need to periodically check it, if there are changes, then I update to the database, if not, then I write to the database. We need to speed up this process somehow. and I don't understand how to keep this case in my cases. stuck on this moment. do not judge strictly I am a beginner. I hope the code explains better.

DB::table('resources')->select('hash')->orderBy('id')->chunk(50000,
            function ($resources) use ($file, $updatedDate) {
                $lineCount = 1;
                echo 1 . '<>';

                while (!feof($file)) {
                    $line = iconv('cp1251', 'utf-8', fgets($file));
                    $csv = str_getcsv($line, ';');

                    if (count($csv) === 6) {
                        $ipPool = explode('|', $csv[0]);
                        foreach ($ipPool as $ip) {
                            $date = new \DateTime($csv[5]);
                            $hash = md5($csv[1] . $csv[2]);

                            foreach ($resources as $resource) {
                                if ($hash === $resource->hash) {
                                    DB::table('resources')->where('hash', $hash)->update([
                                        'version_date' => $updatedDate,
                                    ]);
                                    echo $lineCount++ . "<br>"  ;
                                }
                                // тут нужно записать если такой записи нет
                            }
                        }
                    }

                    $lineCount++;
                }
            });

Reply

Answer the question

In order to leave comments, you need to log in

2 answer(s)

D

David, 2019-06-08
@kaxa3201

Firstly, why are you pulling out records of exactly 50,000? The sensible figure is random.
In short, super optimization, I guarantee an increase many times over - get all the records from the database, collect a temporary hashmap, while removing unnecessary orderBy that you do not need

$map = [];
DB::table('resources')->select('hash')->chunk(50000,
            function ($resources) {
                    $map[$resource->hash] = $resource; // сюда можнок класть только действительно необходимые данные, чтобы память не засрать
                }
            });

then iterate over the entire file

while (!feof($file)) {
                    $line = iconv('cp1251', 'utf-8', fgets($file));
                    $csv = str_getcsv($line, ';');

                    if (count($csv) === 6) {
                        $ipPool = explode('|', $csv[0]);
                        foreach ($ipPool as $ip) {
                            $date = new \DateTime($csv[5]);
                            $hash = md5($csv[1] . $csv[2]);
                            
                            if (isset($map[$hash]) {
                                // запись существует, делаете апдейт
                            } else {
                                // записи нет, добавляете в свою таблицу
                            }
                        }
                    }

Just imagine that you have a perfect hash function with no collisions...

A

Alex-1917, 2019-06-08
@alex-1917

first the file is downloaded from another source, then I need to check the records in those in the file with their records,

here is the key!
if I understood correctly from your chaotic text - you are comparing two FILES ?? oooh
WHY?
faster and easier and more conducive to automation - compare in DB!