PHP
billybons2006, 2015-03-30 01:20:35

What is the best way to compare a database with a text file (article number / price, more than 2000 lines)?

I'm sorry for the simple question, but I couldn't phrase it more precisely: what is the best way (so as not to overload the hosting) to compare prices from a text file (2000-3000 lines) with prices in the database, matching by article number?
I've already read the article numbers and prices from the file into an array:
$data[0]['article']
$data[0]['price']
And then, how do I compare the database with the resulting array?
Surely not by running a new SELECT for each article from the file, like this:

foreach ($data as $row) {
  // naturally, this is just the idea, not a real query
  SELECT * FROM table WHERE article LIKE '%{$row['article']}%'
  ...

  // and then whatever is needed: update the price or not, depending on the conditions...

  if (price_from_db > $row['price'])
    UPDATE table SET price={$row['price']} WHERE article='{$row['article']}'
}

Or is that really the only way? That would be 2000-3000 queries, which seems like nonsense.
I'm a bit worried whether this would strain the hosting, and more generally whether the whole idea is wrong at the root.
2000 lines may seem like nothing now, but in a year they could grow to 10,000...


4 answers
FanatPHP, 2015-03-30
@FanatPHP

A few points, purely for information, so you can start to get your bearings a little. Because, like all beginners, you are worrying about the wrong things.
1. 2000 rows is not a database, it's a sandbox with toy sand pies. So is 10,000. You can start thinking about load somewhere around hundreds of thousands of rows.
2. The worst thing in your reasoning is LIKE, which you treat as the normal way to search by article number. With ideas like that, even a small database can be brought to its knees.
3. The standard approach in such cases is to read the file into an array with the article numbers as keys. Then you select all the rows from the database and, reading them one by one, look each article up in the array (see the sketch after this list).
4. It is also unclear why, as stated, you need to fetch and compare at all. Why not just update the 2000 rows in a loop, putting the condition into the query itself?
5. And why does your question say nothing about deleted/added rows? If you don't need to track those, the task is trivial. Or have you not thought about that yet?
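A minimal sketch of point 3, assuming a hypothetical prices.txt with one "article;price" pair per line, the thread's placeholder table name, and a made-up DSN:

<?php
// Read the file into an array with the article numbers as keys.
// Assumed format: one "article;price" pair per line (hypothetical file name).
$prices = [];
foreach (file('prices.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
    [$article, $price] = explode(';', $line, 2);
    $prices[trim($article)] = (float) $price;
}

$pdo = new PDO('mysql:host=localhost;dbname=shop;charset=utf8', 'user', 'pass'); // hypothetical DSN
$update = $pdo->prepare('UPDATE `table` SET price = ? WHERE article = ?');

// Stream the table once; each array lookup is O(1), so the whole
// comparison is a single pass instead of one SELECT per file line.
foreach ($pdo->query('SELECT article, price FROM `table`') as $row) {
    $article = $row['article'];
    if (isset($prices[$article]) && $prices[$article] < $row['price']) {
        $update->execute([$prices[$article], $article]);
    }
}

This does one SELECT plus one prepared UPDATE per changed row, and the condition mirrors the question: update only when the file price is lower than the one in the database.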

Dmitry Skogorev, 2015-03-30
@EnterSandman

Well, you're not going to run this every minute, are you?
It's not a heavy load; it's perfectly normal.

Dmitry Evgrafovich, 2015-03-30
@Tantacula

php.ru/manual/ref.array.html
PHP does, of course, eat a lot of memory for arrays, but not so much that it's worth pulling rows from the database one at a time just to save memory. Take all the rows from the database in one query, convert them into the array you need, process it, then update the database with a second query. When the number of elements in the array exceeds a hundred thousand, read this article habrahabr.ru/post/141093 and google around; there are some pretty interesting optimization tips on the subject... although after all of them the only reasonable option seems to be a Python script. But that's only once the arrays approach a million elements.
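For the "second query" step, one option (a sketch, not the answer's prescribed method) is a single batched UPDATE built from the processed array; the table and column names are the thread's placeholders, and $changed stands in for the result of the comparison:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=shop;charset=utf8', 'user', 'pass'); // hypothetical DSN

// Assumed outcome of processing the in-memory array: article => new price.
$changed = ['A-100' => 9.99, 'B-205' => 14.50];

// Build "price = CASE article WHEN ? THEN ? ... END" for all changed rows.
$when = str_repeat(' WHEN ? THEN ?', count($changed));
$params = [];
foreach ($changed as $article => $price) {
    $params[] = $article;
    $params[] = $price;
}
$in = implode(',', array_fill(0, count($changed), '?'));
$params = array_merge($params, array_keys($changed));

$sql = "UPDATE `table` SET price = CASE article{$when} END WHERE article IN ({$in})";
$pdo->prepare($sql)->execute($params);

That is one round-trip instead of thousands; for very large arrays, split $changed into chunks so the statement stays under MySQL's max_allowed_packet.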

Anton Korzunov, 2015-03-30
@kashey

Even if such a query takes 5 minutes, that is perfectly normal for an operation performed once a day.
You can improve the situation with stored procedures, or by using https://php.net/manual/ru/mysqli.multi-query.php
On the other hand, you have completely overlooked indexes, and the conditions in your first and second queries are different (LIKE vs. =).
What you can do to make it "very good" (see the SQL sketch after this list):
1. Give the article field the type CHAR(exact size). In general, "fixed" tables, where the row size can always be computed in advance, perform better.
2. Add a UNIQUE index on article. It is unique, isn't it?
3. Walk through the "new" records and run INSERT INTO table(...) ... ON DUPLICATE KEY UPDATE price=newprice. In other words, insert the rows.
3.1 If new rows never appear at all, do plain updates only.
3.2 If they do, add one more field, "dirty". Set it to 0 at the start, and to 1 in every insert and update. When the import finishes, the rows that were not touched will still have 0.
4. Also keep in mind that string indexes work poorly, no matter how you query them. You can add one more field, crc=CRC32(article), which turns the string into a number.
4.1 The UNIQUE index then goes on two fields: crc (first), article.
4.2 In WHERE expressions you can write WHERE crc=CRC32(?) AND article=? - this works very much faster, and without collisions.
In general, you don't need to worry about it too much; for example, update the data head-on on a local machine and then upload the dumps.
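A SQL sketch of points 1-4 above; the table name follows the thread's placeholder, and the column sizes are assumptions:

-- Points 1, 2, 4 and 4.1: fixed-width article, numeric crc, composite UNIQUE key.
CREATE TABLE `table` (
    article CHAR(20)      NOT NULL,            -- point 1: assumed exact size
    price   DECIMAL(10,2) NOT NULL,
    crc     INT UNSIGNED  NOT NULL,            -- point 4: CRC32(article)
    dirty   TINYINT       NOT NULL DEFAULT 0,  -- point 3.2
    UNIQUE KEY uniq_crc_article (crc, article) -- point 4.1
);

-- Point 3.2: reset the flag before each import.
UPDATE `table` SET dirty = 0;

-- Point 3: one upsert per file row (can be batched as a multi-row INSERT).
INSERT INTO `table` (article, price, crc, dirty)
VALUES ('A-100', 9.99, CRC32('A-100'), 1)
ON DUPLICATE KEY UPDATE price = VALUES(price), dirty = 1;

-- Point 4.2: lookups hit the numeric prefix of the index first.
SELECT price FROM `table` WHERE crc = CRC32('A-100') AND article = 'A-100';

-- After the import, rows still at dirty = 0 are the ones absent from the file.
SELECT article FROM `table` WHERE dirty = 0;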
