Will curl_multi improve performance in my case, and what are the ways to optimize parsing?
Hello,
There is a server with a large number of IP addresses and a small PHP 7 parser script. There are also several online stores with a huge assortment of roughly 15,000 items, where one item corresponds to one page. When a visitor opens a product page and there is no up-to-date data in the cache, a request is sent to the parser, which processes it and returns the result; that result is then cached for 30 minutes.
The parser script itself is straightforward: it takes a product ID, scrapes the required information with cURL and returns it. At the moment the parsing server copes with its task, but I can't shake the feeling that its work could somehow be optimized. I have seen examples with curl_multi, but those work over an already prepared list of URLs across which the requests are parallelized. I also saw an example with pthreads, which doesn't seem right to me either. Dear experts, what other performance optimizations could be applied here? :)
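Roughly, the on-demand flow described here might look like the sketch below. It is only an illustration under assumptions: the APCu cache, the supplier URL and the parse_product() helper are made up, not the actual script.

```php
<?php
// Minimal sketch of the current on-demand flow: check a 30-minute cache,
// otherwise fetch the product page synchronously with cURL and parse it.
// APCu, the URL and parse_product() are assumptions for illustration.

const CACHE_TTL = 1800; // 30 minutes

function get_product_data(string $productId): array
{
    $key = "product:$productId";

    // Serve from cache while the data is still fresh.
    $cached = apcu_fetch($key, $found);
    if ($found) {
        return $cached;
    }

    // Cache miss: fetch and parse the page synchronously.
    $ch = curl_init("https://supplier.example.com/product/$productId"); // assumed URL
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 10,
    ]);
    $html = curl_exec($ch);
    curl_close($ch);

    $data = parse_product($html); // hypothetical HTML-parsing helper

    apcu_store($key, $data, CACHE_TTL);
    return $data;
}
```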
Have you tried doing this not at the moment of the user's request (visiting the page), but with a background script that checks freshness itself and requests the data from the third-party server on its own?
I see the following bottleneck:
Suppose 100 users visit the site at the same time and all of their requests hit the same (or different) pages whose data has already expired. Will you send 100 requests to the external server and then refresh the cache 100 times? And what if the service slows down, say to 0.5 seconds per request, and you saturate your Internet channel?
Suggested option:
You fetch the page and store it in the cache for, say, 1 day, while recording an "expires" timestamp of 30 minutes / 1 hour on the data. A background script checks those pages for freshness every 30 minutes / 1 hour and, if a page has actually changed, updates the data in the cache and moves the expiry forward. In this script you can use curl_multi, threads/processes and whatever else you like to speed things up (you could even write it in Go); a sketch follows below. If for some reason the script has not reached a page for a whole day, it sends a notification to the administrator, and the page is fetched on demand as in the current scheme (and still saved in the cache).
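For the background refresher, PHP's curl_multi_* API lets the expired pages be fetched in parallel with a bounded number of concurrent requests. A minimal sketch under assumptions: $expiredUrls is a list your cron job builds from the expired entries, and store_in_cache() is a hypothetical helper.

```php
<?php
// Background refresh of expired pages with curl_multi (PHP 7).
// $expiredUrls and store_in_cache() are assumptions for illustration.

function refresh_pages(array $expiredUrls, int $concurrency = 20): void
{
    $mh     = curl_multi_init();
    $queue  = $expiredUrls;
    $active = [];

    // Take the next URL from the queue and attach it to the multi handle.
    $add = function () use (&$queue, &$active, $mh) {
        if (!$queue) {
            return;
        }
        $url = array_shift($queue);
        $ch  = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT        => 10,
        ]);
        curl_multi_add_handle($mh, $ch);
        $active[(int) $ch] = $url; // resource id as key (PHP 7)
    };

    // Prime the pool with $concurrency parallel requests.
    for ($i = 0; $i < $concurrency; $i++) {
        $add();
    }

    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh, 1.0);

        // Collect finished transfers and top the pool back up from the queue.
        while ($info = curl_multi_info_read($mh)) {
            $ch  = $info['handle'];
            $url = $active[(int) $ch];
            unset($active[(int) $ch]);

            if ($info['result'] === CURLE_OK) {
                store_in_cache($url, curl_multi_getcontent($ch)); // hypothetical
            }
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
            $add();
        }
    } while ($running || $active || $queue);

    curl_multi_close($mh);
}
```

Run something like this from cron every 30 minutes / 1 hour; the $concurrency cap keeps you from hammering the external service with all 15,000 requests at once.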
In this case, no matter how many users open a page, they always see up-to-date data, and in the worst case you fall back to the current on-demand version and then figure out what went wrong (ideally the case of a dead background script should never happen, but if it does, there is a fallback).
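The read path in this scheme then only touches the cache and falls back to a synchronous fetch when even the 1-day copy is gone. Again a sketch with assumed helper names (cache_get/cache_set, notify_admin, fetch_and_parse):

```php
<?php
// Read path for the scheme above: serve from the long-lived cache kept
// fresh by the background job; fetch synchronously only if the entry is
// missing entirely. All helper names are assumptions.

function get_product(string $productId): array
{
    $entry = cache_get("product:$productId"); // stored for ~1 day by the refresher

    if ($entry !== null) {
        return $entry['data']; // possibly a few minutes stale, but always present
    }

    // Fallback: the background script missed this page for over a day.
    notify_admin("cache miss for product $productId"); // hypothetical alert
    $data = fetch_and_parse($productId);               // same synchronous path as before
    cache_set("product:$productId", ['data' => $data], 86400);

    return $data;
}
```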