How do you write large parsers for catalogs and sites in PHP?
Good afternoon!
I write parsers for XLS, CSV, and YML catalogs, and I follow links to scrape additional information from the site. The parsers handle product characteristics, create options, save images, and so on.
At first I did everything in one script that saved straight to the database. Now I split it up: first I parse the catalog, then the site, saving everything to JSON, and only after that I run a separate script that loads the JSON into the database.
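Roughly, the second stage might look like this. This is only a simplified sketch: the file, table, and column names are placeholders, and it assumes stage 1 writes one JSON object per line:

```php
<?php
// Stage 2: load the JSON produced by stage 1 into the database.
$pdo  = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
$stmt = $pdo->prepare(
    'INSERT INTO products (external_id, name, price) VALUES (?, ?, ?)'
);

$fh = fopen('offers.jsonl', 'r');   // one JSON object per line from stage 1
$pdo->beginTransaction();           // batch all inserts in one transaction
while (($line = fgets($fh)) !== false) {
    $p = json_decode($line, true);
    $stmt->execute([$p['id'], $p['name'], $p['price']]);
}
$pdo->commit();
fclose($fh);
```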
The question is this:
I would like to ask about your methods and architecture for site parsers, catalog parsers, and YML.
How do you keep memory and buffer usage down? Are there simple approaches to multi-threaded parsing, and are they needed at all? Maybe someone splits the work into loading stages, so that once the stage-1 script finishes, the second one is launched.
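On the memory question, the usual trick in PHP is to stream a large file instead of loading it whole. A minimal sketch with the built-in XMLReader, assuming a standard YML feed with `<offer>` elements; the file names are placeholders:

```php
<?php
// Stream a large YML feed offer-by-offer; memory stays flat regardless of size.
$reader = new XMLReader();
$reader->open('catalog.yml');
$dom = new DOMDocument();

// advance to the first <offer>
while ($reader->read() && $reader->name !== 'offer');

while ($reader->name === 'offer') {
    // expand() builds a DOM subtree for just this one <offer>
    $offer = simplexml_import_dom($dom->importNode($reader->expand(), true));
    file_put_contents('offers.jsonl', json_encode([
        'id'  => (string) $offer['id'],
        'url' => (string) $offer->url,
    ]) . PHP_EOL, FILE_APPEND);
    $reader->next('offer');   // jump to the next <offer>, freeing this subtree
}
$reader->close();
```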
In general, I'd like some fresh information in this area, whatever anyone can recommend. I'm not a fan of ready-made libraries from GitHub; I'd rather write everything myself and understand every line of code.
Right now the task is to parse a YML feed with 6k products and, for each product, follow the link to its page on the site and save the description and image links from there. How can I speed up the import and reduce the load?
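For the 6k-product case, the standard way to speed up fetching in plain PHP without threads is curl_multi, which runs several HTTP requests concurrently. A sketch that fetches URLs in small batches; the batch size, pause, and `$productUrls` variable are assumptions:

```php
<?php
// Fetch a batch of product pages concurrently with curl_multi.
function fetchBatch(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_TIMEOUT        => 10,
        ]);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh);   // wait for activity instead of busy-looping
    } while ($running > 0);

    $pages = [];
    foreach ($handles as $url => $ch) {
        $pages[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $pages;
}

// Process 6k URLs 10 at a time; pausing between batches keeps the load polite.
foreach (array_chunk($productUrls, 10) as $batch) {
    $pages = fetchBatch($batch);
    // ... extract descriptions and image links, append to JSON ...
    usleep(200000);   // 0.2 s pause between batches
}
```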
It is worth thinking about using queues and building an architecture around them.
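For example, a minimal queue split might look like this, assuming Redis via the phpredis extension; the queue name and payload shape are made up:

```php
<?php
// producer.php — pushes product URLs onto a Redis list used as a queue.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
foreach ($productUrls as $url) {      // $productUrls comes from the YML stage
    $redis->rPush('parse:urls', $url);
}
```

```php
<?php
// worker.php — run several copies in parallel for rough "multi-threading".
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

while ($item = $redis->blPop(['parse:urls'], 5)) {   // blocks up to 5 s
    [$queue, $url] = $item;           // blPop returns [queueName, value]
    $html = file_get_contents($url);  // real code would use cURL with timeouts
    // ... extract the description and image links, write JSON for the DB stage ...
}
```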
This wheel was invented ages ago...
1. If possible, run the parser on a separate host.
2. A console parser is free from some restrictions, such as the web request time limit.
3. The parser parses, the model stores, and you're golden :) (a sketch of this split follows below).
4. If possible, take the data not from the front end: use the sitemap, price feeds, and AJAX controllers that return JSON.
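A minimal PHP 8 sketch of point 3, with hypothetical class names and selectors: the parser only extracts data, the model only persists it.

```php
<?php
class ProductParser
{
    /** @return array{name: string, price: float} */
    public function parse(string $html): array
    {
        $doc = new DOMDocument();
        @$doc->loadHTML($html);   // suppress warnings from real-world markup
        $xpath = new DOMXPath($doc);
        return [
            'name'  => trim($xpath->evaluate('string(//h1)')),
            'price' => (float) $xpath->evaluate('string(//*[@class="price"])'),
        ];
    }
}

class ProductModel
{
    public function __construct(private PDO $pdo) {}

    public function save(array $product): void
    {
        $stmt = $this->pdo->prepare(
            'INSERT INTO products (name, price) VALUES (:name, :price)'
        );
        $stmt->execute($product);
    }
}
```

Keeping these two classes ignorant of each other means the parser can be tested on saved HTML fixtures, and the storage layer can be swapped (JSON file, queue, database) without touching the parsing code.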