PHP
kapai69, 2015-02-03 12:09:53

How to organize the logic of the parser?

For example, there is a table
[image: d3e94e9174148e20126f4b1f1f0dd90d.png]
The goal is to parse the pages in small batches, say 5 at a time.
What is the easiest way to mark URLs that have already been parsed, so that the next batch can be selected?
They should be parsed endlessly: once all of them have been parsed, start again from the beginning, several times a day.

3 answer(s)
IceJOKER, 2015-02-03
@IceJOKER

Add one more column - status.
Select 5 rows that have status 0, parse them and set their status to 1. If no rows with status 0 are left, reset every status to 0 and start over.
And of course, set up cron to run the script every 5 minutes, or however often you like.
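
A minimal sketch of this approach, assuming PDO and a table `urls(id, url, status)`. The table and column names, the `nextBatch()` helper and the commented-out `parsePage()` call are hypothetical, not taken from the question:

```php
<?php
// Hypothetical schema: urls(id INTEGER PRIMARY KEY, url TEXT, status INTEGER DEFAULT 0)

/** Fetch up to $size rows that still have status 0. */
function selectPending(PDO $db, int $size): array
{
    $stmt = $db->prepare('SELECT id, url FROM urls WHERE status = 0 ORDER BY id LIMIT :n');
    $stmt->bindValue(':n', $size, PDO::PARAM_INT); // LIMIT must be bound as an integer
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

/** Parse the next batch; when nothing is pending, reset all rows and start a new pass. */
function nextBatch(PDO $db, int $size = 5): array
{
    $rows = selectPending($db, $size);
    if (!$rows) {
        $db->exec('UPDATE urls SET status = 0');
        $rows = selectPending($db, $size);
    }

    $mark = $db->prepare('UPDATE urls SET status = 1 WHERE id = :id');
    foreach ($rows as $row) {
        // parsePage($row['url']); // hypothetical: the actual parsing goes here
        $mark->execute([':id' => $row['id']]);
    }
    return $rows;
}
```

The cron script would then only need to open the connection and call `nextBatch($db)` once per run.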

entermix, 2015-02-03
@entermix

If each URL only needs to be parsed N times within a certain period, add a "created" field holding a UNIX timestamp (or use a date column that already exists). Check how much time has passed since the last check, update the timestamp after each parse, and set cron to run, for example, every minute. Then everything will stay up to date within the specified period.
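
A rough sketch of the timestamp variant, assuming PDO and a table `urls(id, url, created)` where `created` stores the UNIX time of the last check. The schema, the `parseDue()` helper and the commented-out `parsePage()` call are hypothetical:

```php
<?php
// Hypothetical schema: urls(id INTEGER PRIMARY KEY, url TEXT, created INTEGER DEFAULT 0)

/**
 * Parse up to $size URLs whose last check is at least $interval seconds old,
 * oldest first, and stamp them with the current time.
 */
function parseDue(PDO $db, int $interval, int $size = 5, ?int $now = null): array
{
    $now = $now ?? time();

    $select = $db->prepare(
        'SELECT id, url FROM urls WHERE created <= :cutoff ORDER BY created, id LIMIT :n'
    );
    $select->bindValue(':cutoff', $now - $interval, PDO::PARAM_INT);
    $select->bindValue(':n', $size, PDO::PARAM_INT);
    $select->execute();
    $rows = $select->fetchAll(PDO::FETCH_ASSOC);

    $touch = $db->prepare('UPDATE urls SET created = :now WHERE id = :id');
    foreach ($rows as $row) {
        // parsePage($row['url']); // hypothetical: the actual parsing goes here
        $touch->execute([':now' => $now, ':id' => $row['id']]);
    }
    return $rows;
}
```

With this variant no reset step is needed: a URL becomes due again on its own once `$interval` seconds have passed, so `parseDue($db, 6 * 3600)` run from cron would revisit each URL roughly four times a day.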

Quattro Vias, 2018-09-26
@Quattro_Vias

Am I the only one who doesn't understand why a plain loop is bad? (Add a pause after every 5, if you need it that way.)
The running time is the same, the result is the same... and put it in cron.
It depends on the project, on what information you are collecting, and what for.
For example, make a database table with an id and a link for each site that needs to be parsed. (The loop will go by id.)
So it comes down to: a list of links with ids + a cron task. Easy)
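
As an illustration only, such a loop might look like the sketch below. The `parseAll()` function and its parameters are hypothetical, and `$links` stands for the id => link pairs loaded from the database:

```php
<?php
/**
 * Parse every link in one run, pausing $pause seconds after each $batch links.
 * $parse is whatever function does the actual parsing.
 */
function parseAll(array $links, callable $parse, int $batch = 5, int $pause = 0): int
{
    $done = 0;
    foreach ($links as $id => $url) {
        $parse($id, $url);
        $done++;
        if ($pause > 0 && $done % $batch === 0) {
            sleep($pause); // the "timer after every 5"
        }
    }
    return $done;
}
```

Scheduling this script from cron several times a day gives the endless repetition without storing any per-URL state.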
