PHP
PO6OT, 2015-11-09 20:06:01

Where can I find a PHP crawler library with 3 simple functions, or which existing project is easiest to cut one out of?

Please recommend a library with 3 functions: index a page, get the list of pages to be indexed, and get the result of a search query.
It should work without PCNTL, semaphores, or other special requirements (that is, without the loops that run the indexer endlessly). MySQL is of course needed.
Nothing I have found fits:
phpcrawl.cuab.de:
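
To make the request concrete, here is a minimal sketch of what such a three-function library could look like. It is not taken from any existing project: the function names, the `pages` table, and the helper `crawler_enqueue` are all hypothetical, and it assumes PHP 7.1+ with PDO. Text and link extraction are deliberately naive (regular expressions and `strip_tags`, absolute links only), and search is a plain `LIKE` match rather than a real full-text index.

```php
<?php
// Hypothetical schema assumed by this sketch (works on MySQL):
//   CREATE TABLE pages (
//     url     VARCHAR(255) PRIMARY KEY,
//     body    TEXT,
//     indexed TINYINT NOT NULL DEFAULT 0
//   );

/** Helper: queue a URL for indexing unless it is already known. */
function crawler_enqueue(PDO $db, string $url): void
{
    $seen = $db->prepare('SELECT COUNT(*) FROM pages WHERE url = ?');
    $seen->execute([$url]);
    if ((int) $seen->fetchColumn() === 0) {
        $db->prepare('INSERT INTO pages (url, indexed) VALUES (?, 0)')->execute([$url]);
    }
}

/** Function 1: store the text of one page and queue the absolute links it contains. */
function crawler_index_page(PDO $db, string $url, string $html): void
{
    crawler_enqueue($db, $url);
    // Naive text extraction: drop script/style blocks, then strip all tags.
    $text = preg_replace('~<(script|style)\b.*?</\1>~is', ' ', $html);
    $text = strip_tags(str_replace('<', ' <', $text));
    $text = trim(preg_replace('/\s+/', ' ', html_entity_decode($text)));
    $db->prepare('UPDATE pages SET body = ?, indexed = 1 WHERE url = ?')->execute([$text, $url]);
    // Only absolute http(s) links are queued; relative URLs are ignored here.
    preg_match_all('~href\s*=\s*["\'](https?://[^"\'#\s]+)~i', $html, $m);
    foreach (array_unique($m[1]) as $link) {
        crawler_enqueue($db, $link);
    }
}

/** Function 2: list URLs that are queued but not yet indexed. */
function crawler_pending_pages(PDO $db, int $limit = 10): array
{
    $stmt = $db->prepare('SELECT url FROM pages WHERE indexed = 0 ORDER BY url LIMIT ' . (int) $limit);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

/** Function 3: return URLs of indexed pages whose text contains the query. */
function crawler_search(PDO $db, string $query): array
{
    // LIKE wildcards (% and _) inside $query are not escaped in this sketch.
    $stmt = $db->prepare('SELECT url FROM pages WHERE indexed = 1 AND body LIKE ? ORDER BY url');
    $stmt->execute(['%' . $query . '%']);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

Fetching is kept out of `crawler_index_page` on purpose, so one request can index one page with something like `crawler_index_page($db, $url, file_get_contents($url))`, and no loop is needed inside the library.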

In order to run phpcrawl in multi-process-mode, some additional requirements are needed:
The multi-process mode only works on unix-based systems (linux)
Scripts using the crawler in multi-process-mode have to be run from the commandline (PHP cli)
The PCNTL-extension for php (process control) has to be installed and activated.
The SEMAPHORE-extension for php has to be installed and activated.
The POSIX-extension for php has to be installed and activated.
The PDO-extension together with the SQLite-driver (PDO_SQLITE) has to be installed and activated.

Not suitable.
www.sphider.eu is too bulky, although it does exactly what I need. Cutting out just the parts I need from it is very difficult.
The catch is that the loop that indexes the pages will not be a for or while loop, because I am going to run everything on shared hosting where set_time_limit is prohibited. Instead it is organized in an unusual way: the script runs, pings itself, and exits once that ping has started the next copy. This means either changing the architecture of the engine or, much easier, using a simple library and indexing one page per iteration.
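
The self-ping scheme described above could be sketched roughly as follows. Everything here is an assumption for illustration: the names `build_self_ping`, `ping_self`, and `run_step` are hypothetical, the three callables stand in for whatever library supplies the queue and the indexer, and the ping is sent over plain HTTP with `fsockopen`.

```php
<?php
/** Build the raw HTTP request that re-invokes this script. */
function build_self_ping(string $host, string $path): string
{
    return "GET {$path} HTTP/1.1\r\nHost: {$host}\r\nConnection: close\r\n\r\n";
}

/** Send the request and close the socket without reading the response. */
function ping_self(string $host, string $path, int $port = 80): bool
{
    $sock = @fsockopen($host, $port, $errno, $errstr, 2);
    if ($sock === false) {
        return false; // could not connect: the chain stops here
    }
    fwrite($sock, build_self_ping($host, $path));
    fclose($sock);
    return true;
}

/**
 * One iteration: take one URL, index it, then start the next copy.
 * Returns false when the queue is empty, which ends the chain.
 */
function run_step(callable $nextUrl, callable $indexOne, callable $ping): bool
{
    $url = $nextUrl();
    if ($url === null) {
        return false;
    }
    $indexOne($url);
    $ping();
    return true;
}
```

Because the pinging copy closes its connection immediately, the script that receives the ping would probably need to call `ignore_user_abort(true)` at the top so the server does not stop it when the client disconnects. Each request then handles a single page and stays well within the default execution time limit.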

1 answer(s)
Cat Anton, 2015-11-09
@27cm

https://github.com/search?l=PHP&q=php+crawler&type...
A search engine on shared hosting? Who needs that anyway?
