PHP
Ruslan Saifullin, 2015-09-12 18:21:51

Why might a parsing (simple_html_dom) timeout occur?

I'm new to parsing.
The task requires PHP, so I am using the PHP library simple_html_dom.
It is included correctly and its functions are available.

require_once 'simple_html_dom.php';
$html = file_get_html('http://www.kommersant.ru/');

Only two lines of code and there are already problems. When I open the page, it takes a long time to load; eventually the script execution time limit is exceeded and a 500 error occurs.
As I understand it, the DOM tree of this site is too large and the script cannot finish in the allotted time.
Has anyone had experience with this library?
How can I make the script finish in time?

1 answer(s)
Vitaliy Orlov, 2015-09-12
@Shapito27

As you said, this is your server's script execution time limit being exceeded.
Try running the script from the command line, where PHP's max_execution_time defaults to 0 (no limit):
# php parser.php
The second option is to raise the script execution time limit, but I wouldn't recommend it. Parsers are best run from the command line.
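
A minimal sketch of the two options above, for checking which one applies to your setup. The helper name describeTimeLimit and the value 120 are illustrative assumptions, not anything from the library or from your server's configuration; set_time_limit may also be disabled on some hosts.

```php
<?php
// Describe the PHP interface in use and the time limit that applies to it.
// A limit of 0 means "no limit", which is the default under the CLI.
function describeTimeLimit(string $sapi, int $limit): string
{
    $limitText = $limit === 0 ? 'no limit' : $limit . ' s';
    return sprintf('SAPI: %s, max_execution_time: %s', $sapi, $limitText);
}

echo describeTimeLimit(PHP_SAPI, (int) ini_get('max_execution_time')), PHP_EOL;

// Second option from the answer: raise the limit for this script only.
// 120 seconds is an arbitrary example value.
if (PHP_SAPI !== 'cli') {
    set_time_limit(120);
}
```

Run through the web server and then through "php parser.php", the first line of output shows whether the limit really differs between the two.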
In addition, the library has a function $dom = str_get_html($html) (if I remember correctly), so you can first download the page with file_get_contents or cURL and then work with the content. This separates downloading from parsing, which lets you deal with each problem on its own.
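
Roughly, the split into downloading and parsing could look like the sketch below. The helper names timed, fetchPage and loadDom are made up for illustration, the timeout values are arbitrary, and the check for a false return from str_get_html assumes the library signals a failed parse that way (for example on empty input).

```php
<?php
// Run $step, print how long it took, and return its result.
function timed(string $label, callable $step)
{
    $started = microtime(true);
    $result = $step();
    printf("%s: %.2f s\n", $label, microtime(true) - $started);
    return $result;
}

// Download a page with cURL and return its body, or throw on failure.
function fetchPage(string $url, int $timeoutSeconds = 20): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects
        CURLOPT_CONNECTTIMEOUT => 5,     // seconds allowed for connecting
        CURLOPT_TIMEOUT => $timeoutSeconds, // seconds allowed for the whole request
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Download failed: ' . curl_error($ch));
    }
    return $body;
}

// Download first, then parse, so each step is timed and can fail separately.
function loadDom(string $url)
{
    require_once 'simple_html_dom.php';
    $content = timed('download', function () use ($url) {
        return fetchPage($url);
    });
    $dom = timed('parse', function () use ($content) {
        return str_get_html($content);
    });
    if ($dom === false) {
        throw new RuntimeException('str_get_html could not parse the content');
    }
    return $dom;
}
```

Calling loadDom('http://www.kommersant.ru/') then prints one timing line per step, which shows whether the time goes into the download or into building the DOM tree.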
