Why is a blank page returned when parsing a page?

PHP

ryzhak, 2014-07-20 17:01:45

I have PHP code that parses two pages:

$url1 = 'http://www.championat.com/football/_england/773/calendar/date.html';
$url2 = 'http://www.championat.com/football/_england/1042/calendar/tour.html';

// Returns an empty page
echo HtmlDomParser::file_get_html($url1);

// Returns the page with content
echo HtmlDomParser::file_get_html($url2);

So $url2 is parsed normally, but $url1 is not: an empty result is returned instead of the expected page. Why? Where should I look?
Thanks in advance.
UPD:
Found the problem. I was using simple_html_dom from this package: https://packagist.org/packages/mgargano/simplehtmldom. The file_get_html function in the class code contains these lines:

$contents = file_get_contents($url, $use_include_path, $context, $offset);
if (empty($contents) || strlen($contents) > MAX_FILE_SIZE)
{
    return false;
}

That is, if the fetched content is empty or longer than MAX_FILE_SIZE, the function returns false without parsing anything. Changing the value of the MAX_FILE_SIZE constant from 600000 to 6000000 makes everything work. Keep in mind that this change is made in the package's own source, so updating the project's dependencies with Composer will overwrite it with the new version.
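To confirm this diagnosis without editing the library, the same guard can be reproduced outside it. This is only a minimal sketch, assuming the limit is 600000 bytes as quoted above; the function name classifyContents and its return labels are hypothetical and not part of simple_html_dom:

```php
<?php
// Hypothetical helper: mirrors the guard quoted above so you can see
// which condition makes file_get_html() return false.
function classifyContents($contents, $limit = 600000)
{
    if (empty($contents)) {
        // file_get_contents() failed or returned an empty body
        return 'empty';
    }
    if (strlen($contents) > $limit) {
        // the page is larger than the parser's size limit
        return 'too_large';
    }
    return 'ok';
}

// Usage (needs network access, so it is left commented out):
// $contents = file_get_contents($url1);
// echo classifyContents($contents), ' ', strlen((string) $contents), "\n";
```

If this reports too_large for $url1 and ok for $url2, the size limit is the cause rather than the request itself.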

2 answer(s)

Vladimir Fokin, 2014-07-20
@vfokin

Check which response headers the first URL returns.
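One way to do that is PHP's built-in get_headers(). A minimal sketch follows; the helper name findHeader is hypothetical, and since a server is not obliged to send Content-Length, a null result says nothing about the page size:

```php
<?php
// Hypothetical helper: case-insensitive lookup in the array that
// get_headers($url, true) returns. After redirects a header may hold
// an array of values; the last one belongs to the final response.
function findHeader(array $headers, $name)
{
    foreach ($headers as $key => $value) {
        if (is_string($key) && strcasecmp($key, $name) === 0) {
            return is_array($value) ? end($value) : $value;
        }
    }
    return null;
}

// Usage (needs network access, so it is left commented out):
// $headers = get_headers($url1, true);
// if ($headers !== false) {
//     var_dump($headers[0], findHeader($headers, 'Content-Length'));
// }
```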

Alexey Pavlov, 2014-07-20
@lexxpavlov

Try writing this instead:

$dom = HtmlDomParser::file_get_html($url1);
var_dump($dom);

That way you can see what the call actually returns: an object of class simple_html_dom or something else.
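This matters because echo on a boolean false prints an empty string, so a failed call looks the same as an empty page. A minimal sketch of a check that makes the difference visible; describeResult is a hypothetical helper, not part of the library:

```php
<?php
// Hypothetical helper: reports what a parser call actually returned.
function describeResult($result)
{
    if ($result === false) {
        return 'false (nothing was parsed)';
    }
    if (is_object($result)) {
        return 'object of class ' . get_class($result);
    }
    return gettype($result);
}

// Usage with the call from the question (left commented out):
// echo describeResult(HtmlDomParser::file_get_html($url1));
```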