PHP
Insayt, 2014-04-06 19:48:40

Is phpQuery acting weird?

I'm trying to parse the Habr page habrahabr.ru/company/genue/blog/210610 using phpQuery.
The code looks like this:

$url = 'http://habrahabr.ru/company/genue/blog/210610/';
$html = file_get_contents($url);
$doc = phpQuery::newDocument($html);
echo $doc;

But the output contains only a fragment of the page (screenshot: cMfvV3fR.png). If I parse one specific div, for example the one containing the article, everything works fine. Only the whole document refuses to parse.
Why I need all this:
The task is to parse an arbitrary page, insert a certain tag into the markup, and save the markup to the database. phpQuery manipulates the markup easily, but the parsing itself turned out to be the snag.

5 answer(s)
Insayt, 2014-04-06
@Insayt

In short, the issue is resolved.
Apparently phpQuery chokes on a large number of DOM nodes, so I did this:
1.) Parse only the head content of the remote site with phpQuery
2.) In my case, add the desired tag to the head
3.) Get the body content with a regular expression (borrowed from an English-language site)
4.) Glue the two parts together and output the result
As a result, we have something like this:

$url = 'http://habrahabr.ru/company/genue/blog/210610/';
$html = file_get_contents($url);
$doc = phpQuery::newDocumentHTML($html);
$doc['head']->prepend('<base href="'.$url.'" target="_blank"></base>');

preg_match("/<body[^>]*>(.*?)<\/body>/is", $html, $matches);

$new = '<html><head>'.$doc['head'].'</head><body>'.$matches[1].'</body></html>';

echo $new;
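
A minimal sketch of steps 3 and 4 with a guard for pages where the regular expression finds no body. The helper name extractBody and the sample markup are hypothetical and are not part of phpQuery or of the answer above; the pattern is the same one used in the preg_match call.

```php
<?php
// Hypothetical helper: returns the inner markup of <body>, or null when
// the pattern does not match (no body tag, or a PCRE error).
function extractBody(string $html): ?string
{
    // Same pattern as in the answer: case-insensitive, dot matches newlines.
    if (preg_match('/<body[^>]*>(.*?)<\/body>/is', $html, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

// Sample markup, made up for illustration.
$html = "<html><head><title>t</title></head><BODY class=\"x\">\n<p>Hello</p>\n</BODY></html>";

$body = extractBody($html);
if ($body === null) {
    echo "No body found\n";
} else {
    echo '<html><head></head><body>' . $body . '</body></html>';
}
```

Without the check, a page with no matching body would leave $matches[1] undefined and the glued document would silently come out with an empty body.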

Eugene, 2014-04-06
@Nc_Soft

<?php
include 'phpQuery.php';
$url = 'http://habrahabr.ru/company/genue/blog/210610/';
$html = file_get_contents($url);
$doc = phpQuery::newDocument($html);
echo pq('html')->html();

Nikolay, 2016-08-20
@zzzmaikzzz

Also add a call to phpQuery::unloadDocuments();

Sergey Protko, 2014-04-06
@Fesor

Check what file_get_contents returns; the problem may already be at that stage.
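
One way to run that check, as a rough sketch. The helper name fetchHtml is hypothetical, and the demonstration reads a local temporary file so that it runs offline; for the real page you would pass the URL from the question instead.

```php
<?php
// Hypothetical helper: fetches a URL or file and fails loudly instead of
// passing false or an empty string on to the parser.
function fetchHtml(string $source): string
{
    // file_get_contents returns false on failure; @ silences the warning
    // so the failure is reported through the exception instead.
    $html = @file_get_contents($source);
    if ($html === false) {
        throw new RuntimeException("Could not read: $source");
    }
    if (trim($html) === '') {
        throw new RuntimeException("Empty response from: $source");
    }
    return $html;
}

// Demonstration on a local temporary file (made-up content).
$tmp = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($tmp, '<html><body><p>ok</p></body></html>');

$html = fetchHtml($tmp);
echo strlen($html), " bytes read\n";
echo substr($html, 0, 200), "\n"; // inspect the start of the raw markup
unlink($tmp);
```

If the byte count and the first characters look like the full page, the download is fine and the truncation happens inside the parser. Note that reading an http:// URL this way requires allow_url_fopen to be enabled in the PHP configuration.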

Eugene, 2014-04-06
@Nc_Soft

Another option

<?php
include 'phpQuery.php';
$url = 'http://habrahabr.ru/company/genue/blog/210610/';
$html = file_get_contents($url);
$doc = phpQuery::newDocument($html);
echo (string)$doc;
