MySQL
Ruslan72, 2014-09-18 08:03:42

Need a reliable PHP site parser?

I need to extract information from several real estate sites, which means parsing a lot of pages.
I need a flexible, simple, heavy-duty parser, ideally a ready-made solution in the form of a PHP framework.


7 answer(s)
idShura, 2016-09-12
@idShura

The question is not entirely clear: before deletion, is the updated_at field NULL, or does it contain a value?

-- Per day: rows never updated (updated_at IS NULL) vs. rows that were updated
SELECT DATE_FORMAT(`created_at`, '%y-%m-%d') AS `created_at`,
       SUM(CASE WHEN updated_at IS NULL THEN 1 ELSE 0 END) AS created_at_count,
       SUM(CASE WHEN updated_at IS NOT NULL THEN 1 ELSE 0 END) AS updated_at_count
FROM `table`
GROUP BY DATE_FORMAT(`created_at`, '%y-%m-%d')

Or like this:
-- Per day: all created rows vs. rows that were updated
SELECT DATE_FORMAT(`created_at`, '%y-%m-%d') AS `created_at`,
       COUNT(*) AS created_at_count,
       SUM(CASE WHEN updated_at IS NOT NULL THEN 1 ELSE 0 END) AS updated_at_count
FROM `table`
GROUP BY DATE_FORMAT(`created_at`, '%y-%m-%d')

Dmitry Entelis, 2014-09-18
@DmitriyEntelis

I do not recommend phpQuery or similar libraries.
On several projects we parsed ~400,000 pages daily.
I tried Simplehtml and phpQuery. On the one hand, queries are easy and convenient to write.
On the other hand, the average processing time for a 500 KB page was a few seconds, and most of that time was spent building the DOM.
Six threads loaded a powerful Xeon to 100% and consumed a huge amount of memory.
A hand-written regexp-based solution processed the same page in ~30 ms, so it took ~40 threads to fully load the processor.
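
To illustrate the regexp approach described above, here is a minimal sketch that pulls fields out of a page without building a DOM. The markup, the class names (`title`, `price`) and the function name `extractListings` are hypothetical and chosen only for this example; a real site needs its own patterns, and this is not the code used in the projects mentioned.

```php
<?php
// Hypothetical listing markup; real sites will differ.
$html = <<<HTML
<div class="listing"><h2 class="title">2-room flat</h2><span class="price">45 000</span></div>
<div class="listing"><h2 class="title">Studio</h2><span class="price">28 500</span></div>
HTML;

/**
 * Extract title/price pairs with a single regular expression.
 * No DOM tree is built, which is where the speed-up comes from.
 */
function extractListings(string $html): array
{
    // Lazy (.*?) stops at the first closing tag; the "s" flag lets "." span newlines.
    $pattern = '~<h2 class="title">(.*?)</h2>\s*<span class="price">([\d\s]+)</span>~s';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $listings = [];
    foreach ($matches as $m) {
        $listings[] = [
            'title' => trim(html_entity_decode($m[1])),
            // Strip everything except digits before casting ("45 000" becomes 45000).
            'price' => (int) preg_replace('~\D~', '', $m[2]),
        ];
    }
    return $listings;
}

$listings = extractListings($html);
print_r($listings);
```

The trade-off is fragility: the pattern is tied to the exact markup, so any change in the site's layout means rewriting it.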

Alexey Likhachev, 2014-09-18
@Playbot

Choose one here; otherwise you will have to finish the parser yourself and deal with bans almost manually.

Alex Danilin, 2014-09-18
@alexdspb

phpQuery makes it easy to parse websites using the DOM. The way it works is similar to jQuery.
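
phpQuery is a third-party library, so the sketch below does not use it. Instead it shows the same DOM-based idea with PHP's bundled DOM extension (`DOMDocument` and `DOMXPath`), assuming that extension is enabled. The markup, the `flat` class and the function name `extractLinks` are hypothetical; the XPath expression plays the role that a selector such as `li.flat a` would play in a jQuery-like API.

```php
<?php
// Hypothetical markup for illustration only.
$html = '<ul><li class="flat"><a href="/flat/1">Flat one</a></li>'
      . '<li class="flat"><a href="/flat/2">Flat two</a></li></ul>';

/**
 * Build a DOM from the HTML and collect links inside <li class="flat"> elements.
 */
function extractLinks(string $html): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // XPath counterpart of the CSS selector "li.flat a".
    $query = '//li[contains(concat(" ", normalize-space(@class), " "), " flat ")]//a';

    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
    return $links;
}

$links = extractLinks($html);
print_r($links);
```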

Novomirskoy, 2014-09-18
@Novomirskoy

I recommend one of the Zend Framework 2 components: Zend\Dom\Query.

Alexander Taratin, 2014-09-18
@Taraflex

Regular expressions are the fastest option; otherwise, look at other languages, for example D (its standard library already includes curl support).

mov2608911, 2015-11-24
@mov2608911

I would recommend using the services of a trusted company. I had an online toy store, and rather than hire someone to post products, I ordered a parser from parsing.by. They did everything relatively quickly and, most importantly, exactly as I needed. So I can recommend them.
