Which parser is the fastest?

PHP

hrvasiliy, 2015-06-05 15:55:43

I plan to fetch information from sites with cURL or PhantomJS (if you have a better suggestion, I will be glad to hear it), and the result then needs to be processed with something; regular expressions are not an option. I used PHP Simple HTML DOM before, but that library cannot handle huge pages, and I'm not sure about its speed. Could you recommend a powerful yet lightweight library for processing the fetched data?

3 answer(s)

evnuh, 2015-06-05
@hrvasiliy

The task is to parse an HTML page quickly. Let me assume one condition: it is a specific page whose structure is known and more or less stable. In that case the answer is obvious: take the HTML as text and work with it as text, using indexOf, substr, etc. This is the fastest option.
Next in terms of performance comes regexp: it is more convenient to write, but no more correct than picking the string apart by hand.
The best way is to use a library. The performance loss here is enormous, but it is correct and safe.
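
For illustration, the plain-text approach could look roughly like this in PHP, where `strpos` plays the role of indexOf. This is only a sketch: the helper name `extractBetween`, the sample markup, and the markers are all made up for the example, and it assumes the markers appear exactly as written on the page.

```php
<?php
// Hypothetical helper: returns the text between two known markers,
// or null if either marker is missing.
function extractBetween(string $html, string $start, string $end): ?string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);          // skip past the opening marker
    $to = strpos($html, $end, $from); // look for the closing marker after it
    if ($to === false) {
        return null;
    }
    return substr($html, $from, $to - $from);
}

// Made-up sample page; a real page would come from cURL or similar.
$html = '<html><body><h1 class="title">Hello</h1></body></html>';
$title = extractBetween($html, '<h1 class="title">', '</h1>');
echo $title, PHP_EOL;
```

It breaks as soon as the site changes an attribute or even the whitespace inside the tag, which is the price of the speed.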

beduin01, 2015-06-05
@beduin01

forum.dlang.org/thread/[email protected]

impass, 2015-06-06
@impass

If the pages being processed are not tens or hundreds of megabytes in size and the available memory is not severely limited, then the standard DOMDocument, which uses native libxml, should be enough.
When it comes to selecting data from XML/HTML, always think of XPath first. In PHP, DOMXPath comes in handy in conjunction with DOMDocument.
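
A minimal sketch of that combination, assuming the page is already in a string. The sample markup and the `//li/a` expression are invented for the example; `libxml_use_internal_errors` is used only to keep libxml quiet about the sloppy markup that real pages tend to have.

```php
<?php
// Made-up sample markup; in practice this would be the fetched page.
$html = '<html><body><ul>'
      . '<li><a href="/a">First</a></li>'
      . '<li><a href="/b">Second</a></li>'
      . '</ul></body></html>';

// Collect libxml parse warnings internally instead of emitting them.
$previous = libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Select every link inside a list item and map href => link text.
$xpath = new DOMXPath($doc);
$links = [];
foreach ($xpath->query('//li/a') as $a) {
    $links[$a->getAttribute('href')] = trim($a->textContent);
}
print_r($links);
```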