Python
Vladislav, 2014-07-11 09:35:30

What is the best way to parse html?

Please recommend a tool for parsing HTML.
Don't suggest BS or pyQuery!

8 answer(s)
Valery Ryaboshapko, 2016-03-24
@valerium

Apparently the message text arrives as unicode, while the encoding of the source file is set to ASCII, so the file name.eml, opened for appending in text mode (mode='a'), is also opened as ASCII. When the program tries to convert unicode to ASCII, it hits characters that exist in the first encoding but are missing from the second.
You can either switch the source file to a Unicode encoding, or open the file that stores the messages in binary append mode (mode='ab').
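
The second option can be sketched roughly as follows. This is a minimal illustration in Python 3, not the original poster's code: the helper name `append_message`, the choice of UTF-8, and the temporary directory are all assumptions made for the example.

```python
import os
import tempfile


def append_message(path, text, encoding="utf-8"):
    """Append a Unicode message to a file opened in binary append mode.

    Hypothetical helper; UTF-8 is an assumed encoding for the stored bytes.
    """
    # Encode explicitly, so no implicit conversion to ASCII can happen.
    with open(path, "ab") as f:
        f.write(text.encode(encoding))


# Write into a temporary directory so the example has no side effects.
path = os.path.join(tempfile.mkdtemp(), "name.eml")
append_message(path, "Привет, мир\n")
append_message(path, "Second message\n")
```

Because the file is opened with mode='ab', the non-ASCII characters are written as whatever bytes the explicit `encode` call produces, regardless of the default text encoding.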

ring0za, 2014-07-27
@Hateman31

Why the allergy to BS? Especially if lxml is already installed: you simply pass it as a parameter to the constructor, and BS uses it as its parser from then on. Personally, I liked working with BS.

Andrey K, 2014-07-11
@mututunus

lxml.de/lxmlhtml.html

RPG, 2014-07-11
@RPG

beautifulsoup

Dmitry Entelis, 2014-07-11
@DmitriyEntelis

From a performance point of view, it is best to parse with regular expressions.
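
A minimal sketch of what that might look like, assuming the markup is simple and predictable. The pattern `HREF_RE` and the helper `extract_links` are made up for this illustration; the pattern only handles double-quoted href attributes on <a> tags and is not a general HTML parser.

```python
import re

# Assumed pattern: matches only double-quoted href attributes on <a> tags.
HREF_RE = re.compile(r'<a\s[^>]*?href="([^"]*)"', re.IGNORECASE)


def extract_links(html):
    """Return the href values of all <a> tags matched by HREF_RE."""
    return HREF_RE.findall(html)


sample = '<p><a href="/one">One</a> and <a class="x" href="/two">Two</a></p>'
links = extract_links(sample)
```

With markup this regular the pattern is enough; for single-quoted attributes, nested or broken tags it would have to be extended.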

Heafy, 2014-07-11
@Heafy

html.parser - python3
HTMLParser - python2
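
For the Python 3 module, a small sketch of how it is typically used: subclass `HTMLParser` and override the handler methods. The class name `LinkCollector` and the sample markup are invented for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (illustrative subclass)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


collector = LinkCollector()
collector.feed('<p><a href="/one">One</a> <a href="/two">Two</a></p>')
collector.close()
```

After `feed` and `close`, the collected values are available in `collector.links`.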

Ranwise, 2014-07-11
@Ranwise

grab

skomoroh, 2014-07-13
@skomoroh

lxml
