What is the best way to parse HTML?
Can you suggest a tool for parsing HTML?
Please don't suggest BS or pyQuery!
Apparently, the message text arrives as unicode, while the encoding of the source file is declared as ascii, so the file name.eml, opened for appending in text mode (mode='a'), also uses ascii. When the program converts unicode to ascii, it hits characters that exist in the first encoding but are missing from the second.
You can either switch the source file to unicode, or open the file that stores the messages in binary append mode (mode='ab').
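A minimal Python 3 sketch of the binary-append approach, assuming UTF-8 is an acceptable encoding for the stored messages. The helper name `append_message` is hypothetical. In binary mode the file accepts only bytes, so the text is encoded explicitly and no implicit ascii conversion can happen.

```python
import os
import tempfile


def append_message(path: str, text: str, encoding: str = "utf-8") -> None:
    """Hypothetical helper: append one message to the file in binary mode."""
    # mode='ab' appends raw bytes, so we encode the text ourselves
    # instead of relying on a default (possibly ascii) text encoding.
    with open(path, "ab") as f:
        f.write(text.encode(encoding))


# Write to a throwaway directory so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "name.eml")
append_message(path, "Привет, мир\n")
append_message(path, "Grüße\n")

# Read the bytes back and decode them with the same encoding.
with open(path, "rb") as f:
    content = f.read().decode("utf-8")
print(content)
```

The same effect can be had in text mode by passing `encoding="utf-8"` to `open()`; binary mode just makes the encoding step explicit.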
What's your aversion to BS? Especially if lxml is already installed: you simply pass it to the constructor as the parser, and BS then uses it for parsing. Personally, I liked working with BS.
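A short sketch of passing lxml to the BeautifulSoup constructor, assuming the `bs4` package is installed. If lxml is not available, this sketch falls back to Python's built-in `"html.parser"`; that fallback is an addition for illustration, not part of the answer above.

```python
from bs4 import BeautifulSoup

# Prefer lxml when it is installed; otherwise use the stdlib parser.
try:
    import lxml  # noqa: F401
    PARSER = "lxml"
except ImportError:
    PARSER = "html.parser"

html = """<html><body>
<h1>Title</h1>
<a href="/one">One</a>
<a href="/two">Two</a>
</body></html>"""

# The second constructor argument selects the underlying parser.
soup = BeautifulSoup(html, PARSER)

title = soup.find("h1").get_text()
links = [a["href"] for a in soup.find_all("a")]
print(title, links)
```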
In terms of performance, regular expressions are the fastest way to parse.
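A minimal regex sketch using only the standard `re` module, assuming the markup is simple and predictable (double-quoted `href` attributes, no `>` inside attribute values). The pattern name `LINK_RE` is illustrative. Regexes do not understand nesting or malformed HTML, so this trades robustness for speed.

```python
import re

html = '<a href="/one">One</a> <a class="x" href="/two">Two</a>'

# Match an <a> tag (not <abbr>, thanks to \b), skip other attributes,
# and capture the value of a double-quoted href attribute.
LINK_RE = re.compile(r'<a\b[^>]*\bhref="([^"]*)"', re.IGNORECASE)

links = LINK_RE.findall(html)
print(links)
```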