A news site parser collects articles, but how do I make sure it does not collect unnecessary paragraphs?
I have a parser that walks the navigation page, collects URLs, follows them, and copies the news articles. The problem is that the articles contain tags that also get parsed, and I don't need them. How can I skip them? The unwanted tag has the class "insert", but the strip() method doesn't help: it removes all the tags.
You need either regular expressions or a DOM crawler. Find the unwanted tag and remove it together with its content.
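As a rough illustration of the DOM-walking approach, here is a minimal sketch in Python that uses only the standard library's `html.parser`. The question does not say which language or library the parser is written in, so Python is an assumption, and the names `ArticleTextExtractor` and `extract_article_text` and the sample markup are hypothetical. It also assumes the markup is reasonably well nested. The idea is to track how deep we are inside an element with the class "insert" and to ignore all text until that element closes.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth counter.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ArticleTextExtractor(HTMLParser):
    """Collects text, skipping every element whose class list contains skip_class."""

    def __init__(self, skip_class="insert"):
        super().__init__(convert_charrefs=True)
        self.skip_class = skip_class
        self.skip_depth = 0  # greater than 0 while inside an unwanted element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            # Already inside an unwanted element: just track nesting.
            self.skip_depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_class in classes:
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def extract_article_text(html, skip_class="insert"):
    """Return the article text with unwanted elements and their content removed."""
    parser = ArticleTextExtractor(skip_class)
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join(" ".join(parser.chunks).split())


# Hypothetical sample page fragment.
sample = (
    '<div class="article"><p>First paragraph.</p>'
    '<div class="insert"><p>Read also: <a href="#">other news</a></p></div>'
    '<p>Second paragraph.</p></div>'
)
print(extract_article_text(sample))
```

If the parser is already built on BeautifulSoup, calling `decompose()` on each element matched by the "insert" class before extracting the text should achieve the same result with less code. A regular expression can work for simple, non-nested blocks, but it tends to break once the unwanted element contains nested tags of the same name.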