Python
Kirill Petrov, 2018-07-29 12:04:22

A news site parser collects articles. How do I make sure it does not collect unwanted paragraphs?

I have a parser that walks the navigation page, collects URLs, follows each one, and copies the news article. The problem is that the articles contain nested tags whose content also gets parsed, and I don't need it. How can I skip them? The unwanted tag has the class "insert", but the strip() method doesn't help: it removes all the tags.


1 answer(s)
Mikhail Bobkov, 2018-07-29
@mike_bma

You need either regular expressions or a DOM crawler. Find the unwanted tag and remove it together with its content.
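
To illustrate the "remove the tag along with its content" idea, here is a minimal sketch that uses only the standard library's html.parser. The question does not say which parsing library is in use, so this is an assumption rather than a drop-in fix; the names ArticleTextExtractor and extract_text are made up for the example. It also assumes reasonably well-formed markup, where every non-void tag inside the unwanted block is closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ArticleTextExtractor(HTMLParser):
    """Hypothetical helper: collects text, skipping elements with an unwanted class."""

    def __init__(self, unwanted_class="insert"):
        super().__init__(convert_charrefs=True)
        self.unwanted_class = unwanted_class
        self.skip_depth = 0  # greater than 0 while inside an unwanted element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Start skipping at the unwanted element, and keep counting
        # nested tags so we know when that element is closed again.
        if self.skip_depth or self.unwanted_class in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every unwanted element.
        if not self.skip_depth:
            self.chunks.append(data)


def extract_text(html, unwanted_class="insert"):
    """Return the article text with whitespace normalized."""
    parser = ArticleTextExtractor(unwanted_class)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


sample = (
    '<div><p>First paragraph.</p>'
    '<div class="insert"><p>Read also: <a href="#">other news</a><br></p></div>'
    '<p>Second paragraph.</p></div>'
)
print(extract_text(sample))  # prints: First paragraph. Second paragraph.
```

If the parser already builds a DOM tree with a library such as BeautifulSoup, the same effect is usually achieved by finding the elements with class "insert" and removing them from the tree before extracting the text. A regular expression can work for simple pages, but it tends to break when the unwanted tag contains nested tags of the same name.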
