How to parse text with Content-Encoding: gzip?


MeinJun, 2020-06-04 16:17:19

Python


Here is the BeautifulSoup parsing function:

from bs4 import BeautifulSoup


def get_content(html):
    soup = BeautifulSoup(html, 'html.parser')
    # class is a Python keyword, so BeautifulSoup takes class_ instead
    items = soup.find_all('div', class_='column')

    news = []  # collected articles, one dict per matched block
    for item in items:
        info = {
            'title': item.find('h1', class_='page__title title').get_text(),
            'time_date': item.find('span', class_='date-display-single').get_text(),
            'text_stat': item.find('div', class_='field-item even').get_text(),
        }
        news.append(info)
    return news

And here is what is output to the console:

[{'title': 'Doctors warned about the danger of a sleepless night', 'time_date': '01.01.2020 at 17:34', 'text_stat': '01.01.2020 at 17:34'}]

That is, instead of the article text, the date is printed, even though I have checked all the classes a thousand times and everything seems to be in the right place.
In the page source, the element is marked up like this: <div class="field-item even" property="content:encoded">
In the Network tab, the response headers show Content-Encoding: gzip.

Is there any way to parse the article text in this case? And is the gzip encoding actually the snag here?
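Some context on the gzip part: Content-Encoding: gzip only describes how the response body is compressed in transit. requests decodes gzip and deflate automatically, so response.text is already plain HTML; if the body were still compressed, BeautifulSoup would see binary noise, not a neatly extracted date. Decompression only matters when you handle the raw bytes yourself. Below is a minimal sketch, assuming you have the raw body and the header value; decode_body is a hypothetical helper, and UTF-8 is an assumption.

```python
import gzip
from typing import Optional


def decode_body(raw: bytes, content_encoding: Optional[str]) -> str:
    """Return the HTML text of a response body that may be gzip-compressed.

    Hypothetical helper: only needed when your HTTP client hands you the raw,
    still-compressed bytes (requests does not; it decompresses for you).
    """
    if content_encoding and content_encoding.strip().lower() == 'gzip':
        raw = gzip.decompress(raw)
    # Assumption: the page is UTF-8; real pages may declare another charset
    return raw.decode('utf-8')


# Simulate a server response sent with Content-Encoding: gzip
html = '<div class="field-item even">Article text</div>'
compressed = gzip.compress(html.encode('utf-8'))

print(decode_body(compressed, 'gzip') == html)  # True
print(decode_body(html.encode('utf-8'), None) == html)  # True
```

Once the text is decoded, it goes to BeautifulSoup like any other HTML, so the encoding cannot change which element a selector picks.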


1 answer(s)


MeinJun, 2020-06-04
@MeinJun

Okay, sorry for wasting your time. My mistake: I got lost among the thousands of classes and missed the right one.
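For anyone landing here with the same symptom: as the answer says, the selector was the problem, not gzip. One plausible way to end up with the date in text_stat, assuming Drupal-style markup (the field-item even class and property="content:encoded" suggest it, but that is a guess), is that several field wrappers share the class field-item even, so find() returns the first one, which holds the date. Selecting by the property attribute avoids that ambiguity. The sample HTML and get_article_text below are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: every field wrapper is a <div class="field-item even">,
# so the date field and the article body share the same class.
SAMPLE_HTML = """
<div class="column">
  <h1 class="page__title title">Doctors warned about the danger of a sleepless night</h1>
  <div class="field-item even"><span class="date-display-single">01.01.2020 at 17:34</span></div>
  <div class="field-item even" property="content:encoded"><p>Article text</p></div>
</div>
"""


def get_article_text(item):
    """Return the article body, located by its property attribute."""
    body = item.find('div', attrs={'property': 'content:encoded'})
    # find() returns None when nothing matches; avoid an AttributeError
    return body.get_text(strip=True) if body else None


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
item = soup.find('div', class_='column')

# Matching on the class alone picks the first wrapper: the date field
print(item.find('div', class_='field-item even').get_text(strip=True))
# 01.01.2020 at 17:34

# Matching on the property attribute picks the article body
print(get_article_text(item))
# Article text
```

An attribute such as property="content:encoded" usually identifies the body uniquely, whereas a generic layout class like field-item even is often reused for every field on the page.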