Python
r4khic, 2019-09-04 10:57:34

How do I parse dates from a website?

Greetings, everyone. I am parsing news publication dates from this portal and then converting them to the format I need with the dateparser library. I ran into a problem: the portal uses two types of dates (date type 1 and date type 2).
Type 1 dates look like this: 5d6f68e406903889304898.png
Type 2 dates look like this: 5d6f690db3602916189272.png
Type 2 dates are parsed without problems, but for type 1 dates dateparser returns None.
This is the format I convert the dates to with dateparser:

2019-09-03 13:54:23

I think type 1 dates produce None because dateparser needs a year in order to convert the date to the format I need.
Here is the piece of code that parses the dates:
import dateparser
from bs4 import BeautifulSoup

# Collect the date from the page.
# Each rule is a sequence: (tag name, attribute name, attribute value).
def get_item_datetime(item_page, datetime_rule, datetime1_rule):
    if item_page is None:
        return
    soup = BeautifulSoup(item_page, 'lxml')
    item_datetime = soup.find(datetime_rule[0], {datetime_rule[1]: datetime_rule[2]})
    if item_datetime is not None:
        # First rule matched: parse the text of the element that was found.
        item_datetime = dateparser.parse(item_datetime.text, date_formats=['%d %B %Y %H'])
    else:
        # Fall back to the second rule if it is fully specified.
        if len(datetime1_rule) == 3:
            item_datetime = soup.find(datetime1_rule[0], {datetime1_rule[1]: datetime1_rule[2]}).text
            item_datetime = dateparser.parse(item_datetime, date_formats=['%d %B %Y %H'])
        else:
            item_datetime = ''
    return item_datetime

How can I parse type 1 dates and convert them to this format?
2019-09-03 13:54:23


2 answer(s)
Ivan Yakushenko, 2019-09-04
@r4khic

Both pages have a meta tag that contains the publication date.
Take it from there and parse it with dateutil:

import requests

from lxml.html import fromstring
from dateutil import parser as dtparser

# url (the article address) and header (the request headers) must be defined beforehand.
r = requests.get(url, headers=header)
html = fromstring(r.text)
# Read the publication time from the page's meta tag.
dt_string = html.find('.//meta[@name="mediator_published_time"]').get('content')
dt_obj = dtparser.parse(dt_string)

# dt_obj is now:
# datetime.datetime(2019, 9, 2, 22, 51, tzinfo=tzoffset(None, 21600))
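
To get from the parsed object to the format asked for in the question, strftime should be enough. Below is a minimal sketch using only the standard library. The dt_obj here is a stand-in built by hand to match the output shown above; it is not fetched from the portal. Note that this format string drops the UTC offset.

```python
from datetime import datetime, timedelta, timezone

# Stand-in for dt_obj from the snippet above (an assumption for illustration):
# the same moment, with the 21600-second (UTC+6) offset from the shown output.
dt_obj = datetime(2019, 9, 2, 22, 51, tzinfo=timezone(timedelta(seconds=21600)))

# Render as "YYYY-MM-DD HH:MM:SS"; the timezone offset is not included.
formatted = dt_obj.strftime('%Y-%m-%d %H:%M:%S')
print(formatted)  # 2019-09-02 22:51:00
```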

Yura Khlyan, 2019-09-04
@MAGistr_MTM

How about adding a year? And what will the date look like for last year then?
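
One way to sketch this idea: add the current year, and if the result lands in the future, assume the date belongs to the previous year. The function name parse_yearless and the input format '%d %B %H:%M' with English month names are assumptions made for illustration; the question does not show the actual text of the type 1 dates, and localized month names would need a different approach.

```python
from datetime import datetime

def parse_yearless(text, now=None, fmt='%d %B %H:%M'):
    """Parse a date string that has no year (hypothetical helper)."""
    now = now or datetime.now()
    # Prepend the current year so strptime has a complete date to work with.
    parsed = datetime.strptime(f'{now.year} {text}', f'%Y {fmt}')
    # A news date cannot be in the future, so treat it as last year's.
    # Limitation: 29 February is not handled specially in this sketch.
    if parsed > now:
        parsed = parsed.replace(year=now.year - 1)
    return parsed

# The reference moment is fixed here so the example is reproducible.
now = datetime(2019, 9, 4, 10, 57, 34)
recent = parse_yearless('3 September 13:54', now=now)
older = parse_yearless('25 December 08:00', now=now)
print(recent.strftime('%Y-%m-%d %H:%M:%S'))  # 2019-09-03 13:54:00
print(older.strftime('%Y-%m-%d %H:%M:%S'))   # 2018-12-25 08:00:00
```

This only guesses the year from the current date, so an article older than twelve months would still get the wrong year.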
