How do I parse a date from a website?


Python

r4khic, 2019-09-04 10:57:34

Greetings, all. I parse news publication dates from a news portal and then convert them to the format I need with the dateparser library. I have run into a problem: the portal shows dates in two different forms, type 1 and type 2.
Type 2 dates are parsed without problems, but for type 1 dates dateparser returns None.
Here is the format I convert to with dateparser:

2019-09-03 13:54:23

I think type 1 dates come back as None because they have no year, and dateparser needs a year to convert them to the format I need.
Here is a piece of code that parses dates:


import dateparser
from bs4 import BeautifulSoup

# Collect the publication date from the page.
# Each rule is a (tag name, attribute name, attribute value) triple.
def get_item_datetime(item_page, datetime_rule, datetime1_rule):
    if item_page is None:
        return None
    soup = BeautifulSoup(item_page, 'lxml')
    # Try the primary rule first.
    tag = soup.find(datetime_rule[0], {datetime_rule[1]: datetime_rule[2]})
    # Fall back to the second rule only when it is a complete triple.
    if tag is None and len(datetime1_rule) == 3:
        tag = soup.find(datetime1_rule[0], {datetime1_rule[1]: datetime1_rule[2]})
    if tag is None:
        return ''
    # dateparser.parse returns None when it cannot interpret the text.
    return dateparser.parse(tag.text, date_formats=['%d %B %Y %H'])

How can I parse type 1 dates and convert them to this format?

2019-09-03 13:54:23


2 answer(s)


Ivan Yakushenko, 2019-09-04
@r4khic

Both pages have a meta tag that holds the publication time in a standard format.
Take its content and parse it with dateutil:

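import requests

from lxml.html import fromstring
from dateutil import parser as dtparser

# url and header are not defined in this snippet: url is the address of
# the news page, header is a dict of request headers.
r = requests.get(url, headers=header)
html = fromstring(r.text)
# The meta tag carries the publication time as a machine-readable string.
dt_string = html.find('.//meta[@name="mediator_published_time"]').get('content')
dt_obj = dtparser.parse(dt_string)

>>> dt_obj
datetime.datetime(2019, 9, 2, 22, 51, tzinfo=tzoffset(None, 21600))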
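To get from that datetime object to the 2019-09-03 13:54:23 style asked for in the question, strftime should be enough. Below is a minimal sketch of the same meta tag approach that also does the formatting. It uses only the standard library (html.parser and datetime.fromisoformat) in place of lxml and dateutil. The sample markup and the exact content value are assumptions, not copied from the portal, and MetaTimeParser and published_time are illustrative names.

```python
from datetime import datetime
from html.parser import HTMLParser


class MetaTimeParser(HTMLParser):
    """Collects the content of <meta name="mediator_published_time">."""

    def __init__(self):
        super().__init__()
        self.published = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "mediator_published_time":
            self.published = attrs.get("content")


def published_time(page_html):
    """Return the publication time as 'YYYY-MM-DD HH:MM:SS', or None."""
    finder = MetaTimeParser()
    finder.feed(page_html)
    if finder.published is None:
        return None
    # Assumes an ISO 8601 value such as 2019-09-02T22:51:00+06:00.
    moment = datetime.fromisoformat(finder.published)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


# Hypothetical page fragment; the real content value may differ.
sample = (
    '<html><head>'
    '<meta name="mediator_published_time" content="2019-09-02T22:51:00+06:00">'
    '</head></html>'
)
print(published_time(sample))  # prints 2019-09-02 22:51:00
```

The time is kept in the offset given by the page (here +06:00); the offset itself is simply dropped from the formatted string.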


Yura Khlyan, 2019-09-04
@MAGistr_MTM

How about adding the year yourself before parsing? And what does the date look like for last year's articles then?
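A rough sketch of the "add the year yourself" idea, which also covers the last-year case: try the current year first and, if the result lands in the future, fall back to the previous year. The 'DD.MM HH:MM' input format is purely an assumption, since the thread does not show what a type 1 date looks like, and parse_without_year is a hypothetical helper name.

```python
from datetime import datetime


def parse_without_year(text, now=None):
    """Parse an assumed 'DD.MM HH:MM' string, borrowing the year from `now`.

    Returns 'YYYY-MM-DD HH:MM:SS', or None if the text cannot be parsed.
    """
    now = now or datetime.now()
    # Try the current year first, then the previous one.
    for year in (now.year, now.year - 1):
        try:
            # The year goes into the string before parsing, so that
            # 29.02 is accepted when the candidate year is a leap year.
            parsed = datetime.strptime(f"{year} {text}", "%Y %d.%m %H:%M")
        except ValueError:
            continue  # malformed text, or 29.02 in a non-leap year
        # A publication date cannot be in the future.
        if parsed <= now:
            return parsed.strftime("%Y-%m-%d %H:%M:%S")
    return None


# Reference moment taken from the question's timestamp.
asked_at = datetime(2019, 9, 4, 10, 57, 34)
print(parse_without_year("03.09 13:54", asked_at))  # prints 2019-09-03 13:54:00
print(parse_without_year("25.12 08:00", asked_at))  # prints 2018-12-25 08:00:00
```

This only works for articles less than a year old; anything older needs a year from the page itself, which is why the meta tag is probably the safer source.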