How to organize the JSON parsing algorithm more efficiently in this case?

Python

uniquepeero, 2020-04-23 12:18:16

Greetings. I completed a test task on JSON processing.

Task

Output JSON in which each element is:
- a non-deleted news item from the news list in the file news.json.
- for each news item, the number of comments on it from the file comments.json is given
- for each news item, the date and time of the last (most recent) comment is given

The news list must not contain news items whose publication date has not yet arrived.
Dates in the files are stored in ISO 8601 format (%Y-%m-%dT%H:%M:%S) and must be returned in the same format.

Response format:

news: [
    {
        id: int,
        author: str,
        publishedAt: str,
        image: str,
        teaser: str,
        isDeleted: bool,
        lastComment: str,
        commentsCount: int
    }
]

Part of the news.json file

{
    "news": [
        {
            "author": "Cynthia Pruitt",
            "content": "Democra.",
            "id": 90,
            "image": "https://",
            "isDeleted": false,
            "publishedAt": "2019-02-23T00:33:00",
            "teaser": "Successful without."
        },
        {
            "author": "Lorraine Lewis",
            "content": "Finish doctor .",
            "id": 78,
            "image": "https://",
            "isDeleted": false,
            "publishedAt": "2019-03-05T14:54:33",
            "teaser": "It during "
        },
        {
            "author": "Michael Ramirez",
            "content": "Recent seat.",
            "id": 29,
            "image": "https://",
            "isDeleted": false,
            "publishedAt": "2019-03-18T19:48:47",
            "teaser": "Entire respond."
        },
        {
            "author": "Lisa Johnson",
            "content": "Simple wide ",
            "id": 60,
            "image": "https://www",
            "isDeleted": true,
            "publishedAt": "2019-03-06T02:26:43",
            "teaser": "Brother can window"
        },

        ...

     ]
}

Part of the comments.json file

{
    "comments": [
        {
            "comment": "Perhaps as .",
            "newsId": 5,
            "publishedAt": "2019-03-12T18:50:47",
            "user": "scott17"
        },
        {
            "comment": "Everybody ",
            "newsId": 64,
            "publishedAt": "2019-03-15T16:22:50",
            "user": "keithstanle"
        },
        {
            "comment": "Total Democrat .",
            "newsId": 23,
            "publishedAt": "2019-03-22T16:01:13",
            "user": "kathleendouglas"
        },
        {
            "comment": "Yes American.",
            "newsId": 33,
            "publishedAt": "2019-03-03T03:52:55",
            "user": "nicholasjohnson"
        },

        ...

    ]
}

My solution

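import json
from datetime import datetime

# news = contents of news.json, comments = contents of comments.json
with open('news.json', encoding='utf-8') as f:
    news = json.load(f)
with open('comments.json', encoding='utf-8') as f:
    comments = json.load(f)

def foo():
    good_news = []
    for news_item in news['news']:
        # Keep only news that is already published and not deleted
        if datetime.now().isoformat() > news_item['publishedAt'] and not news_item['isDeleted']:
            news_id = news_item['id']
            comments_counter = 0
            # Sentinel older than any real comment
            last_comment = datetime.strptime('2000-01-01T00:00:00', '%Y-%m-%dT%H:%M:%S')

            # Full scan of all comments for every news item: this is the N*M part
            for comment in comments['comments']:
                if comment['newsId'] == news_id:
                    comments_counter += 1
                    comment_time = datetime.strptime(comment['publishedAt'], '%Y-%m-%dT%H:%M:%S')
                    if comment_time > last_comment:
                        news_item['lastComment'] = comment['publishedAt']
                        last_comment = comment_time

            news_item['commentsCount'] = comments_counter
            good_news.append(news_item)

    return {"news": good_news}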

The code works, but the reviewers commented on the complexity of the algorithm: "A loop inside a loop; this can be done much better. The total complexity is N*M, where N is the number of news items and M is the number of comments."

At first I thought about going through the comments and collecting them into a dictionary keyed by news ID, but there I ended up with the same nested loop, N*M.

2 answer(s)

kn1ght_t, 2020-04-23
@uniquepeero

"At first I thought about going through the comments and collecting them into a dictionary keyed by news ID, but there I ended up with the same nested loop, N*M."
Why? First we parse all the comments into a dictionary: that is M operations.
Then we go through all the news items: that is N operations, each with a constant-time dictionary lookup.
Total: N + M.
Where is the nested loop?
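
The two passes described above could look roughly like the sketch below. The function names index_comments and build_news, the optional now parameter, and the choice to set lastComment to None for news without comments are assumptions made for illustration; they are not part of the task statement. The functions take the inner lists, i.e. news['news'] and comments['comments'].

```python
from datetime import datetime


def index_comments(comments):
    """One pass over the comments (M steps): newsId -> [count, latest publishedAt]."""
    stats = {}
    for comment in comments:
        entry = stats.setdefault(comment['newsId'], [0, comment['publishedAt']])
        entry[0] += 1
        # Timestamps in the same ISO 8601 layout sort chronologically as strings,
        # so no datetime parsing is needed here.
        if comment['publishedAt'] > entry[1]:
            entry[1] = comment['publishedAt']
    return stats


def build_news(news, comments, now=None):
    """One pass over the news (N steps), each with an O(1) dictionary lookup."""
    now_str = (now or datetime.now()).strftime('%Y-%m-%dT%H:%M:%S')
    stats = index_comments(comments)
    result = []
    for item in news:
        # Skip deleted news and news whose publication date is still in the future.
        if item['isDeleted'] or item['publishedAt'] > now_str:
            continue
        # Assumption: news without comments gets a count of 0 and lastComment None.
        count, last = stats.get(item['id'], (0, None))
        # Build a copy instead of mutating the loaded data.
        result.append({**item, 'commentsCount': count, 'lastComment': last})
    return {'news': result}
```

The dictionary is built once, so the comments list is never rescanned per news item; that is what turns N*M into N + M.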

Vadim Shatalov, 2020-04-23
@netpastor

When processing arrays, filtering inside the loop body is bad practice. This

for news_item in news['news']:
    if datetime.now().isoformat() > news_item['publishedAt'] and not news_item['isDeleted']:

can be replaced with

news_filter = lambda i: datetime.now().isoformat() > i['publishedAt'] and not i['isDeleted']
for news_item in filter(news_filter, news['news']):
    ...

The same applies to the inner loop.
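
A self-contained variant of this filtering step might look like the sketch below. The helper name visible_news and its now parameter are hypothetical; the sketch also computes the current time once rather than on every predicate call. Note that filter still visits each item exactly once, so this changes how the loop reads, not its complexity.

```python
from datetime import datetime


def visible_news(news_items, now=None):
    """Lazily yield news items that are not deleted and already published."""
    # Compute the timestamp once, in the same layout the files use.
    now_str = (now or datetime.now()).strftime('%Y-%m-%dT%H:%M:%S')
    return filter(
        lambda item: not item['isDeleted'] and item['publishedAt'] <= now_str,
        news_items,
    )
```

It can then be used as `for news_item in visible_news(news['news']): ...`.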