How to parse large XML files?

Laravel

AleDv, 2016-07-05 09:08:52

Given: a large XML feed of properties in YML format, processed with Laravel. The question is: what is the proper way to parse such files?
I can process, say, 5-10 ads per run (writing to the database and uploading the images to my host) so as not to exceed the script execution time limit. But on the next run I will read the same ads again if no new ones have been added to the feed.
I was thinking of creating a database table to store the IDs of the items from the feed, so that while parsing I can check whether the current ad has already been processed. But it seems to me that this table would quickly start to swell.
No other ideas came to mind, and I would be grateful if you could suggest a more efficient way to parse heavy feeds.
UPD. The entire feed has to be imported first, and after that only new records should be imported as the feed is updated.

3 answer(s)

Andrey Shilov, 2016-07-05
@Dry7

You can store the last imported ID and check whether each ID in the feed is greater than it. Or do the same with a date.
But in general it depends on the task. It may be better to store all the IDs, fetch them with a single query at the start of the script, and compare them against the XML.
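
A minimal sketch of the second variant, written in Python for brevity rather than as Laravel code. The function name `select_new_items` and the sample data are hypothetical, and the `known` set stands in for whatever the single query at the start of the script returns.

```python
def select_new_items(feed_items, known_ids):
    """Return feed items whose ID has not been stored yet.

    feed_items: iterable of (item_id, payload) pairs read from the XML feed.
    known_ids:  set of IDs loaded with one query at the start of the run.
    """
    new_items = []
    for item_id, payload in feed_items:
        if item_id in known_ids:
            continue  # already imported on a previous run (or seen earlier in this feed)
        new_items.append((item_id, payload))
        known_ids.add(item_id)  # remember it so in-feed duplicates are skipped too
    return new_items


# Hypothetical data: IDs 101 and 102 were imported on an earlier run.
known = {101, 102}
feed = [(101, "ad A"), (102, "ad B"), (103, "ad C"), (103, "ad C again")]
fresh = select_new_items(feed, known)
```

Membership checks against a set take constant time on average, so the comparison stays cheap as the number of stored IDs grows.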

Dimonchik, 2016-07-05
@dimonchik2013

Parse it efficiently with Python (lxml, for example), and in general do everything efficiently with Python.
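
As an illustration, here is a rough sketch of streaming a feed element by element instead of loading it whole. It uses the standard library's `xml.etree.ElementTree.iterparse` so that it runs without extra packages; lxml provides a similar `iterparse`. The element names `offer`, `title` and `price` and the sample feed are assumptions, not taken from the actual feed.

```python
import io
import xml.etree.ElementTree as ET


def iter_offers(source, tag="offer"):
    """Yield (id attribute, dict of child tag -> text) for each <tag> element."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != tag:
            continue  # only act once a whole <offer> element has been read
        fields = {child.tag: (child.text or "").strip() for child in elem}
        yield elem.get("id"), fields
        elem.clear()  # drop the element's content; only the emptied element remains


# Hypothetical two-item feed; a real one would be opened from a file or URL.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<feed>
  <offer id="1"><title>Flat</title><price>100</price></offer>
  <offer id="2"><title>House</title><price>250</price></offer>
</feed>"""

offers = list(iter_offers(io.BytesIO(SAMPLE)))
```

Because `iter_offers` is a generator, a caller can stop after any number of items or feed them straight into the database without holding the whole feed in memory.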

Vyacheslav Plisko, 2016-07-05
@AmdY

Create a console command and schedule it with cron. In console mode there is no execution time limit: fetch the feed and walk through it with XMLReader, since it is big. Nothing more elaborate is needed. If things get really tight, you can use queues, since they come out of the box in Laravel.
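
The queue variant can be pictured as splitting the stream of parsed ads into small batches and pushing each batch as a separate job. A rough sketch of that idea, in Python rather than Laravel: `batched`, `dispatch` and the `enqueue` callback are hypothetical stand-ins for the real queue API, and the batch size of 10 is arbitrary.

```python
from itertools import islice


def batched(items, size):
    """Lazily yield lists of up to `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return  # the source is exhausted
        yield batch


def dispatch(items, enqueue, size=10):
    """Pass each batch to `enqueue` and return how many batches were sent."""
    count = 0
    for batch in batched(items, size):
        enqueue(batch)  # in a real setup this would push a job onto the queue
        count += 1
    return count


# Hypothetical run: 25 ad IDs, with a plain list playing the role of the queue.
jobs = []
sent = dispatch(range(1, 26), jobs.append, size=10)
```

Each job then handles only a handful of ads, so a slow image upload delays one small batch rather than the whole import.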