PHP
Adik Izat, 2020-12-15 23:17:25

How to correctly read data from a large XML file?

Good day, forum users! I ran into a problem when integrating goods from 1C into the site. This is how we do it: the 1C programmer uploads the entire database of products and offers via FTP. The files are fairly large: 60 MB (products) and 45 MB (offers). My algorithm was as follows:
1) read the XML files into string variables with file_get_contents();
2) parsed those strings with simplexml_load_string(<variable from step 1>);
3) turned the resulting objects into arrays with the json_decode(json_encode(<object from step 2>), true) construct (this step eats a lot of RAM);
4) pulled the needed data out of the array from step 3 in loops and passed it to the model to fill the database.
On previous projects, with at most 10,000 products, this algorithm worked quite well. But in the current project there are more than 30,000 products. When I try to run the import, that is, execute the script described above, I get either

Fatal error: Out of memory (allocated 137887744) (tried to allocate 20975616 bytes)
, or a 500/502/504 error, or the page simply hangs and never finishes (I waited for up to 2 hours). The site is hosted on a VDS with 2 GB of RAM. What solution can you suggest for processing large XML files? For now I am looking at XMLReader, but it is clear that with it the code will turn into a complete mess. Another option I have in mind: read the two files separately in different controllers, write only the needed data into a separate file, and later link products and offers by id. Please help, kind people!
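
A minimal sketch of that pipeline, steps 1 to 4 (the file name, element names, and the Model call here are placeholders, not the real 1C export structure):

$xmlString = file_get_contents('import.xml');         // step 1: the whole file in a string
$xmlObject = simplexml_load_string($xmlString);       // step 2: the whole tree as objects
$data = json_decode(json_encode($xmlObject), true);   // step 3: one more full copy, as arrays
foreach ($data['Products']['Product'] as $product) {  // step 4: fill the database
    // ... Model::create($product) ...
}

Every step holds a full copy of the data in memory at once, so peak usage is several times the file size.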

2 answers
Adamos, 2020-12-15
@JaxAdam

A common problem. For large files you use XMLReader, not SimpleXML.
It does not try to chew through the entire file at once; it reads it sequentially, node by node.
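
A minimal sketch of that approach, assuming the goods sit in <Product> elements (the actual tag names in the 1C export will differ):

$reader = new XMLReader();
$reader->open('import.xml');

// skip forward to the first <Product> element
while ($reader->read() && $reader->name !== 'Product');

while ($reader->name === 'Product') {
    // hydrate only the current node into a small SimpleXML object
    $node = new SimpleXMLElement($reader->readOuterXml());
    // ... save $node->Id, $node->Name, etc. to the database ...
    $reader->next('Product'); // jump to the next sibling, skipping its subtree
}
$reader->close();

Only one product is held in memory at any moment, no matter how big the file is.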

rPman, 2020-12-16
@rPman

3) turned the resulting objects into arrays with the json_decode(json_encode(<object from step 2>), true) construct (this step eats a lot of RAM);
Your problem is right here!
Why are you doing that? Associative arrays are objectively slower to work with than objects, and when you do need one, you can always cast with (array)$obj at the specific level and work with that object's fields as an array; besides, foreach iterates over object fields just as happily as over array elements.
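
A minimal sketch of working with the SimpleXML object directly, without the JSON round trip (element names are placeholders):

$xml = simplexml_load_file('import.xml');
foreach ($xml->Products->Product as $product) {
    $id     = (string) $product->Id;   // cast individual fields as needed
    $name   = (string) $product->Name;
    $fields = (array) $product;        // or cast one level to an array of fields
    // ... pass the values to the model ...
}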
P.S. And most importantly, do not work with the XML on the site itself. Convert the data into a form more convenient for PHP, for example with serialize() or even var_export() (the latter produces PHP code that initializes the array, so you can simply include or eval it). Do the conversion once, at the moment the XML file is uploaded in the site admin; afterwards, when working with the data, just load the converted file. A sketch of this idea is below.
Well, and the classic: the data should be stored in a database (loading is slower, but it works the fastest).
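
A minimal sketch of the var_export() variant (paths and structure are placeholders; for simplicity this uses simplexml_load_file, but for files this big the conversion loop could stream with XMLReader as in the previous answer):

// at upload time, in the site admin:
$data = [];
foreach (simplexml_load_file('import.xml')->Products->Product as $p) {
    $data[(string) $p->Id] = ['name' => (string) $p->Name];
}
file_put_contents('products.cache.php',
    '<?php return ' . var_export($data, true) . ';');

// later, on every request:
$products = include 'products.cache.php';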
