PHP
Never Ever, 2021-07-24 01:22:13

How to optimize writing to .xls file?

At the moment there are problems with writing to the file: the query results from the database have become VERY large, and during file generation memory consumption reaches about 2-3 GB. This is already critical for the project and has to be brought down to at most 200 MB. Speed can be sacrificed (right now the write takes about 10-15 seconds, which is fine for the moment). I have looked at generators and SPL, but I can't figure out how to apply them in my case.
I do understand that I don't have to pull all the data at once and hold it in memory, but if I start fetching the data in parts, I end up with many heavy queries to the database (about 3-4 seconds each), and I have no idea how to tie it all together.
I would also like to know whether this problem can somehow be solved with the help of Doctrine / Symfony.
So what should I do?
Below is a very simplified example of my code:

// fetch the data
public function getData(array $data): array {
    $queryBuilder =
        //..
        //..
        //..
    return $queryBuilder->getQuery()->getArrayResult();
}

// here it is just writing to the file
public function generate($data) {
    $result = $this->getData($data);
    foreach ($result as $row) {
        $xls->write($row['id']);
    }
}



4 answer(s)
Maxim, 2021-07-24
@Target1

In addition to the answers above, I want to note that for situations like this the data is usually prepared in advance rather than "collected" on the fly.

The first thing to start with is analyzing which storage suits you; it may not even be a relational database but some kind of NoSQL. For more on choosing a storage, see this article on Habr: https://m.habr.com/ru/post/487498/
Even if you don't want to replace MySQL with something else, you should consider denormalizing the data into separate tables. That way you don't have to run complex JOIN queries, and all the data comes ready-made from your tables. Such data is exported to reports many times faster, but you will also need to keep it up to date. You can populate these tables in different ways: from events on create / update / delete, by cron, or on demand.
After optimizing the queries and denormalizing, you can go further and cache the queries. The data is then read not from the database but from the cache, which is always faster. Just don't forget that the cached data has to be refreshed.
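As a hedged illustration of the caching point: if the project already uses Doctrine, the query-level result cache can be enabled on the query itself (a sketch only; the lifetime and cache id below are made up, and a result cache driver has to be configured):

// Roughly the same getData() as in the question, but the hydrated result
// is kept in Doctrine's result cache instead of being re-fetched every time.
public function getData(array $data): array
{
    $queryBuilder = $this->createQueryBuilder('o');
    // ... conditions built from $data ...

    return $queryBuilder
        ->getQuery()
        // cache for an hour under an explicit id; refresh it when the data changes
        ->enableResultCache(3600, 'report_' . md5(serialize($data)))
        ->getArrayResult();
}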
For a further speed-up, give up objects in favor of arrays, especially if the queries go to the database not as plain SQL but through an ORM like Doctrine. Doctrine maps the data onto objects, which significantly slows down working with it.
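To make the objects-versus-arrays point concrete, a small sketch (assuming Doctrine ORM and the same query builder as in the question; toIterable() is available from version 2.8):

use Doctrine\ORM\AbstractQuery;

// Object hydration: every row becomes a managed entity — the expensive path.
$entities = $queryBuilder->getQuery()->getResult();

// Array hydration: plain PHP arrays, no entities, no UnitOfWork bookkeeping.
$rows = $queryBuilder->getQuery()->getArrayResult();

// Or stream the rows one by one instead of building the whole array in memory.
foreach ($queryBuilder->getQuery()->toIterable([], AbstractQuery::HYDRATE_ARRAY) as $row) {
    // write $row straight to the report here
}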
These are all general-purpose optimizations. To analyze the situation and come to the right decision, one would need to know more about the project and the specific problem.

Gennady S, 2021-07-24
@gscraft

Does Excel even cope with that number of rows? Its limit seems to be about a million rows per sheet, and Excel is not designed for this; if the data set is that large, it is better to work with it through the DBMS.
There are many ways to optimize, but few universal ones; without knowing the business requirements and the data structure it is hard to be more specific. In general terms: first, yes, take the data in portions, for example 10k or 50k rows at a time (see the sketch below). Second, don't fetch the same data again: cache it (it's unlikely that all the data changes every time), save Excel slices, or denormalize. Third, optimize the data structure and/or the queries; Doctrine or any other engine hardly matters here, especially since you are using a query builder. Fourth, if it is the Excel writing itself that eats the memory, you can drop the library (it isn't clear whether you use one), write the XML template manually, and pack it into an Excel file yourself.
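A rough sketch of the "take the data in portions" point, assuming the same Doctrine query builder as in the question (the batch size and the $xls writer are placeholders):

$batchSize = 10000;
$offset = 0;

do {
    // fetch one slice; only this slice lives in memory at any moment
    $rows = $queryBuilder
        ->setFirstResult($offset)   // OFFSET
        ->setMaxResults($batchSize) // LIMIT
        ->getQuery()
        ->getArrayResult();

    foreach ($rows as $row) {
        $xls->write($row['id']); // same simplified writer as in the question
    }

    $offset += $batchSize;
} while (count($rows) === $batchSize);

// Note: if OFFSET becomes slow on large tables, keyset pagination
// (WHERE id > :lastSeenId ORDER BY id) is usually cheaper.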

Anton Shamanov, 2021-07-24
@SilenceOfWinter

1. https://solutioncenter.apexsql.com/how-to-import-a...
2. As an option, dump the data from the database in CSV format to a file and then convert it to XLS:

SELECT order_id, product_name, qty
FROM orders
WHERE foo = 'bar'
INTO OUTFILE '/data/orders.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';

3. https://github.com/box/spout is pretty well optimized for big data
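A minimal sketch of streaming rows into an XLSX file with box/spout (v3 API; the file path and columns are illustrative):

use Box\Spout\Writer\Common\Creator\WriterEntityFactory;

$writer = WriterEntityFactory::createXLSXWriter();
$writer->openToFile('/tmp/report.xlsx'); // rows are flushed to disk as they are added

// header row
$writer->addRow(WriterEntityFactory::createRowFromArray(['order_id', 'product_name', 'qty']));

// $rows can be any iterable: an array, a generator, a Doctrine iterator, ...
foreach ($rows as $row) {
    $writer->addRow(WriterEntityFactory::createRowFromArray([
        $row['order_id'],
        $row['product_name'],
        $row['qty'],
    ]));
}

$writer->close();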

sl0, 2021-07-24
@sl0

I have my own exporterFactory that accepts a qbProvider and a Writer.
The qbProvider returns iterate(), which is consumed in a foreach and yields the data as a generator. Every 1000 iterations I call em->clear(). The Writer simply writes everything out through Spout. Perhaps this is not the best option, but it generates files of up to 350 MB without problems.
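A rough sketch of that pattern (names are illustrative; toIterable() is the newer replacement for iterate() in Doctrine ORM 2.8+, and $this->em is assumed to be the EntityManager):

use Box\Spout\Writer\Common\Creator\WriterEntityFactory;
use Box\Spout\Writer\WriterInterface;
use Doctrine\ORM\QueryBuilder;

public function export(QueryBuilder $queryBuilder, WriterInterface $writer): void
{
    $i = 0;

    // toIterable() yields one entity at a time instead of hydrating the full result set
    foreach ($queryBuilder->getQuery()->toIterable() as $entity) {
        $writer->addRow(WriterEntityFactory::createRowFromArray([
            $entity->getId(),
            // ... other columns ...
        ]));

        if (++$i % 1000 === 0) {
            $this->em->clear(); // detach processed entities so memory stays flat
        }
    }

    $writer->close();
}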
P.S. If such large files are really required, it may make sense to find out why. Once a customer asked to include so much data that we had to generate gigabyte-sized files. It turned out he was simply counting rows in Excel for a report. The problem was solved by providing the ready-made calculated figures instead.
