Python
Vladislav, 2021-10-26 14:10:12

How to pack a dataframe into a zip archive containing several csv files?

There is a dataframe of, say, 1 million rows. I need to save the result as one ZIP archive containing 100 CSV files of 10k rows each. I implemented a function with pandas that splits the dataframe into these files, but it archives each one separately:

import pandas as pd

def result_writer(data):
    chunk_size = 10000  # how many rows per file
    counter = 0
    for chunk in pd.read_csv(data, chunksize=chunk_size):
        counter += 1
        chunk.to_csv(f'file_{counter}.csv.gz', compression='gzip', index=False)

How can I fix it so that I get one ZIP archive instead of a hundred separate files?


2 answers
Roman Mirilaczvili, 2021-10-26
@2ord

The ZIP format supports streaming multiple files into one archive. You just need to generate the header for each CSV first, then feed the data rows to the CSV writer, which in turn fills the Deflate compression stream.
Do not confuse GZip and ZIP: they are completely different formats. ZIP is a container (archive) that holds many files and supports several compression methods, while GZip is a streaming format for a single file only.
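A minimal sketch of this idea using the standard-library `zipfile` module: each chunk is rendered to CSV in memory and written as a separate entry in a single Deflate-compressed archive. The function name mirrors the question's code; the `result.zip` output path is my own assumption.

```python
import io
import zipfile

import pandas as pd

def result_writer(data, chunk_size=10000):
    """Split a CSV source into chunks and pack them all into one ZIP archive."""
    # ZIP_DEFLATED compresses each entry with Deflate inside a single container.
    with zipfile.ZipFile('result.zip', 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        for counter, chunk in enumerate(pd.read_csv(data, chunksize=chunk_size), start=1):
            # Render the chunk to CSV text in memory (header included per file).
            buf = io.StringIO()
            chunk.to_csv(buf, index=False)
            # Add it to the archive as its own named entry.
            zf.writestr(f'file_{counter}.csv', buf.getvalue())
```

For very large chunks you could instead open each entry with `zf.open(name, 'w')` and pass that file object to `to_csv` to avoid building the whole entry in memory first.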

pfg21, 2021-10-26
@pfg21

If you need a ZIP archive, then use the ZIP format.
gzip is different: it is purely a compressor, and it can only hold "one file".
