How to pack a dataframe into a zip archive containing several csv files?
There is a dataframe of, say, 1 million rows, and the result needs to be saved as a single ZIP archive containing 100 CSV files with 10k rows each. I implemented a function with pandas that splits the dataframe into these files, but it compresses each one into a separate archive:
import pandas as pd

def result_writer(data):
    chunk_size = 10000  # how many rows per file
    counter = 0
    for chunk in pd.read_csv(data, chunksize=chunk_size):
        counter = counter + 1
        chunk.to_csv(f'file_{counter}.csv.gz', compression='gzip', index=False)
The ZIP format supports streaming its members: for each CSV you first write a local file header, then feed the data rows to the CSV writer, which in turn fills a Deflate compression stream.
Don't confuse GZip with ZIP. They are completely different formats: ZIP is a container (archive) that holds many files and supports several compression methods, whereas GZip compresses only a single stream/file.
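In practice with pandas, the simplest way to get one archive with many CSV members is to open a zipfile.ZipFile once and add each chunk as its own member. A minimal sketch, assuming the input is a CSV path readable by pd.read_csv; the archive and member names are illustrative:

import zipfile

import pandas as pd

def result_writer(data, archive_name='result.zip', chunk_size=10000):
    # One ZIP archive, opened once; every chunk becomes a separate CSV member.
    with zipfile.ZipFile(archive_name, mode='w',
                         compression=zipfile.ZIP_DEFLATED) as zf:
        for counter, chunk in enumerate(pd.read_csv(data, chunksize=chunk_size), start=1):
            # Render the chunk to CSV text and store it as file_N.csv inside the archive.
            zf.writestr(f'file_{counter}.csv', chunk.to_csv(index=False))

Each chunk is rendered to a string before being written, so peak memory stays around one chunk's worth of data, and zipfile takes care of the per-member headers and Deflate streams described above.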