How to get one file from a ZIP archive on S3 (boto3 + Python)?

Vadim Apenko, 2018-09-25 18:48:11

Python

The ZIP archive is stored on Amazon S3 and is large, so downloading all of it is not an option. I need to get one small XML file out of it and into memory (I do not need to save it to disk, only to read its contents). How can this be done with Python + boto3 + zipfile + io, and maybe something else?
I assume I need to open the object through boto as a stream or download it in chunks, run it through zipfile to find exactly where the file I need is stored, and then download and decompress only that piece.

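from boto.s3.connection import S3Connection  # legacy boto 2 API, not boto3

conn = S3Connection(a, b)  # a, b: access key ID and secret access key
bucket = conn.get_bucket(bucket_name)
key = bucket.get_key(key_path)
# Fetch only bytes 22000-23000 of the object with an HTTP Range header
byte_block = key.get_contents_as_string(headers={'Range': 'bytes=22000-23000'})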
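
With boto3, which the title asks about, a ranged read of the same kind might look like the sketch below. The helper names range_header, get_byte_range and make_client are made up for illustration, and the bucket and key values are placeholders; the only boto3 call used is get_object with its Range parameter.

```python
def range_header(start, end):
    """Build an HTTP Range header value for an inclusive byte range."""
    return f"bytes={start}-{end}"


def get_byte_range(s3_client, bucket_name, key_path, start, end):
    """Return bytes start..end (inclusive) of an S3 object as a bytes value."""
    response = s3_client.get_object(
        Bucket=bucket_name,
        Key=key_path,
        Range=range_header(start, end),
    )
    # "Body" is a streaming object; read() pulls only the requested range
    return response["Body"].read()


def make_client():
    """Create a real S3 client (assumes boto3 is installed and configured)."""
    import boto3
    return boto3.client("s3")


# Usage against a real bucket (placeholder names, needs credentials):
# byte_block = get_byte_range(make_client(), "my-bucket", "path/to/archive.zip", 22000, 23000)
```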

This way you can fetch a piece of the file; I have figured that much out. But maybe this is not the right approach?
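
One way to turn ranged reads into the full approach described in the question is to wrap the S3 object in a seekable file-like object and hand it to zipfile, which then requests only the central directory and the one member it needs. The following is a minimal sketch under that assumption, not a tested production solution. S3RangeReader and read_zip_member are hypothetical names; the client is expected to be a boto3 S3 client (only head_object and get_object with Range are used).

```python
import io
import zipfile


class S3RangeReader(io.RawIOBase):
    """Read-only, seekable view of an S3 object backed by ranged GETs."""

    def __init__(self, s3_client, bucket_name, key_path):
        super().__init__()
        self._s3 = s3_client
        self._bucket = bucket_name
        self._key = key_path
        self._pos = 0
        # head_object reports the object size without transferring the body
        head = s3_client.head_object(Bucket=bucket_name, Key=key_path)
        self._size = head["ContentLength"]

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self._pos = offset
        elif whence == io.SEEK_CUR:
            self._pos += offset
        elif whence == io.SEEK_END:
            self._pos = self._size + offset
        else:
            raise ValueError(f"unsupported whence: {whence}")
        return self._pos

    def readinto(self, buffer):
        # RawIOBase.read() is implemented on top of readinto()
        if self._pos >= self._size or len(buffer) == 0:
            return 0  # end of object
        end = min(self._pos + len(buffer), self._size) - 1
        response = self._s3.get_object(
            Bucket=self._bucket,
            Key=self._key,
            Range=f"bytes={self._pos}-{end}",
        )
        data = response["Body"].read()
        buffer[:len(data)] = data
        self._pos += len(data)
        return len(data)


def read_zip_member(s3_client, bucket_name, key_path, member_name):
    """Return the decompressed contents of one archive member as bytes."""
    raw = S3RangeReader(s3_client, bucket_name, key_path)
    # Buffering merges zipfile's many small reads into fewer ranged requests
    buffered = io.BufferedReader(raw, buffer_size=64 * 1024)
    with zipfile.ZipFile(buffered) as archive:
        return archive.read(member_name)


# Usage against a real bucket (placeholder names, needs boto3 and credentials):
# import boto3
# xml_bytes = read_zip_member(boto3.client("s3"), "my-bucket", "path/to/archive.zip", "data.xml")
```

Each read that misses the buffer costs one HTTP request, so the buffer size trades request count against bytes transferred; 64 KiB is an arbitrary choice here.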

2 answer(s)

Roman Mirilaczvili, 2018-09-25
@2ord

A similar question has been asked before:
Is it possible to extract a file from a zip archive over SMB?
I think the idea is clear.
You can also extract a single file with Info-ZIP's unzip by using the -p option, which writes the file's contents to standard output.
In other words, the whole thing can be wrapped in a Bash script, with the output redirected wherever it is needed.

Ivan Shumov, 2018-09-25
@inoise

This cannot be done without downloading the file, because S3 is an object store, not a file system; that is, it is not block storage.
As a workaround, you can launch an EC2 instance in the same region, download the ZIP archive to it, unzip it there, and upload the file you need back to S3 so that you can download it.