Get a list of files from a ZIP

Java

StMechanus, 2013-11-11 14:43:37

I have the raw bytes of a ZIP file and need to get the list of files in the archive. I looked at the specification, but so far it all seems very chaotic. So I opened the file in a hex editor and noticed that all the files in the archive are described at the end of it. Here is an excerpt from the test archive:

Code:
04 00 00 80 08 00 00 0C 00 18 00 00 00 00 00 00 00 00 00 B4 81 00 00 00 00 50 65 72 73 6F 6E 2E 63 6C 61 73 73 55 54 05 00 03 8A 44 71 52 75 78 0B 00 01 04 E8 03 00 00 04 E8 03 00 00 50 4B 01 02 1E 03 14 00 00 00 08 00 53 71 61 43 77 E4 43 50 16 7D 0A 00 BC C4 0A 00 08 00 18 00 00 00 00 00 00 00 00 00 B4 81 76 04 00 00 74 65 73 74 2E 67 69 66 55 54 05 00 03 3E 9A 73 52 75 78 0B 00 01 04 E8 03 00 00 04 E8 03 00 00 50 4B 05 06 00 00 00 00 02 00 02 00 A0 00 00 00 CE 81 0A 00 00 00

Text representation:

...................´..... Person.class UT....DqRux.... è....è...PK..........SqaCwäCP.}..¼Ä..............´.v... test.gif UT ...>.sRux....è....è...PK.......... ...Î…

Please tell me how to process an archive to get the list of files in it.
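The entries noticed at the end of the archive are the central directory, and the very last record in the excerpt (bytes 50 4B 05 06 ...) is the End of Central Directory record: it says the archive has 2 entries and that the directory is 0xA0 = 160 bytes long, starting at offset 0x0A81CE. A minimal Java sketch of listing names that way, assuming a single-disk archive without ZIP64 records, held in a byte[]; the class and method names are illustrative, not from any library:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class CentralDirectoryLister {

    static final int EOCD_SIG = 0x06054b50; // bytes 50 4B 05 06 on disk
    static final int CEN_SIG  = 0x02014b50; // bytes 50 4B 01 02 on disk

    // Lists entry names from the central directory.
    // Assumes a single-disk archive without ZIP64 records.
    public static List<String> listNames(byte[] zip) {
        ByteBuffer b = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN); // ZIP is little-endian

        // 1. Find the End of Central Directory record: 22 bytes plus a comment of
        //    up to 65535 bytes at the very end, so scan backwards for its signature.
        int eocd = -1;
        int lowest = Math.max(0, zip.length - 22 - 0xFFFF);
        for (int pos = zip.length - 22; pos >= lowest; pos--) {
            if (b.getInt(pos) == EOCD_SIG) {
                eocd = pos;
                break;
            }
        }
        if (eocd < 0) throw new IllegalArgumentException("no End of Central Directory record");

        int entries = b.getShort(eocd + 10) & 0xFFFF; // total number of entries
        int pos = b.getInt(eocd + 16);                // directory offset (fits in int below 2 GB)

        // 2. Each central directory header: 46 fixed bytes, then name, extra field, comment.
        List<String> names = new ArrayList<>();
        for (int i = 0; i < entries; i++) {
            if (b.getInt(pos) != CEN_SIG) throw new IllegalArgumentException("bad header at " + pos);
            int flags      = b.getShort(pos + 8) & 0xFFFF;
            int nameLen    = b.getShort(pos + 28) & 0xFFFF;
            int extraLen   = b.getShort(pos + 30) & 0xFFFF;
            int commentLen = b.getShort(pos + 32) & 0xFFFF;
            // Bit 11 set: name is UTF-8. Otherwise the spec says CP437; ASCII covers plain names.
            Charset cs = (flags & 0x800) != 0 ? StandardCharsets.UTF_8 : StandardCharsets.US_ASCII;
            names.add(new String(zip, pos + 46, nameLen, cs));
            pos += 46 + nameLen + extraLen + commentLen;
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Build a small archive in memory with java.util.zip, only to have test input.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            for (String name : new String[] {"Person.class", "test.gif"}) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(name.getBytes(StandardCharsets.US_ASCII));
                zos.closeEntry();
            }
        }
        System.out.println(listNames(bos.toByteArray())); // [Person.class, test.gif]
    }
}
```

Reading the central directory rather than the local headers means the sizes of the compressed data never matter for the listing.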

12 answer(s)

Alexey Solovey, 2013-11-13
@StMechanus

Here is the example above, parsed in a bit more detail:


50 4B 03 04 // local file header signature, read as 0x04034b50
14 00 // version needed to extract; depends on the features used when
      // creating the archive, mostly the various compression algorithms
      // 0x14 corresponds to version 2.0

00 00 // general purpose bit flag: a grab bag of various flags. Some are used
      // by particular algorithms, some are not used at all, some are reserved

08 00 // compression method, in this case Deflate
0D 9D 5E 43 // DOS time and date of the last modification (2 bytes each);
            // the format is described under DosDateTimeToFileTime in the WinAPI

33 C3 3A 4D // CRC-32 of the file
30 04 00 00 // compressed size, 1072 bytes
80 08 00 00 // uncompressed size, 2176 bytes
0C 00 // that's right, the file name length
1C 00 // extra field length
50 65 72 73 6F 6E 2E 63 6C 61 73 73 // the file name we're after. No Unicode here, good old ASCII
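A minimal Java sketch that reads the fields annotated above out of a byte array, assuming the header starts at the given offset, the archive is not ZIP64 and the name is ASCII. The class, field and method names are illustrative, not from any library:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class LocalHeaderDemo {

    // The fields of a local file header, in the order annotated above.
    static final class LocalHeader {
        int versionNeeded, flags, method, dosTime, dosDate, nameLength, extraLength;
        long crc32, compressedSize, uncompressedSize;
        String name;
    }

    // Parses the local file header that starts at `offset` (no ZIP64, ASCII name assumed).
    static LocalHeader parse(byte[] data, int offset) {
        ByteBuffer b = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN); // ZIP is little-endian
        if (b.getInt(offset) != 0x04034b50) throw new IllegalArgumentException("not a local file header");
        LocalHeader h = new LocalHeader();
        h.versionNeeded    = b.getShort(offset + 4) & 0xFFFF;
        h.flags            = b.getShort(offset + 6) & 0xFFFF;
        h.method           = b.getShort(offset + 8) & 0xFFFF;
        h.dosTime          = b.getShort(offset + 10) & 0xFFFF;
        h.dosDate          = b.getShort(offset + 12) & 0xFFFF;
        h.crc32            = b.getInt(offset + 14) & 0xFFFFFFFFL;
        h.compressedSize   = b.getInt(offset + 18) & 0xFFFFFFFFL;
        h.uncompressedSize = b.getInt(offset + 22) & 0xFFFFFFFFL;
        h.nameLength       = b.getShort(offset + 26) & 0xFFFF;
        h.extraLength      = b.getShort(offset + 28) & 0xFFFF;
        h.name = new String(data, offset + 30, h.nameLength, StandardCharsets.US_ASCII);
        return h;
    }

    // DOS date: bits 15-9 year since 1980, 8-5 month, 4-0 day.
    // DOS time: bits 15-11 hours, 10-5 minutes, 4-0 seconds / 2.
    static String dosDateTime(int date, int time) {
        return String.format(Locale.ROOT, "%04d-%02d-%02d %02d:%02d:%02d",
                1980 + (date >> 9), (date >> 5) & 0x0F, date & 0x1F,
                time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2);
    }

    public static void main(String[] args) {
        // The bytes annotated above, up to the end of the file name.
        String hex = "50 4B 03 04 14 00 00 00 08 00 0D 9D 5E 43 33 C3 3A 4D "
                + "30 04 00 00 80 08 00 00 0C 00 1C 00 50 65 72 73 6F 6E 2E 63 6C 61 73 73";
        String[] parts = hex.split(" ");
        byte[] data = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) data[i] = (byte) Integer.parseInt(parts[i], 16);

        LocalHeader h = parse(data, 0);
        System.out.println(h.name + ", method " + h.method + ", " + h.compressedSize + " -> "
                + h.uncompressedSize + " bytes, modified " + dosDateTime(h.dosDate, h.dosTime));
        // Person.class, method 8, 1072 -> 2176 bytes, modified 2013-10-30 19:40:26
    }
}
```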

I hope this helps. If you have any questions, ask: I once had plenty of headaches with ZIP files myself.

mrstrictly, 2013-11-11
@mrstrictly

Something like this:

import java.io.ByteArrayInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class LsZip {
    public static void main(String[] args) throws Exception {
        byte[] zipBytes = ...; // the raw archive bytes you already have
        // try-with-resources closes the stream when done
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                // do whatever you need with entry.getName() here, e.g. print it
                System.out.println(entry.getName());
            }
        }
    }
}

lexdevel, 2013-11-11
@lexdevel

docs.oracle.com/javase/7/docs/api/java/util/zip/ZipFile.html#entries()
www.javamex.com/tutorials/compression/zip_individual_entries.shtml#.UoDGCfnxo3M

LuckyStarr, 2013-11-11
@LuckyStarr

Use the ZipFile class. Here is an example.
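A minimal sketch of listing names with java.util.zip.ZipFile, assuming the archive is a file on disk (ZipFile is opened from a path or File; for bytes already in memory, ZipInputStream fits better). The class name ZipFileLister is illustrative:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipFileLister {

    // Returns the names of all entries in the ZIP file at `path`.
    public static List<String> listNames(String path) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(path)) { // ZipFile is Closeable since Java 7
            Enumeration<? extends ZipEntry> en = zf.entries();
            while (en.hasMoreElements()) {
                names.add(en.nextElement().getName());
            }
        }
        return names;
    }

    // Usage: java ZipFileLister archive.zip
    public static void main(String[] args) throws IOException {
        for (String name : listNames(args[0])) {
            System.out.println(name);
        }
    }
}
```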

betal, 2013-11-11
@betal

I recommend Hex Editor Neo:
it has templates for popular formats, so you can see which bit is responsible for what.
Beyond that, there is nothing left but to read the detailed documentation.
For ZIP (though I may be confusing it with RAR), if memory serves, a brief description of the files comes first and their contents at the end. (I don't remember where the name is stored; maybe at the end.)
Moreover, you can delete the brief description of a particular file at the beginning (fixing up a bunch of CRCs) and keep it at the end; that way you get a hidden file.
Why don't you use ready-made libraries?

StMechanus, 2013-11-11
@StMechanus

Let me clarify, guys: I want to work with the raw bytes directly. Writing my own solution is interesting as an exercise for myself. I just can't make sense of the specification.

StMechanus, 2013-11-11
@StMechanus

An interesting point: according to the specification, a central directory file header must start with the signature 0x02014b50. I searched for that byte sequence in several archives and found no matches. However, I did notice the sequence 0x504B0102, and each archive contains exactly as many of these sequences as it has files (I checked several archives). What could explain this? I'd appreciate help from anyone who knows.

StMechanus, 2013-11-11
@StMechanus

The same goes for the local file header: the signature should be 0x04034b50, but what actually appears is 0x504b0304. Again, the number of such sequences equals the number of files.
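Byte order explains both observations: ZIP stores multi-byte integers little-endian (least significant byte first), so the 32-bit value 0x02014b50 is written as the bytes 50 4B 01 02 and 0x04034b50 as 50 4B 03 04. A small Java check of that using java.nio.ByteBuffer; the class and helper names are made up for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class SignatureByteOrder {

    // Converts a 32-bit value to the 4 bytes ZIP stores on disk (low byte first).
    static byte[] toLittleEndian(int value) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
    }

    // Reads 4 bytes at `offset` as a little-endian 32-bit integer, by hand.
    static int readLittleEndian(byte[] data, int offset) {
        return (data[offset] & 0xFF)
                | (data[offset + 1] & 0xFF) << 8
                | (data[offset + 2] & 0xFF) << 16
                | (data[offset + 3] & 0xFF) << 24;
    }

    // Formats bytes the way a hex editor shows them, e.g. "50 4B 03 04".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02X ", b));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hex(toLittleEndian(0x04034b50))); // 50 4B 03 04  (local file header)
        System.out.println(hex(toLittleEndian(0x02014b50))); // 50 4B 01 02  (central directory header)
        System.out.println(hex(toLittleEndian(0x06054b50))); // 50 4B 05 06  (end of central directory)

        byte[] onDisk = {0x50, 0x4B, 0x03, 0x04};
        System.out.printf("0x%08x%n", readLittleEndian(onDisk, 0)); // 0x04034b50
    }
}
```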

StMechanus, 2013-11-11
@StMechanus

Continuing the observation: after such a local header signature, the rest of the header structure matches the specification. Here is an example:
50 4B 03 04 14 00 00 00 08 00 0D 9D 5E 43 33 C3 3A 4D 30 04 00 00 80 08 00 00 0C 00 1C 00 50 65 72 73 6F 6E 2E 63 6C 61 73 73
This is displayed in the hex editor as:
PK..........^C3Ã:M0...........Person.class
So, according to the local header format, the file name is preceded by 2 bytes giving the length of the extra field, and before those, 2 bytes giving the length of the file name.
The actual value, 0C (12), matches the length of the name Person.class.
I'm looking forward to discussing this.

Alexey Solovey, 2013-11-13
@asolovey

Judging by your comments above, you have already figured out everything except the byte order: ZIP stores multi-byte fields little-endian, so the signature 0x04034b50 appears in the file as the bytes 50 4B 03 04. Otherwise everything is correct: you need to go through all the local file headers and read the name from each one. As an example, I would suggest reading the minizip source code, which is part of zlib.

In particular, see https://github.com/madler/zlib/blob/master/contrib/minizip/miniunz.c#L234, and for header parsing in more detail, https://github.com/madler/zlib/blob/master/contrib/minizip/unzip.c#L1136

It's C rather than Java, but since this is all fairly low-level, it should be easy to follow.
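A minimal Java sketch of that approach, hopping from one local file header to the next. It assumes no ZIP64, and it gives up on entries with general purpose bit 3 set: for those, the sizes in the local header are zero and the real values follow the data in a data descriptor, which is one reason tools usually read the central directory instead. The names are illustrative:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class LocalHeaderWalker {

    static final int LOC_SIG = 0x04034b50; // bytes 50 4B 03 04 on disk

    // Each step skips 30 fixed bytes + name + extra field + compressed data.
    // Stops at the first signature that is not a local header (normally the
    // central directory). No ZIP64 support.
    public static List<String> listNames(byte[] zip) {
        ByteBuffer b = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        List<String> names = new ArrayList<>();
        int pos = 0;
        while (pos + 30 <= zip.length && b.getInt(pos) == LOC_SIG) {
            int flags = b.getShort(pos + 6) & 0xFFFF;
            if ((flags & 0x08) != 0) {
                // Bit 3: sizes live in a data descriptor after the data,
                // so the next header cannot be located from this one.
                throw new IllegalStateException("data descriptor at offset " + pos + "; use the central directory");
            }
            int compressedSize = b.getInt(pos + 18);
            int nameLen  = b.getShort(pos + 26) & 0xFFFF;
            int extraLen = b.getShort(pos + 28) & 0xFFFF;
            Charset cs = (flags & 0x800) != 0 ? StandardCharsets.UTF_8 : StandardCharsets.US_ASCII;
            names.add(new String(zip, pos + 30, nameLen, cs));
            pos += 30 + nameLen + extraLen + compressedSize;
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Test archive with STORED entries: size and CRC are known up front,
        // so ZipOutputStream writes them into the local headers.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            for (String name : new String[] {"Person.class", "test.gif"}) {
                byte[] content = name.getBytes(StandardCharsets.US_ASCII);
                CRC32 crc = new CRC32();
                crc.update(content);
                ZipEntry e = new ZipEntry(name);
                e.setMethod(ZipEntry.STORED);
                e.setSize(content.length);
                e.setCompressedSize(content.length);
                e.setCrc(crc.getValue());
                zos.putNextEntry(e);
                zos.write(content);
                zos.closeEntry();
            }
        }
        System.out.println(listNames(bos.toByteArray())); // [Person.class, test.gif]
    }
}
```

Archives written by ZipOutputStream with its default DEFLATED method set bit 3, so this walker rejects them; the central directory route handles both cases.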

Boris Vanin, 2013-11-20
@fogone

If we're talking about parsing the metadata in Java, I would first look at the source of a ready-made implementation, in particular the method that creates an entry with the basic information.
