JavaScript
sdevalex, 2012-10-10 02:20:33

JSON compression

Are there ready-made libraries that pull the keys out of a collection of identically structured objects and store them separately (in the form of pack/unpack functions, and maybe something else)?

I.e. turn

[{ "data1": 1, "data2": 2, "data3": 3 },
 { "data1": 1, "data2": 2, "data3": 3 },
 { "data1": 1, "data2": 2, "data3": 3 }, ...]


into
{
   data: [[1, 2, 3], [1, 2, 3], [1, 2, 3], ...],
   keys: ['data1', 'data2', 'data3']
}


Or is this just pinching pennies?
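For reference, a minimal hand-rolled sketch of the kind of pack/unpack pair described above (the function names are made up, and it assumes every object has exactly the same keys):

function pack(objects) {
    var keys = Object.keys(objects[0]);
    var data = objects.map(function (obj) {
        return keys.map(function (key) { return obj[key]; });
    });
    return { keys: keys, data: data };
}

function unpack(packed) {
    return packed.data.map(function (row) {
        var obj = {};
        packed.keys.forEach(function (key, i) { obj[key] = row[i]; });
        return obj;
    });
}

// pack([{ data1: 1, data2: 2, data3: 3 }])
//   -> { keys: ['data1', 'data2', 'data3'], data: [[1, 2, 3]] }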

Alexander Lozovyuk, 2012-10-10
@sdevalex

I think you are talking about this: www.cliws.com/e/06pogA9VwXylo_GknPEeFA/

neyronius, 2012-10-10
@neyronius

If you enable gzip compression when serving the data from the server, you get the same thing, but transparently: the compressor builds a frequency tree for the repeated strings and replaces each of them with just a few bits.
The approach you found also restricts the data format - all records must have the same structure.
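For example, a sketch of the "transparent" route, assuming a Node.js/Express server with the compression middleware (neither of which is mentioned in the question itself):

var express = require('express');
var compression = require('compression');

var app = express();
app.use(compression());          // gzip/deflate is applied to responses transparently

app.get('/items', function (req, res) {
    // the repetitive keys compress extremely well on the wire,
    // while the client still receives plain, unmodified JSON
    res.json([
        { data1: 1, data2: 2, data3: 3 },
        { data1: 1, data2: 2, data3: 3 }
    ]);
});

app.listen(3000);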

Sergey Protko, 2012-10-10
@Fesor

First of all, this is too specific an optimization - I don't think I've ever run into a need for it. Strictly speaking it isn't even an optimization, but a way of aggregating data.
Secondly - yes, this is pinching pennies. Even if you have megabytes of such data, you still have to process it on the client/server side; I would rather save the CPU time.

Oronro, 2012-10-10
@Oronro

This "optimization" is more dependent on how this data will be processed and used.
The first option has the right to life, since it allows you to transfer table rows in “pieces” of the stream (the data structure and the proposed optimization are more like a table description), moreover, it allows you to skip the values ​​​​of specific “columns”, considering that the handler will automatically substitute `null there ` values ​​- no data.
The second is a quite logical optimization, however, such a “table” cannot be transferred by a stream, since the structure does not provide for this by design and `null` values ​​can no longer be skipped, otherwise the order of the values ​​of the columns will be lost.
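To illustrate the point about skipped values (the second row here is made up for the example):

// Row-oriented: a missing field can simply be absent from the object.
var rows = [
    { data1: 1, data2: 2, data3: 3 },
    { data1: 1, data3: 3 }            // data2 omitted entirely
];

// Column-oriented: every slot must be kept, otherwise the mapping to `keys` breaks.
var packed = {
    keys: ['data1', 'data2', 'data3'],
    data: [
        [1, 2, 3],
        [1, null, 3]                  // data2 has to stay as an explicit null
    ]
};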
Bottom line: it depends. The need for this "optimization" is driven not by the savings but by how the data is processed.
P.S. If you need a more compact representation of JSON data, are not afraid of a binary format and don't want to bother with gzip, you can try UBJSON: with full compatibility the size is usually 20-40 percent smaller, especially when there are few ASCII strings and plenty of Unicode and numeric values.

pletinsky, 2012-10-10
@pletinsky

No, this is not just pinching pennies. It is much worse: it mixes responsibility for the data format with responsibility for data compression, which is fraught with a lot of non-obvious problems when such data is processed.
If you treat this approach as a kind of content-aware archiving, then it has a right to exist. But its effectiveness has to be compared with other solutions - binary (content-agnostic) and text-based ones: compare the compression gain against the speed of packing and unpacking. Even if it wins, it will only be on some specific JSON, hardly on a typical one. And for the game to be worth the candle (that is, for the difference with standard archivers to actually matter for the project), you need, in my opinion, rather exotic conditions.
But I could be wrong - you can research it and check.
Or use ready-made solutions for compressing text data.
For XML, unlike JSON, there is XML-specific compression on top of plain text compression - things are much simpler there.
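One way to run the comparison suggested above - a rough Node.js measurement sketch using the built-in zlib module (the numbers will of course depend on the actual data):

var zlib = require('zlib');

var objects = [];
for (var i = 0; i < 1000; i++) {
    objects.push({ data1: i, data2: i * 2, data3: i * 3 });
}

var plain = JSON.stringify(objects);
var packed = JSON.stringify({
    keys: ['data1', 'data2', 'data3'],
    data: objects.map(function (o) { return [o.data1, o.data2, o.data3]; })
});

console.log('plain json:   ', plain.length, 'bytes');
console.log('packed json:  ', packed.length, 'bytes');
console.log('plain + gzip: ', zlib.gzipSync(plain).length, 'bytes');
console.log('packed + gzip:', zlib.gzipSync(packed).length, 'bytes');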
