mts2050, 2020-02-07 14:11:37

What program can quickly convert JSON to CSV?

Please suggest a program that can quickly convert a complex and large (3 GB) JSON file to CSV.
I tried doing it through Excel; it copes, but it takes a long time. Maybe there are other programs.
P.S. The file has over 500 columns.

2 answers
Andrew, 2020-02-07
@mts2050

  • https://github.com/jehiah/json2csv - converts a stream of newline-separated JSON data to CSV format
  • https://github.com/yukithm/json2csv - converts JSON to CSV; the CSV header is generated from each key path as a JSON Pointer, and json2csv can be used as a library or as a command-line tool.


With the first one you have to specify the columns manually, and there are a lot of columns.
The second one fails to install; it throws some errors.

I think the right one can be found on GitHub:
https://github.com/search?q=json+to+csv

Sergey Pankov, 2020-02-07
@trapwalker

import csv
import sys
import typing
import json


def deep_walk(j, path=()):
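    # Recursively flatten nested dicts and lists into (path, value) pairs,
    # e.g. {'a': {'b': 1}} yields (('.a', '.b'), 1).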
    if isinstance(j, dict):
        for k, v in j.items():
            yield from deep_walk(v, path + (f'.{k}',))
    elif isinstance(j, list):
        for i, v in enumerate(j):
            yield from deep_walk(v, path + (f'[{i}]',))
    else:
        yield path, j


def json2csv(data: typing.Union[dict, list], dest: typing.Optional[typing.TextIO] = None):
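    # Flatten every record, collect the union of all column names across records,
    # then write them out with csv.DictWriter (missing fields are left empty).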
    field_set = set()
    records = []
    if isinstance(data, dict):
        data = [data]

    for item in data:
        record = {''.join(path).lstrip('.'): value for path, value in deep_walk(item)}
        records.append(record)
        field_set.update(record.keys())

    w = csv.DictWriter(dest or sys.stdout, fieldnames=list(sorted(field_set)))
    w.writeheader()
    w.writerows(records)


if __name__ == '__main__':
    if sys.argv[1:]:
        with open(sys.argv[1]) as f:
            json2csv(json.load(f))
    else:
        json2csv(json.load(sys.stdin))

The input JSON can be passed as a filename in the first argument or piped to the program via stdin.
I posted this to show that such things are not "rocket science"; you just need to think a bit and do a bit of work.
How to make the program avoid loading the entire input JSON into memory before it starts writing output is a much more interesting question, though.
For example, you could reserve a long run of spaces at the beginning of the output file (hopefully enough for the header) and then, once the whole body has been written, seek back and write the header in the desired order. But that is just an idea, and it smells like a crutch.
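A less crutch-like route is to stream the input in two passes. Below is a minimal sketch of that idea; it assumes the input is a top-level JSON array of objects, reuses deep_walk from the answer above, and relies on the third-party ijson library (pip install ijson), so treat the names and structure as illustrative rather than as part of the original answer.

import csv
import sys

import ijson  # third-party incremental JSON parser; not used in the answer above


def json2csv_streaming(src_path, dest=None):
    # Assumes the file is a top-level JSON array: [ {...}, {...}, ... ]
    # Pass 1: stream the array and collect the union of flattened column names.
    field_set = set()
    with open(src_path, 'rb') as f:
        for item in ijson.items(f, 'item'):
            for path, _ in deep_walk(item):
                field_set.add(''.join(path).lstrip('.'))

    # Pass 2: stream the array again and emit one CSV row per element.
    w = csv.DictWriter(dest or sys.stdout, fieldnames=sorted(field_set))
    w.writeheader()
    with open(src_path, 'rb') as f:
        for item in ijson.items(f, 'item'):
            w.writerow({''.join(p).lstrip('.'): v for p, v in deep_walk(item)})

Memory usage then depends only on the size of a single element plus the set of column names rather than on the whole 3 GB file, at the price of reading the file twice.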
