PHP
aronsky, 2014-03-07 18:01:58

What algorithm can parse JSON in PHP with critically low memory consumption?

I'll clarify right away: the task has many other solutions (increasing the script's memory limit, using MongoDB, using a different record syntax, and so on), but for various reasons these options are not suitable. Besides, there is a sporting interest in solving it this way.
So: in one project, logs are stored in files as JSON records. This decision was made to simplify storing and retrieving the data, which can be both arrays and objects.
Naturally, a problem arose with json_decode: decoding even a small log file (10 MB) takes more than 128 MB of memory at peak, and the file has to be read in its entirety, because the analysis tool provides sorting and filtering of records.
Here is what a log file looks like:

{
  "timestamp":"2014-03-04T13:16:13+01:00",
  "message":"start exec test",
  "priority":1,
  "priorityName":"ALERT"
},{
  "timestamp":"2014-03-04T13:16:13+01:00",
  "message":"got logname",
  "priority":2,
  "priorityName":"CRIT",
  "info":"cronLogTest"
},{
  "timestamp":"2014-03-04T13:16:14+01:00",
  "message":"Some additional info",
  "priority":7,
  "priorityName":"DEBUG",
  "info":[
    {
      "Type":"rec",
      "Name":"name",
      "Description":"desc",
      "Lang":"EN"
    },{
      "Type":"rec",
      "Name":"name2",
      "Description":"desc2",
      "Lang":"DE"
    }
  ]
},{
  "timestamp":"2014-03-04T13:16:15+01:00",
  "message":"stop exec test",
  "priority":1,
  "priorityName":"ALERT"
},

The absence of an enclosing container guarantees the integrity of the file in case of a crash; the container is added just before parsing.
So, the first thing that comes to mind is to parse only the top-level elements: they all have the same fields, those fields are sufficient for sorting and filtering, and parsing just them will take less memory than recursively traversing the entire structure with json_decode (or another parser; the ones I tested did not show great efficiency). Nested records can be decoded just before the information is sent to the frontend: a paginator is used, so there is no need to decode everything at once. A rough sketch of the idea follows.
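
A minimal sketch of that idea, assuming the records have already been split into raw strings by some chunked reader. All helper names here are illustrative, and the field extraction is a naive regex that only works for the fixed scalar fields in the sample:

<?php
// Index each raw record by its fixed top-level fields without a full
// decode; json_decode() runs only on the records of the requested page.

// Pull one scalar top-level field out of a raw record string. Good enough
// for the fixed fields in the sample (timestamp, priority, priorityName).
function extractField($rawRecord, $field)
{
    $pattern = '/"' . preg_quote($field, '/') . '"\s*:\s*"?([^",}]+)"?/';
    return preg_match($pattern, $rawRecord, $m) ? $m[1] : null;
}

// Build a lightweight index: sort/filter keys plus the raw string.
function buildIndex(array $rawRecords)
{
    $index = array();
    foreach ($rawRecords as $raw) {
        $index[] = array(
            'timestamp' => extractField($raw, 'timestamp'),
            'priority'  => (int) extractField($raw, 'priority'),
            'raw'       => $raw,
        );
    }
    return $index;
}

// Fully decode only the slice shown by the paginator.
function getPage(array $index, $page, $perPage)
{
    $records = array();
    foreach (array_slice($index, $page * $perPage, $perPage) as $row) {
        $records[] = json_decode($row['raw'], true);
    }
    return $records;
}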
So, two questions:
1. How would you approach manually parsing such a structure? I have ideas for algorithms, but I'm interested in an outside perspective. Naturally, the critical resource is memory consumption (speed is secondary).
2. Do you consider this option acceptable? Changing the format or storage of the logs is not good, because a large number of already-written logs on the live system still need to be read. Maybe there is another option I have not noticed?


3 answer(s)
rumkin, 2014-03-07
@rumkin

If you solve the problem without changing the conditions, then you need to read the file in pieces, split the pieces on '},{', and try to parse until the first smallest valid block is found, carrying the remainder over to the next iteration. The json_decode function does not throw exceptions or emit error messages (it simply returns null on failure), so you can safely pass invalid data to it. This is the simplest and most effective way to solve the problem without third-party solutions.
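
A minimal sketch of that approach; the function names and chunk size are mine, not from the answer. It scans each '},{' boundary in the buffer and decodes the smallest prefix that parses as valid JSON, so boundaries inside the nested "info" objects simply fail to decode and are skipped:

<?php
// Decode the smallest valid JSON prefix at every '},{' boundary; what
// remains in the buffer is carried over to the next read.
function extractRecords(&$buffer, $onRecord)
{
    $offset = 0;
    while (($pos = strpos($buffer, '},{', $offset)) !== false) {
        $candidate = substr($buffer, 0, $pos + 1); // include the '}'
        $record = json_decode($candidate, true);   // null on invalid input
        if ($record !== null) {
            $onRecord($record);
            $buffer = substr($buffer, $pos + 2);   // remainder starts at '{'
            $offset = 0;
        } else {
            $offset = $pos + 1;                    // boundary was nested
        }
    }
}

// Read the log in fixed-size chunks: peak memory stays around one record
// plus one chunk instead of the whole file.
function streamLog($path, $onRecord, $chunkSize = 65536)
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $buffer = '';
    while (!feof($fh)) {
        $buffer .= fread($fh, $chunkSize);
        extractRecords($buffer, $onRecord);
    }
    fclose($fh);
    // The sample file ends with '},' so the last record is still in the
    // buffer; strip the trailing comma and decode it.
    $last = json_decode(rtrim($buffer, ", \r\n"), true);
    if ($last !== null) {
        $onRecord($last);
    }
}

Usage would be something like streamLog('cron.log', function ($rec) { /* filter, index, ... */ });, with only one record held in memory at a time.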
If you change the conditions a little (for the case when there is no guarantee that the file is formatted as in the example), then between the log objects (or between large enough blocks) I would insert a separator, like this:

},"--delimiter--",{

Then I would read the file in pieces, split it on the separator, and parse each piece with the native json_decode. The separator should be made more universal, but that's another matter. This would be the solution closest to the standards.
In general, this way of storing logs combines all the shortcomings of the technologies involved, including PHP itself. So I advise you to avoid such designs in the future: few colleagues will appreciate them or want to maintain them.

Vitaly, 2014-03-07
@xytop

Here is a ready-made library: https://github.com/janeklb/JSONCharInputReader
It parses the input as it arrives and fires callbacks.
