How to parse logs with bash?

Morty Rick, 2020-01-20 21:14:53

linux

Good afternoon.
I need to convert a log of the form:

RESULT=xxxxx, TIME=2020-01-20 18:43:12, HOST=xxxxxxxxxxx, NAME=xxxxxxxx

so that the TIME value is written in an absolutely moronic format: TIME=18:43:12.000 +0700 Mon Jan 20 2020. Billing will not accept anything else.
Quick and brute-force, I coded up this monster:

#!/bin/bash

date_prefix=`date -d  '1 hour ago' "+ %z %a %b %d %G"`

cat LOG.csv | while read line
do
date_setup=`echo ${line} | grep -o TIME=................... | awk -F" " -v var="${date_prefix}" {'print $2".000" var'}`
echo ${line} | sed "s/TIME=.................../TIME=$date_setup/g" >> LOG_new.csv
done

It took 6 seconds to chew through a log file of 1000 lines. But in production there will be 100,000 lines, and waiting 10 minutes for it all to convert is too long.
How else can I approach this? Perhaps Perl or Python can handle this task faster.
Or perhaps this moronic time format is actually some kind of standard and can be converted with a single command.

2 answer(s)

Sergey Pankov, 2020-01-21
@trapwalker

Something like this will run at a speed of roughly megabytes per second.
I measured it like this:

yes "RESULT=xxxxx, TIME=2020-01-20 18:43:12, HOST=xxxxxxxxxxx, NAME=xxxxxxxx" \
  | pv \
  | py -x "', '.join(['='.join((k, datetime.datetime.strptime(v, '%Y-%m-%d %H:%M:%S').strftime('%H:%M:%S.000 +0700 %a %b %d %Y')) if k == 'TIME' else (k, v)) for k, v in ((kv.split('=') for kv in x.split(', ')))])" \
  > /dev/null

On the plus side, it is an order of magnitude clearer, and you can adjust the format in a sane way.
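
For trying out the format string itself before wiring it into anything, the same conversion specifiers can be checked from the shell. This is only a sketch: it assumes GNU date (the -d option is not POSIX), pins the offset to +0700 through TZ, and the function name format_stamp is made up for illustration.

```shell
#!/bin/bash
# Assumption: GNU date. TZ='<+07>-7' is a POSIX TZ string for a fixed UTC+7 zone,
# LC_ALL=C forces English day and month abbreviations.
format_stamp() {
  TZ='<+07>-7' LC_ALL=C date -d "$1" '+%H:%M:%S.000 %z %a %b %d %Y'
}

format_stamp '2020-01-20 18:43:12'
```

This starts one process per timestamp, so it is suited to checking the format, not to converting a large file line by line.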

Karpion, 2020-01-21
@Karpion

For every line of the log you launch several programs/processes. And the output file is opened and closed every time. It is not surprising that this perversion is slow.
It should be something like this:
Remove cat. awk can do everything.
The job of "grep -o TIME=..." can be moved into awk; it has a nice tool for that.
Running "date" can also be replaced in awk: parse the date manually.
Or at the very least remove ">> LOG_new.csv" from the loop. The redirection works perfectly well from the outside, after "done"; in the worst case you will need to wrap the loop in parentheses.
If you had given an example of the log, it would be easier. An explanation would be nice too.
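
The suggestions above could be sketched roughly as follows: one awk process reads the whole file, finds the TIME field with match(), and rebuilds the timestamp by hand. Several things here are assumptions and not from the thread: the function name convert_log, the hard-coded +0700 offset, and the day-of-week calculation (Sakamoto's method, used so that the sketch does not depend on gawk-only time functions).

```shell
#!/bin/bash
# Sketch: rewrite TIME=YYYY-MM-DD HH:MM:SS as TIME=HH:MM:SS.000 +0700 Dow Mon DD YYYY
# using a single awk process. The offset passed as tz is an assumption.
convert_log() {
  awk -v tz="+0700" '
  BEGIN {
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", mon, " ")
    split("Sun Mon Tue Wed Thu Fri Sat", dow, " ")
    split("0 3 2 5 0 3 5 1 4 6 2 4", t, " ")   # per-month offsets for the weekday formula
  }
  {
    if (match($0, /TIME=[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/)) {
      stamp = substr($0, RSTART + 5, RLENGTH - 5)   # text after "TIME="
      y = substr(stamp, 1, 4) + 0
      m = substr(stamp, 6, 2) + 0
      d = substr(stamp, 9, 2) + 0
      hms = substr(stamp, 12, 8)
      yy = (m < 3) ? y - 1 : y
      # Weekday index: 0 is Sunday, 6 is Saturday
      w = (yy + int(yy / 4) - int(yy / 100) + int(yy / 400) + t[m] + d) % 7
      repl = "TIME=" hms ".000 " tz " " dow[w + 1] " " mon[m] " " sprintf("%02d", d) " " y
      $0 = substr($0, 1, RSTART - 1) repl substr($0, RSTART + RLENGTH)
    }
    print   # lines without a TIME field pass through unchanged
  }'
}
```

Usage would be something like "convert_log < LOG.csv > LOG_new.csv", so the output file is opened once for the whole run instead of once per line.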