Regular Expressions
zdravnik, 2016-07-26 18:27:58

How can Logstash merge multi-part log lines whose parts arrive out of order?

There are multi-line postgresql logs of the following form:

Jul 22 17:03:27 my.host example.com[24977]: [137-1] 2016-07-22 17:03:27.339 MSK User: username Database: my_db Host: 192.168.0.52(38494) Proc ID: 24977 etc1
Jul 22 17:03:27 my.host example.com[24977]: [137-2] 2016-07-22 17:03:27.339 MSK User: username Database: my_db Host: 192.168.0.52(38494) Proc ID: 24977 etc2
Jul 22 17:03:27 my.host example.com[24597]: [2953-1] 2016-07-22 17:03:27.339 MSK User: username Database: my_db Host: 192.168.0.52(38053 ) Proc ID: 24597 etc
Jul 22 17:03:27 my.host example.com[3637]: [3779-1] 2016-07-22 17:03:27.340 MSK User: username Database: my_db Host: 192.168.0.52 (17809) Proc ID: 3637 etc
Jul 22 17:03:27 my.host example.com[24977]: [138-1] 2016-07-22 17:03:27.339 MSK User: username Database: my_db Host: 192.168.0.52(38494) Proc ID: 24977 etc1
Jul 22 17:03:27 my.host example.com[3637]: [3780-1] 2016-07-22 17:03:27.340 MSK User: username Database: my_db Host: 192.168.0.52(17809) Proc ID: 3637 etc
Jul 22 17:03:27 my.host example.com[24977]: [138-2] 2016-07-22 17:03:27.339 MSK User: username Database: my_db Host: 192.168.0.52(38494 ) Proc ID: 24977 etc2
Jul 22 17:03:27 my.host example.com[24977]: [139-1] 2016-07-22 17:03:27.340 MSK User: username Database: my_db Host: 192.168.0.52 (38494) Proc ID: 24977 etc
Jul 22 17:03:27 my.host example.com[24597]: [2954-1] 2016-07-22 17:03:27.340 MSK User: username Database: my_db Host: 192.168.0.52(38053) Proc ID: 24597 etc1
Jul 22 17:03:27 my.host example.com[24597]: [2954-2] #011 SELECT count(*) FROM table#015

To parse these logs further with grok, the parts first need to be merged in Logstash. The merged output should look like:
line 1: ...[137-1] and [137-2]...
line 2: ...[2953-1]...
line 3: ...[3779-1]...
line 4: ...[3780-1]...
line 5: ...[138-1] and [138-2]...
line 6: ...[139-1]...
line 7: ...[2954-1] and [2954-2]...

The order of the merged lines is not important in itself, since the result is keyed by time anyway. What matters is that lines labeled [x-1], [x-2], [x-3], etc. end up concatenated into a single line [x-1] [x-2] [x-3].
The only reliable keys to merge on are the statement-number label (e.g. [139-1]) and the process PID in square brackets after the hostname (e.g. [24977]). Other fields are unsuitable as merge keys because they do not guarantee that lines will not get mixed up: the PID guarantees no confusion within one process, and the statement number provides the rest of the guarantee.
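The merging logic described above can be sketched outside Logstash: group lines by the (pid, statement-number) pair and concatenate the parts within each group. A minimal Python illustration of that idea (the regex and function names here are assumptions for this sketch, not part of any Logstash config):

```python
import re
from collections import OrderedDict

# Matches the pid in brackets after the program name, the [line-part]
# label, and the remainder of the line.
LINE_RE = re.compile(r"\[(?P<pid>\d+)\]: \[(?P<line>\d+)-(?P<part>\d+)\] (?P<rest>.*)")

def merge_parts(raw_lines):
    """Group log lines by (pid, statement number) and join their parts."""
    groups = OrderedDict()
    for raw in raw_lines:
        m = LINE_RE.search(raw)
        if not m:
            continue
        # The (pid, statement-number) pair is the only safe merge key.
        key = (m.group("pid"), m.group("line"))
        groups.setdefault(key, []).append(m.group("rest"))
    return [" ".join(parts) for parts in groups.values()]

logs = [
    "Jul 22 17:03:27 my.host example.com[24977]: [137-1] first part",
    "Jul 22 17:03:27 my.host example.com[24597]: [2953-1] other pid",
    "Jul 22 17:03:27 my.host example.com[24977]: [137-2] second part",
]
print(merge_parts(logs))  # the interleaved pid 24597 does not break the 24977 pair
```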
As I understand it, the multiline codec and/or some combination with if statements should be able to solve this. Unfortunately, I have tried many variants over the past couple of days and have not found an answer yet.
In particular, options like
multiline {
  pattern => "... \[\d+-1\]"
  negate => true
  what => "previous"
}

do not work, because the merging gets interleaved; i.e. the output is:
line 1: ...[137-1] and [137-2]...
line 2: ...[2953-1]...
line 3: ...[3779-1]...
line 4: ...[138-1]...
line 5: ...[3780-1] and [138-2]...
line 6: ...[139-1]...
line 7: ...[2954-1]...

Colleagues, please help.


1 answer(s)
zdravnik, 2016-08-11
@zdravnik

Here is the solution to my problem:

grok {
  match => [ "message", "%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST:logsource} %{SYSLOGPROG}: \[%{INT:line}-%{INT:part_of_line}\] %{GREEDYDATA:ostatok}" ]
}
aggregate {
  # pid comes from %{SYSLOGPROG}; line + pid uniquely identify a statement
  task_id => "%{line}%{pid}"
  code => "
    map.merge!(event) if map.empty?
    map['full_message'] ||= ''
    map['full_message'] += event['ostatok']
  "
  timeout => 10
  push_map_as_event_on_timeout => true
  timeout_code => "event.tag('aggregated')"
}
if "aggregated" not in [tags] {
  drop {}
}
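One caveat worth adding, per the aggregate filter's own documentation: aggregation by task_id only works reliably with a single pipeline worker, otherwise parts of the same statement may be processed out of sequence by different threads. On Logstash 2.x that means starting it with the -w flag (the config filename here is just a placeholder):

```shell
bin/logstash -f postgres.conf -w 1
```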
