Why is Python many times inferior to Perl in terms of speed and memory consumption when parsing logs?
I wanted to compare Python and Perl for speed when working with large files. For the test I wrote two small scripts. Each one reads a log and builds a hash in which the key is the IP address (the first field of the log line) and the value is everything else logged for that IP.
The input was an nginx log whose fields are separated by :%%:.
The log file size is 1GB.
#!/usr/bin/perl -w
open(F, "</var/logs/access.log");
while (<F>) {
    ($ip, $d) = split(/:%%:/, $_, 2);
    if (!(exists $host{$ip})) {
        $host{$ip} = {};
        $host{$ip}{data} = '';
    }
    $host{$ip}{data} .= $d;
}
#!/usr/bin/python3.2
fd = open('/var/logs/access.log', 'r')
host = {}
for line in fd:
    ip, d = line.split(r':%%:', 1)
    if ip not in host:
        host[ip] = {}
        #host[ip]['data'] = []
        host[ip]['data'] = ''
    #host[ip]['data'].append(d)
    host[ip]['data'] = host[ip]['data'] + d
I'm not much of a Python expert, but I guess most of the resources are being wasted here:
host[ip]['data'] = ''
...
host[ip]['data'] = host[ip]['data'] + d
host[ip] = ''
...
host[ip] = host[ip] + d
You can try this with a list:
from collections import defaultdict
host = defaultdict(list)
for line in open('1.24gb.log'):
    ip, d = line.split(' ', 1)
    host[ip].append(d)
As I understand it, appending to an ever-growing string is very expensive in Python, so expensive that this approach has to be abandoned. What is still unclear is why Python runs out of memory in the end, while Perl is fine at about 20% of memory (the machine has 4 GB of RAM).
But using a list doesn't solve the problem either:
#!/usr/bin/python3.2
from collections import deque, defaultdict
host = defaultdict(deque)
with open('/var/logs/access.log', 'r') as f:
    for line in f:
        ip, d = line.split(r':%%:', 1)
        host[ip].append(d)
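One possible contributor to the memory use, offered here as an assumption rather than a diagnosis of the run above: every line kept as its own str object carries a fixed per-object header, and the container holds a pointer to each one. The sketch below compares the two layouts with sys.getsizeof, which reports shallow sizes and is specific to CPython, so the numbers are only indicative. The sample lines and the memory_estimate helper are hypothetical.

```python
import sys


def memory_estimate(chunks):
    """Return (bytes for a list of separate strings, bytes for one joined string)."""
    as_list = list(chunks)
    # The list object itself plus the shallow size of every string it holds.
    separate = sys.getsizeof(as_list) + sum(sys.getsizeof(s) for s in as_list)
    # One string holding the same characters.
    joined = sys.getsizeof("".join(as_list))
    return separate, joined


# Hypothetical log tails, shaped like the part after the first :%%: separator.
sample = ["GET /page/{} HTTP/1.1:%%:200\n".format(i) for i in range(1000)]
separate, joined = memory_estimate(sample)
print("separate strings:", separate, "bytes")
print("one joined string:", joined, "bytes")
```

With short log lines, the per-object overhead can be comparable to the size of the text itself, which is one reason a list or deque of lines does not necessarily use less memory than a single string per IP.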
Strings in Python are immutable, so each concatenation creates a new string and copies the existing contents into it.
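A common way around the repeated copying is to collect the pieces in a list and join them once at the end. The following is a minimal sketch under stated assumptions, not a drop-in replacement for the script in the question: the group_by_ip function, the SEPARATOR constant, and the in-memory sample data are hypothetical names introduced for illustration.

```python
import io
from collections import defaultdict

SEPARATOR = ":%%:"  # field separator used in the log from the question


def group_by_ip(lines):
    """Map each IP to the concatenation of the rest of its log lines."""
    parts = defaultdict(list)
    for line in lines:
        ip, rest = line.split(SEPARATOR, 1)
        parts[ip].append(rest)  # appending does not copy the data collected so far
    # Build each final string exactly once.
    return {ip: "".join(chunks) for ip, chunks in parts.items()}


# Hypothetical sample data standing in for /var/logs/access.log.
sample_log = io.StringIO(
    "10.0.0.1:%%:GET /a:%%:200\n"
    "10.0.0.2:%%:GET /b:%%:404\n"
    "10.0.0.1:%%:GET /c:%%:200\n"
)
hosts = group_by_ip(sample_log)
print(hosts["10.0.0.1"])
```

This addresses the copying cost only. While the join runs, both the pieces and the joined strings are alive, so peak memory is not reduced by this change.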