un1t, 2013-02-07 13:29:13
Python

Performance: Scala vs Python

I have a Python script whose job is to parse large XML files, 1.5 GB or more. At some point I ran into Python's performance limits. Python is not known as a very fast language, but its speed is usually more than enough.
I decided I needed to rewrite the program in a faster language. I looked at various performance benchmarks, and judging by them, Scala is on average 10 times faster than Python.

I rewrote the program in Scala. Scala turned out to be faster than CPython, but slower than PyPy.
CPython 7 min 40 sec
PyPy 3 min 58 sec
Scala 4 min 20 sec

The result surprised me a bit. This is my first Scala program. It is written as a script, which I run like this:
$ scala parser.scala
Will it run faster if I compile it to a .jar? Or are there compiler optimization options I can specify?

UPDATE:
The speed of the Scala program did not satisfy me either, so I went further, to C++. I wrote a parser without any XML parsers or regular expressions, using only the standard library. The result: 40 seconds.
That is a great result, but it led to the next idea: if I also use only low-level string manipulation in Scala, the result is 50 seconds. And of course, after that I couldn't help going back to Python and throwing all the regular expressions out of the code.

CPython 4m12.204s
PyPy 2m47.724s
Scala 0m56.901s
C++ 0m46.801s
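
The rewrite described in the update, replacing regular expressions with plain string operations, might look roughly like the sketch below. The original parser and file layout are not shown in the post, so the `<name>` element, the sample data, and both function names are assumptions made purely for illustration.

```python
import re

# Hypothetical sample data; the real file layout is not shown in the post.
SAMPLE = "<items><item><name>foo</name></item><item><name>bar</name></item></items>"

NAME_RE = re.compile(r"<name>(.*?)</name>")


def names_with_regex(text):
    """Extract the text of every <name> element with a regular expression."""
    return NAME_RE.findall(text)


def names_with_find(text, open_tag="<name>", close_tag="</name>"):
    """Extract the same values using only str.find and slicing."""
    result = []
    pos = 0
    while True:
        start = text.find(open_tag, pos)
        if start == -1:
            break  # no more opening tags
        start += len(open_tag)
        end = text.find(close_tag, start)
        if end == -1:
            break  # unterminated element: stop rather than guess
        result.append(text[start:end])
        pos = end + len(close_tag)
    return result


if __name__ == "__main__":
    print(names_with_regex(SAMPLE))
    print(names_with_find(SAMPLE))
```

Both functions return the same list for this sample. Whether the `find`-based version is actually faster depends on the interpreter and the data, so it is worth timing both on the real files.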


1 answer(s)
ivnik, 2013-02-07

It looks to me like the performance problem is in the XML parser. As already asked above, are you using DOM or SAX? If performance matters or the XML files are large, you should use SAX, because it streams through the document instead of building the whole tree in memory.
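
As a rough illustration of the SAX approach, here is a minimal sketch using Python's standard `xml.sax` module. The `<name>` element and the `NameCollector` and `collect_names` names are hypothetical, since the actual document structure was not posted.

```python
import xml.sax


class NameCollector(xml.sax.ContentHandler):
    """Collects the text of every <name> element (a hypothetical tag)."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._buffer = None  # not None only while inside a <name> element

    def startElement(self, name, attrs):
        if name == "name":
            self._buffer = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so accumulate them.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "name" and self._buffer is not None:
            self.names.append("".join(self._buffer))
            self._buffer = None


def collect_names(xml_bytes):
    """Parse an in-memory XML document and return all <name> texts."""
    handler = NameCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.names


if __name__ == "__main__":
    doc = b"<items><item><name>foo</name></item><item><name>bar</name></item></items>"
    print(collect_names(doc))
```

For a multi-gigabyte file, `xml.sax.parse(path, handler)` with the same handler reads the input incrementally, so memory use stays bounded by what the handler itself keeps.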
To speed up the Scala program, it also makes sense to experiment with the JVM parameters, for example by enabling "AggressiveOpts".
Run it like this:

$ JAVA_OPTS="-XX:+AggressiveOpts" scala parser.scala

PS. Can you show the source code of the Scala parser?
PPS. Also, when measuring performance it is advisable to "warm up" the JVM first: performance on the first run is much lower than after a while, because the classes have not yet been loaded by the classloader and the hot spots have not yet been compiled to native code by the JIT compiler. For a more accurate estimate, run the parser in an "infinite" loop, print each run's time to the console, and wait until that number stabilizes.
PPPS. I hope you are measuring the time inside the (Scala) program?
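
The measurement advice above targets the JVM, but the same pattern applies to any JIT-compiled runtime, PyPy included. Below is a minimal sketch in Python, the language of the original script; `parse` is a hypothetical stand-in for the real parser and `time_runs` is an illustrative helper, neither taken from the post.

```python
import time


def parse(data):
    """Hypothetical stand-in for the real parser: counts '<item>' tags."""
    return data.count("<item>")


def time_runs(func, arg, runs=5):
    """Run func(arg) several times and return the wall time of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()  # timed inside the program, not via `time`
        func(arg)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    data = "<item>x</item>" * 100_000
    for i, seconds in enumerate(time_runs(parse, data), start=1):
        print(f"run {i}: {seconds:.4f} s")
```

Timing inside the program excludes interpreter or VM start-up. The first runs include warm-up cost under a JIT, so the later, stable numbers are the ones to compare.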
