Scala vs Python performance
I have a Python script whose task is to parse large XML files, 1.5 GB or more. At some point I ran into Python's performance limits. Python is not known as a very fast language, but its speed is usually more than sufficient.
I decided I needed to rewrite the program in a faster language. I looked at various benchmarks, and judging by them Scala is on average about 10 times faster than Python.
I rewrote the program in Scala. It turned out to be faster than CPython, but slower than PyPy.
CPython 7 min 40 sec
PyPy 3 min 58 sec
Scala 4 min 20 sec
The result surprised me a bit. This is my first Scala program. It is written as a script, which I run like this:
$ scala parser.scala
Will it run faster if I compile it to a .jar? Or are there compilation optimization options I can specify?
CONTINUED:
The speed of the Scala program did not satisfy me either, so I went further: C++. I wrote a parser without any XML libraries or regular expressions, using only the standard library. The result: 40 seconds.
The result is great, but it led to the next idea: what if I also used only low-level string manipulation in Scala? The result: 50 seconds. And of course, after that I couldn't help going back to Python and throwing all the regular expressions out of the code as well.
CPython 4m12.204s
PyPy 2m47.724s
Scala 0m56.901s
C++ 0m46.801s
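To illustrate what "low-level string manipulation instead of regular expressions" can look like in Python, here is a minimal sketch. It is not the original script: the function name `iter_tag_text` and the sample data are made up for the example, and it assumes tags without attributes and no nesting of the same tag.

```python
def iter_tag_text(data, tag):
    """Yield the text between <tag> and </tag> using only str.find.

    Hypothetical helper for illustration: assumes the tag has no
    attributes and is never nested inside itself.
    """
    open_tag = "<" + tag + ">"
    close_tag = "</" + tag + ">"
    pos = 0
    while True:
        start = data.find(open_tag, pos)
        if start == -1:
            return  # no more opening tags
        start += len(open_tag)
        end = data.find(close_tag, start)
        if end == -1:
            return  # unterminated element, stop scanning
        yield data[start:end]
        pos = end + len(close_tag)


# Made-up sample input, not taken from the real files.
sample = "<items><name>alpha</name><name>beta</name></items>"
names = list(iter_tag_text(sample, "name"))
print(names)
```

This kind of scanning is only safe when the input format is known and regular. For arbitrary XML, a real parser is still the safer choice.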
It seems to me that the performance problem is in the XML parser. As already asked above, are you using DOM or SAX? If performance on large XML files is critical, you should use SAX.
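For reference, a SAX-style parser handles the document as a stream of events and never builds the whole tree in memory. A minimal sketch with Python's standard `xml.sax` module might look like this. The `TagCounter` class, the `count_tags` function and the sample document are assumptions made for the example, since the actual structure of the files is not known.

```python
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Counts elements by name as the parser streams through the input."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called once for every opening tag; no tree is kept in memory.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags(xml_bytes):
    """Hypothetical helper: parse XML from bytes and return per-tag counts."""
    handler = TagCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts


# Made-up sample input for illustration.
sample = b"<items><item>a</item><item>b</item></items>"
counts = count_tags(sample)
print(counts)
```

For a large file on disk, `xml.sax.parse(path, handler)` takes a filename or file object instead of a bytes string, so the file does not have to be read into memory first.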
And to speed up the Scala program, it makes sense to experiment with the JVM parameters, for example by enabling "AggressiveOpts".
Run it like this:
$ JAVA_OPTS="-XX:+AggressiveOpts" scala parser.scala