What libraries should I use for parsing text files (Python), and will the numba module help speed up this process?
I need to parse (extract information from) a huge number of text files. Which modules will help me with this? I have heard about numba; this module speeds up computation itself, but the question is: will it help extract information from a .txt file faster? Maybe there are special tools for such tasks?
Also, briefly, what the program does, for a better understanding of my goal:
There are a bunch of text files; I need to take data from them and merge everything into one file. After that, this file will be filtered to remove unnecessary information. In general, standard processing.
Awesome task description: no specifics. So just use the standard methods and that's it. For the "merge everything into one file" part you don't even need a programming language; I usually do that at the OS level (e.g. `cat *.txt > merged.txt`).
Your own "In general, standard processing" even suggests the slogan: standard processing, standard methods.
Let's say we strain ourselves and parse a million files in an hour instead of an hour and a half, having spent three hours developing and polishing the code. And for what? What are we saving here? For one-time operations, running time usually only matters when it is really large: for example, if a job would take a month of continuous work, I would look at whether it can be cut down to a week.
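For illustration, a minimal sketch of the "standard methods" approach in plain Python: stream every text file into one output file. The directory name, file pattern, and output path here are assumptions; adapt them to your layout.

```python
from pathlib import Path

# Hypothetical paths: adjust to your actual layout.
src_dir = Path("data")
out_path = Path("merged.txt")

with out_path.open("w", encoding="utf-8") as out:
    for txt_file in sorted(src_dir.glob("*.txt")):
        # Stream each file line by line to keep memory usage flat,
        # even when individual files are large.
        with txt_file.open("r", encoding="utf-8") as f:
            for line in f:
                out.write(line)
```

Filtering out the unneeded lines can then be a second pass over merged.txt, or an `if` condition inside the inner loop before `out.write(line)`.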
The numba library is needed to speed up self-written computational algorithms. In parsing text files, the bottleneck is likely to be purely I/O, so it won't help much here.
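To make the distinction concrete, here is a hedged sketch of the kind of code numba does accelerate: a tight numeric loop compiled with @njit. The file name is a placeholder, and the file-reading step stays ordinary Python, since numba cannot speed up disk I/O.

```python
import numpy as np
from numba import njit

@njit
def checksum(values):
    # A tight numeric loop over an array: this is numba's territory.
    total = 0.0
    for v in values:
        total += v * v
    return total

# Reading the file is plain I/O; numba gains nothing here.
# "numbers.txt" is a placeholder file of numeric values.
values = np.loadtxt("numbers.txt")
print(checksum(values))
```

On the first call numba compiles checksum to machine code, so repeated calls on large arrays run much faster; the time spent in np.loadtxt, however, is unchanged.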