Python
LakeForest, 2021-12-15 01:37:35

How can I read files in Python with multiple threads? How can I speed up the following code?

The text column contains paths to files with text rather than the text itself. I want to write all the texts back into the CSV, but there are a lot of files. The code has been running for over a day and is still only at 70%; the progress bar says about 12 more hours to go. I really want to speed it up, but I have no idea what can be done about it.

import pandas as pd
from tqdm import tqdm

c = 0
for i, row in tqdm(df_new.iterrows(), total=df_new.shape[0]):
    # Skip rows whose path does not point into the texts directory
    if "texts" not in row.text:
        continue
    c += 1
    with open("../../" + row.text, "r") as f:
        # Only the first line of each file is kept
        text = f.readline()
        df_new.loc[i, "text"] = text.replace("\n", "")
        # Checkpoint the whole DataFrame every 10,000 files
        if c % 10000 == 0:
            df_new.to_csv("df_all.csv", index=False)
df_new.to_csv("df_all.csv", index=False)


1 answer(s)
Vindicar, 2021-12-15

Multiple threads are unlikely to help here because of the nature of Python (the GIL).
Multiple processes might (the multiprocessing module).
But first of all, you should make sure the bottleneck is the CPU and not disk throughput.
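A rough sketch of the multiprocessing approach, assuming df_new holds file paths in its text column. The demo below builds a tiny stand-in DataFrame with temporary files; the helper name read_first_line and the sample data are made up for illustration, and the "../../" prefix from the question is dropped because the demo paths are absolute.

import os
import tempfile
from multiprocessing import Pool

import pandas as pd

def read_first_line(path):
    # Worker: read the first line of one file, mirroring the loop body
    # from the question. Rows whose path does not contain "texts" are
    # passed through unchanged.
    if "texts" not in path:
        return path
    with open(path, "r") as f:
        return f.readline().replace("\n", "")

if __name__ == "__main__":
    # Tiny self-contained demo: two files standing in for the real corpus.
    tmp = tempfile.mkdtemp()
    paths = []
    for i in range(2):
        p = os.path.join(tmp, f"texts_{i}.txt")
        with open(p, "w") as f:
            f.write(f"line {i}\nrest is ignored\n")
        paths.append(p)
    df_new = pd.DataFrame({"text": paths})

    with Pool() as pool:  # one worker process per CPU core by default
        df_new["text"] = pool.map(read_first_line, df_new["text"])
    df_new.to_csv("df_all.csv", index=False)

If the profiling shows the job is disk-bound rather than CPU-bound, extra processes will not help much; in that case reducing per-file overhead (or faster storage) matters more than parallelism.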
