How can I read files in Python with multiple threads to speed up the following code?
In the "text" column, instead of the text itself, I have paths to files that contain the text. I want to write all of these texts into the CSV, but there are a lot of files. The code has been running for more than a day and is only 70% done, and the progress bar says there are about 12 more hours to go. I'd really like to speed it up, but I have no idea what can be done...
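from tqdm import tqdm  # df_new is a pandas DataFrame loaded earlier (not shown)

c = 0  # number of files read so far
for i, row in tqdm(df_new.iterrows(), total=df_new.shape[0]):
    # Only values that are file paths (they contain "texts") need processing
    if "texts" not in row.text:
        continue
    c += 1
    with open("../../" + row.text, "r") as f:
        text = f.readline()  # reads only the first line of the file
    df_new.loc[i, "text"] = text.replace("\n", "")
    # Checkpoint: rewrites the whole DataFrame every 10,000 files
    if c % 10000 == 0:
        df_new.to_csv("df_all.csv", index=False)
df_new.to_csv("df_all.csv", index=False)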
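One possible direction, sketched here as an assumption rather than a tested fix: reading small files is mostly waiting on disk I/O, and CPython releases the GIL while a thread is blocked in a read, so a thread pool from the standard library's `concurrent.futures` may help. The sketch also collects all new values into a list and assigns the column once, instead of a per-row `.loc` write. The names `read_text_cell` and `load_column_threaded`, the UTF-8 encoding, and `max_workers=32` are hypothetical choices, not tuned values; the "texts" marker, the `"../../"` base directory, and reading only the first line follow the loop above. Values that are not strings (for example NaN) are passed through unchanged, whereas the original `in` check would raise a `TypeError` on them.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_text_cell(value, base_dir="../../"):
    """Return the first line of the file named by `value`, or `value` unchanged.

    Mirrors the question's loop: only strings containing "texts" are treated as paths.
    """
    if not (isinstance(value, str) and "texts" in value):
        return value
    # Assumption: the text files are UTF-8 encoded.
    with open(os.path.join(base_dir, value), "r", encoding="utf-8") as f:
        return f.readline().replace("\n", "")


def load_column_threaded(values, base_dir="../../", max_workers=32):
    """Resolve all values concurrently; the returned list keeps the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, not completion order.
        return list(pool.map(lambda v: read_text_cell(v, base_dir), values))


# Usage with the question's DataFrame (assumes df_new fits in memory):
# df_new["text"] = load_column_threaded(df_new["text"].tolist())
# df_new.to_csv("df_all.csv", index=False)
```

The usage writes the CSV once at the end. In the original loop, every checkpoint rewrites the entire DataFrame, so the total amount written grows with (number of rows) × (number of checkpoints), which may account for a noticeable share of the runtime. The trade-off is that this sketch saves no intermediate progress. How much the threads help depends on the storage (SSD vs. HDD, local vs. network disk), so it is worth timing on a sample first.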