Is it possible to run pymystem3 in docker using multiprocessing?
Hey!
I need to lemmatize a large amount of text inside a Docker container.
I'm trying to do this with multiprocessing, but I get an error:
OSError: [Errno 26] Text file busy: '/root/.local/bin/mystem'
The same code runs fine locally on my machine.
Here is the code:
script.py
import multiprocessing
import time

from pymystem3 import Mystem


def lemma(data):
    process_n = data[0]
    texts = data[1]
    print("process_n", process_n, "Created Mystem()")
    m = Mystem()
    lemma_text = m.lemmatize(' '.join(texts))
    m.close()
    print("process_n", process_n, "Closed Mystem()")
    return lemma_text


def lemmatisation_text():
    n_core = 4
    texts = ['Мама моет раму, Рама держит маму.' for x in range(1000)]
    # Add the process numbers
    params = [[core, texts] for core in range(n_core)]
    print('pool start')
    pool = multiprocessing.Pool(n_core)
    print('pool map')
    res = pool.map(lemma, params)
    print('pool close')
    pool.close()
    print('pool join')
    pool.join()
    print(len(res))


if __name__ == '__main__':
    start = time.time()
    lemmatisation_text()
    print(time.time() - start)
Dockerfile
FROM python:3.8-buster
WORKDIR /usr/src/app
COPY . .
RUN python3.8 -m pip install --upgrade pip
RUN python3.8 -m pip install --no-cache-dir pymystem3
CMD ["python3.8", "script.py"]
Try running Docker with 2 or more CPUs. Most likely the container isn't using the other cores, or there aren't any))
docker run --cpuset-cpus 0-4
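To check what the container actually sees, something like the snippet below might help. It is only a sketch: usable_cpus is a made-up helper name, and it assumes a Linux container, where os.sched_getaffinity reports the CPUs the process is allowed to run on (os.cpu_count alone may report the host's total instead).

```python
import os


def usable_cpus():
    """Return how many CPUs this process is allowed to run on."""
    try:
        # Linux only: the set of CPU ids this process may be scheduled on.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity: fall back to the total count.
        return os.cpu_count() or 1


print("usable CPUs:", usable_cpus())
```

If this prints 1 inside the container, a pool of 4 processes will not run in parallel there.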
Put the required binary into .local/bin (the error above points at /root/.local/bin/mystem) when building the container. That should help.
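On the Python side, the same idea could look roughly like the sketch below. My guess (not confirmed) is that every worker's Mystem() tries to fetch the binary at the same moment, so one process is still writing the file while another tries to execute it. The sketch builds one Mystem in the parent first, then one per worker. The names make_mystem, warm_up, init_worker and lemmatize_parallel are made up for illustration, and it assumes that constructing Mystem() is what installs the missing binary.

```python
import multiprocessing

_analyzer = None  # one analyzer per worker process, set by init_worker


def make_mystem():
    # Imported lazily so the module loads even where pymystem3 is missing.
    from pymystem3 import Mystem
    return Mystem()


def warm_up(factory=make_mystem):
    """Build and close one analyzer in the parent before any worker starts.

    Assumption: constructing the analyzer installs the binary if it is absent.
    """
    factory().close()


def init_worker(factory=make_mystem):
    """Pool initializer: create a single analyzer for this worker."""
    global _analyzer
    _analyzer = factory()


def lemma(texts):
    # Reuse the per-worker analyzer instead of creating one per call.
    return _analyzer.lemmatize(' '.join(texts))


def lemmatize_parallel(batches, n_core=4):
    warm_up()  # the binary should be in place before the pool is created
    pool = multiprocessing.Pool(n_core, initializer=init_worker)
    try:
        return pool.map(lemma, batches)
    finally:
        pool.close()
        pool.join()
```

Call lemmatize_parallel from under the usual if __name__ == '__main__': guard, passing one list of texts per worker. The same warm-up call could also be run once while building the image, so that the binary is already there when the container starts.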