What is the correct way to use the multiprocessing module with PostgreSQL?
As a result, under pressure from the public, I decided to abandon SQLite even for storing links to files and moved everything to PostgreSQL.
Adding threads to the application didn't help. As I was advised here, I decided to try the multiprocessing module to spread the work across the cores. My code looks like this:
def select_single_file_for_processing():
    # ...
    # a parameterized query avoids quoting bugs and SQL injection
    sql = """UPDATE processing_files SET "isProcessing" = 'TRUE' WHERE "xml_name" = %s"""
    cursor.execute(sql, (xml_name,))
    conn.commit()

def worker():
    result = select_single_file_for_processing()  # get a file to process
    # ...
    # processing()

def main():
    # ....
    while unprocessed_xml_count != 0:
        # check whether there is still data to process
        checker_thread = threading.Thread(target=select_total_unpocessed_xml_count)
        checker_thread.start()
        # start the worker processes
        for i in range(10):
            t = Process(target=worker)
            t.start()
        for x in range(1000):
            for i in range(3):
                t = Process(target=worker)
                t.start()
                t.join()
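Spawning ten new processes on every loop pass (plus three more per inner iteration) creates far more processes than there are cores, and process creation itself is not free. A fixed-size multiprocessing.Pool is the usual alternative: the same few worker processes handle the whole backlog. A minimal sketch, assuming a plain list of file names; the uppercase "processing" step is a placeholder for the real work, not the original code:

```python
from multiprocessing import Pool

def process_file(xml_name):
    # Placeholder for the real work: in the original code this is where
    # the worker would claim the row in processing_files and parse the file.
    return xml_name.upper()

def run(files):
    # Four workers process the whole list; the Pool reuses the same
    # processes instead of spawning a new one per file.
    with Pool(processes=4) as pool:
        return pool.map(process_file, files)

if __name__ == "__main__":
    print(run(["a.xml", "b.xml", "c.xml"]))
```

pool.map also takes care of joining: it returns only when every file has been handled, so the manual start/join bookkeeping disappears.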
decided to abandon SQLite even for storing file links
Each of the processes will create its own connection to the database. Establishing a connection is a relatively expensive operation (because of latency), and in this case it may take longer than the queries themselves. Creating a new process is also expensive. But, frankly, without profiling, any answer to this question is just guesswork.
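One common way to pay the connection cost only once per worker is a Pool initializer: it runs once in each worker process and opens a single connection that every subsequent task in that process reuses. A sketch with a stand-in connection object; in the real application the initializer would call psycopg2.connect (not shown running here):

```python
from multiprocessing import Pool

_conn = None  # one connection object per worker process

def init_worker():
    # Runs once in each worker process when the Pool starts.
    # In the real application this would be something like:
    #   _conn = psycopg2.connect(dsn)
    global _conn
    _conn = {"connected": True}  # stand-in for a psycopg2 connection

def handle(xml_name):
    # Every task handled by this process reuses the same _conn,
    # so connection latency is paid once per process, not once per task.
    return _conn["connected"]

def run(files):
    with Pool(processes=2, initializer=init_worker) as pool:
        return pool.map(handle, files)

if __name__ == "__main__":
    print(run(["a.xml", "b.xml"]))
```

With this layout the per-task cost is just the query itself, which is exactly what profiling should confirm before optimizing further.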