Python
beduin01, 2019-02-27 12:31:21

What is the correct way to use the multiprocessing module with PostgreSQL?

As a result, under pressure from the public, I decided to abandon SQLite even for storing links to files and moved everything to PostgreSQL.
Adding threads to the application gave no benefit. As I was advised here, I decided to try the multiprocessing module to spread the work across cores. My code looks like this:

def select_single_file_for_processing():
    # ...
    sql = """UPDATE processing_files SET "isProcessing" = 'TRUE' WHERE "xml_name" = %s"""
    cursor.execute(sql, (xml_name,))  # parameterized instead of str.format
    conn.commit()

def worker():
    result = select_single_file_for_processing()  # get a file to process
    # ...
    # processing()

def main():
    # ...
    while unprocessed_xml_count != 0:

        checker_thread = threading.Thread(target=select_total_unprocessed_xml_count)
        checker_thread.start()  # check whether there is still data to process

        for i in range(10):  # launch the worker processes
            t = Process(target=worker)
            t.start()

Unfortunately, I did not see any performance gain; on the contrary, the multi-process variant runs several times slower than the single-threaded one. Where is the mistake?
Also, how do I organize access to the unprocessed_xml_count variable correctly? Right now every process reads it independently, and I'm sure that is not done correctly.
I also tried a very crude option. I have 1000 files in the database. Without any synchronization, I did this:
for x in range(1000):
    for i in range(3):
        t = Process(target=worker)
        t.start()
        t.join()

The idea was to run the files three processes at a time. But this is also wildly slow.

2 answer(s)
Melkij, 2019-02-27
@melkij

decided to abandon SQLite even for storing file links

What for?
SQLite is an excellent database, and the most widely deployed one in the world (see Android, Firefox, Chrome: SQLite is everywhere).
I can hardly imagine SQLite at terabyte scale, but at a scale of a couple of megabytes PostgreSQL is clear overkill.
On the actual topic: I'm not a Python developer, and from the code given it is completely unclear to me what you are doing. Profile your application and find where the time is actually being spent.
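The profiling Melkij suggests can be done with the standard library's cProfile and pstats modules; a minimal sketch, where slow_step is a hypothetical stand-in for the real per-file processing:

```python
import cProfile
import io
import pstats

def slow_step():
    # Hypothetical stand-in for the real per-file processing.
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(10):
    slow_step()
profiler.disable()

# Print the five most expensive calls by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumtime").print_stats(5)
print(out.getvalue())
```

The report shows per-function call counts and cumulative times, which is usually enough to tell whether the time goes into the database calls, process startup, or the processing itself.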

Fian, 2019-02-27
@Fian

Each of the processes creates its own connection to the database. Creating a connection is a relatively expensive operation (because of latency) and in this case may take longer than the queries themselves. Creating a new process is also expensive. But frankly, answering this question without profiling is just guessing.
