Python
kukarekuu, 2019-10-10 20:45:06

What is the best way to transfer millions of rows from one database to another using Pandas?

The task is to move several tables (~10 million rows each, ~20 columns) from MSSQL to Postgres. Right now I'm doing it with Python + pandas.
I transfer the data the standard way: read from MSSQL with pd.read_sql() and write the resulting DataFrame to the other database with pd.to_sql(). It works, but in my opinion it takes quite a long time, about an hour and a half for 10 million records (the MSSQL and Postgres servers are on different machines, but on the same network). Roughly like this:
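For reference, a minimal sketch of this kind of pandas pipeline. The chunksize and method="multi" arguments are not mentioned in the question but are standard pandas options that often help with large tables; connection strings, table names and chunk sizes are placeholders, not taken from the question.

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection strings - adjust drivers, hosts and credentials
mssql = create_engine("mssql+pyodbc://user:pass@mssql_host/source_db?driver=ODBC+Driver+17+for+SQL+Server")
pg = create_engine("postgresql+psycopg2://user:pass@pg_host/target_db")

# Read the source table in chunks instead of one huge DataFrame,
# and write each chunk with multi-row INSERTs to cut down round trips.
reader = pd.read_sql("SELECT * FROM source_table", mssql, chunksize=50_000)
for i, chunk in enumerate(reader):
    chunk.to_sql(
        "target_table",
        pg,
        if_exists="replace" if i == 0 else "append",
        index=False,
        method="multi",   # batch rows into multi-value INSERT statements
        chunksize=10_000,
    )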
I've thought about multithreading, and I've even implemented it for the case of loading the database from several CSV files, along the lines of the sketch below.
But for a database-to-database transfer I don't see any way to apply multithreading.
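A rough sketch of that CSV variant, assuming one file per worker; the file pattern, pool size and table name are made up for illustration.

import glob
from concurrent.futures import ThreadPoolExecutor

import pandas as pd
from sqlalchemy import create_engine

pg = create_engine("postgresql+psycopg2://user:pass@pg_host/target_db")

def load_csv(path):
    # Each worker reads one CSV and appends it to the target table;
    # the engine's connection pool hands every call its own connection.
    df = pd.read_csv(path)
    df.to_sql("target_table", pg, if_exists="append", index=False,
              method="multi", chunksize=10_000)

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(load_csv, glob.glob("exports/*.csv")))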
Are there any libraries or methods in pandas that could speed things up?
Or, perhaps, someone has successfully implemented multithreading in such a task?


1 answer(s)
Konstantin Tsvetkov, 2019-10-10
@kukarekuu

It is best to use the SQL Server Import and Export Wizard; see the documentation.
