DamskiyUgodnik, 2020-08-15 03:08:23
PostgreSQL

Is a bulk insert with a uniqueness check possible in PostgreSQL?

Hello!
Task:

  • Parse a large number of CSV files and load all the data into PostgreSQL.

Additional details:
  • The parser is written in Python (csv, psycopg2)
  • Table structure: a primary key, a text field, and numeric fields (about 10 of them)
  • The text field must be unique
  • The text field has an average length of about 100 characters
  • Queries will filter on the numeric fields (indexes are needed)
  • Estimated data volume: ~2.5-3 billion rows
  • While data is being written, nothing reads from the table (i.e., the data will be loaded periodically in batches, and reports will then be built on the updated data)

What I tried:
  • For now I took the brute-force approach: check for existence with a SELECT, then INSERT if needed, one record at a time. Since the solution is single-threaded, this is sufficient from a logic standpoint (there is also an index with a uniqueness constraint, just in case).
  • I tried dropping the SELECT and relying on a rollback when the insert fails. I didn't notice much difference in speed, and the rolled-back queries clutter the query log (maybe I just don't have enough experience configuring Postgres properly).

Problems:
  • After inserting ~50 million records, performance degrades sharply

Ideas:
  • Try a batch insert with uniqueness enforced by the index, but it's not clear how this can be done at all. For example, if we insert 50 rows and one of them is a duplicate, the entire statement is rolled back.

So the actual question: how can this be done quickly? I suspect the task is fairly simple and common, and that smart people have already come up with an elegant solution.


1 answer(s)
Sergey Gornostaev, 2020-08-15
@DamskiyUgodnik

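An upsert solves your problem: in PostgreSQL 9.5 and later, INSERT ... ON CONFLICT DO NOTHING skips rows that violate the unique constraint instead of failing the whole statement.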
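
For illustration, here is a minimal sketch of a batched load with the csv module and a psycopg2-style cursor, as in the question's setup. The table name `items`, the column names, and the helpers `build_insert`, `batches`, and `load_csv` are all hypothetical. It assumes a unique index on the text column and CSV columns in the same order as `COLUMNS`. It is a rough starting point rather than a tuned loader.

```python
import csv
from itertools import islice

# Hypothetical schema: adjust the table and column names to the real one.
TABLE = "items"
COLUMNS = ("name", "value_a", "value_b")
UNIQUE_COLUMN = "name"  # assumed to carry a unique index


def build_insert(n_rows):
    """Build a multi-row INSERT that skips rows whose unique value already exists."""
    row = "(" + ", ".join(["%s"] * len(COLUMNS)) + ")"
    values = ", ".join([row] * n_rows)
    return (
        f"INSERT INTO {TABLE} ({', '.join(COLUMNS)}) VALUES {values} "
        f"ON CONFLICT ({UNIQUE_COLUMN}) DO NOTHING"
    )


def batches(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_csv(cursor, path, batch_size=1000):
    """Insert one CSV file in batches; `cursor` is a DB-API cursor (e.g. psycopg2)."""
    with open(path, newline="") as f:
        for batch in batches(csv.reader(f), batch_size):
            # Flatten the batch into one parameter list matching the placeholders.
            params = [value for row in batch for value in row]
            cursor.execute(build_insert(len(batch)), params)
```

With a real connection this would be called as `load_csv(conn.cursor(), "data.csv")` followed by `conn.commit()`. Duplicates are dropped by the database, both against existing rows and within a single batch, so no SELECT or rollback is needed.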
