How to work with PostgreSQL -> BigQuery?
So: there is a Python application that collects data into a PostgreSQL database. The data set is small (about a gigabyte, roughly a million records), and there would be no problem at all except that some of the data goes stale and needs to be updated from time to time. The client wants to use BigQuery for data analysis; apparently he has some other data he wants to compare against or work with alongside this set.
What is the best way to update data in the database and then in BigQuery?
I have a few options:
1. Python collects new data during the day and applies the needed updates. When it finishes, the data is exported from PostgreSQL to CSV or JSON, loaded into BigQuery through the google-cloud-bigquery library into a staging table, and the loaded table is then renamed in place of the main one.
The obvious disadvantage is the lack of transactions in BigQuery around that swap.
2. Python collects new data throughout the day and applies the needed updates. Once or twice a day, the data is exported from PostgreSQL to CSV or JSON, and so on as in the previous option. The downsides are the same, plus one more: the export may coincide with a moment when the collection script is still running.
3. Python collects new data throughout the day and applies the needed updates. The export from PostgreSQL to CSV is triggered by an API call (roughly speaking, by pressing a button), and so on as before. The downsides are the same, plus the short-sightedness of the client, who now has a button he can press once a second, and nothing good will come of that.
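The export half of option 1 can be sketched as follows. This is a minimal illustration using sqlite3 as a local stand-in for PostgreSQL; the table and column names (`records`, `id`, `payload`, `updated_at`) are made up for the example.

```python
# Sketch of the "dump to CSV" step from option 1, with sqlite3 standing in
# for PostgreSQL so the snippet is self-contained.
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(1, "a", "2023-01-01"), (2, "b", "2023-01-02")],
)

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "payload", "updated_at"])  # header row for the CSV
for row in conn.execute("SELECT id, payload, updated_at FROM records ORDER BY id"):
    writer.writerow(row)

csv_dump = buf.getvalue()
print(csv_dump.splitlines()[0])  # -> id,payload,updated_at
```

Against a real PostgreSQL instance, psycopg2's `copy_expert("COPY ... TO STDOUT WITH CSV HEADER", buf)` is faster than row-by-row fetching, and the resulting file can be loaded with `google-cloud-bigquery` via `client.load_table_from_file(...)` with a `LoadJobConfig(source_format=bigquery.SourceFormat.CSV, skip_leading_rows=1, write_disposition="WRITE_TRUNCATE")` pointed at a staging table.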
If the data never needed updating, everything would be obvious. But BigQuery is not designed for updating data: UPDATE statements are limited to about 1,000 per table per day, and that is quite restrictive for keeping a million-record table fresh, so working with the data row by row like that makes no sense.
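This is exactly why all three options reload the whole table instead of issuing UPDATEs: the changes are applied locally by primary key, and only the finished snapshot reaches BigQuery. A minimal sketch with hypothetical rows:

```python
# Why a full reload sidesteps the DML limits: updates are applied locally
# (in Postgres, in the real flow) keyed by id, and the resulting snapshot
# is what gets loaded into BigQuery, so no UPDATE statements run there.
current = {1: {"id": 1, "value": "old"}, 2: {"id": 2, "value": "keep"}}
changed = [{"id": 1, "value": "new"}, {"id": 3, "value": "added"}]

for row in changed:
    current[row["id"]] = row  # upsert locally, any number of rows

snapshot = sorted(current.values(), key=lambda r: r["id"])
print([r["value"] for r in snapshot])  # -> ['new', 'keep', 'added']
```

The whole `snapshot` then replaces the BigQuery table in one load job, regardless of how many rows changed.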
Maybe someone has already solved this problem? And are there any interesting libraries for working with BigQuery besides google-cloud-bigquery?