SQL
denislysenko, 2022-03-22 13:04:44

Which columns should I filter on to skip data that has already been loaded and load only new data?

I need to implement incremental data loading.

Example:
A sample of the table looks like this:

select * from order_states limit 20

+--------+-------------+-------------------+-------+-------------------+------+-----------+--------+-------+------------+
|order_id|ext_source_id|                 dt|  state|                 ts|msk_dt|api_user_id|state_id|comment|ext_state_id|
+--------+-------------+-------------------+-------+-------------------+------+-----------+--------+-------+------------+
|     919|            1|2021-01-29 16:48:27|   PROC|2021-02-12 00:12:20|  null|       null|       7|   null|          72|
|     920|            1|2021-01-29 16:48:30|PRINTED|2021-02-12 00:12:20|  null|       null|       1|   null|          73|
|     920|            1|2021-01-29 16:48:30|   DONE|2021-02-12 00:12:20|  null|       null|       1|   null|          74|
|     921|            1|2021-01-29 14:48:30|   SAVE|2021-02-12 00:12:20|  null|       null|       6|   null|          75|
|     921|            1|2021-01-29 14:48:30|   PROC|2021-02-12 00:12:20|  null|       null|       7|   null|          72|
|     922|            1|2021-01-29 14:48:00|    NEW|2021-02-12 00:12:20|  null|       null|       9|   null|          76|
|     923|            1|2021-01-29 14:48:31|PRINTED|2021-02-12 00:12:20|  null|       null|       1|   null|          73|
|     923|            1|2021-01-29 14:48:31|   DONE|2021-02-12 00:12:20|  null|       null|       1|   null|          74|
|     924|            1|2021-01-29 14:48:21|   SAVE|2021-02-12 00:12:20|  null|       null|       6|   null|          75|
|     924|            1|2021-01-29 14:48:22|   PROC|2021-02-12 00:12:20|  null|       null|       7|   null|          72|
|     925|            1|2021-01-29 14:48:52|PRINTED|2021-02-12 00:12:20|  null|       null|       1|   null|          73|
|     926|            1|2021-01-29 14:48:04|    NEW|2021-02-12 00:12:20|  null|       null|       9|   null|          76|
|     927|            1|2021-01-29 14:48:23|   SAVE|2021-02-12 00:12:20|  null|       null|       6|   null|          75|
|     925|            1|2021-01-29 14:48:52|   DONE|2021-02-12 00:12:20|  null|       null|       1|   null|          74|
|     928|            1|2021-01-29 14:48:04|    NEW|2021-02-12 00:12:20|  null|       null|       9|   null|          76|
|     927|            1|2021-01-29 14:48:24|   PROC|2021-02-12 00:12:20|  null|       null|       7|   null|          72|
|     929|            1|2021-01-29 13:48:39|PRINTED|2021-02-12 00:12:20|  null|       null|       1|   null|          73|
|     930|            1|2021-01-29 14:48:24|PRINTED|2021-02-12 00:12:20|  null|       null|       1|   null|          73|
|     929|            1|2021-01-29 13:48:39|   DONE|2021-02-12 00:12:20|  null|       null|       1|   null|          74|
|     931|            1|2021-01-29 14:48:04|PRINTED|2021-02-12 00:12:20|  null|       null|       1|   null|          73|
+--------+-------------+-------------------+-------+-------------------+------+-----------+--------+-------+------------+


As a watermark, I first chose the order_id column.
I created a separate table that stores the watermark value, and I select the rows that have not been loaded yet like this:
select * from order_states where order_id > {watermark value, e.g. 1040}
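
For reference, the approach described above can be sketched roughly as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 in place of the real database, the `load_watermark` table and its `last_order_id` column are hypothetical names for the separate table mentioned above, and only three columns of `order_states` are kept.

```python
import sqlite3

# In-memory SQLite stands in for the real database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_states (order_id INTEGER, dt TEXT, state TEXT);
    -- Hypothetical table holding the highest order_id loaded so far.
    CREATE TABLE load_watermark (table_name TEXT PRIMARY KEY, last_order_id INTEGER);
    INSERT INTO order_states VALUES
        (919, '2021-01-29 16:48:27', 'PROC'),
        (920, '2021-01-29 16:48:30', 'PRINTED'),
        (921, '2021-01-29 14:48:30', 'SAVE');
    INSERT INTO load_watermark VALUES ('order_states', 919);
""")

def load_new_rows(conn):
    """Return rows above the stored value, then advance the stored value."""
    (watermark,) = conn.execute(
        "SELECT last_order_id FROM load_watermark WHERE table_name = 'order_states'"
    ).fetchone()
    rows = conn.execute(
        "SELECT order_id, dt, state FROM order_states "
        "WHERE order_id > ? ORDER BY order_id",
        (watermark,),
    ).fetchall()
    if rows:
        # Move the stored value forward to the largest order_id just read.
        conn.execute(
            "UPDATE load_watermark SET last_order_id = ? "
            "WHERE table_name = 'order_states'",
            (max(r[0] for r in rows),),
        )
    return rows

new_rows = load_new_rows(conn)
```

Calling `load_new_rows` a second time without new source rows returns an empty list, because the stored value has already moved past everything in the table.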


But judging by the data, this does not seem to be the best way to find the missing rows.
Which columns are better to filter on so that no data is lost during an incremental load? dt and order_id together?
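
To make the concern concrete, here is a small sketch of how a filter on order_id alone can skip rows. It assumes, without confirmation from the source, that the row order in the sample reflects arrival order (the DONE state for order 925 appears after orders 926 and 927), and that the combination (order_id, dt, state) identifies a row. The `loaded_states` table is a hypothetical stand-in for the already-loaded target, and sqlite3 replaces the real database. The comparison against the target is one possible alternative, not a recommendation from the original post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_states (order_id INTEGER, dt TEXT, state TEXT);
    -- Hypothetical target table with the rows loaded on the previous run.
    CREATE TABLE loaded_states (order_id INTEGER, dt TEXT, state TEXT);
    INSERT INTO loaded_states VALUES
        (925, '2021-01-29 14:48:52', 'PRINTED'),
        (926, '2021-01-29 14:48:04', 'NEW'),
        (927, '2021-01-29 14:48:23', 'SAVE');
    -- The source now also holds a later state for order 925 and a new order 928.
    INSERT INTO order_states SELECT * FROM loaded_states;
    INSERT INTO order_states VALUES
        (925, '2021-01-29 14:48:52', 'DONE'),
        (928, '2021-01-29 14:48:04', 'NEW');
""")

# Filter on order_id only: the stored value after the previous run is 927.
by_order_id = conn.execute(
    "SELECT order_id, state FROM order_states "
    "WHERE order_id > 927 ORDER BY order_id"
).fetchall()

# Anti-join on the assumed row key: keep source rows absent from the target.
by_anti_join = conn.execute("""
    SELECT s.order_id, s.state
    FROM order_states AS s
    LEFT JOIN loaded_states AS l
      ON l.order_id = s.order_id AND l.dt = s.dt AND l.state = s.state
    WHERE l.order_id IS NULL
    ORDER BY s.order_id
""").fetchall()

print(by_order_id)   # [(928, 'NEW')]
print(by_anti_join)  # [(925, 'DONE'), (928, 'NEW')]
```

In this toy data the order_id filter misses the DONE row for order 925. A filter on dt alone would have a similar gap, since order 928 carries a dt earlier than rows that were already loaded.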
