PostgreSQL
Denis, 2021-02-18 13:52:44

PostgreSQL: when inserting many (millions of) rows, how can I see all rows that violate constraints?

UPD: I know how to select only the rows that do not violate constraints. I need to get all the errors, if possible.

UPD2: After the first error I am no longer interested in which data gets inserted (it may be none at all); I need to know every reason why rows will not be inserted. I understand this is a somewhat theoretical exercise and that a decent DBMS cannot do this, but perhaps there is still a clever hack.


We transfer data from one database to another. When we insert rows with an INSERT ... SELECT statement, several rows violate some constraint, usually a FOREIGN KEY. We would like the insert to report not just the first row that caused the error but all of them. For example:

create table t_source (id int, ref int);
insert into t_source (id,ref) values (1,1),(1,2),(1,3),(1,4),(1,5);

create table t_dict(id int primary KEY, name varchar);
insert into t_dict (id,name) values (2,'two'),(3,'three');

create table t_target (
 id int, ref int,
 FOREIGN KEY (ref) REFERENCES t_dict(id)
);

insert into t_target (id,ref) select id,ref from t_source;


returns:
SQL Error [23503]: ERROR: insert or update on table "t_target" violates foreign key constraint "t_target_ref_fkey"
Detail: Key (ref)=(1) is not present in table "t_dict".


How can I see all the offending rows at once, i.e. those with ref = 1, 4 and 5?
I know about WHERE NOT IN, but it does not fit: there are about 150 real queries and 3 TB of data, many different constraints can be violated by different data, and I must not simply skip the bad rows; they have to be sent to the responsible people to be fixed in the source database. Right now we do this with a series of WHERE NOT IN filters and repeated queries that compute the difference, but this is slow and expensive. Seeing all the errors at once would greatly speed up the work.

That is, I really need to see all the errors, not find a workaround. I also know about cursors, but they do not suit me: they are much slower.
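The report asked for here can be illustrated in memory: check every row against every foreign key and collect all failures instead of stopping at the first. This is only a sketch of the shape of the desired result, assuming the source rows and the referenced keys fit in memory (they would not at 3 TB); the descriptor format `{ name, column, keys }` and the function `collectViolations` are hypothetical.

```javascript
// Hypothetical in-memory copies of the tables from the question.
const tSource = [
  { id: 1, ref: 1 }, { id: 1, ref: 2 }, { id: 1, ref: 3 },
  { id: 1, ref: 4 }, { id: 1, ref: 5 },
];
const tDict = [{ id: 2, name: 'two' }, { id: 3, name: 'three' }];

// Hypothetical descriptor format: one entry per single-column foreign key.
// `keys` holds the referenced values in a Set, so each lookup is O(1).
const foreignKeys = [
  { name: 't_target_ref_fkey', column: 'ref', keys: new Set(tDict.map(d => d.id)) },
];

// Check every row against every foreign key and collect all failures,
// instead of aborting at the first one the way a single INSERT does.
function collectViolations(rows, fks) {
  const violations = [];
  for (const row of rows) {
    for (const fk of fks) {
      const value = row[fk.column];
      // As with a default (MATCH SIMPLE) foreign key, a NULL value is not checked.
      if (value !== null && value !== undefined && !fk.keys.has(value)) {
        violations.push({ constraint: fk.name, column: fk.column, key: value, row });
      }
    }
  }
  return violations;
}

const report = collectViolations(tSource, foreignKeys);
for (const v of report) {
  console.log(`${v.constraint}: Key (${v.column})=(${v.key}) is not present`);
}
```

For the sample data this prints one line per offending row, for ref 1, 4 and 5.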


3 answer(s)
Melkij, 2021-02-18
@melkij

Add the data validation to the insert ... select yourself:

where exists(select from t_dict where t_dict.id = t_source.ref)

Then run the same query with not exists to get the list of rejected rows, and take that list to whoever owns the data.
Ignoring the error on even a single row and carrying on would be extremely expensive in terms of resources.
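The EXISTS / NOT EXISTS split puts every source row into exactly one of two lists. One detail matters when mirroring a foreign key: the constraint accepts a NULL ref, but `exists(...)` is false for it, so a plain NOT EXISTS query would also report rows with a NULL ref. Using `t_source.ref is null or exists(...)` for the insert and `t_source.ref is not null and not exists(...)` for the report avoids that. A small in-memory JavaScript sketch of the split, where a Set stands in for t_dict and `partitionByReference` is a hypothetical name:

```javascript
// Hypothetical in-memory stand-ins for t_source and t_dict from the question,
// plus one extra row with a NULL ref to show how it is classified.
const sourceRows = [
  { id: 1, ref: 1 }, { id: 1, ref: 2 }, { id: 1, ref: 3 },
  { id: 1, ref: 4 }, { id: 1, ref: 5 }, { id: 2, ref: null },
];
const dictIds = new Set([2, 3]);

// Split rows the way the foreign key would judge them: a row passes if its
// value is NULL (the constraint is not checked then) or exists in the referenced set.
function partitionByReference(rows, column, referencedKeys) {
  const insertable = [];
  const rejected = [];
  for (const row of rows) {
    const value = row[column];
    if (value === null || referencedKeys.has(value)) {
      insertable.push(row);
    } else {
      rejected.push(row);
    }
  }
  return { insertable, rejected };
}

const { insertable, rejected } = partitionByReference(sourceRows, 'ref', dictIds);
console.log('insert:', insertable.map(r => r.ref));
console.log('report:', rejected.map(r => r.ref));
```

Here the rows with ref 2, 3 and NULL go to the insert, and the rows with ref 1, 4 and 5 go to the report.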

gram2005, 2021-02-25
@gram2005

SELECT t_source.ref FROM t_source
LEFT JOIN t_dict ON t_source.ref = t_dict.id
WHERE t_dict.id IS NULL;

An index on t_source.ref may help; t_dict.id is already indexed because it is the primary key.
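With many foreign keys, this anti-join can be written once per constraint and combined with UNION ALL, so a single query returns every violating row labeled with the constraint it would break. A JavaScript sketch that builds such a query text, assuming single-column foreign keys and unqualified table names; the descriptor format and `buildViolationReport` are hypothetical, and in practice the descriptor list could be filled from the `pg_constraint` catalog (rows with `contype = 'f'`). It also adds `IS NOT NULL` on the source column, because the foreign key accepts NULL values while the bare LEFT JOIN ... IS NULL query would report them.

```javascript
// Quote an identifier for PostgreSQL: wrap it in double quotes, doubling any embedded quote.
function quoteIdent(name) {
  return '"' + String(name).replace(/"/g, '""') + '"';
}

// Quote a string literal: wrap it in single quotes, doubling any embedded quote.
function quoteLiteral(text) {
  return "'" + String(text).replace(/'/g, "''") + "'";
}

// One anti-join per foreign key, combined with UNION ALL.
// Keys are cast to text so that branches with different key types can be combined.
function buildViolationReport(checks) {
  return checks.map(c => {
    const col = quoteIdent(c.column);
    const refCol = quoteIdent(c.refColumn);
    return [
      `SELECT ${quoteLiteral(c.constraint)} AS constraint_name,`,
      `       s.${col}::text AS bad_key, s::text AS source_row`,
      `FROM ${quoteIdent(c.source)} s`,
      `LEFT JOIN ${quoteIdent(c.refTable)} d ON s.${col} = d.${refCol}`,
      `WHERE d.${refCol} IS NULL AND s.${col} IS NOT NULL`,
    ].join('\n');
  }).join('\nUNION ALL\n');
}

// Hypothetical descriptor for the foreign key from the question.
const checks = [
  { constraint: 't_target_ref_fkey', source: 't_source', column: 'ref',
    refTable: 't_dict', refColumn: 'id' },
];

console.log(buildViolationReport(checks));
```

The printed query can be run in psql or any client; each result row names the constraint, the missing key and the whole source row as text.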

Immortal_pony, 2017-10-11
@IceDevil

Answer to the current question:
Something like this:

var objectList = [
    {id:40, prevId:22},
    {id:22, prevId:52},
    {id:4364, prevId:40},
    {id:4, prevId:'none'},
    {id:52, prevId:4}
];

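// Builds the chain: first takes the object whose `coupleTo` field equals minValue,
// then repeatedly takes the object whose `coupleTo` equals the previous object's `coupleFrom`.
function sortSequential(objectList, coupleFrom, coupleTo, minValue) {
    var sorted = [];
    var couplingFound = true;
    var couplingValue = minValue;
    
    // Stop when no next link is found, or once every object has been placed
    // (the length check prevents an endless loop if the links form a cycle).
    while (couplingFound && sorted.length < objectList.length) {
        couplingFound = false;
        
        // Linear scan for the object that points to the current value.
        for (var i = 0; i < objectList.length; i++) {
            if (objectList[i][coupleTo] === couplingValue) {
                sorted.push(objectList[i]);
                couplingValue = objectList[i][coupleFrom];
                couplingFound = true;
                break;
            }
        }
    }
    
    return sorted;
}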

objectList = sortSequential(objectList, 'id', 'prevId', 'none');
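sortSequential rescans the whole list for every link, so it does O(n²) work. A hedged alternative sketch (the name sortByLinks is hypothetical): index the objects by their link field in a Map first, then follow the chain in O(n). It assumes each prevId value occurs at most once; with duplicates, Map.set keeps only the last object for that value, whereas sortSequential picks the first.

```javascript
// Index every object by the value of its `coupleTo` field, then walk the chain.
function sortByLinks(list, coupleFrom, coupleTo, startValue) {
  const byLink = new Map();
  for (const obj of list) {
    byLink.set(obj[coupleTo], obj); // assumes link values are unique
  }
  const sorted = [];
  let value = startValue;
  // The length check guards against cycles in the links.
  while (byLink.has(value) && sorted.length < list.length) {
    const next = byLink.get(value);
    sorted.push(next);
    value = next[coupleFrom];
  }
  return sorted;
}

const chainObjects = [
  { id: 40, prevId: 22 },
  { id: 22, prevId: 52 },
  { id: 4364, prevId: 40 },
  { id: 4, prevId: 'none' },
  { id: 52, prevId: 4 },
];
console.log(sortByLinks(chainObjects, 'id', 'prevId', 'none').map(o => o.id));
// → [ 4, 52, 22, 40, 4364 ]
```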

I assume we are talking about an array of objects.
If that is not the case, please describe the structure of your objects in more detail, preferably with an example.
As for sorting an array of objects, the sort method (available on every array) takes a user-defined comparison function as its argument (see the Array.prototype.sort documentation).
Usage example:
var objectList = [
    {id:1, parentId:2},
    {id:2, parentId:4},
    {id:3, parentId:1},
    {id:4, parentId:'none'}
];

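objectList.sort(function(someObject, otherObject) {
    // Equal parentIds compare as equal, as the comparator contract requires.
    if (someObject.parentId === otherObject.parentId) {
        return 0;
    }
    // The root object (parentId 'none') always goes first.
    if (someObject.parentId === 'none') {
        return -1;
    }
    if (otherObject.parentId === 'none') {
        return 1;
    }
    // Otherwise sort by parentId in descending order.
    return otherObject.parentId - someObject.parentId;
});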
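Ordering by parentId alone works for this example only because the parent ids happen to decrease along the chain (4, 2, 1); with {id:1, parentId:'none'}, {id:2, parentId:1}, {id:3, parentId:2} it would put 3 before its parent 2. One hedged way to keep using sort() is to compare depths (the number of parentId links up to the root), since a parent is always one level above its children. A sketch, assuming there are no cycles and every parentId other than 'none' refers to an existing id; depthOf is a hypothetical helper:

```javascript
const items = [
  { id: 1, parentId: 2 },
  { id: 2, parentId: 4 },
  { id: 3, parentId: 1 },
  { id: 4, parentId: 'none' },
];

// Number of parentId links from an object up to the root (the root has depth 0).
// Results are cached so each object's depth is computed only once.
function depthOf(obj, byId, cache) {
  if (cache.has(obj.id)) return cache.get(obj.id);
  const depth = obj.parentId === 'none'
    ? 0
    : depthOf(byId.get(obj.parentId), byId, cache) + 1;
  cache.set(obj.id, depth);
  return depth;
}

const byId = new Map(items.map(o => [o.id, o]));
const depthCache = new Map();
// Parents always have a smaller depth than their children, so they sort first.
items.sort((a, b) => depthOf(a, byId, depthCache) - depthOf(b, byId, depthCache));
console.log(items.map(o => o.id)); // → [ 4, 2, 1, 3 ]
```

Because depths are plain numbers, this comparator is consistent, and it also keeps parents before children when an object has several children.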
