D
D
Dmitry2015-09-11 11:07:17
Java
Dmitry, 2015-09-11 11:07:17

How to quickly compare two lists?

There is:
- sqlite table with two fields: UID (unique string) and HASH (number)
- list of elements, also with two fields: UID and HASH.
It is necessary to:
- remove elements from the table that are not in the list (or get their list).
- get a list of UIDs of those elements that are in the list, but not in the table or that are in the table, but with a different HASH field.
It doesn't matter using SQL or java, the main thing is more or less optimal. There can be more than one hundred thousand elements in both lists.
This is necessary in order to keep the list of elements on the client up to date, updating only those elements that have changed on the server and deleting those that, respectively, have been deleted.

Answer the question

In order to leave comments, you need to log in

1 answer(s)
A
Artur Polozov, 2015-09-11
@MadRogue

As I see it:
the task is to quickly find the difference between two large arrays (the absence or difference of a part of the data).
we had the task of comparing two matrices of 800k-1k records, finding the difference - what changed, or deleted, or added ...
the best method in terms of speed came out - union with grouping, at the output we get only the difference and we are already analyzing it. Those. out of 800kk, only 1k-2k are changed - we are already analyzing them.

SELECT 
        UserId, [Resource] = ResourceId, table_num,
        [Action] = CASE WHEN table_num = 1 THEN  ...  -- table_num тут индикатор из какого массива запись.
        FROM (        
            SELECT UserId, ResourceId, AssignmentId,table_num = min(table_num)  
            FROM (
                    SELECT DISTINCT UserId = M2.UserId    
                                    , ResourceId = ISNULL(M2.ResourceId,'00000000-0000-0000-0000-000000000000') 
                                    , AssignmentId = ISNULL(M2.AssignmentId, '00000000-0000-0000-0000-000000000000')
                            ,1 [table_num] from #newMatrix M2
                    UNION  
                    SELECT DISTINCT UserId = M2.UserId    
                                    , ResourceId = ISNULL(M2.ResourceId,'00000000-0000-0000-0000-000000000000') 
                                    , AssignmentId = ISNULL(M2.AssignmentId, '00000000-0000-0000-0000-000000000000')
                            ,2 from #oldMatrix M2
                    ) a
            GROUP BY UserId, ResourceId, AssignmentId
            HAVING COUNT(*) <> 2  -- одинаковые записи в обеих таблицах, останется только разница.
        ) S
<code>
</code>

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question