What can be used to store data on a computer with fast access?


Data storage

up7, 2019-12-09 15:51:22

I need to organize storage and retrieval of a large amount of data on a single computer. I chose SQLite, but it works very slowly.
Essentially, the data is a set of text lines. Is it worth installing a DBMS like MySQL? There will be hundreds of millions of lines.
To clarify: the database will start out as a set of short lines and will then be topped up with more of them. Only unique lines are added, never duplicates.
In the end, even PHP with MySQL was disappointing (I needed the fastest possible merge with duplicate removal).
The winners were ordinary text files and hash tables (for eliminating duplicates).
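
A minimal sketch of the text-file-plus-hash-table approach, assuming one entry per line and enough RAM to hold every line in a set. The name `merge_unique` and the file layout are hypothetical, not taken from the post:

```python
import os


def merge_unique(base_path, new_lines):
    """Append to base_path only the lines it does not already contain.

    Returns the number of lines actually added.
    """
    seen = set()  # hash table of every line already stored
    if os.path.exists(base_path):
        with open(base_path, "r", encoding="utf-8") as f:
            for line in f:
                seen.add(line.rstrip("\n"))

    added = 0
    with open(base_path, "a", encoding="utf-8") as f:
        for line in new_lines:
            line = line.rstrip("\n")
            # Skip empty lines and anything seen before (in the file or in this batch).
            if line and line not in seen:
                seen.add(line)
                f.write(line + "\n")
                added += 1
    return added
```

Calling `merge_unique("base.txt", ["a", "b", "a"])` on a new file would write two lines; a later call with `["b", "c"]` would add only `c`. With hundreds of millions of lines the set may need many gigabytes, so storing fixed-size hash digests instead of whole lines is one possible variation.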


2 answer(s)


mayton2019, 2019-12-09
@mayton2019

SQLite is one of the faster DBMSs. If something runs slowly for you, it is probably because of how you wrote the query, or because of the set of indexes you built. Or you really have outgrown SQLite's capabilities. As far as I remember, some features were not supported there, such as CONNECT BY PRIOR, and possibly window and analytic functions too. In general, don't rush to blame the tool when you don't understand exactly what you are missing. Otherwise you will move to Oracle XE and it will be even slower.
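
To illustrate how much the query and index choice can matter, here is a hedged sketch using Python's standard `sqlite3` module. It assumes the workload is "insert short strings, skip duplicates"; the table name `lines` and the helpers `open_db` and `add_unique` are hypothetical. The unique index comes from the primary key, and the whole batch goes into one transaction instead of one commit per row:

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and make sure the (hypothetical) table exists."""
    conn = sqlite3.connect(path)
    # The PRIMARY KEY on the text column is the unique index used for
    # duplicate checks; WITHOUT ROWID stores the rows in that index itself.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lines (value TEXT PRIMARY KEY) WITHOUT ROWID"
    )
    return conn


def add_unique(conn, values):
    """Insert values, skipping those already stored; return how many were new."""
    before = conn.total_changes
    with conn:  # a single transaction for the whole batch
        conn.executemany(
            "INSERT OR IGNORE INTO lines (value) VALUES (?)",
            ((v,) for v in values),
        )
    return conn.total_changes - before
```

Whether this is fast enough for hundreds of millions of rows depends on the hardware and on settings not shown here, so it is a starting point for measurement rather than a guaranteed fix.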


xmoonlight, 2019-12-09
@xmoonlight

Plain files will work, but you need to know how to correctly handle concurrent/parallel reads and writes through a file descriptor.
(I think you can figure that part out.)
Next, about the logic of the database itself.
Sort the list before adding it, so that the shortest unique combinations of characters are at the very top and the longest are just below them.
At the very bottom of the list go the most frequently repeated combinations: first those of one character, then of two, and so on, and at the end the consecutive combinations of repeating characters.
Build bigram/trigram clusters from the repeated combinations.
When checking for a duplicate, you descend into the cluster "tree" (this is the index map of your data), following the file offsets of the tree's nodes from node to node, and get an instant verdict: whether or not the combination of characters being checked (for example, a unique word or a hash string) is in the database.
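
The tree descent can be sketched with a plain character trie. This is a deliberate simplification and an assumption, not the exact scheme above: the nodes are in-memory dictionaries rather than file offsets, and there is no sorting or bigram/trigram clustering. The class name `TrieIndex` is hypothetical:

```python
END = ""  # marker key: a stored string ends at this node


class TrieIndex:
    """Character trie used as a duplicate-check index (in memory only)."""

    def __init__(self):
        self.root = {}

    def add(self, s):
        """Insert s; return True if it was new, False if it was a duplicate."""
        node = self.root
        for ch in s:
            # Follow the child for this character, creating it if missing.
            node = node.setdefault(ch, {})
        if END in node:
            return False
        node[END] = True
        return True

    def __contains__(self, s):
        """Walk from node to node; stop early if a character has no child."""
        node = self.root
        for ch in s:
            node = node.get(ch)
            if node is None:
                return False
        return END in node
```

A lookup costs one step per character of the checked string, regardless of how many strings are stored. An on-disk version would replace each dictionary reference with the file offset of the child node.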