Programming
synapse_people, 2018-05-22 14:30:54

What is the best way to save data?

Imagine an array of 8 elements (could be more or fewer), each element a string of arbitrary length.
How can it be written to a file so that later it is possible to, say, remove the element with index 4, or change the string stored there?
I need an option that gives the fastest possible reading and reliable writing (no data loss, even if writes are slow)...
As I see it:
there will be 2 files: the first holds the data itself, and the second is a map recording the offset at which each string starts in the data file and its size.
There is also a plan to split the data file into 16-byte blocks and store in the second file only the record ID and the blocks it is written in.
That is, two files: db.dat and db.map.
The first is divided into 16-byte blocks.
The second contains recordId + blockOffsets.
To read, the record ID is looked up and the needed block offsets are selected, then db.dat is opened with fopen(), those blocks are read, and the pieces are concatenated back into the original string (a rough sketch is below). On overwriting, the old blocks are freed and new ones are allocated... It may also be necessary to track empty blocks in the file... With this option, if the ID is known in advance the reading speed will be fine, but reading all the strings at once will be a disaster... And especially if records are deleted from the database: their blocks will stay empty, which bloats the file size... In general, please help.
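A rough sketch of reading one record under this scheme (Python; the JSON layout of db.map and all the names here are illustrative assumptions, not a fixed format):

    import json

    BLOCK_SIZE = 16

    def read_record(record_id):
        # db.map is assumed to be a JSON dict: {"<record_id>": [block indices]}
        with open("db.map", "r") as m:
            block_map = json.load(m)
        data = bytearray()
        with open("db.dat", "rb") as f:
            for block_no in block_map[str(record_id)]:
                f.seek(block_no * BLOCK_SIZE)  # jump to the 16-byte block
                data += f.read(BLOCK_SIZE)
        # the last block is assumed to be zero-padded up to BLOCK_SIZE
        return bytes(data).rstrip(b"\x00").decode("utf-8")

Deleting index 4 then only requires marking its blocks free in db.map; the expensive part, as noted above, is reading everything at once or reclaiming the holes.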
Please do not suggest ready-made options (SQLite); I deliberately want to reinvent this wheel myself.


3 answers
Alex McArrow, 2018-05-22
@AlexMcArrow

Proceeding from the task:
You store KEY - VALUE pairs, where neither the KEY (its data type and format - string or number) nor the VALUE (its size) is known in advance. So:
1) take the first octets (that is what I call them in my head) of the hash of the KEY and create/append a file with that name (this will be our key index);
2) write the VALUE in a converted format (base64 or something like that) into the data file, on the line matching the next index.
What you end up with looks like this:
--------------
Record creation:
We accept that the database is empty.
KEY = test
VALUE = datastring
HASH = sha256(KEY) = 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
OCTET = HASH[1-6] = 9f86d0 (three pairs)
index file = [OCTET].map (mapper)
write sequence file = [OCTET].pnt (pointer)
data file = [OCTET].dat (data)
PNT (the sequential pointer - a line number in the data file) = read([OCTET].pnt) + 1 = 1
Store HASH = PNT in [OCTET].map
Write VALUE (converted beforehand - base64 or otherwise) to the line equal to PNT in the file [OCTET].dat - this immediately gives both writing and updating of data
--------------
Reading a record:
KEY = test
HASH = sha256(KEY) = 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
OCTET = HASH[1-6] = 9f86d0 (three pairs)
index file = [OCTET].map (mapper)
data file = [OCTET].dat (data)
Read [OCTET].map and look up HASH = get the line pointer into the data file = PNT
Read line PNT of the file [OCTET].dat = VALUE (convert back to the original format)
=============
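A minimal Python sketch of the walkthrough above (create and read only; the plain-text layouts of the .pnt, .map and .dat files are my assumptions):

    import base64, hashlib, os

    def _octet(key):
        return hashlib.sha256(key.encode()).hexdigest()[:6]  # three pairs

    def put(key, value):
        o = _octet(key)
        h = hashlib.sha256(key.encode()).hexdigest()
        # PNT = read([OCTET].pnt) + 1: the next free line in the data file
        pnt = int(open(o + ".pnt").read()) + 1 if os.path.exists(o + ".pnt") else 1
        open(o + ".pnt", "w").write(str(pnt))
        with open(o + ".map", "a") as m:   # remember HASH -> PNT
            m.write(h + " " + str(pnt) + "\n")
        with open(o + ".dat", "a") as d:   # store the converted VALUE
            d.write(base64.b64encode(value.encode()).decode() + "\n")

    def get(key):
        o = _octet(key)
        h = hashlib.sha256(key.encode()).hexdigest()
        if not os.path.exists(o + ".map"):
            return None
        pnt = None
        for line in open(o + ".map"):      # look up HASH, keep the latest PNT
            stored, p = line.split()
            if stored == h:
                pnt = int(p)
        if pnt is None:
            return None
        row = open(o + ".dat").readlines()[pnt - 1]
        return base64.b64decode(row).decode()

With the values from the example, put("test", "datastring") produces 9f86d0.pnt, 9f86d0.map and 9f86d0.dat, and get("test") returns "datastring".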
I think even if my idea is nonsense, it will at least give you food for thought.
PS: an even better option is to store each block of data in a separate file whose name = the hash of the key.

Vitaly, 2018-05-22
@rim89

JSON? And instead of deleting, just change a status flag.
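One way to read this suggestion (a minimal sketch; the record fields are illustrative):

    import json

    def soft_delete(path, record_id):
        with open(path) as f:
            records = json.load(f)  # e.g. [{"id": 4, "text": "...", "deleted": false}, ...]
        for rec in records:
            if rec["id"] == record_id:
                rec["deleted"] = True  # change the status instead of removing
        with open(path, "w") as f:
            json.dump(records, f)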

res2001, 2018-05-22
@res2001

It can all be done in one file.
Write blocks of a fixed size (for example, 2-4 MB) to the file. Into each block write records of the form (a sketch of this layout follows the list):
1. A byte of record flags (deletion flag or end-of-block marker)
2. A fixed-size record ID
3. A fixed-size length of the string that follows
4. The data string itself
Repeat until the end of the block; at the end of the block, set the end-of-block marker in the flags byte from step 1, followed by a fixed-size offset of the next block.
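A possible encoding of this record layout (a sketch; the field widths - 1-byte flags, 8-byte ID, 4-byte length - are assumptions, the answer only says "fixed-size"):

    import struct

    FLAG_DELETED   = 0x01  # record deletion flag
    FLAG_BLOCK_END = 0x02  # end-of-block marker
    RECORD_HEADER = struct.Struct("<BQI")  # flags, record ID, string length

    def pack_record(record_id, text):
        data = text.encode("utf-8")
        return RECORD_HEADER.pack(0, record_id, len(data)) + data

    def unpack_record(buf, offset):
        flags, record_id, length = RECORD_HEADER.unpack_from(buf, offset)
        start = offset + RECORD_HEADER.size
        data = buf[start:start + length]
        # return the fields plus the offset where the next record starts
        return flags, record_id, data.decode("utf-8"), start + length

pack_record() returns 13 header bytes plus the UTF-8 data, so records can be written back to back inside a block.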
You can either fill a block to capacity and carry the tail of a record that did not fit over into another block, or leave the unused tail of the block empty. This spill-over mechanism is needed anyway if a single record can be larger than one block.
When adding a record, append it to the end of the last block (or search for a block with enough free space at the end); if there is not enough space, allocate a new block.
When deleting a record, just set its deletion flag.
When changing a record, delete the existing one and add a new one.
Why work with large blocks: it is much faster to read/write a whole block at once, even if you only need one record from it.
In addition to the basic functionality, you need a compaction operation that physically removes the deleted records (a sketch follows).
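A sketch of that compaction pass, reusing the record layout from the sketch above and ignoring block boundaries for brevity:

    import struct

    FLAG_DELETED = 0x01
    RECORD_HEADER = struct.Struct("<BQI")  # flags, record ID, string length

    def compact(src_path, dst_path):
        buf = open(src_path, "rb").read()
        out = bytearray()
        offset = 0
        while offset + RECORD_HEADER.size <= len(buf):
            flags, _, length = RECORD_HEADER.unpack_from(buf, offset)
            end = offset + RECORD_HEADER.size + length
            if not flags & FLAG_DELETED:
                out += buf[offset:end]  # keep live records verbatim
            offset = end
        open(dst_path, "wb").write(bytes(out))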
For fast lookups you need an index. The index holds a sorted list of record IDs, each with its block address and the record's offset within that block (see the sketch below).
You can also combine the index and the data in one file.
To do this, add a file header containing the offset of the first data block and the offset of the first index block.
The header can hold not just the offsets of the first blocks but whole tables of data blocks and index blocks, of a fixed size of course, with the offset of the next header block at the end.
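A sketch of the in-memory form of such an index once it has been read at start-up (sorted IDs allow binary search; the names are illustrative):

    import bisect

    class Index:
        def __init__(self):
            self.ids = []     # sorted record IDs
            self.places = []  # parallel list of (block_address, offset_in_block)

        def add(self, record_id, block_addr, offset):
            i = bisect.bisect_left(self.ids, record_id)
            self.ids.insert(i, record_id)
            self.places.insert(i, (block_addr, offset))

        def find(self, record_id):
            i = bisect.bisect_left(self.ids, record_id)
            if i < len(self.ids) and self.ids[i] == record_id:
                return self.places[i]  # where to read the record's block
            return None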
At the start, you will need to read the index completely, and the data as needed.
When you change the record, you change the block in memory and overwrite it entirely at the same offset where it was in the file.
You can also give each block its own header for service information; for example, it would be useful to store there the amount of free space left at the end of the block.
There might be something else.
PS: don't reinvent the wheel, take a ready-made DBMS.
