E
E
EAVol2015-02-28 14:01:14
Programming
EAVol, 2015-02-28 14:01:14

How to correctly implement external sorting by natural merge with a threshold?

Hello, forum users! I was given the following task: to implement external sorting by natural merge. There is an approximate start of the program: randomly generated, for example, 10K structures with 5 fields (2 string and 3 numeric). Then, the resulting array is written to a file (test.txt). Next, a random number up to 1000 is generated - let's call it Z. Then you need to rewrite Z lines of the main file (test.txt) into another file (each next file is new), after sorting in ascending order. Number generation, sorting and writing are repeated until we rewrite all 10K structures. Then you need to start the process of merging files. Everything seems to be clear with Merge - a simple merging of 2 arrays. But I can't figure out how to describe Split. Tell me please.

Answer the question

In order to leave comments, you need to log in

2 answer(s)
M
microphone, 2015-02-28
@microphone

and sorted in ascending order of the structure or string, or array elements inside the strings?
dimensions set?
if sorting, then the AVL tree is most likely suitable.

M
Mrrl, 2015-02-28
@Mrl

It is not very clear why Z is here. If 1000 structures fit in memory, and we can sort them, then it would seem that we read 1000 structures from the file 10 times (in order, in one pass), each time we sort the structures and in sorted order put in a new file. There are 10 files in total. Further - usual merge. By the way, you can merge all 10 files at once if they can be opened at the same time. If not, we merge the two smallest files until one remains (it is advantageous that the merged files are approximately the same size - like groups of characters in the Huffman code).
Perhaps Z was invented so that the files were not exactly the same, so that the algorithm did not rely on the same size. But then it would be better, for example, "a random number from 500 to 1000". In any case, if you do not take the maximum, then the algorithm only gets worse.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question