Answer the question
In order to leave comments, you need to log in
What is the best way to optimize such actions with arrays?
I'm still new to c++. Solving an olympiad problem. I solved it first in python. Everything is fine, but the speed is depressing. According to calculations, about two or three hours will be decided.
In short, the essence of the problem is that there is a file in which 100,000 words with errors are written per line, and there is a dictionary for 50,000 lines in which all these words are in the correct version. In response, you need to give a file in which, for each incorrectly written word, its analogue from the dictionary is selected (of course, the correct one).
The algorithm is optimized in terms of logic, but I decided to rewrite it in c ++ and faced a depressing loss in speed.
For example, only data preparation took about a second in c++, and the execution of the entire algorithm in python on 10 lines took 800ms.
I tried to fill different types of arrays, anyway, above 1 second for just filling an array of 100,000 rows could not jump out.
It is assumed that in the future data structures will be stored in the form of dictionaries, in which the key is a number - the length of the strings, and the value is an array of strings of such length. In theory, it will be used map<int, vector<string>>
, but will it not kill the speed of the crosses to zero?
How are such tasks solved, is python really going to be faster?
Answer the question
In order to leave comments, you need to log in
First, there may be an input problem. cin and cout are slow with large amounts of data. Or read through scanf, or you can disable synchronization with stdio.
Secondly, the data structure described by you will not kill the performance, but it is not clear to me how it will help in the task.
You will definitely need maps from a string to something else. In python, you probably used dictionaries (you used strings as a key in an array), and this is what it will be in c++. You can experiment std::unordered_map can be faster than std::map. In general, you can get a particularly fast solution using the data structure boron (aka trie). True, you will have to write it yourself.
What is the exact definition of the wrong word and how to define the right one?
What makes a word more incorrect than the absence of a letter? permutation? substitution? which one is more wrong? is there any difference in which position of the word the error occurred, in the first character or the rest?
For example, a list of words without errors:
sea
sea
march
Mar is sea or March?
mare - is it March or the sea?
so - this
Read files at once in large blocks (up to the entire file at once). For large blocks, you can use it with a pre-set vector size ( )
Then manually divide the read block into lines, apparently by replacing it with 0.
Immediately add all pointers to the found lines into the data structure used in the algorithm.
Don't use std::string because it reallocates memory for each sneeze, this leads to re-allocation of the same amount of memory but cut into small pieces and additional copying of lines. Use std::string_view (available in C++17) or raw C-strings in general as the fastest option.std::vector<char>
std::vector::reserve()
In general, all plus arrays (vector, array, "raw" dynamic and static arrays) work equally fast if we consider the operation of accessing an array element by index. But in a vector, many other operations can result in memory reallocation and array copying. In raw dynamic arrays, you cannot simply change the size of the array, this must be done explicitly using the realloc call, and therefore here you explicitly control this operation, but in vectors (as well as strings) this happens implicitly, so often developers do not attach it values, while calls to the memory manager are quite expensive in terms of performance.
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question