How do I write, or where should I start looking for, an algorithm in C++?

Leonid Fedotov, 2016-02-15 18:56:37

C++ / C#

I have a large data set that can be stored in any form, whether a TXT file or a MySQL database, whichever is faster. The data are minute-by-minute measurements and look like this:
02/15/2016/18:49 13
02/15/2016/18:48 10
02/15/2016/18:47 11
02/15/2016/18:46 9
02/15/2016/18:45 27
...and so on, sorted by recording time in descending order.
I need to find similar sequences in the previously saved data. Here a sequence means more than two consecutive records, and the closer the match, the better. The match does not have to be 100%, but the similarity coefficient should be as high as possible.
This may be written somewhat chaotically; I am happy to answer questions in the comments.

1 answer(s)

ADRian, 2016-02-17
@ADR

The algorithm you want is correlation. In essence, it is built on the standard deviation.
You just need to wrap it in a loop like this:

for count = minCorrelationLength
    to maxCorrelationLength
  for i = 0 to length(DataArray) - count
    correlationList.append(calcCorrelation(DataArray, i, count))

The highest value in correlationList then points to the most similar sequence.
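A minimal C++ sketch of that loop for a single window length. It assumes the "pattern" is a run of recent measurements that you compare against every window of the saved history, and it uses the Pearson correlation coefficient as the similarity measure. The names `Match` and `findBestMatch`, and the four-array signature of `calcCorrelation`, are illustrative assumptions, not something prescribed above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation between a[startA, startA+count) and b[startB, startB+count).
// Returns 0 when either window is constant (its standard deviation is zero).
double calcCorrelation(const std::vector<double>& a, std::size_t startA,
                       const std::vector<double>& b, std::size_t startB,
                       std::size_t count) {
    assert(count >= 2 && startA + count <= a.size() && startB + count <= b.size());
    double meanA = 0.0, meanB = 0.0;
    for (std::size_t k = 0; k < count; ++k) {
        meanA += a[startA + k];
        meanB += b[startB + k];
    }
    meanA /= static_cast<double>(count);
    meanB /= static_cast<double>(count);

    double cov = 0.0, varA = 0.0, varB = 0.0;
    for (std::size_t k = 0; k < count; ++k) {
        const double da = a[startA + k] - meanA;
        const double db = b[startB + k] - meanB;
        cov += da * db;
        varA += da * da;
        varB += db * db;
    }
    const double denom = std::sqrt(varA * varB);
    return denom > 0.0 ? cov / denom : 0.0;
}

// Hypothetical result type: where the best window starts and how well it matches.
struct Match {
    std::size_t offset;
    double correlation;
};

// Slides `pattern` over `history` and returns the offset with the highest correlation.
Match findBestMatch(const std::vector<double>& history,
                    const std::vector<double>& pattern) {
    assert(pattern.size() >= 2 && history.size() >= pattern.size());
    Match best{0, -2.0};  // any real correlation is >= -1, so the first window wins
    for (std::size_t i = 0; i + pattern.size() <= history.size(); ++i) {
        const double r = calcCorrelation(history, i, pattern, 0, pattern.size());
        if (r > best.correlation) best = Match{i, r};
    }
    return best;
}
```

Pearson correlation ignores scale and offset, so the window 1, 2, 4 matches the pattern 10, 20, 40 perfectly. If absolute levels matter for your data, a distance-based measure may be a better fit. To cover several window lengths, call `findBestMatch` once per length, as in the outer loop above.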
To speed this up, you can discard most of the data in the first stage and work on a random sample (this is the Monte Carlo method). About 2% is usually enough.
In the second stage, once you have the promising ranges, look at all the data in those ranges.
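One possible reading of the two stages, sketched in C++: score a random sample of candidate offsets first, then score every offset close to the best sampled one. The function name `coarseToFineSearch`, the `radius` parameter, and the `score` callback (for example, a correlation at offset `i`) are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>

// Stage 1: score a random sample of offsets (about `fraction` of all candidates).
// Stage 2: score every offset within `radius` of the best sampled offset.
// `score(i)` is a hypothetical callback returning the similarity at offset i.
std::size_t coarseToFineSearch(std::size_t candidateCount, double fraction,
                               std::size_t radius,
                               const std::function<double(std::size_t)>& score,
                               std::mt19937& rng) {
    assert(candidateCount > 0 && fraction > 0.0 && fraction <= 1.0);
    const std::size_t sampleCount = std::max<std::size_t>(
        1, static_cast<std::size_t>(static_cast<double>(candidateCount) * fraction));
    std::uniform_int_distribution<std::size_t> pick(0, candidateCount - 1);

    // Stage 1: random sampling (offsets may repeat; that only wastes a little work).
    std::size_t best = pick(rng);
    double bestScore = score(best);
    for (std::size_t s = 1; s < sampleCount; ++s) {
        const std::size_t i = pick(rng);
        const double v = score(i);
        if (v > bestScore) { best = i; bestScore = v; }
    }

    // Stage 2: exhaustive scan of the neighbourhood found in stage 1.
    const std::size_t lo = best > radius ? best - radius : 0;
    const std::size_t hi = std::min(candidateCount - 1, best + radius);
    for (std::size_t i = lo; i <= hi; ++i) {
        const double v = score(i);
        if (v > bestScore) { best = i; bestScore = v; }
    }
    return best;
}
```

This only helps when good matches cluster, so that a sampled offset lands near the true best one. With a small `radius` and a spiky score, the sample can miss the best window entirely.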
One way to estimate the error is to repeat the measurement with twice the accuracy: the difference between the two results is an estimate of the error.
That is, you can build an algorithm like this:

dataCountCoefficient = 0.001 // 0.1% of the data
needAccuracy = 0.01 // 1%
oldResult = calcCorrelation(dataCountCoefficient / 2)
newResult = calcCorrelation(dataCountCoefficient)
while Abs(oldResult - newResult) / Max(Abs(oldResult), Abs(newResult)) > needAccuracy do
  dataCountCoefficient *= 2
  oldResult = newResult
  newResult = calcCorrelation(dataCountCoefficient)
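The same loop as a C++ sketch. The `estimate` callback stands in for "compute the correlation using this fraction of the data" and is a hypothetical placeholder. Two guards are additions that the pseudocode does not have: the fraction is capped at 1.0 (all data), and the division is skipped when both results are zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>

// Doubles the sampled fraction until two consecutive estimates agree to within
// `tolerance` (relative difference), or until the whole data set is in use.
// `estimate(fraction)` is a hypothetical callback that computes the result
// from the given fraction of the data.
double refineUntilStable(const std::function<double(double)>& estimate,
                         double startFraction, double tolerance) {
    assert(startFraction > 0.0 && startFraction <= 1.0 && tolerance > 0.0);
    double fraction = startFraction;
    double oldResult = estimate(fraction / 2.0);
    double newResult = estimate(fraction);
    while (fraction < 1.0) {
        const double scale = std::max(std::fabs(oldResult), std::fabs(newResult));
        // Added guard: both results are zero, so they already agree.
        if (scale == 0.0) break;
        if (std::fabs(oldResult - newResult) / scale <= tolerance) break;
        fraction = std::min(1.0, fraction * 2.0);  // added cap: never exceed 100%
        oldResult = newResult;
        newResult = estimate(fraction);
    }
    return newResult;
}
```

Called as `refineUntilStable(estimate, 0.001, 0.01)`, it starts from 0.1% of the data and stops once doubling the sample changes the result by 1% or less.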