Algorithms
Artem Parfenov, 2014-08-18 15:52:40

What is an "asymptotically exact estimate of the running time of an algorithm"?

Hello to everyone gnawing the granite of science. I am working through Cormen's "Introduction to Algorithms". For a deeper understanding, I look through material from other sources - some points are presented more simply there and sink in faster. I came across the article algolist.manual.ru/misc/o_n.php and immediately stumbled over its definition of the exact estimate Θ():

The estimate Θ() exists only when O() and Ω() are the same and equal to them.
So, O() is the asymptotic estimate of the algorithm on the worst input data, Ω() is on the best input data, and Θ() is shorthand for identical O() and Ω().
These are two dubious points. Even though they sounded obviously wrong to me, I realized that I was confused and had not grasped the concept of the exact estimate Θ().
Cormen evasively says that giving an exact estimate (estimating the average running time of an algorithm) is sometimes problematic, because it is not always obvious which input data for a given problem counts as average. He also says that an exact estimate is a set of functions enclosed between constant-factor deviations from the estimated function. But as far as I understand, those boundaries are not the upper bound O() and the lower bound Ω() - they are just constants that delimit the set Θ(). The estimate Θ() can be used to obtain the estimates O() and Ω() and vice versa, and the following holds: Ω() belongs to Θ(), which belongs to O(). Here I am obviously out of my depth, because the wiki says:
Θ() gives both upper and lower bounds for the growth of the function
It is known, for example, that for quicksort (qsort) the average estimate for randomly distributed input data (which is also the best case, for a fully balanced partition) is Θ(n log n), while the upper estimate (for specially selected non-optimal data) is O(n^2).
It is also known that insertion sort gives Ω(n) in the best case (for a presorted set), while the average and worst-case estimates are Θ(n^2) and O(n^2).
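To see this concretely, here is a minimal Python sketch (my own illustration, not from Cormen or the article) that counts the comparisons insertion sort performs on a presorted versus a reversed array - the counts grow roughly as n and n^2/2 respectively:

```python
def insertion_sort_comparisons(a):
    """Sort a copy of `a` and return the number of key comparisons made."""
    a = list(a)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1          # one comparison of key against a[j]
            if a[j] <= key:
                break
            a[j + 1] = a[j]           # shift the larger element right
            j -= 1
        a[j + 1] = key
    return comparisons

for n in (100, 200, 400):
    best = insertion_sort_comparisons(range(n))          # presorted: n - 1 comparisons -> Ω(n)
    worst = insertion_sort_comparisons(range(n, 0, -1))  # reversed: n(n-1)/2 comparisons -> O(n^2)
    print(n, best, worst)
```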
Cormen, if again I understand him correctly, says that in practice, having upper and lower estimates, one arrives at an exact estimate by making stricter or looser assumptions - but that part is academic.
Would it be correct to say that an asymptotically exact estimate of an algorithm is given first of all on the basis of how the particular algorithm behaves on averaged input data (understanding averaged data as a randomly distributed array of data), and in complex cases on the basis of the estimate from above O() and from below Ω()? Are the quoted statements from the article wrong?


1 answer
throughtheether, 2014-08-18

My authoritative amateur opinion is as follows.
Firstly, it makes sense to read the primary source on this topic, namely Donald Knuth's article "Big Omicron and Big Omega and Big Theta". On page 19 it gives a definition, convenient in the author's opinion, of the relations Θ, O, Ω. These relations are initially defined between the values of two functions; estimating time and space complexity is an application of them. The purpose of introducing this notation was to simplify counting the number of operations required to execute an algorithm without losing the qualitative picture, and also to get rid of dependencies on the architecture, compiler, and so on. Roughly speaking, if an algorithm processes 1000 units of input data per hour, the notation helps to quickly estimate how long, say, 2000 units will take. Naturally, the notation "coarsens" information about the values of the function - that is its purpose.
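For example (the numbers here are my own, just to make that extrapolation concrete): doubling the input roughly quadruples the time for a quadratic algorithm, but only slightly more than doubles it for an n log n one:

```latex
\frac{T(2n)}{T(n)} \approx \frac{C\,(2n)^2}{C\,n^2} = 4
  \quad \text{for a } \Theta(n^2) \text{ algorithm},
\qquad
\frac{T(2n)}{T(n)} \approx \frac{2n \log 2n}{n \log n} \approx 2.2
  \quad \text{for } \Theta(n \log n),\ n = 1000.
```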

What is an "asymptotically exact estimate of the running time of an algorithm"?
If we are talking about Θ-notation, then it is a function (or a set of functions) that grows as fast as the running time of the algorithm as the length of the input grows.
The estimate Θ() exists only when O() and Ω() are the same and equal to them.
This statement seems to me only partially correct. If f(n)=O(g(n)) and f(n)=Ω(g(n)), then f(n)=Θ(g(n)), where g(n) is some function, for example of the form n log n. On the other hand, if f(n)=O(n), then it is also true that f(n)=O(n^2); that is, even though a function has a Θ-estimate, its O- and Ω-estimates need not coincide.
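In symbols (the first line is Theorem 3.1 in Cormen; the f(n) = 3n example is my own):

```latex
% the tight bound exists exactly when upper and lower bounds agree on the same g(n)
f(n) = \Theta(g(n)) \iff f(n) = O(g(n)) \ \text{and}\ f(n) = \Omega(g(n))

% but O alone is not unique: any weaker upper bound is also valid
f(n) = 3n \;\Rightarrow\; f(n) = O(n),\quad f(n) = O(n^2),\quad f(n) = O(2^n),\ \dots
```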
So, O() is the asymptotic estimate of the algorithm on the worst input data, Ω() is on the best input data
If we define the "best"/"worst" data as the input sets of a given length that require the minimum/maximum time, then this statement also seems to me only partially correct. The number of operations that the algorithm performs in the worst, average and best cases is, in each case, a function of the length of the input data, and each of these functions can be estimated using each of the three notations (Ω, Θ, O).
It seems reasonable to me to read the estimates as follows:
f(n)=O(g(n)) - the function f(n) grows no faster than the function g(n)
f(n)=Ω(g(n)) - the function f(n) grows no slower than the function g(n)
f(n)=Θ(g(n)) - the function f(n) grows as fast as the function g(n)
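Formally, in the spirit of the definitions in Knuth's article (the exact wording there may differ; C, C1, C2 are positive constants and n0 is a threshold):

```latex
f(n) = O(g(n))      \iff \exists\, C, n_0 :\ |f(n)| \le C\,g(n) \ \text{for all } n \ge n_0
f(n) = \Omega(g(n)) \iff \exists\, C, n_0 :\ f(n) \ge C\,g(n)   \ \text{for all } n \ge n_0
f(n) = \Theta(g(n)) \iff \exists\, C_1, C_2, n_0 :\ C_1\,g(n) \le f(n) \le C_2\,g(n) \ \text{for all } n \ge n_0
```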
Try drawing a graph of some increasing function with a logarithmic scale on the y-axis, and picture where the values of functions that correctly estimate it in the Ω, O and Θ notations would lie, using the definitions from Knuth's article and marking the constants C and n0 on the graph.
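A minimal matplotlib sketch of that exercise (the function f and the constants C1 = 1, C2 = 5, n0 = 16 are my own picks for a Θ(n log n) example):

```python
import numpy as np
import matplotlib.pyplot as plt

n = np.arange(2, 1024)
f = 2 * n * np.log2(n) + 10 * n   # the function being estimated: f(n) = Theta(n log n)
g = n * np.log2(n)                # the comparison function g(n)

plt.semilogy(n, f, label="f(n) = 2 n log n + 10 n")
plt.semilogy(n, 1 * g, "--", label="C1 * g(n), C1 = 1")   # lower bound
plt.semilogy(n, 5 * g, "--", label="C2 * g(n), C2 = 5")   # upper bound
plt.axvline(x=16, linestyle=":", label="n0 = 16")         # beyond n0: C1*g <= f <= C2*g
plt.xlabel("n")
plt.ylabel("operations (log scale)")
plt.legend()
plt.show()
```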
It is known that, for example, for quicksort the average estimate for randomly distributed input data (which is also the best case, for a fully balanced partition) is Θ(n log n),
From my point of view, it would also be correct to say that this average estimate is likewise O(n log n) or Ω(n).
while the upper estimate (for specially selected non-optimal data) is O(n^2).
and that the worst-case estimate is also Θ(n^2).
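This is easy to observe empirically; here is a sketch of mine with a deliberately naive first-element pivot (the degenerate choice on presorted input), counting comparisons:

```python
import random

def qsort_comparisons(a):
    """Quicksort with a naive first-element pivot; return the comparison count."""
    if len(a) <= 1:
        return 0
    pivot, rest = a[0], a[1:]
    less = [x for x in rest if x < pivot]
    more = [x for x in rest if x >= pivot]
    # partitioning compares every other element against the pivot once
    return len(rest) + qsort_comparisons(less) + qsort_comparisons(more)

n = 500
print("random input:", qsort_comparisons(random.sample(range(n), n)))  # ~ 2 n ln n
print("sorted input:", qsort_comparisons(list(range(n))))              # n(n-1)/2 = 124750
```

On random input the count lands near 2n ln n; on sorted input it is exactly n(n-1)/2.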
Would it be correct to say that an asymptotically exact estimate of an algorithm is given first of all on the basis of how the particular algorithm behaves on averaged input data (understanding averaged data as a randomly distributed array of data), and in complex cases on the basis of the estimate from above O() and from below Ω()?
From my point of view, if the O and Ω estimates coincide, the Θ-estimate follows in an elementary way. The other point is that the "worst", "best" and "average" computational complexities are each functions of the length of the input data, and for each of these functions an estimate of its asymptotic growth rate can be given, be it Ω, Θ or O. As for a "randomly distributed array of data", one could dig into mathematical statistics here, but in my opinion that would not simplify the task.
I would also like to take this opportunity to recommend Tim Roughgarden's Coursera course; the relevant videos are on YouTube.
