How do you explain the different execution speeds of nested loops in different languages?
Good evening!
I read a post about how assembly language code runs faster than C.
thedeemon.livejournal.com/49226.html
There, the essence is that there are two int arrays with 256 elements:
The task is very simple: there are two arrays of 16-bit integers; find the sum of the squared differences. In the original these were 16x16 blocks that had to be traversed in a double loop. I run this loop 10 million times, measure the time, and also look at what kind of code was generated.
void array_func(int arr1[], int arr2[]);  /* forward declaration so main() compiles */

int main(int argc, char **argv)
{
    int g = 0;
    int a_list[] = { /* 256 random numbers from 1 to 100 */ };
    int b_list[] = { /* 256 random numbers from 1 to 100 */ };
    for (g = 0; g < 1000000; g++)
        array_func(a_list, b_list);
    return 0;
}

void array_func(int arr1[], int arr2[])
{
    int sum = 0, j = 0;
    int x = 0, y = 0;
    for (y = 0; y < 16; y++)
    {
        for (x = 0; x < 16; x++)
        {
            short v = arr1[j] - arr2[j];
            sum += v * v;
            j++;
        }
    }
    /* note: sum is never returned or printed, so an optimizing
       compiler is free to remove this whole loop */
}
def main_func(a_list, b_list):
    summ = 0  # initialize the sum
    j = 0     # independent counter from 0 to 255
    for y in range(0, 16):
        for x in range(0, 16):
            p = a_list[j] - b_list[j]
            summ += p * p
            j += 1

for g in range(0, 1000000):
    main_func(a_list, b_list)
function main_func(a_list, b_list) {
    for (var g = 0; g < 10000000; g++) {
        var summ = 0;
        var j = 0;
        for (var y = 0; y < 16; y++) {
            for (var x = 0; x < 16; x++) {
                var p = a_list[j] - b_list[j];
                summ += p * p;
                j++;
            }
        }
    }
}
console.time('test');
main_func(a_list, b_list);
console.timeEnd('test');
I wasn't happy with how you assessed Python's performance, so I took the liberty of slightly tweaking your example to show completely different results. Besides numpy ( pip install numpy ), running my example requires another interesting library ( pip install numba ). Installing it can be tricky on some operating systems (it also depends on LLVM), but believe me, it's worth it; the resulting performance numbers should please you.
To show the real speed of the computation, a million iterations is too few in my example; the CPU doesn't even have time to warm up. Please note that I replaced a million iterations with 100 million, so the result must be divided by 100 for comparison with the other languages. Here is the code itself:
from numba import jit
import numpy

@jit
def inner_func(a_list, b_list):
    sum = 0
    j = 0
    for y in range(0, 16):
        for x in range(0, 16):
            p = a_list[j] - b_list[j]
            sum += p * p
            j += 1
    return sum

@jit
def outer_func(a_list, b_list):
    sum = 0
    for g in range(0, 100000000):  # 100 000 000 == 10^8 !!!
        sum += inner_func(a_list, b_list)
    return sum

def main():
    maxint = numpy.iinfo(numpy.intc).max
    a_list = numpy.random.randint(maxint, size=256)
    b_list = numpy.random.randint(maxint, size=256)
    sum = outer_func(a_list, b_list)
    print(sum)

if __name__ == '__main__':
    main()
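As a cross-check (my addition, not part of the answer above), the same sum of squared differences can be computed with a single vectorized NumPy expression, with no Python-level loops at all; a minimal sketch:

```python
import numpy

def sum_sq_diff(a_list, b_list):
    # Subtract element-wise, square, and sum; the loops run inside NumPy's C code.
    # Widen to int64 first so squaring large intc values does not overflow.
    d = a_list.astype(numpy.int64) - b_list.astype(numpy.int64)
    return int(numpy.sum(d * d))

a = numpy.array([3, 1, 4], dtype=numpy.intc)
b = numpy.array([1, 5, 9], dtype=numpy.intc)
print(sum_sq_diff(a, b))  # (3-1)^2 + (1-5)^2 + (4-9)^2 = 4 + 16 + 25 = 45
```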
I'm just starting to learn how to program, and I have a few questions:
In general, in such cases everything comes down to interpretation (or to the execution of machine code, if it is machine code; in essence that is the same kind of interpretation, done by the CPU).
Namely, to two factors:
1) how well the interpreter itself is optimized (the code-parsing algorithm, plus whether the interpreter performs many extra operations unrelated to executing your code; for example, it may load something at startup);
> I measured the execution time using the Linux time command.
That is, you measured not the execution time of the algorithm, but the running time of the program as a whole, from launch to exit? There can be many reasons for differences there: the interpreter does not start instantly, and what exactly it loads and does at startup, only its authors really know.
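To time just the algorithm rather than the whole process (interpreter startup included), Python's standard timeit module can be used; a minimal sketch along the lines of the question's loop:

```python
import random
import timeit

a_list = [random.randint(1, 100) for _ in range(256)]
b_list = [random.randint(1, 100) for _ in range(256)]

def main_func(a_list, b_list):
    # Sum of squared differences over 256 elements, as in the question.
    summ = 0
    for j in range(256):
        p = a_list[j] - b_list[j]
        summ += p * p
    return summ

# timeit measures only the calls themselves, not interpreter startup or imports.
elapsed = timeit.timeit(lambda: main_func(a_list, b_list), number=10000)
print(f"{elapsed:.3f} s for 10,000 calls")
```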
2) how minimized (and generally how optimized for interpretation) the interpreted code itself is. This is why code in a binary format (machine code, bytecode) is potentially faster than code in text form. First, a binary is simply more compact: instead of the word function (which takes at least 8 bytes), there may be just one or two opcode bytes. Second, a binary format is strict and unambiguous; there is none of this mess with spaces, tabs, and so on, so the interpreter spends less effort working out what each byte means together with the bytes before and after it. Minification utilities make text code more compact, but the interpreter still has to process tabs, spaces, etc., so even minified code in text form is still not always very fast.
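Python itself illustrates the text-versus-bytecode point: the interpreter compiles source text to compact bytecode once, and the standard dis module lets you see those opcodes. A quick sketch:

```python
import dis

def square(x):
    return x * x

# Each bytecode instruction is a compact numeric opcode,
# not the original source text with its keywords and whitespace.
dis.dis(square)
print(len(square.__code__.co_code), "bytes of bytecode")
```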
But in the case of C, by default standard compilers simply litter the code with all sorts of "rubbish"; this is clearly visible if you open the result in OllyDbg and compare it, at least in length, with the code that assemblers produce.
You forgot to do a multi-threaded version of the calculation in C, which is why it lost to Java.
As for libraries, C has plenty of them.
Also, different languages work with arrays differently: C, for example, simply skips array-bounds checking and leaves it on the programmer's conscience. A lot also depends on how the arrays are laid out in memory and on alignment, not to mention tricks like loop unrolling. Here the loop length is known in advance, so the compiler should unroll such a loop: fewer conditional jumps means faster code.
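The unrolling idea can be sketched even at the Python level (my illustration, not actual compiler output): processing several elements per iteration trades fewer loop-control steps for a longer body.

```python
def sum_sq_diff(a, b):
    # Plain loop: one loop-control step per element.
    s = 0
    for j in range(len(a)):
        d = a[j] - b[j]
        s += d * d
    return s

def sum_sq_diff_unrolled(a, b):
    # Manually unrolled by 4: one loop-control step per 4 elements.
    # Assumes len(a) is a multiple of 4, as with the 256-element arrays here.
    s = 0
    for j in range(0, len(a), 4):
        d0 = a[j]     - b[j]
        d1 = a[j + 1] - b[j + 1]
        d2 = a[j + 2] - b[j + 2]
        d3 = a[j + 3] - b[j + 3]
        s += d0 * d0 + d1 * d1 + d2 * d2 + d3 * d3
    return s
```

Whether this pays off in an interpreted language is another question; the point is only to show what an unrolling compiler does with the loop structure.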