C++ / C#

floppa322, 2021-12-16 00:58:42

Class/struct member of type uint8_t * or int8_t *, optimization?

Hello everyone

Do I understand correctly that if a class has a member of type uint8_t * or int8_t * (that is, a char type), for example a pointer to the start of an array, and a method body dereferences this member and assigns through it (possibly in a loop), then the member has to be cached in a local variable for the compiler to be able to apply the optimization? Link to a discussion of a similar topic
That is, instead of:
uint8_t * buffer; // class/struct member

buffer[0] = value1;
buffer[1] = value2;
buffer[2] = value3;
...
buffer[n] = valueN;

Write:

uint8_t * _buffer = buffer;
_buffer[0] = value1;
_buffer[1] = value2;
_buffer[2] = value3;
...
_buffer[n] = valueN;

And the second question: instead of making a local copy of this member, can we just add __restrict to the method signature, guaranteeing that this is not changed in the method body? If I understand correctly, according to the first answer here:

In that case, writing to this->target[0] would alter the contents of this (and thus, this->target).

So can the object behind the const pointer this still be changed in this way when the method signature has __restrict?

PS:

I measured the speed of this code with -O3: with a small number of assignments the speed differs by 10-15%, and when the assignments happen in a loop with 10,000+ iterations, it differs by a factor of 2.
Also, if uint8_t * is replaced with, say, uint16_t *, the version that caches the variable gives no performance gain.

2 answers

Wataru, 2021-12-16
@Lite_stream

The problem here is not that the class member is a char*, but that the write goes through a char*. Under the strict aliasing rules a char* may alias anything, so the write can land anywhere, including inside the object that this points to. Any member of the structure, of any type, would have to be cached.
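A minimal sketch of the two variants from the question, written out as complete code. The struct name Writer, its size member, and the method names are hypothetical and exist only for illustration. Whether the cached version is actually faster depends on the compiler and flags, so treat this as something to inspect in the generated assembly rather than a guaranteed win.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct, named here only for illustration.
struct Writer {
    std::uint8_t* buffer;  // class member, as in the question
    std::size_t   size;    // assumed element count (not in the question)

    // Stores go through the member directly. uint8_t is typically an alias
    // for unsigned char, so each store may legally modify *this, and the
    // compiler may have to reload buffer and size after every store.
    void fill_direct(std::uint8_t value) {
        assert(buffer != nullptr || size == 0);
        for (std::size_t i = 0; i < size; ++i)
            buffer[i] = value;
    }

    // The members are copied into locals first. The addresses of the locals
    // never escape, so no store through the pointer can change them.
    void fill_cached(std::uint8_t value) {
        assert(buffer != nullptr || size == 0);
        std::uint8_t* const local = buffer;
        const std::size_t n = size;
        for (std::size_t i = 0; i < n; ++i)
            local[i] = value;
    }
};
```

Both methods produce the same result for a buffer that does not overlap the Writer object itself; the difference, if any, is only in the code the compiler is allowed to generate.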
The same problem can be reproduced on a smaller scale with a loop that writes through an int* and reads some other int variable that never changes, especially when that variable lives on the heap and the pointer came in as a function parameter. In that case the compiler plays it safe and reloads the variable into a register on every iteration, again because it cannot prove that the pointer does not point to that variable.
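The int* case can be sketched as follows. The function names add_reloading and add_cached are hypothetical. The example also shows why the compiler cannot cache the value on its own: when the two pointers really do overlap, the two functions give different results, so the transformation would change observable behavior.

```cpp
#include <cassert>
#include <cstddef>

// Adds *step to every element. dst and step have the same pointee type,
// so step may point into dst and *step has to be re-read on each iteration.
void add_reloading(int* dst, std::size_t n, const int* step) {
    assert(dst != nullptr && step != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        dst[i] += *step;
}

// Reads *step once before the loop. This is only equivalent to the version
// above when step does not point into dst.
void add_cached(int* dst, std::size_t n, const int* step) {
    assert(dst != nullptr && step != nullptr);
    const int s = *step;
    for (std::size_t i = 0; i < n; ++i)
        dst[i] += s;
}
```

For an array {1, 1, 1} with step pointing at its first element, add_reloading yields {2, 3, 3} while add_cached yields {2, 2, 2}. By caching manually, the programmer is telling the compiler that the overlapping case does not matter.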
This can be partly solved by caching, and you can also try changing the types in some places. But you should not do this: it is exactly the premature optimization that Knuth wrote about. It is better to choose a good algorithm and the right data structures for your program. After that, if profiling shows that the bottleneck really is here, you can look at the assembly code and think about how to convince the compiler to generate something faster.

res2001, 2021-12-16
@res2001

If you do not otherwise need such a local variable, there is no need to "cache" anything.
At the assembly level, all memory accesses go through registers, so in any case the address held in the pointer will be loaded into a register and that register will then be indexed.
In general, such simple tests need millions of iterations, and the test itself must be run hundreds or thousands of times with the average running time calculated. Before each run you need to take care of clearing the processor cache, otherwise the previous run will affect the speed of the current one. Otherwise the measurements are meaningless.
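A minimal sketch of the "run many times and average" part of this advice, using std::chrono::steady_clock. The helper name average_ns is hypothetical, and the sketch deliberately does not attempt to clear the processor cache between runs, so it covers only part of what is described above.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: calls work() `runs` times and returns the mean
// wall-clock time of one call in nanoseconds.
template <typename Work>
double average_ns(Work&& work, int runs) {
    assert(runs > 0);
    using clock = std::chrono::steady_clock;
    double total_ns = 0.0;
    for (int r = 0; r < runs; ++r) {
        const auto start = clock::now();
        work();  // the code under test, e.g. a loop with millions of stores
        const auto stop = clock::now();
        total_ns +=
            std::chrono::duration<double, std::nano>(stop - start).count();
    }
    return total_ns / runs;
}
```

When using something like this, make sure the result of the measured code is actually consumed (for example, printed or checked), otherwise the optimizer may remove the work entirely.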
A local variable is often convenient simply because the pointer (or reference) sits three levels deep inside the class: it is easier to fetch it once and then use the short name. But that is a convenience for the programmer; the compiler does not care.
In any case, whether or not you cache in the source code, if the compiler decides that a variable should be kept in a register, it will keep it there, and the manual caching will not affect performance in any way.
You can play around with restrict. As far as I know it does not exist in standard C++, but gcc/clang provide it as an extension, so it can probably be used in C++ code. It is better to apply it directly to the local buffer pointer, though, and not to this.
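A minimal sketch of applying the qualifier to a local copy of the buffer pointer rather than to this. The struct name Packet and its size member are hypothetical. The __restrict spelling is a compiler extension accepted by GCC, Clang, and MSVC, not standard C++, and the promise it makes is the programmer's responsibility: if the buffer actually overlaps other data accessed in the loop, the behavior is undefined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct, named here only for illustration.
struct Packet {
    std::uint8_t* buffer;
    std::size_t   size;  // assumed element count (not in the question)

    void fill(std::uint8_t value) {
        assert(buffer != nullptr || size == 0);
        // __restrict (compiler extension) promises that the bytes written
        // through `out` are not accessed through any other pointer in this
        // scope, so the stores cannot change members such as `size`.
        std::uint8_t* __restrict out = buffer;
        for (std::size_t i = 0; i < size; ++i)
            out[i] = value;
    }
};
```

Whether this changes the generated code compared with a plain local copy is compiler-dependent, so it is worth checking the assembly before relying on it.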
Finally: this kind of code already runs quite fast and there is nothing special to optimize here. It only makes sense to bother if this is very, very hot code in your application. And when optimizing such cases, it is better to look for ways to avoid copying the data at all than to hope that restrict will help you much.