U
U
Ukrainskiy2022-01-07 16:49:59
C++ / C#
Ukrainskiy, 2022-01-07 16:49:59

Most efficient way to search for a sequence of null bytes?

You need to find a sequence of zero bytes of a certain length, memchr does not look for zeros, I tried to make an additional buffer with zeros and look for it with memmem, it works quickly, but I'm not sure if this is an adequate way and I would not want to allocate additional for this. memory. What other ways are there?

Answer the question

In order to leave comments, you need to log in

1 answer(s)
W
Wataru, 2022-01-07
@wataru

The most efficient way is to use SIMD instructions (some _mm256_cmpeq_epi8 and tricky bit arithmetic like _mm256_movemask_epi8 and the like on the result).
It is necessary to count the length of the sequence of zeros at the beginning and end of the block and remember the length of the sequence ending at the end of the previous block and continue it into the current block.
If you are looking for a short sequence, then you still need to check that it occurs in the current block. For example, to check that the bitmask M has a sequence of 5 bits, you can check that it (M & M >> 1) && ((M & M >> 1) >> 2) & M >> 4has non-zero bits. Log k shifts and bitwise ANDs are enough to find a sequence of K non-zero bits.
But you probably won't bother with it. The next option is to emulate SIMD manually in int64. Read from memory 8 bytes each (via memcpy to int64) and there already check through LSB / MSB how many zero bytes are at the end, at the beginning. A little more difficult if you need to search for a sequence shorter than eight bytes. Then it is necessary to separately check through bit shifts whether it is inside a block of 8 bytes.
Further, there is some non-zero chance that memmem implements the variant above, but this is unlikely. I think your case is quite frequent and manual implementation will be faster.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question