linux
alienstone, 2014-12-20 10:59:05

Is this a correct way to measure the length of the hardware cache prefetch?

I was given the task of determining the length of the hardware cache prefetch. The prefetcher is described as working like this: after 2 consecutive cache misses, it determines the direction of the access pattern and starts preloading data into the cache as a run of cache lines of unknown length. I need to determine the length of this run experimentally.
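To make clear what the experiment should reveal, here is a toy model of the prefetcher exactly as the task describes it. This is an assumption for illustration, not Intel's documented behavior, and `ToyPrefetcher`, `depth` and `first_missing_distance` are hypothetical names. Under this model, the first line after the second miss that still misses sits at distance `depth + 1`, but only if every probe starts from a cold cache.

```cpp
#include <set>

// Toy model of the prefetcher as the task describes it (an assumption, not Intel's
// documented behavior): after two consecutive misses it infers the direction and
// loads the next `depth` lines in that direction.
struct ToyPrefetcher {
    explicit ToyPrefetcher(int d) : depth(d) {}

    int depth;                // hypothetical: the prefetch length we want to measure
    std::set<long> cached;    // indices of lines currently in the cache
    long last_miss = -1;      // line of the previous miss, -1 if none yet

    // Touch one line; returns true on a hit.
    bool access(long line) {
        if (cached.count(line)) return true;
        cached.insert(line);
        if (last_miss >= 0 && line != last_miss) {
            long step = line > last_miss ? 1 : -1;   // inferred direction
            for (int k = 1; k <= depth; ++k) cached.insert(line + step * k);
        }
        last_miss = line;
        return false;
    }
};

// Probe lines after the second miss, each on a fresh (cold) model, and return the
// first distance in lines that misses; assumes first < second (ascending stream).
// For this model the result is depth + 1.
long first_missing_distance(int depth, long first, long second, long max_probe) {
    for (long d = 1; d <= max_probe; ++d) {
        ToyPrefetcher p(depth);
        p.access(first);
        p.access(second);
        if (!p.access(second + d)) return d;
    }
    return -1;
}
```

Creating a fresh model for every probe matters: if one model were reused across probes, lines touched by earlier probes would stay cached and hide the boundary, which is the same effect a real cache has when one array is reused for every offset.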
My program uses a single array, which I pass to a function that causes 2 misses, then waits for the prefetcher to load the data into the cache, and then measures the access time to a 3rd element. It writes the result to a file. Unfortunately, I don't see any patterns in the output file. There are some small jumps roughly every 30 int elements, but they are not consistent. The function takes the array itself and an offset; the offset is varied in a loop from 1 to 1024 ints, which is the size of the window beyond which, by definition, hardware prefetching cannot work. I cause misses on element 0 and on element cache_string_size (64 bytes) * 2, and the next access goes to cache_string_size * 4 + i. I use 4 because I expect that, after the second miss, probably 2 more lines will already have been loaded into the cache.
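It is worth checking which cache lines these indices actually land on. The sketch below converts element indices into line indices, assuming 4-byte int, 64-byte lines and a 64-byte-aligned array; `line_of` and `kStringSizeInts` are hypothetical names. It shows one possible pitfall: if `string_size = 64` is used directly as an index into an `int` array, the second miss lands 8 lines after the first rather than 2.

```cpp
#include <cstddef>

constexpr std::size_t kLineBytes = 64;   // line size of the P8400, as given in the question

// Index of the cache line (counted from a 64-byte-aligned array start) that holds
// element `index` of an array whose elements are `elem_size` bytes wide.
constexpr std::size_t line_of(std::size_t index, std::size_t elem_size = sizeof(int)) {
    return index * elem_size / kLineBytes;
}

// One cache line expressed in ints: 16 when int is 4 bytes.
constexpr std::size_t kStringSizeInts = kLineBytes / sizeof(int);

// If string_size = 64 is used directly as an int index (assuming 4-byte int):
//   array[0]                -> line 0
//   array[64 * 2]           -> line 8, i.e. 512 bytes away rather than 2 lines
//   array[64 * 4 + offset]  -> line 16 + offset / 16
// and offsets 1..1024 ints sweep 64 lines (4 KB).
```

With 16 ints per line, a jump every ~30 ints corresponds to roughly every 2 lines, so it is easier to look for patterns once the output is plotted against `line_of(offset)` rather than the raw int offset.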
The code that produces the misses:

if (array[0] == 0) {                                   // first cache miss
    if (array[string_size * 2] == 0) {                 // second cache miss
        usleep(100);
        asm("rdtsc\n":"=a"(start.t32.th),"=d"(start.t32.tl));

        if (array[string_size * 4 + offset] == 0) {    // access to the element
            asm("rdtsc\n":"=a"(end.t32.th),"=d"(end.t32.tl));
        }
    }
}
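A sketch of one complete trial that also starts every measurement from a cold cache. This is an assumption about a likely problem, not something confirmed in the thread: if the same array is reused for every offset without evicting it, lines touched in earlier iterations are still cached and later measurements say little about the prefetcher. It uses the GCC/Clang intrinsics `__rdtsc`, `_mm_clflush`, `_mm_lfence` and `_mm_mfence` from `<x86intrin.h>`; `probe_latency`, `flush_lines` and `now_ticks` are hypothetical names.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <thread>
#if defined(__SSE2__) && (defined(__x86_64__) || defined(__i386__))
#include <x86intrin.h>  // __rdtsc, _mm_clflush, _mm_lfence, _mm_mfence (GCC/Clang)
#define PROBE_HAVE_X86 1
#endif

// Timestamp in TSC ticks on x86; nanoseconds elsewhere, only so the sketch still builds.
static inline std::uint64_t now_ticks() {
#ifdef PROBE_HAVE_X86
    _mm_lfence();                      // earlier loads must finish before reading the TSC
    std::uint64_t t = __rdtsc();
    _mm_lfence();                      // later loads may not start before the read
    return t;
#else
    return static_cast<std::uint64_t>(std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now().time_since_epoch()).count());
#endif
}

// Evict every 64-byte line of the array so each trial starts with a cold cache.
// Assumes the array starts on a 64-byte boundary.
static void flush_lines(const volatile int* p, std::size_t ints) {
#ifdef PROBE_HAVE_X86
    const char* c = reinterpret_cast<const char*>(const_cast<const int*>(p));
    for (std::size_t b = 0; b < ints * sizeof(int); b += 64) _mm_clflush(c + b);
    _mm_mfence();                      // make sure the flushes are done before the trial
#else
    (void)p; (void)ints;
#endif
}

// One trial of the question's pattern: miss, miss, wait, timed probe.
// line_ints = one cache line in ints (16 for 4-byte int); requires
// array_ints > line_ints * 4 + offset.
std::uint64_t probe_latency(volatile int* array, std::size_t array_ints,
                            std::size_t line_ints, std::size_t offset) {
    flush_lines(array, array_ints);
    int sink = array[0];                                           // first miss
    sink += array[line_ints * 2];                                  // second miss
    std::this_thread::sleep_for(std::chrono::microseconds(100));   // let the prefetcher run
    std::uint64_t t0 = now_ticks();
    sink += array[line_ints * 4 + offset];                         // timed access
    std::uint64_t t1 = now_ticks();
    (void)sink;
    return t1 - t0;
}
```

Call it for each offset on a buffer declared, for example, as `alignas(64) int buf[4096]`. The sleep mirrors the question's `usleep(100)`; if the thread is descheduled during it, the cache may be disturbed, which is one more reason to repeat each offset many times.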

Could you point out possible errors in my approach? And if something is wrong, could you suggest a method that will reliably work?
OS: Ubuntu 14.04
CPU: Intel® Core™2 Duo CPU P8400
L2 size: 3 MB, associativity: 12, line size: 64 bytes
Also, I am attaching the C++ program I wrote and the output file I get. Please help, I've been struggling with this for 3 weeks now. Many thanks in advance for your reply.
Implemented program
Output file
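A single rdtsc reading per offset is easily spoiled by interrupts and other noise, so one commonly used approach (a suggestion, not something from this thread) is to repeat every offset many times, take the median, and look for the first offset whose median crosses a hit/miss threshold measured beforehand on the same machine. The analysis half is sketched below; `median`, `first_miss_offset` and `hit_miss_threshold` are hypothetical names, and the midpoint threshold is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Median of one offset's repeated measurements; robust against interrupts and
// other outliers that a single rdtsc reading cannot filter out.
std::uint64_t median(std::vector<std::uint64_t> v) {
    if (v.empty()) return 0;
    std::nth_element(v.begin(), v.begin() + v.size() / 2, v.end());
    return v[v.size() / 2];
}

// samples[i] holds all trials for offset i + 1. Returns the first offset whose
// median is above the threshold (treated as a miss), or 0 if all look like hits.
std::size_t first_miss_offset(const std::vector<std::vector<std::uint64_t>>& samples,
                              std::uint64_t threshold) {
    for (std::size_t i = 0; i < samples.size(); ++i)
        if (median(samples[i]) > threshold) return i + 1;
    return 0;
}

// Threshold halfway between a known hit latency and a known miss latency
// (both measured beforehand on the same machine; assumes miss >= hit).
std::uint64_t hit_miss_threshold(std::uint64_t hit, std::uint64_t miss) {
    return hit + (miss - hit) / 2;
}
```

For example, with per-offset samples `{30, 32, 900}`, `{31, 29, 30}`, `{28, 30, 31}`, `{30, 30, 500}`, `{250, 260, 240}` and a threshold of 140, the outliers 900 and 500 are ignored and offset 5 is reported as the first miss.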

2 answer(s)
jcmvbkbc, 2014-12-20
@alienstone

Unfortunately, I don't see any patterns in the output file

What did you expect to see there? It would make sense if you compared the time of the second access with the time of the third.
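One way to follow this suggestion is to time the second access (a known miss) and the third access in the same trial, so every probe carries its own miss reference instead of relying on absolute cycle counts. A minimal sketch, assuming the touched lines are not cached when it is called (flush them first, or use a fresh region of a large buffer per trial); `second_and_third`, `looks_prefetched` and the 0.5 factor are hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <utility>
#if defined(__SSE2__) && (defined(__x86_64__) || defined(__i386__))
#include <x86intrin.h>  // __rdtsc, _mm_lfence (GCC/Clang)
#endif

// TSC ticks on x86, fenced so surrounding loads are not reordered across the read;
// steady_clock nanoseconds elsewhere so the sketch still builds.
static inline std::uint64_t ticks() {
#if defined(__SSE2__) && (defined(__x86_64__) || defined(__i386__))
    _mm_lfence();
    std::uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
#else
    return static_cast<std::uint64_t>(std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now().time_since_epoch()).count());
#endif
}

// Returns {latency of the second access, latency of the third access}.
std::pair<std::uint64_t, std::uint64_t> second_and_third(volatile int* a, std::size_t line_ints,
                                                        std::size_t offset) {
    int sink = a[0];                                               // first miss
    std::uint64_t t0 = ticks();
    sink += a[line_ints * 2];                                      // second access: reference miss
    std::uint64_t t1 = ticks();
    std::this_thread::sleep_for(std::chrono::microseconds(100));   // let the prefetcher work
    std::uint64_t t2 = ticks();
    sink += a[line_ints * 4 + offset];                             // third access: hit if prefetched
    std::uint64_t t3 = ticks();
    (void)sink;
    return {t1 - t0, t3 - t2};
}

// The third access counts as "prefetched" if it was clearly cheaper than the reference
// miss. The 0.5 factor is an arbitrary assumption; tune it from your own measurements.
bool looks_prefetched(std::uint64_t second, std::uint64_t third, double factor = 0.5) {
    return static_cast<double>(third) < factor * static_cast<double>(second);
}
```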
To make sure that memory accesses happen exactly where they are written, it would not hurt to make the pointers you access through volatile int *, and to make the asm statements asm volatile.
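A small illustration of both suggestions, written with GCC/Clang extended asm. `read_tsc` and `read_each` are hypothetical names. `asm volatile` keeps the compiler from deleting, merging or hoisting the rdtsc statement, and every read through a `volatile` pointer must actually be emitted in program order relative to other volatile accesses; through a plain `int *`, the compiler may merge, reorder or drop loads whose results it can predict.

```cpp
#include <cstdint>

// Read the time-stamp counter. rdtsc puts the low 32 bits in EAX and the
// high 32 bits in EDX; `volatile` prevents the statement from being optimized away.
static inline std::uint64_t read_tsc() {
#if defined(__x86_64__) || defined(__i386__)
    std::uint32_t lo, hi;
    asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
#else
    return 0;   // placeholder on non-x86 targets
#endif
}

// Every p[i * stride] below is a real load, because the pointee is volatile.
int read_each(const volatile int* p, int n, int stride) {
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += p[i * stride];
    return sum;
}
```

Note that rdtsc itself is not a serializing instruction, so even with `asm volatile` the CPU can still execute it out of order with respect to nearby loads; fencing it (for example with lfence) is a separate concern from the compiler-level guarantees shown here.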

Armenian Radio, 2014-12-20
@gbg

Look at the ATLAS sources: it automatically detects the cache parameters and tunes itself to them.
