Why is my AMD Ryzen 3970 2-3x slower than Core i9 10850K?
This situation is strange. We bought an AMD Ryzen 3970 for work, intending to run our multithreaded math tasks on it.
Before that, we used a Core i9 10850K (10 cores, 20 threads with Hyper-Threading).
The result is an ugly picture.
40 million of our math iterations (the math contains no floating-point operations at all, only addition, subtraction, multiplication and shifts):
- on Intel, using 20 threads, it takes 20.5 seconds
- on AMD, using 64 threads, it takes 37.0 seconds
At the same time, the resource monitor shows all Intel cores loaded at 100% and the AMD cores at 35-45%.
Setting the priority and affinity when creating each thread has no effect:
threads[ core ] = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)testMathThread, params, 0, NULL);
::SetThreadPriority( threads[ core ], THREAD_PRIORITY_HIGHEST);
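// Build the mask from a pointer-sized value: a plain `1 << core` is an int shift and breaks for core >= 31.
::SetThreadAffinityMask( threads[ core ], static_cast<DWORD_PTR>(1) << core );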
Update: issue resolved!
After 12 hours with this problem, I have finally figured it out.
The project used a self-written RNG lib based on mt19937, and the person who wrote it 5 years ago made it thread-safe by cramming a mutex lock into every call.
I don't know why AMD got stuck on these calls longer than Intel, but the fact remains: AMD lost about twice as large a share of its time on them as Intel did. As a result, Intel shows 100% load and AMD about 50%.
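The effect can be reproduced with a minimal sketch. It is not the old lib itself (which is not shown here): `SharedRandom`, `RunResult` and `runDraws` are hypothetical names, and the seed 12345 is arbitrary. The sketch runs the same number of draws either through one mutex-guarded `std::mt19937` or through a private engine per worker, and reports the wall-clock time.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

// Hypothetical stand-in for the old lib: one engine, one mutex,
// and a lock taken inside every call.
class SharedRandom {
public:
    explicit SharedRandom(std::uint32_t seed) : mEngine(seed) {}

    std::uint32_t next() {
        std::lock_guard<std::mutex> guard(mMutex);  // every thread queues here
        return static_cast<std::uint32_t>(mEngine());
    }

private:
    std::mutex mMutex;
    std::mt19937 mEngine;
};

struct RunResult {
    double seconds;          // wall-clock time for all workers
    std::uint32_t checksum;  // XOR of every value drawn; keeps the loops from being optimized away
};

// Starts `threadCount` workers that each draw `drawsPerThread` numbers.
// shared == true:  every draw goes through the single locked generator.
// shared == false: every worker uses its own engine and never takes a lock.
RunResult runDraws(unsigned threadCount, std::uint64_t drawsPerThread, bool shared) {
    assert(threadCount > 0);
    SharedRandom sharedRng(12345u);
    std::vector<std::uint32_t> sinks(threadCount, 0);
    std::vector<std::thread> workers;
    workers.reserve(threadCount);

    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&sharedRng, &sinks, t, drawsPerThread, shared] {
            std::mt19937 localRng(12345u + t);  // private engine, distinct seed per worker
            std::uint32_t sink = 0;
            for (std::uint64_t i = 0; i < drawsPerThread; ++i) {
                sink ^= shared ? sharedRng.next()
                               : static_cast<std::uint32_t>(localRng());
            }
            sinks[t] = sink;  // each worker writes only its own slot
        });
    }
    for (auto& worker : workers) worker.join();
    const auto stop = std::chrono::steady_clock::now();

    std::uint32_t checksum = 0;
    for (std::uint32_t s : sinks) checksum ^= s;
    return {std::chrono::duration<double>(stop - start).count(), checksum};
}
```

Comparing `runDraws(n, k, true).seconds` with `runDraws(n, k, false).seconds` for a growing `n` should show the locked variant scaling worse, although the exact ratio depends on the CPU, the OS scheduler and the mutex implementation.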
As a temporary solution (until the old lib is rewritten), I gave each thread its own Random class modeled on the standard rand() / srand() from C++. It is a quick-and-dirty fix, but the root cause has been found and the accuracy of the calculations was not affected.
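The lock that the old lib took in every call:
    std::lock_guard guard(mMutex);
The temporary per-thread replacement:
    #include <cstdint>
    #include <cstdlib>   // RAND_MAX

    // Minimal LCG using the same constants as the MSVC rand() implementation.
    class Random
    {
    public:
        Random()
        {
            _rand_state = 0;
        }
        void srand(unsigned int const seed)
        {
            _rand_state = seed;
        }
        uint16_t rand()
        {
            _rand_state = _rand_state * 214013 + 2531011;
            return (_rand_state >> 16) & RAND_MAX;   // RAND_MAX is 0x7FFF on MSVC
        }
    private:
        uint32_t _rand_state;
    };

    // One pointer per thread; each thread creates and seeds its own instance.
    __declspec(thread) Random* random = nullptr;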
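If the statistical quality of mt19937 needs to be kept, one possible alternative to the hand-written LCG is a `thread_local` engine. This is only a sketch under the assumption that C++11 `thread_local` is available in the project. The helper names `threadRng`, `seedThreadRng` and `nextRandom` are hypothetical and not part of the old lib.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Returns the calling thread's private engine. `thread_local` gives every
// thread its own instance, so no mutex is needed.
inline std::mt19937& threadRng() {
    thread_local std::mt19937 engine;  // default-seeded until seedThreadRng() is called
    return engine;
}

// Hypothetical helper: mixes a base seed with a worker index so that
// each thread gets its own reproducible stream.
inline void seedThreadRng(std::uint32_t baseSeed, std::uint32_t workerIndex) {
    std::seed_seq seq{baseSeed, workerIndex};
    threadRng().seed(seq);
}

// Draws the next 32-bit value from the calling thread's engine.
inline std::uint32_t nextRandom() {
    return static_cast<std::uint32_t>(threadRng()());
}
```

Each worker would call `seedThreadRng(baseSeed, core)` once at the start of its thread function and then use `nextRandom()` wherever the old lib was called. Whether the resulting sequences are acceptable for the calculations would still have to be checked against the old results.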
Before the fix (old lib with the lock):
4 million iterations, AMD, 32 threads = 4.05 sec, CPU utilization 45%
4 million iterations, AMD, 64 threads = 3.61 sec, CPU utilization 47%
4 million iterations, Intel, 10 threads = 4.01 sec, CPU utilization 75%
4 million iterations, Intel, 20 threads = 2.61 sec, CPU utilization 100%
After the fix (per-thread Random):
4 million iterations, AMD, 32 threads = 1.25 sec, CPU utilization 60% (1 thread per physical core)
4 million iterations, AMD, 64 threads = 0.71 sec, CPU utilization 100% (1 thread per physical core + HT)
4 million iterations, Intel, 10 threads = 2.8 sec, CPU utilization 70% (1 thread per physical core)
4 million iterations, Intel, 20 threads = 2.1 sec, CPU utilization 100% (1 thread per physical core + HT)