A
A
Adamos2020-10-15 10:59:27
linux
Adamos, 2020-10-15 10:59:27

The speed of pure enumeration - how can this be?

Preamble.
There is a program that does a complete enumeration with a return and cutting off dead ends. Roughly speaking, solves the knapsack problem on a large scale.
The program is written in C++ using STL. In the process of enumeration, no memory is allocated, the prepared memory is reused, the data is small and must fall into the processor cache.
Under Linux the program is compiled by GCC, under Windows - VS2010.

Ambula.
The same case, a complete search of all options, takes 40 - 60 seconds on Windows.
On Linux - 11 minutes!

Full ambulance.
The same Windows version under Wine slows down exactly the same as the Linux version. Accordingly, we can turn the compilation nuances into a tube.
If someone thinks that he understands - throw ideas in which direction to think at all, since the use of this particular program under Linux is preferable.
Brute force profiling, obviously, will show that brute force is eating all the time, I understand that ...

UPD: manual profiling showed a rather funny thing, on which one has to end the working day today:

// Вот эти строчки кушают под Окошками и под Линем практически одинаково
std::vector< short > update;
std::vector< short >::const_iterator v1 = nextVar.begin(), v2 = already.begin();
std::vector< short >::const_iterator e1 = nextVar.end(), e2 = already.end();
update.reserve(std::min(nextVar.size(), already.size())));

// А вот эти - 11% всего времени перебора под Окошками и 62% - под Линем!
// При том, что тот update УЖЕ имеет достаточный размер и НИКАК не может его превысить
while((v1 != e1) && (v2 != e2)) {
  if(*v1 < *v2) {
    ++v1;
  } else if(*v1 > *v2) {
    ++v2;
  } else {
    update.push_back(*v1);
    ++v1;
    ++v2;
  }
}

The code simply makes a third of the two sorted vectors, containing only matching values.

Answer the question

In order to leave comments, you need to log in

7 answer(s)
A
Adamos, 2020-10-18
@Adamos

“It's as simple as a pancake,” he said. - It's trivial. It's flat and trite. It's not even fun to tell

QtCreator, when building the Release version, for some reason forgets to tell its qmake that it is the Release version that is being built.
Prescribed in the project QMAKE_CXXFLAGS_RELEASE += -Ofast- is simply ignored.
It is enough to replace it with QMAKE_CXXFLAGS += -Ofastor add it to the qmake call CONFIG += release- and the compiled program in Lin on real hardware, of course, immediately hides virtual Windows, as it should be by nature.
It was enough to carefully look at the build output, which, suddenly, practically did not change from switching between Debug and Release.
A hole
And a crack
And a strange hole
It has nothing to do with it at all!

X
xmoonlight, 2020-10-15
@xmoonlight

There is NO DATA COLLECTED in the question to correctly and clearly answer this question!
No metrics collected, no architecture, no technologies used, no application type, no code compilation tools, no repository (or framework), etc. - THERE IS NOTHING.
Profiling?! No, we haven't. :)
Ok. Do it manually yourself: take and add to encapsulating (object calls) and iterative calls (loops, recursions) timings and thread IDs (and other metrics, for the functional environment used).
After measuring, you will see for yourself.

Profiling the enumeration, obviously, will show that the enumeration is eating all the time, as I understand it ...
It's strange that only you understand this! ;)
I will try to tell what's the matter when (and if) I get to the bottom of the truth.
... And I'll show that you all here don't know shit, and I'm d'Artagnan and I can EVEN! to answer your own miserable semblance of a "question"!
UPD:
The code simply makes a third of the two sorted vectors, containing only matching values.
here

M
mayton2019, 2020-10-16
@mayton2019

Prints. Or observations.
1. The cycle where there is a merge of two vectors is trivial. The weak point can be the memory reserve function, which is implemented differently in win/Linux. I'm not saying it's bad on linux. Perhaps the stars just converged so that page or other properties of wasps in relation to allocations became unfavorable.
2. What's with the bit depth 32/64? Need to check. What's with the iron? Is the author trying to outwit us by running all this on different hardware. Even tiny differences in the size of L1 caches could work here.
3. STL versions. The author uses not raw pointers but iterators. And cunning ones. What is the logic for increment and dereferencing under the hood.
To discard my assumptions completely - I propose to rewrite this cycle (presumably the hottest code according to the author) to pointers without STL.
4. GCC options should be looked at. Move optimization. O1, O2.

V
Vladimir, 2020-10-15
@MechanID

I'm not a real welder but... look in gcc -v does it have "-march=native" option? if not, specify it explicitly when compiling.

K
Karpion, 2020-10-15
@Karpion

Well, I would look at how many threads the program spawns, and how many cores they occupy. If you have about the same number of cores on your computer, how many times the difference in speed is, then it makes sense to look here (I assume that WINE can use only one core).
Is the program executed in batch, like a console utility? Or is it graphic?
Maybe run it through the time command - see what it says?

A
asd111, 2020-10-15
@asd111

If the program is really 15 years old, then most likely the problem is in wxWidget. Grandfathers write that in those days wxWidget slowed down on gtk2 and consumed 10% of the CPU in text editing.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question