How do you debug programs that take a long time to run?

Programming

piva, 2014-12-18 23:27:11

Say a certain program aborts with an error message about 2 hours after it starts. I can't shorten this time, because the error occurs only with certain parameters, and those parameters are exactly what makes the run so long.
Over a couple of days, by writing down the results of each run in a notebook and then analyzing them, I managed to find the source of the error and fix it.
But there are also programs that throw an error a couple of days after launch, and debugging those is very hard. I tried running several copies in parallel on several computers to get results sooner. But I found that this only confuses me, because I still analyze the results in my head sequentially.
Hence the question: does anyone know methods, techniques, or tricks that simplify debugging long-running programs?
(To repeat: the question is about the case where the run time cannot be reduced any further.)

7 answers

Maxim, 2014-12-18
@z17

Make the program log everything it does, step by step, while it runs.
Then read the log, and you will see where something went wrong.
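
A minimal sketch of this advice in Python, assuming the long job is a loop of steps. The names `make_run_logger` and `run`, and the harmonic-sum workload, are hypothetical placeholders for the real program. The standard `logging.FileHandler` flushes after every record, so the lines written before a crash should still be in the file.

```python
import logging


def make_run_logger(path, name="longrun"):
    """Return a logger that appends timestamped records to the file at `path`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the records out of the root logger
    if not logger.handlers:  # do not attach a second handler on repeated calls
        handler = logging.FileHandler(path, mode="a", encoding="utf-8")
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(funcName)s:%(lineno)d %(message)s"))
        logger.addHandler(handler)
    return logger


def run(steps, logger):
    """Hypothetical long computation: log the state after every step."""
    total = 0.0
    for i in range(steps):
        total += 1.0 / (i + 1)
        logger.debug("step=%d total=%.6f", i, total)
    logger.info("finished steps=%d total=%.6f", steps, total)
    return total
```

For a run that lasts days, logging every step may produce a very large file; logging every N-th step plus every unusual value is a common compromise.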

Armenian Radio, 2014-12-18
@gbg

Formal proofs of correctness, plus detailed logging with stack unwinding.
A little more paranoia when writing code: check all input parameters of methods and procedures for validity, and check the return value of every system call. That way the error is detected earlier.
Use const wherever possible.
Compile with -Wall and get rid of every warning.
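
The "paranoia" part could look roughly like this in Python. The function `advance` and its parameters are hypothetical; the point is only that bad values are rejected at the boundary of each step instead of surfacing hours later. This sketch covers input and result checks only, not system calls.

```python
import math


def advance(position, velocity, dt):
    """Hypothetical simulation step with paranoid checks on inputs and result."""
    for name, value in (("position", position), ("velocity", velocity), ("dt", dt)):
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a number, got {type(value).__name__}")
        if not math.isfinite(value):
            raise ValueError(f"{name} must be finite, got {value!r}")
    if dt <= 0:
        raise ValueError(f"dt must be positive, got {dt!r}")

    new_position = position + velocity * dt
    # Fail at the step that produced inf or nan, not hours later downstream.
    if not math.isfinite(new_position):
        raise ArithmeticError(
            f"non-finite result from {position!r} + {velocity!r} * {dt!r}")
    return new_position
```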

Mercury13, 2014-12-18
@Mercury13

1. Special execution modes. For example, I have my own template Array1d<>, which has not only a range check but also a so-called "canary": a check for whether some "crazy" subroutine has accidentally corrupted it. All these extras are switched on through compilation options.
2. Detailed logs for the key objects: what happened and why.
3. Plain intuition. I once spent a long time chasing an uninitialized variable on the stack: I knew where the error was, but any diagnostics (even switching the compiler to debug mode) shifted the stack, and then it was like looking for the wind in a field. I turned on the range check and the same canary, and they showed nothing (naturally, since nothing was writing to "foreign" memory). I patched over the error many times, but later I somehow managed to diagnose it, and after that fixing it was routine.
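
The source of Array1d<> is not shown, so the following is not that template. It is only a Python analogue of the idea from point 1, under the assumption that the array keeps one guard cell on each side of its data and verifies both on every access. The class name `GuardedArray`, the sentinel value, and the `debug` switch (standing in for a compilation option) are all hypothetical.

```python
class GuardedArray:
    """Fixed-size array with strict range checks and canary cells (a sketch)."""

    CANARY = 0xDEADBEEF  # arbitrary sentinel, assumed never to be a real value

    def __init__(self, size, debug=True):
        self._size = size
        self._debug = debug
        # One guard cell before the payload and one after it.
        self._cells = [self.CANARY] + [0.0] * size + [self.CANARY]

    def check_canaries(self):
        """Raise if either guard cell no longer holds the sentinel."""
        if self._cells[0] != self.CANARY or self._cells[-1] != self.CANARY:
            raise RuntimeError("canary overwritten: something wrote outside the array")

    def _offset(self, index):
        if self._debug:
            # Unlike a plain list, negative indexes are rejected too.
            if not 0 <= index < self._size:
                raise IndexError(f"index {index} out of range 0..{self._size - 1}")
            self.check_canaries()
        return index + 1  # skip the leading guard cell

    def __getitem__(self, index):
        return self._cells[self._offset(index)]

    def __setitem__(self, index, value):
        self._cells[self._offset(index)] = value
```

With `debug=False` both checks are skipped, which mirrors turning the diagnostics off in a release build.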

Inteli Sense, 2014-12-18
@Aios

One option: on the first run, capture the environment (variables and parameters) so that you get a picture of the state. Then program a hook that lets those same parameters be passed into the "error area" from outside. This raises the chance that you catch the error yourself, without the program's help: while the program is still running, you can try out parameters until it crashes on its own. At that point you can tell whether the failure was your fault or not.
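
One possible reading of this answer, sketched in Python: save the inputs just before the suspect code runs, then replay them (optionally modified) without repeating the long run. The functions `suspect_region`, `run_with_snapshot`, and `replay`, and the division used as the "error area", are hypothetical stand-ins, and the sketch assumes the parameters are JSON-serializable.

```python
import json


def suspect_region(params):
    """Hypothetical stand-in for the code where the error shows up."""
    return params["numerator"] / params["denominator"]


def run_with_snapshot(params, snapshot_path):
    """Save the inputs to disk before entering the suspect region, then run it."""
    with open(snapshot_path, "w", encoding="utf-8") as f:
        json.dump(params, f)
    # The file is closed at this point, so it survives a crash below.
    return suspect_region(params)


def replay(snapshot_path, overrides=None):
    """Hook: feed saved (optionally modified) parameters straight into the region."""
    with open(snapshot_path, encoding="utf-8") as f:
        params = json.load(f)
    params.update(overrides or {})
    return suspect_region(params)
```

After one long run has produced a snapshot, `replay(path, {"denominator": 0})` reaches the failure in milliseconds, which makes it practical to vary parameters by hand.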

Tyranron, 2014-12-19
@Tyranron

You can also write tests =)

Don Kaban, 2014-12-19
@donkaban

Wait. If you accept the condition "it is impossible to speed up the process or emulate the crash", then: turn logging up to the maximum level, insert guard checks wherever possible, and write the call stack on any error (this is easy to do from inside the program, without a debugger). Handle all exceptions in the style "this is fatal: write the call stack and exit". And then go through the logs by eye.
PS To everyone who says that nothing is impossible: I once had a system driver that crashed, and exactly after 3 hours of a specific load at that. It was not a regular Linux driver but a blob from the hardware manufacturer, so source code was out of the question. For about a week we surrounded it with logging wherever we could. We found the bug, but it cost us a lot of blood. So it does happen that there is no faster way.
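
The "this is fatal: write the call stack and exit" style could be sketched in Python with the standard `sys.excepthook`. The helper name `install_fatal_hook` and the log format are hypothetical. Note that this hook only sees exceptions that reach the top level of the main thread; worker threads would need `threading.excepthook` as well.

```python
import sys
import traceback


def install_fatal_hook(log_path):
    """Append the full call stack of any unhandled exception to `log_path`."""

    def hook(exc_type, exc, tb):
        with open(log_path, "a", encoding="utf-8") as f:
            f.write("FATAL: unhandled exception\n")
            f.write("".join(traceback.format_exception(exc_type, exc, tb)))
        # Still print the usual traceback to stderr.
        sys.__excepthook__(exc_type, exc, tb)

    sys.excepthook = hook
    return hook
```

After the hook returns, the interpreter exits with a non-zero status as it normally does for an uncaught exception, so the program "falls out" without any extra code.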

xmoonlight, 2014-12-18
@xmoonlight

Are you sure you are using this construct?

try {
    // do something
} catch (e) {
    // handle and log the error
}

;)