Dmitry Kinash, 2014-08-18 15:44:54
C++ / C#

What does this error mean in C++?

Unfortunately, I am no expert in C++ or in systems programming, so I need help from people who are.
I have a program written in C++ (the sources are available) and compiled for 32-bit Ubuntu 14.04. It usually works fine, but on some combinations of input data it quietly dies with a Segmentation fault. Since the application runs as a multi-threaded server, I had no way of finding out what was crashing or how, and as a system administrator all I could do was restart the daemon. But yesterday a single request turned up that reliably crashes the server. Repeating it kills the application every time.
Armed with knowledge from Google, I added the line ulimit -c unlimited to the script in /etc/init.d before the daemon is started and got a core dump when it crashed. I then loaded the dump into GDB and got the following information:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x080ac8a6 in size (this=<optimized out>, this=<optimized out>) at /usr/include/c++/4.8/bits/stl_vector.h:646
646           { return size_type(this->_M_impl._M_finish - this->_M_impl._M_start); }

The next-to-last frame in the backtrace is an attempt to call the size() method on a vector that belongs to instances of a certain class. That seems to confirm once again that the problem originates in the internals of the STL.
My meager knowledge of C++, two university semesters' worth, is not enough to understand why the size_type expression would cause a Segmentation fault. It is completely unclear to me why everything works fine on most data sets and the error shows up only about once a week, even though the data is on average all of the same kind and the failing set does not stand out to the eye. I am asking those who know for help. Thanks in advance!
PS I don't know how important this information is, but it certainly won't hurt. The dump file is 2335M in size. During working hours the running application shows the following memory consumption: VIRT=2295636, RES=2.11g. I.e., as far as I understand, the 3g limit that Linux imposes on 32-bit applications is still far away. Memory leaks, if there are any, are insignificant (I cannot detect any over a week or two of stable operation), and the application crashes reliably right after launch, without any additional load from user requests and with plenty of RAM to spare for its calculations.
Upd. The program currently runs in 3 threads, and there is indeed data that all threads have access to. But this data is used exclusively for reading (after the program starts, 2 gigabytes are loaded into memory and never change again). The server receives a request and hands control to a free thread, which reads from the shared data and performs the calculations required of it. Once again, the data on the server itself does not change! The threads create the structures their algorithms need internally and free that memory when they finish. I also stress once again that the error can be triggered by a single request in a single thread while the other threads are idle. The program has only one critical section, and access to it is regulated with mutexes.

5 answer(s)
bogolt, 2014-08-18
@Dementor

An error deep inside the STL is an unlikely event. This library has been tested for years on a huge number of projects. The fact that the error showed up in std::vector does not at all mean that the error is there. Most likely the code is trying to work with memory that has already been freed: it calls the function that returns the size, and since the object no longer exists, the program crashes.
This is completely normal for programs with manual memory management (well, it seems to happen in the others too).
Why the error may occur only from time to time: there are many possible reasons. Maybe incorrect multithreading (and here it depends on the OS how it schedules things) means that occasionally one thread accesses a resource already deleted by another thread. Maybe the memory error does not show up immediately because some values refer to memory that no longer holds a live object but is still physically inside your program's address space, so the OS thinks everything is in order. Once the pointers go beyond that space, a big boom happens.
Since you have a request that is guaranteed to reproduce the problem, you should try to debug your server and find this error.
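
To make the point above concrete, here is a minimal sketch, not code from the program in question. The names Route, point_count and demo are hypothetical. It assumes C++11 (g++ -std=c++11). size() only subtracts two pointers stored inside the vector, so a crash on that line of stl_vector.h usually says that the object holding the vector is invalid, not that the vector code is wrong.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical class standing in for "instances of a certain class".
struct Route {
    std::vector<int> points;
};

// size() just computes finish - start from fields inside the vector,
// so it can only fault when the vector itself (reached via `route`)
// is not a valid object.
std::size_t point_count(const Route* route) {
    if (route == nullptr) {
        // Without this guard, route->points.size() would read through
        // a null pointer and the crash would be reported inside size().
        return 0;
    }
    return route->points.size();
}

// Builds a Route, optionally destroys it, then asks for the size.
std::size_t demo(bool destroy_first) {
    std::unique_ptr<Route> route(new Route);
    route->points = {1, 2, 3};
    if (destroy_first) {
        route.reset();  // the object is gone; get() now returns nullptr
    }
    return point_count(route.get());
}
```

Note that a null check like this only catches null pointers. A raw pointer to an object that has already been deleted is not null and cannot be checked this way, which is why tools that track memory are usually needed to find such bugs.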

Misha Krinkin, 2014-08-18
@kmu1990

To clarify the situation, you can try valgrind: it tracks leaks and, if you're lucky, also catches accesses to "foreign" memory. That is one option. Another option, if you have access to the sources, is to recompile the program with the -fsanitize=address flag. With this flag the program will report memory errors (the kind that usually lead to a segmentation fault).
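
As a rough illustration of what these tools are meant to catch, here is a small sketch with a deliberate use-after-free. Job and count_values are hypothetical names, and the build commands in the comments assume a file called demo.cpp with a main() that calls count_values(true).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Possible ways to build and run this sketch (file name is an assumption):
//   g++ -g -O0 -fsanitize=address demo.cpp -o demo && ./demo
//   g++ -g -O0 demo.cpp -o demo && valgrind ./demo

// Hypothetical stand-in for a structure created per request.
struct Job {
    std::vector<int> values;
};

// trigger_bug == false: the object is read while it is still alive.
// trigger_bug == true:  the vector is read after delete (deliberate bug).
std::size_t count_values(bool trigger_bug) {
    Job* job = new Job;
    job->values.assign(3, 7);  // three elements, each equal to 7
    if (!trigger_bug) {
        std::size_t n = job->values.size();
        delete job;
        return n;
    }
    delete job;
    // Use-after-free: undefined behavior. It may crash, or appear to
    // work, depending on what the allocator did with the freed block.
    return job->values.size();
}
```

Without instrumentation the buggy path may well appear to work, which matches the "once a week" pattern described in the question. Both tools are designed to report this kind of access at the point where it happens.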

Dmitry, 2014-08-19
@TrueBers

this=<optimized out>, this=<optimized out>

Is it possible to build with -O disabled and -g enabled?
Then there will be much more information in place of <optimized out>.

m0rd, 2014-08-18
@m0rd

The most common mistake that causes a Segmentation fault is freeing memory and then accessing it. You can try running a static analyzer over the code; it will help you find errors in it.

Sumor, 2014-08-18
@Sumor

Indeed, you can run the code through static analyzers. They are good at catching situations where freed memory is accessed. For example:
Clang Static Analyzer
Cppcheck
PVS-Studio
