[test] Possible design flaw in execution_monitor::catch_signals

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[test] Possible design flaw in execution_monitor::catch_signals

Jim Douglas-2
A recent problem I encountered that caused the total failure of a
regression test run has highlighted what I (and others) believe to be a
design flaw in execution_monitor::catch_signals (in file
execution_monitor.ipp).

IIUC the purpose of the code in question is to intercept UNIX signals
and convert them to exceptions which contain a diagnostic message
indicating the signal type. For most signals this should work reliably.
However, we suggest that this mechanism is prone to failure when
attempting to deal with a SIGSEGV, as happened in my case.

What I observed was a test process that had stalled by blocking against
a mutex. Detailed analysis showed that the test failed with a memory
segment violation resulting in a SIGSEGV.

In brief, the primary execution failure corrupts the program heap. When
the C++ exception is thrown at execution_monitor:462, the exception
handling mechanism calls __cxa_allocate_exception which then calls
std::malloc. But, because of the corrupted heap, this call blocks
against the malloc mutex.

This is the specific case with QNX, and other OSs will deal with this in
different ways. The general point I would like to make is that after a
memory segment violation, any process's memory will be in an undefined
state and it is unreasonable to assume that it will be capable of
continued execution as per the present design. I propose that there is
no safe way to intercept a SIGSEGV and that it should be allowed to
terminate the process without intervention.

Comments please...

Jim

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [test] Possible design flaw in execution_monitor::catch_signals

Ian McCulloch
Jim Douglas wrote:

> A recent problem I encountered that caused the total failure of a
> regression test run has highlighted what I (and others) believe to be a
> design flaw in execution_monitor::catch_signals (in file
> execution_monitor.ipp).
>
> IIUC the purpose of the code in question is to intercept UNIX signals
> and convert them to exceptions which contain a diagnostic message
> indicating the signal type. For most signals this should work reliably.
> However, we suggest that this mechanism is prone to failure when
> attempting to deal with a SIGSEGV, as happened in my case.
>
> What I observed was a test process that had stalled by blocking against
> a mutex. Detailed analysis showed that the test failed with a memory
> segment violation resulting in a SIGSEGV.
>
> In brief, the primary execution failure corrupts the program heap. When
> the C++ exception is thrown at execution_monitor:462, the exception
> handling mechanism calls __cxa_allocate_exception which then calls
> std::malloc. But, because of the corrupted heap, this call blocks
> against the malloc mutex.
>
> This is the specific case with QNX, and other OSs will deal with this in
> different ways. The general point I would like to make is that after a
> memory segment violation, any process's memory will be in an undefined
> state and it is unreasonable to assume that it will be capable of
> continued execution as per the present design. I propose that there is
> no safe way to intercept a SIGSEGV and that it should be allowed to
> terminate the process without intervention.

Completely correct.  The SIGSEGV handler is not allowed to return (except in
the case where the signal was generated by a raise()), doing so is
undefined bahaviour.  Whatever extra stuff needs to be done to report the
error, needs to be done in the signal handler itself, using
async-signal-safe functions only.  Definitely no malloc().

Cheers,
Ian


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [test] Possible design flaw inexecution_monitor::catch_signals

Gennadiy Rozental
In reply to this post by Jim Douglas-2

> In brief, the primary execution failure corrupts the program heap. When
> the C++ exception is thrown at execution_monitor:462, the exception
> handling mechanism calls __cxa_allocate_exception which then calls
> std::malloc. But, because of the corrupted heap, this call blocks
> against the malloc mutex.

Is there any way to avoid memory allocation? The exception itself doesn't
allocate the memory (namely for the reasons above). We just need to avoid
allocation on system level.

> This is the specific case with QNX, and other OSs will deal with this in
> different ways. The general point I would like to make is that after a
> memory segment violation, any process's memory will be in an undefined
> state and it is unreasonable to assume that it will be capable of
> continued execution as per the present design. I propose that there is
> no safe way to intercept a SIGSEGV and that it should be allowed to
> terminate the process without intervention.

You could disable catching system errors for this test or this configuration
(using either command line argument or environment variable). But be aware
that this may cause different problems for regression testing runs.

Gennadiy



_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [test] Possible design flaw inexecution_monitor::catch_signals

Jim Douglas-2
Gennadiy Rozental wrote:
>>In brief, the primary execution failure corrupts the program heap. When
>>the C++ exception is thrown at execution_monitor:462, the exception
>>handling mechanism calls __cxa_allocate_exception which then calls
>>std::malloc. But, because of the corrupted heap, this call blocks
>>against the malloc mutex.
>
> Is there any way to avoid memory allocation?

Yes, don't throw an exception. It is the system that is allocating
memory for the exception.

> The exception itself doesn't allocate the memory (namely for the reasons above).

Oh yes it does. Before you can throw an exception you have to create one
and that will involve a call to new, which in most cases results in a
call to malloc.

[...]

> You could disable catching system errors for this test or this configuration
> (using either command line argument or environment variable).

I am trying to make a general point here that the code is badly designed
and is prone to failure on any OS, not just QNX. To use a simple analogy
you are currently asking a mortally wounded soldier to get back in the
front line and continue to fight. A memory segment violation _may_
cripple a process. There is _no_guarantee_ that following a memory
segment violation a process is in a fit state to continue execution.
Don't you remember the good old days when a memory segment violation
under Windows would wipe out the entire OS :-)

See also Ian McCulloch's post that backs me up.

BTW the specific case has been dealt with by withdrawing the test. I
just want to make sure this does not happen again to anyone.

Jim


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [test] Possible design flawinexecution_monitor::catch_signals

Gennadiy Rozental

"Jim Douglas" <[hidden email]> wrote in message
news:du8qc7$ikt$[hidden email]...

> Gennadiy Rozental wrote:
>>>In brief, the primary execution failure corrupts the program heap. When
>>>the C++ exception is thrown at execution_monitor:462, the exception
>>>handling mechanism calls __cxa_allocate_exception which then calls
>>>std::malloc. But, because of the corrupted heap, this call blocks
>>>against the malloc mutex.
>>
>> Is there any way to avoid memory allocation?
>
> Yes, don't throw an exception. It is the system that is allocating
> memory for the exception.
>
>> The exception itself doesn't allocate the memory (namely for the reasons
>> above).
>
> Oh yes it does. Before you can throw an exception you have to create one
> and that will involve a call to new, which in most cases results in a
> call to malloc.

I don't see why. The exception is defined on stack. Why would the system
need to call to new?
There are other alternatives: I could make exception object static, I could
throw  special int value to indicate falire and have exception object
available globally.

> [...]
>
>> You could disable catching system errors for this test or this
>> configuration
>> (using either command line argument or environment variable).
>
> I am trying to make a general point here that the code is badly designed
> and is prone to failure on any OS, not just QNX. To use a simple analogy
> you are currently asking a mortally wounded soldier to get back in the
> front line and continue to fight.

I do not. Immediate reaction to SIGSEGV is an attempt to log an error and
exit process.

> A memory segment violation _may_
> cripple a process. There is _no_guarantee_ that following a memory
> segment violation a process is in a fit state to continue execution.
> Don't you remember the good old days when a memory segment violation
> under Windows would wipe out the entire OS :-)
>
> See also Ian McCulloch's post that backs me up.
>
> BTW the specific case has been dealt with by withdrawing the test. I
> just want to make sure this does not happen again to anyone.

This is really very simple. For the usages in regression testing user is not
interested in core files. Instead we prefer as much information in
output/log that indicate the error. I do not have an ability to properly log
an error from within signal handler. So I am trying to jump out of it to the
place where I could. In worse case scenario we should end up with another
core. Which is unusable, but we are not interested in core files in a fist
place.

And as usual any alternative proposition would be very welcome.

Gennadiy



_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost