Interprocess mutex & condition variable at process termination


Interprocess mutex & condition variable at process termination

Boost - Dev mailing list
Dear Experts,

I've just been surprised by the behaviour of the interprocess
mutex and condition variable on abnormal process termination, i.e.
they are not automatically released.

Google tells me that I'm not the first to be surprised by this; there
have been previous posts here, stack overflow questions etc.

One often-valid observation is that if a process crashes - or
otherwise terminates without executing its destructors - while it
holds a lock on a shared data structure then the data is probably
now corrupt, so unlocking the mutex that protects it is not very
useful.  I think there is an important case where that does not
apply - when the process that crashes is only reading the shared
data.  In my case, I had written a "monitor" utility that loops
forever, waiting on a shared condition, taking the corresponding
mutex, and then dumping the shared data to stdout.  I had been
running this and stopping it by pressing ctrl-C and it had not
occurred to me that this might not work as I expected.  My
attempt at debugging using this utility was making my problems worse,
not better!  Modifying this code to run destructors on ctrl-C is
non-trivial.

I am aware that the SysV shared semaphore is able to undo on
process termination (see SEM_UNDO in man semop), and I had assumed
that Boost.Interprocess was using this or something like it.  I
now see that it is using pthreads, which I didn't even realise
could work between processes, and I don't think this API has
any way to specify process termination behaviour.
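For readers unfamiliar with SEM_UNDO, here is a minimal sketch of a SysV semaphore used as a mutex that the kernel releases when its owner exits. It assumes Linux (where the caller must declare the semctl() argument union itself). All the function names are invented for illustration; none of this is Boost.Interprocess code.

```cpp
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Linux leaves it to the caller to declare the fourth semctl() argument.
union SemctlArg { int val; struct semid_ds* buf; unsigned short* array; };

// Create a private semaphore set holding one semaphore with value 1 (unlocked).
// Returns the semaphore id, or -1 on failure.
int make_undo_mutex() {
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id == -1) return -1;
    SemctlArg arg;
    arg.val = 1;
    if (semctl(id, 0, SETVAL, arg) == -1) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

// Apply one semaphore operation, retrying if a signal interrupts it.
static bool sem_change(int id, short delta, short flags) {
    assert(id >= 0);
    sembuf op{};
    op.sem_num = 0;
    op.sem_op = delta;
    op.sem_flg = flags;
    int rc;
    do { rc = semop(id, &op, 1); } while (rc == -1 && errno == EINTR);
    return rc == 0;
}

// SEM_UNDO asks the kernel to revert the adjustment if the process exits.
bool undo_lock(int id)     { return sem_change(id, -1, SEM_UNDO); }
bool undo_try_lock(int id) { return sem_change(id, -1, SEM_UNDO | IPC_NOWAIT); }
bool undo_unlock(int id)   { return sem_change(id, +1, SEM_UNDO); }

// Demonstration: a child takes the lock and exits without unlocking.
// Returns true if the parent can take the lock afterwards.
bool lock_is_released_when_owner_exits() {
    int id = make_undo_mutex();
    if (id == -1) return false;
    pid_t child = fork();
    if (child == 0) {
        undo_lock(id);
        _exit(0);  // no destructors run, no explicit unlock
    }
    bool ok = false;
    if (child > 0) {
        waitpid(child, nullptr, 0);
        ok = undo_try_lock(id);  // succeeds only if the kernel undid the child's lock
    }
    semctl(id, 0, IPC_RMID);
    return ok;
}
```

Calling lock_is_released_when_owner_exits() forks once and reports whether the undo took place. Note that the undo only restores the semaphore count; it says nothing about whether the data the lock protected is still consistent.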

Anyway, I'd like to suggest that the interprocess docs should
make some mention of the behaviour of the synchronisation
primitives on process termination, e.g. somewhere near the
beginning of http://www.boost.org/doc/libs/1_63_0/doc/html/interprocess/synchronization_mechanisms.html#interprocess.synchronization_mechanisms.mutexes

I may now try to implement some primitives that use semop() and
unlock automatically.  I haven't yet looked at what's involved to
implement a condition variable on top of a semaphore, so I may not
get very far!  Has anyone else ever tried this?

Also, I note that Interprocess is using "old style" times, not
std::chrono like the std::mutex/condition do.  Are there any plans
to update this?


Thanks,

Phil.





_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Re: Interprocess mutex & condition variable at process termination

Boost - Dev mailing list
On 02/15/17 20:42, Phil Endecott via Boost wrote:

> Dear Experts,
>
> I've just been surprised by the behaviour of the interprocess
> mutex and condition variable on abnormal process termination, i.e.
> they are not automatically released.
>
> Google tells me that I'm not the first to be surprised by this; there
> have been previous posts here, stack overflow questions etc.
>
> One often-valid observation is that if a process crashes - or
> otherwise terminates without executing its destructors - while it
> holds a lock on a shared data structure then the data is probably
> now corrupt, so unlocking the mutex that protects it is not very
> useful.  I think there is an important case where that does not
> apply - when the process that crashes is only reading the shared
> data.  In my case, I had written a "monitor" utility that loops
> forever, waiting on a shared condition, taking the corresponding
> mutex, and then dumping the shared data to stdout.  I had been
> running this and stopping it by pressing ctrl-C and it had not
> occurred to me that this might not work as I expected.  My
> attempt at debugging using this utility was making my problems worse,
> not better!  Modifying this code to run destructors on ctrl-C is
> non-trivial.
>
> I am aware that the SysV shared semaphore is able to undo on
> process termination (see SEM_UNDO in man semop), and I had assumed
> that Boost.Interprocess was using this or something like it.  I
> now see that it is using pthreads, which I didn't even realise
> could work between processes, and I don't think this API has
> any way to specify process termination behaviour.

There is a way to handle this case, but this API is not universally
supported:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_getrobust.html
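A minimal sketch of how the robust attribute can be used, assuming Linux with glibc and an anonymous shared mapping inherited across fork(). The helper names (make_robust_shared_mutex, robust_lock, owner_death_is_reported) are made up for this illustration and are not part of Boost or Boost.Log.

```cpp
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Place a robust, process-shared mutex in anonymous shared memory.
pthread_mutex_t* make_robust_shared_mutex() {
    void* mem = mmap(nullptr, sizeof(pthread_mutex_t), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return nullptr;
    auto* m = static_cast<pthread_mutex_t*>(mem);
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    int rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) { munmap(mem, sizeof(pthread_mutex_t)); return nullptr; }
    return m;
}

enum class LockResult { acquired, acquired_after_owner_died, failed };

// Lock the mutex and report whether the previous owner died while holding it.
LockResult robust_lock(pthread_mutex_t* m) {
    assert(m != nullptr);
    int rc = pthread_mutex_lock(m);
    if (rc == 0) return LockResult::acquired;
    if (rc == EOWNERDEAD) {
        // We now own the mutex, but the protected data may be half-updated.
        // This sketch simply declares the state usable again; real code
        // would validate or repair the data first.
        pthread_mutex_consistent(m);
        return LockResult::acquired_after_owner_died;
    }
    return LockResult::failed;
}

// Demonstration: a child locks the mutex and exits without unlocking.
bool owner_death_is_reported() {
    pthread_mutex_t* m = make_robust_shared_mutex();
    if (!m) return false;
    pid_t child = fork();
    if (child == 0) {
        pthread_mutex_lock(m);
        _exit(0);  // dies while holding the lock
    }
    LockResult r = LockResult::failed;
    if (child > 0) {
        waitpid(child, nullptr, 0);
        r = robust_lock(m);
        if (r != LockResult::failed) pthread_mutex_unlock(m);
    }
    pthread_mutex_destroy(m);
    munmap(m, sizeof(pthread_mutex_t));
    return r == LockResult::acquired_after_owner_died;
}
```

The three-way result is one way to work around the mismatch with the usual C++ lock()/unlock() interface, which has no channel for "acquired, but the data may be damaged".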

If that API is not supported on your platform, you may want to avoid
locking the mutex without a timeout (i.e. failing to acquire a mutex for
a given time should be considered an indication that the mutex has been
abandoned in the locked state).
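The timeout workaround might look roughly like the following sketch, which uses pthread_mutex_timedlock() on an ordinary (non-robust) process-shared mutex. It assumes a platform that provides that function, such as Linux. The names deadline_after, lock_or_give_up and abandoned_lock_times_out are hypothetical, and the 100 ms value in the test below is arbitrary.

```cpp
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <chrono>
#include <ctime>

// pthread_mutex_timedlock() expects an absolute CLOCK_REALTIME deadline.
timespec deadline_after(std::chrono::milliseconds timeout) {
    timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    long long nsec = ts.tv_nsec +
        std::chrono::duration_cast<std::chrono::nanoseconds>(timeout).count();
    ts.tv_sec += static_cast<time_t>(nsec / 1000000000LL);
    ts.tv_nsec = static_cast<long>(nsec % 1000000000LL);
    return ts;
}

enum class TimedLock { acquired, presumed_abandoned, error };

// Treat a timeout as a sign that the owner may have died holding the lock.
TimedLock lock_or_give_up(pthread_mutex_t* m, std::chrono::milliseconds timeout) {
    assert(m != nullptr);
    timespec ts = deadline_after(timeout);
    int rc = pthread_mutex_timedlock(m, &ts);
    if (rc == 0) return TimedLock::acquired;
    if (rc == ETIMEDOUT) return TimedLock::presumed_abandoned;
    return TimedLock::error;
}

// Demonstration: a child locks a non-robust shared mutex and exits.
// Returns true if the parent's timed lock gives up instead of hanging.
bool abandoned_lock_times_out(std::chrono::milliseconds timeout) {
    void* mem = mmap(nullptr, sizeof(pthread_mutex_t), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return false;
    auto* m = static_cast<pthread_mutex_t*>(mem);
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);

    pid_t child = fork();
    if (child == 0) {
        pthread_mutex_lock(m);
        _exit(0);  // the lock is never released
    }
    TimedLock r = TimedLock::error;
    if (child > 0) {
        waitpid(child, nullptr, 0);
        r = lock_or_give_up(m, timeout);
    }
    munmap(mem, sizeof(pthread_mutex_t));  // the mutex is left locked on purpose
    return r == TimedLock::presumed_abandoned;
}
```

A timeout cannot distinguish a dead owner from a slow one, so the caller still has to decide what "presumed abandoned" means for its data.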

In general, synchronization primitives that reside in shared memory
(such as pthread mutexes or Boost.Interprocess mutexes) should be
considered vulnerable to (a) corruption and (b) becoming unusable (like,
indefinitely locked) because of a user process misbehavior. That is
rather obvious considering that such primitives typically do not include
any other resources, such as handles to kernel objects or file
descriptors and as such "don't exist" for the kernel (consequently, the
kernel cannot release them on process termination). Robust mutexes that
I referenced above are an exception to that general rule.

Named primitives, such as SysV semaphores, are typically more protected
because there is at least a file descriptor or something that
corresponds to the name and there is usually a limited API to interact
with the primitive (i.e. you usually don't have a direct access to the
primitive data).

There are a number of named synchronization primitives in
Boost.Interprocess, although I don't think they provide "auto unlock on
process termination" feature.

> Anyway, I'd like to suggest that the interprocess docs should
> make some mention of the behaviour of the synchronisation
> primitives on process termination, e.g. somewhere near the
> beginning of
> http://www.boost.org/doc/libs/1_63_0/doc/html/interprocess/synchronization_mechanisms.html#interprocess.synchronization_mechanisms.mutexes
>
> I may now try to implement some primitives that use semop() and
> unlock automatically.  I haven't yet looked at what's involved to
> implement a condition variable on top of a semaphore, so I may not
> get very far!  Has anyone else ever tried this?

If you want (more or less) reliable interprocess synchronization, you
will currently have to implement it yourself. There are a number of
compromises to make along the way. For instance, the pthread robust mutex
API does not quite fit into the traditional C++ mutex API, so one has to
improvise. In the absence of robust mutexes, the timeout workaround is
not universally applicable, and the timeout itself is, obviously,
case-specific. Also, most of these APIs are not fully portable (not
between Windows and POSIX-compatible systems, anyway), so you end up
with OS-specific branches.

I did implement this in a few of my projects. One example is Boost.Log,
where I opportunistically use robust mutexes:

https://github.com/boostorg/log/blob/develop/src/posix/ipc_sync_wrappers.hpp
https://github.com/boostorg/log/blob/develop/src/posix/ipc_reliable_message_queue.cpp

You can see that the Windows implementation is quite different:

https://github.com/boostorg/log/blob/develop/src/windows/ipc_sync_wrappers.hpp
https://github.com/boostorg/log/blob/develop/src/windows/ipc_sync_wrappers.cpp
https://github.com/boostorg/log/blob/develop/src/windows/ipc_reliable_message_queue.cpp

The best solution to these problems, however, is to avoid locks
altogether and use lock-free algorithms in such a way that any data in
the shared memory is valid and can be handled.



Re: Interprocess mutex & condition variable at process termination

Boost - Dev mailing list
Andrey Semashev wrote:
> On 02/15/17 20:42, Phil Endecott via Boost wrote:
>> I've just been surprised by the behaviour of the interprocess
>> mutex and condition variable on abnormal process termination, i.e.
>> they are not automatically released.

> There is a way to handle this case, but this API is not universally
> supported:
>
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_getrobust.html

Thanks for pointing that out.  For some reason I thought that
"robust" mutexes solved some other problem.

I think that in my case where I have some processes that only
read the shared data, it would be possible to handle EOWNERDEAD
by either continuing if the previous lock were a read-lock, or
by throwing if it were a write lock.  I don't think Interprocess
does any of this, does it?
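One way the read-lock/write-lock idea could be sketched on top of a robust mutex is to keep a "write in progress" flag next to the data. Everything here (SharedState, RobustGuard, the flag, the demo function) is hypothetical and assumes Linux with glibc; it is not something Boost.Interprocess provides.

```cpp
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <atomic>
#include <cassert>
#include <cerrno>
#include <new>
#include <stdexcept>

struct SharedState {
    pthread_mutex_t mutex;                // robust and process-shared
    std::atomic<bool> write_in_progress;  // true while a writer holds the mutex
    int data;
};

SharedState* make_shared_state() {
    void* mem = mmap(nullptr, sizeof(SharedState), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return nullptr;
    SharedState* s = new (mem) SharedState();
    s->write_in_progress.store(false);
    s->data = 0;
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(&s->mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return s;
}

class RobustGuard {
public:
    RobustGuard(SharedState& s, bool writer) : s_(s), writer_(writer) {
        int rc = pthread_mutex_lock(&s_.mutex);
        if (rc == EOWNERDEAD) {
            if (s_.write_in_progress.load()) {
                // A writer died: unlocking without marking the mutex
                // consistent makes later lock attempts fail as well.
                pthread_mutex_unlock(&s_.mutex);
                throw std::runtime_error("writer died while holding the lock");
            }
            // Only a reader died, so the data was not being modified.
            int ok = pthread_mutex_consistent(&s_.mutex);
            assert(ok == 0);
            (void)ok;
        } else if (rc != 0) {
            throw std::runtime_error("pthread_mutex_lock failed");
        }
        if (writer_) s_.write_in_progress.store(true);
    }
    ~RobustGuard() {
        if (writer_) s_.write_in_progress.store(false);
        pthread_mutex_unlock(&s_.mutex);
    }
    RobustGuard(const RobustGuard&) = delete;
    RobustGuard& operator=(const RobustGuard&) = delete;
private:
    SharedState& s_;
    bool writer_;
};

// Demonstration: a child dies holding the lock as a reader or as a writer.
// Returns true if the parent could still lock the state afterwards.
bool parent_can_continue_after(bool child_was_writer) {
    SharedState* s = make_shared_state();
    if (!s) return false;
    pid_t child = fork();
    if (child == 0) {
        RobustGuard g(*s, child_was_writer);
        _exit(0);  // skips the guard's destructor
    }
    bool ok = false;
    if (child > 0) {
        waitpid(child, nullptr, 0);
        try { RobustGuard g(*s, false); ok = true; }
        catch (const std::runtime_error&) { ok = false; }
    }
    munmap(s, sizeof(SharedState));
    return ok;
}
```

The flag is an atomic so that it is stored before any later modification of the data; whether that is sufficient for a particular data structure is something each application would have to judge.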

> The best solution to these problems, however, is to avoid locks
> altogether and use lock-free algorithms in such a way that any data in
> the shared memory is valid and can be handled.

Maybe, though my next concern would be how to implement the functionality
of a condition variable.  What happens if a process crashes while it
is waiting on a condition variable?  I did once know how Linux
implements condition variables using atomics and futexes, and I
think it's probably safe to crash in this situation, but I guess
there are no guarantees.


Thanks, Phil.






Re: Interprocess mutex & condition variable at process termination

Boost - Dev mailing list
On 16/02/2017 12:31, Phil Endecott via Boost wrote:

> Andrey Semashev wrote:
>> On 02/15/17 20:42, Phil Endecott via Boost wrote:
>>> I've just been surprised by the behaviour of the interprocess
>>> mutex and condition variable on abnormal process termination, i.e.
>>> they are not automatically released.
>
>> There is a way to handle this case, but this API is not universally
>> supported:
>>
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_getrobust.html
>
> I think that in my case where I have some processes that only
> read the shared data, it would be possible to handle EOWNERDEAD
> by either continuing if the previous lock were a read-lock, or
> by throwing if it were a write lock.  I don't think Interprocess
> does any of this, does it?
>
>> The best solution to these problems, however, is to avoid locks
>> altogether and use lock-free algorithms in such a way that any data in
>> the shared memory is valid and can be handled.
>
> Maybe, though my next concern would be how to implement the functionality
> of a condition variable.  What happens if a process crashes while it
> is waiting on a condition variable?  I did once know how Linux
> implements condition variables using atomics and futexes, and I
> think it's probably safe to crash in this situation, but I guess
> there are no guarantees.

The only way that I know of to build a portable interprocess
mutex which knows when one of the processes has died is using a pipe
instance. You write a byte to "unlock" the mutex and read all bytes
until it's empty to "lock" the mutex. select() can be used to block
until the mutex is unlocked.
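A rough sketch of the token-in-a-pipe idea, as one possible reading of the description above. It uses poll() where the post mentions select(), reads a single token byte rather than draining the pipe, and assumes the pipe is shared by inheritance across fork(). The post does not spell out how a dead owner is detected, and this sketch does not attempt it. PipeMutex is a made-up name.

```cpp
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <mutex>
#include <stdexcept>

class PipeMutex {
public:
    PipeMutex() {
        if (pipe(fds_) != 0) throw std::runtime_error("pipe() failed");
        // Non-blocking read end: losing a race for the token must not block in read().
        int flags = fcntl(fds_[0], F_GETFL);
        fcntl(fds_[0], F_SETFL, flags | O_NONBLOCK);
        unlock();  // start unlocked: exactly one token byte sits in the pipe
    }
    ~PipeMutex() { close(fds_[0]); close(fds_[1]); }
    PipeMutex(const PipeMutex&) = delete;
    PipeMutex& operator=(const PipeMutex&) = delete;

    // Taking the token out of the pipe acquires the mutex.
    bool try_lock() {
        char token;
        ssize_t n;
        do { n = read(fds_[0], &token, 1); } while (n == -1 && errno == EINTR);
        return n == 1;
    }

    // Sleep until the pipe becomes readable, then race for the token.
    void lock() {
        while (!try_lock()) {
            pollfd p{};
            p.fd = fds_[0];
            p.events = POLLIN;
            poll(&p, 1, -1);
        }
    }

    // Putting the token back releases the mutex.
    void unlock() {
        const char token = 'u';
        ssize_t n;
        do { n = write(fds_[1], &token, 1); } while (n == -1 && errno == EINTR);
        assert(n == 1);
        (void)n;
    }

private:
    int fds_[2];
};
```

Because the class provides lock(), try_lock() and unlock(), it can be used with std::lock_guard<PipeMutex>, for example `std::lock_guard<PipeMutex> guard(m);`.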

I've built a fair few of these over the years and performance is
actually pretty good considering what it is. I'm kinda surprised that
Boost.Interprocess doesn't have one yet.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/



Re: Interprocess mutex & condition variable at process termination

Boost - Dev mailing list
On 02/16/17 15:31, Phil Endecott via Boost wrote:

> Andrey Semashev wrote:
>> On 02/15/17 20:42, Phil Endecott via Boost wrote:
>>> I've just been surprised by the behaviour of the interprocess
>>> mutex and condition variable on abnormal process termination, i.e.
>>> they are not automatically released.
>
>> There is a way to handle this case, but this API is not universally
>> supported:
>>
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_getrobust.html
>>
>
> Thanks for pointing that out.  For some reason I thought that
> "robust" mutexes solved some other problem.
>
> I think that in my case where I have some processes that only
> read the shared data, it would be possible to handle EOWNERDEAD
> by either continuing if the previous lock were a read-lock, or
> by throwing if it were a write lock.  I don't think Interprocess
> does any of this, does it?

No, to my knowledge, Boost.Interprocess doesn't use robust mutexes.

You have to know though that condition variables also include a mutex,
and not a robust one. I don't know of any way, portable or not, to make
a CV use a robust mutex internally.

>> The best solution to these problems, however, is to avoid locks
>> altogether and use lock-free algorithms in such a way that any data in
>> the shared memory is valid and can be handled.
>
> Maybe, though my next concern would be how to implement the functionality
> of a condition variable.  What happens if a process crashes while it
> is waiting on a condition variable?

It depends on whether the process was actually blocked on the futex used
by CV to wait for notifications. If so, *I think* the failure might be
recoverable - some other thread will notify more threads than there are
actually waiting, and that's harmless. If not, then the internal mutex
in the CV was abandoned in the locked state and the CV is unusable.

Basically, when you want robust mutexes, CVs are not an option. You
might want to consider process-shared semaphores as a replacement.

http://pubs.opengroup.org/onlinepubs/7908799/xsh/sem_init.html
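As an illustration of the semaphore suggestion, here is a small sketch in which one process publishes a value and another waits for it using a process-shared sem_t. It assumes a platform with unnamed process-shared semaphores (Linux, for instance). SharedChannel, publish and wait_for_update are names invented for this example.

```cpp
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

struct SharedChannel {
    sem_t updated;  // counts notifications that have not been consumed yet
    int value;
};

SharedChannel* make_channel() {
    void* mem = mmap(nullptr, sizeof(SharedChannel), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return nullptr;
    auto* ch = static_cast<SharedChannel*>(mem);
    ch->value = 0;
    // Second argument 1 = shared between processes; initial count 0.
    if (sem_init(&ch->updated, 1, 0) != 0) {
        munmap(mem, sizeof(SharedChannel));
        return nullptr;
    }
    return ch;
}

// Writer side: store the value, then signal one waiter.
void publish(SharedChannel* ch, int v) {
    assert(ch != nullptr);
    ch->value = v;
    sem_post(&ch->updated);
}

// Reader side: block until a notification arrives, then read the value.
int wait_for_update(SharedChannel* ch) {
    assert(ch != nullptr);
    while (sem_wait(&ch->updated) == -1 && errno == EINTR) {}
    return ch->value;
}

// Demonstration: a child publishes 42 and the parent waits for it.
int value_published_by_child() {
    SharedChannel* ch = make_channel();
    if (!ch) return -1;
    pid_t child = fork();
    if (child == 0) {
        publish(ch, 42);
        _exit(0);
    }
    int got = -1;
    if (child > 0) {
        got = wait_for_update(ch);
        waitpid(child, nullptr, 0);
    }
    sem_destroy(&ch->updated);
    munmap(ch, sizeof(SharedChannel));
    return got;
}
```

Unlike a condition variable, a semaphore remembers a post that arrives before the wait, and a waiter that is killed while blocked holds no internal lock that could be left behind.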

> I did once know how Linux
> implements condition variables using atomics and futexes, and I
> think it's probably safe to crash in this situation, but I guess
> there are no guarantees.

Yes, futexes are the way to go, if you target specifically Linux. The
important advantage is that you're in control of the mutex/CV
implementation and can define the behavior if the synchronization
primitive was abandoned. The tricky part is to detect when it is abandoned.
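For a sense of what building directly on futexes involves, here is a Linux-only sketch of a notification counter. It is an assumption-laden illustration, not how glibc or Boost implement condition variables, and it does not address the hard part mentioned above (detecting abandonment). The wrappers futex_wait and futex_wake and the SharedEvent type are hypothetical names.

```cpp
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <atomic>
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdint>

static_assert(sizeof(std::atomic<std::uint32_t>) == sizeof(std::uint32_t),
              "the futex word must be a plain 32-bit integer");

// Sleep while *word == expected. Returns 0 when woken, or -1 with errno set
// (EAGAIN if the value had already changed before the kernel could sleep).
long futex_wait(std::atomic<std::uint32_t>* word, std::uint32_t expected) {
    assert(word != nullptr);
    return syscall(SYS_futex, reinterpret_cast<std::uint32_t*>(word),
                   FUTEX_WAIT, expected, nullptr);
}

// Wake up to `count` waiters; returns how many were actually woken.
long futex_wake(std::atomic<std::uint32_t>* word, int count) {
    assert(word != nullptr);
    return syscall(SYS_futex, reinterpret_cast<std::uint32_t*>(word),
                   FUTEX_WAKE, count);
}

// A notification counter intended to live in shared memory.
struct SharedEvent {
    std::atomic<std::uint32_t> generation{0};
};

// Publish a change and wake every process blocked in wait_for_change().
void notify_all(SharedEvent& e) {
    e.generation.fetch_add(1);
    futex_wake(&e.generation, INT_MAX);
}

// Block until the generation differs from the one the caller last saw.
// The waiter holds no lock while it sleeps, so killing it leaves nothing locked.
void wait_for_change(SharedEvent& e, std::uint32_t seen) {
    while (e.generation.load() == seen) {
        futex_wait(&e.generation, seen);
    }
}
```

FUTEX_WAIT and FUTEX_WAKE are used without the private flag so that the word can sit in memory mapped by several processes.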



Re: Interprocess mutex & condition variable at process termination

Boost - Dev mailing list
On 15/02/2017 18:42, Phil Endecott via Boost wrote:

> Dear Experts,
>
> I've just been surprised by the behaviour of the interprocess
> mutex and condition variable on abnormal process termination, i.e.
> they are not automatically released.
>
> Google tells me that I'm not the first to be surprised by this; there
> have been previous posts here, stack overflow questions etc.
>
> One often-valid observation is that if a process crashes - or
> otherwise terminates without executing its destructors - while it
> holds a lock on a shared data structure then the data is probably
> now corrupt, so unlocking the mutex that protects it is not very
> useful.  I think there is an important case where that does not
> apply - when the process that crashes is only reading the shared
> data.  In my case, I had written a "monitor" utility that loops
> forever, waiting on a shared condition, taking the corresponding
> mutex, and then dumping the shared data to stdout.  I had been
> running this and stopping it by pressing ctrl-C and it had not
> occurred to me that this might not work as I expected.  My
> attempt at debugging using this utility was making my problems worse,
> not better!  Modifying this code to run destructors on ctrl-C is
> non-trivial.

There is a very poor but effective workaround if your application can
support long delays. Search for
BOOST_INTERPROCESS_ENABLE_TIMEOUT_WHEN_LOCKING and
BOOST_INTERPROCESS_TIMEOUT_WHEN_LOCKING_DURATION_MS. These macros are not
documented, but they should be.

> I am aware that the SysV shared semaphore is able to undo on
> process termination (see SEM_UNDO in man semop), and I had assumed
> that Boost.Interprocess was using this or something like it.  I
> now see that it is using pthreads, which I didn't even realise
> could work between processes, and I don't think this API has
> any way to specify process termination behaviour.

Yes, but SysV shared semaphores can't be placed in shared memory.

> Anyway, I'd like to suggest that the interprocess docs should
> make some mention of the behaviour of the synchronisation
> primitives on process termination, e.g. somewhere near the
> beginning of
> http://www.boost.org/doc/libs/1_63_0/doc/html/interprocess/synchronization_mechanisms.html#interprocess.synchronization_mechanisms.mutexes

Good suggestion.

> I may now try to implement some primitives that use semop() and
> unlock automatically.  I haven't yet looked at what's involved to
> implement a condition variable on top of a semaphore, so I may not
> get very far!  Has anyone else ever tried this?

There are several algorithms, but the problem is placing them in shared
memory. See an adapter in:

boost/interprocess/sync/detail/condition_algorithm_8a.hpp

> Also, I note that Interprocess is using "old style" times, not
> std::chrono like the std::mutex/condition do.  Are there any plans
> to update this?

Yes, but I really can't get time to implement it. The idea is to
support both std::chrono and boost::chrono. Patches welcome ;-)

Best,

Ion


Re: Interprocess mutex & condition variable at process termination

Boost - Dev mailing list
On Feb 16, 2017, at 7:31 AM, Phil Endecott via Boost wrote:

> Maybe, though my next concern would be how to implement the functionality
> of a condition variable.  What happens if a process crashes while it
> is waiting on a condition variable?  I did once know how Linux
> implements condition variables using atomics and futexes, and I
> think it's probably safe to crash in this situation, but I guess
> there are no guarantees.

I recently developed an application which uses a process-shared condition variable to coordinate graphics updates between an "author" and one or more "viewer" processes.  I found that on Linux, not only was killing a viewer while it was waiting on the condition variable harmless, but killing and relaunching the author (which reinitialized the mutex and CV) had no adverse effect -- the application continued to work.  It's undefined behavior, of course, so I was pleasantly surprised.

By contrast, the OS X version I tested on doesn't even appear to be POSIX-conforming.  Not only does relaunching the viewer after killing it mid-wait cause failures in the viewer *and* the tester (as one could at least anticipate, if not hope for), but a second viewer launched while the first was still waiting failed in pthread_cond_wait() (returning EINVAL), thus effectively limiting Apple's implementation of process-shared condition variables to two processes.

So yeah, no guarantees.

Josh

