[Boost.Serialization] Help with PR fixing a memory leak in the current version

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
34 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[Boost.Serialization] Help with PR fixing a memory leak in the current version

Boost - Dev mailing list
Hi,

TLDR: I would like https://github.com/boostorg/serialization/pull/105 to
be merged for the next Boost release to fix a memory leak and need
someone to review this or help getting it merge-ready.

Details:
Some time ago I made an analysis of a crash introduced in Boost 1.65.0
in the serialization part. The details are quite complicated as the
cause is static initialization/destruction order in shared libraries
which may even depend on compilers (could not confirm) or standard
libraries (dito, observed in all tested versions).
It boils down to a violation of the expected destruction order in the
case of shared libraries. I observed how 2 global instances of the same
type but from different shared libs are destroyed together although the
2nd should not yet be which causes a use-after-free from another global
instance.
The Singletons involved did have methods to detect, whether they are
active or destroyed but they were kinda optimized away in Boost 1.65
leading to a crash which makes the whole library unusable. Not
understanding the root cause of the crash lead to changing the singleton
to use dynamic allocation, but not freeing it which leads to the memory
leak that is currently in Boost.
The current state of the develop-branch changed this back to the crash
situation we had before making it unsuitable for the next release.

Another pair of eyes would be great to check the PR and get this finally
fixed.

Thanks, Alex

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Help with PR fixing a memory leak in the current version

Boost - Dev mailing list
On 6/25/18 12:35 AM, Alexander Grund via Boost wrote:

> Hi,
>
> TLDR: I would like https://github.com/boostorg/serialization/pull/105 to
> be merged for the next Boost release to fix a memory leak and need
> someone to review this or help getting it merge-ready.
>
> Details:
> Some time ago I made an analysis of a crash introduced in Boost 1.65.0
> in the serialization part. The details are quite complicated as the
> cause is static initialization/destruction order in shared libraries
> which may even depend on compilers (could not confirm) or standard
> libraries (dito, observed in all tested versions).
> It boils down to a violation of the expected destruction order in the
> case of shared libraries. I observed how 2 global instances of the same
> type but from different shared libs are destroyed together although the
> 2nd should not yet be which causes a use-after-free from another global
> instance.
> The Singletons involved did have methods to detect, whether they are
> active or destroyed but they were kinda optimized away in Boost 1.65
> leading to a crash which makes the whole library unusable. Not
> understanding the root cause of the crash lead to changing the singleton
> to use dynamic allocation, but not freeing it which leads to the memory
> leak that is currently in Boost.
> The current state of the develop-branch changed this back to the crash
> situation we had before making it unsuitable for the next release.
>
> Another pair of eyes would be great to check the PR and get this finally
> fixed.
>
> Thanks, Alex

I much appreciate your willingness to invest effort in this issue.  But
my analysis and conclusions regarding the issue are totally different.
Motivated your efforts I added new tests for the singleton and spent
time enhancing tests for dll.  One test was just to complicated for me
to make work with b2 so I suppress it.  Only one test is failing on some
platforms - test_export.  This is related to DLLs.  I added windows +
mingw and msvc++ to my machine just to try to replicate the failure in
my own environment.  I have been unable to do so.

I believe that this is due to some issue in libc++ or libstdc++ related
to some combination of circumstances related to dynamic loading and
possible visibility. Both of these areas are undefined by the standard
so it's natural that there might be ambiguities here.

Late last year I tried to address this dll/export problem by merging in
a fix.  This fix entailed dynamically allocated a data structure in the
singleton. This fixed this failing test, but unbeknownst to me, resulted
in the memory leak.  At this point there are two paths: plow forward and
make the code evermore complex to navigate around the circumstances
which make the test fail or step back and backout the fix which resulted
in the memory leak.  I've chosen the latter.

I realize you've invested a lot of effort and it's disheartening not to
have it appreciated.  Don't think this way.  It IS appreciated.  The
serialization library is still going strong only because persons such as
yourself have made these sorts of efforts.  I incorporate a significant
portion of PRs submitted.

The way forward is to add some more tests which isolate the problem. I'm
aware that you've submitted one test which fails.  It seems like its on
the right track.  But (IIRC), it dynamically loads/unloads DLLS
containing user level serialization code in the same sequence.  I think
its reasonable to insist that if a user is going to do that that such
DLLS be unloaded in reverse order.  I admit that setting up these sorts
of tests is difficult and time consuming in any build environment.  But
that's where I am.

Robert Ramey



>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Help with PR fixing a memory leak in the current version

Boost - Dev mailing list
Hi,
> I much appreciate your willingness to invest effort in this issue. 
> But my analysis and conclusions regarding the issue are totally
> different. Motivated your efforts I added new tests for the singleton
> and spent time enhancing tests for dll.  One test was just to
> complicated for me to make work with b2 so I suppress it. Only one
> test is failing on some platforms - test_export.  This is related to
> DLLs.  I added windows + mingw and msvc++ to my machine just to try to
> replicate the failure in my own environment.  I have been unable to do
> so.
I do  have (Makefile) test, which does show the behaviour described. I'm
pretty sure, if executed on another machine, it will fail too. I tried
to add it to the b2-Testsuite but failed due to having static boost in
dynamic libraries (which is the way I found where it crashes for sure).
And there is 1 testcase currently in the Boost.Serialization testsuite
that DOES fail and I'm pretty sure the issue described is the reason but
of course I cannot really confirm this as the root cause is undefined
behaviour (or ambiquities with shared libs)
> Late last year I tried to address this dll/export problem by merging
> in a fix.  This fix entailed dynamically allocated a data structure in
> the singleton. This fixed this failing test, but unbeknownst to me,
> resulted in the memory leak.
This is what I tried to describe too in the issue and MR: The leak is
simply due to no one freeing the dynamically allocated memory. This is
even deterministic and easily reproducible. I think that oversight might
be caused by you being so used to the code, that you overlook things
that might be visible to fresh eyes. This is why I would like to also
have some one else comment on the PR too.
> The way forward is to add some more tests which isolate the problem.
> I'm aware that you've submitted one test which fails.  It seems like
> its on the right track.  But (IIRC), it dynamically loads/unloads DLLS
> containing user level serialization code in the same sequence.  I
> think its reasonable to insist that if a user is going to do that that
> such DLLS be unloaded in reverse order.  I admit that setting up these
> sorts of tests is difficult and time consuming in any build
> environment.  But that's where I am.
I can provide a test for b2 (the same as I have the Makefile for) if the
boost test suite is build with `-fPIC`. This does not involve unloading
DLLs or so but simply links an executable against 2 shared libraries
which are linked against static boost. This crashes reproducibly. If
anyone can tell me how to do the fPIC stuff for Boost.Build I'm happy to
add this test to show that it fails with a crash in current code and
that my approach fixes this.

In any case, I do have the feeling that the reason for the changes done
in my PR (and the changes itself) are not fully understood. This is why
I want to encourage question to the changes itself as I think those were
not (only) meant to fix an issue in some circumstances but actually
correct the code with test cases to prove this where possible (e.g. the
is_destroyed flag). Other testcases just try to run into the problem I
described.

Best, Alex

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Alexander Grund
Interdisziplinäre Anwendungsunterstützung und Koordination (IAK)

Technische Universität Dresden
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
Chemnitzer Str. 46b, Raum 010 01062 Dresden
Tel.: +49 (351) 463-35982
E-Mail: [hidden email]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Help with PR fixing a memory leak in the current version

Boost - Dev mailing list
On 6/26/2018 3:39 AM, Alexander Grund via Boost wrote:

> Hi,
>> I much appreciate your willingness to invest effort in this issue. But
>> my analysis and conclusions regarding the issue are totally different.
>> Motivated your efforts I added new tests for the singleton and spent
>> time enhancing tests for dll.  One test was just to complicated for me
>> to make work with b2 so I suppress it. Only one test is failing on
>> some platforms - test_export.  This is related to DLLs.  I added
>> windows + mingw and msvc++ to my machine just to try to replicate the
>> failure in my own environment.  I have been unable to do so.
> I do  have (Makefile) test, which does show the behaviour described. I'm
> pretty sure, if executed on another machine, it will fail too. I tried
> to add it to the b2-Testsuite but failed due to having static boost in
> dynamic libraries (which is the way I found where it crashes for sure).
> And there is 1 testcase currently in the Boost.Serialization testsuite
> that DOES fail and I'm pretty sure the issue described is the reason but
> of course I cannot really confirm this as the root cause is undefined
> behaviour (or ambiquities with shared libs)
>> Late last year I tried to address this dll/export problem by merging
>> in a fix.  This fix entailed dynamically allocated a data structure in
>> the singleton. This fixed this failing test, but unbeknownst to me,
>> resulted in the memory leak.
> This is what I tried to describe too in the issue and MR: The leak is
> simply due to no one freeing the dynamically allocated memory. This is
> even deterministic and easily reproducible. I think that oversight might
> be caused by you being so used to the code, that you overlook things
> that might be visible to fresh eyes. This is why I would like to also
> have some one else comment on the PR too.
>> The way forward is to add some more tests which isolate the problem.
>> I'm aware that you've submitted one test which fails.  It seems like
>> its on the right track.  But (IIRC), it dynamically loads/unloads DLLS
>> containing user level serialization code in the same sequence.  I
>> think its reasonable to insist that if a user is going to do that that
>> such DLLS be unloaded in reverse order.  I admit that setting up these
>> sorts of tests is difficult and time consuming in any build
>> environment.  But that's where I am.
> I can provide a test for b2 (the same as I have the Makefile for) if the
> boost test suite is build with `-fPIC`. This does not involve unloading
> DLLs or so but simply links an executable against 2 shared libraries
> which are linked against static boost. This crashes reproducibly. If
> anyone can tell me how to do the fPIC stuff for Boost.Build I'm happy to
> add this test to show that it fails with a crash in current code and
> that my approach fixes this.
>
> In any case, I do have the feeling that the reason for the changes done
> in my PR (and the changes itself) are not fully understood. This is why
> I want to encourage question to the changes itself as I think those were
> not (only) meant to fix an issue in some circumstances but actually
> correct the code with test cases to prove this where possible (e.g. the
> is_destroyed flag). Other testcases just try to run into the problem I
> described.

I confirm that the 1.66 version never ever invokes the singleton
destructor declared and defined in boost/serialization/singleton.hpp,
approximately line 175.  It does not matter whether it is a debug or
release build, 32-bit, or 64-bit Microsoft or Intel C++ on Windows.  A
DLL wrapper around Boost.Serialization is used at run time.  The actual
Boost.Serialization library is statically linked to the DLL.  The
roughly 20 line main function, test executable then uses the DLL to
invoke the serialization.

I verify the behavior using an old school "printf" approach, with
std::cout messages and static ptrdiff_t counting variables inside the
get_instance() and ~singleton() methods.  I observe eleven independent
instances of the singleton class, with ZERO calls to the singleton
destructor.  The eleven singletons are created with ONE instance of the
serialization object, regardless of invoking the serialize method.

The main function:

int main()
{
        {
                HINSTANCE hGetProcIDDLL = LoadLibrary(L"BoostSerializeDLL.dll");
        }
        /**/
        std::cout << "Beginning unit test...";
        std::shared_ptr<thislibrary::myclass> myc = thislibrary::create_myclass();
        {
                std::shared_ptr<std::ofstream> ostream =
std::make_shared<std::ofstream>();
                ostream->open("C:\\temp\\leak_test.xml");
                {
                        boost::archive::xml_oarchive xml(*ostream);
                        myc->serialize(xml, 1); // comment this line out and the leak remains.
                }
                ostream->close();
        }
        /* */
        return 0;
}

Fundamentally, there is something between 300 to 400 bytes memory leak.
Yes, it is not much with today's world of GB RAM.  However, when you
have production code that leaks, then ANY memory leak is unacceptable.
Essentially, the leak is due to the lack of the eleven singleton
instances NOT invoking the respective destructor.  Plus, the group I am
part of spends tens of hours tracking down why the production product
has memory leaks.  The final convincing point that must be clearly
understood is this: due to the leak, Boost.Serialization has been
replaced.  If that does not convince a reader that there is a real
problem, then nothing will.

Sincerely,

Robert


>
> Best, Alex
>


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Help with PR fixing a memory leak in the current version

Boost - Dev mailing list
On 6/27/18 2:19 PM, Robert via Boost wrote:

> On 6/26/2018 3:39 AM, Alexander Grund via Boost wrote:
>> Hi,
>>> I much appreciate your willingness to invest effort in this issue.
>>> But my analysis and conclusions regarding the issue are totally
>>> different. Motivated your efforts I added new tests for the singleton
>>> and spent time enhancing tests for dll.  One test was just to
>>> complicated for me to make work with b2 so I suppress it. Only one
>>> test is failing on some platforms - test_export.  This is related to
>>> DLLs.  I added windows + mingw and msvc++ to my machine just to try
>>> to replicate the failure in my own environment.  I have been unable
>>> to do so.
>> I do  have (Makefile) test, which does show the behaviour described.
>> I'm pretty sure, if executed on another machine, it will fail too. I
>> tried to add it to the b2-Testsuite but failed due to having static
>> boost in dynamic libraries (which is the way I found where it crashes
>> for sure). And there is 1 testcase currently in the
>> Boost.Serialization testsuite that DOES fail and I'm pretty sure the
>> issue described is the reason but of course I cannot really confirm
>> this as the root cause is undefined behaviour (or ambiquities with
>> shared libs)
>>> Late last year I tried to address this dll/export problem by merging
>>> in a fix.  This fix entailed dynamically allocated a data structure
>>> in the singleton. This fixed this failing test, but unbeknownst to
>>> me, resulted in the memory leak.
>> This is what I tried to describe too in the issue and MR: The leak is
>> simply due to no one freeing the dynamically allocated memory. This is
>> even deterministic and easily reproducible. I think that oversight
>> might be caused by you being so used to the code, that you overlook
>> things that might be visible to fresh eyes. This is why I would like
>> to also have some one else comment on the PR too.
>>> The way forward is to add some more tests which isolate the problem.
>>> I'm aware that you've submitted one test which fails.  It seems like
>>> its on the right track.  But (IIRC), it dynamically loads/unloads
>>> DLLS containing user level serialization code in the same sequence.  
>>> I think its reasonable to insist that if a user is going to do that
>>> that such DLLS be unloaded in reverse order.  I admit that setting up
>>> these sorts of tests is difficult and time consuming in any build
>>> environment.  But that's where I am.
>> I can provide a test for b2 (the same as I have the Makefile for) if
>> the boost test suite is build with `-fPIC`. This does not involve
>> unloading DLLs or so but simply links an executable against 2 shared
>> libraries which are linked against static boost. This crashes
>> reproducibly. If anyone can tell me how to do the fPIC stuff for
>> Boost.Build I'm happy to add this test to show that it fails with a
>> crash in current code and that my approach fixes this.
>>
>> In any case, I do have the feeling that the reason for the changes
>> done in my PR (and the changes itself) are not fully understood. This
>> is why I want to encourage question to the changes itself as I think
>> those were not (only) meant to fix an issue in some circumstances but
>> actually correct the code with test cases to prove this where possible
>> (e.g. the is_destroyed flag). Other testcases just try to run into the
>> problem I described.
>
> I confirm that the 1.66 version never ever invokes the singleton
> destructor declared and defined in boost/serialization/singleton.hpp,
> approximately line 175.  It does not matter whether it is a debug or
> release build, 32-bit, or 64-bit Microsoft or Intel C++ on Windows.  A
> DLL wrapper around Boost.Serialization is used at run time.  The actual
> Boost.Serialization library is statically linked to the DLL.  The
> roughly 20 line main function, test executable then uses the DLL to
> invoke the serialization.
>
> I verify the behavior using an old school "printf" approach, with
> std::cout messages and static ptrdiff_t counting variables inside the
> get_instance() and ~singleton() methods.  I observe eleven independent
> instances of the singleton class, with ZERO calls to the singleton
> destructor.  The eleven singletons are created with ONE instance of the
> serialization object, regardless of invoking the serialize method.
>
> The main function:
>
> int main()
> {
>      {
>          HINSTANCE hGetProcIDDLL = LoadLibrary(L"BoostSerializeDLL.dll");
>      }
>      /**/
>      std::cout << "Beginning unit test...";
>      std::shared_ptr<thislibrary::myclass> myc =
> thislibrary::create_myclass();
>      {
>          std::shared_ptr<std::ofstream> ostream =
> std::make_shared<std::ofstream>();
>          ostream->open("C:\\temp\\leak_test.xml");
>          {
>              boost::archive::xml_oarchive xml(*ostream);
>              myc->serialize(xml, 1); // comment this line out and the
> leak remains.
>          }
>          ostream->close();
>      }
>      /* */
>      return 0;
> }
>
> Fundamentally, there is something between 300 to 400 bytes memory leak.
> Yes, it is not much with today's world of GB RAM.  However, when you
> have production code that leaks, then ANY memory leak is unacceptable.
> Essentially, the leak is due to the lack of the eleven singleton
> instances NOT invoking the respective destructor.  Plus, the group I am
> part of spends tens of hours tracking down why the production product
> has memory leaks.  The final convincing point that must be clearly
> understood is this: due to the leak, Boost.Serialization has been
> replaced.  If that does not convince a reader that there is a real
> problem, then nothing will.

I believe this has been fixed for 1.68 which can be found in the current
develop and master branch.  What are the results of your test when
applied to the latest develop or release version?

>
> Sincerely,
>
> Robert
>
>
>>
>> Best, Alex
>>
>
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Help with PR fixing a memory leak in the current version

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list

> Fundamentally, there is something between 300 to 400 bytes memory
> leak. Yes, it is not much with today's world of GB RAM.  However, when
> you have production code that leaks, then ANY memory leak is
> unacceptable. Essentially, the leak is due to the lack of the eleven
> singleton instances NOT invoking the respective destructor.  Plus, the
> group I am part of spends tens of hours tracking down why the
> production product has memory leaks.  The final convincing point that
> must be clearly understood is this: due to the leak,
> Boost.Serialization has been replaced.  If that does not convince a
> reader that there is a real problem, then nothing will.
Thank you Robert. I have also added another PR that does something very
similar to your manual test. It tests that singletons are actually
destructed and that the is_destructed flag is set. Note that this is an
extract from my bigger PR and just contains the tests to prove this
point and tests are failing obviously in both 1.66 and current dev
(different parts though). Maybe we can get this one in first and then
fix it with my other PR. Would you maybe add a vote/thumbs-up for the
PRs also on Github to increase awareness?

To add on this: It is not only a memory leak. The ctor is not called. If
one relies on side effects by that, it can cause all sorts of havoc.

Thanks, Alex

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
Hi all again,

I have to stress the importance of a swift action/review here as the
current master now contains the flawed commit again that leads to a
certain crash when using shared libraries linked against static boost.

Reproduce with:
git clone [hidden email]:boostorg/boost.git && cd boost && git submodule
update --recursive --init
./bootstrap.sh && ./b2 variant=release link=static
--with-libraries=serialization cxxflags=-fPIC cflags=-fPIC -j2
mkdir mytest && cd mytest
Create 3 files:

|test_multi_singleton.cpp:intf();intg();intmain(intargc,char**){// Make
sure symbols are
usedif(argc==8)returnf();if(argc==9)returng();}multi_singleton1.cpp:#include<boost/serialization/extended_type_info_typeid.hpp>intf(){return0!=boost::serialization::extended_type_info_typeid<int>::get_const_instance().get_key();}multi_singleton2.cpp:#include<boost/serialization/extended_type_info_typeid.hpp>intg(){//
Use different(!)
typereturn0!=boost::serialization::extended_type_info_typeid<float>::get_const_instance().get_key();}|

export CPATH=$PWD/..
export LIBRARY_PATH=$PWD/../stage/lib/
g++ multi_singleton1.cpp -lboost_serialization -fPIC -shared
-olibmulti_singleton1.so
g++ multi_singleton2.cpp -lboost_serialization -fPIC -shared
-olibmulti_singleton2.so
g++ test_multi_singleton.cpp -L. -lmulti_singleton1 -lmulti_singleton2
LD_LIBRARY_PATH=. ./a.out

Result: *** Error in `./a.out': double free or corruption (fasttop):
0x0000000001accc20 ***

This is due to the bug I described and fixed in
https://github.com/boostorg/serialization/pull/105.
An extract from there only showing the bug with `is_destroyed` is
https://github.com/boostorg/serialization/pull/110

In the current state I'm strongly against including the latest master
from Boost.Serialization. It would be better, to include the Boost 1.67
version of it which "only" contains a memory leak.

Regards, Alex

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
On 6/29/18 2:07 AM, Alexander Grund via Boost wrote:
> Hi all again,
>
> I have to stress the importance of a swift action/review here as the
> current master now contains the flawed commit again that leads to a
> certain crash when using shared libraries linked against static boost.

Right now I'm bogged down in other stuff, but I did find a little time
to investigate this some more.  I reviewed the information on the
failures if test_dll_export along with the corresponding config.info
output.  I've found some useful things.  e.g. doesn't occur on windows.
Seems to occur only on linux platforms.  I've found it occurs on both
lib++ and libstdc++ evironments.  I can't figure out from the test
results whether the tests are linking from static or shared standard
libraries.

I didn't realize that this occurs (only on?) configurations which create
shared libraries which link against static boost.  Mixing shared
libraries is a known cause of difficulties.  It many cases it seems to
work, but in many cases it also leads to unfixable failures.

I'm still looking into this.

>
> Reproduce with:
> git clone [hidden email]:boostorg/boost.git && cd boost && git submodule
> update --recursive --init
> ./bootstrap.sh && ./b2 variant=release link=static
> --with-libraries=serialization cxxflags=-fPIC cflags=-fPIC -j2
> mkdir mytest && cd mytest
> Create 3 files:
>
> |test_multi_singleton.cpp:intf();intg();intmain(intargc,char**){// Make
> sure symbols are
> usedif(argc==8)returnf();if(argc==9)returng();}multi_singleton1.cpp:#include<boost/serialization/extended_type_info_typeid.hpp>intf(){return0!=boost::serialization::extended_type_info_typeid<int>::get_const_instance().get_key();}multi_singleton2.cpp:#include<boost/serialization/extended_type_info_typeid.hpp>intg(){//
> Use different(!)
> typereturn0!=boost::serialization::extended_type_info_typeid<float>::get_const_instance().get_key();}|
>
>
> export CPATH=$PWD/..
> export LIBRARY_PATH=$PWD/../stage/lib/
> g++ multi_singleton1.cpp -lboost_serialization -fPIC -shared
> -olibmulti_singleton1.so
> g++ multi_singleton2.cpp -lboost_serialization -fPIC -shared
> -olibmulti_singleton2.so
> g++ test_multi_singleton.cpp -L. -lmulti_singleton1 -lmulti_singleton2
> LD_LIBRARY_PATH=. ./a.out
>
> Result: *** Error in `./a.out': double free or corruption (fasttop):
> 0x0000000001accc20 ***

I'll look into your test.

>
> This is due to the bug I described and fixed in
> https://github.com/boostorg/serialization/pull/105.
> An extract from there only showing the bug with `is_destroyed` is
> https://github.com/boostorg/serialization/pull/110
>
> In the current state I'm strongly against including the latest master
> from Boost.Serialization. It would be better, to include the Boost 1.67
> version of it which "only" contains a memory leak.

We differ on the cause, the fix and the work around.  Objection noted.

Robert Ramey
>
> Regards, Alex
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>



_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
On 30/06/2018 03:20, Robert Ramey wrote:
> I didn't realize that this occurs (only on?) configurations which create
> shared libraries which link against static boost.  Mixing shared
> libraries is a known cause of difficulties.  It many cases it seems to
> work, but in many cases it also leads to unfixable failures.

By general definition, if you have two shared libraries that internally
statically link the "same" library, they must *completely hide* that
internal usage.  Exactly zero of the internal library's symbols are
permitted to exist in the shared library's export list, and absolutely
never can any of its types appear in the shared library's own API.

This includes usage by inheritance or by convention (such as being
streamable to a serialization archive via templated operators).

Violation of that rule occurs frequently (because it's hard to enforce
and most of the time goes unnoticed) but it is still an ironclad rule if
you don't want weird and undefined behaviour.

Templates are a frequent cause of issues since they're very leaky by
nature, and highly susceptible to ODR issues.


In the context of Boost.Serialization, this means that you can pass the
serialized representations across shared library module boundaries but
you cannot pass archives or the serializable objects themselves across
those boundaries, unless both shared libraries also use
Boost.Serialization itself as a shared library.

In general the simplest thing to do is to either keep everything static
or everything shared; mixing leads to headaches unless a great deal of
care is taken.


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
Mere moments ago, quoth I:
> In the context of Boost.Serialization, this means that you can pass the
> serialized representations across shared library module boundaries but
> you cannot pass archives or the serializable objects themselves across
> those boundaries, unless both shared libraries also use
> Boost.Serialization itself as a shared library.

To put this a simpler way:

If you want to be able to use Boost.Serialization as a static library
within shared libraries, then you must make absolutely certain that
there are exactly zero #includes or references of any kind to
Boost.Serialization's header files in any of the public header files of
the shared library.

If everything compiles after that, then you should be ok.  This usually
requires making use of PIMPL techniques.


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 2 July 2018 at 03:48, Gavin Lambert via Boost <[hidden email]>
wrote:

> On 30/06/2018 03:20, Robert Ramey wrote:
>
>> I didn't realize that this occurs (only on?) configurations which create
>> shared libraries which link against static boost.  Mixing shared libraries
>> is a known cause of difficulties.  It many cases it seems to work, but in
>> many cases it also leads to unfixable failures.
>>
>
> By general definition, if you have two shared libraries that internally
> statically link the "same" library, they must *completely hide* that
> internal usage.  Exactly zero of the internal library's symbols are
> permitted to exist in the shared library's export list, and absolutely
> never can any of its types appear in the shared library's own API.
>
> This includes usage by inheritance or by convention (such as being
> streamable to a serialization archive via templated operators).
>
> Violation of that rule occurs frequently (because it's hard to enforce and
> most of the time goes unnoticed) but it is still an ironclad rule if you
> don't want weird and undefined behaviour.
>
> Templates are a frequent cause of issues since they're very leaky by
> nature, and highly susceptible to ODR issues.
>
>
> In the context of Boost.Serialization, this means that you can pass the
> serialized representations across shared library module boundaries but you
> cannot pass archives or the serializable objects themselves across those
> boundaries, unless both shared libraries also use Boost.Serialization
> itself as a shared library.
>
> In general the simplest thing to do is to either keep everything static or
> everything shared; mixing leads to headaches unless a great deal of care is
> taken.
>

This is very useful information, in any context, be it Boost or not.

degski
--
*"If something cannot go on forever, it will stop" - Herbert Stein*

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
> To put this a simpler way:
>
> If you want to be able to use Boost.Serialization as a static library
> within shared libraries, then you must make absolutely certain that
> there are exactly zero #includes or references of any kind to
> Boost.Serialization's header files in any of the public header files
> of the shared library.
>
> If everything compiles after that, then you should be ok.  This
> usually requires making use of PIMPL techniques.
Please have a look at my test: The interfaces of the shared libraries do
NOT contain anything boost related. Only very simple C functions that
are called to make sure, the singletons within Boost are instantiated.
So your preconditions are met, yet the failure occurs.

Possible use case: You use Boost.Serialization in your library and use
another shared library which uses Boost.Serialization itself, but
completely independently. Nothing about it in the interface, but it
would still crash.

 >>This is due to the bug I described and fixed in
https://github.com/boostorg/serialization/pull/105.
 >>An extract from there only showing the bug with `is_destroyed` is
https://github.com/boostorg/serialization/pull/110
 >>
 >>In the current state I'm strongly against including the latest master
from Boost.Serialization. It would be better, to include the Boost 1.67
version of it which "only" contains a memory leak.
 >
 >We differ on the cause, the fix and the work around.  Objection noted.

Ok let me put this differently and lets see how far we can agree:

1. Destruction order may or may not be as intended (For single binaries
destruction order is inverse of construction, but my test shows, that
this does not hold for this shared library of static library mix)
2. Singletons of Boost.Serialization have a method `is_destroyed` which
returns whether one can still access the singleton or not
3. `!is_destroyed` is asserted by the destructors of singletons
referring to other singletons
4. 3. will fail if 1. does not hold, the program will then crash (or UB)
5. if 1. is true, the crash can be avoided by checking `is_destroyed`
before accessing the singleton (not doing so if destroyed)
6. `is_destroyed` is broken, hence the assertion is unreliable and we
get crashes where we would first get assertion failures in debug builds

Hence the steps to fix this:
First and foremost: Test and fix `is_destroyed`
Second: implement 5.

While 1. is an assumption that is disputable I think the rest is not.
Correct me here, if I'm wrong. Hence
https://github.com/boostorg/serialization/pull/110 which adds the test
for `is_destroyed`.
Even if there is another explanation for 1. what is wrong with
implementing 5.? It will always be safe and code like
`if(!foo.is_destroyed()) foo.use()` is quite idiomatic C (`if(foo)
foo->use()`)

Alex Grund

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
On 7/2/2018 2:22 AM, Alexander Grund via Boost wrote:

>> To put this a simpler way:
>>
>> If you want to be able to use Boost.Serialization as a static library
>> within shared libraries, then you must make absolutely certain that
>> there are exactly zero #includes or references of any kind to
>> Boost.Serialization's header files in any of the public header files
>> of the shared library.
>>
>> If everything compiles after that, then you should be ok.  This
>> usually requires making use of PIMPL techniques.
> Please have a look at my test: The interfaces of the shared libraries do
> NOT contain anything boost related. Only very simple C functions that
> are called to make sure, the singletons within Boost are instantiated.
> So your preconditions are met, yet the failure occurs.
>
> Possible use case: You use Boost.Serialization in your library and use
> another shared library which uses Boost.Serialization itself, but
> completely independently. Nothing about it in the interface, but it
> would still crash.

Although crashing is not observed with the slightly older Boost releases
(e.g. 1.66, 1.65.1, 1.64, etc.), intentionally isolating
Boost.Serialization as Gavin describes fails, with the already described
lack of destructor invocations and memory leak.

>
>  >>This is due to the bug I described and fixed in
> https://github.com/boostorg/serialization/pull/105.
>  >>An extract from there only showing the bug with `is_destroyed` is
> https://github.com/boostorg/serialization/pull/110
>  >>
>  >>In the current state I'm strongly against including the latest master
> from Boost.Serialization. It would be better, to include the Boost 1.67
> version of it which "only" contains a memory leak.
>  >
>  >We differ on the cause, the fix and the work around.  Objection noted.
>
> Ok let me put this differently and lets see how far we can agree:
>
> 1. Destruction order may or may not be as intended (For single binaries
> destruction order is inverse of construction, but my test shows, that
> this does not hold for this shared library of static library mix)
> 2. Singletons of Boost.Serialization have a method `is_destroyed` which
> returns whether one can still access the singleton or not
> 3. `!is_destroyed` is asserted by the destructors of singletons
> referring to other singletons
> 4. 3. will fail if 1. does not hold, the program will then crash (or UB)
> 5. if 1. is true, the crash can be avoided by checking `is_destroyed`
> before accessing the singleton (not doing so if destroyed)
> 6. `is_destroyed` is broken, hence the assertion is unreliable and we
> get crashes where we would first get assertion failures in debug builds
>
> Hence the steps to fix this:
> First and foremost: Test and fix `is_destroyed`
> Second: implement 5.
>
> While 1. is an assumption that is disputable I think the rest is not.
> Correct me here, if I'm wrong. Hence
> https://github.com/boostorg/serialization/pull/110 which adds the test
> for `is_destroyed`.
> Even if there is another explanation for 1. what is wrong with
> implementing 5.? It will always be safe and code like
> `if(!foo.is_destroyed()) foo.use()` is quite idiomatic C (`if(foo)
> foo->use()`)
>

I completely agree with your approach.  I also agree that having
crashing and/or undefined behavior (UB) is not acceptable for any
release, let alone the upcoming 1.68.

--Robert


> Alex Grund
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list

> Although crashing is not observed with the slightly older Boost
> releases (e.g. 1.66, 1.65.1, 1.64, etc.), intentionally isolating
> Boost.Serialization as Gavin describes fails, with the already
> described lack of destructor invocations and memory leak.
Did you test the code I sent around with static boost and those
versions? I expect a crash in 1.68 (if the commit is not reverted) and
in 1.65.x, so if you don't experience one I'd be interested in details
(OS, stdlib, compiler etc.). Also: 1.64 should be fine and 1.65 should
not leak, but crash. 1.66/1.67 have the leak.

Alex

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
On 7/2/18 6:40 AM, Alexander Grund via Boost wrote:
>
>> Although crashing is not observed with the slightly older Boost
>> releases (e.g. 1.66, 1.65.1, 1.64, etc.), intentionally isolating
>> Boost.Serialization as Gavin describes fails, with the already
>> described lack of destructor invocations and memory leak.

> Did you test the code I sent around with static boost and those
> versions? I expect a crash in 1.68 (if the commit is not reverted) and
> in 1.65.x, so if you don't experience one I'd be interested in details
> (OS, stdlib, compiler etc.). Also: 1.64 should be fine and 1.65 should
> not leak, but crash. 1.66/1.67 have the leak.

Here's what I did.  I looked at the serialization test matrix:

https://www.boost.org/development/tests/develop/developer/serialization.html

The interesting test is test_dll_exported.  I compared this table
against the config.info:

https://www.boost.org/development/tests/develop/developer/config.html

And I also looked at the test command line.

test results to see if I could find a commonality between the test
failures and the configuration being tested.  I found that:

a) Test failed only on linux platforms.  But the test doesn't fail on
all linux platforms.
b) Test never fails on windows platforms.
c) tests would fail with both clang and gcc compilers
d) Test failure seems pretty reliable.  Exception: the one test which
passes uses gcc 7.3
e) I can't discern what the build switch settings are from the python
command line.  That is I would like to know if it was built with
link=shared/static and runtime-link=shared/static.  I believe that this
information would be significant.
f) a couple of ARM tests pass - one fails.

Much has been made of a memory leak.  The current version 1.68 on
develop and master branches removed the dynamic allocation in the
singleton module.  So if there is a memory leak it would be more subtle
than first meets the eye.  I believe this memory leak was introduced in
efforts to make this test pass for 1.67 (or maybe 1.66 I don't remember).

I believe the correct way to approach this is to analyze your new test
and also test_dll_export in light of Gavin Lambert's comments above.  If
this works out, I would then add this information to the serialization
library documentation.

Unfortunately, I'm bogged down in other stuff so I can't address it
instantaneously, but it I do consider it important to investigate further.

Robert Ramey





>
> Alex
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 7/2/2018 8:40 AM, Alexander Grund via Boost wrote:
>
>> Although crashing is not observed with the slightly older Boost
>> releases (e.g. 1.66, 1.65.1, 1.64, etc.), intentionally isolating
>> Boost.Serialization as Gavin describes fails, with the already
>> described lack of destructor invocations and memory leak.
> Did you test the code I sent around with static boost and those
> versions?

No, I have not tested the 1.68 revision.  If my day goes well enough
with my primary tasks, I will build 1.68 early this evening or possibly
July 5th.  This will be strictly on Windows with both Microsoft 15.6.4
and Intel C++ 18.0, Update 3.

--Robert

I expect a crash in 1.68 (if the commit is not reverted) and

> in 1.65.x, so if you don't experience one I'd be interested in details
> (OS, stdlib, compiler etc.). Also: 1.64 should be fine and 1.65 should
> not leak, but crash. 1.66/1.67 have the leak.
>
> Alex
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list

> test results to see if I could find a commonality between the test
> failures and the configuration being tested.  I found that:
>
> a) Test failed only on linux platforms.  But the test doesn't fail on
> all linux platforms.
> b) Test never fails on windows platforms.
> c) tests would fail with both clang and gcc compilers
> d) Test failure seems pretty reliable.  Exception: the one test which
> passes uses gcc 7.3
> e) I can't discern what the build switch settings are from the python
> command line.  That is I would like to know if it was built with
> link=shared/static and runtime-link=shared/static.  I believe that
> this information would be significant.
> f) a couple of ARM tests pass - one fails.
Great! I found at least for
https://www.boost.org/development/tests/develop/developer/output/BI1-Debian-Stretch-boost-bin-v2-libs-serialization-test-test_dll_exported-test-gcc-4-9-4-debug.html 
that boost was build with link=shared. I don't think runtime-link is
tested? At least I'd expect a reference to that in the link command. But
I don't think it matters much, if the test I sent is considered formally
valid (see below)
> Much has been made of a memory leak.  The current version 1.68 on
> develop and master branches removed the dynamic allocation in the
> singleton module.  So if there is a memory leak it would be more
> subtle than first meets the eye.  I believe this memory leak was
> introduced in efforts to make this test pass for 1.67 (or maybe 1.66 I
> don't remember).
FYI: The leak was introduced in 1.66, the crash in 1.65.0 and again in
current master.
> I believe the correct way to approach this is to analyze your new test
> and also test_dll_export in light of Gavin Lambert's comments above. 
> If this works out, I would then add this information to the
> serialization library documentation.
To make it simple I'd suggest you concentrate on my new test. I'm sure
it meets Gavin Lambert's requirements of not leaking Boost-"internals"
outside of the shared libs used and the crash is easy to reproduce (at
least on my side`on every linux machine). If you can reproduce this even
on a single machine on your side, then this is all argument it needs
that there is something in need of fixing.

But even without this: Isn't it enough, that `is_destroyed` is broken,
to investigate in my fix? Fixing that function is the major part of the
PR. The rest is the simple check-before-use stuff.

Alex

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 2 July 2018 at 18:39, Robert via Boost <[hidden email]> wrote:

> This will be strictly on Windows with both Microsoft 15.6.4 ...


With the current level of flux in VC development, you really should
consider upgrading to 15.7.4 (unless the 6 is a typo and should read 7), or
better, move to 15.8 Preview 3 (that's the advice STL gave me in another
issue).

Have a good day,

degski
--
*"If something cannot go on forever, it will stop" - Herbert Stein*

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 7/2/18 8:46 AM, Alexander Grund via Boost wrote:
> But even without this: Isn't it enough, that `is_destroyed` is broken,
> to investigate in my fix? Fixing that function is the major part of the
> PR. The rest is the simple check-before-use stuff

I will also investigate "is_destroyed"

I agree that everything has to come up clean and correct. I'm shooting
for "no loose ends".  But the issues are complicated for, among other
things:

a) DLL/Shared libraries are not defined by the standard.
b) different vendors have different keywords/requirements
c) visibility is not defined by the standard either - but it's almost
essential.

We're navigating a mine field here.

Robert Ramey


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 7/2/18 8:53 AM, degski via Boost wrote:
> On 2 July 2018 at 18:39, Robert via Boost <[hidden email]> wrote:
>
>> This will be strictly on Windows with both Microsoft 15.6.4 ...
>
>
> With the current level of flux in VC development, you really should
> consider upgrading to 15.7.4 (unless the 6 is a typo and should read 7), or
> better, move to 15.8 Preview 3 (that's the advice STL gave me in another
> issue).

But many/most users don't have the latest MSVC version - and the library
has to work on at least the most recent ones.

>
> Have a good day,
>
> degski
>



_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
12