[Boost.Serialization] Help with PR fixing a memory leak in the current version

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
34 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
On 2 July 2018 at 19:22, Robert Ramey via Boost <[hidden email]>
wrote:

> But many/most users don't have the latest MSVC version


Given how easy the upgrade process has become, I suspect, you're wrong
here. Also, users (not corporate ones, but they are not going to adopt
Boost 1.68 either at release time anyway, so that's irrelevant) are looking
forward now to the (C++17) stuff that will work (and doesn't yet work with
their current version, a minor version difference can and will make all the
difference).


> - and the library has to work on at least the most recent ones.
>

Today's Preview is tomorrow's Stable, and will be outdated 2 weeks later,
that's how things are if you really "wonna make some progress" (you should
see the number of bugs and fixes on vcpkg, it's impressive), it's a wild
ride, but it's fun. As you are a dev of an important library, your stable
is today's Preview.

degski

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 7/2/2018 10:53 AM, degski via Boost wrote:
> On 2 July 2018 at 18:39, Robert via Boost <[hidden email]> wrote:
>
>> This will be strictly on Windows with both Microsoft 15.6.4 ...
>
>
> With the current level of flux in VC development, you really should
> consider upgrading to 15.7.4 (unless the 6 is a typo and should read 7), or
> better, move to 15.8 Preview 3 (that's the advice STL gave me in another
> issue).

I would love to, but am intentionally delaying for work reasons.  Some
of the team have updated to 15.7.4.  Because of my using the older
Microsoft version, it takes only a few minutes to track down the root
cause of a recent issue.

This may not be known to news group members here.  But, the Windows
Intel C++ compiler is heavily dependent on the Microsoft supplied
standard library header files (e.g. from 2015:
https://software.intel.com/en-us/forums/intel-c-compiler/topic/596318).
Similarly, in a Mac oriented post, there is a dependency on the gcc
header files (e.g. from 2013:
https://software.intel.com/en-us/forums/intel-c-compiler/topic/366102).

I have already submitted an issue with Intel that the Visual Studio
15.7.4 headers cause Intel 18.0, Update 3 to produce thousands of
spurious errors on the std::variant.  The 15.6.4 std::variant version
does not cause the invalid error output from Intel.  Plus, only the 19.0
Intel C++ is planned to address the problem(s).

--Robert

>
> Have a good day,
>
> degski
>


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
On 2 July 2018 at 19:54, Robert via Boost <[hidden email]> wrote:

> Some of the team have updated to 15.7.4.  Because of my using the older
> Microsoft version ...
>

You can install the preview side-by-side with the stable version, you'd
have BOBW (another advice from STL).

This may not be known to news group members here.  But, the Windows Intel
> C++ compiler is heavily dependent on the Microsoft supplied standard
> library header files (e.g. from 2015: https://software.intel.com/en-
> us/forums/intel-c-compiler/topic/596318).


You said it, it's a compiler, not an implementation of a STL.


> Similarly, in a Mac oriented post, there is a dependency on the gcc header
> files (e.g. from 2013: https://software.intel.com/en-
> us/forums/intel-c-compiler/topic/366102).
>

Luckily I know nothing about Mac (in the same category as fuckbook.com,
twitter (I don't own a phone) and smartphones in general (fat fingers, but
my wife keeps asking me to debug her bloody Android phone, shit)).


> I have already submitted an issue with Intel that the Visual Studio 15.7.4
> headers cause Intel 18.0, Update 3 to produce thousands of spurious errors
> on the std::variant.
>

Yeah, I know from experience that they are very slow in reacting (if they
react at all, which is usually not the case). F.e. the way they detect the
compiler in tbb/mkl does not work with Clang/LLVM on windows, because they
do the detection (if it should work) in the wrong order (they should check
for VC first and Clang later, can't go wrong on linux/Mac), seems pretty
simple, I filed it, but it is still as it used to be.

degski
--
*"If something cannot go on forever, it will stop" - Herbert Stein*

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
On 7/2/2018 12:10 PM, degski via Boost wrote:
> On 2 July 2018 at 19:54, Robert via Boost <[hidden email]> wrote:
>
>> Some of the team have updated to 15.7.4.  Because of my using the older
>> Microsoft version ...
>>
>
> You can install the preview side-by-side with the stable version, you'd
> have BOBW (another advice from STL).

Yeah, I am not too excited about that on my primary desktop.  I might
install the preview on a VM though.  Presently, the team continues to
find newly released "features" that break stuff.  Hence, my now four
decades (and growing) distrust...

>
> This may not be known to news group members here.  But, the Windows Intel
>> C++ compiler is heavily dependent on the Microsoft supplied standard
>> library header files (e.g. from 2015: https://software.intel.com/en-
>> us/forums/intel-c-compiler/topic/596318).
>
>
> You said it, it's a compiler, not an implementation of a STL.
>
>
>> Similarly, in a Mac oriented post, there is a dependency on the gcc header
>> files (e.g. from 2013: https://software.intel.com/en-
>> us/forums/intel-c-compiler/topic/366102).
>>
>
> Luckily I know nothing about Mac (in the same category as fuckbook.com,
> twitter (I don't own a phone) and smartphones in general (fat fingers, but
> my wife keeps asking me to debug her bloody Android phone, shit)).
>
>
>> I have already submitted an issue with Intel that the Visual Studio 15.7.4
>> headers cause Intel 18.0, Update 3 to produce thousands of spurious errors
>> on the std::variant.
>>
>
> Yeah, I know from experience that they are very slow in reacting (if they
> react at all, which is usually not the case). F.e. the way they detect the
> compiler in tbb/mkl does not work with Clang/LLVM on windows, because they
> do the detection (if it should work) in the wrong order (they should check
> for VC first and Clang later, can't go wrong on linux/Mac), seems pretty
> simple, I filed it, but it is still as it used to be.

I have them fixing a couple other items too.  As far as speedily
resolving something, that all depends on what the Intel group determines
is important.  Tea leaf interpretation or flipping a coin 50 times, is
about the same prediction accuracy.  :)

--Robert

>
> degski
>


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
On 2 July 2018 at 22:03, Robert via Boost <[hidden email]> wrote:

> Yeah, I am not too excited about that on my primary desktop.  I might
> install the preview on a VM though.  Presently, the team continues to find
> newly released "features" that break stuff.  Hence, my now four decades
> (and growing) distrust...
>

 You are not wrong here.

... Tea leaf interpretation or flipping a coin 50 times, is about the same
> prediction accuracy.  :)
>

LOL

degski
--
*"If something cannot go on forever, it will stop" - Herbert Stein*

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On Mon, Jul 2, 2018 at 9:03 PM, Robert via Boost <[hidden email]> wrote:
> Yeah, I am not too excited about that on my primary desktop.  I might
> install the preview on a VM though.  Presently, the team continues to find
> newly released "features" that break stuff.  Hence, my now four decades (and
> growing) distrust...

In VC or in their code?
Either way, the sooner it's found the sooner it can be fixed.


--
Olaf

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
On 7/3/2018 3:14 AM, Olaf van der Spek via Boost wrote:
> On Mon, Jul 2, 2018 at 9:03 PM, Robert via Boost <[hidden email]> wrote:
>> Yeah, I am not too excited about that on my primary desktop.  I might
>> install the preview on a VM though.  Presently, the team continues to find
>> newly released "features" that break stuff.  Hence, my now four decades (and
>> growing) distrust...
>
> In VC or in their code?
Windows SDK, ATL, COM, C++ compiler conformance, standard library header
files, ODBC driver to SQL Server, etc.

> Either way, the sooner it's found the sooner it can be fixed.
>
>
In the group I am part of, as soon as the root cause is determined, an
issue is filed.  How soon the Visual Studio C++ team responds with a
resolution, to something that previously is not broken, is non
deterministic, however.  The tea leaf and coin flipping prediction
statement applies here too.  :)

--Robert


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
Hi all,

with the release of 1.68 beta I want to push this again. Have you tried
the example I sent around? If so, did it crash on (probably) every linux
system too?
For me it did which makes me consider this as a showstopper and I would
like to propose again to simply revert the last change to the version
with the memory leak which is way less serious than a crash. Of course
merging my fix would be preferred as it fixes all problems but it seems
that the review of that might take to long for 1.68.

Regards, Alex

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
On 7/11/2018 2:11 AM, Alexander Grund via Boost wrote:

> Hi all,
>
> with the release of 1.68 beta I want to push this again. Have you tried
> the example I sent around? If so, did it crash on (probably) every linux
> system too?
> For me it did which makes me consider this as a showstopper and I would
> like to propose again to simply revert the last change to the version
> with the memory leak which is way less serious than a crash. Of course
> merging my fix would be preferred as it fixes all problems but it seems
> that the review of that might take to long for 1.68.

I agree with you.  IMHO, the PR should be merged.  At a minimum, the
1.68 release notes should list where to find the PR.  Then, the users
decide whether to use it.  I am only one concerned professional user
though.  I am sure others will voice their thoughts too.

Sincerely,

Robert

>
> Regards, Alex
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
On 7/11/18 12:11 AM, Alexander Grund via Boost wrote:
 > Hi all,
 >
 > with the release of 1.68 beta I want to push this again. Have you tried
 > the example I sent around?
Sorry, I haven't had time to investigate this.

 > For me it did which makes me consider this as a showstopper and I would
 > like to propose again to simply revert the last change to the version
 > with the memory leak which is way less serious than a crash. Of course
 > merging my fix would be preferred as it fixes all problems but it seems
 > that the review of that might take to long for 1.68.
The problem is that I don't see any connection between your proposed
change and the failed test.  I'm not disputing that your change makes a
problematic symptom go away.  But I haven't been able to invest enough
time to really understand why that might be so.

The whole issue of dynamically loaded DLLS/shared libraries has been a
festering sore for at least 20 years now.  It's undefined by the
standard and the committee has not seen fit to address it. Sorry it just
takes time to get this right.

Robert Ramey


 >
 > Regards, Alex
 >
 > _______________________________________________
 > Unsubscribe & other changes:
 > http://lists.boost.org/mailman/listinfo.cgi/boost
 >


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list

> The problem is that I don't see any connection between your proposed
> change and the failed test.  I'm not disputing that your change makes
> a problematic symptom go away.  But I haven't been able to invest
> enough time to really understand why that might be so.
Please do not focus on that currently checked in test (test_dll_load or
what it was) because that one is to complex to reason about.
Please check the very basic singleton interface tests I even added as a
separate PR (https://github.com/boostorg/serialization/pull/110) which
clearly shows, that the current implementation is flawed.
Additionally I have (empiric) evidence that destruction order is
unreliable (see the test I sent around in this ML).
So what my proposed change is, is simply:
      a) fix the bug in the implementation shown by #110 (should be
indisputably the right thing to do) and
      b) check if the singleton accessed by a singleton-destructor is
still alive (equivalent of `if(foo) foo->bar()`)
I don't see how this fixes a symptom only. It tackles the root causes: A
defect in the code and a use-after-free caused by the unexpected
destruction order (although the cause for that is unclear and only know
to be related to shared libraries).

I do acknowledge your lack of time and because the fix is so
logical/straight forward I asked for another reviewer to relieve you
from that or help out.
> The whole issue of dynamically loaded DLLS/shared libraries has been a
> festering sore for at least 20 years now.  It's undefined by the
> standard and the committee has not seen fit to address it. Sorry it
> just takes time to get this right.
I also do understand this. But I have proof that your latest change
causes a (possible) crash instead of a (certain) memory leak and hence
ask you to revert that change for 1.68 or the library will be unusable
for some users. Once there is more time, the issue can be fully resolved.

Alex Grund

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
On 7/11/18 9:32 AM, Alexander Grund via Boost wrote:
>
>> The problem is that I don't see any connection between your proposed
>> change and the failed test.  I'm not disputing that your change makes
>> a problematic symptom go away.  But I haven't been able to invest
>> enough time to really understand why that might be so.
> Please do not focus on that currently checked in test (test_dll_load or
> what it was) because that one is to complex to reason about.

it's test_dll_export.  It's the simplest test I could figure out.
There's another one that tests dynamic loading of DLL test_dll_plugin.
I don't run that test as I would have to dive a lot more into bjam than
I can aford to do.

> Please check the very basic singleton interface tests I even added as a
> separate PR (https://github.com/boostorg/serialization/pull/110) which
> clearly shows, that the current implementation is flawed.

I think I did that once.  On this basis I added a test_singleton to the
test suite. I'm pretty
https://github.com/boostorg/serialization/blob/master/test/test_singleton.cpp 
. I does check that all singletons are destroyed in the reverse order.
Note: this test doesn't appear in any of the "teeks99" tests.  I have no
idea why that might be.

Also note that the current singleton implementation does no allocation
from the heap.  If there is a memory leak - it's not obvious why
singleton might be causing it.

OK - i've looked again at your pr.  It replaces my test with your own.
Rather than doing that, it's been my practice to add new tests.  This
helps prevent turning into a process of whack-a-mole.  I'll add these
two new tests and see if I can re-produce the results on my own machine.

> Additionally I have (empiric) evidence that destruction order is
> unreliable (see the test I sent around in this ML).
I looked as one or two of your tests and as a result updated the test
suite as above. So I think things are covered.  I'll double check though.

> So what my proposed change is, is simply:
>       a) fix the bug in the implementation shown by #110 (should be
> indisputably the right thing to do) and

LOL - sorry it's not indiputable to me.

>       b) check if the singleton accessed by a singleton-destructor is
> still alive (equivalent of `if(foo) foo->bar()`)

I will check this.

> I don't see how this fixes a symptom only. It tackles the root causes: A
> defect in the code and a use-after-free caused by the unexpected
> destruction order

I do not believe there is any freeing going on here.

(although the cause for that is unclear and only know
> to be related to shared libraries).

Right.  I do not think its possible to avoid problems if libs are not
unloaded in the reverse order that they are loaded. I believe that this
will be something that has to be enforced at the application level.  It
would be nice if it were possible to detect this from the library, but
without any standard defined behavior, this is going to be difficult.

> I do acknowledge your lack of time and because the fix is so
> logical/straight forward I asked for another reviewer to relieve you
> from that or help out.

I don't always agree with the PRs and the analysis of other
participants.  I try to give all such suggestions the attention they
deserve and recognize the hard work that they entail - even if I decline
to include them.  On the other hand. Most of the suggestions do
eventually get incorporated - perhaps after some back and forth to
refine them, add/update tests, etc.  It is only because of efforts by
persons such as yourself, that I have been able to maintain the library
after all these years.  And it only because of these efforts that I keep
motivated to keep it as perfect as possible. You make me a better man.

>> The whole issue of dynamically loaded DLLS/shared libraries has been a
>> festering sore for at least 20 years now.  It's undefined by the
>> standard and the committee has not seen fit to address it. Sorry it
>> just takes time to get this right.

> I also do understand this. But I have proof that your latest change
> causes a (possible) crash instead of a (certain) memory leak and hence
> ask you to revert that change for 1.68 or the library will be unusable
> for some users. Once there is more time, the issue can be fully resolved.

This is the crux.  I do not this can be resolved definitively without a
real understanding of the source of the problem.

>
> Alex Grund
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list

> it's test_dll_export.  It's the simplest test I could figure out.
> There's another one that tests dynamic loading of DLL test_dll_plugin.
> I don't run that test as I would have to dive a lot more into bjam
> than I can aford to do.
Did you test what I sent around in this ML? As already explained I could
not get it into bjam either, but it does show the crash(!) on all linux
systems without further fiddling highlighting the need of an action.
> Also note that the current singleton implementation does no allocation
> from the heap.  If there is a memory leak - it's not obvious why
> singleton might be causing it.
The current implementation does NOT have a leak but a CRASH. Thats why I
want you to reverse that as the former is less severe (see above).
> OK - i've looked again at your pr.  It replaces my test with your own.
> Rather than doing that, it's been my practice to add new tests.  This
> helps prevent turning into a process of whack-a-mole.  I'll add these
> two new tests and see if I can re-produce the results on my own machine.
My test was developed before yours, so I had a merge conflict. Looking
at your test I found that it tests the wrong thing: Instead of testing
that the interface of singleton is correct, it tests the
construction/destruction order. That is guaranteed by the compiler,
hence you actually test the compiler. As that order only changes in the
context of shared libraries I did not see any added value of the test,
hence the complete overwrite (yeah, sorry, but was the easiest and as
explained a test of the interface is better than a test of the compiler)
>> Additionally I have (empiric) evidence that destruction order is
>> unreliable (see the test I sent around in this ML).
> I looked as one or two of your tests and as a result updated the test
> suite as above. So I think things are covered.  I'll double check though.
If you had run the test I sent around you'd see that this is not
covered. Destruction order is guaranteed except in the case of shared
libraries which is hard to get into bjam (well actually not, but it
requires building with fPIC. If that's possible/allowed, I'll add a test
to bjam that will crash on travis although it is very valid.
>> So what my proposed change is, is simply:
>>       a) fix the bug in the implementation shown by #110 (should be
>> indisputably the right thing to do) and
>
> LOL - sorry it's not indiputable to me.
Can you explain that a bit more? #110 shows that `is_destroyed` does
return a wrong value. Why would it be not right to fix that?
>> I don't see how this fixes a symptom only. It tackles the root
>> causes: A defect in the code and a use-after-free caused by the
>> unexpected destruction order
>
> I do not believe there is any freeing going on here.
An object is used after it is destroyed. That object has a map which is
freed on destruction but accessed afterwards. Try valgrind on the code I
sent around and it will show you the use-after-free.
> Right.  I do not think its possible to avoid problems if libs are not
> unloaded in the reverse order that they are loaded. I believe that
> this will be something that has to be enforced at the application
> level.  It would be nice if it were possible to detect this from the
> library, but without any standard defined behavior, this is going to
> be difficult.
How can this be enforced if there is no (manual) dynamic loading going
on? You simply link 2 shared libraries and the application crashes on
exit. Nothing a user can do about that.
>> I also do understand this. But I have proof that your latest change
>> causes a (possible) crash instead of a (certain) memory leak and
>> hence ask you to revert that change for 1.68 or the library will be
>> unusable for some users. Once there is more time, the issue can be
>> fully resolved.
>
> This is the crux.  I do not this can be resolved definitively without
> a real understanding of the source of the problem.
Well the source is indeterminate destruction order. You could try to
find the source for that or take it as given and handle it properly
which is easily possible.

If you have some time we can discuss this in IRC or another platform
directly. But please test what I sent around first. Otherwise we are
running around in circles as you don't seem to believe me that there is
a crash in a valid use case without anything a user can do.

Alex Grund


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Reply | Threaded
Open this post in threaded view
|

Re: [Boost.Serialization] Crash in current master (1.68)

Boost - Dev mailing list
In reply to this post by Boost - Dev mailing list
FWIW I opened another PR to show the crash also on travis:
https://github.com/boostorg/serialization/pull/111. See the (expected as
described) failures with LINK=static on Linux.
I hope #111 is enough to show, that the latest change needs to be
reverted for 1.68.

#110 together with #111 should show clearly, where the implementation is
broken.

Alex Grund

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
12