[crystax.net] Large regression files..

[crystax.net] Large regression files..

Rene Rivera-2
As the thread "Tests for 'develop' not showing" concluded.. The multi-GiB XML files from the CrystaX.NET runners are causing result processing to fail because of the large memory requirement to parse those large files. Please find the reason why those files are large. And correct it so that they aren't so large. Otherwise I'll be forced to remove them from the result tables.

Rene.

--
-- Rene Rivera
-- Grafik - Don't Assume Anything
-- Robot Dreams - http://robot-dreams.net
-- rrivera/acm.org (msn) - grafikrobot/aim,yahoo,skype,efnet,gmail

_______________________________________________
Boost-Testing mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/boost-testing

Re: [crystax.net] Large regression files..

Dmitry Moskalchuk
On 30/09/15 21:21, Rene Rivera wrote:
> As the thread "Tests for 'develop' not showing" concluded.. The
> multi-GiB XML files from the CrystaX.NET runners are causing result
> processing to fail because of the large memory requirement to parse
> those large files. Please find the reason why those files are large.
> And correct it so that they aren't so large. Otherwise I'll be forced
> to remove them from the result tables.

Well, the XML results have always been large, as far as I remember. Maybe
not 3 GiB, but 1 or 2 GiB - I've seen that multiple times. These XML files
are produced by the process_jam_log utility, with no extra steps. I can't
inspect them thoroughly by eye, of course, but what I see looks completely
correct - there are just test results, nothing more. BTW, the bjam.log for
such big XML reports is big too - specifically, for the
CrystaX.NET-apilevel-19-armeabi runner, bjam.log is 3 GiB too:

$ du -ms bjam.log
3181    bjam.log

$ du -ms CrystaX.NET-apilevel-19-armeabi.*
3127    CrystaX.NET-apilevel-19-armeabi.xml
120    CrystaX.NET-apilevel-19-armeabi.zip

However, it doesn't cause any problems on our server, where we generate
reports too: https://boost.crystax.net/develop/developer/summary.html.
As you can see, the CrystaX.NET-apilevel-19-armeabi runner is present
there, so that big XML file was parsed correctly and the corresponding
HTML was generated.

If you think such a big size is wrong, please help me figure out why. I
don't do anything extra with those files except processing bjam.log with
the process_jam_log utility. I can provide bjam.log if needed. I'll try to
look into that too, but really, as of now I don't see anything wrong there
- if the result data are really big, the XML will be big too.

BTW, I'm wondering why the HTML generation code depends on the XML size?
Does it read the whole XML file into memory before parsing it?

The only thing I can suggest right now is to split such runners by
toolset - i.e. CrystaX.NET-apilevel-19-armeabi-gcc-4.9 and
CrystaX.NET-apilevel-19-armeabi-gcc-5 instead of the common
CrystaX.NET-apilevel-19-armeabi. This will obviously reduce the size of
each XML report, but they will still be big - 1.5 GiB each in this
specific case. This is not what I'd like to do though, since it looks ugly
and doesn't fix the main problem anyway, it just works around it.


--
Dmitry Moskalchuk




Re: [crystax.net] Large regression files..

Rene Rivera-2
On Wed, Sep 30, 2015 at 2:05 PM, Dmitry Moskalchuk <[hidden email]> wrote:
> On 30/09/15 21:21, Rene Rivera wrote:
>> As the thread "Tests for 'develop' not showing" concluded.. The
>> multi-GiB XML files from the CrystaX.NET runners are causing result
>> processing to fail because of the large memory requirement to parse
>> those large files. Please find the reason why those files are large.
>> And correct it so that they aren't so large. Otherwise I'll be forced
>> to remove them from the result tables.
>
> Well, the XML results have always been large, as far as I remember. Maybe
> not 3 GiB, but 1 or 2 GiB - I've seen that multiple times. These XML files
> are produced by the process_jam_log utility, with no extra steps. I can't
> inspect them thoroughly by eye, of course, but what I see looks completely
> correct - there are just test results, nothing more. BTW, the bjam.log for
> such big XML reports is big too - specifically, for the
> CrystaX.NET-apilevel-19-armeabi runner, bjam.log is 3 GiB too:
>
> $ du -ms bjam.log
> 3181    bjam.log

Which leads me to suspect that the 64K output limit is not being enforced in your case for some reason. Could you:

a) Inspect the b2 invocation to see if the "-m64" option is getting added?
b) Inspect some of the b2 output log (or the resulting processed capture files) to see if any command output exceeds 64K?

> BTW, I'm wondering why the HTML generation code depends on the XML size?
> Does it read the whole XML file into memory before parsing it?

Yes, it seems the XML library the report generator uses just reads the entire XML into memory. 

> The only thing I can suggest right now is to split such runners by
> toolset - i.e. CrystaX.NET-apilevel-19-armeabi-gcc-4.9 and
> CrystaX.NET-apilevel-19-armeabi-gcc-5 instead of the common
> CrystaX.NET-apilevel-19-armeabi. This will obviously reduce the size of
> each XML report, but they will still be big - 1.5 GiB each in this
> specific case. This is not what I'd like to do though, since it looks ugly
> and doesn't fix the main problem anyway, it just works around it.

Right.. That splitting would definitely avoid the current problem. But I understand it's not desirable for you.

--
-- Rene Rivera
-- Grafik - Don't Assume Anything
-- Robot Dreams - http://robot-dreams.net
-- rrivera/acm.org (msn) - grafikrobot/aim,yahoo,skype,efnet,gmail


Re: [crystax.net] Large regression files..

Dmitry Moskalchuk
On 30/09/15 22:48, Rene Rivera wrote:
> $ du -ms bjam.log
> 3181    bjam.log
>
> Which leads me to suspect that the 64K output limit is not being enforced in your case for some reason. Could you:
>
> a) Inspect the b2 invocation to see if the "-m64" option is getting added?

Yes, that's it! The '-m64' option is not added to the b2 invocation. As far as I can see, this happens due to a bug in regression.py. There is the following code (https://github.com/boostorg/regression/blob/develop/testing/src/regression.py#L282):

# if no -m bjam option add -m64 (limit target to 64 kb of output)
if self.bjam_options.find('-m') == -1:
    self.bjam_options += ' -m64'

This condition is wrong: we pass "address-model=32" (or "address-model=64", depending on the target) in the bjam flags, and the substring "-m" inside "address-model" makes that code in regression.py erroneously conclude that a "-mN" option was already passed. I'll add "-m64" to the b2 invocation right now, but please fix regression.py too.
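
The false match can be reproduced, and avoided, by testing whole option tokens instead of searching for a substring. What follows is only a sketch under stated assumptions, not the fix that went into regression.py: the helper names has_output_limit_option and with_default_output_limit are hypothetical, and it assumes the limit may be written either as "-m64" or as a bare "-m" followed by a separate value.

```python
import re
import shlex


def has_output_limit_option(bjam_options):
    """Return True if some whole token is "-m" or "-m<digits>" (hypothetical helper)."""
    # shlex.split breaks the option string into shell-like tokens, so
    # "address-model=64" stays one token and is never mistaken for "-m...".
    return any(re.match(r"-m\d*$", token) for token in shlex.split(bjam_options))


def with_default_output_limit(bjam_options, limit_kb=64):
    """Append "-m<limit_kb>" unless an output limit was already given (hypothetical helper)."""
    if has_output_limit_option(bjam_options):
        return bjam_options
    return (bjam_options + " -m%d" % limit_kb).strip()


options = "-j16 variant=release address-model=64"
# The substring test from the quoted code finds "-m" inside "address-model".
print(options.find("-m") != -1)
# The token test only looks at complete options.
print(has_output_limit_option(options))
print(with_default_output_limit(options))
```

Matching complete tokens keeps the check independent of whatever feature names (such as address-model) happen to appear in the bjam options.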


--
Dmitry Moskalchuk



Re: [crystax.net] Large regression files..

Adam Wulkiewicz
Rene Rivera wrote:

>> BTW, I'm wondering why the HTML generation code depends on the XML size?
>> Does it read the whole XML file into memory before parsing it?
>
> Yes, it seems the XML library the report generator uses just reads the entire XML into memory.

Here: https://github.com/boostorg/regression/blob/develop/reports/src/xml.cpp#L351

It's because rapidxml is used internally to process the XML, and this library requires a null-terminated string containing the whole XML document. But I don't see a reason why an istream or an InputIterator (e.g. istream_iterator) couldn't be passed instead. This would require modifying the rapidxml code that currently resides in PropertyTree, but it is doable. Alternatively, we could maintain a modified copy of rapidxml in regression for now and possibly merge the changes into PropertyTree later.
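
The difference between loading the whole document and consuming it incrementally can be illustrated in Python, chosen here only because it is the language of regression.py; this is not the rapidxml change discussed above. The sketch uses the standard library's xml.etree.ElementTree.iterparse to handle one element at a time, so the contents of finished elements can be discarded as parsing proceeds. The tag name "test-log", the sample data, and the helper name are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET


def count_elements_streaming(source, tag):
    """Count elements named `tag` without keeping their contents in memory.

    `source` is a filename or a binary file object (hypothetical helper).
    """
    count = 0
    # iterparse yields each element when its closing tag has been read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # clear() drops the element's text, attributes and children;
            # the emptied element itself stays attached to its parent.
            elem.clear()
    return count


# Tiny stand-in for a results file; a real one would be opened from disk.
SAMPLE = b"<test-run><test-log>a</test-log><test-log>b</test-log><other/></test-run>"
print(count_elements_streaming(io.BytesIO(SAMPLE), "test-log"))
```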

Regards,
Adam


Re: [crystax.net] Large regression files..

Tom Kent
On Wed, Sep 30, 2015 at 1:21 PM, Rene Rivera <[hidden email]> wrote:

> As the thread "Tests for 'develop' not showing" concluded.. The multi-GiB XML files from the CrystaX.NET runners are causing result processing to fail because of the large memory requirement to parse those large files. Please find the reason why those files are large. And correct it so that they aren't so large. Otherwise I'll be forced to remove them from the result tables.

Could we add a filter to the report script so that it would ignore zip files over ~10MB? Aside from the CrystaX results, all the others in master and develop look like they max out around 5MB.

Maybe have the large runners show up as an empty column (or with some error message in it) so that runners could have some feedback as to why their results aren't being processed?
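
Such a filter could look roughly like the sketch below. It is not part of the actual report script: the function name partition_by_size, the 10 MiB default, and the idea of returning the rejected files so they can be shown as an error column are all assumptions made for illustration.

```python
import os

# Assumed threshold, taken from the "~10MB" figure suggested above.
MAX_ZIP_BYTES = 10 * 1024 * 1024


def partition_by_size(paths, max_bytes=MAX_ZIP_BYTES):
    """Split result archives into (accepted, rejected) lists of (path, size).

    Hypothetical helper: rejected entries keep their size so the report
    can tell the runner why its results were skipped.
    """
    accepted, rejected = [], []
    for path in paths:
        size = os.path.getsize(path)
        if size <= max_bytes:
            accepted.append((path, size))
        else:
            rejected.append((path, size))
    return accepted, rejected
```

The rejected list is what would feed the empty column or error message, so runners get feedback instead of silently disappearing from the tables.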

Tom


Re: [crystax.net] Large regression files..

Dmitry Moskalchuk
On 30/09/15 23:04, Dmitry Moskalchuk wrote:
> Yes, that's it! The '-m64' option is not added to the b2 invocation. As far as I can see, this happens due to a bug in regression.py. There is the following code (https://github.com/boostorg/regression/blob/develop/testing/src/regression.py#L282):
>
> # if no -m bjam option add -m64 (limit target to 64 kb of output)
> if self.bjam_options.find('-m') == -1:
>     self.bjam_options += ' -m64'

So I did an experiment: I ran the regression tests with Ubuntu's gcc/clang using the same bjam options we're using for Android testing. The first time, I ran it with the following command:

./run.py --runner=CrystaX.NET-ubuntu14.04 --tag=develop --toolsets=gcc-4.9,clang-3.5 --platform=Linux --bjam-options="-j16 variant=release link=static,shared runtime-link=shared threading=multi address-model=64"

This prevents "-m64" from being added to the b2 invocation due to the bug described above. The result was a bjam.log of 2.5 GiB and an XML file of approximately the same size.

Then I ran it with the same command line, but with "-m64" added to the bjam options:

./run.py --runner=CrystaX.NET-ubuntu14.04 --tag=develop --toolsets=gcc-4.9,clang-3.5 --platform=Linux --bjam-options="-j16 variant=release link=static,shared runtime-link=shared threading=multi address-model=64 -m64"

This time bjam.log was 221 MiB, and the corresponding XML file was 161 MiB.

So adding the "-m64" option definitely fixes the problem, and soon all CrystaX.NET reports will be updated with significantly smaller files.
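
For a sense of scale, the reduction can be worked out from the figures quoted above. This is just arithmetic on those numbers, taking the XML file of the first run to be 2.5 GiB as well, since it was only described as approximately the same size as bjam.log.

```python
MIB_PER_GIB = 1024

log_before_mib = 2.5 * MIB_PER_GIB  # bjam.log without -m64
log_after_mib = 221.0               # bjam.log with -m64
xml_before_mib = 2.5 * MIB_PER_GIB  # assumed equal to bjam.log ("approximately the same size")
xml_after_mib = 161.0               # XML file with -m64

log_ratio = log_before_mib / log_after_mib
xml_ratio = xml_before_mib / xml_after_mib
print("bjam.log shrank by a factor of about %.1f" % log_ratio)
print("XML file shrank by a factor of about %.1f" % xml_ratio)
```

That works out to roughly 11.6 times smaller for bjam.log and roughly 15.9 times smaller for the XML file.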


--
Dmitry Moskalchuk

