Testing direction (was: Request for funding - Test Assets)


Testing direction (was: Request for funding - Test Assets)

Tom Kent
On Sat, Dec 12, 2015 at 8:35 PM, Rene Rivera <[hidden email]> wrote:
Not going to comment on the aspect of purchasing a machine. But will point out that the real benefit to having dedicated machines is that of having non-traditional setups (OS+toolset). I.e. dedicated machines give you coverage.

On Sat, Dec 12, 2015 at 8:08 PM, 'Tom Kent' via Boost Steering Committee <[hidden email]> wrote:

I also think that, like Niall said, we should move towards CI style testing where every commit is tested, but that is going to be a *huge* transition.

I wouldn't say huge.. Maybe "big". 
 
I would love to see direction on this in general from the steering committee, and am encouraged that almost all new libraries already have this. 

I can't speak for the committee. But as testing manager I can say moving Boost to CI is certainly something I work on a fair amount.

Retrofitting it onto all the existing libraries will be an undertaking.

Working on that. Getting closer and closer. 

I didn't realize this was being actively pursued. How many of the existing libraries have been set up for this? Is there a broader strategy for getting the individual maintainers to take these changes? Any simple tasks I could help with in my (very limited) spare time?
 
 
I would suggest that as an interim step, we update our existing regression facility so that the runners just specify what their configuration is (msvc-12.0, gcc-4.9-cpp14, clang-3.3-arm64-linux, etc) and we have a centralized server that gives them a commit to test (and possibly a specific test to run).

Not sure what you mean by that. 
 
They would also send their results back to this server (via http post, no more ftp!) in a format (json) that can be immediately displayed on the web without interim processing.

It's not actually possible to eliminate the processing, although it's possible to reduce it to a much shorter time span than it is now. That processing is what adds the structure and statistics we see in the results now. Without it there's considerably less utility in the results. And I can say that because..

Here's the idea I've been pondering for a while...curious what you (and others) think of it....

Currently when a user starts the regression tests with run.py, they specify the branch that they want to run (master or develop) and then get the latest commit from that branch. I would like to remove this from the user's control. When they call run.py, they would just pass in their configuration and their id string, and run.py would go out to a server to see what needs to be run. By default this server could just alternate between giving back the latest master/develop (or maybe only run master 1 in 3 times). That would give us uniform coverage of the master and develop branches.
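As a sketch of what that server-side rotation might look like (purely illustrative; none of this exists in run.py today):

```python
def next_assignment(counter, master_ratio=3):
    """Decide which branch the next runner should be handed.

    Hands out 'master' every master_ratio-th request and 'develop'
    otherwise, so develop gets the bulk of the runs while master
    still sees regular coverage.
    """
    return "master" if counter % master_ratio == 0 else "develop"

# Simulate nine runners asking the server what to test.
schedule = [next_assignment(i) for i in range(9)]
print(schedule)
```

The server would keep the counter; runners just receive a branch (and eventually a specific commit) in the response.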

This would also give us a bit more control around release time. Once an RC is created, we could give each runner that commit to test (while allowing master's latest to keep changing), so we could get tests of exactly what is proposed for the release (something that is a bit lacking right now, although the fact that we freeze the master branch gets close to this). After a release, we could save a snapshot of the tests and archive it so that future users of that release have something documenting its state.

As far as processing the output, what I was envisioning was moving a lot more of it to each test runner and the rest to the client side with some JavaScript. To re-create the summary page, each runner could upload a JSON file with all the data for their column: pass/fail, percent failed, metadata. Then we could run a very lightweight PHP (or other) script on the server that keeps track of which JSON files are available (i.e. all of the ones uploaded, except those not white-listed on master). Whenever a user opens the page, their browser is given that list of JSON files, which it then downloads, renders and displays. There would be a similar pattern for each library's individual result summary, which could link to separately uploaded results for each failure.
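To make the idea concrete, here's a rough sketch of what one runner's JSON "column" might look like (the field names and runner id are invented for illustration, not a proposed format):

```python
import json

# Hypothetical shape of the per-runner "column" file.
column = {
    "runner_id": "runner-01",
    "toolset": "msvc-12.0",
    "branch": "develop",
    "commit": "abc123",
    "results": {"passed": 1480, "failed": 12, "skipped": 3},
}
# The runner could precompute its own summary statistics,
# so the browser only has to render them.
column["percent_failed"] = round(
    100.0 * column["results"]["failed"]
    / sum(column["results"].values()), 2)

payload = json.dumps(column)    # what the runner would POST
restored = json.loads(payload)  # what the browser would render
print(restored["percent_failed"])
```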

I'm not an expert on what the report runner actually does, but I think that is the majority of it, right?

 
Even this kind of intermediate step would be a lot of development for someone....and I don't have time to volunteer for it.

I've been working on such a test reporting system for more than 5 years now. This past year I've been working on it daily, though only able to devote a very limited amount of time each day. Recently I've been working on processing performance, trying to get processing of 100 test results to happen in under 3 seconds (running on Google cloud infrastructure).

Thanks for all the amazing work you've done with the testing infrastructure, you definitely don't get enough recognition for it! 

Tom


_______________________________________________
Boost-Testing mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/boost-testing

Re: Testing direction (was: Request for funding - Test Assets)

Rene Rivera-2
On Sun, Dec 13, 2015 at 8:59 AM, Tom Kent <[hidden email]> wrote:
On Sat, Dec 12, 2015 at 8:35 PM, Rene Rivera <[hidden email]> wrote:
Not going to comment on the aspect of purchasing a machine. But will point out that the real benefit to having dedicated machines is that of having non-traditional setups (OS+toolset). I.e. dedicated machines give you coverage.

On Sat, Dec 12, 2015 at 8:08 PM, 'Tom Kent' via Boost Steering Committee <[hidden email]> wrote:

I also think that, like Niall said, we should move towards CI style testing where every commit is tested, but that is going to be a *huge* transition.

I wouldn't say huge.. Maybe "big". 
 
I would love to see direction on this in general from the steering committee, and am encouraged that almost all new libraries already have this. 

I can't speak for the committee. But as testing manager I can say moving Boost to CI is certainly something I work on a fair amount.

Retrofitting it onto all the existing libraries will be an undertaking.

Working on that. Getting closer and closer. 

I didn't realize this was being actively pursued. How many of the existing libraries have been set up for this? Is there a broader strategy for getting the individual maintainers to take these changes? Any simple tasks I could help with in my (very limited) spare time?

The only library I have so far is my own (Predef, but that's an easy one). There are a lot of small changes needed to deal with this. You can look at the current functionality for this CI testing here <https://github.com/boostorg/regression/tree/develop/ci/src> (plus the .travis.yml and appveyor.yml in Predef).

One change in particular I did as a PR for BB, as it was a functionally "radical" change (see <https://github.com/boostorg/build/pull/83>). But I will likely move on without that change anyway. My plan was to start on the "Robert" version of isolated testing (checking out a library at a particular commit, but checking out the monolithic Boost at a release commit).
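For illustration, that "Robert" checkout scheme might boil down to commands like these (refs and paths are placeholders, not the actual CI script):

```python
def isolated_checkout_plan(library, library_commit, release_commit):
    """Build (but don't run) the git commands for testing one
    library at its own commit against a released superproject."""
    return [
        # Pin the monolithic Boost superproject to the release.
        ["git", "checkout", release_commit],
        ["git", "submodule", "update", "--init"],
        # Then move just the library under test to its own commit.
        ["git", "-C", "libs/" + library, "checkout", library_commit],
    ]

plan = isolated_checkout_plan("predef", "deadbeef", "boost-1.60.0")
for cmd in plan:
    print(" ".join(cmd))
```

A real script would run these via subprocess and handle errors; the point is only the order of the two checkouts.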

My next step on that was to move to testing another more complex library using the CI script (and extend the script as needed).

As for broader strategy.. At some point, when I have reasonably complete CI support (Travis and Appveyor and a complex library), I'll just start making changes to all libraries, since getting authors to do this work themselves will likely not happen. I.e. I'll take my usual "just do it" approach :-) As for resources.. My goal is to move the testing of the common toolsets/platforms entirely to cloud-based services, freeing our dedicated testers to concentrate on the less common & bleeding-edge toolsets (such as Android, IBM, Intel, BSD, etc. configurations).
  
I would suggest that as an interim step, we update our existing regression facility so that the runners just specify what their configuration is (msvc-12.0, gcc-4.9-cpp14, clang-3.3-arm64-linux, etc) and we have a centralized server that gives them a commit to test (and possibly a specific test to run).

Not sure what you mean by that. 
 
They would also send their results back to this server (via http post, no more ftp!) in a format (json) that can be immediately displayed on the web without interim processing.

It's not actually possible to eliminate the processing, although it's possible to reduce it to a much shorter time span than it is now. That processing is what adds the structure and statistics we see in the results now. Without it there's considerably less utility in the results. And I can say that because..

Here's the idea I've been pondering for a while...curious what you (and others) think of it....

Currently when a user starts the regression tests with run.py, they specify the branch that they want to run (master or develop) and then get the latest commit from that branch. I would like to remove this from the user's control. When they call run.py, they would just pass in their configuration and their id string, and run.py would go out to a server to see what needs to be run. By default this server could just alternate between giving back the latest master/develop (or maybe only run master 1 in 3 times). That would give us uniform coverage of the master and develop branches.

Interesting. I'll have to think about that some.

This would also give us a bit more control around release time. Once an RC is created, we could give each runner that commit to test (while allowing master's latest to keep changing), so we could get tests of exactly what is proposed for the release (something that is a bit lacking right now, although the fact that we freeze the master branch gets close to this). After a release, we could save a snapshot of the tests and archive it so that future users of that release have something documenting its state.

As far as processing the output, what I was envisioning was moving a lot more of it to each test runner and the rest to the client side with some JavaScript. To re-create the summary page, each runner could upload a JSON file with all the data for their column: pass/fail, percent failed, metadata. Then we could run a very lightweight PHP (or other) script on the server that keeps track of which JSON files are available (i.e. all of the ones uploaded, except those not white-listed on master). Whenever a user opens the page, their browser is given that list of JSON files, which it then downloads, renders and displays. There would be a similar pattern for each library's individual result summary, which could link to separately uploaded results for each failure.

That's not far from what I plan to do, and have partly working, except for the aspect of doing as much on the client side as you describe. I attempted that early on in my work and found that it just didn't work. First, not enough of the computation could be done at testing time to split it between the server and client sides, as much of the computation cuts across the various testers. Second, it conflicted with one of my goals of making the testing side simpler, to increase the number of testers (the current complexity being a common complaint).

Right now what I have is: testers upload results as they happen (each test does a POST to the Google cloud). When a test run is done, the data is aggregated (again in the Google cloud) to generate the collective stats & structure (it's this part that I'm optimizing at the moment). When a person browses the results, the web client downloads JSON describing that page of results and renders a table with client-side C++ (Emscripten, currently). Note, I try to generate on the server only the minimum stats information possible, to reduce that processing time, shifting as much as possible to the web client.
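In spirit, the aggregation step collapses the stream of per-test POSTs into the summary a results page needs; a toy sketch (the real system runs in the Google cloud and is far more involved):

```python
from collections import Counter

def aggregate(results):
    """Collapse a stream of per-test results (as each POST would
    deliver them) into the summary stats a results page needs."""
    by_status = Counter(r["status"] for r in results)
    total = sum(by_status.values())
    return {
        "total": total,
        "passed": by_status.get("pass", 0),
        "failed": by_status.get("fail", 0),
        "percent_failed": 100.0 * by_status.get("fail", 0) / total,
    }

stream = [
    {"test": "core/trivial", "status": "pass"},
    {"test": "core/limits", "status": "fail"},
    {"test": "util/info", "status": "pass"},
    {"test": "util/env", "status": "pass"},
]
print(aggregate(stream))
```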
 
I'm not an expert on what the report runner actually does, but I think that is the majority of it, right?

 
Even this kind of intermediate step would be a lot of development for someone....and I don't have time to volunteer for it.

I've been working on such a test reporting system for more than 5 years now. This past year I've been working on it daily, though only able to devote a very limited amount of time each day. Recently I've been working on processing performance, trying to get processing of 100 test results to happen in under 3 seconds (running on Google cloud infrastructure).

Thanks for all the amazing work you've done with the testing infrastructure, you definitely don't get enough recognition for it! 

Tom





--
-- Rene Rivera
-- Grafik - Don't Assume Anything
-- Robot Dreams - http://robot-dreams.net
-- rrivera/acm.org (msn) - grafikrobot/aim,yahoo,skype,efnet,gmail


Re: Testing direction (was: Request for funding - Test Assets)

Adam Wulkiewicz
In reply to this post by Tom Kent
Tom Kent wrote:
On Sat, Dec 12, 2015 at 8:35 PM, Rene Rivera <[hidden email]> wrote:
Not going to comment on the aspect of purchasing a machine. But will point out that the real benefit to having dedicated machines is that of having non-traditional setups (OS+toolset). I.e. dedicated machines give you coverage.

On Sat, Dec 12, 2015 at 8:08 PM, 'Tom Kent' via Boost Steering Committee <[hidden email]> wrote:

I also think that, like Niall said, we should move towards CI style testing where every commit is tested, but that is going to be a *huge* transition.

I wouldn't say huge.. Maybe "big". 
 
I would love to see direction on this in general from the steering committee, and am encouraged that almost all new libraries already have this. 

I can't speak for the committee. But as testing manager I can say moving Boost to CI is certainly something I work on a fair amount.


FYI, Boost.Geometry is set up to use CircleCI and Coveralls. See the readme:
https://github.com/boostorg/geometry

We're using CircleCI instead of TravisCI because the latter fails due to the lack of memory needed to run the tests. The integration of CircleCI with Coveralls is not as straightforward as it is for TravisCI, especially in the case of parallel testing. I was forced to manually gather the coverage info from the parallel runs into one VM/container, merge the chunks manually, and send the result to Coveralls with curl. See the script if you're interested; it's based on Antony Polukhin's TravisCI script:
https://github.com/boostorg/geometry/blob/develop/circle.yml
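The manual merge described above amounts to summing per-line hit counts across the parallel containers; schematically (a toy model, not the actual gcov/Coveralls data format):

```python
def merge_chunks(chunks):
    """Merge per-container coverage chunks.

    Each chunk maps a source file to {line_number: hit_count};
    hits for the same line simply add up across parallel runs.
    """
    merged = {}
    for chunk in chunks:
        for path, lines in chunk.items():
            dest = merged.setdefault(path, {})
            for line, hits in lines.items():
                dest[line] = dest.get(line, 0) + hits
    return merged

a = {"algorithms.hpp": {10: 1, 11: 0}}
b = {"algorithms.hpp": {10: 2, 11: 1}, "strategies.hpp": {5: 1}}
print(merge_chunks([a, b]))
```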

Currently I have to manually push changes into my fork of Boost.Geometry in order to run the tests. Obviously the tests aren't run automatically for pull requests against the main repository either.
So if you plan to enable support for online CI services, I'd suggest somehow allowing maintainers to choose the services they prefer.

Btw, I'm also playing with the performance regression testing on CircleCI:
https://circleci.com/gh/awulkiew/benchmark-geometry-trigger/80#artifacts
https://circle-artifacts.com/gh/awulkiew/benchmark-geometry-trigger/80/artifacts/0/tmp/circle-artifacts.Nv98VEW/index.html
The above charts were generated by my scripts, benchmarks, and a report-generator tool; this isn't something natively supported by CircleCI.

Regards,
Adam


Re: Testing direction (was: Request for funding - Test Assets)

Raffi Enficiaud-3
In reply to this post by Rene Rivera-2
On 13/12/15 17:47, Rene Rivera wrote:

> On Sun, Dec 13, 2015 at 8:59 AM, Tom Kent <[hidden email]
> <mailto:[hidden email]>> wrote:
>
>     On Sat, Dec 12, 2015 at 8:35 PM, Rene Rivera <[hidden email]
>     <mailto:[hidden email]>> wrote:
>
>         Not going to comment on the aspect of purchasing a machine. But
>         will point out that the real benefit to having dedicated
>         machines is that of having non-traditional setups (OS+toolset).
>         I.e. dedicated machines give you coverage.
>
>         On Sat, Dec 12, 2015 at 8:08 PM, 'Tom Kent' via Boost Steering
>         Committee <[hidden email]
>         <mailto:[hidden email]>> wrote:
>
>
>             I also think that, like Niall said, we should move towards
>             CI style testing where every commit is tested, but that is
>             going to be a *huge* transition.
>
>
>         I wouldn't say huge.. Maybe "big".
>
>             I would love to see direction on this in general from the
>             steering committee, and am encouraged that almost all new
>             libraries already have this..
>
>
>         I can't speak for the committee. But as testing manager I can
>         say moving Boost to CI is certainly something I work on a fair
>         amount.
>
>             Retrofitting it onto all the existing libraries will be an
>             undertaking.
>
>
>         Working on that. Getting closer and closer.
>
>
>     I didn't realize this was being  actively pursued. How many of the
>     existing libraries have been setup for this? Is there a broader
>     strategy for getting the individual maintainers to take these
>     changes? Any simple tasks I could help with in my (very limited)
>     spare time?
>
>
> The only library so far I have is my own (Predef.. But that's an easy
> one). There are a lot of small changes needed to deal with this. You can
> look at the current functionality for this CI testing here
> <https://github.com/boostorg/regression/tree/develop/ci/src> (plus the
> .travis.yml and appveyor.yml in Predef).
>
> One in particular I did a PR for BB as it was a functionally "radical"
> change (see <https://github.com/boostorg/build/pull/83>). But I will
> likely move on without that change anyway. My plan was to start on the
> "Robert" version of isolated testing (checking out a library to a
> particular commit, but checking out the monolithic Boost to a release
> commit).
>
> My next step on that was to move to testing another more complex library
> using the CI script (and extend the script as needed).

Hi all,

I will just give my personal opinions about that, and what I did for
boost.test.

After all the complaints boost.test got lately, I deployed an internal CI
based on Atlassian Bamboo (https://www.atlassian.com/software/bamboo) that:
- tests every commit that happens on boost.test only
- tests every branch of boost.test

What I do is what you call the "Robert" CI testing:
- I clone boost at develop
- I check out a specific branch of boost.test (tolerant to forced updates;
since those are topic branches, I force-push them before merging)
- Bamboo runs the boost.test unit tests on this branch vs. develop, on
several configurations (more or less 7: win, osx and linux), all on
exactly the same version of the code.
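That setup is essentially a small build matrix; sketched out (branch and configuration names invented for illustration):

```python
from itertools import product

branches = ["topic/foo", "next", "develop"]
configs = ["win-msvc", "osx-clang", "linux-gcc"]

# One Bamboo-style "plan" entry per (branch, configuration) pair,
# all testing the same checked-out revision of the code.
matrix = [{"branch": b, "config": c} for b, c in product(branches, configs)]
print(len(matrix))
```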


The benefits are that:
- I test the same version of the code on several configurations, so the
feedback I get from the CI is for this specific version, which is
currently lacking in the regression dashboard
- I have clean topic branches, and a clear status for each of them
- forking the CI "plan" to a new branch is done automatically by
Atlassian Bamboo
- I am not polluting the boost.test develop branch with immature
developments anymore, and I do not need to use a fork of the repository
for that either
- I get very fast feedback on all branches, and I can have a branch
policy that also avoids any clash of topics: I merge different
topics onto a "next" branch that gets automatically tested as well,
asserting that a union of topics is still OK. Once "next" is green, it
is more or less safe to merge to develop.
- the interface is clear, and it keeps the history, the logs and
everything I need.


Atlassian Bamboo is a paid solution, but it is free for open-source
projects. It needs a master server that schedules the builds on several
slave machines (agents). I used to use Jenkins a lot, and I have to say
Bamboo is far above it.

The problem I can see in using this kind of solution, though, is that the
current runners are asynchronous in their results: they run whenever they
can, and there is no enforcement of the revision that gets tested.
It is more or less push vs. pull, and Bamboo is better adapted to a pool
of runners that are highly available. This is the current setup I have
for boost.test, and I am pretty happy with it.


> As for broader strategy.. At some point when I have reasonably complete
> CI support (Travis and Appveyor and a complex library) I'll just start
> making changes to all libraries. As I know that getting authors to do
> this work will likely not work. I.e. I'll take my usual "just do it"
> approach :-) As for resources.. My goal is to move the testing of the
> common toolsets/platforms all to cloud based services. Relieving our
> dedicated tester to concentrate on the not so common & bleeding edge
> toolsets (such as Android, IBM, Intel, BSD, etc configurations).
>
> [snip]
>
>     Here's the idea I've been pondering for a while...curious what you
>     (and others) think of it....
>
>     Currently when a user starts the regression tests with run.py, they
>     specify the branch that they want to run (master or develop) and
>     then get the latest commit from that branch. I would like to remove
>     this from the user's control. When they call run.py, they just pass
>     in their configuration and their id string and run.py goes out to a
>     server to see what needs to be run. By default this server could
>     just alternate between giving back the latest master/develop (or
>     maybe only run master 1 in 3 times). That would give us uniform
>     coverage of master and develop branches.
>
>
> Interesting. I'll have to think about that some.
>
>     This would also enable us to have a bit more control around release
>     time. Once an RC is created, we could give each runner that commit
>     to test (allowing master's latest to have changes), then we could
>     get tests of what is proposed for the release (something that is a
>       bit lacking right now, although the fact that we freeze the master
>     branch gets close to this). After a release, we could save the
>     snapshot of tests and archive that so that future users of that
>     release could have something documenting its state.

Instead of doing it that way, and specifically for RCs, I would go for a
specific branching scheme:
- an RC goes to a release branch
- runners check that release branch first, and test it if not already done.

This also lets people see this specific RC in the repository, and lets
them clone it and test it.
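On the runner side, that scheme could be a simple priority rule (a sketch; the function and field names are hypothetical):

```python
def pick_branch(release_branch_commit, already_tested):
    """Prefer an untested release-candidate commit; otherwise
    fall back to the normal develop/master rotation."""
    if release_branch_commit and release_branch_commit not in already_tested:
        return ("release", release_branch_commit)
    return ("develop", None)

# An RC exists and this runner hasn't covered it yet:
print(pick_branch("rc-1.60.0", already_tested=set()))
# The RC is done, so resume ordinary testing:
print(pick_branch("rc-1.60.0", already_tested={"rc-1.60.0"}))
```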


>     As far as processing the output, what I was envisioning was moving a
>     lot more of it to each test runner and the rest to the client side
>     with some javascript. To re-create the summary page, each runner
>     could upload a json file with all the data for their column:
>     pass/fail, percent failed, metadata. Then we could run a very
>     lightweight php (or other) script on the server that keeps track of
>     which json files are available (i.e. all of the ones uploaded,
>     except those not white-listed on master) and whenever a user opens
>     that page, their browser is given that list of json files which the
>     browser then downloads, renders and displays. There would be a
>     similar pattern for each of the libraries' individual result
>     summaries. Which could link to separately uploaded results for each
>     failure..
>
>
> That's not far from what I plan to do, and have partly working. Except
> for the aspect of doing as much on the client side as you say. I
> attempted to do that early on in my work and found that it just didn't
> work. First there wasn't enough testing time computation that could be
> done to facilitate the server/client side. As much of the computation
> cuts across various testers. Second it conflicted with one of my goals
> of making the testing side simpler to increase the number of testers
> (which is a common complaint currently).
>
> Right now what I have is: Testers upload results as they happen (each
> test would do a post to the Google cloud). When a test run is done the
> data is aggregated (again in the Google cloud) to generate the
> collective stats & structure (it's this part that I'm optimizing at the
> moment). When a person browses to the results the web client downloads
> json describing that page of results, and renders a table with client
> side C++ (emscripten currently). Note, I try and only generate on the
> server the minimum stats information possible to reduce that processing
> time and shifting as much as possible to the web client.
>

From all that, my understanding is that you want a new dashboard, and
not necessarily a whole new testing CI?
I believe mimicking a CI dashboard is a big, but not huge, development
effort.

Also, I do not know if mixing server-side and client-side technologies is
the way to go. I would rather go server-side only, rendering static HTML
files asynchronously: those are easily cached by the web server and
web client, and the rendering would be almost immediate on every device.
To be honest, I hate JS, and the node.js bubble just makes me smile.

I also think that summarizing/visualizing the information is the key for
a dashboard:
- most of the state should be rendered as a function of time, where time
is the time of the commit
- for each commit: the associated # of runners, # of tests, # of failing
tests, and the deltas w.r.t. the previous version (including
removed/added tests)
- the same for every library, with the list of tests we have now,
without the segmented logs
- access to the full build log instead of segmented/broken ones

It means that if, at some point, a lunatic runner wakes up, it will push
its results for a specific commit, making that point in time richer than
before (and not removing/replacing part of the information).

I do not know which server-side technologies you are using right now;
lately (the last 2 years) I have been using Django, and I find it pretty
cool. It's Python, it has a big community, and I think it would benefit
from contributions (including mine).
I made something that manages revisions, branches, permissions, etc.,
for storing documentation from a CI:
https://bitbucket.org/renficiaud/code_doc (or the about page:
https://bitbucket.org/renficiaud/code_doc/src/6fe3560284ca84a31e4379331af1edfa1e458999/code_doc/templates/code_doc/about.html?at=master&fileviewer=file-view-default)


Best,
Raffi
