(I originally posted this to the Boost archive, but it seems I should send it to the ublas mailing list directly.)
I'm Wei Wang, a CS master's student focusing on high-performance computing. The `boost::ublas` GSoC project on adding GPU computation (project 3) interests me a lot, and I'd like to help add this feature to ublas. I see this project was also on last year's list, and I'm curious whether anyone has worked on it before and, if so, what stage they reached.
I've already read the initial source code of `ublas` in Boost 1.29 (I also read the 1.66 API and found it adds a `container` concept where `bounded_array` and `unbounded_array` used to be). I wrote an article describing its template-parameter deduction relationships. In addition, I've written a series of blog posts on using OpenCL efficiently with proper data partitioning and memory usage.
This is my first time participating in GSoC, and I'm a bit confused about the following questions:

1. Integrating OpenCL requires setting up a context, command queue, events, and other "environment objects". Should these also be included in this library?

2. Take matrix-matrix multiplication A*B, for example. The last stage before the matrix copy assignment is in the `matrix_matrix_prod` class, and its evaluation loops through all items of both matrices. If I want to add GPU compute features, I need to launch a kernel for each computation expression at this step, but that seems to contradict ublas's lazy-evaluation rationale. Is it possible to bypass this rule?

3. What should I implement for the competency matrix class? Just an integer matrix, or a templated matrix class? Should I support the current `ublas` interface (the typedefs and traits)?