

Hi,
I am interested in the current development status of the library. Is
there a way to get the latest version somewhere? I tried to
find a working branch on https://github.com/boostorg/ublas but could
only find changes older than half a year.
I am interested in benchmarking the "new" uBLAS against "my"
reimplemented subset (think of it as my personal summer of code :) ).
Greetings,
Oswin
_______________________________________________
ublas mailing list
[hidden email]
http://lists.boost.org/mailman/listinfo.cgi/ublas
Sent to: [hidden email]


Hi,
the latest version is indeed on GitHub. We haven't committed many things this year so far. We are talking (slowly, I must admit; please blame me, not the others) about new features and changes:
(1) integrating GSoC 2013 into the main branch
(2) removing the vector object and replacing it with a new one that will be a matrix, and therefore implement a real row or column vector
(3) matrix multiplication with operator* valid for all standard operations (vector, matrix, etc.). It is related to (2)
(4) algorithms infrastructure (so that we can have really useful features)
(5) matrix/vector views for interoperability <- I think this is ultra critical, because right now uBLAS is monolithic in the sense that you have to use it everywhere you manipulate data. This would really help in letting people, for example, keep a list of vectors (that they are plotting) with uBLAS working on top of that to do transformations
(6) NEW DOCUMENTATION - examples and the rest. The current one is ... old
(7) incorporating some critical bindings (e.g. MUMPS bindings; MUMPS is currently probably the most efficient SMP and distributed open-source linalg solver)
(8) Matlab bindings? Matlab I/O, etc. Same thing with R, CSV files, etc. Bindings and I/O in general are pretty poor
(9) distributed uBLAS (GSoC'13)
I'm really interested in seeing your benchmark. Maybe you can join the GitHub ublas project as well. If you give me your GitHub id I can associate it with the project and you can push your code. Practically speaking, there is a ublas project on GitHub which is a copy of the main boostorg/ublas project, and there are many branches for each project that people can use.
And last but not least, there are a few changes for supporting IDEs. In fact, it would be good if people could contribute more. I'm a vi user only, so I need others' skills for that. QtCreator support is on its way. Ideally we would have some sort of support for the main IDEs; off the top of my head: Eclipse, Netbeans, Visual Studio, Anjuta, KDevelop, etc.
Best regards, David


Hi David,
Thanks for the fast reply. There might be a misunderstanding: I
don't have any code here that is ready for uBLAS. The story goes
back to approximately spring, when it was decided to improve
uBLAS. I decided to part ways because of our differences over what
uBLAS should look like (and since it is your library, it is of
course your right to choose!). So I rewrote the whole thing (or
better: the parts that I needed, that is, dense and sparse algebra,
no banded matrices) and now have a 16,000 LOC subset of uBLAS with
most of the functionality (and sometimes more) and automatic ATLAS
bindings for fast_prod etc.
Now I wanted to come back, compare the different approaches, and
see whether I could adopt some of your changes, as well as maybe
port some of my code back to uBLAS.
So once most of the new features are there, I would be happy to
compare! You would also get a direct benchmark against ATLAS if
you like.
My version is unfortunately not standalone yet because of
$REASONS. Also, looking at your changes, I doubt that the libraries
are compatible, for example because I overloaded op* and op/ for
elementwise operations (I need them so often; element_prod and
element_div just do not cut it...).
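For illustration, a minimal sketch of what such elementwise overloads could look like (hypothetical code, not Shark's actual implementation; the `dvec` type is invented here):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal dense vector with elementwise operator* and operator/,
// in the spirit of the overloads described above (illustrative only).
struct dvec {
    std::vector<double> data;
    explicit dvec(std::size_t n, double v = 0.0) : data(n, v) {}
    double& operator[](std::size_t i) { return data[i]; }
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// Elementwise product: plays the role of element_prod(a, b).
inline dvec operator*(dvec const& a, dvec const& b) {
    assert(a.size() == b.size());
    dvec r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] * b[i];
    return r;
}

// Elementwise division: plays the role of element_div(a, b).
inline dvec operator/(dvec const& a, dvec const& b) {
    assert(a.size() == b.size());
    dvec r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] / b[i];
    return r;
}
```

The trade-off discussed in this thread is exactly this: once op* means "elementwise", it can no longer mean matrix product, which is why the two libraries become incompatible.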
But if you like you can have a look at it here
(most of ublas functionality minus systems of equations and
algorithms)
http://sourceforge.net/p/sharkproject/code/HEAD/tree/trunk/Shark/include/shark/LinAlg/BLAS/
(and systems solver + algorithms (wip))
http://sourceforge.net/p/sharkproject/code/HEAD/tree/trunk/Shark/include/shark/LinAlg/
Best Regards,
Oswin
On 29.11.2013 12:40, David Bellot wrote:


Hi Oswin,
well, I'm just the maintainer of uBLAS, not the owner, and for sure not the first developer, as it was already an old project when I took it over. I believe any contribution is welcome, and honestly, so many things can be changed in uBLAS. It's getting a little bit old.
So that's why I'm always happy to have new contributions. That's also the reason I wanted to have GSoC projects this summer, and I think it has been a good thing.
I didn't know Shark was your library! Well done. It's a good library.
I'm especially interested in your implementation of RBMs.
So please, feel free to propose anything you think is relevant to uBLAS. At some point we even thought about rewriting a big part of uBLAS and calling it uBLAS 2 (and going through the peer review again, etc.). Well, I think uBLAS can evolve smoothly rather than having a big revolution.
The automatic ATLAS binding is definitely something I want!
Best, David


Hi David,
Yes, at one point I thought about contributing to uBLAS. But there
are two problems with it that make it hard to develop.
1. There is no real consensus about what uBLAS should be targeted at.
Right now I have the feeling that it should be more or less like
Eigen: applicable to every use case out there. And this is
hard. Most of Eigen looks to me like a big mess. This is because
it must behave differently in the cases of small, big, and very big
problems. Right now uBLAS only has good performance for small
dense problems, and I am mostly interested in the middle
category. And the last time I proposed changes I got nearly no
support, only fire from the "small" and "very big" guys. This makes
it quite unattractive for me to propose further changes.
2.
So please, feel free to propose anything you think is relevant
to uBLAS. At some point we even thought about rewriting a big
part of uBLAS and call it uBLAS 2 (and go through the peer
review again, etc...). Well, I think uBLAS can evolve smoothly
rather than having a big revolution.
I don't think that this is possible without iteratively annoying
all the uBLAS users who actually use things like loops over matrix
elements and the other low-level interfaces. I had to rewrite the
sparse interface completely (and I am still not sure that I have
done it right), as well as the iterator interface. I tried to do
it smoothly, but I nevertheless broke it 3 times.
The next problem, again sparse, is that there are 3 different
formats supported by uBLAS. It is hard to write algorithms for them,
because the three formats need to be treated differently. But in a
smooth evolution we can't really throw out stuff, especially if
there is no 1:1 replacement.
I think you can find such issues in a lot of parts; as you said,
uBLAS is old. There are more things, like that you might want to
change the way expressions are evaluated, so that things like
noalias(A) += prod(B,C) + D;
are fast and don't require intermediate results in prod().
I think a full rewrite from scratch is required to get rid of all
the old burdens. While a review process is of course a bit
annoying, it might actually help in developing it into something
that is actually useful :). (And honestly I doubt that uBLAS as-is
would pass a review.)
Best,
Oswin
On 29.11.2013 13:55, David Bellot wrote:


Dear David,
all of the things you point out are nice ... if I could add something to the wish list I would like to add a few items:
1 - please maintain some sort of backward compatibility for people who do not use internals... functions such as noalias, prod, inner_prod, etc., while not strictly needed, do no harm, and maintaining them would make people's lives easier in the migration...
2 - clearly separate where OpenMP is going to be used (for example sparse and large dense matrices) and where not ("tiny" matrices, both on the stack and on the heap). This is important, as one needs to avoid nested OpenMP parallelism
3 - it would be nice to have more C interoperability with the internals of sparse matrices. This would make it much easier to write wrappers
4 - consider PaStiX as a direct solver besides MUMPS. It is really good and actively developed...
good luck with your effort
ciao
Riccardo


Hi,
I like your ideas. But I am not sure what you mean by 1. and 2.
The problem I see is that uBLAS right now doesn't offer the
desired quality and speed for single core, let alone multi-core.
Therefore I will focus on the internal infrastructure instead of
motivating new algorithms.
short description:
1. Rock-solid implementation of the BLAS algorithms ("compute kernels")
2. Rewriting of the expression mechanics
2.1 automatic use of the compute kernels for the operations
2.2 automatic optimisation of expressions
long description:
1. Rock-solid implementation of the BLAS algorithms: matrix x
vector and matrix x matrix in all useful combinations
(sparse/sparse, dense/dense, dense/sparse, ..., triangular/dense, etc.;
there are a lot), but also matrix assignment. uBLAS will not be
compared on its number of features but on its runtime against
Armadillo/Eigen/ATLAS/GotoBLAS. Having a fast implementation of
this is also crucial for multi-core and distributed computations.
All the higher-level decompositions (Cholesky, LU, SVD,
rank decompositions, ...) and the triangular solvers also rely on fast BLAS.
2. Rewriting of the expression mechanics. The way uBLAS is
implemented right now, writing
noalias(A) += prod(B,C) + D
or
vec = prod(prod<matrix>(B,C), vec2);
is in most cases a bad decision. The second expression wouldn't
even compile without the template parameter.
What I would like to have is the following:
2.1 automatic use of the fast algorithms. That is, the above
expression should be translated to
axpy_prod(B,C,A,false);
noalias(A) += D;
Right now the matrix_assign algorithms are used, which do a
terrible job on prod(B,C).
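To make 2.1 concrete, here is a deliberately tiny sketch (invented names, not uBLAS code) of the idea: assignment from a product expression dispatches to a dedicated kernel instead of the generic element-by-element path.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch: a tiny expression type for matrix products
// whose assignment routes to a dedicated kernel, instead of being
// evaluated element-by-element by a generic assign loop.
using mat = std::vector<std::vector<double>>;

struct prod_expr { mat const& lhs; mat const& rhs; };

// prod() no longer computes anything; it just records the operands.
inline prod_expr prod(mat const& a, mat const& b) { return {a, b}; }

// Stand-in for an optimized kernel such as axpy_prod (naive loop here;
// assumes non-empty, conforming operands).
inline void gemm_kernel(mat const& a, mat const& b, mat& c) {
    std::size_t n = a.size(), k = b.size(), m = b[0].size();
    c.assign(n, std::vector<double>(m, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t p = 0; p < k; ++p)
            for (std::size_t j = 0; j < m; ++j)
                c[i][j] += a[i][p] * b[p][j];
}

// Assignment from a product expression dispatches to the kernel.
inline void assign(mat& c, prod_expr e) { gemm_kernel(e.lhs, e.rhs, c); }
```

In a real expression-template library the dispatch would be an overload of the assignment machinery on the expression's node type, so that `C = prod(A,B)` picks the kernel automatically.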
2.2 automatic optimization of expressions. The second expression
above is really inefficient: first the matrix-matrix prod is
evaluated, and the result is afterwards used for a single
matrix-vector prod.
Now compare to this:
prod(B, prod<vector>(C, vec2));
Only 2 matrix-vector products are performed, which might save 99%
of the FLOPS of the first expression. So it would be better if
this were transformed automatically. Normally the user could
do that himself, but sometimes it is not possible. Consider a
conjugate gradient solver:
x = CG(A,b);
A usual pattern for A is A = XX^T + c*Identity. CG internally uses only
matrix-vector products, and possibly not many of them. Therefore
you don't want to compute A explicitly, because it is (a) slow and (b) might
require more memory than is available. Without expression optimizations it
would be impossible to write a good CG that could do that.
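As a sketch of the CG scenario (hypothetical code, not uBLAS): if the solver only demands a matrix-vector product, A = XX^T + c*I can stay implicit behind a callable.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Sketch: a CG solver that only needs a matrix-vector product, so the
// operator A is passed as a callable and never materialized.
using vec = std::vector<double>;
using linop = std::function<vec(vec const&)>;

inline double dot(vec const& a, vec const& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Plain conjugate gradient for symmetric positive definite A.
inline vec cg(linop const& A, vec const& b, std::size_t iters) {
    vec x(b.size(), 0.0), r = b, p = b;
    double rr = dot(r, r);
    for (std::size_t k = 0; k < iters && rr > 1e-20; ++k) {
        vec Ap = A(p);                       // the only use of A
        double alpha = rr / dot(p, Ap);
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        double rr_new = dot(r, r);
        for (std::size_t i = 0; i < p.size(); ++i)
            p[i] = r[i] + (rr_new / rr) * p[i];
        rr = rr_new;
    }
    return x;
}
```

For A = XX^T + c*I the caller would pass a lambda computing X*(X^T*v) + c*v, i.e. two matrix-vector products per application, without ever forming A; the point in the text is that good expression optimization could achieve the same effect from the written formula.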
Greetings,
Oswin
On 07.12.2013 11:38, David Bellot wrote:


Dear Oswin,
my first point is really trivial. I do not doubt that the current expression template mechanism is not optimal. The only thing I propose is to maintain a small backward-compatibility layer so that code that works with the old uBLAS is not unnecessarily broken.
For example, if you assume that the novel syntax is C += A*B, I believe that it could be useful to define some wrapper between the old and new syntax. The easiest (dirty) way would be to define two macros, "noalias" and "prod", so that noalias(C) += prod(A,B) expands to the new format... indeed, using macros is not a good idea, but it gets the point across.
My second point is mostly about matrices of size, say, 30*30, whose size is NOT known at compile time. For such matrix sizes OpenMP is not useful; however, one may perform a lot of operations on different small matrices in parallel, parallelizing some sort of outer iteration.
For this outer parallelization to be useful, one needs guarantees that OpenMP is not used in the inner loop... otherwise problems may arise with nested parallelism.
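The macro idea above, spelled out as a sketch (dirty on purpose, as noted; a real compatibility layer would more likely use inline forwarding functions, since macros ignore scope and namespaces):

```cpp
#include <cassert>

// Sketch of the proposed compatibility macros: if a new uBLAS adopted
// the C += A * B syntax, old-style code could keep compiling via
//     noalias(C) += prod(A, B);
// which the macros expand to
//     (C) += ((A) * (B));
#define noalias(X) (X)          // the new syntax needs no alias marker
#define prod(A, B) ((A) * (B))  // old prod() maps onto the new operator*
```

With plain arithmetic types this already demonstrates the expansion; with matrix types the same expansion would hand the product to whatever operator* the new library defines.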
...i only implied this...
ciao
Riccardo


Hi all,
I'm very new here and don't know how much value I can add to the discussion! But if a rewrite clearly wins over the current implementation, then it could very well be a good choice! Python did the same when they transitioned to Python 3.0; they had no other way to remove some fundamental design flaws in Python 2.x.


Hi,
I agree with you, and I think a rewrite wouldn't change the
"high-level" syntax. I like the way expressions are formed right
now. There is no real way to get rid of noalias, as there is no
really cheap solution for alias tests. And honestly, I like prod()
over op*, because prod() screams in your face "this is going to take
a while".
for this outer parallelization to be useful, one needs
guarantees that openmp is not used in the inner loop...
otherwise problems may arise with nested parallelism.
I don't see the issue here. OpenMP has, in most configurations, a
fixed-size thread pool and would never spawn more threads than
allowed. Furthermore, an OpenMP thread is allowed to spawn more
threads.
The real issue is that OpenMP does not, and will never, support C++.
There is OpenMP support in gcc, but in fact there is no guarantee
that it works, as the C++11 memory model is not compatible with
the OpenMP model. AFAIK there are no plans for a C++11-compatible
OpenMP standard. Therefore we can't expect that there will ever be
OpenMP support in clang or other compilers, and I really think
that uBLAS would need to use proper threading models, for example
an asio-style thread pool.
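A minimal C++11-only sketch of such an alternative (hypothetical code; a real library would keep a persistent pool rather than spawning threads per call, which is what makes a reusable asio-style pool attractive):

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Minimal C++11 chunked parallel-for built on std::thread, as a
// stand-in for an OpenMP "parallel for" (illustrative only).
// body(lo, hi) processes the half-open index range [lo, hi).
inline void parallel_for(std::size_t n, std::size_t nthreads,
                         std::function<void(std::size_t, std::size_t)> body) {
    std::vector<std::thread> workers;
    std::size_t chunk = (n + nthreads - 1) / nthreads;
    for (std::size_t t = 0; t < nthreads; ++t) {
        std::size_t lo = t * chunk;
        std::size_t hi = lo + chunk < n ? lo + chunk : n;
        if (lo >= hi) break;
        workers.emplace_back(body, lo, hi);  // one thread per chunk
    }
    for (auto& w : workers) w.join();        // barrier, like omp's implicit one
}
```

Unlike OpenMP, this composes predictably with nesting: a worker calling parallel_for again just spawns its own explicit threads, with no hidden runtime state.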
On 07.12.2013 23:35, Riccardo Rossi wrote:


Hi,
I would like to suggest:
- the ability to use TBB as a threading model. OpenMP is basically a FORTRAN tool; it does not integrate well with other threading models, and OpenMP at the library level is going to become a maintenance nightmare.
- the creation of solvers and, more generally, LAPACK wrappers as classes wrapping the LAPACK calls (i.e. something like the RogueWave libraries provided).
- also, especially for the LAPACK Fortran-style matrices, the incorporation of BLAS calls as member functions. So, instead of axpy-type calls, the use of operator= and operator+= overloading with the specific template expressions. This cannot be generic, i.e. for all kinds of matrices, but it is nevertheless still very useful for the ones for which BLAS should be called.
- traits classes that would provide for whether the matrix is dense, banded, etc., whether it is row- or column-major, what the LAPACK leading dimension is, and so on.
Thank you for your consideration,
Petros P. Mamales
Petros P. Mamales
Sent: Sunday, December 08, 2013 4:18 AM
Subject: Re: [ublas] Status of development / Benchmarks


Hi,
On 08.12.2013 22:04, Petros wrote:
- traits classes that would provide for whether the matrix is dense, banded etc., row/column major, what the LAPACK leading dimension is
This is already there. Given a matrix M, we have the following:
M::storage_category is
- dense_tag if the matrix has a dense storage format
- sparse_tag if the matrix has one of the many sparse formats
- packed_tag if the matrix has packed (packed, triangular) storage
- unknown_tag for everything else (i.e. A+B has unknown_tag even if A and
B are dense)
There are a few more tags regarding proxies, but these are the most
important ones.
M::orientation_category is either row_major_tag or column_major_tag;
if the storage_category is not unknown, this gives the
orientation of the storage in memory.
The leading dimension is a bit tricky, but there are functions in
detail/raw.hpp.
But maybe you should take a look at the numeric bindings package. I
think they have complete LAPACK bindings and the traits already
implemented.
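The dispatch pattern these tags enable looks roughly like this (a self-contained sketch: the matrix types are stand-ins, and only the tag names mirror uBLAS's storage_category convention):

```cpp
#include <cassert>
#include <string>

// Tag-dispatch sketch mirroring the storage_category mechanism
// described above (types here are stand-ins, not uBLAS's own).
struct dense_tag {};
struct sparse_tag {};

struct my_dense_matrix  { using storage_category = dense_tag; };
struct my_sparse_matrix { using storage_category = sparse_tag; };

// An algorithm selects an implementation at compile time by tag:
inline std::string kernel_name(dense_tag)  { return "dense kernel"; }
inline std::string kernel_name(sparse_tag) { return "sparse kernel"; }

template <class M>
std::string pick_kernel(M const&) {
    // Overload resolution on a default-constructed tag does the dispatch.
    return kernel_name(typename M::storage_category());
}
```

This is the same mechanism that lets one generic prod() front-end hand dense/dense and sparse/sparse operands to different kernels.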


Hey David,
the whole problem of most numerical packages, IMHO, is that everything
is tied together. I would encourage a very loosely coupled system, i.e.
a system that maybe even would be able to switch storage layout,
algorithms, etc. at runtime, maybe following some simple numerical
tests. Of course, only if this were enabled at compile time.
Data storage and manipulation should be fully loosely coupled, from low
abstraction to high abstraction:
* the storage of data. This is about memory efficiency, and hence speed
of computation. Storage engines might be linear, sparse, etc.
* the functional shape of the data: dense, triangular, etc.
* the numerical properties of the data: positive definite, etc.
* loading and saving the data. Why not support a whole bunch of
data formats?
* a unified matrix/vector view on the data
* procedural manipulation of the data, shallow views on the data, views
on views, etc.
Algorithms should be loosely coupled, too, from low abstraction to high
abstraction:
* execution / memory models (synchronous, asynchronous, single-core,
multi-core, shared memory, distributed memory, GPU memory)
* computational kernels (own CPU-optimized C++ implementations,
OpenBLAS, ATLAS, GotoBLAS2, LAPACK, MUMPS, OpenCL, CUDA, FFT, etc.)
* expression trees and the syntax of expression trees (Boost.Proto stuff)
* optimal assignment of computational kernels to expression trees (maybe
partly at runtime!)
* being able to extend the expression tree machinery (e.g., with hooks)
with one's own algorithms
Maybe a new library name would be appropriate?
Cheers,
Rutger
On 2013-12-07 11:38, David Bellot wrote:
> Hi,
>
> it has been a long discussion we all had for many months now. Should we
> rewrite ublas from scratch or simply improve it?
> Joaquim and Oswin want a brand new ublas.
> Nasos was more in favor of improving it.
>
> I personally find very exciting the idea of writing something new, but
> ublas is very well known now. On the other hand, Eigen and Armadillo
> took the crown of the main C++ blas library in users' hearts.
>
> On my todo list for ublas, there are things that will require ublas to
> be deeply changed. At this stage, we can almost talk about a new library.
>
> Christmas is very close now, so maybe it's a good time to start talking
> about the features we wish for ublas and see if they can be implemented
> with the current version or if a new uBLAS 2.0 is necessary.
> After all, Boost::signal did the same a few years ago. We can
> definitely make the transition.
>
>
> I begin:
>
> - unified representation of vectors and matrices, to represent the fact
> that a vector IS a matrix. Matlab does the same
> - automated use of different algorithms to let the compiler "choose" the
> best implementation (if possible) and switch on SSE, distributed, or
> whatever we need
> - implementation of solvers and decomposition algorithms
>
> and this is what Nasos and I think should be integrated too:
> 1. Matrix multiplication
> 2. Algorithms infrastructure (so that we can have real useful features)
> 3. Matrix/vector views for interoperability <- I think this is ultra
> critical because now ublas is monolithic in the sense that you have to
> use it everywhere you manipulate data. This would really help into
> letting people for example have a list of vectors (they are plotting)
> and ublas working on top of that to do for example transformations
> 4. NEW DOCUMENTATION - examples and the rest
> 5. Incorporate some critical bindings (i.e. mumps bindings which is
> currently probably the most efficient smp and distributed open source
> linalg solver)
> 6. matlab binding?
> 7. distributed ublas
>
>
> Please add and ESPECIALLY, please tell me your view on the current
> infrastructure of uBLAS. It seems many people are not happy with the
> current "expression template" grammar.
>
> I'm open to everything
>
> Best,
> David
>
>


On 2013-12-09 11:18, sguazt wrote:
>
> +1 for a library with loosely-coupled features, even if this means a
> complete rewrite of uBLAS
>
> For the integration of computational kernels, why don't we use the
> boost-numeric_bindings lib?
>
> Also, I'd like to have more MATLAB-like functionality (I've
> implemented some of it in boost-ublasx, but it relies on
> boost-numeric_bindings and LAPACK functions)
>
>
If there is enough interest to integrate (parts of) the numeric bindings
library, I'm willing to help.
Cheers,
Rutger


I would just like to make a few remarks here. I have also integrated
some of the ideas that people have posted to the list so far.
On 12/07/2013 05:38 AM, David Bellot wrote:
This is not correct. I only suggested that in order to be able to
meet the GSOC deadlines.
Since I helped compile this list, I will just point out some more
details on the basic features I believe uBLAS 2.0 should focus on:
1. Expression templates mechanism rewrite: probably go with
boost::proto.
2. Unified matrix and vector classes. We could even make this a tensor
superclass, which I would prefer in the end. The critical issue would
be to design a graceful traits model (maybe a traits object that
bundles together all relevant information, rather than the bunch of
separate properties the current traits model supports).
3. Separation of data storage from matrix/vector/tensor classes to
break the current monolithic model and enable interoperability with
other libraries (i.e. graphics, storage, GUI, algebra, etc.). A model
like the one used in boost::gil is very appealing to me. With that in
place it would, for example, be ultra easy to write a
ublas/Octave/MATLAB wrapper.
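To make point 3 concrete, here is a minimal sketch of a non-owning view that lets algorithms operate on memory another library already owns. The name matrix_view and its interface are my own assumptions for illustration, not an agreed uBLAS 2.0 design:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning, row-major view over a foreign buffer
// (owned by a GUI, plotting, or Octave/MATLAB wrapper layer).
template <typename T>
class matrix_view {
public:
    matrix_view(T* data, std::size_t rows, std::size_t cols)
        : data_(data), rows_(rows), cols_(cols) {}

    // Element access computes the offset into the foreign buffer.
    T& operator()(std::size_t i, std::size_t j) { return data_[i * cols_ + j]; }
    T  operator()(std::size_t i, std::size_t j) const { return data_[i * cols_ + j]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    T* data_;                  // not owned: no copy, no lifetime management
    std::size_t rows_, cols_;
};
```

An algorithm written against such a view would transform, say, a std::vector held by plotting code in place, without uBLAS having to own the data.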
4. Parallel implementation: the C++11 threading model can be adopted
instead of OpenMP. It will be a little more involved to set up,
and it may require some state in the form of a singleton. To avoid
this we will need some sort of interface for initializing thread
pools and providing algorithms with the thread data.
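A minimal sketch of what the C++11 threading model in point 4 could look like, here splitting an axpy (y += a*x) across std::thread workers. The function name and interface are assumptions; a real design would reuse a persistent thread pool rather than spawning threads per call:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Partition y += a*x across nthreads workers over disjoint ranges,
// so no locking is needed.
void parallel_axpy(double a, const std::vector<double>& x,
                   std::vector<double>& y, unsigned nthreads)
{
    std::vector<std::thread> workers;
    const std::size_t chunk = (y.size() + nthreads - 1) / nthreads;
    for (unsigned t = 0; t < nthreads; ++t) {
        const std::size_t lo = t * chunk;
        const std::size_t hi = std::min(y.size(), lo + chunk);
        workers.emplace_back([=, &x, &y] {
            for (std::size_t i = lo; i < hi; ++i)
                y[i] += a * x[i];
        });
    }
    for (auto& w : workers)
        w.join();
}
```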
5. Distributed uBLAS. I consider this critical, since it seems that
large computations will move to clusters in the future (especially
with the availability of all those inexpensive cloud computational
cluster services).
6. Algorithms infrastructure (from fast BLAS to general purpose
algebra and beyond): I think this is very critical.
7. DENSE MATRIX MULTIPLICATION: We already have a good
implementation from GSOC, that needs some face lifting before it
gets into production.
8. Bindings (Migration of the current ones and extension to other
popular packages like MUMPS, PARDISO and other solvers)
9. Enable syntax similar to MATLAB, or even extend it with a
summation convention (Einstein notation). Under such a model,
algorithms would be ultra easy to write (although in many cases not
optimal with respect to the CPU architecture). Imagine for example
the multiplication:
ublas2::index i, j, k;
A(i,j) = B(i,k)*C(k,j); // automatic looping over i,j and summation
over the repeated 'k' index
and that would be it!
// note: indexes can be reused since they are merely placeholders
ublas2::index m(1,10);
A(m,j) = B(m+20, j)*C(m,j);
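For orientation, this is roughly what the index notation above would lower to: looping over the free indices i and j and summing over the repeated index k. The row-major std::vector storage and the function name are assumptions for illustration; the real mechanism would generate such loops from expression templates:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// What A(i,j) = B(i,k)*C(k,j) expands to: B is n x m, C is m x p,
// A is n x p, all row-major.
void einsum_matmul(const std::vector<double>& B, const std::vector<double>& C,
                   std::vector<double>& A,
                   std::size_t n, std::size_t m, std::size_t p)
{
    for (std::size_t i = 0; i < n; ++i)          // free index i
        for (std::size_t j = 0; j < p; ++j) {    // free index j
            double sum = 0.0;
            for (std::size_t k = 0; k < m; ++k)  // summed (repeated) index k
                sum += B[i * m + k] * C[k * p + j];
            A[i * p + j] = sum;
        }
}
```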
For such a plan the critical path would be:
0a. Decide what type of data uBLAS will be operating on and start
with the traits mechanism. Because all algorithm dispatching will be
based on it, it should be carefully designed. Of course, a very
first trivial interface would be to build on top of the current one:
typedef ublas2::matrix<double> my_matrix;
typedef my_matrix::traits traits;
with
template <typename Storage, typename Orientation, std::size_t Order>
struct traits {
    typedef Storage storage_category;
    typedef Orientation orientation_category;
    static constexpr std::size_t tensor_order = Order;
};
0b. Make sure the traits model is easily extendable.
1a. Dense tensor_view class and appropriate template specializations
for the matrix and vector classes that people are more accustomed
to. Do not forget static containers!
1b. Sparse tensor_view and template specializations for matrix and
vector classes, if applicable. Otherwise implement only specialized
sparse containers.
1c. tensor, matrix, vector classes (these own data instead of being
views; probably wrap around views).
2a. Development of the new expression template mechanism and
introduction of a computational-kernel concept, alongside a
threaded implementation.
2b. Range and/or summation convention (Einstein notation) mechanism.
3. Everything else.
Ideas/thoughts are welcomed.
Nasos


All of Riccardo's remarks have my vote.
On 12/07/2013 01:50 PM, Riccardo Rossi wrote:
Dear David,
all of the things you point out are nice ... if I could add
something to the wish list I would like to add a few items:
1 - please maintain some sort of backward compatibility for
people who do not use the internals ... functions such as noalias,
prod, inner_prod etc., while not strictly needed, do no harm,
and maintaining them would make people's lives easier in
the migration ...
2 - clearly separate where OpenMP is going to be used (for
example sparse and large dense matrices) and where not ("tiny"
matrices, both on the stack and the heap). This is important, as one
needs to avoid nested OpenMP parallelism.
3 - it would be nice to have more C interoperability with
the internals of sparse matrices. This would make it much easier
to write wrappers.
4 - consider PaStiX as a direct solver apart from MUMPS. It
is really good and actively developed ...
good luck with your effort
ciao
Riccardo


Oswin,
what do you have in mind for 2.2?
On 12/07/2013 04:10 PM, oswin krause wrote:
Hi,
I like your ideas, but I am not sure what you mean by 1. and 2.
The problem I see is that uBLAS right now doesn't offer the
desired quality and speed for single-core, let alone multi-core.
Therefore I will focus on the internal infrastructure instead of
motivating new algorithms.
short description:
1. Rock-solid implementation of the BLAS algorithms ("compute
kernels")
2. Rewrite of the expression mechanism
2.1 automatic use of the compute kernels for the operations
2.2 automatic optimisation of expressions
long description:
1. Rock-solid implementation of the BLAS algorithms: matrix x
vector and matrix x matrix in all useful combinations
(sparse/sparse, dense/dense, dense/sparse,
triangular/dense, etc.; there are a lot), but also matrix
assignment. uBLAS will not be compared on number of features but on
runtime against Armadillo/Eigen/ATLAS/GotoBLAS. Having a
fast implementation of this is also crucial for multi-core and
distributed computations. All higher-level decompositions
(Cholesky, LU, SVD, rank decompositions, ...) and triangular
solvers rely on fast BLAS as well.
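As a rough illustration of the kind of kernel point 1 asks for, here is a cache-blocked dense multiply. The block size and square row-major layout are assumptions kept simple for illustration; a production kernel would add packing, SIMD micro-kernels, and per-architecture tuning:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// C += A * B for square n x n row-major matrices, processed block by
// block so that the working set of each inner loop stays in cache.
void gemm_blocked(const std::vector<double>& A, const std::vector<double>& B,
                  std::vector<double>& C, std::size_t n, std::size_t bs = 64)
{
    for (std::size_t ii = 0; ii < n; ii += bs)
        for (std::size_t kk = 0; kk < n; kk += bs)
            for (std::size_t jj = 0; jj < n; jj += bs)
                for (std::size_t i = ii; i < std::min(n, ii + bs); ++i)
                    for (std::size_t k = kk; k < std::min(n, kk + bs); ++k) {
                        const double a = A[i * n + k];  // reused across the j loop
                        for (std::size_t j = jj; j < std::min(n, jj + bs); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```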
2. Rewrite of the expression mechanism. The way uBLAS is
implemented right now, writing
noalias(A) += prod(B,C) + D;
or
vec = prod(prod<matrix>(B,C), vec2);
is in most cases a bad decision. The second expression wouldn't
even compile without the template parameter.
What I would like to have is the following:
2.1 automatic use of the fast algorithms. That is, the above
expression should be translated to
axpy_prod(B, C, A, false);
noalias(A) += D;
Right now the matrix_assign algorithms are used, which do a
terrible job on prod(B,C).
2.2 automatic optimization of expressions. The second expression
above is really inefficient: first the matrix-matrix prod is
evaluated, and the result is afterwards used for a single
matrix-vector prod.
Now compare it to:
prod(B, prod<vector>(C, vec2));
Here only two matrix-vector products are performed, which might
save 99% of the FLOPs of the first expression. So it would be
better if this transformation happened automatically. Normally the
user could do it himself, but sometimes that is not possible.
Consider a conjugate gradient solver:
x = CG(A, b);
A usual pattern for A is A = X*X^T + c*Identity. CG internally uses
only matrix-vector products, and possibly not many of them.
Therefore you don't want to compute A, because that is (a) slow and
(b) might require more memory than is available. Without expression
optimizations it would be impossible to write a good CG that could
exploit this.
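The CG pattern Oswin describes can be sketched directly: applying A = X*X^T + c*I to a vector needs only two matrix-vector products (t = X^T v, then y = X t + c v), so A never has to be formed. The row-major std::vector storage and the function name are my assumptions for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Apply y = (X X^T + c I) v without forming X X^T. X is n x m,
// row-major; X X^T would be n x n and is never materialized.
std::vector<double> apply_A(const std::vector<double>& X,
                            std::size_t n, std::size_t m,
                            double c, const std::vector<double>& v)
{
    std::vector<double> t(m, 0.0);           // t = X^T v
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < m; ++j)
            t[j] += X[i * m + j] * v[i];

    std::vector<double> y(n, 0.0);           // y = X t + c v
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < m; ++j)
            y[i] += X[i * m + j] * t[j];
        y[i] += c * v[i];
    }
    return y;
}
```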
Greetings,
Oswin
On 07.12.2013 11:38, David Bellot wrote:

Dr Athanasios Iliopoulos
Research Assistant Professor
George Mason University
Resident at Computational Multiphysics Systems Lab. Code 6394
Center of Computational Material Science
Naval Research Laboratory
4555 Overlook Ave. SW, Washington DC 20375
Tel : +1 202 767 2165
email: [hidden email]
URL: https://www.nrl.navy.mil/mstd/ccms/cmsl/

