Setting up the Phis is indeed an issue, especially because they are "locked" to icpc. OpenMP is working properly, though.
> The demo I posted does not use micro-kernels that exploit SSE, AVX or
> FMA instructions. With such kernels the matrix product is on par with Intel
> MKL, just like BLIS. For my platforms I wrote
> my own micro-kernels, but the interface of function ugemm is compatible
> with BLIS.

If you compile with -O3, I think you are getting near-optimal SSE vectorization. gcc is truly impressive, and Intel's compiler is even more so.