Open
Description
Hi!
In comparing OpenBLAS performance with Intel MKL I've noticed that (at least in my particular case: real or hermitian eigenvalue problem, e.g. ZHEEV) OpenBLAS is consuming too much more kernel times (red bars in htop) than Intel MKL and maybe this is why it is so slow (three to five times slower, depending on matrix size) compared to MKL. Does anybody know what is causing so much kernel threads/time and how to avoid it? I've already limited OPENBLAS_NUM_THREADS to 4 or 8... TIA.