#1 2023-06-16 04:21:32

archisman
Member
Registered: 2020-05-05
Posts: 9

blas-openblas results in slower matrix diagonalization in GNU Octave

I recently installed `blas-openblas` and `blas64-openblas` after seeing this news item: https://archlinux.org/news/openblas-032 … tervention

After that, matrix diagonalization in GNU Octave became significantly slower.

I found a workaround: installing https://aur.archlinux.org/packages/openblas-lapack fixes the issue (but it takes a while to compile :( ).

Does anyone know of another workaround? For example, would it help to install some package from the official repos?

Here is my GNU Octave code. It takes 12 seconds with `openblas-lapack` (it took almost the same time before installing `blas-openblas`), but about 19 seconds after installing `blas-openblas`.

```
ii = 1:2000;
a = sin(ii.^2 + ii');
a = (a + a')/2;

tic;[b,c]=eig(a);toc
```

Last edited by archisman (2023-06-16 04:22:43)

#2 2023-06-16 08:11:51

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,754

Re: blas-openblas results in slower matrix diagonalization in GNU Octave

Have you tried switching (back) to blas and lapack? However, the point of these changes should be that the package you link to is logically obsolete...

#3 2023-06-16 11:06:47

arojas
Developer
From: Spain
Registered: 2011-10-09
Posts: 2,101

Re: blas-openblas results in slower matrix diagonalization in GNU Octave

Which BLAS implementation were you using before?

#4 2023-06-16 13:16:05

archisman
Member
Registered: 2020-05-05
Posts: 9

Re: blas-openblas results in slower matrix diagonalization in GNU Octave

arojas wrote:

Which BLAS implementation were you using before?

AFAIK, I was using openblas before the upgrade, but I did have `lapack` (not `openblas-lapack`). Installing `blas-openblas` removed `lapack`.

I will switch to blas and lapack, and report back if that makes the calculation fast again.

#5 2023-07-30 14:05:36

ezacaria
Member
Registered: 2007-12-10
Posts: 113

Re: blas-openblas results in slower matrix diagonalization in GNU Octave

Even though I try to stay away from octave as much as possible, I got curious.

Here's the performance with blas-openblas from the repo (bad performance):

octave:1> version -blas
ans = OpenBLAS (config: OpenBLAS 0.3.23 DYNAMIC_ARCH NO_AFFINITY USE_OPENMP SkylakeX MAX_THREADS=64)
octave:2> ii = 1:2000; a = sin(ii.^2 + ii'); a = (a + a')/2;
octave:3> tic;[b,c]=eig(a);toc
Elapsed time is 24.6372 seconds.

I went back to the old packages as V1del suggested. The performance is much better and close to the 12s that archisman reported.

pacman -S blas cblas lapack
resolving dependencies...
looking for conflicting packages...
:: blas and blas-openblas are in conflict. Remove blas-openblas? [y/N] y

Packages (4) blas-openblas-0.3.23-3 [removal]  blas-3.11.0-2  cblas-3.11.0-2  lapack-3.11.0-2

octave:1> version -blas
ans = unknown or reference BLAS
octave:2> ii = 1:2000; a = sin(ii.^2 + ii'); a = (a + a')/2;
octave:3> tic;[b,c]=eig(a);toc
Elapsed time is 10.4412 seconds.
octave:4> 

And then I tried the openblas-lapack that archisman pointed out. Best performance so far:

pacman -U /tmp/openblas-lapack/openblas-lapack-0.3.23-1-x86_64.pkg.tar.zst
loading packages...
resolving dependencies...
looking for conflicting packages...
:: openblas-lapack and openblas are in conflict. Remove openblas? [y/N] y
:: openblas-lapack and blas are in conflict. Remove blas? [y/N] y
:: openblas-lapack and lapack are in conflict. Remove lapack? [y/N] y
:: openblas-lapack and cblas are in conflict. Remove cblas? [y/N] y
...

octave:1> version -blas
ans = OpenBLAS (config: OpenBLAS 0.3.23 NO_AFFINITY USE_OPENMP USE_TLS SKYLAKEX MAX_THREADS=16)
octave:2> ii = 1:2000; a = sin(ii.^2 + ii'); a = (a + a')/2;
octave:3> tic;[b,c]=eig(a);toc
Elapsed time is 5.05404 seconds.

That leaves us with two OpenBLAS 0.3.23 builds showing considerably different performance:

OpenBLAS (config: OpenBLAS 0.3.23 DYNAMIC_ARCH NO_AFFINITY USE_OPENMP SkylakeX MAX_THREADS=64) -- blas-openblas. 24s
OpenBLAS (config: OpenBLAS 0.3.23 NO_AFFINITY USE_OPENMP USE_TLS SKYLAKEX MAX_THREADS=16)      -- openblas-lapack. 5s
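
To make the difference easier to spot, here is a small shell sketch that prints the tokens found in only one of the two config strings. The function name `only_in` and the variable names are made up for illustration; the strings are copied from the Octave output above, and the comparison ignores case because one build reports "SkylakeX" and the other "SKYLAKEX".

```shell
# Config strings as reported by "version -blas" in Octave (copied from above).
repo_cfg="OpenBLAS 0.3.23 DYNAMIC_ARCH NO_AFFINITY USE_OPENMP SkylakeX MAX_THREADS=64"
aur_cfg="OpenBLAS 0.3.23 NO_AFFINITY USE_OPENMP USE_TLS SKYLAKEX MAX_THREADS=16"

# only_in A B: print each token of A that does not appear in B (case-insensitive).
# Hypothetical helper, not part of any package.
only_in() {
    a=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    b=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
    for tok in $a; do
        case " $b " in
            *" $tok "*) ;;                  # token present in both strings
            *) printf '%s\n' "$tok" ;;      # token unique to A
        esac
    done
}

echo "Only in blas-openblas (repo):"
only_in "$repo_cfg" "$aur_cfg"
echo "Only in openblas-lapack (AUR):"
only_in "$aur_cfg" "$repo_cfg"
```

For these two strings the differences come down to DYNAMIC_ARCH and MAX_THREADS=64 on the repo side versus USE_TLS and MAX_THREADS=16 on the AUR side.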

I found that "eig" might be implemented using LAPACK functions. This is true at least of numpy's eig.

And there is also this old discussion:

The idea of LAPACK is that performance comes from a few kernel functions. These kernel functions are the BLAS functions. Basically all real computation is done in BLAS. So BLAS is like a standardized engine. There are brands like OpenBLAS or ATLAS implementing this set of functions.

In many cases the most crucial function is the matrix-matrix multiply (dgemm). If dgemm is fast then LAPACK is fast. As also parallelization is only done in the BLAS kernel. LAPACK itself is not parallel. As opposed to PLASMA which is doing parallelization on a mathematically higher level. But I think PLASMA still can't do eigenvalue computations like (d)geev.

I am not familiar with the OpenBLAS configuration, though -- no clue if the DYNAMIC_ARCH or MAX_THREADS are relevant, or if the key is in something not shown in those config strings reported by octave.
Just in case, my local compilations are made with "-march=native -mtune=native -O3".
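
One way to probe whether threading plays a role: both builds report USE_OPENMP, so the thread count can be pinned with the standard OpenMP variable OMP_NUM_THREADS and the test repeated. The sketch below wraps that in a shell function. The names `run_bench`, `eig_test`, `gemm_test` and the `OCTAVE_BIN` override are made up for illustration. The second snippet times a plain matrix product, to check whether BLAS itself (dgemm) is slow or only the eigensolver path. This is a suggestion for further testing, not a result.

```shell
# Octave snippets: the eig test from this thread, and a plain matrix product
# that exercises BLAS (dgemm) without going through the eigensolver.
eig_test="ii = 1:2000; a = sin(ii.^2 + ii'); a = (a + a')/2; tic; [b,c] = eig(a); toc"
gemm_test="a = rand(2000); tic; b = a*a; toc"

# run_bench THREADS CODE: run CODE in Octave with the OpenMP thread count pinned.
# OCTAVE_BIN is a hypothetical override for the interpreter; it defaults to "octave".
run_bench() {
    OMP_NUM_THREADS="$1" "${OCTAVE_BIN:-octave}" --quiet --eval "$2"
}
```

For example, `for n in 1 4 16; do run_bench "$n" "$eig_test"; done` would show how the eig timing scales with the thread count, and the same loop with `"$gemm_test"` would show whether the matrix product behaves differently.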

#6 2024-01-24 09:56:16

archisman
Member
Registered: 2020-05-05
Posts: 9

Re: blas-openblas results in slower matrix diagonalization in GNU Octave

There is an ongoing effort (see the comments in https://aur.archlinux.org/packages/openblas-lapack) by part of the community to delete the `openblas-lapack` AUR package, which does not have this bug, while the official package does. Someone suggested filing a bug report on Arch Linux's GitLab so that the official package gets fixed. From what I understand, that page (https://gitlab.archlinux.org/archlinux/ … s/-/issues) is probably not the appropriate place to report bugs, and I don't have enough packaging knowledge to determine what causes this bug. How should I proceed?

Last edited by archisman (2024-01-24 09:56:46)

#7 2024-01-24 11:43:33

Lone_Wolf
Forum Moderator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,948

Re: blas-openblas results in slower matrix diagonalization in GNU Octave

The package appears to have been deleted and that usually means all comments are gone.

The location you linked to IS the correct place to file issues, but https://bugs.archlinux.org/task/78781 suggests the current blas-openblas repo package should have the same functionality as the AUR package.

Maybe you can test with current versions and start a new thread to discuss issues?


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.


(A works at time B)  && (time C > time B ) ≠  (A works at time C)

#8 2024-01-24 11:59:24

archisman
Member
Registered: 2020-05-05
Posts: 9

Re: blas-openblas results in slower matrix diagonalization in GNU Octave

Sorry, the link had an extra bracket. The correct link is https://aur.archlinux.org/packages/openblas-lapack

> Maybe you can test with current versions and start a new thread to discuss issues?

Did you mean that I should start a new thread on https://bbs.archlinux.org after testing the current version? Or should I report it on the Arch Linux GitLab?

#9 2024-01-24 13:13:11

Lone_Wolf
Forum Moderator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,948

Re: blas-openblas results in slower matrix diagonalization in GNU Octave

Bugs need investigation to be solved successfully. That investigation can be started on this forum (Creating & Modifying Packages or AUR Issues, Discussion & PKGBUILD Requests seem appropriate for this case) or in a bug report on GitLab.

An advantage of starting on this forum is that more people read and post here than would on a bug report.
Once the cause has been determined (or narrowed down substantially), the info from the thread can be used as the basis for a bug report.

Is there a reproducible test case that shows the difference between the repo and AUR versions?


#10 2024-01-24 22:17:48

ezacaria
Member
Registered: 2007-12-10
Posts: 113

Re: blas-openblas results in slower matrix diagonalization in GNU Octave

Lone_Wolf wrote:

Is there a reproducible test case that shows the difference between the repo and AUR versions?

Yes, the test used by archisman is good. It can be run in octave with a one-liner:

ii = 1:2000; a = sin(ii.^2 + ii'); a = (a + a')/2; tic;[b,c]=eig(a);toc

I just checked again. The performance is still better with the old blas+cblas+lapack than with the repo's blas-openblas (11 seconds vs 28 seconds). I also built the AUR package fresh, and its performance is still the best, at about 5 seconds. Thus, the situation seems to be the same as it was last July.

Maybe we can start hammering on the official PKGBUILD until we find which of the -D options has the most impact. The PKGBUILD of the AUR package has few options compared to the official one.

Edit: I tried to build the official package with different -D options but did not find anything that would yield the performance of the AUR package. I'm starting to think that the cmake files might set some different flags compared to the make-based build (which is what the AUR package uses).

Last edited by ezacaria (2024-01-25 00:27:31)

#11 2024-01-25 10:35:12

Lone_Wolf
Forum Moderator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,948

Re: blas-openblas results in slower matrix diagonalization in GNU Octave

The repo package builds two versions: one with 64-bit integers (-DINTERFACE64=1) and one with whatever the default is.

I think the octave test uses floating point numbers? Even if that's true, comparing the test with and without the 64-bit version installed seems a good idea.

The other big difference is that the AUR package overrides MAKEFLAGS (unset MAKEFLAGS) and the repo package doesn't.

Building the repo package (in a clean chroot) with that command added just before the cmake -B lines and re-running the test should clarify whether the MAKEFLAGS used by Arch Linux have an impact.

Note that cmake by default creates GNU Makefiles, so both packages do use make as build system.

(I see cmake as a configure system, not a build system).

Last edited by Lone_Wolf (2024-01-25 10:36:17)


#12 2024-01-25 11:00:40

ezacaria
Member
Registered: 2007-12-10
Posts: 113

Re: blas-openblas results in slower matrix diagonalization in GNU Octave

Thanks, Lone_Wolf!
I see it the same way (I forgot to mention the "unset" in my post, though). The "64" variant plays no role as far as archisman's test case is concerned; I had even deactivated it in the PKGBUILD to save some build time. But we should check carefully what happens to that package once we reach a conclusion.

Coming to that, I think it is time for archisman to give the build a shot and do some testing. I am a bit pressed for time these days, and not really using octave ;)

#13 2024-03-06 16:57:08

archisman
Member
Registered: 2020-05-05
Posts: 9

Re: blas-openblas results in slower matrix diagonalization in GNU Octave

I am sorry for the late reply.

I will be happy to do some testing. I am a bit unsure how to begin.

Should I take this PKGBUILD (https://gitlab.archlinux.org/archlinux/ … type=heads), add `unset MAKEFLAGS` in `build()` right before building, and then get back with the results?
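
To make sure I understand the suggestion, this is roughly what I have in mind. It is only a sketch: the cmake options and the source directory name below are placeholders (my assumptions), not the ones from the official PKGBUILD, which would stay as they are.

```shell
# Sketch of a PKGBUILD build() function (makepkg sources the PKGBUILD with bash).
build() {
    # Clear MAKEFLAGS inherited from makepkg.conf, mirroring the AUR package.
    unset MAKEFLAGS

    # Placeholder configure and build steps: the official PKGBUILD's own cmake
    # options and source path would be kept unchanged here.
    cmake -B build -S "$srcdir/OpenBLAS-$pkgver" -DCMAKE_BUILD_TYPE=Release
    cmake --build build
}
```

If that is right, I would build it in a clean chroot as suggested, rerun the eig test, and report the timings.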

Last edited by archisman (2024-03-06 16:58:03)

#14 2024-03-09 13:28:41

Lone_Wolf
Forum Moderator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,948

Re: blas-openblas results in slower matrix diagonalization in GNU Octave

Yes, that's the idea.

