I've been trying to get python-pytorch-lightning installed for the last week.
Before I can finish compiling everything, some dependency in its enormous tree has already been upgraded to a new version and the build fails. I literally can't compile fast enough to keep up.
Something's wrong somewhere in its dependency chain that causes it to pull in pretty much every machine learning package in the entire AUR. There are probably several cases of a dependency that should be marked optional but isn't. I don't know exactly where; maybe it's at the 'safetensors' level?
If I read the AUR correctly, pytorch-lightning depends on python-torchmetrics, which pulls in python-transformers, which pulls in python-safetensors, which then ends up pulling in both python-jax and python-tensorflow, which in turn pull in jax and tensorflow themselves, each of which requires a low-level binary backend written in C/Fortran/etc. Both of those are huge, error-prone compiles (like most ML software, they're written for NVIDIA first; if you have something different, you're on your own).
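(If you want to trace a chain like this yourself, pactree from the pacman-contrib package can print it, though only for packages that are already installed; a rough sketch:)

    pactree python-pytorch-lightning    # what it pulls in
    pactree -r python-safetensors       # what pulls it in (reverse)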
The dependency lists aren't even complete w.r.t. those tests: libhdf5 is missing from the list (it should be pulled in with tensorflow). More to the point, safetensors doesn't necessarily depend on those frameworks; the tests should be optional and only run if the optional dependency is installed, rather than the current behaviour of running all the tests regardless (which causes the build to fail without the optional dependencies). See https://github.com/huggingface/safetens … thon/tests
At the very least, I'm pretty sure a wrapper around PyTorch shouldn't also be pulling in two rival machine learning frameworks (TensorFlow and JAX). So some of these packages are mistakenly pulling in dependencies they shouldn't, quite apart from the Python tests failing at build time.
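To illustrate the kind of change I mean, here's a hypothetical check() that would skip the framework comparison tests when the framework isn't importable. The source path and the jax test filename are my guesses; tests/test_tf_comparison.py is the one from the error further down:

    check() {
      cd "$srcdir/safetensors-$pkgver/bindings/python"  # path is a guess
      local skip=()
      # only run a framework's tests when that framework is actually installed
      python -c 'import tensorflow' 2>/dev/null || skip+=(--ignore tests/test_tf_comparison.py)
      python -c 'import jax' 2>/dev/null || skip+=(--ignore tests/test_flax_comparison.py)  # filename assumed
      python -m pytest "${skip[@]}" tests
    }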
Halfway through trying to install it, a quick ncdu of my home folder shows... 1.1 million files, 35 GB of cache. It was 60 GB at one point during a previous attempt, but I've managed to partially install the web of dependencies and cut it down 'a little'. Anyhow, I'm pretty sure something's wrong. Maybe this is just a completely garbage piece of software, but pulling in even more unnecessary stuff that can fail to compile doesn't help in getting it to run.
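For reference, assuming yay's default paths, that cache lives under ~/.cache/yay and can be measured and cleaned like this:

    du -sh ~/.cache/yay    # how much the AUR helper has cached
    yay -Sc                # offers to clean pacman's cache and yay's AUR cache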
If it helps, I'm trying to use the 'rocm' versions of the low-level dependencies.
To reproduce, install arch linux on a fresh machine with a capable amd GPU. Install some AUR helper (you're gonna need it), then run e.g. 'yay -Sy python-pytorch-lightning'. Then suffer endless torture in dependency hell.
Question: if there are multiple packages in the PKGBUILD/.SRCINFO files, how do I tell makepkg which one to build when building manually? I've never had to do that with such a package; I've always just cargo-culted '-si'.
Note I recommend having at least a 1TB SSD to fit everything.
Edit: I finally got through the python-safetensors step, and I still can't compile things, because python-safetensors' tests actually manage to depend on different versions of the same library (NumPy) through their various code paths. In other words, it's literally impossible to install this package without diving into and modifying code.
ERROR tests/test_tf_comparison.py - AttributeError: `np.complex_` was removed in the NumPy 2.0 release. Use `np.complex128` instead.. Did you mean: 'complex64'?
=> Either find a way to run multiple versions of NumPy simultaneously, or get their tests fixed. I'll file a bug report upstream for that.
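As a local stopgap until that's fixed upstream (assuming the removed alias only appears in the test file named in the error), a one-liner in prepare() would do:

    # NumPy 2.0 removed the np.complex_ alias in favour of np.complex128
    sed -i 's/np\.complex_\b/np.complex128/g' tests/test_tf_comparison.py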
Last edited by Aphid_ARCH (2024-07-05 06:36:33)
--nocheck
Do not run the check() function in the PKGBUILD or handle the checkdepends.
Running the check() function while building is considered optional for AUR packages (it may be mandatory for a package maintainer who builds a repo package).
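For example, to build and install while skipping the test suite:

    makepkg -si --nocheck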
Question: If there are multiple packages in the .PKGBUILD and .SRCINFO files, how do I select which one makepkg should make?
PKGBUILDs & .SRCINFO files list dependencies only by name, say foo.
pacman searches for packages that provide foo like this:
1. Does any installed package provide foo?
   Yes → stop searching and go to End, as the requirement for foo has been fulfilled.
2. Read all configured repos listed in pacman.conf and process them in the listed order.
   Stop at the first package found that provides foo, install it, then go to End.
3. No package has been found that provides foo:
   create an error message, go to End.
End: report the result to the user.
That shows two options for influencing what is used: install it before building, or list the repo you want it from above other repos that also provide foo.
Example: when using testing repos, they must be listed above their non-testing counterparts to work as intended.
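A sketch of that ordering in /etc/pacman.conf (repo names as on current Arch; it was [testing] before the 2023 repo rename):

    [core-testing]
    Include = /etc/pacman.d/mirrorlist

    [core]
    Include = /etc/pacman.d/mirrorlist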
Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.
Clean chroot building not flexible enough?
Try clean chroot manager by graysky
Maybe I should give an explicit example:
https://github.com/rocm-arch/tensorflow … r/PKGBUILD
Here's one with four options: tensorflow-rocm, tensorflow-opt-rocm, python-tensorflow-rocm, python-tensorflow-opt-rocm.
When I use yay I can select the fourth option (and it fails to compile because it's horribly out of date).
If I manually fix the PKGBUILD and do makepkg -si, it always does option (1), which is not what I want.
makepkg -si

The issue is -i installing all the built packages. For your use case you only want the -opt packages; you don't even want to build the non-opt ones. This PKGBUILD contains support for doing that: when editing the PKGBUILD, change `_build_no_opt=1` to `_build_no_opt=0`. In general, you can just build the packages without installing them by dropping the -i option, then install only the packages you want.
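Concretely, that last approach looks like this (package file names depend on pkgver and arch, so the globs are illustrative):

    makepkg -s    # build all the split packages, install none
    sudo pacman -U tensorflow-opt-rocm-*.pkg.tar.zst python-tensorflow-opt-rocm-*.pkg.tar.zst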
Last edited by loqs (2024-07-05 18:42:56)
Using --nocheck let me install just the safetensors Python package without any of the ML frameworks, which allowed me to get further. It didn't end up being of any use in the end, though.
Of course, now I run into the problem that Arch's rocm packages are apparently super broken: when used, they corrupt the GPU's memory into random garbage, which causes either a complete system meltdown or a GPU reset and crash.
I've spent yesterday and today trying to compile it myself, but this is just way beyond what I can do. The official instructions are full of utter crap: downloading random Google tools, linking up with Google accounts, assuming you're running a certain version of Ubuntu, downloading a bunch of binaries just for the build, using tools that don't include a bisect option (for something this unstable...), depending on a tonne of Python shit it clearly doesn't and shouldn't need just to compile, chown-ing my whole home directory to my UID (this doesn't matter in my case, but it could very well matter in someone else's), and then, to top it all off, utterly unhelpful generic 'ERROR 1' messages. It's nice that they have a build script... if only it weren't so crappy.
Honestly, it's no wonder rocm is on hold and has been broken for months; I don't blame your maintainers. I should probably return this GPU; it belongs on the scrap heap.