Hi,
after upgrading from `linux 6.18.13.arch1-1` to any later version, my eGPU setup stops working.
Hot-plugging the card and trying to load the `nvidia` modules (`modprobe nvidia-drm`) results in this error in `dmesg`:
NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:3d:00.0)
Earlier in the journal I can see these messages:
kernel: pci 0000:3d:00.0: BAR 1 [mem size 0x10000000 64bit pref]: can't assign; no space
kernel: pci 0000:3d:00.0: BAR 1 [mem size 0x10000000 64bit pref]: failed to assign
Downgrading back to `linux` and `linux-headers` `6.18.13.arch1-1`, as well as `nvidia-open` and `-utils` `590.48.01` (latest releases of each), returns the lost functionality, and the journal has slightly different messages related to BAR 1:
pci 0000:3d:00.0: BAR 1 [mem 0x00000000-0x0fffffff 64bit pref]
pci 0000:3d:00.0: BAR 1 [mem 0x6020000000-0x602fffffff 64bit pref]: assigned
nvidia 0000:3d:00.0: BAR 1 [mem 0x6020000000-0x602fffffff 64bit pref]: releasing
nvidia 0000:3d:00.0: BAR 1 [mem size 0x200000000 64bit pref]: can't assign; no space
nvidia 0000:3d:00.0: BAR 1 [mem size 0x200000000 64bit pref]: failed to assign
... scales down to 0x20000000 ...
nvidia 0000:3d:00.0: BAR 1 [mem size 0x20000000 64bit pref]: can't assign; no space
nvidia 0000:3d:00.0: BAR 1 [mem size 0x20000000 64bit pref]: failed to assign
nvidia 0000:3d:00.0: BAR 1 [mem 0x6020000000-0x602fffffff 64bit pref]: assigned
and then it loads the driver just fine. I'm on a Thinkpad X1 Extreme, where "Resizable BAR" is not a visible BIOS option, and when I tried to set it via `setup_var.efi`, the BIOS refused with a `READONLY variable` (or similar) message (though I'm not 100% sure I did that right, I was following this guide).
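For anyone reading the logs above, a small sketch (Python, not part of the original report) that converts the BAR sizes in such kernel lines to MiB. The helper name `bar_size_mib` and the two regular expressions are made up for illustration and only cover the message shapes quoted in this post.

```python
import re

# Matches a requested size, e.g.
# "BAR 1 [mem size 0x10000000 64bit pref]: can't assign; no space"
SIZE_RE = re.compile(r"BAR (\d+) \[mem size (0x[0-9a-fA-F]+)")
# Matches an address range, e.g.
# "BAR 1 [mem 0x6020000000-0x602fffffff 64bit pref]: assigned"
RANGE_RE = re.compile(r"BAR (\d+) \[mem (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)")

MIB = 1 << 20


def bar_size_mib(line):
    """Return (bar_index, size_in_MiB) for a BAR log line, or None (hypothetical helper)."""
    m = SIZE_RE.search(line)
    if m:
        return int(m.group(1)), int(m.group(2), 16) // MIB
    m = RANGE_RE.search(line)
    if m:
        start, end = int(m.group(2), 16), int(m.group(3), 16)
        # Ranges in the log are inclusive, hence the +1.
        return int(m.group(1)), (end - start + 1) // MIB
    return None


# Lines copied from the journal excerpts in this post.
journal = [
    "pci 0000:3d:00.0: BAR 1 [mem size 0x10000000 64bit pref]: can't assign; no space",
    "nvidia 0000:3d:00.0: BAR 1 [mem size 0x200000000 64bit pref]: can't assign; no space",
    "nvidia 0000:3d:00.0: BAR 1 [mem size 0x20000000 64bit pref]: can't assign; no space",
    "nvidia 0000:3d:00.0: BAR 1 [mem 0x6020000000-0x602fffffff 64bit pref]: assigned",
]
for line in journal:
    print(bar_size_mib(line), "<-", line)
```

Read this way, the working kernel shows the driver asking for an 8 GiB BAR 1, stepping down to 512 MiB, and finally keeping the original 256 MiB window, whereas the newer kernels cannot place even the 256 MiB request.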
Is there something that changed in the kernel between 6.18 and 6.19 that affected how BAR is allocated? Maybe some build flag?
Anyway, I doubt tinkering with my BIOS in the UEFI shell is the right way to approach this. Thanks in advance for suggestions/help!
Kernel journal with the newer kernels (in this case 7.0.2, but identical to 6.19): https://pastebin.com/TUSHwXiu
Kernel journal with the older kernel: https://pastebin.com/UeQkPwNL
Kernel cmdline (I tried adding `pci=nocrs` and remove all the nvidia blacklist stuff, no change): `loglevel=3 quiet rw apparmor=1 lsm=landlock,yama,integrity,apparmor,bpf audit=1 audit_backlog_limit=8192 pcie_port_pm=off pci=realloc modprobe.blacklist=nouveau,nvidia,nvidia_drm,nvidia_uvm,nvidia_modeset`
The earliest date I can try with a Fedora machine, to find out if it's just an Arch issue, is the 3rd of May, sorry.
P.S. the eGPU is at 0000:3d:00.0, whereas 0000:01:00.0 is the dedicated GPU in the laptop which is too old to work with nvidia-open, therefore it fails to load. That's intentional
Offline
There's an i915, a GeForce GTX 1050 Ti Mobile and the GeForce RTX 4060 ADA chip
The latter two are supported by https://aur.archlinux.org/packages?O=0&K=580xx - can you use the system (both GPUs) w/ those drivers?
Alternatively does it help to hide the 1050?
"pci_stub.ids=10de:1c8c,10de:0fb9", https://wiki.archlinux.org/title/Kernel_parameters
Online
Do I need to hide the 1050? I have a udev rule that disables it:
# Remove NVIDIA USB xHCI Host Controller devices, if present
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{device}=="0x1c8c", ATTR{power/control}="auto", ATTR{remove}="1"
It's commented out right now, but I remember trying to upgrade to 6.19 when I still had it enabled, and the same exact BAR problem was there.
(Didn't know about the stub ids kernel param, though. I'll use that instead for disabling, thanks)
Last edited by kofola (2026-05-01 11:12:04)
Offline
The idea was just that the flailing driver impacts the situation, and for that it has to completely disappear.
If you don't want to use the dGPU and only the eGPU, the best approach would actually be to disable it in the firmware (which might implicitly take care of the BAR issue).
Online
That makes sense, but unfortunately my BIOS doesn't let me disable the dGPU. There are two options: Dedicated graphics only or Hybrid graphics. In addition, there's an option for the amount of allocated VRAM (or something like that) which lets me choose between 256M and 512M, neither of which does anything.
Still, my main question remains what changed between kernel 6.18 and 6.19 for this to stop working. I'm not versed at all in kernel compilation and this commit changes a lot of things, one of which may theoretically be related to my issue.
Last edited by kofola (2026-05-01 11:30:53)
Offline
one of which may theoretically be related to my issue
I'd assume this to be far more related to the kernel version than some config key.
Does the LTS kernel (6.18.25) still work?
Online
Oh.
I thought the lts kernel was 6.19 and didn't even bother trying.. I'll test it when I'm home
Offline
Nope, same thing on 6.18.26-2:
May 03 18:36:39 kernel: pci 0000:3d:00.0: BAR 1 [mem size 0x10000000 64bit pref]: can't assign; no space
May 03 18:36:39 kernel: pci 0000:3d:00.0: BAR 1 [mem size 0x10000000 64bit pref]: failed to assign
...
May 03 18:38:33 kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 508
May 03 18:38:33 kernel: NVRM: GPU 0000:01:00.0 is already bound to pci-stub.
May 03 18:38:33 kernel: NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR1 is 0M @ 0x0 (PCI:0000:3d:00.0)
May 03 18:38:33 kernel: nvidia 0000:3d:00.0: probe with driver nvidia failed with error -1
May 03 18:38:33 kernel: NVRM: The NVIDIA probe routine was not called for 1 device(s).
May 03 18:38:33 kernel: NVRM: This can occur when another driver was loaded and
NVRM: obtained ownership of the NVIDIA device(s).
May 03 18:38:33 kernel: NVRM: Try unloading the conflicting kernel module (and/or
NVRM: reconfigure your kernel without the conflicting
NVRM: driver(s)), then try loading the NVIDIA kernel module
NVRM: again.
May 03 18:38:33 kernel: NVRM: The NVIDIA probe routine failed for 1 device(s).
May 03 18:38:33 kernel: NVRM: None of the NVIDIA devices were initialized.
May 03 18:38:33 kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 508
Offline
https://archive.archlinux.org/packages/ … kg.tar.zst - you'll find all LTS kernels there, maybe we can isolate the breaking version (if 6.18.16 works, don't continue w/ 6.18.17 but 6.18.21, ie. bisect the search to narrow down on the breaking version)
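The "don't test the next version, jump to the middle" idea can be sketched as below. This is only an illustration: the version list is a placeholder range, `first_bad` is a made-up helper, and in practice `is_good` means installing that package, rebooting and hot-plugging the eGPU by hand.

```python
def first_bad(versions, is_good):
    """Return the first version for which is_good() is False.

    Assumes versions are ordered oldest to newest, versions[0] is known
    good, versions[-1] is known bad, and there is exactly one
    good-to-bad transition in between.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            lo = mid
        else:
            hi = mid
    return versions[hi]


# Placeholder list of LTS point releases; the real set is whatever the archive holds.
versions = [f"6.18.{n}" for n in range(13, 27)]

tested = []


def is_good(version):
    """Stand-in for a manual test; pretends, purely for illustration, that .14 broke things."""
    tested.append(version)
    return int(version.rsplit(".", 1)[1]) < 14


print("first bad:", first_bad(versions, is_good))
print("kernels actually tested:", tested)
```

With 14 candidate versions this needs only three test boots instead of up to thirteen, which is the whole point of bisecting the prebuilt packages first.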
Online
ngl I was hoping for a more profound solution
Offline
https://wiki.archlinux.org/title/Bisect … s_with_Git
But obviously bisecting the prebuilt kernels first will be much faster, and w/ a little luck there are only two dozen commits in the breaking update and one looks incredibly suspicious, because the answer to
"Still, my main question remains what changed between kernel 6.18 and 6.19"
is about a million commits or so and ~50 in https://github.com/gregkh/linux/commits … rivers/pci
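To put those numbers in perspective, here is a rough estimate of how many build-and-test rounds a bisection needs. It is a back-of-the-envelope sketch: `bisect_steps` is a made-up helper, and the commit counts are just the ballpark figures mentioned above, not measured values.

```python
import math


def bisect_steps(n_commits):
    """Approximate number of test rounds a binary search over n_commits needs."""
    if n_commits <= 1:
        return 0
    return math.ceil(math.log2(n_commits))


# Ballpark figures from this post, not exact counts.
for label, count in [("drivers/pci only", 50), ("everything", 1_000_000)]:
    print(f"{label}: ~{count} commits, roughly {bisect_steps(count)} kernels to build and test")
```

Each halving costs one full kernel build, so narrowing the range with prebuilt packages before touching git saves a lot of compile time.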
Online
So it's somewhere between 6.18.13 and 6.18.16 then, even the earliest LTS 6.18 doesn't work. All with the latest nvidia-open-dkms, I even rebuilt 6.18.13 therewith to rule out the nvidia driver version
Last edited by kofola (2026-05-03 22:26:32)
Offline
It's not gonna be the nvidia driver, this happens way earlier.
https://github.com/gregkh/linux/commits … rivers/pci - Commits on Mar 4, 2026 and Feb 26, 2026 are not in 6.18.13
The bridge window related commits in the latter look prone - question being whether they caused a bug or exposed an actual resource limitation, but even https://github.com/gregkh/linux/commit/ … c0bbcfa3eb might have caused this as side effect.
You'll have to build and try 6.18.14 and then bisect it against 6.18.13
The commits unfortunately show up starting with v6.19.4, and the first 6.19 kernel Arch packaged is v6.19.5
Once you've found the offending commit you'll have to take this upstream for an opinion on whether this is a bug or your setup only worked by accident
Online
So am I right in assuming I should build from the gitlab 6.18.13 source, keep the 6.18.13 github patches, and bump the version(s) to .14 (as well as those in config.x86_64)? Or will those patches not work? Can I set all pgpkeys and xxsums to SKIP? Do I just run makepkg to build?
Last edited by kofola (2026-05-04 16:27:49)
Offline
There's no 6.18.14 in https://github.com/archlinux/linux/
If you want to use that source i assume the most straightforward way would be to bisect between 11fc53e^1 (you'll still have to assert that this commit works) and 6.19.5
Online
Where can I find the makepkg configs the official packages are built with? Specifically /etc/makepkg.conf.d/rust.conf, as rust is complaining about "custom targets are unstable and require `-Zunstable-options`", guessing it's because my rust.conf is empty..
RUSTFLAGS="-C target-cpu=native" doesn't work
adding --target=x86_64-unknown-linux-gnu -Zunstable-options also does nothing...
Last edited by kofola (2026-05-04 18:09:37)
Offline
https://gitlab.archlinux.org/archlinux/ … 6ec9c7d313
The package is supposed to build w/o any custom makepkg.conf's but there's https://github.com/rust-lang/rust/issues/151729 - why are you trying to build rust?
Online
why are you trying to build rust?
Because the kernel has rust code? I can't get past the prepare phase in the makefile
INSTALL libsubcmd_headers
HOSTCC /build/linux-lts/src/linux-6.18.14/tools/bpf/resolve_btfids/main.o
HOSTCC /build/linux-lts/src/linux-6.18.14/tools/bpf/resolve_btfids/rbtree.o
HOSTCC /build/linux-lts/src/linux-6.18.14/tools/bpf/resolve_btfids/zalloc.o
HOSTCC /build/linux-lts/src/linux-6.18.14/tools/bpf/resolve_btfids/string.o
HOSTCC /build/linux-lts/src/linux-6.18.14/tools/bpf/resolve_btfids/ctype.o
HOSTCC /build/linux-lts/src/linux-6.18.14/tools/bpf/resolve_btfids/str_error_r.o
HOSTLD /build/linux-lts/src/linux-6.18.14/tools/bpf/resolve_btfids/resolve_btfids-in.o
LINK resolve_btfids
RUSTC L rust/core.o
error: error loading target specification: custom targets are unstable and require `-Zunstable-options`
|
= help: run `rustc --print target-list` for a list of built-in targets
make[2]: *** [rust/Makefile:516: rust/core.o] Error 1
make[1]: *** [/build/linux-lts/src/linux-6.18.14/Makefile:1286: prepare] Error 2
make: *** [Makefile:248: __sub-make] Error 2
==> ERROR: A failure occurred in build().
Aborting...
==> ERROR: Build failed, check /var/lib/archbuild/core-testing-x86_64/user-1/build
(same thing w/ just normal makepkg)
Not sure what I'm doing wrong, didn't alter anything in the PKGBUILD aside from the versions and checksums
Last edited by kofola (2026-05-04 20:58:37)
Offline
Maybe there's some way I could build it online via the AUR infrastructure, similar to Fedora's koji scratch-builds?
Last edited by kofola (2026-05-04 21:01:15)
Offline
It rather seemed you were trying to build rustc itself
https://wiki.archlinux.org/title/Arch_build_system
But you'd not be able to bisect anything that fetches a tarball this way.
https://bbs.archlinux.org/viewtopic.php … 4#p2270274 illustrates how the PKGBUILD can be adapted to build from git rather than the tarball.
Online
Yeah, that's pretty much exactly what I'm doing, adapting the PKGBUILD slightly and running pkgctl build, the rust error is still there...
Offline
Tried building 2 unmodified PKGBUILDs: 6.18.13 spewed out the same exact error, 7.0.3-2 worked. Guess the latest rust version no longer works for compiling slightly out-of-date kernels. No clue what to do now.
Offline
No clue what to do now.
1. rant about rust in the kernel
2. Does
export RUSTFLAGS="-Zjson-target-spec"
work?
Online
export KRUSTFLAGS="-Zjson-target-spec"
this worked, thanks a lot (note the K, though)! Neither 6.18.14 nor 6.19.4 works on my machine, though. Now I should try building from
1. https://github.com/gregkh/linux/commit/ … 560cbdb770
2. https://github.com/gregkh/linux/commit/ … 82f3ad3a9e
3. https://github.com/gregkh/linux/commit/ … c0bbcfa3eb
to see which one of these could have broken my setup, right?
Also what do you mean "worked by accident"? Do you think I never had enough space and it was being assigned anyway?? I'd say this still "breaks userspace"
Offline
Ok, I tried that first commit's archive, then progressively older ones, all the way back to this one from 16th Feb, which is supposedly IN 6.18.13, and the same error occurred every time.
I also tried just rebuilding the official 6.18.13 from gitlab, in case something in how I was building everything was the issue, and that works just fine (though I also had to set that rustflag as above). I'm on it right now.
This means that the commit affecting this is older than the one from 16th February and is somehow NOT in this upstream tarball (???). Severely confused, logging off for the day
Offline