Hi all,
Often when upgrading an Arch Linux distribution, unless it's a barebones one with no customisation, the nvidia kernel module breaks. The problem is that the nvidia module is compiled against a specific kernel version, but this version isn't listed in the package's dependencies. This means it's trivial to install an incompatible kernel and nvidia module and then on reboot there is no graphical environment, and you have to go through a lengthy trial and error process installing different versions of the kernel and/or the nvidia package in order to get a working system.
A very simple solution would be to list the kernel version as a dependency of the nvidia package, which would make it difficult to install incompatible versions and save a lot of hassle.
Someone opened a bug report about this but it was closed as 'not a bug' without a word of explanation, so I'm wondering what the reasons are, and why it would be preferable to have an unreliable package over a reliable one? I'm having trouble understanding the logic. Is it just that it's too much effort to update the kernel version number each time the nvidia module is rebuilt? Maybe there is another trusted user who could take over maintenance of the nvidia module if that's the case?
Anyway, since you can't comment on closed bugs I was just hoping to try to understand the reasons behind the decision to prefer a fragile upgrade process over a more robust one, given how seemingly simple it would be to achieve.
Offline
>The problem is that the nvidia module is compiled against a specific kernel version
This is why you should use the dkms version of the driver: it is not compiled against a specific kernel version, as it is rebuilt whenever the kernel is updated.
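A minimal sketch of what switching might look like, assuming the stock Arch naming where the headers package is the kernel package name plus `-headers` (`linux` -> `linux-headers`, `linux-lts` -> `linux-lts-headers`). The helper name `dkms_install_cmd` is made up for illustration; it only prints the command so you can review it before running it as root.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the pacman command that installs the
# DKMS driver together with the headers for the given kernel package.
# Assumption: headers package = "<kernel package>-headers".
dkms_install_cmd() {
    kernel_pkg=${1:-linux}
    printf 'pacman -S --needed nvidia-dkms %s-headers\n' "$kernel_pkg"
}

# Usage (review the output, then run it with root privileges):
#   dkms_install_cmd            # for the default "linux" kernel package
#   dkms_install_cmd linux-lts  # for the LTS kernel package
```

DKMS needs the headers for every installed kernel, otherwise the module cannot be rebuilt for that kernel.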
Fun fact: I actually have no clue what I'm doing
Offline
Sounds like you're doing partial updates, which are explicitly not supported. You update everything or you update nothing.
Online
This means it's trivial to install an incompatible kernel and nvidia module and then on reboot there is no graphical environment,
There's a simple method to boot to a text console that allows you to troubleshoot: just append systemd.unit=multi-user.target as a kernel parameter in your bootloader.
Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.
clean chroot building not flexible enough ?
Try clean chroot manager by graysky
Offline
Sounds like you're doing partial updates, which are explicitly not supported. You update everything or you update nothing.
The problem is I don't want to do partial updates, but it happens without warning. The nvidia package needs to have the kernel version as a dependency to prevent this from happening. I guess I just don't understand why it's such a problem to list the kernel version in the nvidia package? Couldn't you script it so it would be automatic if that was the issue?
There's a simple method to boot to a text console that allows you to troubleshoot: just append systemd.unit=multi-user.target as a kernel parameter in your bootloader.
Unfortunately that's not so simple for me because it requires finding a keyboard and crawling into a tight space to plug it in, and I can't see the screen from where I am typing. Normally I access the machine using Barrier (formerly Synergy) so it doesn't have any input devices plugged in. So generally it's a pain to troubleshoot like this which is why I'd like to avoid having to do it.
This is why you should use the dkms version of the driver: it is not compiled against a specific kernel version, as it is rebuilt whenever the kernel is updated.
This is probably the best workaround, as it sounds like nobody cares enough to fix the broken package!
Offline
Scimmia wrote: Sounds like you're doing partial updates, which are explicitly not supported. You update everything or you update nothing.
The problem is I don't want to do partial updates, but it happens without warning.
And how, exactly, does that happen? Do you have specific examples?
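One way to gather a concrete example is to compare the kernel release the module was built for with the running kernel. A rough sketch, assuming the first word of the module's `vermagic` string is the kernel release it was built against; the function name `module_matches_kernel` and the module path in the usage comment are placeholders, not anything shipped by the package.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) when the kernel release at the
# start of a vermagic string equals the given kernel release.
module_matches_kernel() {
    vermagic=$1
    kernel=$2
    # Strip everything after the first space, leaving only the release.
    [ "${vermagic%% *}" = "$kernel" ]
}

# Usage on a live system (the .ko path is a placeholder, adjust it):
#   module_matches_kernel "$(modinfo -F vermagic /path/to/nvidia.ko.zst)" "$(uname -r)" \
#       && echo "module matches the running kernel" \
#       || echo "module was built for a different kernel"
```

Posting the two strings along with the pacman log entries for the update would show exactly which package was left behind.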
Online
This means it's trivial to install an incompatible kernel and nvidia module
No, it's not. I guess with the most rotten of luck (or a systematically out-of-sync mirror), you might update at a moment when one package has been updated in the repos but the other one won't be for the next minute or so, but that's not "trivial": it means you did something to make the gods hate you.
and then on reboot there is no graphical environment
Yes.
you have to go through a lengthy trial and error process installing different versions of the kernel and/or the nvidia package in order to get a working system.
No - you just update again.
There're two ways for this to be a problem:
1. As discussed: the gods hate you. Very much. (In which case you should always wear a helmet, because a surprising number of people die from falling coconuts.)
2. You're withholding critical information that amounts to "I systematically run partial updates", and "installing different versions of the kernel" sounds a lot like this, e.g. if you're using a customized kernel or ignoring the kernel in pacman updates.
In that case you should elaborate on your situation.
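A quick way to check for the second case is to look for active ignore rules in the pacman configuration. A minimal sketch, assuming the configuration lives at the default `/etc/pacman.conf`; the function name `ignored_pkgs` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the uncommented IgnorePkg / IgnoreGroup lines of a
# pacman configuration file (defaults to /etc/pacman.conf).
# Lines starting with '#' are comments and are not matched.
ignored_pkgs() {
    conf=${1:-/etc/pacman.conf}
    grep -E '^[[:space:]]*Ignore(Pkg|Group)[[:space:]]*=' "$conf"
}

# Usage:
#   ignored_pkgs                    # check the system configuration
#   ignored_pkgs /some/other.conf   # check a different file
```

If this prints a line that names the kernel or the nvidia package, every `pacman -Syu` is a partial update for that pair.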
Offline
One scenario that I don't see covered in this thread is the rare case when the nvidia package is updated while the kernel is not. Two solutions I know of are to create a hook that rebuilds the kernel image any time nvidia is updated, or, as I do, to rebuild the kernel image manually.
To emulate flesh machines, I am learning...
Offline
One scenario that I don't see covered in this thread is the rare case when the nvidia package is updated while the kernel is not. Two solutions I know of are to create a hook that rebuilds the kernel image any time nvidia is updated, or, as I do, to rebuild the kernel image manually.
That will not cause a mismatch between the kernel and module, it's a different issue.
Online
That can cause a mismatch, but only if you actively added the nvidia modules to your initramfs, in which case you should already be aware and ideally already created a hook for that because you read the corresponding wiki page to add them to the initramfs in the first place.
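For anyone who has not set that up yet, a hook along these lines is one possibility. This is a simplified sketch, not the wiki's exact hook: it assumes the initramfs is built with mkinitcpio and that the driver package is named `nvidia`, and the function name `write_nvidia_hook` and the file name `nvidia.hook` are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: write a pacman (alpm) hook to the given path. The hook
# regenerates all mkinitcpio presets after the nvidia package is installed or
# upgraded. Assumptions: mkinitcpio is in use, the package is named "nvidia".
write_nvidia_hook() {
    hook_path=$1
    cat > "$hook_path" <<'EOF'
[Trigger]
Operation = Install
Operation = Upgrade
Type = Package
Target = nvidia

[Action]
Description = Rebuilding initramfs after nvidia update
Depends = mkinitcpio
When = PostTransaction
Exec = /usr/bin/mkinitcpio -P
EOF
}

# Usage (as root; pacman reads custom hooks from /etc/pacman.d/hooks/):
#   mkdir -p /etc/pacman.d/hooks
#   write_nvidia_hook /etc/pacman.d/hooks/nvidia.hook
```

When the kernel is upgraded in the same transaction the initramfs may be rebuilt twice, which is harmless but slow; the wiki version of the hook avoids that.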
Offline
That will cause a mismatch between the module and userspace tools, which isn't what the OP is talking about.
Online