
#1 2022-10-17 07:25:19

Akkohito891
Member
Registered: 2022-10-17
Posts: 4

[SOLVED] Unable to boot archlinux after Kernel upgrade.

PC specs: Ryzen 3500, Nvidia GTX 1650, MSI B450M pro M2 Max, 16GB RAM.
System: Arch Linux with all the latest packages, linux kernel, KDE Plasma, nvidia/nvidia-open driver.

I upgraded my kernel (5.19 -> 6.0.2) and NVIDIA driver (nvidia-open 520.56.06-3) yesterday, and since then my computer has been unusable. On the first reboot after the upgrade, the system crashes about 10 seconds after reaching the graphical user interface. After the crash the computer REFUSES to boot at all! After a forced reboot I can't even get into the BIOS, and the VGA debug LED stays lit on my motherboard. Turning off the power and coming back later lets it boot, but the same error occurs again. Even the TTY is laggy after this update!
Here are the logs for the GPU:

Oct 14 18:43:25 turing systemd[1]: Finished Load AppArmor profiles.
Oct 14 18:43:25 turing systemd-udevd[578]: nvidia: Process '/usr/bin/bash -c '/usr/bin/mknod -Z -m 666 /dev/nvidiactl c $(grep nvidia-frontend /proc/devices | cut -d \  -f 1) 255'' failed with exit code 1.
Oct 14 18:43:25 turing systemd-udevd[578]: nvidia: Process '/usr/bin/bash -c 'for i in $(cat /proc/driver/nvidia/gpus/*/information | grep Minor | cut -d \  -f 4); do /usr/bin/mknod -Z -m 666 /dev/nvidia${i} c $(grep nvidia-frontend /p>
Oct 14 18:43:25 turing systemd[1]: Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
Oct 14 18:43:25 turing systemd[1]: Condition check resulted in ST1000DM010-2EP102 Linux\x20swap being skipped.
Oct 14 18:43:25 turing kernel: sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver
Oct 14 18:43:25 turing kernel: sp5100-tco sp5100-tco: Using 0xfeb00000 for watchdog MMIO address
Oct 14 18:43:25 turing kernel: nvidia-gpu 0000:29:00.3: enabling device (0000 -> 0002)
Oct 14 18:43:25 turing kernel: usbcore: registered new device driver apple-mfi-fastcharge
Oct 14 18:43:25 turing kernel: sp5100-tco sp5100-tco: initialized. heartbeat=60 sec (nowayout=0)
Oct 14 18:43:25 turing kernel: nvidia-uvm: Loaded the UVM driver, major device number 511.
Oct 14 18:43:26 turing kernel: nvidia-gpu 0000:29:00.3: i2c timeout error e0000000
Oct 14 18:43:26 turing kernel: ucsi_ccg 3-0008: i2c_transfer failed -110
Oct 14 18:43:26 turing kernel: ucsi_ccg 3-0008: ucsi_ccg_init failed - -110
Oct 14 18:43:26 turing kernel: ucsi_ccg: probe of 3-0008 failed with error -110
Oct 14 18:43:46 turing systemd[1007]: pam_warn(systemd-user:setcred): function=[pam_sm_setcred] flags=0x8004 service=[systemd-user] terminal=[] user=[sddm] ruser=[<unknown>] rhost=[<unknown>]
Oct 14 18:43:46 turing systemd[1]: user@973.service: Deactivated successfully.
Oct 14 18:43:46 turing systemd[1]: Stopped User Manager for UID 973.
Oct 14 18:43:46 turing audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=user@973 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 14 18:43:46 turing systemd[1]: Stopping User Runtime Directory /run/user/973...
Oct 14 18:43:46 turing systemd[1]: run-user-973.mount: Deactivated successfully.
Oct 14 18:43:46 turing systemd[1]: user-runtime-dir@973.service: Deactivated successfully.
Oct 14 18:43:46 turing systemd[1]: Stopped User Runtime Directory /run/user/973.
Oct 14 18:43:46 turing audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=user-runtime-dir@973 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 14 18:43:46 turing systemd[1]: Removed slice User Slice of UID 973.
Oct 14 18:43:50 turing systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Oct 14 18:43:50 turing audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 14 18:43:52 turing kernel: NVRM nvAssertFailedNoLog: Assertion failed: GPPut < WATCHDOG_GPFIFO_ENTRIES @ kernel_rc_watchdog.c:1261
Oct 14 18:43:52 turing kernel: NVRM: GPU at PCI:0000:29:00: GPU-17f9f589-b1fa-de51-71ba-159714cd6965
Oct 14 18:43:52 turing kernel: NVRM: Xid (PCI:0000:29:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Oct 14 18:43:52 turing kernel: NVRM: GPU 0000:29:00.0: GPU has fallen off the bus.
Oct 14 18:43:52 turing kernel: NVRM prbEncStartAlloc: Can't allocate memory for protocol buffers.
Oct 14 18:43:52 turing kernel: NVRM: A GPU crash dump has been created. If possible, please run
                               NVRM: nvidia-bug-report.sh as root to collect this data before
                               NVRM: the NVIDIA kernel module is unloaded.
Oct 14 18:43:54 turing NetworkManager[938]: <info>  [1665753234.1169] dhcp4 (enp37s0): state changed new lease, address=192.168.1.103
Oct 14 18:43:54 turing NetworkManager[938]: <info>  [1665753234.1172] manager: NetworkManager state is now CONNECTED_SITE
Oct 14 18:43:54 turing NetworkManager[938]: <info>  [1665753234.1172] policy: set 'Wired connection 1' (enp37s0) as default for IPv4 routing and DNS
Oct 14 18:43:54 turing dbus-daemon[924]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.5' (uid=0 pid=938 comm="/usr/bin/NetworkManager ->
Oct 14 18:43:54 turing systemd[1]: Starting Network Manager Script Dispatcher Service...
Oct 14 18:43:54 turing dbus-daemon[924]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'
Oct 14 18:43:54 turing systemd[1]: Started Network Manager Script Dispatcher Service.
Oct 14 18:43:54 turing audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=unconfined msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Oct 14 18:43:54 turing NetworkManager[938]: <info>  [1665753234.6609] manager: NetworkManager state is now CONNECTED_GLOBAL

Can someone please help me figure this out? I don't think my GPU or my motherboard is malfunctioning, because I can boot into a Manjaro live ISO with the GUI working properly. I've also tried the standard nvidia driver, and the same error occurs.
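For anyone reading a similar log: the lines that matter are the NVRM ones, in particular the Xid 79 "GPU has fallen off the bus" message. A minimal sketch for pulling them out of a journal dump is below; the function names `nvrm_lines` and `xid_codes` are made up for illustration, and the sed pattern assumes the message format shown in the log above.

```shell
# Keep only the NVIDIA resource manager (NVRM) messages from text on stdin.
nvrm_lines() {
    grep 'NVRM'
}

# Print just the Xid error codes, one per line, from text on stdin.
# Assumes the "NVRM: Xid (PCI:...): <code>, ..." format seen in the log above.
xid_codes() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p'
}

# Usage on the affected machine (kernel messages of the previous boot):
#   journalctl -k -b -1 | nvrm_lines
#   journalctl -k -b -1 | xid_codes
```

Reading from stdin keeps the helpers usable both on a live journal and on a saved file such as a crash.log.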

Last edited by Akkohito891 (2022-10-20 17:20:14)


#2 2022-10-17 07:49:41

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,788

Re: [SOLVED] Unable to boot archlinux after Kernel upgrade.

Install linux-lts and nvidia-lts and wait for nvidia to align with the ACPI API changes in the kernel.


#3 2022-10-17 08:00:35

Akkohito891
Member
Registered: 2022-10-17
Posts: 4

Re: [SOLVED] Unable to boot archlinux after Kernel upgrade.

seth wrote:

Install linux-lts and nvidia-lts and wait for nvidia to align with the ACPI API changes in the kernel.

Okay! I installed the LTS kernel and NVIDIA driver and updated the systemd-boot entry. This time the GUI did not crash, but the inconsistent boot behaviour remained: sometimes, when I reboot normally from the GUI, the computer gets stuck with the VGA LED on and I have to cut the power to boot it again.
Sometimes I can only boot into my system by first entering the BIOS and selecting "exit without saving", which then triggers the Arch boot process.
As of now I have cleared the CMOS and it seems to work fine. Hopefully this will be fixed soon.
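For reference, a systemd-boot entry for the LTS kernel could look roughly like what the sketch below prints. The kernel and initramfs file names are the ones the linux-lts package installs; the function name `print_lts_entry`, the entry title, and the root= value are assumptions and have to be adapted to the actual system.

```shell
# Print a minimal systemd-boot loader entry for the linux-lts kernel.
# $1 is the root= value; the PARTUUID used in the example call is a placeholder.
print_lts_entry() {
    cat <<EOF
title   Arch Linux (linux-lts)
linux   /vmlinuz-linux-lts
initrd  /initramfs-linux-lts.img
options root=$1 rw
EOF
}

# Example: review the output before saving it as a new entry file.
print_lts_entry "PARTUUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
```

Keeping the existing entry for the regular kernel next to the new one makes it easy to switch back from the boot menu later.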

**How do I know when the nvidia API changes have been made? Do I have to update every few days and check whether the driver works?**


#4 2022-10-17 08:10:02

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,788

Re: [SOLVED] Unable to boot archlinux after Kernel upgrade.

Hopefully nvidia will address that in their next major or minor release, but as long as the version says "520.56" you don't have to try.
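A rough way to automate that check, as a sketch only: the helper name `still_520_56` is made up, and the usage line assumes the `checkupdates` script from pacman-contrib is installed and prints lines of the form "package old -> new".

```shell
# Succeeds (exit 0) while the given version string still belongs to the
# 520.56 series, i.e. while there is no point in retrying the driver.
still_520_56() {
    case "$1" in
        520.56*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Usage on the affected machine (assumes pacman-contrib's checkupdates):
#   new=$(checkupdates | awk '$1 == "nvidia" {print $4}')
#   [ -n "$new" ] && ! still_520_56 "$new" && echo "new nvidia release: $new"
```

This only tells you that a different driver version has been packaged, not that it fixes the problem.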

sometimes when I reboot from the gui normally, the computer gets stuck with the vga led on, I have to cut power to boot it again

Do you have a complete system journal for such boot?
Are there any error messages on screen while rebooting?
Does this also happen if you only boot the multi-user.target (2nd link below) and/or w/ "nomodeset" (the parameter will effectively prevent any GUI session, but your GUI target might be stubborn and crash to eternity, so pair it w/ the multi-user.target) and "systemctl reboot" from there?
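To make that test concrete: the two kernel parameters are systemd.unit=multi-user.target and nomodeset, added to the kernel command line for one boot (for example by editing the entry in the boot menu). The sketch below only checks that they actually took effect before rebooting; the helper name `has_test_params` is made up for illustration.

```shell
# Succeeds if both test parameters are present in the given kernel command line.
has_test_params() {
    for p in systemd.unit=multi-user.target nomodeset; do
        case " $1 " in
            *" $p "*) ;;          # parameter found, check the next one
            *) return 1 ;;        # parameter missing
        esac
    done
    return 0
}

# Usage after booting with the extra parameters:
#   has_test_params "$(cat /proc/cmdline)" && systemctl reboot
```

If the reboot hangs even from this text-only boot, the GUI session is unlikely to be the cause.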


#5 2022-10-17 09:17:50

Akkohito891
Member
Registered: 2022-10-17
Posts: 4

Re: [SOLVED] Unable to boot archlinux after Kernel upgrade.

seth wrote:

Hopefully nvidia will address that in their next major or minor release, but as long as the version says "520.56" you don't have to try.

Okay !

seth wrote:

Do you have a complete system journal for such boot?

I can get the journal with

journalctl -b -4 > crash.log

But the actual error occurs when the system is not running, so I don't know if the journal will be of any help.

seth wrote:

Are there any error messages on screen while rebooting?

I don't see anything out of the ordinary during the shutdown sequence. After that there is only a black screen and the PC gets stuck trying to reboot.

seth wrote:

Does this also happen if you only boot the multi-user.target (2nd link below) and/or w/ "nomodeset" (the parameter will effectively prevent any GUI session, but your GUI target might be stubborn and crash to eternity, so pair it w/ the multi-user.target) and "systemctl reboot" from there?

I don't quite understand what you said here, sorry. I'm still relatively new to Arch. I will look these up in the ArchWiki to get a better understanding.
Thanks!


#6 2022-10-20 17:18:40

Akkohito891
Member
Registered: 2022-10-17
Posts: 4

Re: [SOLVED] Unable to boot archlinux after Kernel upgrade.

Update: The system is running normally after replacing the latest linux kernel and driver with linux-lts and nvidia-lts.
One thing I do want to note is that nvidia-dkms and nvidia-open-dkms seem to work just fine with the latest 6.0 linux kernel, probably because DKMS rebuilds the module against the installed kernel. So for now I have upgraded to linux 6.0 with the nvidia-dkms driver, and it seems to be working just fine. I'm planning to move to an AMD GPU in the long run. Marking this issue as solved!

TL;DR: Just use nvidia-dkms or nvidia-open-dkms if you're facing issues with the current nvidia driver. As a last resort, you can use linux-lts with the nvidia-lts/nvidia-open-lts variants until the issue is fixed. Or just use an AMD GPU instead.
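A note for anyone trying the DKMS route: the module is built locally, so the headers for the running kernel (linux-headers for the linux package) have to be installed as well. The sketch below checks whether a build exists for a given kernel; the helper name `dkms_built_for` is made up, and it assumes `dkms status` prints one line per module that contains the kernel release and the word "installed".

```shell
# Succeeds if the `dkms status` text on stdin reports an installed module
# for the kernel release given as $1.
dkms_built_for() {
    grep -F -- "$1" | grep -q 'installed'
}

# Usage on the affected machine:
#   sudo pacman -S --needed nvidia-dkms linux-headers
#   dkms status | dkms_built_for "$(uname -r)" && echo "module built for running kernel"
```

If the check fails after a kernel update, missing headers are the first thing to look at.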

