#1 2023-08-25 19:13:49

GaryScottMartin
Member
From: Tehachapi, California, U.S.
Registered: 2021-08-21
Posts: 20
Website

[SOLVED] Can't boot kernel after replacing AMD GPU with RTX4060

Prior to changing the GPU, I uninstalled xf86-video-amdgpu and everything else associated with the graphics driver (except mesa) using pacman. I have tried nouveau, the proprietary driver, and the modesetting driver. In all cases the kernel boots through mounting of the file system, but at about the point where it should be loading the GPU driver, further output ceases, leaving the preceding kernel messages on the screen.

The live ISO boots just fine, so I know that there must be a way to boot to a recovery console, but I haven't been able to find it thus far.

What am I missing? Suggestions?

Gary

Last edited by GaryScottMartin (2023-09-16 09:31:59)

Offline

#2 2023-08-25 23:16:03

Scimmia
Fellow
Registered: 2012-09-01
Posts: 13,694

Re: [SOLVED] Can't boot kernel after replacing AMD GPU with RTX4060

What should it do? If you're booting graphical.target with a DM, you just have to switch to another tty, or boot multi-user.target.

Offline

#3 2023-08-26 01:31:25

GaryScottMartin
Member
From: Tehachapi, California, U.S.
Registered: 2021-08-21
Posts: 20
Website

Re: [SOLVED] Can't boot kernel after replacing AMD GPU with RTX4060

If you're booting to an emergency recovery console, it shouldn't be trying to start the DM, it should only be trying to get you to a command line root login prompt.  It never gets there.

Offline

#4 2023-08-26 02:11:02

Scimmia
Fellow
Registered: 2012-09-01
Posts: 13,694

Re: [SOLVED] Can't boot kernel after replacing AMD GPU with RTX4060

But you aren't booting into an emergency recovery console.

Offline

#5 2023-08-26 06:45:19

GaryScottMartin
Member
From: Tehachapi, California, U.S.
Registered: 2021-08-21
Posts: 20
Website

Re: [SOLVED] Can't boot kernel after replacing AMD GPU with RTX4060

I have GRUB set to keep the recovery entries, which I can select at boot time. After the standard kernel failed, I tried booting the recovery entries and they exhibited the same behavior: kernel messages halting shortly into the kernel boot, long before an alternate TTY becomes available, which precludes switching to another TTY. This necessitated the use of the live ISO as a rescue disk to chroot into the system and experiment with changes.

I am currently typing this on the system though. I still have issues, but I have the NVIDIA driver working, at least with the Linux-hardened kernel. With the basic Linux kernel, SDDM displays appropriately, but Plasma apparently dies while starting. I normally use the Linux-zen kernel, but I haven't tested it yet.

I have suggestions for moving some of the info from the NVIDIA Troubleshooting page in the Wiki to the basic NVIDIA page or pointing to it from the NVIDIA Wiki Page. There is information about the kernel parameters that are necessary to prevent Nouveau from interfering with the NVIDIA driver in a number of places. That information should be consolidated and improved to make it easier to get the NVIDIA driver to work properly "out of the box."

Offline

#6 2023-08-26 06:54:08

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 74,313

Re: [SOLVED] Can't boot kernel after replacing AMD GPU with RTX4060

2nd link below, also try "nomodeset" - we'll need some data out of the system, there're too many ways to fuck this up.
If you can reboot out of the broken state by frenetically pressing ctrl+alt+del you'll retain the journal, otherwise you'll need to use https://wiki.archlinux.org/title/Keyboa … el_(SysRq)

Edit: Please post your complete system journal for the boot:

sudo journalctl -b | curl -F 'file=@-' 0x0.st

Last edited by seth (2023-08-26 06:54:56)

Online

#7 2023-08-26 07:52:47

GaryScottMartin
Member
From: Tehachapi, California, U.S.
Registered: 2021-08-21
Posts: 20
Website

Re: [SOLVED] Can't boot kernel after replacing AMD GPU with RTX4060

I have confirmed that the problem is only (apparently) resolved for the Linux-hardened kernel. Both the standard Linux kernel and the Linux-Zen kernel fail after SDDM tries to start Plasma. Linux-Zen seems to work for a few seconds after Plasma starts. However, Plasma never appears on the standard Linux kernel. The problem is the loss of all graphics card output. The scene goes black and the backlight goes out briefly. It is not possible to shift to a TTY, there is no response to CTRL-ALT-F2/F3/F4, although shutting the system off by momentarily pressing the power button causes the display of TTY output to occur briefly (~0.5-1.0 sec) before poweroff.

Journal from the current boot (Linux-Hardened)

Here are my current kernel parameters from my /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="lsm=landlock,lockdown,yama,apparmor,bpf audit=l loglevel=3 quiet nomodeset nouveau.modeset=0 nvidia_drm.modeset=1"

All headers are installed:
[gary@TehachapiMtn ~]$ pacman -Q | grep "headers"
ffnvcodec-headers 12.0.16.0-1
linux-api-headers 6.4-1
linux-hardened-headers 6.4.11.hardened1-1
linux-headers 6.4.12.arch1-1
linux-zen-headers 6.4.12.zen1-1
vulkan-headers 1:1.3.257-1
[gary@TehachapiMtn ~]$ pacman -Q | grep "linux"
archlinux-appstream-data 20230715-1
archlinux-keyring 20230821-1
archlinux-themes-sddm 2.0-1
lib32-util-linux 2.39.2-1
linux 6.4.12.arch1-1
linux-api-headers 6.4-1
linux-firmware 20230804.7be2766d-2
linux-firmware-whence 20230804.7be2766d-2
linux-hardened 6.4.11.hardened1-1
linux-hardened-headers 6.4.11.hardened1-1
linux-headers 6.4.12.arch1-1
linux-zen 6.4.12.zen1-1
linux-zen-headers 6.4.12.zen1-1

The hardened kernel is still at 6.4.11, the other two are now at 6.4.12. Don't know if that is significant.

Last edited by GaryScottMartin (2023-08-26 08:08:37)

Offline

#8 2023-08-26 07:58:32

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 74,313

Re: [SOLVED] Can't boot kernel after replacing AMD GPU with RTX4060

Coincidence-facts.jpg
Post the actual journals, the hardened kernel will likely just result in different kernel module presence because you're using dkms and forgot to install the headers for it or whatever.
Anecdotal "worked on a friday while I was scratching my butt" won't lead to a systematic solution.

Online

#9 2023-08-26 08:40:42

GaryScottMartin
Member
From: Tehachapi, California, U.S.
Registered: 2021-08-21
Posts: 20
Website

Re: [SOLVED] Can't boot kernel after replacing AMD GPU with RTX4060

Hardened Kernel (boot to graphical target)
Standard Linux Kernel (boot to multi-user target)
Linux Zen Kernel (boot to multi-user target)

BTW, the problem is intermittent. It occurred immediately after I logged into SDDM during my first attempt to boot the hardened kernel in order to post this. This is the second attempt, so the hardened kernel has been successful in three of four attempts. The standard kernel is 0 for 3 and so is Linux-Zen.

Good Night & Thanks

Last edited by GaryScottMartin (2023-08-26 08:45:46)

Offline

#10 2023-08-26 12:24:32

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 74,313

Re: [SOLVED] Can't boot kernel after replacing AMD GPU with RTX4060

You've
* networkmanager
* systemd-networkd
* dhcpcd
* dhclient
running, causing a massive mess on the network stack. Pick one, disable the others

lsm=landlock,lockdown,yama,apparmor,bpf audit=l … quiet nomodeset nouveau.modeset=0

Remove all of that from the kernel commandline

6.4.12-arch1-1 and 6.4.12-zen1-1-zen don't attempt to start the graphical.target, but there's no limit in the kernel commandline?
Did you change the default target on disk between boots?

The hardened kernel then actually crashes in the nvidia-drm module

Aug 26 00:41:05 TehachapiMtn kernel: ------------[ cut here ]------------
Aug 26 00:41:05 TehachapiMtn kernel: ioremap on RAM at 0x0000000308392000 - 0x00000003084bdfff
Aug 26 00:41:05 TehachapiMtn kernel: WARNING: CPU: 0 PID: 719 at arch/x86/mm/ioremap.c:216 __ioremap_caller+0x381/0x3a0
Aug 26 00:41:05 TehachapiMtn kernel: Modules linked in: overlay nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 cfg80211 8021q nf_tables garp mrp stp nfnetlink llc hid_logitech_hidpp mousedev joydev input_leds hid_logitech_dj uvcvideo uvc videobuf2_vmalloc btusb videobuf2_memops videobuf2_v4l2 btrtl snd_usb_audio btbcm videodev btintel snd_usbmidi_lib btmtk snd_rawmidi usbhid videobuf2_common snd_seq_device bluetooth mc ecdh_generic nvidia_uvm(POE) nvidia_drm(POE) nvidia_modeset(POE) intel_rapl_msr intel_rapl_common nvidia(POE) x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm spi_nor irqbypass crct10dif_pclmul mtd snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic crc32_pclmul polyval_clmulni snd_hda_intel snd_intel_dspcfg iTCO_wdt polyval_generic gf128mul snd_intel_sdw_acpi at24 snd_hda_codec eeepc_wmi mei_pxp mei_hdcp asus_wmi snd_hda_core ghash_clmulni_intel battery sha512_ssse3 intel_pmc_bxt snd_hwdep aesni_intel ext4 snd_pcm ledtrig_audio sg
Aug 26 00:41:05 TehachapiMtn kernel:  spi_intel_platform crypto_simd crc16 iTCO_vendor_support sparse_keymap mei_me cryptd spi_intel rapl i2c_i801 snd_timer platform_profile mbcache rfkill wmi_bmof intel_cstate pcspkr snd i2c_smbus intel_uncore soundcore e1000e video mei lpc_ich jbd2 wmi intel_smartconnect evdev mac_hid fuse crypto_user dm_mod loop dmi_sysfs ip_tables x_tables uas usb_storage btrfs blake2b_generic libcrc32c crc32c_generic xor lzo_compress raid6_pq sr_mod crc32c_intel xhci_pci cdrom xhci_pci_renesas
Aug 26 00:41:05 TehachapiMtn kernel: CPU: 0 PID: 719 Comm: kwin_wayland Tainted: P           OE   T  6.4.11-hardened1-1-hardened #1
Aug 26 00:41:05 TehachapiMtn kernel: Hardware name: ASUS All Series/H87M-E, BIOS 2201 06/18/2015
Aug 26 00:41:05 TehachapiMtn kernel: RIP: 0010:__ioremap_caller+0x381/0x3a0
Aug 26 00:41:05 TehachapiMtn kernel: Code: 85 c0 0f 89 ae fd ff ff e9 50 fd ff ff 48 8d 54 24 28 48 8d 74 24 18 48 c7 c7 82 6e 59 8f c6 05 b8 d5 92 01 01 e8 2f d6 01 00 <0f> 0b e9 48 fd ff ff 89 c6 48 c7 c7 e8 d5 60 8f e8 3a 99 0a 00 e9
Aug 26 00:41:05 TehachapiMtn kernel: RSP: 0018:ffffa6a141f6fa90 EFLAGS: 00010282
Aug 26 00:41:05 TehachapiMtn kernel: RAX: 0000000000000000 RBX: 0000000308392000 RCX: 0000000000000000
Aug 26 00:41:05 TehachapiMtn kernel: RDX: 0000000000000002 RSI: 0000000000000027 RDI: 00000000ffffffff
Aug 26 00:41:05 TehachapiMtn kernel: RBP: ffff8f913ac47a00 R08: 0000000000000000 R09: ffffa6a141f6f920
Aug 26 00:41:05 TehachapiMtn kernel: R10: 0000000000000003 R11: ffffffff8f8ca368 R12: 0000000000000001
Aug 26 00:41:05 TehachapiMtn kernel: R13: 000000000012c000 R14: ffffffffc0fecdb0 R15: 0000000000000000
Aug 26 00:41:05 TehachapiMtn kernel: FS:  00006f106202e640(0000) GS:ffff8f938ec00000(0000) knlGS:0000000000000000
Aug 26 00:41:05 TehachapiMtn kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 26 00:41:05 TehachapiMtn kernel: CR2: 00006f104003a000 CR3: 00000003d87fc002 CR4: 00000000001706f0
Aug 26 00:41:05 TehachapiMtn kernel: Call Trace:
Aug 26 00:41:05 TehachapiMtn kernel:  <TASK>
Aug 26 00:41:05 TehachapiMtn kernel:  ? __ioremap_caller+0x381/0x3a0
Aug 26 00:41:05 TehachapiMtn kernel:  ? __warn+0x7e/0x130
Aug 26 00:41:05 TehachapiMtn kernel:  ? __ioremap_caller+0x381/0x3a0
Aug 26 00:41:05 TehachapiMtn kernel:  ? report_bug+0x191/0x1c0
Aug 26 00:41:05 TehachapiMtn kernel:  ? prb_read_valid+0x1b/0x30
Aug 26 00:41:05 TehachapiMtn kernel:  ? handle_bug+0x3c/0x80
Aug 26 00:41:05 TehachapiMtn kernel:  ? exc_invalid_op+0x17/0x70
Aug 26 00:41:05 TehachapiMtn kernel:  ? asm_exc_invalid_op+0x1a/0x20
Aug 26 00:41:05 TehachapiMtn kernel:  ? __nv_drm_gem_nvkms_map+0x60/0xb0 [nvidia_drm]
Aug 26 00:41:05 TehachapiMtn kernel:  ? __ioremap_caller+0x381/0x3a0
Aug 26 00:41:05 TehachapiMtn kernel:  __nv_drm_gem_nvkms_map+0x60/0xb0 [nvidia_drm]
Aug 26 00:41:05 TehachapiMtn kernel:  __nv_drm_gem_nvkms_prime_vmap+0x28/0x40 [nvidia_drm]
Aug 26 00:41:05 TehachapiMtn kernel:  nv_drm_gem_vmap+0x27/0x50 [nvidia_drm]
Aug 26 00:41:05 TehachapiMtn kernel:  drm_gem_vmap+0x22/0x50
Aug 26 00:41:05 TehachapiMtn kernel:  dma_buf_vmap+0x81/0x100
Aug 26 00:41:05 TehachapiMtn kernel:  drm_gem_shmem_vmap_locked+0x27/0x1c0
Aug 26 00:41:05 TehachapiMtn kernel:  drm_gem_shmem_object_vmap+0x31/0x50
Aug 26 00:41:05 TehachapiMtn kernel:  ? dma_resv_get_singleton+0x46/0x140
Aug 26 00:41:05 TehachapiMtn kernel:  drm_gem_vmap+0x22/0x50
Aug 26 00:41:05 TehachapiMtn kernel:  drm_gem_vmap_unlocked+0x2a/0x50
Aug 26 00:41:05 TehachapiMtn kernel:  drm_gem_fb_vmap+0x41/0x120
Aug 26 00:41:05 TehachapiMtn kernel:  drm_atomic_helper_prepare_planes+0x17b/0x210
Aug 26 00:41:05 TehachapiMtn kernel:  drm_atomic_helper_commit+0x78/0x140
Aug 26 00:41:05 TehachapiMtn kernel:  drm_mode_atomic_ioctl+0x9b2/0xbc0
Aug 26 00:41:05 TehachapiMtn kernel:  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
Aug 26 00:41:05 TehachapiMtn kernel:  drm_ioctl_kernel+0xcd/0x170
Aug 26 00:41:05 TehachapiMtn kernel:  drm_ioctl+0x26f/0x4a0
Aug 26 00:41:05 TehachapiMtn kernel:  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
Aug 26 00:41:05 TehachapiMtn kernel:  __x64_sys_ioctl+0x97/0xd0
Aug 26 00:41:05 TehachapiMtn kernel:  do_syscall_64+0x60/0x90
Aug 26 00:41:05 TehachapiMtn kernel:  entry_SYSCALL_64_after_hwframe+0x77/0xe1
Aug 26 00:41:05 TehachapiMtn kernel: RIP: 0033:0x6f106690c9df
Aug 26 00:41:05 TehachapiMtn kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
Aug 26 00:41:05 TehachapiMtn kernel: RSP: 002b:000078411f5b3340 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Aug 26 00:41:05 TehachapiMtn kernel: RAX: ffffffffffffffda RBX: 00000fc3383528b0 RCX: 00006f106690c9df
Aug 26 00:41:05 TehachapiMtn kernel: RDX: 000078411f5b33e0 RSI: 00000000c03864bc RDI: 0000000000000014
Aug 26 00:41:05 TehachapiMtn kernel: RBP: 000078411f5b33e0 R08: 0000000000000020 R09: 0000000000000001
Aug 26 00:41:05 TehachapiMtn kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 00000000c03864bc
Aug 26 00:41:05 TehachapiMtn kernel: R13: 0000000000000014 R14: 00000fc3380b11e0 R15: 00000fc338352af0
Aug 26 00:41:05 TehachapiMtn kernel:  </TASK>
Aug 26 00:41:05 TehachapiMtn kernel: ---[ end trace 0000000000000000 ]---
Aug 26 00:41:05 TehachapiMtn kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Failed to ioremap_wc NvKmsKapiMemory 0x00000000b56f02fa
Aug 26 00:41:05 TehachapiMtn kernel: ------------[ cut here ]------------
Aug 26 00:41:05 TehachapiMtn kernel: WARNING: CPU: 0 PID: 719 at drivers/dma-buf/dma-buf.c:1537 dma_buf_vmap+0xf0/0x100
Aug 26 00:41:05 TehachapiMtn kernel: Modules linked in: overlay nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 cfg80211 8021q nf_tables garp mrp stp nfnetlink llc hid_logitech_hidpp mousedev joydev input_leds hid_logitech_dj uvcvideo uvc videobuf2_vmalloc btusb videobuf2_memops videobuf2_v4l2 btrtl snd_usb_audio btbcm videodev btintel snd_usbmidi_lib btmtk snd_rawmidi usbhid videobuf2_common snd_seq_device bluetooth mc ecdh_generic nvidia_uvm(POE) nvidia_drm(POE) nvidia_modeset(POE) intel_rapl_msr intel_rapl_common nvidia(POE) x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm spi_nor irqbypass crct10dif_pclmul mtd snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic crc32_pclmul polyval_clmulni snd_hda_intel snd_intel_dspcfg iTCO_wdt polyval_generic gf128mul snd_intel_sdw_acpi at24 snd_hda_codec eeepc_wmi mei_pxp mei_hdcp asus_wmi snd_hda_core ghash_clmulni_intel battery sha512_ssse3 intel_pmc_bxt snd_hwdep aesni_intel ext4 snd_pcm ledtrig_audio sg
Aug 26 00:41:05 TehachapiMtn kernel:  spi_intel_platform crypto_simd crc16 iTCO_vendor_support sparse_keymap mei_me cryptd spi_intel rapl i2c_i801 snd_timer platform_profile mbcache rfkill wmi_bmof intel_cstate pcspkr snd i2c_smbus intel_uncore soundcore e1000e video mei lpc_ich jbd2 wmi intel_smartconnect evdev mac_hid fuse crypto_user dm_mod loop dmi_sysfs ip_tables x_tables uas usb_storage btrfs blake2b_generic libcrc32c crc32c_generic xor lzo_compress raid6_pq sr_mod crc32c_intel xhci_pci cdrom xhci_pci_renesas
Aug 26 00:41:05 TehachapiMtn kernel: CPU: 0 PID: 719 Comm: kwin_wayland Tainted: P        W  OE   T  6.4.11-hardened1-1-hardened #1
Aug 26 00:41:05 TehachapiMtn kernel: Hardware name: ASUS All Series/H87M-E, BIOS 2201 06/18/2015
Aug 26 00:41:05 TehachapiMtn kernel: RIP: 0010:dma_buf_vmap+0xf0/0x100
Aug 26 00:41:05 TehachapiMtn kernel: Code: c0 01 89 43 28 48 85 c9 74 1c 48 8b 43 30 48 8b 53 38 49 89 04 24 49 89 54 24 08 eb c3 0f 0b b8 ea ff ff ff eb bc 0f 0b 0f 0b <0f> 0b eb b4 b8 ea ff ff ff eb ad e8 30 02 3f 00 90 90 90 90 90 90
Aug 26 00:41:05 TehachapiMtn kernel: RSP: 0018:ffffa6a141f6fb50 EFLAGS: 00010282
Aug 26 00:41:05 TehachapiMtn kernel: RAX: 00000000fffffff4 RBX: ffff8f90db9b0400 RCX: 0000000000000000
Aug 26 00:41:05 TehachapiMtn kernel: RDX: 0000000000000000 RSI: 0000000000000027 RDI: 00000000ffffffff
Aug 26 00:41:05 TehachapiMtn kernel: RBP: ffffa6a141f6fb78 R08: 0000000000000000 R09: ffffa6a141f6f9a8
Aug 26 00:41:05 TehachapiMtn kernel: R10: 0000000000000003 R11: ffffffff8f8ca368 R12: ffff8f91ba3fd098
Aug 26 00:41:05 TehachapiMtn kernel: R13: ffff8f91ba3fed98 R14: ffff8f91ba3fd098 R15: 0000000000000000
Aug 26 00:41:05 TehachapiMtn kernel: FS:  00006f106202e640(0000) GS:ffff8f938ec00000(0000) knlGS:0000000000000000
Aug 26 00:41:05 TehachapiMtn kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 26 00:41:05 TehachapiMtn kernel: CR2: 00006f104003a000 CR3: 00000003d87fc002 CR4: 00000000001706f0
Aug 26 00:41:05 TehachapiMtn kernel: Call Trace:
Aug 26 00:41:05 TehachapiMtn kernel:  <TASK>
Aug 26 00:41:05 TehachapiMtn kernel:  ? dma_buf_vmap+0xf0/0x100
Aug 26 00:41:05 TehachapiMtn kernel:  ? __warn+0x7e/0x130
Aug 26 00:41:05 TehachapiMtn kernel:  ? dma_buf_vmap+0xf0/0x100
Aug 26 00:41:05 TehachapiMtn kernel:  ? report_bug+0x191/0x1c0
Aug 26 00:41:05 TehachapiMtn kernel:  ? handle_bug+0x3c/0x80
Aug 26 00:41:05 TehachapiMtn kernel:  ? exc_invalid_op+0x17/0x70
Aug 26 00:41:05 TehachapiMtn kernel:  ? asm_exc_invalid_op+0x1a/0x20
Aug 26 00:41:05 TehachapiMtn kernel:  ? dma_buf_vmap+0xf0/0x100
Aug 26 00:41:05 TehachapiMtn kernel:  drm_gem_shmem_vmap_locked+0x27/0x1c0
Aug 26 00:41:05 TehachapiMtn kernel:  drm_gem_shmem_object_vmap+0x31/0x50
Aug 26 00:41:05 TehachapiMtn kernel:  ? dma_resv_get_singleton+0x46/0x140
Aug 26 00:41:05 TehachapiMtn kernel:  drm_gem_vmap+0x22/0x50
Aug 26 00:41:05 TehachapiMtn kernel:  drm_gem_vmap_unlocked+0x2a/0x50
Aug 26 00:41:05 TehachapiMtn kernel:  drm_gem_fb_vmap+0x41/0x120
Aug 26 00:41:05 TehachapiMtn kernel:  drm_atomic_helper_prepare_planes+0x17b/0x210
Aug 26 00:41:05 TehachapiMtn kernel:  drm_atomic_helper_commit+0x78/0x140
Aug 26 00:41:05 TehachapiMtn kernel:  drm_mode_atomic_ioctl+0x9b2/0xbc0
Aug 26 00:41:05 TehachapiMtn kernel:  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
Aug 26 00:41:05 TehachapiMtn kernel:  drm_ioctl_kernel+0xcd/0x170
Aug 26 00:41:05 TehachapiMtn kernel:  drm_ioctl+0x26f/0x4a0
Aug 26 00:41:05 TehachapiMtn kernel:  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
Aug 26 00:41:05 TehachapiMtn kernel:  __x64_sys_ioctl+0x97/0xd0
Aug 26 00:41:05 TehachapiMtn kernel:  do_syscall_64+0x60/0x90
Aug 26 00:41:05 TehachapiMtn kernel:  entry_SYSCALL_64_after_hwframe+0x77/0xe1
Aug 26 00:41:05 TehachapiMtn kernel: RIP: 0033:0x6f106690c9df
Aug 26 00:41:05 TehachapiMtn kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
Aug 26 00:41:05 TehachapiMtn kernel: RSP: 002b:000078411f5b3340 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Aug 26 00:41:05 TehachapiMtn kernel: RAX: ffffffffffffffda RBX: 00000fc3383528b0 RCX: 00006f106690c9df
Aug 26 00:41:05 TehachapiMtn kernel: RDX: 000078411f5b33e0 RSI: 00000000c03864bc RDI: 0000000000000014
Aug 26 00:41:05 TehachapiMtn kernel: RBP: 000078411f5b33e0 R08: 0000000000000020 R09: 0000000000000001
Aug 26 00:41:05 TehachapiMtn kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 00000000c03864bc
Aug 26 00:41:05 TehachapiMtn kernel: R13: 0000000000000014 R14: 00000fc3380b11e0 R15: 00000fc338352af0
Aug 26 00:41:05 TehachapiMtn kernel:  </TASK>
Aug 26 00:41:05 TehachapiMtn kernel: ---[ end trace 0000000000000000 ]---
Aug 26 00:41:05 TehachapiMtn sddm-helper-start-wayland[718]: "kwin_wayland_drm: Atomic commit failed! Cannot allocate memory\nkwin_wayland_drm: Presentation failed! Cannot allocate memory\n"
Aug 26 00:41:05 TehachapiMtn kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Failed to ioremap_wc NvKmsKapiMemory 0x00000000f44a2c32
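
To pull the few decisive lines out of a trace this long, a small filter can help. This is only a sketch: the function name summarize_trace is made up for illustration, and the patterns are simply the ones visible in the excerpt above (WARNING headers, the "CPU: ... Comm: ..." line that names the process, and nvidia-drm messages).

```shell
#!/bin/sh
# summarize_trace: hypothetical helper, not part of any package.
# Reads journal text on stdin and keeps only WARNING headers, the
# "CPU: ... Comm: ..." line, and nvidia-drm messages.
summarize_trace() {
    grep -E 'kernel: (WARNING: |CPU: .* Comm: |\[drm\] \[nvidia-drm\] )'
}

# Example use against the current boot (needs journal read access):
#   journalctl -b | summarize_trace
```

Note that grep exits non-zero when nothing matches, which for this purpose just means the journal holds no such lines.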

Can you reliably start the multi-user.target on all kernels and it's then only the graphical.target (SDDM, apparently wayland) that fails?
Does SDDM on X11 work?

Online

#11 2023-08-26 21:42:16

GaryScottMartin
Member
From: Tehachapi, California, U.S.
Registered: 2021-08-21
Posts: 20
Website

Re: [SOLVED] Can't boot kernel after replacing AMD GPU with RTX4060

Successful Boot of linux-zen to graphical target

From the bottom up:

X11 vs Wayland: I have been experimenting with Wayland on and off for a couple of years. There was about a one-year period ending several months ago that I used Wayland exclusively. However, about four months ago I began to notice that Wayland hung whenever I left the computer long enough for the screen to blank, necessitating a hard shutdown. I was unable to switch to a TTY in these situations, no graphics card output apparently. SDDM is set up so that I can choose an X11 session or a Wayland session at login. All of the failures in Linux standard kernel boots and Linux-zen kernel boots have come after logging into an X11 session through SDDM. Once previously, Linux-zen gave me a usable Plasma session for about a minute before the screen went blank. This current session (boot log above) is the first extended successful session on Linux-zen in four attempts.

I can't say for sure about reliably booting all three kernels to multi-user targets, but I can tell you that I just finished booting each kernel to a multiuser target without issue three consecutive times. These boots were followed by logging on to my standard user account and then typing an immediate "reboot" command. (I was a flight test engineer for 40+ years, and don't consider three data points sufficient for making a reliability judgment, but I don't want to be here endlessly rebooting either).

The standard Linux kernel and Linux-zen boot logs from last night followed a change of target to multi-user (I wasn't aware that systemctl had that capability until last night). When I first posted, xf86-video-vesa and perhaps some other unwanted graphics-related packages were still installed in the system. I have since scrubbed anything that I thought might remotely be associated with the AMDGPU driver or be otherwise superfluous to the NVIDIA driver. At that time, I also had not yet added 'nomodeset' to the kernel parameters or the NVIDIA modules to mkinitcpio.conf. By the way, do you really want me to remove everything, including 'nomodeset', from the kernel parameters?

Finally, I stopped and disabled NetworkManager, systemd-networkd, and dhclient. I also reconfigured dhcpcd so that it only runs on the working ethernet interface (a PCIEx1 card). The ethernet interface on the motherboard rolled over dead some years ago. Since the symptomology of this failure mode seems like it might be a kernel-level race condition, I ought to come clean on the hardware configuration. The motherboard (ASUS H87M-E) and processor (Intel(R) Core(TM) i3-4330) have been running in this box since January 2014. The recent replacement of the GPU and the PSU is the first (but really-big) step in the transition to a new water-cooled system in a Mini-ITX chassis.

P.S. A few hundred rapid-fire CTRL-ALT-DELs have so far failed to interrupt whatever causes the screen to blank. However, a momentary press of the system power button causes an apparent orderly poweroff and during that process I see a screen with a few "^@"s spaced across the top for a half-second to a second immediately before the power goes off. I looked over the SysRq page briefly last night, but need more time to digest it.

Last edited by GaryScottMartin (2023-08-26 21:51:32)

Offline

#12 2023-08-26 22:13:52

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 74,313

Re: [SOLVED] Can't boot kernel after replacing AMD GPU with RTX4060

Aug 26 13:52:16 TehachapiMtn sddm[740]: Greeter session started successfully
Aug 26 13:52:16 TehachapiMtn sddm-helper-start-wayland[787]: Starting Wayland process "kwin_wayland --no-lockscreen" "sddm"

you seem to have https://wiki.archlinux.org/title/SDDM#R … er_Wayland ?

By the way, do you really want me to remove everything, including 'nomodeset', from the kernel parameters?

Not everything, but the parameters I pointed out:
- "quiet" to get you more output during the boot (more fine-grained stall position)
- "lsm=landlock,lockdown,yama,apparmor,bpf" to see whether apparmor or the lockdown pose a relevant problem here
- "nomodeset nouveau.modeset=0" because nouveau is blacklisted anyway and nomodeset would at best get in the way of your actual goal
-  "audit=l" because that's nonsense, you'd be looking for "audit=1" and that's default anyway

Using a short power button press is fine, too.
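
The parameter cleanup listed above can be sketched as a small shell helper. The function name strip_params is hypothetical; it only works on a string, so the result still has to be written into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub by hand.

```shell
#!/bin/sh
# strip_params CMDLINE PARAM...
# Hypothetical helper: prints CMDLINE with every exact match of the
# listed parameters removed, preserving the order of the rest.
strip_params() {
    cmdline=$1
    shift
    out=""
    for word in $cmdline; do          # split the command line on whitespace
        keep=1
        for drop in "$@"; do
            if [ "$word" = "$drop" ]; then
                keep=0
            fi
        done
        if [ "$keep" -eq 1 ]; then
            out="${out:+$out }$word"  # join the kept words with single spaces
        fi
    done
    printf '%s\n' "$out"
}

strip_params \
    "lsm=landlock,lockdown,yama,apparmor,bpf audit=l loglevel=3 quiet nomodeset nouveau.modeset=0 nvidia_drm.modeset=1" \
    "lsm=landlock,lockdown,yama,apparmor,bpf" "audit=l" "quiet" "nomodeset" "nouveau.modeset=0"
```

After editing /etc/default/grub, the GRUB configuration has to be regenerated, on a default Arch setup with grub-mkconfig -o /boot/grub/grub.cfg (assuming that is where your grub.cfg lives).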

FYI:
--

Aug 26 13:51:50 TehachapiMtn modprobe[532]: libkmod: kmod_config_parse: /etc/modprobe.d/blacklist.conf line 1: ignoring bad line starting with '*'

--

Aug 26 13:52:02 TehachapiMtn ntfs-3g[546]: Mounted /dev/sdd2 (Read-Write, label "WindowsData", NTFS 3.1)

In case there's a parallel windows installation, see the 3rd link below. Mandatory.
Disable it (it's NOT the BIOS setting!) and reboot windows and linux twice for voodoo reasons.
--
dhcpcd and systemd-networkd/resolved are both enabled in the latest journal

Online

#13 2023-08-27 04:43:48

GaryScottMartin
Member
From: Tehachapi, California, U.S.
Registered: 2021-08-21
Posts: 20
Website

Re: [SOLVED] Can't boot kernel after replacing AMD GPU with RTX4060

I forgot that I configured SDDM to use the Wayland greeter when I first started experimenting with Wayland. I have now deleted the Wayland config file. It no longer reports that it's starting the Wayland greeter. SDDM still works.

Here are the current kernel parameters:

GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 nvidia_drm.modeset=1"

I removed the extraneous * from /etc/modprobe.d/blacklist.conf. It was on the line:

blacklist nouveau

That is the only line in the file.
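
For anyone wanting to catch this kind of stray character before modprobe complains, here is a rough check. It is a sketch under assumptions: the function name check_modprobe_conf is made up, the directive list is the classic set from modprobe.d(5) as I recall it (newer kmod releases may accept more), and backslash line continuations are not handled.

```shell
#!/bin/sh
# check_modprobe_conf FILE
# Hypothetical helper: prints "LINE: text" for every line whose first word
# is not one of the modprobe.d directives listed below.
check_modprobe_conf() {
    awk '
        /^[ \t]*#/ { next }   # skip comment lines
        NF == 0    { next }   # skip blank lines
        $1 !~ /^(alias|blacklist|install|options|remove|softdep)$/ {
            printf "%d: %s\n", NR, $0
        }
    ' "$1"
}

# Example use:
#   check_modprobe_conf /etc/modprobe.d/blacklist.conf
```

An empty result means every non-comment line starts with a directive from the list; it says nothing about whether the arguments are valid.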

There is a parallel Windows install. Although I was sure that Fast-Start was disabled, I just now specifically reverified from the Windows Power Options settings that Fast-Start remains disabled. (I was sure because I use Terabyte Unlimited's BootIt Bare Metal to manage disk configs for boot and booting. BootIt Bare Metal also requires that Fast-Start be disabled.)

Evidently when I previously stopped systemd-networkd, I overlooked the warning that systemd-networkd.service could still be activated by systemd-networkd.socket. I have now stopped and disabled both systemd-networkd.service and systemd-networkd.socket. Hopefully, systemd-networkd will stay dead this time.

I am now running a successful Linux-zen kernel boot. I will reboot with these latest changes and test all three kernels.

Does the boot journal remain intact after a momentary power button initiated poweroff? Is it not overwritten by the subsequent boot?

Last edited by GaryScottMartin (2023-08-27 04:55:36)

Offline

#14 2023-08-27 05:14:02

GaryScottMartin
Member
From: Tehachapi, California, U.S.
Registered: 2021-08-21
Posts: 20
Website

Re: [SOLVED] Can't boot kernel after replacing AMD GPU with RTX4060

Posting this from the first successful boot of the standard Linux kernel with this GPU. The logs from the successful boots of all three kernels to graphical targets:
Successful boot of Linux-zen kernel to graphical target.
Successful boot of Linux-hardened kernel to graphical target.
Successful boot of Linux kernel to graphical target.

I have had no failures in the last half dozen boots, which have involved all three kernels. 

Systemd-networkd was still resurrecting itself in the logs above. However, from my most recent boot, it appears that I have finally killed it off. (In the process, I also killed DNS resolution for an hour, but it's obviously back.)

Last edited by GaryScottMartin (2023-08-27 07:40:21)

Offline

#15 2023-08-27 06:38:17

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 74,313

Re: [SOLVED] Can't boot kernel after replacing AMD GPU with RTX4060

Aug 26 22:03:36 TehachapiMtn systemd[1]: Created slice Slice /system/systemd-networkd-wait-online.

Edit: Post the output of

find /etc/systemd -type l -exec test -f {} \; -print | awk -F'/' '{ printf ("%-40s | %s\n", $(NF-0), $(NF-1)) }' | sort -f

You can try to re-introduce "lsm=landlock,lockdown,yama,apparmor,bpf" (and ultimately "quiet", if you want: it's not gonna break things, just hide them), but my money is on SDDM/wayland being a problem w/ the nvidia driver.

Last edited by seth (2023-08-27 06:38:42)

Online

#16 2023-08-27 07:30:10

GaryScottMartin
Member
From: Tehachapi, California, U.S.
Registered: 2021-08-21
Posts: 20
Website

Re: [SOLVED] Can't boot kernel after replacing AMD GPU with RTX4060

The requested output:

apparmor.service                         | multi-user.target.wants
auditd.service                           | multi-user.target.wants
avahi-daemon.service                     | multi-user.target.wants
avahi-daemon.socket                      | sockets.target.wants
cups.path                                | multi-user.target.wants
cups.service                             | multi-user.target.wants
cups.service                             | printer.target.wants
cups.socket                              | sockets.target.wants
dbus-org.freedesktop.Avahi.service       | system
dbus-org.freedesktop.timesync1.service   | system
default.target                           | system
dhcpcd@enp3s0.service                    | multi-user.target.wants
display-manager.service                  | system
docker.service                           | multi-user.target.wants
getty@tty1.service                       | getty.target.wants
nftables.service                         | multi-user.target.wants
numLockOnTty.service                     | multi-user.target.wants
p11-kit-server.socket                    | sockets.target.wants
pipewire-media-session.service           | pipewire.service.wants
pipewire-session-manager.service         | user
pipewire.socket                          | sockets.target.wants
pulseaudio.socket                        | sockets.target.wants
reflector.timer                          | timers.target.wants
remote-fs.target                         | multi-user.target.wants
saned.socket                             | sockets.target.wants
systemd-timesyncd.service                | sysinit.target.wants
xdg-user-dirs-update.service             | default.target.wants

Offline

#17 2023-08-27 11:42:31

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 74,313

Re: [SOLVED] Can't boot kernel after replacing AMD GPU with RTX4060

You probably want to install pipewire-pulse, https://wiki.archlinux.org/title/PipeWi … io_clients but nothing seems to trigger networkd w/ those services?
Does the GUI now work reliably?

If so, please always remember to mark resolved threads by editing your initial post's subject - so others will know that there's no task left, but maybe a solution to find.
Thanks.

Online

#18 2023-08-28 09:46:18

GaryScottMartin
Member
From: Tehachapi, California, U.S.
Registered: 2021-08-21
Posts: 20
Website

Re: [SOLVED] Can't boot kernel after replacing AMD GPU with RTX4060

The final trigger for 'systemd-networkd' was apparently 'systemd-networkd-wait-online@enp3s0.service'. The warning that is produced when you disable 'systemd-networkd.service' tells you that it may be activated again by 'systemd-networkd.socket', but doesn't mention the 'systemd-networkd-wait-online' service. The generic wait-online service is configured when 'systemd-networkd' is enabled. I must have disabled it when I enabled the 'enp3s0' specific service years ago. Anyway, 'systemd-networkd' no longer shows in the log, and 'systemctl' reports it is disabled and inactive (dead).
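
In hindsight, a search for leftover enablement symlinks would have found this sooner. A minimal sketch, assuming (as is usual) that enabled units show up as symlinks below /etc/systemd; the function name find_networkd_links is made up for illustration.

```shell
#!/bin/sh
# find_networkd_links [DIR]
# Hypothetical helper: lists symlinks under DIR (default /etc/systemd)
# whose name starts with "systemd-networkd", i.e. units that are still
# enabled or otherwise pulled in through that directory.
find_networkd_links() {
    find "${1:-/etc/systemd}" -type l -name 'systemd-networkd*' | sort
}

# Example use:
#   find_networkd_links
```

Anything it prints can then be turned off with systemctl disable --now followed by the unit name. Units pulled in from outside that directory would not show up here.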

I have removed the remaining PulseAudio packages and installed the corresponding PipeWire packages. Thanks for the tip.

Some additional testing with Wayland strongly supports your hypothesis that there is a conflict between Wayland and the NVIDIA driver. I have had no failures on occasions when SDDM was configured to invoke its X11 greeter and was starting an X11 Plasma Session. Starting a Wayland Plasma Session from the X11 greeter also reproduced the failure.  However, the failure does not occur in every instance where Wayland is used either by the SDDM greeter or by Plasma. It's perhaps a 50/50 proposition overall and the probability of failure seems to vary with the specific kernel you are using. Failures seem to occur most often with the standard kernel and less often with the hardened and zen kernels (N.B., I don't have a statistically significant sample of failures and don't plan any further testing).

As you can see, I have updated the subject line to mark this issue solved. Thank you very much for your kind assistance with my problem.

Last edited by GaryScottMartin (2023-08-28 09:47:38)

Offline
