My PC worked perfectly fine yesterday. Today, it decided to annoy me a bit.
My system won't boot at all. It's basically stuck at "Starting version 250.5-1-arch".
I accessed my system from a live USB and first did a full system update. Sadly, that didn't change anything.
I noticed that the GRUB boot menu "loads" very slowly: it draws from the top to the bottom, and inputs in the menu are extremely sluggish, around 1 to 2 seconds per key press. So I tried reinstalling GRUB and regenerating the config and initramfs, but the behavior didn't change; it's still extremely slow. I then set the GRUB resolution to 1280x1024x32 in the config. Now it loads faster, but it's still noticeable that it draws from top to bottom; it takes around half a second.
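On a stock Arch install, the resolution workaround lives in /etc/default/grub. A minimal sketch of scripting that edit, assuming GNU sed and using a hypothetical helper name (set_gfxmode) that is not from this thread:

```shell
# Hypothetical helper (not from the thread): set GRUB_GFXMODE in a GRUB
# defaults file, replacing an existing (possibly commented-out) entry or
# appending one if none exists. Assumes GNU sed (-i, -E).
set_gfxmode() {
  file=$1
  mode=$2
  if grep -qE '^#?GRUB_GFXMODE=' "$file"; then
    # Overwrite the existing line in place.
    sed -i -E "s/^#?GRUB_GFXMODE=.*/GRUB_GFXMODE=$mode/" "$file"
  else
    # No entry yet: add one at the end of the file.
    printf 'GRUB_GFXMODE=%s\n' "$mode" >> "$file"
  fi
}
```

Run it as root, e.g. `set_gfxmode /etc/default/grub 1280x1024x32`, then regenerate the menu with `grub-mkconfig -o /boot/grub/grub.cfg`; editing the defaults file alone does not change what GRUB shows.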
Another thing I tried was following advice from another thread and booting into multi-user.target instead of graphical.target, but that didn't help either. I always boot into a TTY anyway and start the DE manually via startx, so I kinda knew that wouldn't go anywhere.
I checked journalctl for errors, but there is nothing, and I mean literally nothing. My "normal" boot log runs until 12:24:46. After that, there are 9 networking-related lines at 12:25:06, and the next entry is the power-off log at 12:26:01. So while the system is stuck at "Starting version 250.5-1-arch", nothing gets logged. You can find a log of the entire boot process here: http://ix.io/3WJb
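Spotting that kind of silence in a long journal can be automated. A rough sketch that prints every gap of at least N seconds between consecutive log lines; find_gaps is a hypothetical name, and it assumes journalctl's default short timestamp format and a boot that does not cross midnight:

```shell
# Hypothetical helper (not from the thread): flag gaps of at least MIN seconds
# (default 10) between consecutive timestamped lines of `journalctl -o short`
# output read from stdin. Assumes the timestamp is fields 1-3
# ("May 01 12:24:46") and all lines are from the same day; midnight rollover
# and sub-second precision are not handled.
find_gaps() {
  awk -v min="${1:-10}" '
    # Only consider lines whose third field is an HH:MM:SS time.
    $3 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
      split($3, t, ":")
      s = t[1] * 3600 + t[2] * 60 + t[3]
      if (seen && s - prev >= min)
        printf "%d s gap before: %s\n", s - prev, $0
      prev = s
      seen = 1
    }'
}
```

For example, `journalctl -b -1 -q -o short | find_gaps 15` checks the previous boot; on a log like the one described above it should point at the 12:25:06 and 12:26:01 entries.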
I also removed the quiet option from the GRUB command line so I might get some warnings or errors, but the only additional output was "::running early hook [udev]" before the "Starting version" line.
I also ran a long SMART self-test on the drive, and the results were flawless. The drive is also relatively new; I think I bought it 5 months ago. I'm also sure it's not a hardware error, because I added another NVMe drive with a Debian install and it worked flawlessly.
I'm kinda out of ideas here. Does anyone know why this could be happening and how I could solve it?
Last edited by realitaetsverlust (2022-05-03 18:11:06)
May 01 12:24:35 exodus kernel: Linux version 5.17.5-arch1-1 (linux@archlinux) (gcc (GCC) 11.2.0, GNU ld (GNU Binutils) 2.38) #1 SMP PREEMPT Wed, 27 Apr 2022 20:56:11 +0000
…
May 01 12:24:36 exodus systemd[1]: Reached target Multi-User System.
No delay there.
May 01 12:24:38 exodus kernel: amdgpu 0000:2f:00.0: [drm] Cannot find any crtc or sizes
Does booting w/ "nomodeset" work?
No change in behavior
New bootlog: http://ix.io/3WOs
Does the LTS kernel still work and does it help to force the output as "enabled", https://raw.githubusercontent.com/torva … modedb.rst ?
I installed the LTS kernel via pacman -S linux-lts and regenerated the GRUB config. After a reboot, I selected the Linux LTS kernel, but the result is exactly the same: nothing happens after the "Starting version 250.5-1-arch" line.
If you blacklist the amdgpu module is there any change?
Next question: skip "vfio-pci.ids=10de:1b80,10de:10f0" (that's some nvidia GPU?)
I added module_blacklist=amdgpu to my grub settings and regenerated the config, but no, it didn't help.
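Kernel parameters like module_blacklist=amdgpu (or the earlier nomodeset test) go into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub. A minimal sketch of appending one, using the hypothetical helper name add_kernel_param and assuming GNU sed:

```shell
# Hypothetical helper (not from the thread): append one parameter to
# GRUB_CMDLINE_LINUX_DEFAULT in a GRUB defaults file. Assumes GNU sed and a
# parameter without '|', '&' or '\', which would confuse the sed expression.
add_kernel_param() {
  file=$1
  param=$2
  # Capture everything up to the closing quote, then re-add it after the new parameter.
  sed -i -E "s|^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"|\1 $param\"|" "$file"
}
```

As with any change to /etc/default/grub, rerun `grub-mkconfig -o /boot/grub/grub.cfg` afterwards. For a one-off test, pressing `e` on the entry in the GRUB menu and editing the `linux` line is the non-persistent alternative.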
I'll be damned: skipping "vfio-pci.ids=10de:1b80,10de:10f0" was it.
How can that be, though? That card has been sitting in my rig for months, used in a Windows KVM. How can it just break like that?
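For anyone landing here with the same symptom: the boot hang went away once the vfio-pci.ids parameter was dropped from the kernel command line. A sketch of removing it from /etc/default/grub, with strip_vfio_ids as a hypothetical helper and GNU sed assumed:

```shell
# Hypothetical helper (not from the thread): remove every vfio-pci.ids=...
# parameter (and the whitespace before it) from a GRUB defaults file.
# Assumes GNU sed (-i, -E).
strip_vfio_ids() {
  sed -i -E 's/[[:space:]]*vfio-pci\.ids=[^[:space:]"]*//g' "$1"
}
```

Regenerate the config with `grub-mkconfig -o /boot/grub/grub.cfg` afterwards. If the IDs were also set elsewhere, for instance as an `options vfio-pci ids=...` line under /etc/modprobe.d/, that would need the same treatment; that is an assumption about a typical passthrough setup, not something this thread confirms.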
Edit:
After booting into the system and running startx, I now get this error, though:
Is this a hardware error or something? I re-enabled amdgpu, by the way; the only kernel parameters in GRUB now are the loglevel and the trust_cpu thingy.
Last edited by realitaetsverlust (2022-05-02 19:55:04)
You're now running on the nvidia chip, but there's nothing to drive it (I guess; please post an updated journal).
It should, however, allow you to check the status of the AMD chip, https://wiki.archlinux.org/title/AMDGPU#Monitoring (NB: the AMD one might be card1).
Did you recently run a BIOS update or so?
Updated journal: ix.io/3WPG
I did check the status of the card, but I'm not quite sure what I should see or what I should pay special attention to. I'll gladly provide the info you need if you tell me where I can find it.
What I did notice while running `watch -n 0.5 cat /sys/kernel/debug/dri/0/amdgpu_pm_info` is that the card seems to be under a constant 99% load, and the temperature keeps rising. That's definitely worrying, because I have no idea what could cause it: I'm not on a DE, I'm just idling in the TTY.
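Instead of eyeballing the whole file under watch, the load figure can be pulled out on its own. A minimal sketch, assuming amdgpu_pm_info contains a line of the form "GPU Load: 99 %" (it is a debug interface, so the layout may differ between kernel versions) and that debugfs is mounted; gpu_load is a hypothetical name:

```shell
# Hypothetical helper (not from the thread): print the "GPU Load" percentage
# from an amdgpu_pm_info dump. Assumes a line of the form "GPU Load: 99 %";
# the default path is the card0 file used in this thread.
gpu_load() {
  f=${1:-/sys/kernel/debug/dri/0/amdgpu_pm_info}
  # Print the number after "GPU Load:" and stop at the first match.
  awk '/^GPU Load:/ { print $3; exit }' "$f"
}
```

Because watch runs its command in a fresh shell, a function defined in your interactive shell generally won't be visible to it; a plain loop such as `while sleep 0.5; do gpu_load; done` (as root, since debugfs is root-only) does the same job.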
I did not do any BIOS updates or any system updates before this happened. I did a system update after it broke, which didn't change anything.
Another thing I noticed while looking through the journal was a lot of amdgpu-related ... warnings, errors, infos? I'm not sure. I can't remember ever seeing them, but then again, I don't check my journal that often.
Last edited by realitaetsverlust (2022-05-02 21:48:49)
Okay, so I was able to fix this. Thanks for pointing me in the direction of the GPU.
Basically, I removed the vfio drivers and blacklisted the nouveau driver. That ... completely broke everything, as I had no output at all, but then I removed mesa-git and installed the mesa package from the main repo. That fixed it, and I can use my PC again.
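Blacklisting nouveau as described above is normally a one-line modprobe.d drop-in. A minimal sketch that writes one; blacklist_module and the file name it picks are hypothetical choices, not taken from this thread:

```shell
# Hypothetical helper (not from the thread): write a modprobe.d drop-in that
# blacklists one module, e.g. nouveau. DIR would normally be /etc/modprobe.d;
# the blacklist-<module>.conf file name is an arbitrary choice.
blacklist_module() {
  dir=$1
  mod=$2
  printf 'blacklist %s\n' "$mod" > "$dir/blacklist-$mod.conf"
}
```

Against the real directory (as root) that would be `blacklist_module /etc/modprobe.d nouveau`. Since the modconf hook in a default Arch mkinitcpio setup copies modprobe configuration into the initramfs, regenerating it with `mkinitcpio -P` lets the blacklist also apply during early boot.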
In the long run, I will probably switch to Proxmox, as I really need to be able to virtualize a Windows environment with a GPU for my job. But for now, I can work again.
Thanks again for your patience.
Last edited by realitaetsverlust (2022-05-03 18:10:48)