I rolled forward my system on Fri 7th Oct 2022 to 5.19.13 (well it was a full system upgrade of course) and it kept crashing with kernel panics in various different forms.
I also had trouble loading the new grub config, as per the news item on the homepage about the requirement to reinstall grub's bootloader, so I used the `--removable` option. This broke grub, but I could still boot by selecting the hard drive directly, which gave me a grub menu and allowed me to boot into my system.
So, I decided to roll back to 6th Sept. 2022, which is the last full system upgrade at which my system was stable (very stable, in fact). Unfortunately this has not stopped the crashes. Sometimes I'm not even able to log into a text tty without it crashing nearly immediately; other times I can startx and have a session for about 15 minutes before it locks up. It doesn't always log useful things to the syslog, so I can't work out what is going on. Most recently, lightdm crashed, so I dropped into tty2 and tried to log in, at which point I got a huge number of error traces in the tty and was unable to use it. The only thing which worked was holding down the power button (for a long time, during which more messages were spat into the terminal, seemingly relating to me pressing the power button).
This is on a system76 lemur pro. I've posted some of the journals from the different boots here:
0: https://pastebin.com/VVMxk7PN
1: https://pastebin.com/5TdLLF9K
2: https://pastebin.com/xcCh066r
3: https://pastebin.com/JJWdxXLN
Perhaps this is relevant, but I then tried to boot from a USB stick with ubuntu on it. The first time I booted, it crashed almost immediately on loading the desktop. The second time I booted with "safe graphics" and that is the system I'm now writing this from (running on an Ubuntu live-usb).
I am at a bit of a loss as to what to do here since rolling back the system did not prevent it from crashing. I tried reinstalling the kernel which didn't help. The only other thing I can think of is that the failure to get grub to properly install had some effect, but I'm not sure.
I also have photos of the screen when it totally crashed in non-graphical mode, which might help point the way?
> Perhaps this is relevant, but I then tried to boot from a USB stick with ubuntu on it. The first time I booted, it crashed almost immediately on loading the desktop. The second time I booted with "safe graphics" and that is the system I'm now writing this from (running on an Ubuntu live-usb).
If you're having these problems even from a USB stick, then it's looking like a hardware problem. My suspicion (I'm not the best person to interpret the logs) is some kind of memory problem, perhaps the CPU cache.
I would suggest running MemTest including the CPU memory cache and perhaps doing SMART tests as well.
Yeah, I can’t even boot the arch install medium as I get a kernel panic as per this image:
Although I am able to work from an Ubuntu live USB booted into “safe graphics”. It has now been three hours and no crashes. I just can’t use anything else.
I don’t know if that definitely rules in or out a hardware error as I’m not totally sure yet what safe graphics means beyond using X11 instead of wayland.
https://www.dropbox.com/s/3ci57uc05dvco … 1.jpg?dl=0
Is a corrupted iso, maybe the specific USB key is damaged.
https://www.dropbox.com/sh/wyxce5x2r6ii … +11+10.jpg
tells me that at least systemd-networkd and dhcpcd are enabled and you unsurprisingly get network issues.
find /etc/systemd -type l -exec test -f {} \; -print | awk -F'/' '{ printf ("%-40s | %s\n", $(NF-0), $(NF-1)) }' | sort -f
https://www.dropbox.com/sh/wyxce5x2r6ii … +58+14.jpg
https://www.dropbox.com/sh/wyxce5x2r6ii … +00+18.jpg
Could be RAM issues or follow up problems from a bogus kernel module - is that also in one of the journals you linked?
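The network service overlap mentioned above (systemd-networkd and dhcpcd both enabled) can be checked with a small filter over a list of enabled units. This is only a sketch: the function name `conflicting_net_services` and the set of service names it matches are assumptions chosen for illustration, not something taken from this thread's logs.

```shell
# Hypothetical helper: reads unit names (one per line) on stdin and prints
# those that belong to a few well-known network managers. More than one line
# of output suggests several services are competing for the same interface.
conflicting_net_services() {
    grep -E '^(systemd-networkd|dhcpcd(@.+)?|NetworkManager|connman)\.service$' | sort -u
}

# Usage (on a systemd machine):
#   systemctl list-unit-files --state=enabled --no-legend | awk '{print $1}' | conflicting_net_services
```

If this prints two or more units, disabling all but one of them is the usual first step before looking for deeper causes.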
> Is a corrupted iso, maybe the specific USB key is damaged.
Gotcha -- it's fairly new but I can try to reflash to see if that helps. I guess I was getting exasperated by the sheer number of things going wrong.
> tells me that at least systemd-networkd and dhcpcd are enabled and you unsurprisingly get network issues.
But that shouldn't cause full-on kernel panics should it? I had tried to reconcile these different networking stacks, removing anything conflicting, but ended up with other problems (beyond the scope of this post). I'm not sure if this really points to the key issue though.
> Could be RAM issues or follow up problems from a bogus kernel module - is that also in one of the journals you linked?
No, the photos and the journals are different occasions. There have been many more crashes but they don't always put anything useful in the journals, nor do they dump to the screen as I've been in an X11 session.
When you say a bogus kernel module: is there a way of me removing kernel modules until I get a stable system or should I just blitz the system and reinstall arch? The reason I'm a bit perplexed by this is that, normally, if I have issues with a kernel upgrade, rolling back works fine. This just hasn't.
> But that shouldn't cause full-on kernel panics should it?
No, but https://www.youtube.com/watch?v=5RyYrs5tu60
If the NIC driver doesn't like it and trips, you could get anything.
> normally, if I have issues with a kernel upgrade, rolling back works fine. This just hasn't.
1. clean up the network situation
2. https://wiki.archlinux.org/title/Stress … MemTest86+ (you need to run this for many cycles to be able to say "RAM is likely ok")
3. Please post a system journal covering those __list_add_valid RIPs.
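For item 3, one way to locate the relevant traces in a saved journal is to search for RIP lines that mention `__list_add_valid`. A minimal sketch, assuming the journal has been saved as plain text; the function name `find_list_add_rips` and the file name `journal.txt` are placeholders for illustration.

```shell
# Hypothetical helper: prints matching lines with their line numbers,
# reading from the files given as arguments or from stdin if none are given.
find_list_add_rips() {
    grep -n 'RIP: .*__list_add_valid' "$@"
}

# Usage:
#   journalctl -k -b -1 --no-pager > journal.txt   # kernel messages, previous boot
#   find_list_add_rips journal.txt
```

The line numbers make it easier to find the surrounding call trace in the saved file before posting it.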
Just a small test: I wrote a new usb stick from a different machine and I also get (different) kernel panics when trying to boot. Ubuntu safe-graphics usb still boots.
https://www.dropbox.com/s/mnmy0mqxbvqde … .HEIC?dl=0
At the moment my system is barely stable enough to boot so I’m not sure how much success I’ll have with the networking situation but will try.
I’ll see if I can run memtest86. I need to get my laptop to boot in BIOS mode first, I think.
Yep, I can no longer boot into arch at all, and the normal ubuntu boot mode doesn’t work either (100% reproducible). Given the extent to which this has degraded I think I can at least guess that hardware is the issue here — maybe something to do with the integrated graphics but I don’t know.
Can you boot arch w/ "nomodeset"?
Try to downclock the RAM in the BIOS/UEFI settings and to let the system cool down.
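After adding `nomodeset` at the boot menu, it can be worth confirming that the parameter actually reached the kernel. A sketch, assuming the standard `/proc/cmdline` interface; the function name `has_kernel_param` is made up for illustration.

```shell
# Hypothetical helper: succeeds if the given parameter appears as a whole
# word on the kernel command line. The file defaults to /proc/cmdline.
has_kernel_param() {
    param=$1
    cmdline_file=${2:-/proc/cmdline}
    # Put each parameter on its own line, then look for an exact match.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}

# Usage:
#   has_kernel_param nomodeset && echo "nomodeset is active"
```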
Firstly: memtest86 did four passes with no errors last night, which reduces the likelihood of a RAM issue (I guess).
Tried to boot arch with “nomodeset” and it got stuck at some systemd messages “[Failed] Failed to start Waiting for Network to be Configured” and the like (just two actually).
Booting ubuntu liveusb with nomodeset, however, worked and didn’t crash this time.
> got stuck at some systemd messages “[Failed] Failed to start Waiting for Network to be Configured” and the like
https://bbs.archlinux.org/viewtopic.php?id=57855
Try to only boot the multi-user.target (2nd link below) and in any event provide a complete system journal from any arch boot (it can be a slightly older one, i.e. the last successful one).
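Collecting a complete journal from an earlier boot could be sketched as below. It assumes `journalctl` with a persistent journal; the function name `save_boot_journal`, the `JOURNALCTL` override variable and the default output file name are illustrative choices, not anything from this thread.

```shell
# Hypothetical helper: writes the full journal of one boot to a text file.
#   $1 = boot offset (0 = current boot, -1 = previous boot, ...)
#   $2 = output file
# JOURNALCTL can be set to substitute another command (assumption, for testing).
save_boot_journal() {
    boot=${1:-0}
    out=${2:-journal.txt}
    "${JOURNALCTL:-journalctl}" --boot="$boot" --no-pager > "$out"
}

# Usage:
#   journalctl --list-boots              # find the offset of the last good boot
#   save_boot_journal -1 last-boot.txt
```

From a live USB the installed system's journal is not the running one, so this would need to be run from a chroot or pointed at the installed journal directory instead.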
Unfortunately it has now degraded to the point at which I cannot boot into arch at all. I get past the decryption of my disk, then it locks up before it even lets me log in. I've wiped my hard drive and installed ubuntu, but pretty much the same thing happens, so I'm going to get the hardware repaired as I can't think what else it could be at this point.
> Try to downclock the RAM in the BIOS/UEFI settings
It's your best shot next to HW replacement, and anecdotally the combined access patterns of IGP and CPU might not be easily reflected by memtest86.