Hi,
I've been using standby/resume on my workstation for a few weeks now, without any issue.
Today, for the first time, resuming failed - the computer just hung. I had to force a reset.
After rebooting from this - successfully - I looked at dmesg and noticed this type of message for each CPU core:
mce: [Hardware Error]: TSC 0 ADDR be000000 MISC bebb1898
[ 11.912644] mce: [Hardware Error]: PROCESSOR 0:406f1 TIME 1687804938 SOCKET 0 APIC 0 microcode b000040
[ 11.912648] mce: [Hardware Error]: Machine check events logged
[ 11.912649] mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: be00000000800400
From what I've read, I'm assuming these messages are reports of the last errors that happened during the failed resume.
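To make the Bank 4 status word easier to read, here is a minimal Python sketch that splits it into fields. It assumes the architectural IA32_MCi_STATUS bit layout described in the Intel SDM; the function name and dictionary keys are my own and do not come from any existing tool.

```python
# Sketch: split an x86 machine-check status word (IA32_MCi_STATUS) into fields.
# Bit positions are assumed from the architectural layout in the Intel SDM.
# decode_mci_status and its key names are illustrative, not an existing API.

def decode_mci_status(status: int) -> dict:
    def bit(n: int) -> bool:
        # True if bit n of the 64-bit status word is set
        return bool((status >> n) & 1)

    return {
        "valid": bit(63),            # VAL: the register holds a logged error
        "overflow": bit(62),         # OVER: an earlier error was overwritten
        "uncorrected": bit(61),      # UC: the hardware did not correct the error
        "enabled": bit(60),          # EN: reporting was enabled for this error
        "misc_valid": bit(59),       # MISCV: the MISC register is meaningful
        "addr_valid": bit(58),       # ADDRV: the ADDR register is meaningful
        "context_corrupt": bit(57),  # PCC: processor context may be corrupt
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


if __name__ == "__main__":
    # Status value taken from the "Bank 4" dmesg line above
    fields = decode_mci_status(0xBE00000000800400)
    for name, value in fields.items():
        if isinstance(value, bool):
            print(f"{name:20} {value}")
        else:
            print(f"{name:20} {value:#06x}")
```

For the value above, this reports a valid, uncorrected error with the context-corrupt flag set and an MCA error code of 0x0400. If I read the SDM's simple error code table correctly, that code is an internal timer error, but treat that reading as unconfirmed.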
Kernel: 6.3.9-arch1-1
Processor: Xeon E5-2690 v4
GPU: AMD RX 6650 XT
RAM: 64GB non-ECC
My kernel boot line includes the following options:
pci=noaer threadirqs quiet loglevel=3 systemd.show_status=auto libahci.ignore_sss=1 splash
(Note: the 'pci=noaer' option is there because without it, I got tons of PCI AER errors in dmesg due to an apparently well-known issue with the controller of the Samsung 960 Pro NVMe, which is one of the NVMe drives I have in my machine, but not the one I boot Linux from.)
Any idea what could have suddenly happened? Could it be related to the latest kernel version? Are there any other clues or things I can investigate?