When I hibernate and try to resume again, sometimes it fails. When the issue happens, after selecting Arch Linux in the GRUB menu, I get a black screen with a blinking cursor and no messages.
I have no idea whether it's stuck or whether it would eventually resume if I waited long enough; after a few minutes I give up and do a hard reset.
I didn't use to have this problem; I was able to resume without issues for years. In recent months I started getting it occasionally. It doesn't always happen; sometimes it resumes successfully.
How do I troubleshoot this?
* I've gone through https://wiki.archlinux.org/title/Power_ … nsistently and it didn't help
* Swap seems large enough: I have 16GB RAM and my swap partition is 16GB
* Kernel parameter for resume is correct: `/etc/default/grub` used to specify `resume` as a UUID, which worked previously, so I know the syntax is correct. Just in case, I tried using `/dev/vg/root` instead of the UUID, and I still get the problem (hibernate sometimes works, sometimes not). I also use the same configuration on other computers and don't get this problem on those.
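To confirm which `resume=` value the running kernel actually received (as opposed to what `/etc/default/grub` says), one option is to read `/proc/cmdline` after booting. A minimal sketch, assuming plain whitespace-separated parameters with no quoted values containing spaces; `parse_cmdline` is a hypothetical helper, not an existing tool:

```python
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}.

    Flags without '=' (e.g. 'rw') map to None. Quoted values that contain
    spaces are NOT handled; this is a deliberate simplification.
    """
    params: Dict[str, Optional[str]] = {}
    for token in cmdline.split():
        # Partition at the FIRST '=' so 'resume=UUID=...' keeps 'UUID=...' intact.
        name, sep, value = token.partition("=")
        # If a name appears twice, this helper keeps the last occurrence.
        params[name] = value if sep else None
    return params
```

For example, `parse_cmdline(open("/proc/cmdline").read()).get("resume")` should return the value GRUB actually passed, which can then be compared against the swap device.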
When it fails, it must be something very early in the boot process, because it happens before I get any boot messages from systemd. There must be logs in systemd/journald about the resume attempt, but what would I be looking for? What is the first message logged by systemd when attempting to resume from hibernate, and what message is logged when resume successfully completes?
Last edited by lfitzgerald (2024-11-05 20:34:26)
https://wiki.archlinux.org/title/Genera … l_messages
And avoid using the power button; set up and try to use the https://wiki.archlinux.org/title/Keyboa … el_(SysRq)
Generic questions:
* How frequent is "sometimes"?
* Do you have an nvidia GPU?
* Is there a parallel windows installation?
> How frequent is "sometimes"?
Very roughly, 1 or 2 out of every 3 attempts to resume from hibernate get stuck. That's why I'd like to figure out which log messages indicate a successful or failed resume: I already have lots of logs, so I could look through them to get the exact frequency and to see whether failures are more likely after longer uptimes.
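To check whether failures correlate with uptime, the journal can be exported as JSON (`journalctl -o json`, one object per line) and scanned for the `Performing sleep operation 'hibernate'...` message that systemd-sleep logs. A sketch, assuming the standard journal fields `_BOOT_ID`, `SYSLOG_IDENTIFIER`, `MESSAGE` and `__MONOTONIC_TIMESTAMP` (microseconds since boot); `hibernate_uptimes` is a hypothetical name. As far as I know the monotonic clock does not advance while the machine is hibernated, so this measures running time rather than wall-clock uptime.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List

HIBERNATE_MSG = "Performing sleep operation 'hibernate'"


def hibernate_uptimes(json_lines: Iterable[str]) -> Dict[str, List[float]]:
    """Map each boot ID to the monotonic times (seconds) of its hibernate requests."""
    result: Dict[str, List[float]] = defaultdict(list)
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        message = entry.get("MESSAGE")
        # Non-UTF-8 messages are exported as arrays of byte values; skip them.
        if not isinstance(message, str):
            continue
        if entry.get("SYSLOG_IDENTIFIER") == "systemd-sleep" and message.startswith(HIBERNATE_MSG):
            # journalctl exports numeric fields as strings of microseconds.
            usec = int(entry["__MONOTONIC_TIMESTAMP"])
            result[entry["_BOOT_ID"]].append(usec / 1_000_000)
    return dict(result)
```

It could be fed with `hibernate_uptimes(sys.stdin)` while piping `journalctl -o json` into the script.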
> Do you have an nvidia GPU?
Yes. Some time back, I took out my old nvidia card and switched to the onboard AMD graphics, and later put in a new nvidia card. I also had to install the AMD driver, then uninstall it and reinstall the nvidia one.
The timing of this might align with the hibernate issue, but I didn't record exact dates, so I'm not sure.
Is nvidia known to cause hibernate issues? And if so, how would I confirm that nvidia is the cause?
> Is there a parallel windows installation?
No.
Is this to try and get some more output during the failed hibernate, instead of just a blank screen? If I want to add the `debug` parameter, does that go in `/etc/default/grub` under `GRUB_CMDLINE_LINUX_DEFAULT`? That already contains `loglevel=3`, so I assume I would remove that and put in `debug` instead. Is that right?
> And avoid using the power button; set up and try to use the https://wiki.archlinux.org/title/Keyboa … el_(SysRq)
I haven't used SysRq on this install so I'd probably need to enable it. I'll give it a try.
Is there any useful information about hibernate I can get with SysRq, or are you just saying it as a general best practice?
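SysRq functions are gated by the `kernel.sysrq` sysctl (`/proc/sys/kernel/sysrq`): 1 enables everything, otherwise the value is a bitmask of function groups. The bit values below are taken from the kernel's sysrq documentation as I remember it, so please double-check them there; `sysrq_mask` is a hypothetical helper that computes the mask needed for a key sequence such as REISUB:

```python
# Bits of the kernel.sysrq bitmask, per the kernel's sysrq documentation
# (verify against your kernel version's docs before relying on them).
SYSRQ_BITS = {
    "r": 4,    # control of keyboard (SAK, unraw)
    "e": 64,   # signalling of processes (term, kill, oom-kill)
    "i": 64,
    "s": 16,   # sync command
    "u": 32,   # remount read-only
    "b": 128,  # reboot/poweroff
}


def sysrq_mask(keys: str) -> int:
    """Return the kernel.sysrq value that enables every key in `keys`."""
    mask = 0
    for key in keys.lower():
        # Raises KeyError for keys not in this deliberately small table.
        mask |= SYSRQ_BITS[key]
    return mask
```

The result could then be applied with e.g. `sudo sysctl kernel.sysrq=244`, or made persistent with a drop-in file under `/etc/sysctl.d/`.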
I think I can answer my own question about the `debug` kernel parameter: yes, it goes in `/etc/default/grub` under `GRUB_CMDLINE_LINUX_DEFAULT`, and then you have to regenerate the GRUB config with `sudo grub-mkconfig -o /boot/grub/grub.cfg`.
Now that I've added it, I get a lot more output on the console while booting. It also obscures the LUKS prompt, which makes it look like the boot is stuck, but pressing Enter brings the LUKS prompt back.
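The `/etc/default/grub` edit described above can be scripted. A minimal sketch, assuming the setting sits on a single line in the usual `GRUB_CMDLINE_LINUX_DEFAULT="..."` form; `enable_debug` is a hypothetical helper. It drops `loglevel=` and `quiet` (both also adjust console verbosity, so removing them avoids conflicting settings) and keeps every other parameter. `grub-mkconfig` still has to be run afterwards.

```python
import re

# Matches GRUB_CMDLINE_LINUX_DEFAULT="..." or '...' (not GRUB_CMDLINE_LINUX=).
_LINE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=)(["\'])(.*)\2\s*$')


def enable_debug(grub_default: str) -> str:
    """Return the file text with `debug` replacing loglevel=/quiet on the DEFAULT line."""
    out = []
    for line in grub_default.splitlines():
        m = _LINE.match(line)
        if m:
            key, quote, value = m.groups()
            params = [p for p in value.split()
                      if not p.startswith("loglevel=") and p != "quiet"]
            if "debug" not in params:
                params.append("debug")
            line = f"{key}{quote}{' '.join(params)}{quote}"
        out.append(line)
    return "\n".join(out) + "\n"
```

Running it twice gives the same result, so it is safe to re-apply.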
With the debug parameter, I now see some messages related to resuming from hibernate when attempting that.
Funnily enough, after I added the parameter and successfully booted, I tried to hibernate and resume. It got stuck. Transcribing from my photo, the last console output was:
Filesystems sync: 0.036 seconds
Freezing user space processes
Freezing user space processes completed (elapsed 0.001 seconds)
OOM killer disabled.
PM: hibernation: Marking nosave pages: [mem 0x...-0x...]
PM: hibernation: Marking nosave pages: [mem 0x...-0x...]
PM: hibernation: Marking nosave pages: [mem 0x...-0x...]
PM: hibernation: Marking nosave pages: [mem 0x...-0x...]
PM: hibernation: Marking nosave pages: [mem 0x...-0x...]
PM: hibernation: Marking nosave pages: [mem 0x...-0x...]
PM: hibernation: Marking nosave pages: [mem 0x...-0x...]
PM: hibernation: Basic memory bitmaps created
PM: hibernation: Preallocating image memory
PM: hibernation: Allocated ... pages for snapshot
PM: hibernation: Allocated ... kbytes in 0.65 seconds (4096.54 MB/s)
Freezing remaining freezable tasks
Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
printk: Suspending console(s) (use no_console_suspend to debug)
The lines were prefixed with timestamps, from ~63.56 on the top line to ~64.22 on the bottom line, so all of these messages were logged within about 0.7 seconds.
My second monitor was also showing on-screen display (OSD) messages suggesting it was turning on and off every second or so.
When it got stuck with these messages, I tried Ctrl+Alt+F* to switch TTYs. The screen went black and seemed to flash as if the key combination had worked, but no console appeared for any of the F* keys: just a black screen, without even a blinking cursor.
I rebooted with Alt+SysRq+b and tried hibernating again, and this time it succeeded. During the successful resume, I saw the printk message appear briefly, but it quickly scrolled past.
I had 99 boot IDs in journalctl. I searched for `marking nosave`, and 17 of them contained messages very similar to the above. The memory addresses seem to be the same every time, but the number of allocated pages varies from log to log. I don't think this block necessarily means that the resume failed: for example, `journalctl -b` shows messages from the current session, which was a successful resume, and I see similar messages there. It seems that in a successful resume, the printk message is followed by `ata1.00: Entering standby power mode`, some more messages about CPUs powering down, then immediately messages about CPUs and other devices powering up, and eventually normal messages from my user programs, such as my DNS service.
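The manual comparison above (what follows the printk line in each boot) can be automated over `journalctl -o json` output. A sketch, assuming the standard journal fields `_BOOT_ID`, `SYSLOG_IDENTIFIER` and `MESSAGE`, with entries in journal order; `next_kernel_messages` is a hypothetical name. A follow-up of `None` means that boot's kernel log ended right at the marker.

```python
import json
from typing import Dict, Iterable, List, Optional, Tuple


def next_kernel_messages(json_lines: Iterable[str], marker: str) -> List[Tuple[str, Optional[str]]]:
    """For each kernel message containing `marker` (case-insensitive),
    return (boot_id, the next kernel message in the same boot or None)."""
    boots: Dict[str, List[str]] = {}  # boot ID -> kernel messages in order
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        message = entry.get("MESSAGE")
        # Keep only textual kernel messages.
        if entry.get("SYSLOG_IDENTIFIER") != "kernel" or not isinstance(message, str):
            continue
        boots.setdefault(entry["_BOOT_ID"], []).append(message)

    results: List[Tuple[str, Optional[str]]] = []
    needle = marker.lower()
    for boot_id, messages in boots.items():
        for i, message in enumerate(messages):
            if needle in message.lower():
                follow = messages[i + 1] if i + 1 < len(messages) else None
                results.append((boot_id, follow))
    return results
```

For instance, calling it with the marker `"Suspending console"` would list, per boot, whether something like `ata1.00: Entering standby power mode` came next.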
I have now observed this issue with both the LTS kernel and the regular one. It actually seems to happen quite frequently now that I've enabled debug logs. Before today, my impression was that hibernating right after a fresh boot and then resuming usually succeeded, and that failed resumes came after using the computer for some hours, but I may be misremembering.
Also, when it gets stuck, if Ctrl+Alt+F1 is the first thing I try, it does switch to another TTY, which shows systemd messages about enabling color profiles (maybe graphics trying to come online?). After that, none of the TTY combinations work anymore.
I also noticed that even during failed resumes, just before the block I quoted above, I see messages like "xx% of resume image loaded" (I can't find these in journalctl). So it seems to find and load the hibernated image from swap, but then fails to start it about 1 s later.
Last edited by lfitzgerald (2024-11-06 01:04:36)
> Is nvidia known to cause hibernate issues? And if so, how would I confirm that nvidia is the cause?
https://bbs.archlinux.org/viewtopic.php?id=285508
https://bbs.archlinux.org/viewtopic.php?id=300676
https://wiki.archlinux.org/title/NVIDIA … er_suspend - disable those services and add "nvidia.NVreg_PreserveVideoMemoryAllocations=0" to the https://wiki.archlinux.org/title/Kernel_parameters
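To confirm that `nvidia.NVreg_PreserveVideoMemoryAllocations=0` actually reached the driver, one option is to read the driver's parameter dump. This sketch assumes the proprietary driver exposes `/proc/driver/nvidia/params` as `Name: value` lines (that is my understanding; please verify on your system); `parse_nvidia_params` is a hypothetical helper:

```python
from typing import Dict


def parse_nvidia_params(text: str) -> Dict[str, str]:
    """Parse 'Name: value' lines into a dict (assumed format of the params file)."""
    params: Dict[str, str] = {}
    for line in text.splitlines():
        # Split at the first ':' so values that contain ':' stay intact.
        name, sep, value = line.partition(":")
        if sep:
            params[name.strip()] = value.strip()
    return params
```

Usage would be something like `parse_nvidia_params(open("/proc/driver/nvidia/params").read()).get("PreserveVideoMemoryAllocations")`, expecting `"0"` if the kernel parameter took effect.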
> Is there any useful information about hibernate I can get with SysRq, or are you just saying it as a general best practice?
The SysRq REISUB dance ideally lets you sync the journal to disk before a controlled reboot, i.e. you'll preserve the logs of the previous (failing) boot, which you can then access with "sudo journalctl -b -1".
On a formal note, please don't bump; if nobody has replied yet, edit your previous post instead.
Both of those nvidia services are disabled and I added the NVreg kernel parameter, but I still get failed resumes.
I tried to use SysRq to do (s)ync fs, (u)nmount fs, (r)eboot after a failed resume. Then I checked journalctl -b -1. Near the end, I see:
Nov 15 15:55:31 HOSTNAME systemd-sleep[2605]: Performing sleep operation 'hibernate'...
Nov 15 15:55:31 HOSTNAME systemd-udevd[530]: vcs63: Device is queued (SEQNUM=4643, ACTION=add)
Nov 15 15:55:31 HOSTNAME kernel: PM: hibernation: hibernation entry
After that there are no more messages from the kernel or systemd-sleep. There are a few from udev and logind that don't seem relevant. All have the same timestamp.
It looks to me like the log ends at the hibernation; there do not appear to be any messages from the resume. "Performing sleep operation 'hibernate'" sounds like it's from when I ran "systemctl hibernate", so messages from the resume should have come at least 1 min later (that's how long I waited after hibernating before turning the computer on again). There aren't any after 15:55:31, so there appears to be no log of the resume.
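Based on this observation, failed attempts could be counted by looking for boots whose kernel log ends with `PM: hibernation: hibernation entry`. A sketch over `journalctl -o json` output, assuming the standard `_BOOT_ID`, `SYSLOG_IDENTIFIER` and `MESSAGE` fields and that every failure was followed by a reset; `boots_ending_in_hibernation` is a hypothetical name. It is only a heuristic: it cannot tell a failed resume apart from a failure while writing the image, since both would leave the boot ending at that line.

```python
import json
from typing import Dict, Iterable, List

ENTRY_MSG = "PM: hibernation: hibernation entry"


def boots_ending_in_hibernation(json_lines: Iterable[str]) -> List[str]:
    """Return boot IDs whose LAST kernel message is the hibernation entry line."""
    last_kernel: Dict[str, str] = {}
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        message = entry.get("MESSAGE")
        if entry.get("SYSLOG_IDENTIFIER") == "kernel" and isinstance(message, str):
            # Later entries overwrite earlier ones, leaving the last kernel message per boot.
            last_kernel[entry["_BOOT_ID"]] = message
    return [boot for boot, msg in last_kernel.items() if msg.strip() == ENTRY_MSG]
```

Comparing the length of the returned list with the number of boots that contain a hibernate request at all would give a rough failure rate.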
> tried to use SysRq to do (s)ync fs, (u)nmount fs, (r)eboot
There's a reason why REISUB is a mnemonic.
"r" doesn't reboot; instead it does the following:
> Switch keyboard mode for the current virtual console from the raw mode to ASCII mode (also known as XLATE mode)
Did you also push the power button after you "try"?
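For reference, the REISUB letters map to SysRq actions roughly as follows (paraphrased from the kernel's sysrq documentation; check it for the exact wording). The hypothetical `missing_steps` helper shows that the `s`, `u`, `r` sequence tried above never issues the `b` (reboot) step:

```python
from typing import List

# Paraphrased meanings of the REISUB keys, in the order they should be pressed.
REISUB = {
    "r": "turn off keyboard raw mode and set it to XLATE (unRaw)",
    "e": "send SIGTERM to all processes except init (tErminate)",
    "i": "send SIGKILL to all processes except init (kIll)",
    "s": "attempt to sync all mounted filesystems (Sync)",
    "u": "attempt to remount all mounted filesystems read-only (Unmount)",
    "b": "reboot immediately without syncing or unmounting (reBoot)",
}


def missing_steps(pressed: str) -> List[str]:
    """Return the REISUB keys (in REISUB order) that were not pressed."""
    pressed_set = set(pressed.lower())
    return [key for key in REISUB if key not in pressed_set]
```

For the sequence above, `missing_steps("sur")` gives `['e', 'i', 'b']`, so the machine never received a reboot request from SysRq.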