Pages: 1
Hi all,
In order to set up hibernation on my system, I configured everything as explained in https://wiki.archlinux.org/index.php/Po … _hibernate
By doing this, I was able to hibernate my system using the systemctl hibernate command, and it works most of the time. Despite that, sometimes the resume hook fails unexpectedly and the system doesn't resume. Instead, it boots normally and all my saved state is gone.
I have been observing this behavior for the last few weeks, and I couldn't determine the cause of the issue. But I think it may have something to do with the way I use hibernation: my typical workflow consists of hibernating Arch in order to boot into a hibernated Windows 10 and vice versa.
After digging a bit further, I looked at the output of journalctl. As expected, an error is shown during the resume process whenever my system fails to resume. This is the relevant section of the log (I've marked with arrows the red lines with the error):
May 05 20:44:48 jsf-Arch kernel: PM: Starting manual resume from disk
May 05 20:44:48 jsf-Arch kernel: PM: Hibernation image partition 8:9 present
May 05 20:44:48 jsf-Arch kernel: PM: Looking for hibernation image.
May 05 20:44:48 jsf-Arch kernel: PM: Image signature found, resuming
May 05 20:44:48 jsf-Arch kernel: PM: Preparing processes for restore.
May 05 20:44:48 jsf-Arch kernel: Freezing user space processes ... (elapsed 0.001 seconds) done.
May 05 20:44:48 jsf-Arch kernel: PM: Loading hibernation image.
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0x00000000-0x00000fff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0x0009e000-0x000fffff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0x20000000-0x201fffff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0x40004000-0x40004fff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xc8093000-0xc8093fff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xc80a1000-0xc80a1fff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xc9752000-0xc9d55fff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xc9d6d000-0xc9d72fff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xc9d75000-0xc9d82fff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xc9f15000-0xc9f18fff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xc9f62000-0xc9f86fff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xc9f8a000-0xc9f8bfff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xc9fa3000-0xc9fa8fff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xc9fb1000-0xc9fb1fff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xc9fc1000-0xc9fc1fff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xc9fcd000-0xc9fd1fff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xc9ffe000-0xc9ffefff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xca00f000-0xca035fff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xca04b000-0xca04bfff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xca04d000-0xca04efff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xca050000-0xca054fff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xca06b000-0xca892fff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xca894000-0xca8d6fff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xcace4000-0xcaff3fff]
May 05 20:44:48 jsf-Arch kernel: PM: Marking nosave pages: [mem 0xcb000000-0xffffffff]
May 05 20:44:48 jsf-Arch kernel: PM: Basic memory bitmaps created
May 05 20:44:48 jsf-Arch kernel: PM: Using 3 thread(s) for decompression.
                                 PM: Loading and decompressing image data (785259 pages)...
May 05 20:44:48 jsf-Arch kernel: Hibernate inconsistent memory map detected!  <------------------------------------
May 05 20:44:48 jsf-Arch kernel: PM: Image mismatch: architecture specific data  <----------------------------------
May 05 20:44:48 jsf-Arch kernel: PM: Read 3141036 kbytes in 0.01 seconds (314103.60 MB/s)
May 05 20:44:48 jsf-Arch kernel: PM: Error -1 resuming
May 05 20:44:48 jsf-Arch kernel: PM: Failed to load hibernation image, recovering. <--------------------------------
May 05 20:44:48 jsf-Arch kernel: PM: Basic memory bitmaps freed
May 05 20:44:48 jsf-Arch kernel: Restarting tasks ... done.
May 05 20:44:48 jsf-Arch kernel: PM: Hibernation image not present or could not be loaded.
May 05 20:44:48 jsf-Arch kernel: random: crng init done
May 05 20:44:48 jsf-Arch kernel: EXT4-fs (sda7): mounted filesystem with ordered data mode. Opts: (null)
Trying to google my issue, I have come across this patch description: https://patchwork.kernel.org/patch/9386175/ which I think might be relevant, but at this point I am a bit lost on where to continue.
Any ideas or directions would be much appreciated!
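A minimal sketch for checking whether a given boot hit the failure shown in the log above. The function name resume_failed is made up for illustration; the patterns are taken from the marked log lines, and other kernel versions may word these messages differently.

```shell
#!/bin/sh
# Scan kernel log text on stdin for the resume-failure messages quoted above.
# Prints the matching lines; exit status 0 means at least one was found.
# (resume_failed is a hypothetical helper name, not an existing tool.)
resume_failed() {
    grep -E 'Hibernate inconsistent memory map detected|PM: Image mismatch|PM: Error -?[0-9]+ resuming|PM: Failed to load hibernation image'
}

# Typical use on the running system (kernel messages of the current boot):
#   journalctl -k -b | resume_failed && echo "resume failed on this boot"
```

Reading from stdin keeps the helper usable on saved journal excerpts as well as on the live journal.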
Last edited by setzer22 (2017-05-06 12:28:00)
Offline
I had the same problem after switching from grub to systemd-boot two years ago, but I'm not using hibernation anymore.
What's your bootloader?
Offline
My bootloader is systemd-boot as well... Could it be a bootloader issue?
I assume from your message that you were unable to solve your issue... I will try moving to grub from systemd-boot and see if that fixes it.
Offline

Don't hold your breath - this is likely caused by windows (eg. an update or hybrid sleep or so), resulting in a change of the system memory map (eg. variation in reserved memory?)
You're looking for a hypervisor
Offline
Don't hold your breath - this is likely caused by windows (eg. an update or hybrid sleep or so), resulting in a change of the system memory map (eg. variation in reserved memory?)
You're looking for a hypervisor
I do suspect windows for messing things up, but I was thinking that the memory map was outside of its scope. Isn't the memory map in the linux side or is it a machine-wide thing? Is this a common issue when dual-booting and hibernating, then?
I've had a quick look at hypervisors, which I didn't know about. But would I get the same performance using something that is essentially running virtual machines for multiple OSes (even if at a lower level than, say, VirtualBox)?
Offline

You're sharing resources; w/ proper CPU support the problem is RAM. W/o hardware virtualization, you won't get windows at all.
Windows (any other side) can either conduct a BIOS/UEFI update or trigger changes in the memory reserved for onboard devices (or alter their internal state) more or less just by using them.
=> You could store and diff
dmesg | sed '/e820/!d; s/^[^]]*] //g'
on each boot (w/ a hibernated windows) to see whether something has changed
The typical issue so far has however been data loss on shared NTFS/FAT partitions (which is a very real risk if you mount them rw from both sides!)
Offline
You're sharing resources; w/ proper CPU support the problem is RAM. W/o hardware virtualization, you won't get windows at all.
Even if I got proper CPU support for virtualization (which I haven't checked), there's still the GPU, which I know for sure I can't pass through on my machine, so I don't think that'll work for me. Even so, I'm actually thinking about a similar setup for my next machine, so thanks for the tip!
Windows (any other side) can either conduct a BIOS/UEFI update or trigger changes in the memory reserved for onboard devices (or alter their internal state) more or less just by using them.
=> You could store and diff
dmesg | sed '/e820/!d; s/^[^]]*] //g'
on each boot (w/ a hibernated windows) to see whether something has changed
I'll set up a script to record this at every boot so I can check for changes. Since this only happens every now and then, it may take a while until I get proper results...
The typical issue so far has however been data loss on shared NTFS/FAT partitions (which is a very real risk if you mount them rw from both sides!)
Yes, I am aware of this. Luckily I don't have a shared windows/linux partition.
Thanks for your help, I'll try to get back with some more info.
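A rough sketch of what such a per-boot recording script might look like. The sed expression is the one suggested above; the LOGDIR location, the function names, and the file naming scheme are assumptions for illustration only.

```shell
#!/bin/sh
# Keep only the e820 lines and strip the leading "[ timestamp] " prefix,
# using the sed expression suggested earlier in the thread.
e820_filter() {
    sed '/e820/!d; s/^[^]]*] //g'
}

# Save one snapshot per boot so that two boots can be compared with diff.
# LOGDIR and the "login<ddmmyyHHMM>" naming are assumptions.
# Note: dmesg may require root if kernel.dmesg_restrict is enabled.
record_e820() {
    LOGDIR=${LOGDIR:-"$HOME/e820-logs"}
    mkdir -p "$LOGDIR" || return 1
    dmesg | e820_filter > "$LOGDIR/login$(date +%d%m%y%H%M)"
}

# Usage: call record_e820 once after each boot (for example from a login
# script), then compare two snapshots with: diff snapshot_a snapshot_b
```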
Offline
You were right, I was able to replicate the issue, and the e820 memory dump is different between logins. This is the diff output from the two dmesg outputs. I couldn't make any sense of it myself:
diff login0605171903 login0705171559
56c56
< e820: update [mem 0xc808b018-0xc8099057] usable ==> usable
---
> e820: update [mem 0xc8089018-0xc8097057] usable ==> usable
65c65
< e820: reserve RAM buffer [mem 0xc808b018-0xcbffffff]
---
> e820: reserve RAM buffer [mem 0xc8089018-0xcbffffff]
Also, I haven't had time to replicate this to make sure, but I think what might be causing resume to fail is the following sequence: Boot Arch -> Hibernate Arch -> Boot Windows -> Hibernate Windows -> Resume Windows -> Hibernate Windows -> Resume Arch (fails). Could it be that windows changes something on the e820 memory when resuming from hibernation?
Last edited by setzer22 (2017-05-07 14:10:37)
Offline

Hard to say from the diff, but apparently some memory portion was downshifted by 8kB (and in the end you got more free RAM! 8kB, that is ... ;-)
It's really hard to say what causes those changes (you'd need to figure what the BIOS is using the altered reserved memory for), but you now know for sure that it happens and hibernation relies on an untouched system (and very importantly: untouched memory)
Windows could trigger this change for ACPI uses (but you can't blame MS here at all - it merely makes indirect requests which may require the BIOS to alter what it requires of RAM)
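The 8kB figure can be checked directly from the addresses in the diff. A small sketch using shell arithmetic; addr_shift is a made-up helper name.

```shell
#!/bin/sh
# Byte difference between two addresses given as 0x-prefixed hex strings.
# (addr_shift is a hypothetical helper name.)
addr_shift() {
    echo $(( $1 - $2 ))
}

# "update" range from the two snapshots: first boot minus second boot.
addr_shift 0xc808b018 0xc8089018   # start of range, prints 8192
addr_shift 0xc8099057 0xc8097057   # end of range, prints 8192
```

Both ends of the range moved down by 0x2000 = 8192 bytes, so the range kept its size and was only relocated.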
Offline
It happened once again today (lost all my linux state). Again, the e820 dump is different from last time (and in fact, also different from the first one). It seems whatever it is, now it has reserved more memory.
I've been trying different things in order to see if I can get a repeatable behaviour, but there has been no luck thus far. Something has crossed my mind, though: could the e820 memory be affected by which devices, and how many of them, are plugged into the USB ports when I hibernate my computer? Because that's one of the things that has changed during this last boot which broke my linux hibernation. I'd like to hear your thoughts on that before I start experimenting with this.
Last edited by setzer22 (2017-05-13 18:41:31)
Offline

Your typical usb key is not covered by the memory map (it concerns memory that is accessed by the CPU directly), but since e820 is nowadays usually a UEFI emulation, basically everything can happen, given a weird enough uefi implementation. sigh.
The real problem is likely that the BIOS/UEFI gets confused about the S4 state.
Usually the system "knows" that it's in S4 and that it should really not mess with the memory map, because that will knock out the reloaded system. The strategy for this is to overallocate memory so that it can deal with changes during the S4 and still present the same memory map to the system.
Going in and out of S4 from two sides may just get it losing track of this and then it thinks "it's ok to use a random new memory map" - maybe it's a problem if windows reboots in between?
(because we all know it loves to do that after eg. "a new mouse has been detected, downloading the 500MB+ driver package and going to reboot three or four times" ;-)
Offline
Going in and out of S4 from two sides may just get it losing track of this and then it thinks "it's ok to use a random new memory map" - maybe it's a problem if windows reboots in between?
That crossed my mind as well, but to the best of my knowledge windows hasn't rebooted at all during this whole week, only hibernations, yet the problem still occurred. But I will test if a reboot breaks Linux hibernation, since it may be one of the causes.
Offline
I experienced the same issue again, but this time I hadn't booted windows in-between. After three hibernation cycles in arch, this morning the resume hook failed. Same error, and the e820 dump is different.
Could it be that this is actually a problem with my linux (or maybe bootloader) configuration after all? I had been thinking this was due to Windows all this time, but in fact I hadn't really thoroughly tested several hibernation cycles without switching OS. I will try to replicate this before drawing any conclusions, but any ideas are highly welcome!
Offline

Rather a broken BIOS?
Try passing
acpi_osi=
or
acpi_osi=! acpi_osi='Windows 2009'
to the kernel command line parameters.
Offline

Have you tried installing the acpi packages and appending acpi to your modules?
Offline

The acpi package provides a userspace util to read various PM related settings, it has nothing to do with S4.
The kernel parameters in comment #14 misinform (lie to) the BIOS regarding the running OS.
The idea is that this might have impact on how the BIOS treats the memory map on S4 resumes.
Offline

Right, as the acpi packages are for power management, I've always assumed hibernation made use of the libraries.
Offline
Thank you for your answers!
I tried setting the acpi_osi command line parameter. The first version gave me an error:
ACPI Error [_OSI] Namespace lookup failure, AE_NOT_FOUND ...
If the error may be relevant please let me know and I'll upload an image with the full messages, but I just guessed that my kernel version did not recognise that syntax for the parameter.
As for the second version, I tried it and there was no error message. The system boots as usual. I'll reply back when there is any news.
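One way to confirm that the parameter actually reached the kernel is to look at /proc/cmdline after booting. A minimal sketch; has_acpi_osi is a made-up helper name, and it only checks that some acpi_osi= setting is present, not which value it has.

```shell
#!/bin/sh
# Succeeds if the kernel command line text on stdin contains an acpi_osi=
# parameter. (has_acpi_osi is a hypothetical helper name.)
has_acpi_osi() {
    grep -q 'acpi_osi='
}

# Usage on the running system:
#   has_acpi_osi < /proc/cmdline && echo "acpi_osi is set for this boot"
```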
Offline

As long as your system boots, all is fine.
You explicitly unset the ACPI_OSI field and since 4.9.6 or so, the kernel spams all ACPI "errors" into the boot message (search this forum on this, there's like 100 "what's this" posts ;-)
Offline
I was caught in the same problem. Solved by adding the nobootwait option in /etc/fstab (only on the root partition)
UUID=301c17d1-2dc7-4cae-aa11-8621e4510a52       /               ext4            rw,nobootwait,relatime,data=ordered     0 1
and then problem is solved 
PS: I am not using pm-utils or the proprietary amd drivers
Last edited by vvk (2017-06-05 18:13:00)
Offline

nobootwait should no longer (since 6 years or so?) be a valid option, also that implies to just continue if the root filesystem cannot be mounted, but everything is mounted into the root filesystem.
iow this is probably a false positive coincidence or weird side effect of handling the invalid option
dmesg | grep nobootwait
Last edited by seth (2017-06-05 21:01:50)
Offline
nobootwait should no longer (since 6 years or so?) be a valid option, also that implies to just continue if the root filesystem cannot be mounted, but everything is mounted into the root filesystem.
iow this is probably a false positive coincidence or weird side effect of handling the invalid option
dmesg | grep nobootwait
Man, you are right, this is just a weird coincidence, and dmesg states that nobootwait is an invalid option (but believe me, I tried to hibernate and resume 15-20 times yesterday and it worked like a charm), but this morning the problem happened again. Do you have any solution? I've searched over the internet (including other distro forums) but nothing works.
Last edited by vvk (2017-06-06 03:57:33)
Offline

"When in doubt, lie" ;-)
https://bbs.archlinux.org/viewtopic.php … 3#p1712303
Offline
Already tried with "Windows 2012" (and now with "Windows 2009" in /etc/boot/refind_linux.conf) but it has no effect, although the frequency of failure is reduced.
I am using the rEFInd boot manager; maybe that is the reason?
Last edited by vvk (2017-06-06 09:50:53)
Offline

I doubt so - the board's BIOS/UEFI somehow loses track of the S4 state (and thus dares to alter the memory map); that should happen before and regardless of any boot manager.
rEFInd *might* be *one* reason why the board feels the (soft, this should not be necessary) requirement to alter the memory map, but since it's in some S4 it should know that this is not a viable option.
Offline