#1 2018-03-10 08:40:14

Inglebard
Member
Registered: 2016-05-10
Posts: 33

[SOLVED] Suspend/Resume corrupt the filesystem

Hi,

Since 18 February I have noticed two fsck runs at boot.
After investigation, the corruption seems to happen after suspend/resume.

mars 09 21:03:41 Computer systemd[1]: Starting Suspend...
mars 09 21:03:43 Computer ntpd[613]: 2001:418:3ff::1:53 local addr 2a01:cb18:5e2:3a00:8b69:b1c3:7ae4:4db3 -> <null>
mars 09 21:03:42 Computer systemd-sleep[9407]: Suspending system...
mars 09 21:03:43 Computer ntpd[613]: Deleting interface #6 wlp3s0, fe80::13f8:8e94:dacf:afaf%3#123, interface stats: received=0, sent=0, dropped=0, active_time=9014 secs
mars 09 21:03:44 Computer kernel: PM: Syncing filesystems ... done.
mars 09 21:03:44 Computer gsd-rfkill[1362]: g_object_notify: object class 'CcRfkillGlib' has no property named 'kernel-noinput'
mars 09 22:29:36 Computer kernel: rfkill: input handler enabled
mars 09 22:29:36 Computer kernel: Freezing user space processes ... (elapsed 0.072 seconds) done.
mars 09 22:29:36 Computer kernel: OOM killer disabled.
mars 09 22:29:36 Computer kernel: Freezing remaining freezable tasks ... (elapsed 0.063 seconds) done.
mars 09 22:29:36 Computer kernel: Suspending console(s) (use no_console_suspend to debug)
mars 09 22:29:36 Computer kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
mars 09 22:29:36 Computer kernel: sd 0:0:0:0: [sda] Stopping disk
mars 09 22:29:36 Computer kernel: sd 5:0:0:0: [sdb] Synchronizing SCSI cache
mars 09 22:29:36 Computer kernel: sd 5:0:0:0: [sdb] Stopping disk
mars 09 22:29:36 Computer kernel: ACPI: Preparing to enter system sleep state S3
mars 09 22:29:36 Computer kernel: PM: Saving platform NVS memory
mars 09 22:29:36 Computer kernel: Disabling non-boot CPUs ...
mars 09 22:29:36 Computer kernel: smpboot: CPU 1 is now offline
mars 09 22:29:36 Computer kernel: smpboot: CPU 2 is now offline
mars 09 22:29:36 Computer kernel: smpboot: CPU 3 is now offline
mars 09 22:29:36 Computer kernel: smpboot: CPU 4 is now offline
mars 09 22:29:36 Computer kernel: smpboot: CPU 5 is now offline
mars 09 22:29:36 Computer kernel: smpboot: CPU 6 is now offline
mars 09 22:29:36 Computer kernel: smpboot: CPU 7 is now offline
mars 09 22:29:36 Computer kernel: ACPI: Low-level resume complete
mars 09 22:29:36 Computer kernel: PM: Restoring platform NVS memory
mars 09 22:29:36 Computer kernel: Enabling non-boot CPUs ...
mars 09 22:29:36 Computer kernel: x86: Booting SMP configuration:
mars 09 22:29:36 Computer kernel: smpboot: Booting Node 0 Processor 1 APIC 0x2
mars 09 22:29:36 Computer kernel:  cache: parent cpu1 should not be sleeping
mars 09 22:29:36 Computer kernel: CPU1 is up
mars 09 22:29:36 Computer kernel: smpboot: Booting Node 0 Processor 2 APIC 0x4
mars 09 22:29:36 Computer kernel:  cache: parent cpu2 should not be sleeping
mars 09 22:29:36 Computer kernel: CPU2 is up
mars 09 22:29:36 Computer kernel: smpboot: Booting Node 0 Processor 3 APIC 0x6
mars 09 22:29:36 Computer kernel:  cache: parent cpu3 should not be sleeping
mars 09 22:29:36 Computer kernel: CPU3 is up
mars 09 22:29:36 Computer kernel: smpboot: Booting Node 0 Processor 4 APIC 0x1
mars 09 22:29:36 Computer kernel:  cache: parent cpu4 should not be sleeping
mars 09 22:29:36 Computer kernel: CPU4 is up
mars 09 22:29:36 Computer kernel: smpboot: Booting Node 0 Processor 5 APIC 0x3
mars 09 22:29:36 Computer kernel:  cache: parent cpu5 should not be sleeping
mars 09 22:29:36 Computer kernel: CPU5 is up
mars 09 22:29:36 Computer kernel: smpboot: Booting Node 0 Processor 6 APIC 0x5
mars 09 22:29:36 Computer kernel:  cache: parent cpu6 should not be sleeping
mars 09 22:29:36 Computer kernel: CPU6 is up
mars 09 22:29:36 Computer kernel: smpboot: Booting Node 0 Processor 7 APIC 0x7
mars 09 22:29:36 Computer kernel:  cache: parent cpu7 should not be sleeping
mars 09 22:29:36 Computer kernel: CPU7 is up
mars 09 22:29:36 Computer kernel: ACPI: Waking up from system sleep state S3
mars 09 22:29:36 Computer kernel: rtlwifi: rtlwifi: wireless switch is on
mars 09 22:29:36 Computer kernel: sd 0:0:0:0: [sda] Starting disk
mars 09 22:29:36 Computer kernel: sd 5:0:0:0: [sdb] Starting disk
mars 09 22:29:36 Computer kernel: [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
mars 09 22:29:36 Computer kernel: [drm] PCIE GART of 1024M enabled (table at 0x0000000000162000).
mars 09 22:29:36 Computer kernel: radeon 0000:01:00.0: WB enabled
mars 09 22:29:37 Computer kernel: radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0x000000006d1f8d7d
mars 09 22:29:37 Computer kernel: radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0x0000000039c7354b
mars 09 22:29:37 Computer kernel: radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0x000000003b08f30a
mars 09 22:29:37 Computer kernel: [drm] ring test on 0 succeeded in 3 usecs
mars 09 22:29:37 Computer kernel: [drm] ring test on 3 succeeded in 6 usecs
mars 09 22:29:37 Computer kernel: r8169 0000:06:00.0 enp6s0: link down
mars 09 22:29:37 Computer kernel: [drm] ring test on 5 succeeded in 2 usecs
mars 09 22:29:37 Computer kernel: [drm] UVD initialized successfully.
mars 09 22:29:37 Computer kernel: [drm] ib test on ring 0 succeeded in 0 usecs
mars 09 22:29:37 Computer kernel: [drm] ib test on ring 3 succeeded in 0 usecs
mars 09 22:29:37 Computer kernel: ata3: SATA link down (SStatus 0 SControl 300)
mars 09 22:29:37 Computer kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
mars 09 22:29:37 Computer kernel: ata4: SATA link down (SStatus 0 SControl 300)
mars 09 22:29:37 Computer kernel: ata2: SATA link down (SStatus 0 SControl 300)
mars 09 22:29:37 Computer kernel: ata5.00: configured for UDMA/100
mars 09 22:29:37 Computer kernel: [drm] ib test on ring 5 succeeded
mars 09 22:29:37 Computer kernel: OOM killer enabled.
mars 09 22:29:37 Computer kernel: Restarting tasks ... done.
mars 09 22:29:37 Computer kernel: EXT4-fs error (device sdb2): __ext4_get_inode_loc:4619: inode #10492110: block 41943468: comm nmbd: unable to read itable block
mars 09 22:29:37 Computer kernel: Buffer I/O error on dev sdb2, logical block 0, lost sync page write
mars 09 22:29:38 Computer kernel: EXT4-fs error (device sdb2) in ext4_reserve_inode_write:5754: IO failure
mars 09 22:29:38 Computer kernel: EXT4-fs (sdb2): previous I/O error to superblock detected
mars 09 22:29:38 Computer kernel: Buffer I/O error on dev sdb2, logical block 0, lost sync page write
mars 09 22:29:38 Computer kernel: EXT4-fs error (device sdb2) in ext4_orphan_add:2819: IO failure
mars 09 22:29:38 Computer kernel: EXT4-fs (sdb2): previous I/O error to superblock detected
mars 09 22:29:39 Computer kernel: Buffer I/O error on dev sdb2, logical block 0, lost sync page write
mars 09 22:29:39 Computer kernel: EXT4-fs error (device sdb2): __ext4_get_inode_loc:4619: inode #10492110: block 41943468: comm nmbd: unable to read itable block
mars 09 22:29:39 Computer kernel: EXT4-fs (sdb2): previous I/O error to superblock detected
mars 09 22:29:39 Computer kernel: Buffer I/O error on dev sdb2, logical block 0, lost sync page write
mars 09 22:29:39 Computer kernel: EXT4-fs error (device sdb2) in ext4_reserve_inode_write:5754: IO failure
mars 09 22:29:39 Computer kernel: EXT4-fs (sdb2): previous I/O error to superblock detected
mars 09 22:29:40 Computer kernel: Buffer I/O error on dev sdb2, logical block 0, lost sync page write
mars 09 22:29:40 Computer kernel: EXT4-fs error (device sdb2): __ext4_get_inode_loc:4619: inode #10492110: block 41943468: comm nmbd: unable to read itable block
mars 09 22:29:40 Computer kernel: EXT4-fs (sdb2): previous I/O error to superblock detected
mars 09 22:29:40 Computer kernel: Buffer I/O error on dev sdb2, logical block 0, lost sync page write
mars 09 22:29:40 Computer kernel: EXT4-fs error (device sdb2) in ext4_reserve_inode_write:5754: IO failure
mars 09 22:29:40 Computer kernel: EXT4-fs (sdb2): previous I/O error to superblock detected
mars 09 22:29:40 Computer kernel: Buffer I/O error on dev sdb2, logical block 0, lost sync page write
mars 09 22:29:40 Computer kernel: EXT4-fs warning (device sdb2): ext4_evict_inode:284: couldn't mark inode dirty (err -5)
mars 09 22:29:40 Computer kernel: PM: suspend exit
mars 09 22:29:40 Computer kernel: rfkill: input handler disabled

How can this happen if all processes are frozen?
How can I avoid it?

Last edited by Inglebard (2018-07-21 18:52:11)

#2 2018-07-20 14:33:52

sourcejedi
Member
Registered: 2018-04-04
Posts: 8

Re: [SOLVED] Suspend/Resume corrupt the filesystem

Please read this comment: https://bbs.archlinux.org/viewtopic.php … 3#p1798483

Your experience sounds the same as what I hit with this bug. Quoting myself from the last link in the above comment:

Does anyone else see "Read-error on swap-device" / "EXT4-fs error" / "Buffer I/O error" if they look in their system logs, that happen around suspend/resume time?

To view historical kernel logs together with SIGBUS reports:

    $ journalctl _TRANSPORT=kernel + COREDUMP_SIGNAL_NAME=SIGBUS

I suggest using a search for the error text (`/` key).  It doesn't look like it happens every time.

A recent one triggered a fsck, which made me pay attention.  Disk errors would make the SIGBUS analysis a lot less mysterious.
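
As a rough sketch of the search described above (the helper name `find_resume_io_errors` is made up for illustration; only the three error strings come from the post), the kernel log can be filtered non-interactively instead of paging through it with `/`:

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines on stdin that contain one of the
# three error strings quoted in the post.
find_resume_io_errors() {
    grep -E 'Read-error on swap-device|EXT4-fs error|Buffer I/O error'
}

# Intended use on a systemd machine (kernel messages from all recorded boots):
#   journalctl _TRANSPORT=kernel --no-pager | find_resume_io_errors
```

If nothing matches, the function prints nothing and returns a non-zero status, so it can also be used in a script as a simple "did this ever happen?" check.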

#3 2018-07-20 14:37:20

sourcejedi
Member
Registered: 2018-04-04
Posts: 8

Re: [SOLVED] Suspend/Resume corrupt the filesystem

Specifically, note the IO errors happen *after* "Restarting tasks".  However, they occur before the SATA disk device is resumed.

(The final resume actually happens even later than "PM: suspend exit"; the resume of the SATA device is deferred and allowed to occur asynchronously.  This lets user programs make a lot of progress e.g. redrawing the lock screen, if everything they need is still in RAM.  But there was a bug in the kernel code responsible for making the user programs wait once they need the disk).
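
To check a log for this ordering, here is a minimal sketch (the function name `errors_after_thaw` is an assumption, not an existing tool). It prints only the error lines that appear between "Restarting tasks" and "PM: suspend exit". Since the disk can finish resuming later than "PM: suspend exit", this window is only a heuristic and may miss later errors.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print the I/O error
# lines that fall between "Restarting tasks" and "PM: suspend exit".
errors_after_thaw() {
    awk '
        /Restarting tasks/ { in_window = 1; next }   # user space is running again
        /PM: suspend exit/ { in_window = 0 }         # end of the window we inspect
        in_window && /EXT4-fs error|Buffer I\/O error|Read-error on swap-device/ { print }
    '
}

# Intended use on a systemd machine:
#   journalctl _TRANSPORT=kernel --no-pager | errors_after_thaw
```

Run against the log in the first post, this would print the EXT4-fs and Buffer I/O error lines logged between 22:29:37 and 22:29:40.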

Last edited by sourcejedi (2018-07-20 14:38:25)

#4 2018-07-20 14:41:49

sourcejedi
Member
Registered: 2018-04-04
Posts: 8

Re: [SOLVED] Suspend/Resume corrupt the filesystem

If you still do not have a new enough kernel, and you don't want to install an older kernel (switch to the linux-lts package mentioned in the very first Arch thread about this?), there is a workaround.

Add the option "scsi_mod.scan=sync" to the end of your kernel command line e.g. in GRUB.
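
For illustration, assuming a stock Arch GRUB setup (the file paths and the `has_kernel_param` helper below are assumptions, not something stated in this thread), the change and a check that it took effect after reboot might look like this:

```shell
#!/bin/sh
# Assumed GRUB steps, run as root (adjust to your own setup):
#   1. In /etc/default/grub, append the option to the existing line, e.g.
#        GRUB_CMDLINE_LINUX_DEFAULT="quiet scsi_mod.scan=sync"
#   2. Regenerate the config:  grub-mkconfig -o /boot/grub/grub.cfg
#   3. Reboot.

# Hypothetical helper: succeed if parameter $1 appears as a whole word in the
# kernel command line string $2.
has_kernel_param() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# After rebooting, confirm the running kernel actually received the option.
if [ -r /proc/cmdline ]; then
    if has_kernel_param scsi_mod.scan=sync "$(cat /proc/cmdline)"; then
        echo "scsi_mod.scan=sync is active"
    else
        echo "scsi_mod.scan=sync is NOT on the kernel command line"
    fi
fi
```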

Last edited by sourcejedi (2018-07-20 14:53:32)

#5 2018-07-21 17:03:27

Inglebard
Member
Registered: 2016-05-10
Posts: 33

Re: [SOLVED] Suspend/Resume corrupt the filesystem

Hi,
Thanks for your answer and your explanation.

If you still do not have a new enough kernel, and you don't want to install an older kernel (switch to the linux-lts package mentioned in the very first Arch thread about this?), there is a workaround.

I have not suspended my computer since the problem appeared. Do you know whether the issue has been patched in the latest kernel (I use 4.17.8-1), or is linux-lts the only solution?

#6 2018-07-21 18:26:42

sourcejedi
Member
Registered: 2018-04-04
Posts: 8

Re: [SOLVED] Suspend/Resume corrupt the filesystem

Then you should hopefully be able to suspend without worrying about this problem. Any kernel marked 4.17+ or 4.16.8+ will have the fix.

The bug that causes this was introduced in upstream kernel v4.14 and fixed in v4.17 and v4.16.8.
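
A small sketch of that version rule, for anyone who wants to check a machine quickly. The function names are made up, `sort -V` assumes GNU coreutils, and the check only encodes the range given here (4.14 up to but not including 4.16.8). It knows nothing about fixes a distribution may have backported to other series.

```shell
#!/bin/sh
# version_ge A B: succeed when version A >= version B (GNU sort -V ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# maybe_affected VERSION: succeed when VERSION is >= 4.14 and older than 4.16.8,
# the range described in this post.
maybe_affected() {
    v=${1%%-*}   # drop the package suffix, e.g. 4.17.8-1-ARCH -> 4.17.8
    version_ge "$v" 4.14 && ! version_ge "$v" 4.16.8
}

if maybe_affected "$(uname -r)"; then
    echo "Kernel $(uname -r) is in the range described as affected"
else
    echo "Kernel $(uname -r) is outside the range described as affected"
fi
```

With the 4.17.8-1 kernel mentioned above, `maybe_affected` returns false.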

Last edited by sourcejedi (2018-07-21 18:28:27)

#7 2018-07-21 18:51:41

Inglebard
Member
Registered: 2016-05-10
Posts: 33

Re: [SOLVED] Suspend/Resume corrupt the filesystem

Ok, again thanks for the details.

Then you should hopefully be able to suspend without worrying about this problem.

I hope so, because random fsck runs at boot are not really fun.

So I consider the topic as solved.
