
#1 2018-03-02 14:36:54

Registered: 2018-03-02
Posts: 2

Secondary hard disk doesn't work after suspend

Hello Arch Linux community!

I have been running Arch on my laptop for quite some time, but I'm still improving the configuration here and there. The root and home partitions are on the primary disk (an SSD). I have another backup partition on the secondary disk, in the HDD caddy, which I don't use very often. I want the secondary HDD to spin down when I'm not using it, so I read about APM levels and hdparm here. The issue I have now is that, no matter what I try, some processes get stuck waiting on IO after waking from suspend. Of course, I tried to find solutions to this problem, but I couldn't find anything specific to IO.

Several things have happened. At first, I noticed that the APM level seems to reset after waking from suspend. The hdparm ArchWiki page suggests creating a systemd unit to run hdparm on wakeup, but that didn't work very well. At this point I noticed a huge load average (my CPU has 4 cores and the load average is currently 8) even when the CPU is idle, memory usage is low, and the system is not using swap. The reason for such a high load average seems to be the "zombie" processes.
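For reference, the idea of re-applying the APM level on wakeup can also be sketched as a systemd system-sleep hook (an executable placed in /usr/lib/systemd/system-sleep/, which systemd calls with "pre" or "post" plus the sleep action). This is only a minimal illustration, not a fix for the problem described here. The device path /dev/sdb and the level 127 are assumptions, and the function names are made up for the example.

```python
#!/usr/bin/env python3
"""Sketch of a system-sleep hook that re-applies an APM level after resume."""
import subprocess

DEVICE = "/dev/sdb"  # assumption: the secondary disk; verify with lsblk
APM_LEVEL = 127      # assumption: levels 1-127 permit spin-down


def should_reapply(phase, action):
    """Return True only when the machine has just resumed from suspend."""
    return phase == "post" and action == "suspend"


def build_command(device, level):
    """Build the hdparm call that sets the APM level (-B)."""
    if not 1 <= level <= 255:
        raise ValueError("APM level must be between 1 and 255")
    return ["hdparm", "-B", str(level), device]


def main(argv):
    """Entry point; argv is [script, phase, action] as passed by systemd."""
    if len(argv) != 3:
        return 2
    if should_reapply(argv[1], argv[2]):
        # Needs root; system-sleep hooks are run as root by systemd.
        return subprocess.run(build_command(DEVICE, APM_LEVEL)).returncode
    return 0
```

To use it as a hook, the file would end with a call such as `sys.exit(main(sys.argv))` (after `import sys`) and be made executable.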

The first zombie I found was laptop-mode. I tried disabling my APM systemd unit and letting only laptop-mode manage the APM level, but that didn't help. Currently both my systemd unit and the laptop-mode-tools HDD power management are disabled, but I still get zombies on wakeup. Now it's the udisks2 daemon. I can hear that the secondary HDD is spinning, but when I try to mount any of its partitions, run dd, or issue another hdparm command manually, all of these processes again get stuck waiting on IO (the D process state in htop), and the load average rises with each of them.
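On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, which would explain a load of 8 on an idle CPU. A rough sketch for listing the stuck tasks without htop, reading the per-thread stat files under /proc, could look like this. The helper names are made up for the example.

```python
import os


def parse_stat(stat_line):
    """Return (pid, command name, state) from one /proc/.../stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split around the first "(" and last ")".
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left])
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """List (tid, comm) for every thread in uninterruptible sleep (state D)."""
    blocked = []
    for pid in filter(str.isdigit, os.listdir(proc_root)):
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # the process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as handle:
                    task = parse_stat(handle.read())
            except (OSError, ValueError, IndexError):
                continue  # the thread exited or the line was malformed
            if task[2] == "D":
                blocked.append(task[:2])
    return blocked
```

Calling `blocked_tasks()` after wakeup and comparing the count with the first field of /proc/loadavg should show whether the stuck tasks account for the load.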

I have had this issue for about two weeks. I can't even go back to the previous state, where the HDD does not spin down and udisks doesn't get stuck. A few things changed around the time the problem appeared. I installed hdparm, so laptop-mode possibly started doing HDD power management only after that. There were a few (at least 2, I think) kernel updates. I am running linux-lts. I am willing to spend some time figuring this out, but I am running out of ideas. I will also leave a warning from dmesg here; it just confirms that the issue is IO related, but maybe you will see something interesting in it. Anyway, I hope we can figure out together how to fix this. Thanks!

[47782.876882] INFO: task pool:17503 blocked for more than 120 seconds.
[47782.876890]       Tainted: G           O    4.14.22-1-lts #1
[47782.876892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[47782.876895] pool            D    0 17503      1 0x00000000
[47782.876900] Call Trace:
[47782.876913]  ? __schedule+0x294/0x8a0
[47782.876920]  ? work_busy+0xa0/0xa0
[47782.876923]  schedule+0x28/0x80
[47782.876927]  schedule_timeout+0x1f6/0x360
[47782.876932]  ? check_preempt_wakeup+0x102/0x230
[47782.876936]  ? work_busy+0xa0/0xa0
[47782.876939]  ? wait_for_completion+0xba/0x140
[47782.876943]  wait_for_completion+0xba/0x140
[47782.876947]  ? wake_up_q+0x70/0x70
[47782.876950]  flush_work+0x13a/0x1c0
[47782.876954]  ? worker_detach_from_pool+0xa0/0xa0
[47782.876959]  __cancel_work_timer+0x123/0x1b0
[47782.876963]  ? disk_map_sector_rcu+0x70/0x70
[47782.876969]  ? kobj_lookup+0x113/0x160
[47782.876972]  disk_block_events+0x78/0x90
[47782.876979]  __blkdev_get+0x63/0x440
[47782.876982]  blkdev_get+0x11d/0x300
[47782.876986]  ? bd_acquire+0xd0/0xd0
[47782.876991]  do_dentry_open+0x1b0/0x2d0
[47782.876995]  ? __inode_permission+0x85/0xc0
[47782.876998]  path_openat+0x4f9/0x12f0
[47782.877001]  ? enqueue_entity+0x740/0x780
[47782.877004]  ? select_idle_sibling+0x26/0x410
[47782.877007]  ? check_preempt_curr+0x7e/0x90
[47782.877010]  ? try_to_wake_up+0x54/0x480
[47782.877013]  do_filp_open+0x9b/0x110
[47782.877017]  ? __blkdev_put+0x1c0/0x1f0
[47782.877021]  ? __check_object_size+0xaf/0x1b0
[47782.877025]  ? do_sys_open+0x1bd/0x250
[47782.877028]  do_sys_open+0x1bd/0x250
[47782.877033]  do_syscall_64+0x67/0x120
[47782.877037]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[47782.877040] RIP: 0033:0x7ff16ce43390
[47782.877042] RSP: 002b:00007ff169e7aac0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
[47782.877045] RAX: ffffffffffffffda RBX: 0000557f0dd32120 RCX: 00007ff16ce43390
[47782.877046] RDX: 0000000000000800 RSI: 0000557f0dd5d040 RDI: ffffffffffffff9c
[47782.877048] RBP: 00007ff160007730 R08: 0000000000000000 R09: 00646975752f6d64
[47782.877049] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ff169e7acc0
[47782.877051] R13: 00007ff169e7abe8 R14: 0000000000000000 R15: 00007ff160007730
[51346.632475] pci_bus 0000:01: Allocating resources
[51346.632518] pci_bus 0000:02: Allocating resources
[51346.632601] pci_bus 0000:03: Allocating resources
[51346.632734] pci_bus 0000:08: Allocating resources
[51346.649596] PM: suspend entry (deep)
[51346.649598] PM: Syncing filesystems ...
[55606.089030] pci_bus 0000:01: Allocating resources
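
In the trace above, the blocked task is waiting in flush_work, reached from disk_block_events while opening a block device. To see how often and for which tasks this happens, the hung-task warnings can be pulled out of the kernel log with a small script. This is a sketch that assumes the message format shown in the log above; the function name is made up for the example.

```python
import re

# Matches lines such as:
# [47782.876882] INFO: task pool:17503 blocked for more than 120 seconds.
HUNG_TASK = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\] INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(log_text):
    """Return one dict per hung-task warning found in kernel log text."""
    results = []
    for match in HUNG_TASK.finditer(log_text):
        results.append({
            "time": float(match.group("time")),
            "comm": match.group("comm"),
            "pid": int(match.group("pid")),
            "seconds": int(match.group("secs")),
        })
    return results
```

The input can be the saved output of dmesg, passed in as one string.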


#2 2018-03-26 19:44:02

Registered: 2018-03-02
Posts: 2

Re: Secondary hard disk doesn't work after suspend

Removing aur/laptop-mode-tools definitely helped. I still have extra/udisks2 installed because it's a dependency of some other packages I need. I will remove udisks2 and install just laptop-mode-tools to check whether the problem lies with laptop-mode-tools alone or really with the combination of the two. Any other suggestions on how to debug this?

