System freeze caused by 100% iowait

saponace · 2019-05-26 20:14:14

Hi,

I have had an issue I tried to solve for the past couple of months and it's really making me go nuts.

I have a home server on which all services are dockerized.
I made a script to install everything on the server from scratch.
about two months ago I decided everything was ready for a first "release", do wiped out my test server and ran the script on it.

I face an issue I had never encountered while working on the script (and having the server running for days in a row): System hangs after running for a while (10min to hours, so very annoying to test)

I start the server in two steps:
* Mount all the drives (2*2TB data drives (LUKS+ext4), 2*500GB MDADM RAID1 (LUKS+ext4), 1*240GB proxy drive (ext4)
* Docker-compose up

Problem seems to only occure after both steps are completed.
My issue is I have no idea what to look for. I've been going from dead-end to dead-end for the past two months, it's driving me crazy.

Only leads I have are it seems to come from a (several ?) process constantly writing on the MDADM RAID1 (md0):

ps auxf | awk '{if($8=="D") print $0;}':

root       633  0.0  0.0      0     0 ?        D    13:06   0:00  \_ [md0_raid1]
root      4545  0.0  0.0      0     0 ?        D    13:09   0:00  \_ [dmcrypt_write/2]
root      4558  0.0  0.0      0     0 ?        D    13:09   0:00  \_ [jbd2/dm-0-8]
root      5953  0.0  0.0      0     0 ?        D    15:03   0:00  \_ [kworker/u32:1+flush-253:0]

(lsof on these four processes does not show anything relevant)
Even tho jbd2 is constantly here, there is not journal written in /var/log/*log

ps -ef|grep jbd2:

 
root       364     2  0 17:34 ?        00:00:00 [jbd2/sda3-8]
root      4478     2  0 17:37 ?        00:00:00 [jbd2/dm-0-8]
root      4493     2  0 17:37 ?        00:00:00 [jbd2/dm-1-8]
root      4506     2  0 17:37 ?        00:00:00 [jbd2/dm-2-8]
root      4510     2  0 17:37 ?        00:00:00 [jbd2/sdf1-8]

iotop does not show any write on the drives (or very very reasonable occasionnal writes)

md0 is composed of sdb and sdc
Hdparm benchmarks:

hdparm -T -t /dev/sdb
/dev/sdb:
 Timing cached reads:   21836 MB in  2.00 seconds = 10938.52 MB/sec
 Timing buffered disk reads: 260 MB in  3.01 seconds =  86.48 MB/sec

hdparm -T -t /dev/sdc
/dev/sdc:
 Timing cached reads:   20402 MB in  2.00 seconds = 10216.23 MB/sec
 Timing buffered disk reads: 338 MB in  3.00 seconds = 112.61 MB/sec

hdparm -T -t /dev/sdc
/dev/md0:
 Timing cached reads:     2 MB in  3.72 seconds = 550.43 kB/sec
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 Timing buffered disk reads: 258 MB in  3.01 seconds =  85.69 MB/sec

Here the 550.43kB/sec is very very low

Here are the sleep delay of these two drives:

hdparm -B /dev/sdc
/dev/sdc:
 APM_level      = 128

hdparm -B /dev/sdb
/dev/sdb:
 APM_level      = 96

(I know they are not the same, but it shouldn't be a problem, right ?)

When the problem occurs, the OS itself is still usable, just any command acting on the md0 device (hdparm, docker srervices having mountpoints on this device, etc.) are totally unresponsive and hang straight.
From there the only solution is to physically power-off the machine (poweroff/shutdown put the machine in a weird state: I can't ssh anymore, but it does not turn off).

I could really appreciate any help right now. starting to loose hope I'll ever fix this issue...

Thanks

Last edited by saponace (2019-05-27 15:26:26)

saponace · 2019-06-03 19:58:29

No-one can help me ?
I could really use any piece of advice now. I have strictly no lead on how to start tackling the problem and start being desperate ...

sabroad · 2019-06-04 13:50:35

saponace wrote:

process constantly writing on the MDADM RAID1 (md0)

What's the status of the RAID array?

$ cat /proc/mdstat

saponace · 2019-06-04 19:18:08

Thanks for the answer.

mdstats is as follow:

Personalities : [raid1] 
md0 : active raid1 sdc[1] sdb[0]
      624999744 blocks super 1.2 [2/2] [UU]
      bitmap: 1/5 pages [4KB], 65536KB chunk

unused devices: <none>

bitmap is 1/5 when system is frozen, rebooted it, and before it froze, bitmap was at 0/5 (I don't think this piece of information is relevant, but never know ...)

sabroad · 2019-06-05 09:28:28

Do either of sdb/sdc report smart errors?

smartctl -H -l error /dev/sdb

Last edited by sabroad (2019-06-05 09:28:58)

saponace · 2019-06-05 21:19:00

/dev/sdb shows no error, /dev/sdc shows a few errors. Do you think these could be the reason it freezes the whole system the way it does ? The reason I'm surprised is it used to not do this error (before I re-install everything), and it started at the exact same day I reinstalled the whole system.

Here is the output of the command for /dev/sdc:

sudo smartctl -H -l error /dev/sdc
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-5.1.4-arch1-1-ARCH] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Error Log Version: 1
ATA Error Count: 5
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 5 occurred at disk power-on lifetime: 11336 hours (472 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 a0 bf 8b 0d 00  Error: ICRC, ABRT 160 sectors at LBA = 0x000d8bbf = 887743

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 c0 9f 8b 0d 40 00   5d+06:14:02.494  WRITE DMA EXT
  35 00 10 c7 40 01 40 00   5d+06:13:56.099  WRITE DMA EXT
  35 00 10 e7 3b 01 40 00   5d+06:13:55.809  WRITE DMA EXT
  25 00 20 9f f3 0c 40 00   5d+06:13:49.617  READ DMA EXT
  25 00 08 27 60 e9 40 00   5d+06:13:49.494  READ DMA EXT

Error 4 occurred at disk power-on lifetime: 8989 hours (374 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 08 a7 58 15 09  Error: ICRC, ABRT 8 sectors at LBA = 0x091558a7 = 152393895

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 18 97 58 15 40 00   5d+23:52:05.705  WRITE DMA EXT
  35 00 40 3f a9 06 40 00   5d+23:51:57.741  WRITE DMA EXT
  35 00 20 7f bb 06 40 00   5d+23:51:57.739  WRITE DMA EXT
  35 00 20 7f b1 06 40 00   5d+23:51:57.737  WRITE DMA EXT
  35 00 20 bf aa 06 40 00   5d+23:51:57.735  WRITE DMA EXT

Error 3 occurred at disk power-on lifetime: 8418 hours (350 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 68 17 56 e4 08  Error: ICRC, ABRT 104 sectors at LBA = 0x08e45617 = 149181975

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 78 07 56 e4 40 00   1d+20:44:39.274  WRITE DMA EXT
  35 00 40 7f 30 1e 40 00   1d+20:44:39.256  WRITE DMA EXT
  35 00 60 1f 30 1e 40 00   1d+20:44:39.241  WRITE DMA EXT
  25 00 20 5f 2a 1e 40 00   1d+20:44:39.215  READ DMA EXT
  35 00 20 5f 2f 1e 40 00   1d+20:44:38.976  WRITE DMA EXT

Error 2 occurred at disk power-on lifetime: 1818 hours (75 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 88 f2 04 00  Error: WP at LBA = 0x0004f288 = 324232

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 18 38 22 92 40 00      00:00:22.582  WRITE FPDMA QUEUED
  61 08 10 88 d0 96 40 00      00:00:22.269  WRITE FPDMA QUEUED
  60 40 08 00 e9 fc 40 00      00:00:21.632  READ FPDMA QUEUED
  60 08 00 88 f2 04 40 00      00:00:21.613  READ FPDMA QUEUED
  61 08 18 48 cb 96 40 00      00:00:21.613  WRITE FPDMA QUEUED

Error 1 occurred at disk power-on lifetime: 1818 hours (75 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 88 f2 04 00  Error: WP at LBA = 0x0004f288 = 324232

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 08 18 48 cb 96 40 00      00:00:19.420  WRITE FPDMA QUEUED
  60 08 10 a0 49 72 40 00      00:00:19.045  READ FPDMA QUEUED
  60 18 08 78 49 72 40 00      00:00:19.045  READ FPDMA QUEUED
  60 08 00 88 f2 04 40 00      00:00:19.007  READ FPDMA QUEUED
  60 08 00 08 f3 04 40 00      00:00:18.990  READ FPDMA QUEUED

sabroad · 2019-06-06 14:41:38

The SMART errors aren't frequent enough to explain your issue.

What's the output of iostat during an episode?

iostat sdb sdc md0 1 2

saponace · 2019-06-08 23:21:21

I really don't get it ...

I thought root cause of the issue could be aging/damaged drives. So I tried to replace these disks to make another array.
Only thing I had available was two pen drives. I did with these pen drives the exact same LUKS/MDADM config as my original setup.
I expected to either experience the exact same thing (100% iowait, nothing responds) or working smoothly. I got something in between: 45% iowait and everything working but very slow (like emby homepage taking 10-15 sec to load) ...

Since then, I rolled back to the orignial config (two hard drives in an external USB enclosure). Now I only have 15% iowait, but the services that used to not respond don't respond anymore.

I understand nothing about what's happening.

Here is the output of "iostat sdd sdg md0 1 2":

Linux 5.1.4-arch1-1-ARCH (tartiflette)  06/09/2019      _x86_64_        (12 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                  0.31     0.07      1.16       44.70     0.00     53.76

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
sdg                  0.33         3.24         4.56         0.00             830022     1169627          0
sdd                  0.40         7.35         4.56         0.00            1885216    1169627          0
md0                 0.59        10.59        4.55         0.00            2717607    1166348          0


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                  0.08     0.08     0.42       16.57      0.00     82.85

Device             tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd
sdg                  0.00         0.00         0.00           0.00            0               0              0
sdd                  0.00         0.00         0.00           0.00            0               0              0
md0                 0.00         0.00         0.00           0.00            0               0              0

I'm starting to be pretty desesperate.
Tomorrow I'll try the same with only one drive, not encrypted, not raided and see what happens

EDIT: I moves drives around, so the md0 drives are now sdc and sdg. But they stilll are the same drives as earlier

Last edited by saponace (2019-06-08 23:24:08)

sabroad · 2019-06-10 15:26:50

Could this be related to swapfile on LUKS?
What's the output of vmstat during an episode?

vmstat 1 5

saponace · 2019-07-24 15:13:42

I actually forgot to give an update.

Amongst all the different things I tried, moving the two HDDs sdc and sdd from USB to SATA fixed the issue.

I guess the USB protocol does not handle well some kind of file transfert (I guess it's very frequent tiny modifications). Even though it's USB3.

Sorry for the delay, and thank you for the help

Arch Linux

#1 2019-05-26 20:14:14

System freeze caused by 100% iowait

#2 2019-06-03 19:58:29

Re: System freeze caused by 100% iowait

#3 2019-06-04 13:50:35

Re: System freeze caused by 100% iowait

#4 2019-06-04 19:18:08

Re: System freeze caused by 100% iowait

#5 2019-06-05 09:28:28

Re: System freeze caused by 100% iowait

#6 2019-06-05 21:19:00

Re: System freeze caused by 100% iowait

#7 2019-06-06 14:41:38

Re: System freeze caused by 100% iowait

#8 2019-06-08 23:21:21

Re: System freeze caused by 100% iowait

#9 2019-06-10 15:26:50

Re: System freeze caused by 100% iowait

#10 2019-07-24 15:13:42

Re: System freeze caused by 100% iowait

Board footer