#1 2021-07-01 14:29:24

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

[SOLVED] I/O operations cause lags/stutters on zen kernel

Some info about my setup:
- Arch's installed on an HDD (WDC WD10EZEX-75WN4A0 @7200RPM)
- Swap space is 10GB
- Physical memory (RAM) is 16GB
- vm.swappiness is set to 60, the default value
- The I/O scheduler is bfq
- I know almost nothing about zswap and zram, but the wiki says zswap is enabled by default
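
For anyone who wants to read these settings back from a running system, here is a minimal sketch. It assumes the disk is called sda (a hypothetical device name, adjust to your system) and that the zswap module exposes /sys/module/zswap/parameters/enabled. Files that are missing are reported as "unavailable" rather than raising.

```python
from pathlib import Path


def active_scheduler(line: str) -> str:
    """Return the scheduler shown in [brackets] in a sysfs scheduler line."""
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return line.strip()  # no brackets found: return the line unchanged


def read_setting(path: str) -> str:
    """Read a one-line procfs/sysfs file, or return 'unavailable'."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return "unavailable"


if __name__ == "__main__":
    print("swappiness:", read_setting("/proc/sys/vm/swappiness"))
    print("zswap enabled:", read_setting("/sys/module/zswap/parameters/enabled"))
    # "sda" is an assumed device name, not taken from the thread
    raw = read_setting("/sys/block/sda/queue/scheduler")
    print("scheduler:", active_scheduler(raw))
```

The scheduler file lists every available scheduler on one line and marks the active one with brackets, which is why the helper looks for the bracketed token.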

What happens:
- Start an I/O operation like copying files to or from external HDD
- It starts filling up ram with buffer/cache
- Once RAM is full, swap starts filling up
- When swapping starts, system stutters and freezes
- KDE lags, the mouse cursor frequently freezes for half a second, firefox takes a while to switch tabs, terminal prompt has added latency, etc etc.
- Even after said I/O operation is finished, swap isn't cleared, requiring me to run swapoff -a && swapon -a to get rid of it

What I expected to happen:
- Smooth operation of the GUI while running the said I/O operations
- Swap being used only when absolutely needed, to minimize I/O performance drops
- Buffer/cache should not push out needed applications like the GUI to swap, as this possibly causes it to stutter.

I know very well that swapping will be slower than using RAM due to HDD being significantly slower than RAM, but still I would like to reduce the amount of swapping or make it so that swap is only used as a last resort in I/O operations. Thanks in advance!

UPDATE: The issue seems to be related to the linux-zen kernel. The linux-zen kernel behaves as described above. The vanilla kernel doesn't suffer from stutters and lags, mainly because it doesn't use swap, and if it does, the amount swapped is negligible even with vm.swappiness at 60. The problem doesn't lie in the external HDD, I tested it later on in the thread. Doesn't seem to be an I/O scheduler issue either since behaviour with BFQ and mq-deadline is the same. Setting vm.swappiness=1 doesn't help either, Zen prefers swap thrashing over dropping inactive file caches.

UPDATE #2: It was indeed an issue with the zen kernel, and it has been fixed thanks to the developers over at the zen-kernel GitHub page.

Last edited by HotDogEnemy (2022-01-13 13:17:06)

#2 2021-07-01 15:54:34

seth
Member
Registered: 2012-09-03
Posts: 51,215

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

HotDogEnemy wrote: "Once RAM is full, swap starts filling up"

HotDogEnemy wrote: "Swap being used only when absolutely needed"

https://lonesysadmin.net/2013/12/22/bet … rty_ratio/
https://stackoverflow.com/questions/279 … ound-ratio

sysctl -a | grep dirty

#3 2021-07-01 16:19:22

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

Here's the output of sysctl -a | grep dirty:

~ > sudo sysctl -a | grep dirty
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 20
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200

From what I understand from the links you sent, the file cache holds writes in memory before they are committed to disk. Once this cache reaches 10% of available system memory, the kernel starts writing it to disk in the background. If the cache reaches 20% of system memory, all new I/O blocks until the dirty pages have been written to disk. So should I set vm.dirty_ratio to a higher value so that the blocking occurs less often? Or should I set vm.dirty_background_ratio to a lower value so that cached I/O gets written to disk sooner?
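
To put numbers on those two percentages, here is a rough sketch. The 10 GiB of "dirtyable" memory is an assumption for illustration only. The kernel derives its own base figure from free and reclaimable pages, so the real thresholds move around as memory usage changes.

```python
def dirty_thresholds_kb(dirtyable_kb: int, background_ratio: int, ratio: int):
    """Return (background, hard) dirty-page thresholds in kB."""
    background = dirtyable_kb * background_ratio // 100
    hard = dirtyable_kb * ratio // 100
    return background, hard


# Assumption: about 10 GiB of dirtyable memory (illustrative figure only).
bg, hard = dirty_thresholds_kb(10 * 1024 * 1024, background_ratio=10, ratio=20)
print(f"background writeback starts near {bg / 1024:.0f} MiB of dirty pages")
print(f"writing processes are throttled near {hard / 1024:.0f} MiB of dirty pages")
```

With the defaults shown above (10 and 20), the gap between the two thresholds is the amount of dirty data a fast writer can pile up before it is made to wait for the disk.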

#4 2021-07-01 17:00:58

seth
Member
Registered: 2012-09-03
Posts: 51,215

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

Is the reading device much faster than the writing one?
The answer is most likely "both": raise the hard limit (vm.dirty_ratio) a bit and lower the soft limit (vm.dirty_background_ratio) a bit. But if you transfer huge amounts of data from a fast to a slow device, you'll keep running into the hard limit.

However, this doesn't explain the claimed OOM/swap utilization (unless the RAM is nearly full anyway and you exceed it w/ the hard limit, in which case you actually want to lower both values, esp. if the swap is on the target drive)

#5 2021-07-01 17:33:48

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

Actually I had a situation with this yesterday: Steam was installing a game update 120 MB in size, but the installation was taking abnormally long. So I checked htop, and the total memory used by apps was at 5GB, with the rest of it filled with cache, and the swap was 2.5GB full. Not sure if that's a problem with Steam or with the I/O.

And a month ago I was transferring data from an external HDD to my system storage (on an HDD as well) and back again. Shifting the data back to the external HDD caused a lot of data to be swapped. It would seem that the internal HDD was faster than the external, so that would explain why it was swapping that much.

Would a soft limit of 5 and a hard limit of 25 work?
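
For reference, a small sketch of what such a setting would look like as a sysctl.d drop-in. The values 5 and 25 are just the ones asked about here, not a recommendation, and the helper name is made up for illustration.

```python
def sysctl_dropin(background_ratio: int, ratio: int) -> str:
    """Build the text of a sysctl.d drop-in for the two dirty-page limits."""
    if not 0 <= background_ratio < ratio <= 100:
        raise ValueError("need 0 <= background_ratio < ratio <= 100")
    return (
        f"vm.dirty_background_ratio = {background_ratio}\n"
        f"vm.dirty_ratio = {ratio}\n"
    )


# Soft limit 5, hard limit 25, as proposed in this post.
print(sysctl_dropin(5, 25), end="")
```

The printed text could be saved under /etc/sysctl.d/ (a file name such as 99-dirty.conf is an assumption) and loaded with sysctl --system.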

Last edited by HotDogEnemy (2021-07-01 17:34:06)

#6 2021-07-01 19:19:22

seth
Member
Registered: 2012-09-03
Posts: 51,215

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

HotDogEnemy wrote: "120 MB in size"

???
If you can trigger this, please check /proc/meminfo

Better™ limits depend on the actual setup, ie. the amount of data to be transferred and the speed of the involved devices.
I'd however look at the memory distribution unrelated to (triggering) IO, I don't think that's the immediate cause for your memory issues.
Eg. there's recently been https://bbs.archlinux.org/viewtopic.php?pid=1976160

HotDogEnemy wrote: "Even after said I/O operation is finished, swap isn't cleared, requiring me to run swapoff -a && swapon -a to get rid of it"

Ftr, that's not relevant, the swap isn't purged or read back into RAM w/o pressure (because you either need the swapped data or the swap)
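
To see which processes still have pages sitting in swap, the VmSwap field of /proc/<pid>/status can be read. A minimal sketch of the parsing step follows. The sample text is made up for illustration and is not taken from this system.

```python
def vmswap_kb(status_text: str) -> int:
    """Return the VmSwap value (kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            return int(line.split()[1])
    return 0  # e.g. kernel threads, which have no VmSwap line


# Made-up sample in the format of /proc/<pid>/status.
sample = "Name:\tfirefox\nVmRSS:\t  412340 kB\nVmSwap:\t   20480 kB\n"
print(vmswap_kb(sample), "kB swapped")
```

Running this over every /proc/<pid>/status and sorting by the result shows where the swapped data belongs. Those pages only come back when the process touches them again.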

#7 2021-07-01 19:49:29

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

Now that you mention it, it may have been a memleak. However I don't think I can replicate it, the game update's already been applied and I don't know how to roll it back. The game is Team Fortress 2 in case that's important.

I don't have enough information right now to make an informed decision regarding the better limit settings, so I'll hold back on that and read some more wiki pages about memory performance, and maybe benchmark I/O and other stuff using the software mentioned here: https://wiki.archlinux.org/title/Benchmarking

seth wrote: "I'd however look at the memory distribution unrelated to (triggering) IO"

How should I go about checking my memory distribution?

Last edited by HotDogEnemy (2021-07-01 19:50:27)

#8 2021-07-01 19:54:21

seth
Member
Registered: 2012-09-03
Posts: 51,215

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

HotDogEnemy wrote: "How should I go about checking my memory distribution?"

cat /proc/meminfo

You can look there right now, just to see what's "normal".
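
If you want to compare several snapshots later, it helps to turn the file into numbers. A minimal sketch, assuming the usual "Key:   value kB" layout (the function name is made up for illustration):

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # skip blank or malformed lines
        # Most values are in kB; the HugePages_* counters have no unit.
        info[key.strip()] = int(rest.split()[0])
    return info


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as fh:
            mem = parse_meminfo(fh.read())
        print("MemAvailable:", mem.get("MemAvailable"), "kB")
    except OSError:
        print("/proc/meminfo is not available on this system")
```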

#9 2021-07-01 19:59:49

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

Here's the normal state of /proc/meminfo:

~ > cat /proc/meminfo 
MemTotal:       16342660 kB
MemFree:         2020624 kB
MemAvailable:   10508108 kB
Buffers:          397840 kB
Cached:          8372684 kB
SwapCached:        17704 kB
Active:          1658012 kB
Inactive:       11366084 kB
Active(anon):      64564 kB
Inactive(anon):  4438784 kB
Active(file):    1593448 kB
Inactive(file):  6927300 kB
Unevictable:       14328 kB
Mlocked:           14328 kB
SwapTotal:      10485756 kB
SwapFree:       10345704 kB
Dirty:              1584 kB
Writeback:             0 kB
AnonPages:       4252280 kB
Mapped:           928176 kB
Shmem:            245612 kB
KReclaimable:     331384 kB
Slab:             451012 kB
SReclaimable:     331384 kB
SUnreclaim:       119628 kB
KernelStack:       16416 kB
PageTables:        57124 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    18657084 kB
Committed_AS:   12120384 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       39832 kB
VmallocChunk:          0 kB
Percpu:             3856 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:     1889104 kB
DirectMap2M:    13791232 kB
DirectMap1G:     1048576 kB 

#10 2021-07-01 20:05:39

seth
Member
Registered: 2012-09-03
Posts: 51,215

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

Unevictable and Mlocked are unsuspicious, so it's probably not the linked leak.
Most memory is used (which is to be expected), but in inactive anon pages and file caches (which makes ~10GB available and another 4GB easily swappable).
There's no apparent problem with that memory usage, so now we need a meminfo from a moment where it's bad™
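
The two round figures can be checked against the posted numbers. This is only a unit conversion. It assumes the "~10GB" refers to MemAvailable and the "4GB" to Inactive(anon).

```python
KIB_PER_GIB = 1024 * 1024


def kb_to_gib(kb: int) -> float:
    """Convert a /proc/meminfo value in kB (KiB) to GiB."""
    return kb / KIB_PER_GIB


mem_available_kb = 10508108  # MemAvailable from the meminfo posted above
inactive_anon_kb = 4438784   # Inactive(anon) from the meminfo posted above

print(f"available:     {kb_to_gib(mem_available_kb):.2f} GiB")
print(f"inactive anon: {kb_to_gib(inactive_anon_kb):.2f} GiB")
```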

#11 2021-07-02 20:13:34

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

Here is the /proc/meminfo contents before copying FROM an external HDD:

MemTotal:       16342660 kB
MemFree:        14342844 kB
MemAvailable:   15138572 kB
Buffers:          162812 kB
Cached:           864164 kB
SwapCached:            0 kB
Active:           521304 kB
Inactive:        1150692 kB
Active(anon):       1488 kB
Inactive(anon):   661156 kB
Active(file):     519816 kB
Inactive(file):   489536 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      10485756 kB
SwapFree:       10485756 kB
Dirty:               648 kB
Writeback:             0 kB
AnonPages:        645028 kB
Mapped:           372260 kB
Shmem:             17616 kB
KReclaimable:      79804 kB
Slab:             136168 kB
SReclaimable:      79804 kB
SUnreclaim:        56364 kB
KernelStack:        8096 kB
PageTables:        17868 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    18657084 kB
Committed_AS:    4129664 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       29360 kB
VmallocChunk:          0 kB
Percpu:             2624 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      232272 kB
DirectMap2M:     4962304 kB
DirectMap1G:    11534336 kB

Here is the meminfo while halfway through copying FROM it:

MemTotal:       16342660 kB
MemFree:        11397096 kB
MemAvailable:   15060260 kB
Buffers:         1168336 kB
Cached:          2670744 kB
SwapCached:            0 kB
Active:           573936 kB
Inactive:        3945992 kB
Active(anon):       1488 kB
Inactive(anon):   697260 kB
Active(file):     572448 kB
Inactive(file):  3248732 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      10485756 kB
SwapFree:       10485756 kB
Dirty:             92068 kB
Writeback:             0 kB
AnonPages:        680948 kB
Mapped:           431912 kB
Shmem:             17808 kB
KReclaimable:     179980 kB
Slab:             240480 kB
SReclaimable:     179980 kB
SUnreclaim:        60500 kB
KernelStack:        7680 kB
PageTables:        16428 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    18657084 kB
Committed_AS:    3998480 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       28976 kB
VmallocChunk:          0 kB
Percpu:             2624 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      246608 kB
DirectMap2M:     7045120 kB
DirectMap1G:     9437184 kB

This is the meminfo 5 seconds after copying completes:

MemTotal:       16342660 kB
MemFree:        10465272 kB
MemAvailable:   15080248 kB
Buffers:         1483624 kB
Cached:          3256384 kB
SwapCached:            0 kB
Active:           623588 kB
Inactive:        4783568 kB
Active(anon):       1484 kB
Inactive(anon):   683488 kB
Active(file):     622104 kB
Inactive(file):  4100080 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      10485756 kB
SwapFree:       10485756 kB
Dirty:             30544 kB
Writeback:             0 kB
AnonPages:        667196 kB
Mapped:           363156 kB
Shmem:             17808 kB
KReclaimable:     230788 kB
Slab:             291616 kB
SReclaimable:     230788 kB
SUnreclaim:        60828 kB
KernelStack:        7424 kB
PageTables:        14924 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    18657084 kB
Committed_AS:    3832084 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       28720 kB
VmallocChunk:          0 kB
Percpu:             2624 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      248656 kB
DirectMap2M:     7043072 kB
DirectMap1G:     9437184 kB

While copying from the external HDD, there were no freezes on the desktop. I do think that copying TO it will be slower and cause lag, since smartctl -i reports the internal storage having an RPM of 7200, while the external is 5400 RPM. I could try copying to the external HDD if you want me to, and give the same results as I gave above.
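
To make the three dumps easier to compare, here is a small sketch that subtracts the "before" snapshot from the "5 seconds after" one. The numbers are copied from the dumps above and only a few fields are included.

```python
# Values in kB, copied from the first and last meminfo dumps above.
before = {"MemFree": 14342844, "Buffers": 162812, "Cached": 864164, "SwapFree": 10485756}
after = {"MemFree": 10465272, "Buffers": 1483624, "Cached": 3256384, "SwapFree": 10485756}


def delta_kb(old: dict, new: dict) -> dict:
    """Return new - old for every field present in both snapshots."""
    return {key: new[key] - old[key] for key in old if key in new}


changes = delta_kb(before, after)
cache_growth = changes["Buffers"] + changes["Cached"]
print("page cache growth (Buffers + Cached):", cache_growth, "kB")
print("MemFree change:", changes["MemFree"], "kB")
print("SwapFree change:", changes["SwapFree"], "kB")
```

Free memory dropped by roughly the amount the cache grew, and swap was not touched at all during this read-only copy.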

Also related might be the fact that smartctl -H for the external gives this:

~ > sudo smartctl -H /dev/sdc
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.47-1-lts] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   001   001   050    Pre-fail  Always   FAILING_NOW 2047

#12 2021-07-02 20:39:03

seth
Member
Registered: 2012-09-03
Posts: 51,215

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

HotDogEnemy wrote: "Drive failure expected in less than 24 hours. SAVE ALL DATA."

I don't even think that you've a cache pressing issue…

sudo smartctl -a /dev/sdc

#13 2021-07-02 20:49:16

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

TBH this external hard drive is very old, so it might be the age that's giving me issues. Last time when I connected the drive and the system began lagging, KDE's smartctl notifications didn't pop up to warn me about drive failure, so I never thought about checking it out at that point in time. I just remembered that a while ago I used BleachBit to clean up my internal HDD, and I enabled the "Wipe free space" option without looking up what it was, and the entire system began to hang once BleachBit got to that part.

Here's the output of sudo smartctl -a /dev/sdc:

~ > sudo smartctl -a /dev/sdc
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.47-1-lts] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     TOSHIBA MK5076GSXN
Serial Number:    71CKB4XGB
LU WWN Device Id: 5 000039 369b0264f
Firmware Version: GB001M
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s (current: 1.5 Gb/s)
Local Time is:    Sat Jul  3 02:13:34 2021 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (  73)	The previous self-test completed having
					a test element that failed and the test
					element that failed is not known.
Total time to complete Offline 
data collection: 		(  120) seconds.
Offline data collection
capabilities: 			(0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	(   2) minutes.
Extended self-test routine
recommended polling time: 	( 193) minutes.
SCT capabilities: 	      (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       2468
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       3251
  5 Reallocated_Sector_Ct   0x0033   001   001   050    Pre-fail  Always   FAILING_NOW 2047
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   083   083   000    Old_age   Always       -       7163
 10 Spin_Retry_Count        0x0033   164   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       3181
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       403
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       469
193 Load_Cycle_Count        0x0032   094   094   000    Old_age   Always       -       60538
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       38 (Min/Max 10/63)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       329
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   253   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       36
222 Loaded_Hours            0x0032   085   085   000    Old_age   Always       -       6138
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       309
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 5326 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 5326 occurred at disk power-on lifetime: 7162 hours (298 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 11 08 80 53 02 40  Error: ABRT 8 sectors at LBA = 0x00025380 = 152448

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 a8 af d2 40 00      00:13:22.314  READ DMA EXT
  35 00 18 68 a6 02 40 00      00:13:22.313  WRITE DMA EXT
  35 00 18 38 a6 02 40 00      00:13:22.313  WRITE DMA EXT
  35 00 10 20 a6 02 40 00      00:13:22.313  WRITE DMA EXT
  35 00 18 00 a6 02 40 00      00:13:22.312  WRITE DMA EXT

Error 5325 occurred at disk power-on lifetime: 7155 hours (298 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 11 ef 61 b4 02 40  Error: ABRT 239 sectors at LBA = 0x0002b461 = 177249

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 f0 60 b4 02 40 00      00:14:11.054  WRITE DMA EXT
  35 00 f0 70 b3 02 40 00      00:14:11.051  WRITE DMA EXT
  35 00 f0 80 b2 02 40 00      00:14:11.048  WRITE DMA EXT
  35 00 f0 90 b1 02 40 00      00:14:11.045  WRITE DMA EXT
  35 00 f0 a0 b0 02 40 00      00:14:11.042  WRITE DMA EXT

Error 5324 occurred at disk power-on lifetime: 7155 hours (298 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 11 ef 01 8d 02 40  Error: ABRT 239 sectors at LBA = 0x00028d01 = 167169

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 f0 00 8d 02 40 00      00:14:07.001  WRITE DMA EXT
  35 00 f0 10 8c 02 40 00      00:14:06.998  WRITE DMA EXT
  35 00 f0 20 8b 02 40 00      00:14:06.995  WRITE DMA EXT
  35 00 f0 30 8a 02 40 00      00:14:06.992  WRITE DMA EXT
  35 00 f0 40 89 02 40 00      00:14:06.989  WRITE DMA EXT

Error 5323 occurred at disk power-on lifetime: 7155 hours (298 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 11 ef 41 6b 02 40  Error: ABRT 239 sectors at LBA = 0x00026b41 = 158529

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 f0 40 6b 02 40 00      00:14:00.110  WRITE DMA EXT
  35 00 f0 50 6a 02 40 00      00:14:00.107  WRITE DMA EXT
  35 00 f0 60 69 02 40 00      00:14:00.104  WRITE DMA EXT
  35 00 f0 70 68 02 40 00      00:14:00.102  WRITE DMA EXT
  35 00 f0 80 67 02 40 00      00:14:00.099  WRITE DMA EXT

Error 5322 occurred at disk power-on lifetime: 7155 hours (298 days + 3 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 11 ef 21 4f 02 40  Error: ABRT 239 sectors at LBA = 0x00024f21 = 151329

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 f0 20 4f 02 40 00      00:13:55.621  WRITE DMA EXT
  35 00 f0 30 4e 02 40 00      00:13:55.618  WRITE DMA EXT
  35 00 f0 40 4d 02 40 00      00:13:55.615  WRITE DMA EXT
  35 00 f0 50 4c 02 40 00      00:13:55.612  WRITE DMA EXT
  35 00 f0 60 4b 02 40 00      00:13:55.609  WRITE DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: unknown failure    90%      7162         0
# 2  Short offline       Completed: unknown failure    90%      7117         0
# 3  Short offline       Completed without error       00%         3         -
# 4  Short offline       Completed without error       00%         2         -
# 5  Short offline       Completed without error       00%         2         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Last edited by HotDogEnemy (2021-07-02 20:53:14)

#14 2021-07-02 21:05:25

seth
Member
Registered: 2012-09-03
Posts: 51,215

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

  5 Reallocated_Sector_Ct   0x0033   001   001   050    Pre-fail  Always   FAILING_NOW 2047
…
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       329

The drive is falling apart. Secure all data as well as possible (dd_rescue) and stop using it.
You can run https://wiki.archlinux.org/title/Badblocks but that will only help with isolated damage (eg. if sectors got physically damaged by an accident that crashed the IO head on the disc).
Otherwise blocks will keep dropping out and you'll keep losing data.
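
How to read those two rows: smartctl treats an attribute as failed when its normalized VALUE is less than or equal to THRESH. A minimal sketch of that check, fed with the two lines quoted above (the helper names are made up for illustration):

```python
def parse_attribute(line: str) -> dict:
    """Split one row of the smartctl attribute table into named fields."""
    f = line.split(None, 9)  # the raw value may contain spaces, keep it whole
    return {
        "id": int(f[0]),
        "name": f[1],
        "value": int(f[3]),
        "worst": int(f[4]),
        "thresh": int(f[5]),
        "when_failed": f[8],
        "raw": f[9].strip(),
    }


def is_failing(attr: dict) -> bool:
    """An attribute has failed when VALUE <= THRESH."""
    return attr["value"] <= attr["thresh"]


rows = [
    "  5 Reallocated_Sector_Ct   0x0033   001   001   050    Pre-fail  Always   FAILING_NOW 2047",
    "196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       329",
]
for row in rows:
    attr = parse_attribute(row)
    print(attr["name"], "raw:", attr["raw"], "failing:", is_failing(attr))
```

Attribute 5 sits at 1 against a threshold of 50, which is why the overall health check reports FAILED. Attribute 196 has a threshold of 0, so its normalized value does not trip the check, but the raw count of 329 tells the same story.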


PSA: Also be careful w/ BleachBit.
It's unlikely to damage your HW, but it does have a habit of "securely deleting" (ie. overwriting) currently mmapped inodes, which will make random processes crash. And most of what it deletes just has to be re-created because those are required caches.

#15 2021-07-02 21:14:54

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

Can ddrescue copy to a directory on disk? I don't want to unintentionally destroy my system by running the wrong command.

Also, considering the whole thread, would you say my install doesn't actually have any problems with swap, ram + buffer/cache and I/O except for writing data to slower devices from faster ones which makes file cache build up and run into the hard limit, blocking other processes?

#16 2021-07-02 21:19:35

seth
Member
Registered: 2012-09-03
Posts: 51,215

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

dd_rescue is like dd - it allows you to eg. dump /dev/sdc into a file (disk image) on a different drive.
The difference between dd and dd_rescue is that dd_rescue will just continue on IO errors.
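
The idea can be illustrated in a few lines. This is only a toy sketch of "continue on read errors", not how dd_rescue is implemented. The FlakySource class is a made-up stand-in for a failing disk.

```python
import io


def copy_skipping_errors(src, dst, size: int, block: int = 4096) -> list:
    """Copy `size` bytes from src to dst, zero-filling blocks that fail to read.

    Returns the offsets of the blocks that could not be read.
    """
    bad = []
    for offset in range(0, size, block):
        want = min(block, size - offset)
        try:
            src.seek(offset)
            data = src.read(want)
        except OSError:
            data = b""
            bad.append(offset)  # plain dd would stop here by default
        dst.write(data.ljust(want, b"\x00"))  # pad so later offsets stay aligned
    return bad


class FlakySource(io.BytesIO):
    """Stand-in for a failing disk: reads at listed offsets raise OSError."""

    def __init__(self, payload: bytes, bad_offsets):
        super().__init__(payload)
        self.bad_offsets = set(bad_offsets)

    def read(self, n=-1):
        if self.tell() in self.bad_offsets:
            raise OSError("simulated read error")
        return super().read(n)


source = FlakySource(b"AAAA" + b"BBBB" + b"CCCC", bad_offsets={4})
image = io.BytesIO()
bad_blocks = copy_skipping_errors(source, image, size=12, block=4)
print("unreadable offsets:", bad_blocks)
print("image:", image.getvalue())
```

The image keeps its full size with a hole of zeros where the bad block was, so everything after the damage is still at the right offset.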

I think the IO stalls because of the failing IO on the broken drive. This can in theory still fill up the dirty pages, but it's not a regular condition either.
Did you ever have such problems when the dying drive wasn't involved?

#17 2021-07-02 21:37:18

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

So, I installed ddrescue and am using this arch wiki page to proceed to recover data from it: https://wiki.archlinux.org/title/disk_cloning

I tried running

sudo ddrescue -n /dev/sdc /home/<myusername>/HDDBACKUP/hdd.img rescue.map

and it says it needs 3h 30min to completely recover data, plus the archwiki says it'll take two rounds to recover the data. I'll try this later since it's late here.
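
Two quick sanity checks on those numbers, as a rough sketch. It assumes the image ends up as large as the whole drive (the 500,107,862,016 bytes reported by smartctl) and that the 3h 30min estimate covers the full device. The "." path stands in for the backup directory.

```python
import shutil

DEVICE_BYTES = 500_107_862_016  # "User Capacity" reported by smartctl above


def has_room(free_bytes: int, image_bytes: int) -> bool:
    """True if the destination has enough free space for a full-size image."""
    return free_bytes >= image_bytes


def implied_rate_mb_s(total_bytes: int, seconds: float) -> float:
    """Average throughput in MB/s implied by a size and a duration."""
    return total_bytes / seconds / 1e6


if __name__ == "__main__":
    free = shutil.disk_usage(".").free  # "." stands in for the backup directory
    print("enough room for the image:", has_room(free, DEVICE_BYTES))
    rate = implied_rate_mb_s(DEVICE_BYTES, 3.5 * 3600)
    print(f"implied average rate: {rate:.1f} MB/s")
```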

Also, I don't recall any other instances of RAM, swap and I/O problems besides the external drive, and those which I do remember turned out to be BleachBit's quirks and a memleak (with the steam update). Is there any way to benchmark this so I can rule out my system being the source of the problem?

Last edited by HotDogEnemy (2021-07-02 21:37:34)

#18 2021-07-02 21:42:14

seth
Member
Registered: 2012-09-03
Posts: 51,215

#19 2021-07-03 20:42:25

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

Okay, so I rescued the data from the failing hard disk, but while ddrescue was running, the system began to lag just like I described in the first post. I captured the meminfo while it was lagging, here it is:

MemTotal:       16339584 kB
MemFree:          172884 kB
MemAvailable:   14349960 kB
Buffers:         6977688 kB
Cached:          7197228 kB
SwapCached:       190152 kB
Active:           644936 kB
Inactive:       14298296 kB
Active(anon):     488688 kB
Inactive(anon):   299236 kB
Active(file):     156248 kB
Inactive(file): 13999060 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      10485756 kB
SwapFree:        8855188 kB
Dirty:             24572 kB
Writeback:           136 kB
AnonPages:        672400 kB
Mapped:           236408 kB
Shmem:             19212 kB
KReclaimable:     359380 kB
Slab:             485012 kB
SReclaimable:     359380 kB
SUnreclaim:       125632 kB
KernelStack:       13424 kB
PageTables:        27540 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    18655548 kB
Committed_AS:    7096024 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       36668 kB
VmallocChunk:          0 kB
Percpu:             3008 kB
HardwareCorrupted:     0 kB
AnonHugePages:    110592 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      496464 kB
DirectMap2M:    14135296 kB
DirectMap1G:     2097152 kB
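
A quick sketch to pull the interesting part out of that dump. The values are copied from the meminfo above, and the helper name is made up.

```python
# Values in kB, copied from the meminfo captured while the system was lagging.
snapshot = {
    "MemFree": 172884,
    "Inactive(file)": 13999060,
    "SwapTotal": 10485756,
    "SwapFree": 8855188,
}


def swap_used_kb(info: dict) -> int:
    """Swap in use is what is missing from SwapFree."""
    return info["SwapTotal"] - info["SwapFree"]


used = swap_used_kb(snapshot)
print(f"swap in use:        {used} kB ({used / 1024:.0f} MiB)")
print(f"inactive file cache: {snapshot['Inactive(file)']} kB "
      f"({snapshot['Inactive(file)'] / 1024:.0f} MiB)")
```

So about 1.6 GB had been pushed to swap while almost 14 GB of inactive file cache was sitting in RAM.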

===========FIRST STRESS TEST===========

After rebooting and nothing open besides a couple firefox tabs and a terminal, I ran a stress testing command:

stress --cpu 4 --vm 8 --hdd 8 --io 8 --timeout 120s

The meminfo during this:

MemTotal:       16339592 kB
MemFree:         3280572 kB
MemAvailable:   12623880 kB
Buffers:          272868 kB
Cached:          9090464 kB
SwapCached:            0 kB
Active:          4105380 kB
Inactive:        8210252 kB
Active(anon):    2980728 kB
Inactive(anon):    15912 kB
Active(file):    1124652 kB
Inactive(file):  8194340 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      10485756 kB
SwapFree:       10485756 kB
Dirty:           5702440 kB
Writeback:          8152 kB
AnonPages:       2895116 kB
Mapped:           553520 kB
Shmem:             44416 kB
KReclaimable:     361768 kB
Slab:             461048 kB
SReclaimable:     361768 kB
SUnreclaim:        99280 kB
KernelStack:       14992 kB
PageTables:        30864 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    18655552 kB
Committed_AS:    9045692 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       39316 kB
VmallocChunk:          0 kB
Percpu:             4288 kB
HardwareCorrupted:     0 kB
AnonHugePages:   1523712 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      432976 kB
DirectMap2M:    12101632 kB
DirectMap1G:     4194304 kB

This test didn't cause any lag in the system interface. I noticed that swap wasn't utilized as the memory workers didn't fill up the RAM, so I increased the number of memory workers to 60 in the next test.



===========SECOND STRESS TEST===========

For the second test, I ran:

stress --cpu 4 --vm 60 --hdd 8 --io 8 --timeout 120s

The meminfo during this:

MemTotal:       16339592 kB
MemFree:         3522852 kB
MemAvailable:    6565212 kB
Buffers:            2604 kB
Cached:          3204160 kB
SwapCached:        74424 kB
Active:          8543728 kB
Inactive:        3164400 kB
Active(anon):    8462456 kB
Inactive(anon):    39716 kB
Active(file):      81272 kB
Inactive(file):  3124684 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      10485756 kB
SwapFree:        8653888 kB
Dirty:           2982164 kB
Writeback:          7972 kB
AnonPages:       7919860 kB
Mapped:            81928 kB
Shmem:               416 kB
KReclaimable:     173856 kB
Slab:             297308 kB
SReclaimable:     173856 kB
SUnreclaim:       123452 kB
KernelStack:       16352 kB
PageTables:        47676 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    18655552 kB
Committed_AS:   22781904 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       40724 kB
VmallocChunk:          0 kB
Percpu:             4288 kB
HardwareCorrupted:     0 kB
AnonHugePages:   4423680 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      435024 kB
DirectMap2M:    12099584 kB
DirectMap1G:     4194304 kB

With this amount of memory workers, swap was utilised, and the system began to lag considerably.
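
For anyone wanting to read these dumps faster, here is a minimal sketch that pulls single fields out of a meminfo-style file and derives the swap in use. The helper names meminfo_kb and swap_used_kb are made up for illustration; the only assumption is the usual "Key: value kB" layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the value (in kB) of one meminfo field.
# $1 = field name without the colon, $2 = file to read (defaults to /proc/meminfo).
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Hypothetical helper: swap in use = SwapTotal - SwapFree, in kB.
# $1 = file to read (defaults to /proc/meminfo).
swap_used_kb() {
    total=$(meminfo_kb SwapTotal "$1")
    free=$(meminfo_kb SwapFree "$1")
    echo $((total - free))
}
```

Applied to the second dump above (saved to a file), swap_used_kb gives 10485756 - 8653888 = 1831868 kB, roughly 1.75 GiB swapped out, while meminfo_kb 'Inactive(file)' still reports 3124684 kB of inactive file cache.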


Also, while ddrescue was freezing my PC, I was reminded of the time I tried verifying a GTA V install through the Legendary launcher. The verification process caused swapping and lag while reading the disk. Swap usage seems to be the common factor in all of this, and more often than not it happens during I/O-heavy work.

Offline

#20 2021-07-03 20:55:04

seth
Member
Registered: 2012-09-03
Posts: 51,215

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

swap was utilised, and the system began to lag considerably

Yes, swap is slow…

You maintained ~3GB of inactive file caches - lowering vm.swappiness would help here but bear in mind that this is an artificial test.
Under normal circumstances high RAM pressure often™ goes along with high disk I/O. You can actively drop caches if you anticipate a particular RAM usage pattern.
https://www.thegeekdiary.com/how-to-cle … der-linux/

About the lagginess: you could try to disable zswap, https://wiki.archlinux.org/title/Zswap
Also, swap partitions inside an LVM are reported over and over again on this board to be slow (in which case you might want to try a swap file instead).
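
The three knobs mentioned in this post could be tried roughly as follows. This is only a sketch: the swappiness value of 10 is an arbitrary example, the run and tune_for_big_copy helpers are hypothetical, and by default the script just prints the commands rather than executing them. All three need root when actually applied, and none of them persist across a reboot.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command unless APPLY=1 is set in the environment.
run() {
    if [ "${APPLY:-0}" = 1 ]; then
        sh -c "$1"
    else
        printf '%s\n' "$1"
    fi
}

tune_for_big_copy() {
    # Bias reclaim towards dropping file cache instead of swapping (10 is an example value).
    run 'sysctl -w vm.swappiness=10'
    # Flush dirty pages, then drop the clean page cache only (1 = page cache).
    run 'sync && echo 1 > /proc/sys/vm/drop_caches'
    # Disable zswap at runtime.
    run 'echo 0 > /sys/module/zswap/parameters/enabled'
}

tune_for_big_copy
```

Running it as root with APPLY=1 set executes the commands instead of printing them. Whether any of this helps depends on the kernel and workload, so treat it as something to experiment with rather than a fix.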

Offline

#21 2021-07-03 22:07:27

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

I tried disabling zswap, but it didn't help much with the lag; I think it actually made it worse. The stress test took longer to complete than with zswap enabled, and the system took longer to recover from the lag. I also only use a swap partition on a non-LVM disk (unless Arch sets it up as LVM by default during a manual install). So I guess my only way of dealing with this is to drop caches manually when doing high I/O.

Offline

#22 2021-07-04 06:53:11

seth
Member
Registered: 2012-09-03
Posts: 51,215

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

"high I/O" doesn't imply memory pressure (what is the problem here)
Is the swap partition on the same physical drive as the root partition?

Ceterum censeo: if you constantly see yourself running out of RAM, get more RAM - but iirc the original problem was the freezes when copying data, and that particular issue is likely down to the broken disk and its I/O blocking the USB.

Offline

#23 2021-07-04 19:04:35

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

I'm assuming you phrased the "what is the problem here" as a question, to which I say that I just want to keep my system responsive while I/O is going on, without applications being pushed out to swap by disk cache during large I/O operations. Swap is on the same drive as the root partition; here's the lsblk output in case it helps:

~ > lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda      8:0    0 223.6G  0 disk 
├─sda1   8:1    0   499M  0 part 
├─sda2   8:2    0   100M  0 part 
├─sda3   8:3    0    16M  0 part 
└─sda4   8:4    0   223G  0 part 
sdb      8:16   0 931.5G  0 disk 
├─sdb1   8:17   0   100G  0 part /
├─sdb2   8:18   0   820G  0 part /home
├─sdb3   8:19   0    10G  0 part [SWAP]
└─sdb4   8:20   0     1G  0 part /efi

I don't really run out of ram constantly, it's usually during when I run large I/O operations that the system swaps and lags. Plus I think 16GB of RAM is considered more than enough for a desktop PC by today's standards.
I did some reading over at the ArchWiki page on Swap, and found this article, which describes my problems in a simple, concise manner: https://rudd-o.com/linux-and-free-softw … o-fix-that

Offline

#24 2021-07-04 19:27:38

seth
Member
Registered: 2012-09-03
Posts: 51,215

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

you phrased the "what is the problem here" as a question

Nope. Your immediate problem is the RAM pressure, which causes swapping, which causes the slowdown.
This is not a common pattern for file transfers.

you wrote:

it's usually during when I run large I/O operations that the system swaps and lags

your slightly younger self wrote:

I don't recall any other instances of RAM, swap and I/O problems besides the external drive

?

Offline

#25 2021-07-04 20:22:55

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

How should I go about fixing this abnormal RAM pressure?

I tested whether the system actually swaps and lags during high I/O, and it did. What I did was copy a 100 GB file from one location in my home directory to another location in home. It started swapping and lagging well before having transferred even 10GB. So high I/O does result in swapping and lagging.
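
A test like this can be put into numbers by reading the kernel's cumulative swap-out counter before and after the copy. The sketch below assumes /proc/vmstat exposes a pswpout line (pages swapped out since boot); the helper names pswpout and swapout_during are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the cumulative number of pages swapped out.
# $1 = file to read (defaults to /proc/vmstat).
pswpout() {
    awk '$1 == "pswpout" { print $2 }' "${1:-/proc/vmstat}"
}

# Hypothetical wrapper: run a command and report how many pages
# were swapped out system-wide while it was running.
swapout_during() {
    before=$(pswpout)
    "$@"
    after=$(pswpout)
    echo "pages swapped out: $(( ${after:-0} - ${before:-0} ))"
}
```

For example, swapout_during cp bigfile bigfile.copy (file names are placeholders) would show whether the copy itself coincides with swap-out activity. The counter is system-wide, so anything else running at the same time is included.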

Offline
