#26 2021-07-04 20:28:43

seth
Member
Registered: 2012-09-03
Posts: 51,684

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

You don't happen to have preserved /proc/meminfo?
Also the output of "lsblk -f"?

That being said: lowering the vm.dirty values should help if you run OOM because you start w/ 15GB already in use and then let 1.6GB of dirty pages accumulate before they get flushed.

Edit: also *how* do you copy and is a graphical file manager involved? Do you run a file indexer (baloo or tracker in KDE or gnome)?
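E.g., the current thresholds can be checked before changing anything (a quick sketch using the standard procfs paths; the "typical defaults" in the comment are the usual upstream values, not a reading from this machine):

```shell
# Current writeback thresholds (ratios are a % of reclaimable memory;
# upstream defaults are typically 10 and 20):
cat /proc/sys/vm/dirty_background_ratio
cat /proc/sys/vm/dirty_ratio
# Live writeback state:
grep -E '^(Dirty|Writeback):' /proc/meminfo
```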

Last edited by seth (2021-07-04 20:29:39)

Offline

#27 2021-07-04 20:53:45

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

Actually I did record the meminfo before and during the copy operation. I used KDE's Dolphin to copy the files.

Here's the meminfo before copying:

MemTotal:       16339592 kB
MemFree:        10825668 kB
MemAvailable:   11418720 kB
Buffers:            3828 kB
Cached:          3641332 kB
SwapCached:            0 kB
Active:          1414696 kB
Inactive:        3581908 kB
Active(anon):     993628 kB
Inactive(anon):  3193316 kB
Active(file):     421068 kB
Inactive(file):   388592 kB
Unevictable:        9724 kB
Mlocked:            9724 kB
SwapTotal:      10485756 kB
SwapFree:       10485756 kB
Dirty:               140 kB
Writeback:             0 kB
AnonPages:       1361236 kB
Mapped:           516672 kB
Shmem:           2831268 kB
KReclaimable:      72756 kB
Slab:             161948 kB
SReclaimable:      72756 kB
SUnreclaim:        89192 kB
KernelStack:       10288 kB
PageTables:        22324 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    18655552 kB
Committed_AS:    8495036 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       37428 kB
VmallocChunk:          0 kB
Percpu:             3600 kB
HardwareCorrupted:     0 kB
AnonHugePages:    270336 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:     14336 kB
FilePmdMapped:      4096 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:     1499984 kB
DirectMap2M:    15228928 kB
DirectMap1G:           0 kB

And during:

MemTotal:       16339592 kB
MemFree:          163136 kB
MemAvailable:   13788368 kB
Buffers:           93900 kB
Cached:         14167352 kB
SwapCached:        13312 kB
Active:          1056556 kB
Inactive:       14121740 kB
Active(anon):     760400 kB
Inactive(anon):   829344 kB
Active(file):     296156 kB
Inactive(file): 13292396 kB
Unevictable:       11260 kB
Mlocked:           11260 kB
SwapTotal:      10485756 kB
SwapFree:        7858264 kB
Dirty:           5247692 kB
Writeback:          8140 kB
AnonPages:        919508 kB
Mapped:           379640 kB
Shmem:            668320 kB
KReclaimable:     374132 kB
Slab:             520988 kB
SReclaimable:     374132 kB
SUnreclaim:       146856 kB
KernelStack:       10224 kB
PageTables:        23048 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    18655552 kB
Committed_AS:    8551548 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       37332 kB
VmallocChunk:          0 kB
Percpu:             3600 kB
HardwareCorrupted:     0 kB
AnonHugePages:    208896 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:      4096 kB
FilePmdMapped:      4096 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:     1553232 kB
DirectMap2M:    15175680 kB
DirectMap1G:           0 kB

As for baloo and tracker, I mostly use KDE, so I checked whether baloo was running:

~ > balooctl status
Baloo File Indexer is not running
Total files indexed: 6,727
Files waiting for content indexing: 0
Files failed to index: 0
Current size of index is 17.32 MiB

As for the vm.dirty values, is a soft limit of 5 and a hard limit of 25 okay?

Offline

#28 2021-07-04 21:01:31

seth
Member
Registered: 2012-09-03
Posts: 51,684

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

Almost all memory is in inactive file caches.
I assume you didn't raise vm.swappiness?

Please try from a console login (no KDE running) and use cp to rule out weird interference from that side.

Offline

#29 2021-07-04 21:08:16

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

Quick question, how would I check /proc/meminfo while cp is running in tty? Do I use another tty?

Offline

#30 2021-07-04 21:13:18

seth
Member
Registered: 2012-09-03
Posts: 51,684

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

Do I use another tty?

Or tmux, but yes. Also use it to test the system responsiveness (though your HDD is obviously busy doing the copying, so loading anything from it is going to be impeded).
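E.g. a bounded sampling loop from the second tty (a sketch; adjust the count and interval to the length of the copy):

```shell
# Sample the writeback/swap state once a second while cp runs elsewhere:
for i in 1 2 3; do
    grep -E '^(Dirty|Writeback|SwapFree):' /proc/meminfo
    sleep 1
done
```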

Offline

#31 2021-07-04 21:34:07

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

I copied the same folder using cp. The meminfo was taken before copying, and again during the copy just after it started swapping. The copy had reached around 14 GB when swapping began.

Meminfo before copying:

MemTotal:       16339592 kB
MemFree:        15364628 kB
MemAvailable:   15608232 kB
Buffers:           57412 kB
Cached:           426392 kB
SwapCached:            0 kB
Active:           566024 kB
Inactive:         246172 kB
Active(anon):     328644 kB
Inactive(anon):     3012 kB
Active(file):     237380 kB
Inactive(file):   243160 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      10485756 kB
SwapFree:       10485756 kB
Dirty:                68 kB
Writeback:             0 kB
AnonPages:        313252 kB
Mapped:           213008 kB
Shmem:              3248 kB
KReclaimable:      32104 kB
Slab:              66660 kB
SReclaimable:      32104 kB
SUnreclaim:        34556 kB
KernelStack:        3968 kB
PageTables:         4300 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    18655552 kB
Committed_AS:    1367680 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       26744 kB
VmallocChunk:          0 kB
Percpu:             1984 kB
HardwareCorrupted:     0 kB
AnonHugePages:    180224 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      150352 kB
DirectMap2M:     5044224 kB
DirectMap1G:    11534336 kB

Meminfo during copying, shortly after swapping began:

MemTotal:       16339592 kB
MemFree:          155656 kB
MemAvailable:   15716196 kB
Buffers:           77056 kB
Cached:         15475104 kB
SwapCached:         4368 kB
Active:           142024 kB
Inactive:       15508356 kB
Active(anon):      77828 kB
Inactive(anon):    21952 kB
Active(file):      64196 kB
Inactive(file): 15486404 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      10485756 kB
SwapFree:       10243700 kB
Dirty:            117780 kB
Writeback:         12420 kB
AnonPages:         89564 kB
Mapped:            69148 kB
Shmem:              1716 kB
KReclaimable:     347392 kB
Slab:             410736 kB
SReclaimable:     347392 kB
SUnreclaim:        63344 kB
KernelStack:        3824 kB
PageTables:         4348 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    18655552 kB
Committed_AS:    1376536 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       26648 kB
VmallocChunk:          0 kB
Percpu:             1984 kB
HardwareCorrupted:     0 kB
AnonHugePages:     20480 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      156496 kB
DirectMap2M:     5038080 kB
DirectMap1G:    11534336 kB

The inactive file pages are huge, which seems seriously abnormal.

As for vm.swappiness, it's still at 60:

~ > sysctl vm.swappiness
vm.swappiness = 60

Last edited by HotDogEnemy (2021-07-04 21:39:58)

Offline

#32 2021-07-05 06:10:39

seth
Member
Registered: 2012-09-03
Posts: 51,684

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

seth wrote:

Also "lsblk -f"?

(Basically the used filesystem)

I assume you won't have that issue w/ 100 1GB files?

Offline

#33 2021-07-05 09:35:53

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

Here's the filesystem:

~ > lsblk -f
NAME   FSTYPE FSVER LABEL    UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda                                                                              
├─sda1 ntfs         Recovery C294004294003C03                                    
├─sda2 vfat   FAT32          6E01-6937                                           
├─sda3                                                                           
└─sda4 ntfs                  EA2C17E22C17A899                                    
sdb                                                                              
├─sdb1 ext4   1.0   root     0994c97d-207b-4556-876f-1142de972d3d   44.8G    49% /
├─sdb2 ext4   1.0            6bc55df8-39ab-4fd4-9f89-df85b3be7bcf  391.4G    46% /home
├─sdb3 swap   1              df75b189-88dd-4f8e-9d31-b40a0e1ffafd                [SWAP]
└─sdb4 vfat   FAT32          9E16-EE0F                            1019.8M     0% /efi

As for the 100 1GB files, I think the issue would be the same: the folder I copied for the test was a game folder with multiple large files. But just in case, I'll test it out; I just need a command that creates 100 1GB files in a directory.

Last edited by HotDogEnemy (2021-07-05 10:19:27)

Offline

#34 2021-07-05 11:04:42

sabroad
Member
Registered: 2015-05-24
Posts: 242

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

Dirty:           5247692 kB

If the external HDD is about to die, then copying large files to it will stall all disk IO as/when the kernel necessarily flushes dirty writes (and is delayed/stalled by the bad/slow media). This stall will also affect any process requiring disk IO, and that includes paging executables back in from disk.

Disk latency (and stalls) can be observed with vmstat/iostat:

vmstat 5 5
iostat -xk 5

The only way™ to avoid this [1] is to throttle the copy to match the failing/stalling external HDD's bandwidth, using the nocache utility [2]:

nocache rsync --bwlimit=KBPS src dst

[1] apart from buying another HDD
[2] nocache utility - https://aur.archlinux.org/packages/nocache/


--
saint_abroad

Offline

#35 2021-07-05 11:07:49

seth
Member
Registered: 2012-09-03
Posts: 51,684

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

I just need a command that'll make 100 1GB files

for ((i=0;i<100;++i)); do fallocate -l 1G file$i; done

@sabroad, post #31 claims an issue beyond the expected problems around the dying drive.

Offline

#36 2021-07-05 11:39:48

sabroad
Member
Registered: 2015-05-24
Posts: 242

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

seth wrote:

@sabroad, post #31 claims an issue beyond the - expectable - problem around the dying drive.

Inactive(file): 15486404 kB
[...]
SwapFree:       10243700 kB
Dirty:            117780 kB

As dirty pages grow, the kernel can either drop caches or swap pages out. If the write medium is slow (bad media or not), the system hits this limit hard:

Inactive(file): 13292396 kB
[...]
SwapFree:        7858264 kB
Dirty:           5247692 kB

5GB dirty will stall all sorts of disk IO (pages/caches are shared). Swapping is unfortunate, but the stalls are caused by flushing to disk - at this point the kernel has little choice but to stall on IO.

In any case, the alternative copy command solves both issues: 1. it drops caches (so the kernel never has to choose to swap); and 2. it limits dirty buffers:

nocache rsync --bwlimit=KBPS src dst

The downside is the (explicit) tradeoff of throughput (e.g. restricting the copy to ~80MB/s on spinning disks) for latency (to minimise stalls/maximise interactivity). (The kernel doesn't have enough info to make this decision effectively, so it must be told.)
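Picking the actual limit might look like this (the 80MB/s figure is an assumption for illustration, not a measurement of the OP's drive):

```shell
# Aim ~10% under the slow drive's sustained write speed; rsync's
# --bwlimit takes KB/s. 80 MB/s is an assumed example figure.
drive_mbs=80
echo "--bwlimit=$(( drive_mbs * 1000 * 9 / 10 ))"   # prints --bwlimit=72000
```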

Last edited by sabroad (2021-07-05 11:47:08)


--
saint_abroad

Offline

#37 2021-07-05 13:12:30

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

@seth I created the 100 1GB files with the command you provided and copied it using cp in a tty after a reboot. Here's the meminfo:

Before copying:

MemTotal:       16339592 kB
MemFree:        15351856 kB
MemAvailable:   15597028 kB
Buffers:           55624 kB
Cached:           430500 kB
SwapCached:            0 kB
Active:           581416 kB
Inactive:         246036 kB
Active(anon):     341524 kB
Inactive(anon):     3036 kB
Active(file):     239892 kB
Inactive(file):   243000 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      10485756 kB
SwapFree:       10485756 kB
Dirty:              1396 kB
Writeback:             0 kB
AnonPages:        341328 kB
Mapped:           215760 kB
Shmem:              3232 kB
KReclaimable:      30536 kB
Slab:              64544 kB
SReclaimable:      30536 kB
SUnreclaim:        34008 kB
KernelStack:        3728 kB
PageTables:         4276 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    18655552 kB
Committed_AS:    1365720 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       26436 kB
VmallocChunk:          0 kB
Percpu:             1936 kB
HardwareCorrupted:     0 kB
AnonHugePages:    245760 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      154448 kB
DirectMap2M:     5040128 kB
DirectMap1G:    11534336 kB

During copying:

MemTotal:       16339592 kB
MemFree:          147920 kB
MemAvailable:   15714816 kB
Buffers:           26804 kB
Cached:         15309648 kB
SwapCached:         5332 kB
Active:           133116 kB
Inactive:       15296984 kB
Active(anon):      65912 kB
Inactive(anon):    29452 kB
Active(file):      67204 kB
Inactive(file): 15267532 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      10485756 kB
SwapFree:       10222552 kB
Dirty:           6491020 kB
Writeback:         15252 kB
AnonPages:         90976 kB
Mapped:            60052 kB
Shmem:              1656 kB
KReclaimable:     569612 kB
Slab:             641076 kB
SReclaimable:     569612 kB
SUnreclaim:        71464 kB
KernelStack:        3792 kB
PageTables:         4400 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    18655552 kB
Committed_AS:    1374888 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       26468 kB
VmallocChunk:          0 kB
Percpu:             1984 kB
HardwareCorrupted:     0 kB
AnonHugePages:     45056 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      156496 kB
DirectMap2M:     5038080 kB
DirectMap1G:    11534336 kB

While the copy was running, the cache grew at breakneck speed and the system swapped until swap usage stabilized at around 257MB, as far as I could tell from htop.

@sabroad I checked out the nocache tool you mentioned. Correct me if I'm wrong, but from what I could tell it basically cleans up/prevents cache buildup during a specific command. The kernel doesn't know that we don't need the data to be cached for later use, so we have to tell it explicitly by invoking nocache?

Offline

#38 2021-07-05 13:22:46

seth
Member
Registered: 2012-09-03
Posts: 51,684

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

sabroad wrote:

As dirty grows the kernel can: drop caches; or swap pages. If write media is slow (bad media or not), the system hits this, hard:

Yes, but the kernel should™ not overly rely on swap when there's plenty of inactive file cache to discard (and swappiness isn't too low).

The cause in the non-broken-drive scenario is likely one huge file being copied and the kernel not willing to sacrifice the opened fd…?

du -ah /path/to/game

Edit: tested a spinning drive and the dirty pages and inactive file cache stay reasonably low (<400MB and <1.2GB).
No swapping required.

=> Better run "smartctl -a /dev/sdb" as well, also "tune2fs -l /dev/sdb2 # assuming that's what you ran the tests on" and ultimately "mount" to check for weird options. :-\

Last edited by seth (2021-07-05 13:43:31)

Offline

#39 2021-07-05 14:40:55

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

The game folder is made up of lots of small files, none exceeding a gigabyte, so the output of du -ah is far too long to post; the website forbids it -_-;

Here's smartctl for /dev/sdb:

~ > sudo smartctl -a /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.12.14-zen1-1-zen] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Blue
Device Model:     WDC WD10EZEX-75WN4A0
Serial Number:    WD-WCC6Y2EJ59J2
LU WWN Device Id: 5 0014ee 20e2e584a
Firmware Version: 01.01A01
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Jul  5 19:56:28 2021 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(11220) seconds.
Offline data collection
capabilities: 			(0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	(   2) minutes.
Extended self-test routine
recommended polling time: 	( 116) minutes.
Conveyance self-test routine
recommended polling time: 	(   5) minutes.
SCT capabilities: 	      (0x3035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   173   165   021    Pre-fail  Always       -       2316
  4 Start_Stop_Count        0x0032   094   094   000    Old_age   Always       -       6175
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       10109
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   096   096   000    Old_age   Always       -       4921
192 Power-Off_Retract_Count 0x0032   199   199   000    Old_age   Always       -       1247
193 Load_Cycle_Count        0x0032   132   132   000    Old_age   Always       -       204703
194 Temperature_Celsius     0x0022   098   086   000    Old_age   Always       -       45
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
240 Head_Flying_Hours       0x0032   091   091   000    Old_age   Always       -       6616
241 Total_LBAs_Written      0x0032   200   200   000    Old_age   Always       -       33929686495
242 Total_LBAs_Read         0x0032   200   200   000    Old_age   Always       -       33788826133

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     10069         -
# 2  Extended offline    Completed without error       00%     10066         -
# 3  Short offline       Completed without error       00%      9679         -
# 4  Short offline       Completed without error       00%      3570         -
# 5  Short offline       Completed without error       00%      3561         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

And here's the tune2fs for /home ie /dev/sdb2:

~ > sudo tune2fs -l /dev/sdb2
tune2fs 1.46.2 (28-Feb-2021)
Filesystem volume name:   <none>
Last mounted on:          /home
Filesystem UUID:          6bc55df8-39ab-4fd4-9f89-df85b3be7bcf
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              53739520
Block count:              214958080
Reserved block count:     10747904
Free blocks:              87125876
Free inodes:              53146463
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Fri Nov 27 03:48:50 2020
Last mount time:          Mon Jul  5 18:20:31 2021
Last write time:          Mon Jul  5 18:20:31 2021
Mount count:              167
Maximum mount count:      -1
Last checked:             Sat May  8 22:42:58 2021
Check interval:           0 (<none>)
Lifetime writes:          3113 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:	         256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
First orphan inode:       5111825
Default directory hash:   half_md4
Directory Hash Seed:      3e8911dd-a9a2-4bd5-9613-475b4170e1f3
Journal backup:           inode blocks
Checksum type:            crc32c
Checksum:                 0x89d62f28

I'm assuming fstab is a good place to check mount options, so here it is:

~ > cat /etc/fstab
# Static information about the filesystems.
# See fstab(5) for details.

# <file system> <dir> <type> <options> <dump> <pass>
# /dev/sdb1
UUID=0994c97d-207b-4556-876f-1142de972d3d	/         	ext4      	rw,relatime	0 1

# /dev/sdb2
UUID=6bc55df8-39ab-4fd4-9f89-df85b3be7bcf	/home     	ext4      	rw,relatime	0 2

# /dev/sdb4
UUID=9E16-EE0F      	/efi      	vfat      	rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,utf8,errors=remount-ro	0 2

# /dev/sdb3
UUID=df75b189-88dd-4f8e-9d31-b40a0e1ffafd	none      	swap      	defaults  	0 0

Also, for your spinning drive, what kind of tests did you run?

Offline

#40 2021-07-05 15:23:29

sabroad
Member
Registered: 2015-05-24
Posts: 242

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

seth wrote:

Yes, but the kernel should™ not overly rely on swap when there's plenty of inactive file cache to discard (and swappiness isn't too low)

Kernel 5.8 changes

https://lwn.net/Articles/821105/ wrote:

code that balances between swapping and cache memory reclaim [...]
to optimize reclaim for least IO cost incurred [...]
It evicts the cold anon pages more aggressively in the presence of a thrashing cache and the absence of swapins.

I wouldn't be surprised if the stalls are caused by page-cache thrashing rather than swap thrashing, and that'd certainly explain the kernel's eviction choices.

To determine the extent of swap thrashing,

vmstat 5 5
seth wrote:

Edit: tested a spinning drive and the dirty pages and inactive file cache stay reasonably low (<400MB and < 1.2GB)
No swapping required.

Without building up large enough dirty buffers, there's nothing to trigger memory pressure. For example, 7200RPM@120MB/s reads feeding 5400RPM@100MB/s writes copies 7.2GB/m and accumulates dirty buffers at ~1.2GB/m. If the media is good, the OP is probably copying 4-5 minutes' worth. I suspect you need a bigger test ;-)
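The arithmetic can be checked directly (bash; the speeds are the illustrative 7200/5400RPM figures from above, not measurements):

```shell
read_mbs=120    # assumed sequential read speed, MB/s
write_mbs=100   # assumed sequential write speed, MB/s
echo "copied per minute:       $(( read_mbs * 60 )) MB"               # 7200 MB ~ 7.2GB/m
echo "dirty growth per minute: $(( (read_mbs - write_mbs) * 60 )) MB" # 1200 MB ~ 1.2GB/m
```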


--
saint_abroad

Offline

#41 2021-07-05 15:34:31

sabroad
Member
Registered: 2015-05-24
Posts: 242

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

HotDogEnemy wrote:

@sabroad I checked out the nocache tool you mentioned. Correct me if I'm wrong, but from what I could tell it basically cleans up/prevents cache buildup during a specific command. The kernel doesn't know that we don't need the data to be cached for later use, so we have to tell it explicitly by invoking nocache?

Yes: unless the kernel is told POSIX_FADV_DONTNEED, it'll keep reads around in the page cache. Moreover, memory pressure will evict old (LRU) pages - if those are executable pages, they'll need to be paged back in to run again (i.e. interactivity suffers).
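The fadvise behaviour can also be exercised with plain GNU dd, as a rough stand-in for the nocache wrapper (file names here are placeholders):

```shell
# GNU dd's nocache flag issues posix_fadvise(..., POSIX_FADV_DONTNEED)
# on the copied ranges, so the data doesn't pile up in the page cache.
dd if=src.bin of=dst.bin bs=1M iflag=nocache oflag=nocache status=none
```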

Can you get us the output of vmstat/iostat during the stalls please?

vmstat 5 3
iostat -xk 5 3

Last edited by sabroad (2021-07-05 16:23:35)


--
saint_abroad

Offline

#42 2021-07-05 20:24:49

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

So I ran the same test, copying the 100 1GB files. I let the operation run until I could observe lags and poor interactivity (this was done while KDE Plasma was loaded), and after waiting a bit I took the vmstat and iostat output at that moment.

Here's the vmstat:

~> vmstat 5 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 5  4 1990404 160268   7592 14736940    8   25   813  5649  595 2008  7  6 56 31  0
 0  5 2014344 161112   7640 14754152    0  157  5757 19988 1480 3558  2  7 17 75  0
 1 13 2029008 133680   7496 14778520   19   96  8668 28818 1257 2770  2  4 17 76  0

And here's the iostat:

~> iostat -xk 5 3
Linux 5.12.14-zen1-1-zen (archlinux) 	07/06/2021 	_x86_64_	(4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.95    0.13    5.80   30.84    0.00   56.27

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.45     12.00     0.00   0.00    0.59    26.68    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.02
sdb             88.52   3226.38    65.57  42.55   25.24    36.45   43.22  22416.74    37.96  46.76   49.44   518.66    0.00      0.00     0.00   0.00    0.00     0.00    2.68   45.63    4.49  55.97


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.62    0.20    4.80   73.83    0.00   19.56

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdb            207.60   8889.60    15.60   6.99   26.10    42.82   39.80  32810.40    24.60  38.20  148.83   824.38    0.00      0.00     0.00   0.00    0.00     0.00    0.40  155.50   11.40  95.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.67    0.10    3.90   86.29    0.00    8.05

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdb            143.40   5536.00    25.20  14.95   52.96    38.61   49.20  25347.20    19.40  28.28  118.38   515.19    0.00      0.00     0.00   0.00    0.00     0.00    0.80  124.00   13.52  96.76

Offline

#43 2021-07-05 21:30:31

seth
Member
Registered: 2012-09-03
Posts: 51,684

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

Sanity check: behavior w/ the non-zen kernel?

Offline

#44 2021-07-06 10:32:43

sabroad
Member
Registered: 2015-05-24
Posts: 242

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.67    0.10    3.90   86.29    0.00    8.05
                                   ^ % of time the CPU has no runnable processes (all waiting on page-in)

In all probability, the loss of interactivity falls within the 86% of time where the CPUs have no runnable tasks because everything is waiting on IO (page-in).

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1 13 2029008 133680   7496 14778520   19   96  8668 28818 1257 2770  2  4 17 76  0
                      not a swap storm ^        ^ some % of this are major page faults (stopping runnable)

In the absence of a swap thrashing, this strongly points to page-cache thrashing.

More metrics:

$ egrep 'workingset|pswp' /proc/vmstat
workingset_refault 313057391 # evicted pages refaulted
workingset_activate 40598936 # refaulted pages immediately activated
workingset_nodereclaim 320984
pswpin 0
pswpout 0

I'd give this a go:

nocache rsync --bwlimit=KBPS src dst

Last edited by sabroad (2021-07-06 11:04:58)


--
saint_abroad


#45 2021-07-06 12:19:46

seth
Member
Registered: 2012-09-03
Posts: 51,684

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

And more sanity checks:

cat /sys/block/sd*/queue/scheduler
ionice cp -r /path/to/game /path/to/backup

I get high IO wait for the first three polls, but then it drops to ~30%.
The size of the copied file has no impact on the behavior (unless it had to be redonkeylously huge) and there's no impact on the system performance.
:\


#46 2021-07-06 14:40:40

sabroad
Member
Registered: 2015-05-24
Posts: 242

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

seth wrote:

The size of the copied file has no impact on the behavior (unless it had to be redonkeylously huge) and there's no impact on the system performance.

I suspect that even when copying to HDD, if the system executables are on an SSD, r_await will be an order of magnitude less (~5ms, enough to drive ~200fps):

Device            r/s     rkB/s   rrqm/s  %rrqm r_await
sdb            143.40   5536.00    25.20  14.95   52.96

--
saint_abroad


#47 2021-07-06 14:46:39

seth
Member
Registered: 2012-09-03
Posts: 51,684

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

Nope, was curious enough to dust off an all HDD system ;-)
1 drive, seagate barracuda something - though r_await is indeed only 5.17…


#48 2021-07-06 19:50:00

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

@seth Non-zen kernel sanity checks (100 1GB files were used for testing):

Kernel:

~ > uname -a
Linux archlinux 5.12.14-arch1-1 #1 SMP PREEMPT Thu, 01 Jul 2021 07:26:06 +0000 x86_64 GNU/Linux

Meminfo during copy:

~ > cat /proc/meminfo
MemTotal:       16341564 kB
MemFree:          156908 kB
MemAvailable:   13988416 kB
Buffers:           28540 kB
Cached:         13845876 kB
SwapCached:            0 kB
Active:           433400 kB
Inactive:       15044604 kB
Active(anon):        516 kB
Inactive(anon):  1635776 kB
Active(file):     432884 kB
Inactive(file): 13408828 kB
Unevictable:       10860 kB
Mlocked:           10860 kB
SwapTotal:      10485756 kB
SwapFree:       10485488 kB
Dirty:           2516284 kB
Writeback:         10544 kB
AnonPages:       1614488 kB
Mapped:           499620 kB
Shmem:             28500 kB
KReclaimable:     327256 kB
Slab:             415284 kB
SReclaimable:     327256 kB
SUnreclaim:        88028 kB
KernelStack:       12656 kB
PageTables:        25240 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    18656536 kB
Committed_AS:    6365244 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       34816 kB
VmallocChunk:          0 kB
Percpu:             2880 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      611152 kB
DirectMap2M:    12972032 kB
DirectMap1G:     3145728 kB

Vmstat during copy:

~ > vmstat 5 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  5    268 172296  36684 14141188    0    0   256 13152  326 1702  5  5 68 22  0
 1  4    268 155616  37720 14154560    0    0   382 88567  487 1237  1  4 23 73  0
 2  4    268 152452  38020 14156556    0    0   272 117188  449 1227  1  4 32 64  0

Iostat during copy:

~ > iostat -xk 5 3
Linux 5.12.14-arch1-1 (archlinux) 	07/06/2021 	_x86_64_	(4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.73    0.07    4.88   22.83    0.00   67.49

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.18      4.77     0.00   0.00    0.73    26.39    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.01
sdb             19.29    660.67    19.68  50.49   31.96    34.25   38.02  34815.12     9.97  20.78   34.83   915.79    0.00      0.00     0.00   0.00    0.00     0.00    1.21   52.55    2.00  36.89


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.32    0.16   12.70   54.12    0.00   15.70

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdb             32.60   1060.00    49.40  60.24   85.41    32.52   86.40  74538.40    26.00  23.13   39.76   862.71    0.00      0.00     0.00   0.00    0.00     0.00    3.20  107.56    6.56  90.14


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.12    0.11    7.91   71.10    0.00   13.77

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdb             39.40    648.00    74.80  65.50   69.08    16.45   84.60  81324.80    19.80  18.97   46.46   961.29    0.00      0.00     0.00   0.00    0.00     0.00    3.40   94.00    6.97  96.86

Gotta say, the default Linux kernel avoided swap much more noticeably: I only saw 2.01MB of swap use even when the RAM was full of buffers/caches. Apps that were already open retained the same interactivity as before, and the mouse cursor didn't lag. However, I did notice some decrease in interactivity: KDE took longer to respond when I moved, resized or minimized windows, and loading new applications took much, much longer (Dolphin took almost a minute to open during copying, the terminal prompt had noticeably more latency, etc.). On the other hand, once copying is done the system becomes responsive again faster than when using Zen.


The scheduler for non-zen kernel:

~ > cat /sys/block/sd*/queue/scheduler
[mq-deadline] kyber bfq none
[mq-deadline] kyber bfq none

The scheduler for Zen kernel:

~ > cat /sys/block/sd*/queue/scheduler
mq-deadline kyber [bfq] none
mq-deadline kyber [bfq] none

As for ionice, it didn't give any console output but the outcome was the same as without using ionice.
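One note on ionice: as far as I know the I/O class is only honored by schedulers that actually implement I/O priorities (bfq; mq-deadline and none ignore it), and it's the idle class (-c 3) that really deprioritizes a job. A sketch, with an illustrative dd standing in for the real copy:

```shell
# Run a write-heavy job in the idle I/O class (-c 3): it only gets disk
# time when no other process wants it. Honored by bfq; ignored by
# schedulers without I/O priority support. The dd command and /tmp path
# are illustrative stand-ins for the actual copy.
ionice -c 3 dd if=/dev/zero of=/tmp/ionice_demo.bin bs=1M count=16 status=none
ls -l /tmp/ionice_demo.bin
rm -f /tmp/ionice_demo.bin
```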


@sabroad As noted above, the Zen and default kernels perform differently, most likely due to the I/O scheduler, so I'm including two outputs of the egrep command taken after the copy test, one for each kernel.

On the default Linux kernel:

~ > egrep 'workingset|pswp' /proc/vmstat
workingset_nodes 419052
workingset_refault_anon 37
workingset_refault_file 22388
workingset_activate_anon 0
workingset_activate_file 2144
workingset_restore_anon 0
workingset_restore_file 276
workingset_nodereclaim 442951
pswpin 0
pswpout 0

On the Zen kernel:

~ > egrep 'workingset|pswp' /proc/vmstat
workingset_nodes 463785
workingset_refault_anon 64467
workingset_refault_file 177280
workingset_activate_anon 0
workingset_activate_file 0
workingset_restore_anon 33
workingset_restore_file 454
workingset_nodereclaim 308254
pswpin 2852
pswpout 5082
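Side note for anyone reproducing this: pswpin/pswpout are cumulative since boot, so what matters is whether they rise during the copy. A quick watch loop, for illustration:

```shell
# Sample the swap-in/swap-out page counters a few times; growing values
# during the copy mean active swapping (the counters are cumulative
# since boot, so only the deltas are interesting).
for i in 1 2 3; do
    grep -E '^pswp(in|out) ' /proc/vmstat
    sleep 1
done
```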

Last edited by HotDogEnemy (2021-07-06 19:54:50)


#49 2021-07-06 19:56:30

seth
Member
Registered: 2012-09-03
Posts: 51,684

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

What if you move to mq-deadline on the zen kernel?
https://wiki.archlinux.org/title/Improv … schedulers

Are the differences down to only this?


#50 2021-07-06 20:43:49

HotDogEnemy
Member
Registered: 2020-11-24
Posts: 72

Re: [SOLVED] I/O operations cause lags/stutters on zen kernel

I changed the scheduler to mq-deadline using the following (I didn't reboot afterwards, since it seems this change isn't persistent):

# echo mq-deadline > /sys/block/sda/queue/scheduler

And verifying that it was the one in use:

~ > cat /sys/block/sdb/queue/scheduler
[mq-deadline] kyber bfq none
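(To make a scheduler choice survive a reboot, the wiki's method is a udev rule; a sketch, with an illustrative filename:)

```
# /etc/udev/rules.d/60-ioschedulers.rules  (illustrative filename)
# select mq-deadline for rotational disks (HDDs) when they are detected
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="mq-deadline"
```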

It didn't resolve anything: there were still lags and freezes, and somehow the lags were even longer than with Zen running bfq (could just be me though). If you're interested, here's the meminfo:

~ > cat /proc/meminfo
MemTotal:       16339592 kB
MemFree:          156636 kB
MemAvailable:   14994652 kB
Buffers:           16264 kB
Cached:         14695548 kB
SwapCached:       103336 kB
Active:           341744 kB
Inactive:       14724516 kB
Active(anon):     247520 kB
Inactive(anon):   113880 kB
Active(file):      94224 kB
Inactive(file): 14610636 kB
Unevictable:        9748 kB
Mlocked:            9748 kB
SwapTotal:      10485756 kB
SwapFree:        9271364 kB
Dirty:           1724716 kB
Writeback:          8236 kB
AnonPages:        313720 kB
Mapped:           137020 kB
Shmem:              2716 kB
KReclaimable:     470608 kB
Slab:             587264 kB
SReclaimable:     470608 kB
SUnreclaim:       116656 kB
KernelStack:       12208 kB
PageTables:        22448 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    18655552 kB
Committed_AS:    5505772 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       35400 kB
VmallocChunk:          0 kB
Percpu:             2912 kB
HardwareCorrupted:     0 kB
AnonHugePages:     28672 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      336720 kB
DirectMap2M:     6955008 kB
DirectMap1G:     9437184 kB

EDIT: Seems like the zen kernel is the only one swapping and freezing excessively. I tried both linux-ck and vanilla linux with both bfq and mq-deadline, and the most swap taken up by them during the test was 2.1MB. Applications that were already open didn't suffer from lags or freezes. However, the RAM still gets full of inactive file cache and apps take substantially longer to open (Firefox took more than 1 minute to start, when it usually takes ~10 seconds) while running the test.

Last edited by HotDogEnemy (2021-07-06 22:08:07)

