Hi,
I have an Arch Linux installation on a Lenovo T460p (i7-6700HQ, 16GB RAM) with lightdm and Cinnamon. The discrete nvidia card is used only for CUDA; Xorg runs on the Intel graphics.
Recently I cloned my system to a bigger SSD (500GB KINGSTON SA400S37480G) and now I'm facing very unpleasant lag. During normal use (browsing, editing files, media playback...) the foreground application randomly freezes for several seconds. It happens roughly every 15 minutes. Sometimes the whole desktop environment freezes, including the mouse cursor; sometimes the cursor still moves but everything else is frozen. After about 10-20 seconds it resumes normal operation. I would share a journal log, but there is nothing special in it: no errors, no warnings, just the normal log of a healthy system.
The only thing I was able to capture is the lag in ioping. When a freeze occurs, ioping reports a slow request, interestingly almost always very close to 16 s.
...
4 KiB <<< /dev/sda4 (block device 381 GiB): request=8868 time=2.19 ms (fast)
4 KiB <<< /dev/sda4 (block device 381 GiB): request=8869 time=3.87 ms (fast)
4 KiB <<< /dev/sda4 (block device 381 GiB): request=8870 time=1.94 ms (fast)
4 KiB <<< /dev/sda4 (block device 381 GiB): request=8871 time=16.6 s (slow)
4 KiB <<< /dev/sda4 (block device 381 GiB): request=8872 time=585.8 us (fast)
4 KiB <<< /dev/sda4 (block device 381 GiB): request=8873 time=4.48 ms (fast)
...
Probably the issue is connected with the new SSD, but I do not have a spare one to test with. The S.M.A.R.T. test passes without issues.
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.12-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: KINGSTON SA400S37480G
Serial Number: 50026B76825B45AC
LU WWN Device Id: 5 0026b7 6825b45ac
Firmware Version: SBFKB1C2
User Capacity: 480103981056 bytes [480 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 T13/2161-D revision 4
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Jan 21 15:16:02 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM level is: 254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, frozen [SEC2]
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (65535) seconds.
Offline data collection
capabilities: (0x11) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 30) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate -O--CK 000 100 000 - 0
9 Power_On_Hours -O--CK 100 100 000 - 69
12 Power_Cycle_Count -O--CK 100 100 000 - 48
148 Unknown_Attribute ------ 100 100 000 - 0
149 Unknown_Attribute ------ 100 100 000 - 0
167 Unknown_Attribute ------ 100 100 000 - 0
168 Unknown_Attribute -O--C- 100 100 000 - 0
169 Unknown_Attribute ------ 100 100 000 - 11
170 Unknown_Attribute ------ 100 100 000 - 8
172 Unknown_Attribute -O--CK 100 100 000 - 0
173 Unknown_Attribute ------ 100 100 000 - 131076
181 Program_Fail_Cnt_Total -O--CK 100 100 000 - 0
182 Erase_Fail_Count_Total ------ 100 100 000 - 0
187 Reported_Uncorrect -O--CK 100 100 000 - 0
192 Power-Off_Retract_Count -O--C- 100 100 000 - 6
194 Temperature_Celsius -O---K 065 058 000 - 35 (Min/Max 16/42)
196 Reallocated_Event_Count -O--CK 100 100 000 - 0
199 UDMA_CRC_Error_Count -O--CK 100 100 000 - 0
218 Unknown_Attribute -O--CK 100 100 000 - 0
231 Temperature_Celsius ------ 001 001 000 - 99
233 Media_Wearout_Indicator -O--CK 100 100 000 - 722
241 Total_LBAs_Written -O--CK 100 100 000 - 301
242 Total_LBAs_Read -O--CK 100 100 000 - 68
244 Unknown_Attribute ------ 100 100 000 - 2
245 Unknown_Attribute ------ 100 100 000 - 4
246 Unknown_Attribute ------ 100 100 000 - 39520
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 51 Comprehensive SMART error log
0x03 GPL R/O 64 Ext. Comprehensive SMART error log
0x04 GPL,SL R/O 8 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x30 GPL,SL R/O 9 IDENTIFY DEVICE data log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
SMART Extended Comprehensive Error Log Version: 1 (64 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 69 -
Selective Self-tests/Logging not supported
SCT Commands not supported
Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 1) ==
0x01 0x008 4 48 --- Lifetime Power-On Resets
0x01 0x010 4 69 --- Power-on Hours
0x01 0x018 6 632369151 --- Logical Sectors Written
0x01 0x028 6 143093707 --- Logical Sectors Read
0x04 ===== = = === == General Errors Statistics (rev 1) ==
0x04 0x008 4 0 --- Number of Reported Uncorrectable Errors
0x05 ===== = = === == Temperature Statistics (rev 1) ==
0x05 0x008 1 35 --- Current Temperature
0x05 0x020 1 42 --- Highest Temperature
0x05 0x028 1 16 --- Lowest Temperature
0x06 ===== = = === == Transport Statistics (rev 1) ==
0x06 0x018 4 0 --- Number of Interface CRC Errors
0x07 ===== = = === == Solid State Device Statistics (rev 1) ==
0x07 0x008 1 0 --- Percentage Used Endurance Indicator
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 4 204 Transition from drive PhyRdy to drive PhyNRdy
0x000a 4 9 Device-to-host register FISes sent due to a COMRESET
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0010 2 0 R_ERR response for host-to-device data FIS, non-CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x0013 2 0 R_ERR response for host-to-device non-data FIS, non-CRC
Does anyone have any idea what is wrong here? I also noticed that the lags are much more frequent when I enable the swap partition, which should be used only for hibernation (swappiness set to 10).
best regards
Jan
Last edited by xhpohanka (2019-01-25 14:45:03)
Try passing "scsi_mod.use_blk_mq=0" to the kernel command line.
The swappiness does not do what you think it does (just saw the tip in the wiki, but that's BS - the partition will be used when, and only when, you run OOM; the swappiness just delays that state by dropping file caches instead, which means those files will have to be re-read on demand)
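To see whether the kernel parameter actually changed anything after a reboot, the active I/O scheduler can be read from sysfs, where the kernel marks the one in use with square brackets. A rough Python sketch follows; the helper names are made up for illustration, and the device name `sda` is an assumption taken from the ioping output earlier in the thread.

```python
from pathlib import Path


def active_scheduler(text):
    """Return the bracketed entry from a sysfs scheduler line, or None."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_scheduler(device="sda"):
    """Read the active scheduler for a block device (device name is an assumption)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


# Example strings in the style the kernel prints; the exact list of
# schedulers depends on the kernel version and configuration.
print(active_scheduler("noop deadline [cfq]"))
print(active_scheduler("[mq-deadline] kyber bfq none"))
```

Calling `read_scheduler()` on the affected machine before and after adding the parameter should show whether a multi-queue or a single-queue scheduler is in use.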
I will try that, thanks. On the other hand, from what I have read, disabling the scheduler for an SSD should lead to better overall performance; I do not understand why a scheduler could cause 16 s lags.
I also noticed this line in the SMART output:
SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
...
0x0009 4 220 Transition from drive PhyRdy to drive PhyNRdy
...
Isn't it suspicious?
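To track which of these counters actually move between two smartctl runs, the two tables can be compared programmatically. This is only a sketch in Python, assuming the four-column layout shown above (ID, Size, Value, Description); the function names are hypothetical.

```python
import re

# One counter row: hex ID, size in bytes, value, free-text description.
COUNTER_RE = re.compile(r"^(0x[0-9a-f]{4})\s+(\d+)\s+(\d+)\s+(.*)$")


def parse_counters(text):
    """Map counter ID -> (value, description) for every row that matches."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line.strip())
        if match:
            counter_id, _size, value, description = match.groups()
            counters[counter_id] = (int(value), description)
    return counters


def changed(before, after):
    """Return {counter ID: increase} for counters present in both snapshots that differ."""
    return {
        counter_id: after[counter_id][0] - before[counter_id][0]
        for counter_id in after
        if counter_id in before and after[counter_id][0] != before[counter_id][0]
    }


# Rows copied from the two outputs quoted in this thread.
first = """ID Size Value Description
0x0009 4 204 Transition from drive PhyRdy to drive PhyNRdy
0x000a 4 9 Device-to-host register FISes sent due to a COMRESET
"""
second = """ID Size Value Description
0x0009 4 220 Transition from drive PhyRdy to drive PhyNRdy
"""

print(changed(parse_counters(first), parse_counters(second)))
```

For the two snapshots quoted here, only counter 0x0009 is reported, with an increase of 16.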
best regards
Jan
https://superuser.com/questions/1027912 … -windows-7
It's not so much "disabling the scheduler" as "not using the multi-queue schedulers", though since they only recently became the default, experience is still rather sparse. (Since the single-queue schedulers are about to be dropped, we'd better gather it fast ;-)
Otoh, the slow disk IO might be a secondary effect; make sure to check your journal/dmesg around the lags to see whether other issues are reported.
It really was a hardware issue with the new SSD. After replacing it, everything works as it should.
The "Transition from drive PhyRdy to drive PhyNRdy" counter is still increasing even with the replacement SSD.