
#1 2019-01-21 14:37:08

xhpohanka
Member
Registered: 2014-10-21
Posts: 17

[Solved] Strange system lags after SSD replacement

Hi,
I have an Arch Linux installation on a Lenovo T460p (i7-6700HQ, 16GB RAM) with lightdm and Cinnamon. The discrete nvidia card is used only for CUDA; Xorg runs on the Intel graphics.
Recently I cloned my system onto a bigger SSD (500GB KINGSTON SA400S37480G) and I'm now facing very unpleasant lags. During normal computer usage (browsing, editing files, media playback...) the foreground application randomly freezes for several seconds. It happens roughly every 15 minutes. Sometimes the whole desktop environment freezes, including the mouse cursor; sometimes the mouse cursor moves but everything else is frozen. After about 10-20 seconds it resumes normal operation. I would share a journal log, but there is nothing special in it: no errors, no warning messages, just the normal log of a healthy system.

The only thing I was able to log is a lag in ioping. When a freeze occurs, there is one slow ioping request, and interestingly it almost always takes very close to 16 s.

...
4 KiB <<< /dev/sda4 (block device 381 GiB): request=8868 time=2.19 ms (fast)
4 KiB <<< /dev/sda4 (block device 381 GiB): request=8869 time=3.87 ms (fast)
4 KiB <<< /dev/sda4 (block device 381 GiB): request=8870 time=1.94 ms (fast)
4 KiB <<< /dev/sda4 (block device 381 GiB): request=8871 time=16.6 s (slow)
4 KiB <<< /dev/sda4 (block device 381 GiB): request=8872 time=585.8 us (fast)
4 KiB <<< /dev/sda4 (block device 381 GiB): request=8873 time=4.48 ms (fast)
...

The issue is probably connected with the new SSD, but I do not have a spare one to test with. The S.M.A.R.T. test passes without issues.

smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.12-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     KINGSTON SA400S37480G
Serial Number:    50026B76825B45AC
LU WWN Device Id: 5 0026b7 6825b45ac
Firmware Version: SBFKB1C2
User Capacity:    480103981056 bytes [480 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Jan 21 15:16:02 2019 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(65535) seconds.
Offline data collection
capabilities: 			 (0x11) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  30) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     -O--CK   000   100   000    -    0
  9 Power_On_Hours          -O--CK   100   100   000    -    69
 12 Power_Cycle_Count       -O--CK   100   100   000    -    48
148 Unknown_Attribute       ------   100   100   000    -    0
149 Unknown_Attribute       ------   100   100   000    -    0
167 Unknown_Attribute       ------   100   100   000    -    0
168 Unknown_Attribute       -O--C-   100   100   000    -    0
169 Unknown_Attribute       ------   100   100   000    -    11
170 Unknown_Attribute       ------   100   100   000    -    8
172 Unknown_Attribute       -O--CK   100   100   000    -    0
173 Unknown_Attribute       ------   100   100   000    -    131076
181 Program_Fail_Cnt_Total  -O--CK   100   100   000    -    0
182 Erase_Fail_Count_Total  ------   100   100   000    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--C-   100   100   000    -    6
194 Temperature_Celsius     -O---K   065   058   000    -    35 (Min/Max 16/42)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   100   100   000    -    0
218 Unknown_Attribute       -O--CK   100   100   000    -    0
231 Temperature_Celsius     ------   001   001   000    -    99
233 Media_Wearout_Indicator -O--CK   100   100   000    -    722
241 Total_LBAs_Written      -O--CK   100   100   000    -    301
242 Total_LBAs_Read         -O--CK   100   100   000    -    68
244 Unknown_Attribute       ------   100   100   000    -    2
245 Unknown_Attribute       ------   100   100   000    -    4
246 Unknown_Attribute       ------   100   100   000    -    39520
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O     51  Comprehensive SMART error log
0x03       GPL     R/O     64  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log

SMART Extended Comprehensive Error Log Version: 1 (64 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        69         -

Selective Self-tests/Logging not supported

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4              48  ---  Lifetime Power-On Resets
0x01  0x010  4              69  ---  Power-on Hours
0x01  0x018  6       632369151  ---  Logical Sectors Written
0x01  0x028  6       143093707  ---  Logical Sectors Read
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              35  ---  Current Temperature
0x05  0x020  1              42  ---  Highest Temperature
0x05  0x028  1              16  ---  Lowest Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x018  4               0  ---  Number of Interface CRC Errors
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1               0  ---  Percentage Used Endurance Indicator
                                |||_ C monitored condition met
                                ||__ D supports DSN
                                |___ N normalized value

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x0008  2            0  Device-to-host non-data FIS retries
0x0009  4          204  Transition from drive PhyRdy to drive PhyNRdy
0x000a  4            9  Device-to-host register FISes sent due to a COMRESET
0x000f  2            0  R_ERR response for host-to-device data FIS, CRC
0x0010  2            0  R_ERR response for host-to-device data FIS, non-CRC
0x0012  2            0  R_ERR response for host-to-device non-data FIS, CRC
0x0013  2            0  R_ERR response for host-to-device non-data FIS, non-CRC

Does anyone have any idea what is wrong here? I also noticed that the lags are much more frequent when I enable the swap partition, which should be used only for hibernation (swappiness set to 10).

best regards
Jan

Last edited by xhpohanka (2019-01-25 14:45:03)


#2 2019-01-21 16:36:56

seth
Member
Registered: 2012-09-03
Posts: 51,165

Re: [Solved] Strange system lags after SSD replacement

Try passing "scsi_mod.use_blk_mq=0" to the kernel command line.
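
After rebooting, it is worth confirming that the parameter actually reached the kernel. A minimal sketch, assuming a Linux system where the running command line is readable from /proc/cmdline; the helper names and the example command line are hypothetical.

```python
def kernel_param(cmdline, name):
    """Return the value of `name` on a kernel command line.

    Returns "" for a bare flag without "=", or None if the parameter is absent.
    """
    value = None
    for token in cmdline.split():
        key, _sep, val = token.partition("=")
        if key == name:
            value = val  # in this sketch the last occurrence wins
    return value


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read().strip()


# Made-up command line for illustration, not taken from the affected machine.
example = "root=/dev/sda4 rw quiet scsi_mod.use_blk_mq=0"

print(kernel_param(example, "scsi_mod.use_blk_mq"))  # 0
```

On the real machine the call would be kernel_param(read_cmdline(), "scsi_mod.use_blk_mq"); a result of None means the parameter was not passed.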

The swappiness does not do what you think it does. (I just saw the tip in the wiki, but that's BS: the partition will be used when, and only when, you run OOM. The swappiness just delays that state by dropping file caches instead, which means those files will have to be re-read on demand.)


#3 2019-01-22 07:12:34

xhpohanka
Member
Registered: 2014-10-21
Posts: 17

Re: [Solved] Strange system lags after SSD replacement

I will try that, thanks. On the other hand, from what I have read, disabling the scheduler for an SSD should lead to better overall performance, so I do not understand how the scheduler could cause 16 s lags.

I also noticed this line in SMART output

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
...
0x0009  4          220  Transition from drive PhyRdy to drive PhyNRdy
...

Isn't it suspicious?
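
To see how fast the counter grows, two snapshots of the table can be compared. A minimal sketch, assuming only the "SATA Phy Event Counters" section of the smartctl output is passed in, in the layout shown above; the function name is made up for illustration.

```python
import re

# One row of the "SATA Phy Event Counters" table: ID, size, value, description.
ROW_RE = re.compile(r"^(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)\s+(.*)$")


def phy_counters(report):
    """Map each counter ID (e.g. "0x0009") to its integer value."""
    counters = {}
    for line in report.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            counter_id, _size, value, _description = match.groups()
            counters[counter_id] = int(value)
    return counters


# Values quoted in this thread: the first post and this post.
first = """\
ID      Size     Value  Description
0x0009  4          204  Transition from drive PhyRdy to drive PhyNRdy
0x000a  4            9  Device-to-host register FISes sent due to a COMRESET
"""
second = """\
ID      Size     Value  Description
0x0009  4          220  Transition from drive PhyRdy to drive PhyNRdy
"""

delta = phy_counters(second)["0x0009"] - phy_counters(first)["0x0009"]
print(delta)  # 16
```

Taking the snapshots right before and right after a freeze would show whether the transitions coincide with the stalls.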

best regards
Jan


#4 2019-01-22 08:32:17

seth
Member
Registered: 2012-09-03
Posts: 51,165

Re: [Solved] Strange system lags after SSD replacement

https://superuser.com/questions/1027912 … -windows-7

It's not so much "disabling the scheduler" as "not using the multi-queue schedulers", though since they only recently became the default, experience is still rather sparse. (Since the single-queue schedulers are about to be dropped, we had better gather it fast ;-)

Otoh, the slow disk IO might be a secondary effect; make sure to check your journal/dmesg around the lags for other reported issues.


#5 2019-01-25 14:43:03

xhpohanka
Member
Registered: 2014-10-21
Posts: 17

Re: [Solved] Strange system lags after SSD replacement

It really was a hardware issue with the new SSD. After replacing it, everything works as it should.

The "Transition from drive PhyRdy to drive PhyNRdy" counter is still increasing even with the replacement SSD.

