Arch freezing for 10 seconds a few times per day

Martinsos · 2015-08-27 11:50:28

Hi everybody,
I have an HP Omen running Arch Linux with kernel 4.0.7-2.
HP Omen specs:
- i7-4710HQ
- Nvidia GeForce GTX 860 and Intel® HD Graphics 4600
- 16GB RAM
- 256GB SSD
I am using XFCE as a desktop environment.

Since I installed Arch a few months ago, I have constantly had the following problem: a few times a day my machine freezes for about 10 seconds. Sometimes it happens more often, sometimes less often (it does not seem to be connected to load or to what I am doing).
I can move the cursor, and my Conky is still working, but I cannot click or type anything. When the freeze ends, all the letters that I typed appear at once, as if they had been queued.
As I said, Conky is working, and I can see on it that the system load grows a lot during that time and falls immediately after. However, there is no extra load on CPU or RAM, only on I/O.
I am guessing that some process is killing the machine at that moment and blocking everything else, but I have no idea how to find more details about what is happening during the freeze, or how to fix it.
I also have Win8 installed alongside Arch and this problem does not happen there, so I do not think it is a hardware problem.
Let me know if you need more information, and thank you in advance for any advice!

Last edited by Martinsos (2015-08-27 11:51:06)

berbae · 2015-08-27 15:08:42

This could be systemd timers or crontab jobs running some processes.
Look into the log files (journalctl and the cron log files) to see if something is reported when the freezes occur.

ewaller · 2015-08-27 15:14:27

Which kernel are you using? I have had all kinds of problems with the 4.1.x kernel on my HP Envy relating to ACPI functions blocking during periods of high load. Things are much better with the release candidates of 4.2, and are perfect with the Arch LTS kernel. I would suggest installing the LTS kernel alongside the mainline kernel, adding it to your boot config, and trying it.

Martinsos · 2015-08-28 08:43:33

Thank you @berbae, I will watch the journal and see what is happening!
@ewaller - I am using 4.0.7-2 as stated above. Interesting problem that you had, but I am not sure it is connected to mine, since it seems to me that this high load should not be happening at all. If I cause a high load by running computations, there are no problems. But these high loads that happen randomly do cause freezing, and they are not intensive on CPU or RAM, but on I/O instead.

Martinsos · 2015-08-28 08:48:13

Aha, I got the logs!
Here is the output:

Aug 28 10:39:49 OmenArch kernel: ata5.00: exception Emask 0x0 SAct 0x5fffffff SErr 0x0 action 0x6 frozen
Aug 28 10:39:49 OmenArch kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Aug 28 10:39:49 OmenArch kernel: ata5.00: cmd 61/08:00:b8:b4:33/00:00:00:00:00/40 tag 0 ncq 4096 out
                                          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 28 10:39:49 OmenArch kernel: ata5.00: status: { DRDY }
Aug 28 10:39:49 OmenArch kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Aug 28 10:39:49 OmenArch kernel: ata5.00: cmd 61/08:08:68:6e:48/00:00:02:00:00/40 tag 1 ncq 4096 out
                                          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
the last message repeats many times ...

Aug 28 10:39:49 OmenArch kernel: ata5: hard resetting link
Aug 28 10:39:49 OmenArch kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug 28 10:39:49 OmenArch kernel: ata5.00: configured for UDMA/100
Aug 28 10:39:49 OmenArch kernel: ata5.00: device reported invalid CHS sector 0
the last message repeats many times...

Aug 28 10:39:49 OmenArch kernel: ata5: EH complete

It seems like something is wrong with my SSD. I guess the question now is whether it is a hardware problem or I misconfigured something.
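
To get a quick overview of a log like the one above, the failed commands can be tallied and the SAct value decoded. This is only a sketch: the helper names failed_command_summary and sact_outstanding are made up for illustration, and it assumes that SAct is the bitmask of NCQ tags that were in flight when the port froze (which is how libata reports it, as far as I know). For the value 0x5fffffff above, that would mean 30 outstanding commands.

```shell
# Sketch only: both helper names are hypothetical.

# Count "failed command" lines per ATA command type in log text read from stdin.
failed_command_summary() {
    sed -n 's/.*failed command: //p' | sort | uniq -c | sed 's/^ *//'
}

# Count the bits set in a SAct value, i.e. the number of NCQ tags in flight
# (assumption: one bit per outstanding tag).
sact_outstanding() {
    local value count
    value=$(( $1 ))
    count=0
    while [ "$value" -gt 0 ]; do
        count=$(( count + (value & 1) ))
        value=$(( value >> 1 ))
    done
    echo "$count"
}

# Possible use on a live system:
#   journalctl -k -b --no-pager | failed_command_summary
#   sact_outstanding 0x5fffffff
```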

Here is the output of smartctl, but it does not show any errors:

~$ sudo smartctl -a /dev/sda
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.0.7-2-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SanDisk SD6PP4M-256G-1006
Serial Number:    143794401166
LU WWN Device Id: 5 001b44 c86656b8e
Firmware Version: A200806
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      Unknown (0x0015)
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Aug 28 10:52:10 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(  120) seconds.
Offline data collection
capabilities: 			 (0x53) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  21) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   002    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   098   098   ---    Old_age   Always       -       1291
 12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       450
170 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
171 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0
173 Unknown_Attribute       0x0033   100   100   005    Pre-fail  Always       -       25772752897
174 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       17
183 Runtime_Bad_Block       0x0032   253   253   ---    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   097    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       0
188 Command_Timeout         0x0032   082   002   ---    Old_age   Always       -       3316
190 Airflow_Temperature_Cel 0x0022   054   029   014    Old_age   Always       -       46
196 Reallocated_Event_Count 0x0032   100   100   ---    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   ---    Old_age   Always       -       0
243 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       711         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
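
One value in the table above may deserve a second look: attribute 188 (Command_Timeout) has a raw value of 3316 and a worst normalised value of 002, which would at least be consistent with the timeouts in the kernel log, even though the overall assessment is PASSED. A small sketch for pulling a single raw value out of the attribute table so it can be compared over time; smart_raw_value is a hypothetical helper name, and the column layout is assumed to match the table above.

```shell
# Print the RAW_VALUE column of one SMART attribute.
# Input: attribute table (e.g. from `smartctl -A`) on stdin; $1 is the attribute ID.
smart_raw_value() {
    awk -v id="$1" '$1 == id && NF >= 10 { print $10 }'
}

# Possible use on a live system:
#   sudo smartctl -A /dev/sda | smart_raw_value 188
```

If the number keeps growing each time a freeze occurs, that would tie the SMART counter to the events in the journal.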

Last edited by Martinsos (2015-08-28 08:58:21)

Martinsos · 2015-08-31 08:54:37

I am stuck here, as it seems that there are no problems with the SSD (on Windows everything works fine)! Do you have any ideas about what to do or where to look?

mrlamud · 2015-08-31 11:46:21

Could you post the contents of your fstab?

Head_on_a_Stick · 2015-08-31 11:49:25

Potential power supply or cable problems, perhaps?
http://superuser.com/questions/438998/i … rive-dying

Or maybe a firmware update is called for: https://bbs.archlinux.org/viewtopic.php?id=168530

You really should at least try the LTS kernel; it will only take a few moments to set it up.

frank604 · 2015-08-31 14:58:32

I used to have a similar issue that was caused by TLP. I had to edit /usr/bin/tlp and comment out the line

set_sata_link_power $1

Martinsos · 2015-09-01 16:48:16

@mrlamud Here is the fstab output:

# <file system>	<dir>	<type>	<options>	<dump>	<pass>
# /dev/sda3
UUID=2e35395e-b260-4122-9490-e7d43bd5f020	/         	ext4      	rw,relatime,data=ordered,discard	0 1

# /dev/sda1
UUID=D293-0A10      	/boot     	vfat      	rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro	0 2

More info about my disk:

Device Model:     SanDisk SD6PP4M-256G-1006
Serial Number:    143794401166
LU WWN Device Id: 5 001b44 c86656b8e
Firmware Version: A200806
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      Unknown (0x0015)
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Sep  1 18:40:01 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

@Head_on_a_Stick Interesting theory about the power supply, but it does not happen on Win8, so I doubt it is a hardware issue. I will explore this option if all the others fail, but will leave it for last, as I do not want to open the laptop and fiddle with it if not needed.
Both the firmware update and the LTS kernel sound reasonable, so I will try those (probably starting with the kernel)! Why do you think the LTS kernel will help? Because it is more stable?

@frank604 thank you for your suggestion, but I do not have tlp installed!

mrlamud · 2015-09-02 02:51:15

Martinsos wrote:

@mrlamud Here is the fstab output:

# <file system>	<dir>	<type>	<options>	<dump>	<pass>
# /dev/sda3
UUID=2e35395e-b260-4122-9490-e7d43bd5f020	/         	ext4      	rw,relatime,data=ordered,discard	0 1

# /dev/sda1
UUID=D293-0A10      	/boot     	vfat      	rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro	0 2

More info about my disk:

Device Model:     SanDisk SD6PP4M-256G-1006
Serial Number:    143794401166
LU WWN Device Id: 5 001b44 c86656b8e
Firmware Version: A200806
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      Unknown (0x0015)
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Sep  1 18:40:01 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Please try removing the "discard" option from every partition of your SSD.
Mount them without the "discard" flag, then reboot and see whether your system is OK. I'll explain later if it works.

Don't worry about "trim" during this experiment. If your system is OK, then we can try another method for "trim".
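
One way to confirm that the option is really gone after the reboot is to inspect the live mount options rather than the fstab file. A small sketch, where has_discard is a hypothetical helper and findmnt (from util-linux) is assumed to be installed:

```shell
# Succeed if a comma-separated mount option string contains "discard".
has_discard() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -qx 'discard'
}

# Possible use on a live system, after editing /etc/fstab and rebooting:
#   if has_discard "$(findmnt -no OPTIONS /)"; then
#       echo "discard is still active on /"
#   fi
```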

Martinsos · 2015-09-03 10:17:23

@mrlamud:
Interesting idea: why do you think discard is causing this problem? If I am correct, it should only come into play when deleting files, and I am surprised it could cause so much trouble -> it should not make the disk unresponsive.
Here is the TRIM support that my SSD has:

*	Data Set Management TRIM supported (limit 16 blocks)
*	Deterministic read ZEROs after TRIM

I will try removing discard and let you know if the problems keep occurring. If they do not, I guess I will have to set up fstrim as a cron job? But then I may experience the problems again while it is running?

Martinsos · 2015-09-03 10:57:32

@mrlamud Unfortunately, the problem still persists. I removed the discard option, and at one point, when I saved a file in Emacs, everything was blocked again, with the same errors in journalctl.

mrlamud · 2015-09-03 12:03:59

Martinsos wrote:

@mrlamud Unfortunately, the problem still persists. I removed the discard option, and at one point, when I saved a file in Emacs, everything was blocked again, with the same errors in journalctl.

I'm sorry to hear that my advice didn't solve your problem.
At first I thought your problem was similar to mine (I'm on a Samsung SSD 850 Pro), because your journal looks very similar to mine.

Short version:

[    0.834312] ata1: SATA max UDMA/133 abar m2048@0xf7c39000 port 0xf7c39100 irq 30
[    1.152745] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    1.154732] ata1.00: supports DRM functions and may not be fully accessible
[    1.154868] ata1.00: ATA-9: Samsung SSD 850 PRO 128GB, EXM02B6Q, max UDMA/133
[    1.154870] ata1.00: 250069680 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[    1.155207] ata1.00: supports DRM functions and may not be fully accessible
[    1.155355] ata1.00: configured for UDMA/133
[ 3160.294856] ata1.00: exception Emask 0x0 SAct 0x3000 SErr 0x0 action 0x6 frozen
[ 3160.294866] ata1.00: failed command: WRITE FPDMA QUEUED

Long version:
http://pastebin.com/cn01wcP4

Windows is fine with this SSD, but Linux is not. My problem was solved by omitting the "discard" mount flag and running trim via cronie instead, and it has worked perfectly so far.
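
For reference, a periodic trim of this kind could look roughly like the following. This is not the actual script used here, just a sketch: the function name trim_all, the FSTRIM override (only there to allow a dry run) and the idea of dropping the script into /etc/cron.weekly/ are all assumptions.

```shell
# Trim each mount point given as an argument.
# Set FSTRIM=echo for a dry run that only prints what would be executed.
trim_all() {
    local fstrim_cmd mountpoint
    fstrim_cmd=${FSTRIM:-fstrim}
    for mountpoint in "$@"; do
        "$fstrim_cmd" -v "$mountpoint" || echo "trim failed for $mountpoint" >&2
    done
}

# In a cron script (run as root) the call would simply be:
#   trim_all /
```

On systemd installs, recent util-linux versions also ship an fstrim.timer unit that can be enabled instead of a cron job.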

This is the current dmesg output for the SSD 850.

[lamud@archbox ~]$ dmesg | grep "ata1"
[    0.845026] ata1: SATA max UDMA/133 abar m2048@0xf7c39000 port 0xf7c39100 irq 27
[    1.149620] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    1.151567] ata1.00: supports DRM functions and may not be fully accessible
[    1.151668] ata1.00: disabling queued TRIM support
[    1.151670] ata1.00: ATA-9: Samsung SSD 850 PRO 128GB, EXM02B6Q, max UDMA/133
[    1.151671] ata1.00: 250069680 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[    1.151985] ata1.00: supports DRM functions and may not be fully accessible
[    1.152051] ata1.00: disabling queued TRIM support
[    1.152100] ata1.00: configured for UDMA/133
[   28.625105] ata1.00: supports DRM functions and may not be fully accessible
[   28.625167] ata1.00: disabling queued TRIM support
[   28.625360] ata1.00: supports DRM functions and may not be fully accessible
[   28.625422] ata1.00: disabling queued TRIM support
[   28.625470] ata1.00: configured for UDMA/133
[   28.625471] ata1: EH complete

I hope you find the problem and fix it soon.
Also, don't forget to share your solution here.

Last edited by mrlamud (2015-09-03 12:05:30)

technolog · 2015-09-03 12:16:54

Does your desktop freeze any time a popup appears?

That is exactly my problem.

Martinsos · 2015-09-08 21:12:20

@technolog - no, I get the freeze "randomly", meaning I have no idea what is causing it.
@ewaller I tried the LTS kernel, but the problem is still here!
I am running out of options. I can still try to update the firmware, and I will do some more testing on Windows to be completely sure that I do not have any problems there (I use it rarely, so I may have missed them).
@mrlamud I am glad that removing discard works for you, and I will certainly post the solution when I find it.

Last edited by Martinsos (2015-09-08 21:13:15)

Godofgrunts · 2015-09-09 15:29:57

https://bugs.launchpad.net/ubuntu/+sour … bug/550559

Seems to be an issue with SATA3 ports. Can you plug your drive into a SATA2 port and see if that fixes it?

frank604 · 2015-09-09 18:30:23

Martinsos wrote:

Aha, I got the logs!
Here is the output:

Aug 28 10:39:49 OmenArch kernel: ata5.00: exception Emask 0x0 SAct 0x5fffffff SErr 0x0 action 0x6 frozen
Aug 28 10:39:49 OmenArch kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Aug 28 10:39:49 OmenArch kernel: ata5.00: cmd 61/08:00:b8:b4:33/00:00:00:00:00/40 tag 0 ncq 4096 out
                                          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 28 10:39:49 OmenArch kernel: ata5.00: status: { DRDY }
Aug 28 10:39:49 OmenArch kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Aug 28 10:39:49 OmenArch kernel: ata5.00: cmd 61/08:08:68:6e:48/00:00:02:00:00/40 tag 1 ncq 4096 out
                                          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
the last message repeats many times ...

Aug 28 10:39:49 OmenArch kernel: ata5: hard resetting link
Aug 28 10:39:49 OmenArch kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug 28 10:39:49 OmenArch kernel: ata5.00: configured for UDMA/100
Aug 28 10:39:49 OmenArch kernel: ata5.00: device reported invalid CHS sector 0
the last message repeats many times...

Aug 28 10:39:49 OmenArch kernel: ata5: EH complete

I wrote about TLP in the hope that the issue might come from a power-saving configuration. It doesn't have to be TLP; any power-saving configuration could do it. Which power-saving methods/tools have you used? Look into the SATA link power / ALPM power levels.

Here's a quote to better explain about ALPM:

Aggressive Link Power Management (ALPM) is a mechanism where a SATA AHCI controller can put the SATA link that connects to the disk into a very low power mode during periods of zero I/O activity and into an active power state when work needs to be done. Tests show that this can save around 0.5-1.5 Watts of power on a typical system.

ALPM is now available in several SATA controllers that use the Advanced Host Controller Interface (AHCI). However, there is some anecdotal evidence that some controllers may go into a low power state incorrectly and this ends up causing data loss.

Now, I may well be way off, but my gut feeling, based on the above dmesg errors and my own issues from the past, really compels me to write more on this topic for you.
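
A sketch of how the current ALPM policy could be inspected, assuming the AHCI driver exposes it under /sys/class/scsi_host/hostN/link_power_management_policy (typical values at this kernel version are min_power, medium_power and max_performance). The function name show_alpm_policies is hypothetical, and the sysfs root is a parameter only so the function can be tried against a test directory.

```shell
# Print "hostN: policy" for every SCSI host that exposes an ALPM policy file.
# $1 is the sysfs root to scan (defaults to the real one).
show_alpm_policies() {
    local root file
    root=${1:-/sys/class/scsi_host}
    for file in "$root"/host*/link_power_management_policy; do
        [ -r "$file" ] || continue
        printf '%s: %s\n' "$(basename "$(dirname "$file")")" "$(cat "$file")"
    done
}

# To rule ALPM out, the policy can be set as root (not persistent across
# reboots); replace hostN with the host that belongs to the SSD's port:
#   echo max_performance > /sys/class/scsi_host/hostN/link_power_management_policy
```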

Martinsos · 2015-09-15 12:49:28

@frank604 thank you for sharing your idea about power management. It seems I do not have any special power management set up, but I ran

 hdparm -B /dev/sda

and got

 APM_level	= 128

-> that seems to be the most battery conserving mode, I could maybe try setting it to 255.
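
For what it is worth, the hdparm man page describes the -B scale the other way round: low values mean aggressive power management, 1 to 127 permit spin-down, 128 to 254 do not, and 255 disables APM altogether. So 128 would be a middle setting rather than the most power-saving one. A small sketch that encodes those ranges; describe_apm_level is a hypothetical helper, and whether the levels have any practical effect on an SSD is not something the man page promises.

```shell
# Describe an APM level as documented for `hdparm -B`.
describe_apm_level() {
    if [ "$1" -eq 255 ]; then
        echo "APM disabled"
    elif [ "$1" -ge 128 ] && [ "$1" -le 254 ]; then
        echo "no spin-down permitted (254 = highest I/O performance)"
    elif [ "$1" -ge 1 ] && [ "$1" -le 127 ]; then
        echo "spin-down permitted (1 = most aggressive power saving)"
    else
        echo "out of range"
    fi
}

# Disabling APM for a test, as root:
#   hdparm -B 255 /dev/sda
```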

I did make some progress in the meantime: I removed the discard flag and introduced periodic trimming with fstrim (I had done this before, but it did not help), and replaced the relatime flag with noatime. After this, my machine was not freezing any more! I was tracking journalctl and occasionally the error would still happen, but with far fewer messages printed and without really affecting my work -> I have a feeling that I did not really remove the cause of the problem, but am just triggering it less often (which is also an improvement).

That lasted for a day or two, and now another error seems to appear: again my machine is freezing for some number of seconds (about 10 to 20), with this message:

Sep 15 14:34:33 OmenArch kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 15 14:34:33 OmenArch kernel: ata5.00: failed command: FLUSH CACHE EXT
Sep 15 14:34:33 OmenArch kernel: ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 24
                                          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 15 14:34:33 OmenArch kernel: ata5.00: status: { DRDY }
Sep 15 14:34:33 OmenArch kernel: ata5: hard resetting link
Sep 15 14:34:33 OmenArch kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 15 14:34:33 OmenArch kernel: ata5.00: configured for UDMA/100
Sep 15 14:34:33 OmenArch kernel: ata5.00: retrying FLUSH 0xea Emask 0x4
Sep 15 14:34:33 OmenArch kernel: ata5.00: device reported invalid CHS sector 0
Sep 15 14:34:33 OmenArch kernel: ata5: EH complete

I found some information on a similar error message here https://bbs.archlinux.org/viewtopic.php … 6#p1562296 -> maybe I should upgrade my firmware, at least it seems so from there.

Last edited by Martinsos (2015-09-15 12:51:50)

dice · 2015-09-15 14:34:03

Some SSDs' firmware does not work well with NCQ on Linux. https://wiki.archlinux.org/index.php/So … NCQ_errors
The NCQ errors in those cases most often occur during queued trim ('discard' flag). Now that you are trimming less often, the errors appear less often. Just a theory, but you could try disabling NCQ and see if the errors go away.
I had this too with my 840 EVO, and the 850 series is known to have problems as well.
In recent kernels Samsung's 8xx series seems to be blacklisted for those operations by default. I don't know whether SanDisk devices are on that list too.
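
A sketch of how that test could be run. NCQ can be switched off for a single boot with the libata.force kernel parameter, or at runtime by forcing the queue depth to 1. The helper ncq_status is hypothetical; the port number 5.00 is taken from the ata5.00 messages earlier in the thread, and /dev/sda from the smartctl output.

```shell
# Report whether NCQ is effectively in use for a block device, based on its
# queue depth. $1 is the device name (e.g. sda); $2 is the sysfs root to read.
ncq_status() {
    local depth
    depth=$(cat "${2:-/sys/block}/$1/device/queue_depth") || return 1
    if [ "$depth" -gt 1 ]; then
        echo "NCQ active (queue depth $depth)"
    else
        echo "NCQ off (queue depth $depth)"
    fi
}

# Runtime test, as root (lost on reboot):
#   echo 1 > /sys/block/sda/device/queue_depth
# Boot-time alternative, added to the kernel command line:
#   libata.force=5.00:noncq
```

If the timeouts stop with NCQ off, that would point at the drive firmware's handling of queued commands rather than at the filesystem options.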

Last edited by dice (2015-09-15 14:37:49)
