Hi everybody,
I have an HP Omen running Arch Linux with kernel 4.0.7-2.
HP Omen specs:
- i7-4710HQ
- Nvidia GeForce GTX 860 and Intel® HD Graphics 4600
- 16GB RAM
- 256GB SSD
I am using XFCE as a desktop environment.
Since I installed Arch a few months ago, I have constantly had the following problem: a few times a day my machine freezes for about 10 seconds. Sometimes it happens more often, sometimes less often (it does not seem to be connected to load or to what I am doing).
I can move the cursor, and my Conky is still working, but I cannot click or type anything. When the freeze ends, all the letters I typed appear at once, as if they had been queued.
As I said, Conky keeps working, and I can see in it that the system load rises sharply during the freeze and falls immediately afterwards. However, there is no extra load on CPU or RAM, only on I/O.
I am guessing that some process is killing the machine at that moment and blocking everything else, but I have no idea how to find more details about what is happening during the freeze, or how to fix it.
I also have Win8 installed alongside Arch, and this problem does not happen there, so I do not think it is a hardware problem.
Let me know if you need more information, and thank you in advance for any advice!
Last edited by Martinsos (2015-08-27 11:51:06)
Offline
This looks like it could be caused by systemd timers or cron jobs starting some process.
Look into the log files (journalctl and cron log files) to see if something is reported when the freezes occur.
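One way to follow this advice, as a minimal sketch: pull the journal for a short window around a freeze and keep only kernel, systemd and cron lines. The function names `journal_window` and `suspects` are made up for illustration, and the cron identifiers (`crond`, `CROND`) are an assumption that depends on which cron daemon is installed.

```shell
# Print everything the journal recorded between two timestamps.
# Both arguments must be in a form journalctl accepts, e.g. "2015-08-27 11:40:00".
journal_window() {
    journalctl --no-pager --since "$1" --until "$2"
}

# Read journal text on stdin and keep only kernel, systemd and cron lines.
suspects() {
    grep -E ' (kernel|systemd\[[0-9]+\]|(crond|CROND)\[[0-9]+\]): '
}
```

Typical use would be `journal_window "2015-08-27 11:40:00" "2015-08-27 11:50:00" | suspects`, with the timestamps replaced by the time of an actual freeze.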
Offline
Which kernel are you using? I have had all kinds of problems with the 4.1.x kernels on my HP Envy, related to ACPI functions blocking during periods of high load. Things are much better with the 4.2 release candidates, and are perfect with the Arch LTS kernel. I would suggest installing the LTS kernel alongside the mainline kernel, adding it to your boot config, and trying it.
Online
Thank you @berbae, I will watch the journal and see what is happening!
@ewaller - I am using 4.0.7-2, as stated above. That is an interesting problem you had, but I am not sure it is connected to mine, since it seems to me that this high load should not be happening at all. If I cause a high load by running computations, there are no problems. But these high loads that happen randomly do cause freezing, and they are not intensive on CPU or RAM, but on I/O instead.
Offline
Aha, I got the logs!
Here is the output:
Aug 28 10:39:49 OmenArch kernel: ata5.00: exception Emask 0x0 SAct 0x5fffffff SErr 0x0 action 0x6 frozen
Aug 28 10:39:49 OmenArch kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Aug 28 10:39:49 OmenArch kernel: ata5.00: cmd 61/08:00:b8:b4:33/00:00:00:00:00/40 tag 0 ncq 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 28 10:39:49 OmenArch kernel: ata5.00: status: { DRDY }
Aug 28 10:39:49 OmenArch kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Aug 28 10:39:49 OmenArch kernel: ata5.00: cmd 61/08:08:68:6e:48/00:00:02:00:00/40 tag 1 ncq 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
the last message repeats many times ...
Aug 28 10:39:49 OmenArch kernel: ata5: hard resetting link
Aug 28 10:39:49 OmenArch kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug 28 10:39:49 OmenArch kernel: ata5.00: configured for UDMA/100
Aug 28 10:39:49 OmenArch kernel: ata5.00: device reported invalid CHS sector 0
the last message repeats many times...
Aug 28 10:39:49 OmenArch kernel: ata5: EH complete
It seems like something is wrong with my SSD. I guess the question now is whether it is a hardware problem or whether I misconfigured something.
Here is the output of smartctl, but it does not show any errors:
~$ sudo smartctl -a /dev/sda
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.0.7-2-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: SanDisk SD6PP4M-256G-1006
Serial Number: 143794401166
LU WWN Device Id: 5 001b44 c86656b8e
Firmware Version: A200806
User Capacity: 256,060,514,304 bytes [256 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: Unknown (0x0015)
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Aug 28 10:52:10 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x53) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 21) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 002 Pre-fail Always - 0
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
9 Power_On_Hours 0x0032 098 098 --- Old_age Always - 1291
12 Power_Cycle_Count 0x0032 100 100 --- Old_age Always - 450
170 Unknown_Attribute 0x0033 100 100 010 Pre-fail Always - 0
171 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 0
172 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 0
173 Unknown_Attribute 0x0033 100 100 005 Pre-fail Always - 25772752897
174 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 17
183 Runtime_Bad_Block 0x0032 253 253 --- Old_age Always - 0
184 End-to-End_Error 0x0033 100 100 097 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 --- Old_age Always - 0
188 Command_Timeout 0x0032 082 002 --- Old_age Always - 3316
190 Airflow_Temperature_Cel 0x0022 054 029 014 Old_age Always - 46
196 Reallocated_Event_Count 0x0032 100 100 --- Old_age Always - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 --- Old_age Always - 0
243 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 711 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
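The attribute table above is long, so here is a minimal sketch for pulling a single raw value out of `smartctl -A` output, for example Command_Timeout (ID 188), which reads 3316 in the dump above. The helper name `smart_raw` is made up, and the sketch assumes the ten-column attribute layout shown above. How the vendor counts this attribute is not documented here, so the number is probably best treated as something to compare before and after a freeze rather than as a verdict.

```shell
# Print the RAW_VALUE column of one SMART attribute, reading smartctl output on stdin.
# $1 is the attribute ID, e.g. 188.
# Only rows with at least ten columns are considered, which skips the
# short rows of the self-test tables that also start with a number.
smart_raw() {
    awk -v id="$1" 'NF >= 10 && $1 == id { print $10 }'
}
```

Usage would look like `sudo smartctl -A /dev/sda | smart_raw 188`, run once before and once after a freeze.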
Last edited by Martinsos (2015-08-28 08:58:21)
Offline
I am stuck here, as it seems that there are no problems with the SSD (on Windows everything works fine)! Do you have any ideas what to do or where to look?
Offline
Could you give me your "fstab" output?
Offline
Potential power supply or cable problems, perhaps?
http://superuser.com/questions/438998/i … rive-dying
Or maybe a firmware update is called for: https://bbs.archlinux.org/viewtopic.php?id=168530
You really should at least try the LTS kernel; it will only take a few moments to set it up.
Offline
I used to have a similar issue that was caused by TLP. I had to edit /usr/bin/tlp and comment out the line
set_sata_link_power $1
Offline
@mrlamud Here is the fstab output:
# <file system> <dir> <type> <options> <dump> <pass>
# /dev/sda3
UUID=2e35395e-b260-4122-9490-e7d43bd5f020 / ext4 rw,relatime,data=ordered,discard 0 1
# /dev/sda1
UUID=D293-0A10 /boot vfat rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro 0 2
More info about my disk:
Device Model: SanDisk SD6PP4M-256G-1006
Serial Number: 143794401166
LU WWN Device Id: 5 001b44 c86656b8e
Firmware Version: A200806
User Capacity: 256,060,514,304 bytes [256 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: Unknown (0x0015)
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue Sep 1 18:40:01 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
@Head_on_a_Stick Interesting theory about the power supply, but it does not happen on Win8, so I doubt it is a hardware issue. I will explore this if everything else fails, but I will leave it for last, as I do not want to open the laptop and fiddle with it unless needed.
Both the firmware update and the LTS kernel sound reasonable; I will try those (probably starting with the kernel)! Why do you think the LTS kernel will help? Because it is more stable?
@frank604 thank you for your suggestion, but I do not have tlp installed!
Offline
Please try removing the "discard" mount option from every partition on your SSD.
Mount them without "discard", then reboot and see whether your system is OK. I'll explain later if it works.
Don't worry about TRIM during this experiment. If your system is OK, we can then try another way of trimming.
Offline
@mrlamud:
Interesting idea: why do you think discard is causing this problem? If I am correct, it should only come into play when deleting files, and I would be surprised if it caused this much trouble -> it should not make the disk unresponsive.
Here is support for TRIM that my SSD has:
* Data Set Management TRIM supported (limit 16 blocks)
* Deterministic read ZEROs after TRIM
I will try removing discard and let you know if the problem keeps occurring. If it does not, I guess I will have to set up fstrim as a cron job? But then I might hit the same problem again while it is running?
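A minimal sketch of what such a periodic job could run, assuming the script is placed somewhere cron executes it as root (for example under /etc/cron.weekly, which is an assumption about the cron setup). The function name `trim_mounts` and the `FSTRIM` override are made up for illustration; `fstrim -v` is the util-linux tool itself.

```shell
# Command used for trimming; can be overridden for a dry run, e.g. FSTRIM=echo.
FSTRIM="${FSTRIM:-fstrim}"

# Trim each mount point given as an argument and report failures on stderr.
trim_mounts() {
    for mp in "$@"; do
        "$FSTRIM" -v "$mp" || echo "trim failed for $mp" >&2
    done
}
```

A weekly job would then call `trim_mounts /`. The trim is issued in one batch at a chosen time rather than on every delete, so any stall it causes would at least be predictable. If the installed util-linux ships an `fstrim.timer` systemd unit, enabling that is an alternative to cron.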
Offline
@mrlamud Unfortunately, the problem still persists. I removed the discard option, and at one point, when I saved a file in Emacs, everything blocked again, with the same errors in journalctl.
Offline
I'm sorry to hear that my advice didn't solve your problem.
At first I thought that your problem was similar to mine (I'm on a Samsung SSD 850 Pro), because your journal looks very similar to mine.
Short version:
[ 0.834312] ata1: SATA max UDMA/133 abar m2048@0xf7c39000 port 0xf7c39100 irq 30
[ 1.152745] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1.154732] ata1.00: supports DRM functions and may not be fully accessible
[ 1.154868] ata1.00: ATA-9: Samsung SSD 850 PRO 128GB, EXM02B6Q, max UDMA/133
[ 1.154870] ata1.00: 250069680 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[ 1.155207] ata1.00: supports DRM functions and may not be fully accessible
[ 1.155355] ata1.00: configured for UDMA/133
[ 3160.294856] ata1.00: exception Emask 0x0 SAct 0x3000 SErr 0x0 action 0x6 frozen
[ 3160.294866] ata1.00: failed command: WRITE FPDMA QUEUED
Long version:
http://pastebin.com/cn01wcP4
Windows is fine with this SSD, but Linux is not. My problem was solved by omitting the "discard" mount flag and running TRIM periodically via cronie; it has worked perfectly since.
This is the current dmesg output for the SSD 850:
[lamud@archbox ~]$ dmesg | grep "ata1"
[ 0.845026] ata1: SATA max UDMA/133 abar m2048@0xf7c39000 port 0xf7c39100 irq 27
[ 1.149620] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1.151567] ata1.00: supports DRM functions and may not be fully accessible
[ 1.151668] ata1.00: disabling queued TRIM support
[ 1.151670] ata1.00: ATA-9: Samsung SSD 850 PRO 128GB, EXM02B6Q, max UDMA/133
[ 1.151671] ata1.00: 250069680 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[ 1.151985] ata1.00: supports DRM functions and may not be fully accessible
[ 1.152051] ata1.00: disabling queued TRIM support
[ 1.152100] ata1.00: configured for UDMA/133
[ 28.625105] ata1.00: supports DRM functions and may not be fully accessible
[ 28.625167] ata1.00: disabling queued TRIM support
[ 28.625360] ata1.00: supports DRM functions and may not be fully accessible
[ 28.625422] ata1.00: disabling queued TRIM support
[ 28.625470] ata1.00: configured for UDMA/133
[ 28.625471] ata1: EH complete
I hope you find the problem and fix it soon.
Also, don't forget to share your solution here.
Last edited by mrlamud (2015-09-03 12:05:30)
Offline
Does your desktop freeze anytime a popup appears?
That is exactly my problem.
Offline
@technolog - no, I get the freeze "randomly", meaning I have no idea what is causing it.
@ewaller I tried the LTS kernel, but the problem is still there!
I am running out of options. I can still try to update the firmware, and I will do some more testing on Windows to be completely sure that I do not have any problems there (I use it rarely, so I may have missed them).
@mrlamud I am glad that removing discard works for you, and I will certainly post the solution when I find it.
Last edited by Martinsos (2015-09-08 21:13:15)
Offline
https://bugs.launchpad.net/ubuntu/+sour … bug/550559
Seems to be an issue with SATA3 ports. Can you plug your drive into a SATA2 port and see if that fixes it?
Offline
I wrote about TLP in hopes that the issue might be from a powersaving configuration. It doesn't have to be TLP but any powersaving configuration. Which powersaving methods/tools have you used? Look into the sata link power / ALPM power levels.
Here's a quote to better explain about ALPM:
Aggressive Link Power Management (ALPM) is a mechanism where a SATA AHCI controller can put the SATA link that connects to the disk into a very low power mode during periods of zero I/O activity and into an active power state when work needs to be done. Tests show that this can save around 0.5-1.5 Watts of power on a typical system.
ALPM is now available in several SATA controllers that use the Advanced Host Controller Interface (AHCI). However, there is some anecdotal evidence that some controllers may go into a low power state incorrectly and this ends up causing data loss.
I may well be way off, but my gut feeling, based on the above dmesg errors and my own issues in the past, compels me to write more on this topic for you.
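To check the SATA link power setting without any extra tools, a minimal sketch that reads the ALPM policy of each SATA host from sysfs. The function name `show_alpm_policy` and its optional root argument are made up for illustration; the `link_power_management_policy` file is the standard AHCI sysfs attribute, and on kernels of this era its usual values are `min_power`, `medium_power` and `max_performance`.

```shell
# Print the ALPM policy of every SCSI/SATA host.
# $1 optionally overrides the sysfs directory (handy for testing).
show_alpm_policy() {
    root="${1:-/sys/class/scsi_host}"
    for f in "$root"/host*/link_power_management_policy; do
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}
```

If a host reports `min_power`, writing `max_performance` to the same file as root (for example with `echo max_performance | sudo tee` on that path) would rule ALPM in or out as the cause until the next reboot.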
Offline
@frank604 thank you for sharing your idea about power management. It seems I do not have any special power management set up, but I ran
hdparm -B /dev/sda
and got
APM_level = 128
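-> per the hdparm man page that is a middle setting (the lowest level that does not permit spin-down), not the most battery-conserving one; I could maybe try setting it to 255, which disables APM.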
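For reference, the hdparm man page describes the `-B` scale as follows: 1 to 127 permit spin-down (1 being the most aggressive power saving), 128 to 254 do not permit spin-down (254 being the highest performance), and 255 disables APM. A minimal sketch that encodes those ranges; the function name `apm_describe` is made up for illustration.

```shell
# Describe an APM level as reported by `hdparm -B`.
# $1 is the numeric level (1-255).
apm_describe() {
    case "$1" in
        ''|*[!0-9]*) echo "not a numeric APM level: $1"; return 1 ;;
    esac
    if [ "$1" -eq 255 ]; then
        echo "APM disabled"
    elif [ "$1" -ge 128 ] && [ "$1" -le 254 ]; then
        echo "power saving without spin-down"
    elif [ "$1" -ge 1 ] && [ "$1" -le 127 ]; then
        echo "power saving with spin-down permitted"
    else
        echo "out of range"; return 1
    fi
}
```

Trying the other end of the scale would be `sudo hdparm -B 255 /dev/sda`; not every drive accepts that value, in which case 254 is the closest alternative.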
I did make some progress in the meantime: I removed the discard flag and introduced periodic trimming with fstrim (I had done this before, but it did not help), and I replaced the relatime flag with noatime. After this, my machine stopped freezing! I was watching journalctl, and occasionally the error would still happen, but with far fewer messages printed and without really affecting my work -> I have a feeling that I did not really remove the cause of the problem and am just triggering it less often (which is still an improvement).
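The fstab change described here can be sketched as a small filter over the options field. The helper name `tune_mount_opts` is made up for illustration; it only rewrites a comma-separated option string and does not touch /etc/fstab itself.

```shell
# Read a comma-separated mount option string on stdin,
# drop "discard" and replace "relatime" with "noatime".
tune_mount_opts() {
    tr ',' '\n' | sed -e '/^discard$/d' -e 's/^relatime$/noatime/' | paste -sd, -
}
```

Fed the root options from the fstab posted earlier, `echo 'rw,relatime,data=ordered,discard' | tune_mount_opts` prints `rw,noatime,data=ordered`, which is the string that would go into the options column.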
That lasted for a day or two, and now another error seems to appear: again my machine is freezing for some number of seconds (about 10 to 20), with this message:
Sep 15 14:34:33 OmenArch kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 15 14:34:33 OmenArch kernel: ata5.00: failed command: FLUSH CACHE EXT
Sep 15 14:34:33 OmenArch kernel: ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 24
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 15 14:34:33 OmenArch kernel: ata5.00: status: { DRDY }
Sep 15 14:34:33 OmenArch kernel: ata5: hard resetting link
Sep 15 14:34:33 OmenArch kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 15 14:34:33 OmenArch kernel: ata5.00: configured for UDMA/100
Sep 15 14:34:33 OmenArch kernel: ata5.00: retrying FLUSH 0xea Emask 0x4
Sep 15 14:34:33 OmenArch kernel: ata5.00: device reported invalid CHS sector 0
Sep 15 14:34:33 OmenArch kernel: ata5: EH complete
I found some information on similar error message here https://bbs.archlinux.org/viewtopic.php … 6#p1562296 -> maybe I should upgrade my firmware, at least it seems so from here.
Last edited by Martinsos (2015-09-15 12:51:50)
Offline
Some SSDs' firmware does not work well with NCQ on Linux. https://wiki.archlinux.org/index.php/So … NCQ_errors
In those cases the NCQ errors most often occur during queued TRIM (the 'discard' flag). Now that you trim less often, the errors appear less often. Just a theory, but you could try disabling NCQ and see if the errors go away.
I had this too with my 840 EVO, and the 850s are known to have problems as well.
In recent kernels, Samsung's 8xx series seems to be blacklisted for those operations by default. I don't know whether SanDisk devices are on that list too.
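A minimal sketch for checking whether NCQ is currently in effect, based on the queue depth the kernel exposes in sysfs. The function name `ncq_status` and its optional root argument are made up for illustration; the `queue_depth` attribute itself is standard for libata devices.

```shell
# Report whether NCQ is effectively in use for a disk, based on its queue depth.
# $1 is the block device name (e.g. sda); $2 optionally overrides the sysfs root.
ncq_status() {
    depth_file="${2:-/sys/block}/$1/device/queue_depth"
    [ -r "$depth_file" ] || { echo "$1: queue_depth not readable"; return 1; }
    depth=$(cat "$depth_file")
    if [ "$depth" -gt 1 ]; then
        echo "$1: NCQ active (queue depth $depth)"
    else
        echo "$1: NCQ off (queue depth $depth)"
    fi
}
```

To test the theory, NCQ can be turned off for the running session with `echo 1 | sudo tee /sys/block/sda/device/queue_depth`, or for every boot with the kernel parameter `libata.force=noncq`. If the timeouts stop while the queue depth is 1, that points at the drive's NCQ handling.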
Last edited by dice (2015-09-15 14:37:49)
Offline