My computer is fairly new; I built it a few weeks ago. Everything seems to work okay, but there is one quite annoying issue. Once in a while, when the computer has been on for a few hours, it hangs completely. I am using i3wm most of the time, but I doubt that is the problem. I might just be writing something small in Vim when the computer hangs: the mouse and keyboard become unresponsive, and sometimes the screen goes off too. Then I have to reboot, losing everything I had done.
Just wondering what would be good ways to troubleshoot this issue. Any tips?
My build is:
GPU: Nvidia GTX1070
CPU: AMD Ryzen 5 1600X
Kernel: 4.11.2-1-ARCH
Xorg-server: 1.19.3
Nvidia: 381.22-2
RAM: 16 GB Ripjaws DDR4
Offline
Read your journal for a hung session. Run top while you are working to monitor your system. Check your memory.
Offline
Try to switch VT, try to ping (or ssh into) the hung machine, and ultimately use the magic SysRq keys: https://fedoraproject.org/wiki/QA/Sysrq
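For the SysRq step, the key combinations only do anything if the kernel permits them. A small sketch, assuming the bitmask meanings given in the kernel's sysrq documentation (1 enables everything, 16 allows sync, 32 allows remount read-only, 128 allows reboot/poweroff); the function name `sysrq_allows` is made up for illustration:

```shell
# sysrq_allows MASK BIT
# Succeeds if a kernel.sysrq value of MASK permits the function group BIT.
# Assumed meanings (kernel sysrq docs): 0 = all off, 1 = all on, otherwise
# a bitmask, e.g. 16 = sync, 32 = remount read-only, 128 = reboot/poweroff.
sysrq_allows() {
    mask=$1
    bit=$2
    [ "$mask" -eq 1 ] && return 0
    [ $(( mask & bit )) -ne 0 ]
}

# Report on the running system, if the proc file is readable.
if [ -r /proc/sys/kernel/sysrq ]; then
    current=$(cat /proc/sys/kernel/sysrq)
    for group in 16 32 128; do
        if sysrq_allows "$current" "$group"; then
            echo "sysrq group $group: allowed"
        else
            echo "sysrq group $group: blocked"
        fi
    done
fi
```

If a group you want is blocked, running `sysctl kernel.sysrq=1` as root should enable everything until the next reboot.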
Offline
What is the output of
journalctl -p err..alert
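After a hard reset it is usually the previous boot's journal that holds the clues, not the current one. A sketch, assuming a persistent journal (otherwise earlier boots are not kept); `hung_boot_errors` is just an illustrative name:

```shell
# hung_boot_errors [OFFSET]
# Shows messages with priority err through alert from an earlier boot.
# OFFSET is the boot offset as understood by "journalctl -b";
# it defaults to -1, the boot before the current one.
hung_boot_errors() {
    journalctl -b "${1:--1}" -p err..alert --no-pager
}
```

Call it as `hung_boot_errors`, or `hung_boot_errors -2` for the boot before that; `journalctl --list-boots` lists the available offsets.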
Last edited by nasser (2017-06-26 18:36:28)
Offline
How fast are you running your RAM? A lot of Ryzen instability issues are down to memory timing.
It's also worth checking that you're running the latest firmware for your motherboard.
Last edited by Slithery (2017-06-26 18:44:28)
Offline
Nasser: https://pastebin.com/HT5fzvVf
My logs from that command are in that file. Thanks for it! I never knew you could see the errors that way.
It seems that the hanging stopped after I removed dunst, yet I am unsure whether that was the problem.
Slithery: Well, I don't feel comfortable updating my BIOS via USB yet, but I may need to if problems still occur.
Offline
See whether you can trigger it. Re-install (and start) dunst and run
notify-send what
This should bring up a message and if your system hangs instead, well ...
Maybe you need to configure dunst windows to be floating (though they're override redirects here, so the WM should ignore them anyway)
Edit: skimming the log excerpt, I'd not bet on dunst being the reason; there are quite a few USB errors ...
Try to blacklist sp5100_tco (that's a HW watchdog); if in doubt, use "install sp5100_tco /bin/false" to prevent it from being loaded at all.
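To spell out the two options: both are lines in a modprobe configuration file. A sketch, assuming the file name `sp5100_tco.conf` (any `.conf` file under `/etc/modprobe.d/` is read) and a helper name, `write_watchdog_blacklist`, invented here:

```shell
# write_watchdog_blacklist [FILE]
# Writes modprobe rules that keep the sp5100_tco watchdog module from loading.
# FILE defaults to /etc/modprobe.d/sp5100_tco.conf (the name is an assumption);
# writing there needs root.
write_watchdog_blacklist() {
    target=${1:-/etc/modprobe.d/sp5100_tco.conf}
    cat > "$target" <<'EOF'
# "blacklist" only stops automatic loading via module aliases
blacklist sp5100_tco
# "install" replaces the load command, so explicit load requests fail too
install sp5100_tco /bin/false
EOF
}
```

Run `write_watchdog_blacklist` as root and reboot; afterwards `lsmod | grep sp5100_tco` should print nothing.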
Last edited by seth (2017-07-05 19:34:31)
Offline
Damn, so the problem wasn't really dunst.
Regarding "blacklist sp5100_tco (that's a HW watchdog)": I don't have watchdog installed. How do I use "install sp5100_tco /bin/false"? Can you explain more?
Offline
Are there any issues in blacklisting sp5100_tco?
Offline
It's a watchdog. As long as the rest of the system behaves better, you won't spot any difference.
Offline
There are at least a few threads here with similar issues with Ryzen processors.
https://bbs.archlinux.org/viewtopic.php?id=228412
Spoiler: There's not much we can do to fix the problem right now.
Offline
Build the kernel using the Arch Build System, applying this patch to the PKGBUILD file, and tell me if it helps.
--- core-x86_64/PKGBUILD 2017-07-26 14:06:14.049991303 +0100
+++ core-x86_64.new/PKGBUILD 2017-07-26 19:04:39.076766669 +0100
@@ -16,6 +16,11 @@
"https://www.kernel.org/pub/linux/kernel/v4.x/${_srcname}.tar.sign"
"https://www.kernel.org/pub/linux/kernel/v4.x/patch-${pkgver}.xz"
"https://www.kernel.org/pub/linux/kernel/v4.x/patch-${pkgver}.sign"
+ "0001-Extend-the-request_region-infrastructure.patch::https://bugzilla.kernel.org/attachment.cgi?id=257119"
+ "0002-Modify-behaviour-of-request_-muxed_region.patch::https://bugzilla.kernel.org/attachment.cgi?id=257121"
+ "0003-usb-pci-quirks-Protect-the-I-O-port-pair-of-SB800.patch::https://bugzilla.kernel.org/attachment.cgi?id=257123"
+ "0004-i2c-i2c-piix4-Use-request_declared_muxed_region.patch::https://bugzilla.kernel.org/attachment.cgi?id=257125"
+ "0005-watchdog-sp5100_tco-Use-request_declared_-muxed_region.patch::https://bugzilla.kernel.org/attachment.cgi?id=257127"
# the main kernel config files
'config.i686' 'config.x86_64'
# pacman hook for initramfs regeneration
@@ -27,6 +32,11 @@
'SKIP'
'a112d1330817bac401dbbd1e2c8aacb1b725bc28239e2ca58281ea3754deceb5'
'SKIP'
+ 'f1e9748f423eed1934a966440122f0cda68bf7fffe712199829783cca0ec20df'
+ '73656ab3beddbd8ec489f1eb6517b48cb4d3ec46e0e4fd8a8b47be4f65791b41'
+ '664fa9b015256c0942d60b9606abbd8c67bc2e4f1ef30f8a16c9afb4efd66ef1'
+ 'e7ecedbab6ffb352e5eb1078e78591f944bcc16a2b1f7b8081900406dfb80ff4'
+ '3997f7f0ab6688e4397655826cd9284ab588fca22c38bca8e4f7931fe6fbed97'
'f330007da72867bb86556d1f8b84b8a4c8148a5ed5195ae25570a5da61428733'
'9dd9aa4a8ec613cc8261e40db897685d75e3d426219ed8d21fa3a6bc72a27a32'
'834bd254b56ab71d73f59b3221f056c72f559553c04718e350ab2a3e2991afe0'
@@ -43,6 +53,11 @@
# add upstream patch
patch -p1 -i "${srcdir}/patch-${pkgver}"
+ patch -p1 -i "${srcdir}/0001-Extend-the-request_region-infrastructure.patch"
+ patch -p1 -i "${srcdir}/0002-Modify-behaviour-of-request_-muxed_region.patch"
+ patch -p1 -i "${srcdir}/0003-usb-pci-quirks-Protect-the-I-O-port-pair-of-SB800.patch"
+ patch -p1 -i "${srcdir}/0004-i2c-i2c-piix4-Use-request_declared_muxed_region.patch"
+ patch -p1 -i "${srcdir}/0005-watchdog-sp5100_tco-Use-request_declared_-muxed_region.patch"
# security patches
Last edited by modjohn (2017-07-26 18:29:09)
Offline
What version of the kernel is that for? I can't find an exactly matching PKGBUILD file (which isn't a problem, I'm just curious).
EDIT: I looked at the patch files, then applied them to the Arch Linux kernel package for 4.12.3-1. I'll find out if it helps in a day or week from now.
Last edited by drcouzelis (2017-07-27 16:05:05)
Offline
Well, I don't know who you are modjohn, but I've been running the kernel with the patches you suggested and haven't seen my computer lock up / reboot since.
Looks like the patches are being posted here (currently from 6/22/2017): https://patchwork.kernel.org/project/LK … tter=56671
Are these 5 patches going to be included in future kernel releases?
EDIT: Surprise! After about 3 days of running, my computer rebooted itself at 5:00 am today.
Last edited by drcouzelis (2017-08-01 13:21:09)
Offline
Now it hangs on boot once in a while.. :c
Offline
Hi! According to the vast amount of study of the issue that has been done (https://community.amd.com/thread/215773) the only reliable workaround is to disable "C6 states" in BIOS. I went 10 days without a reboot after making that change.
But, you really need to do an RMA (exchange) with AMD for a new CPU. AMD has been replacing CPUs that are faulty. You need to open a new service ticket with AMD, and eventually you will mail them your broken CPU and they will mail you a new-in-box CPU as a replacement. I plan to do it too for my faulty CPU, but I decided to wait a little bit more, because they are kind of busy with returns and to see if they can further resolve the issue.
Good luck!
Offline
Hi!
Are there any tests to check whether this really is the cause? Like some kind of GCC stress test? It really does make things harder if I need to send my CPU away for a while.
Damn, I think I need to update my BIOS and then disable the C6 states. So it's some kind of CPU power-saving option.
Offline
http://www.phoronix.com/scan.php?page=n … Stress-Run
https://www.phoronix.com/scan.php?page= … v-Response
Last edited by Slithery (2017-08-30 14:46:16)
Offline
https://github.com/suaefar/ryzen-test
This looked interesting .. bad luck that it's not Arch compatible.
Also, my GPU is too new for Ubuntu.. maybe I need to try it from a live USB with the Ubuntu Server version.
Offline
This looked interesting .. bad luck that it's not Arch compatible.
?? It's totally Arch Linux compatible! It's the test I used.
Just comment out ("#") any line that contains "apt"... which is essentially one line in one file.
Also, the test can fail if you only ("only" ha) have 16 GB of RAM, so I chose to create another 16 GB of memory in the form of a swap file, as described on the Arch Linux Wiki, before running the test. The test will attempt to compile 12 instances of GCC simultaneously. My CPU failed about 5 of the 12 builds after about 3 hours. Also, stupid reboot issues...
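Before starting the test it may be worth checking how much RAM plus swap is actually available. A sketch that sums `MemTotal` and `SwapTotal` from `/proc/meminfo`; the function name is made up, and aiming for roughly 32 GiB only mirrors the 16 GB RAM plus 16 GB swap setup described above, not a documented requirement of the test:

```shell
# ram_plus_swap_gib [FILE]
# Prints MemTotal + SwapTotal in whole GiB (rounded down), read from FILE
# (default /proc/meminfo, whose values are in kB).
ram_plus_swap_gib() {
    awk '/^(MemTotal|SwapTotal):/ { kb += $2 }
         END { printf "%d\n", kb / 1048576 }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    echo "RAM + swap: $(ram_plus_swap_gib) GiB"
fi
```

Note that a machine with 16 GB installed typically reports a little less than 16 GiB as `MemTotal`, so the printed figure can come out one lower than expected.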
Last edited by drcouzelis (2017-08-30 16:02:41)
Offline
I tried running it with sudo.
Output:
sudo: apt: command not found
My bad.. I noticed after reading the script and the earlier comment.
Last edited by simplisticways (2017-08-31 13:53:09)
Offline
As has been said, comment out (or remove, for that matter) the line regarding apt install. You just need to have the base-devel group installed, and all of the other tools that aren't apt should work just the same.
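A sketch of the commenting-out step. The helper name and the file name in the usage note are placeholders, since the thread does not say which file holds the line; output goes to stdout so the result can be reviewed before anything is replaced:

```shell
# comment_out_apt FILE
# Prints FILE with a "#" put in front of every line that mentions "apt".
# Deliberately crude: it also hits words that merely contain "apt"
# (e.g. "adapt"), so review the output before using it.
comment_out_apt() {
    sed '/apt/ s/^/#/' "$1"
}
```

For example, `comment_out_apt some-script.sh > some-script.sh.new`, then compare the two files with `diff` before swapping them. The base-devel group can be installed with `pacman -S --needed base-devel`.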
Last edited by V1del (2017-08-31 07:33:03)
Online
Damn.. I think I should increase the swap file size, as it has been in loop-11 for like 3 hours.
Offline
Damn.. I think I should increase the swap file size, as it has been in loop-11 for like 3 hours.
Three hours sounds pretty normal to me.
So, the reason the test needs so much RAM is because it uses a RAM drive to compile the files, and the files can become pretty big. You can choose to just use your internal drive instead (read the script, you'll see where to change a variable from "true" to "false"). Instead, I chose to add an additional 16 GB of memory in the form of a temporary swap file. Also, it helps that my swap file was on my NVME M.2 drive, which is super fast.
When I first did the test and ran out of RAM, I knew because my computer rebooted itself and there were red messages in "journalctl" that said something like "OMG THERE'S NO MORE MEMORY I HAVE NO IDEA WHAT TO DO SHUT DOWN EVERYTHINGBLRGRCGOEHUHRH".
My point is, you'll know if you run out of memory.
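Joking aside, the kernel's out-of-memory messages are easy to search for. A sketch, assuming the kernel log uses the usual "Out of memory" or "invoked oom-killer" wording (the exact text varies between kernel versions); `oom_lines` is an illustrative name:

```shell
# oom_lines
# Filters stdin down to lines that look like OOM-killer activity.
oom_lines() {
    grep -i -E 'out of memory|oom-killer'
}
```

Something like `journalctl -k -b -1 --no-pager | oom_lines` would then search the kernel messages of the previous boot.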
Last edited by drcouzelis (2017-08-31 15:27:36)
Offline