#1 2017-05-01 14:51:20

LandoR
Member
Registered: 2014-04-23
Posts: 16

[solved] Random System freezes since new Hardware

Hello Arch Community,

please help me figure out the random system freezes on my System.

I upgraded my Hardware recently and replaced my old CPU, RAM and Mainboard with some used parts from ebay. (i7-6700K, ASRock Z170 Extreme4, 16GB Ram)

Since then I get random system freezes where I can't move the mouse pointer or switch to tty2. Even SysRq doesn't work anymore and I have to use the power switch to turn the PC off. These freezes happen 3-10 times a week. I can't find anything related to this in the logs.

There is another issue with suspend to RAM on this system. I am not sure whether the two are connected. Symptoms: on some resumes I get back to a distorted screen (huge triangles all over the screen); it is possible to switch to tty2 for the first few seconds, then the system becomes unresponsive as above -> power switch.

Here are the logfiles of my last 3 boots. What happened:

boot1: worked for almost 24 hours -> suspend -> freeze
https://gist.github.com/rolandg/5ab8a69 … 162423a051

boot2: worked for almost 2 minutes: boot & started some software -> freeze
https://gist.github.com/rolandg/fc829bd … c87f9b282b

boot3: boot after freeze, still running
https://gist.github.com/rolandg/28ce38d … 30b4e5382f

I thought it was an issue with my SSD since at some point I saw some I/O errors from my system partition. Could that be my problem? Would a defective disk cause a freeze?
Any other ideas?

Thank You in advance
LandoR

Last edited by LandoR (2017-05-19 12:31:07)

#2 2017-05-01 15:31:18

seth
Member
Registered: 2012-09-03
Posts: 49,951

Re: [solved] Random System freezes since new Hardware

https://wiki.archlinux.org/index.php/Kernel_Panics
https://wiki.archlinux.org/index.php/S.M.A.R.T.
https://www.archlinux.org/packages/extr … emtest86+/

Since you replaced CPU, MB & RAM with used parts, try memtest first. If you're lucky, the memory timings/clocks are just set too aggressively.

#3 2017-05-01 15:33:06

blahhumbug
Member
Registered: 2016-10-08
Posts: 64

Re: [solved] Random System freezes since new Hardware

Nothing obvious jumped out from the logs.   I would recommend installing mcelog and enabling mcelog.service

After you reboot from a hang, check journalctl for the prior boot and look for errors.
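
The check suggested above can be sketched as a small shell filter. This is only an illustration: the function name `suspect_lines` and the pattern list are assumptions, not something taken from the posted logs.

```shell
# Hypothetical helper: keep only log lines that hint at hardware trouble.
# The pattern list is an assumption; extend it for your own hardware.
suspect_lines() {
    grep -E -i 'machine check|hardware error|mce:|I/O error'
}
```

Typical use after rebooting from a hang would be `journalctl -b -1 --no-pager | suspect_lines`, where `-b -1` selects the previous boot. An empty result does not prove the hardware is fine, since a hard freeze may stop the journal before anything gets written.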

#4 2017-05-03 13:30:34

LandoR
Member
Registered: 2014-04-23
Posts: 16

Re: [solved] Random System freezes since new Hardware

Hi,
thanks for your replies.
I forgot to mention: I use Windows for gaming and never had a freeze in Windows. I've played ~15h since installing the new hardware.

My RAM is new, but I tried memtest86+ and it doesn't report any errors.

The output of smartctl looks good imo:

$ sudo smartctl /dev/sda -a         
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.10.13-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     Corsair Force GT SSD
Serial Number:    11246511000006930004
LU WWN Device Id: 0 000000 000000000
Firmware Version: 1.2
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed May  3 15:22:48 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7f) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Abort Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  48) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0021) SCT Status supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   120   120   050    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       5092 (211 228 0)
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       2031
171 Unknown_Attribute       0x0032   000   000   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   000   000   000    Old_age   Always       -       0
174 Unknown_Attribute       0x0030   000   000   000    Old_age   Offline      -       76
177 Wear_Leveling_Count     0x0000   000   000   000    Old_age   Offline      -       6
181 Program_Fail_Cnt_Total  0x0032   000   000   000    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   000   000   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   030   080   000    Old_age   Always       -       30 (Min/Max 7/80)
195 Hardware_ECC_Recovered  0x001c   100   100   000    Old_age   Offline      -       0
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
201 Unknown_SSD_Attribute   0x001c   100   100   000    Old_age   Offline      -       0
204 Soft_ECC_Correction     0x001c   100   100   000    Old_age   Offline      -       0
230 Unknown_SSD_Attribute   0x0013   100   100   000    Pre-fail  Always       -       429496729700
231 Temperature_Celsius     0x0013   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0000   000   000   000    Old_age   Offline      -       9633
234 Unknown_Attribute       0x0032   000   000   000    Old_age   Always       -       9623
241 Total_LBAs_Written      0x0032   000   000   000    Old_age   Always       -       9623
242 Total_LBAs_Read         0x0032   000   000   000    Old_age   Always       -       8298

SMART Error Log not supported

SMART Self-test Log not supported

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

I enabled mcelog to report errors. No freeze since then.
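
Reading the attribute table by eye is error-prone, so here is a minimal sketch that flags failed attributes automatically. It assumes the column layout of the `smartctl -A` table above; the function name `smart_failing_attrs` is made up for illustration. smartctl treats an attribute as failed when its normalized VALUE is at or below a non-zero THRESH.

```shell
# Hypothetical helper: read `smartctl -A` style output on stdin and print
# the name of every attribute whose normalized VALUE is at or below a
# non-zero THRESH. Column positions assume smartctl's usual table layout.
smart_failing_attrs() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && $6 + 0 > 0 && $4 + 0 <= $6 + 0 { print $2 }'
}
```

Usage would be `sudo smartctl -A /dev/sda | smart_failing_attrs`. For the table above it prints nothing, because every attribute with a non-zero threshold is still above it. That matches the PASSED self-assessment, but it does not rule out a failing drive.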

#5 2017-05-03 14:03:35

seth
Member
Registered: 2012-09-03
Posts: 49,951

Re: [solved] Random System freezes since new Hardware

I use Windows for gaming

Fastboot issue?

#6 2017-05-03 14:44:22

LandoR
Member
Registered: 2014-04-23
Posts: 16

Re: [solved] Random System freezes since new Hardware

seth wrote:

Fastboot issue?

If you mean fast start-up (https://wiki.archlinux.org/index.php/Du … t_Start-Up), it's disabled in Windows. I had some issues with this earlier on the old system. I didn't reinstall Windows for the upgrade.

#7 2017-05-04 18:23:04

LandoR
Member
Registered: 2014-04-23
Posts: 16

Re: [solved] Random System freezes since new Hardware

I have now also disabled the fastboot option in the BIOS. It didn't help.
Here is a log with mcelog enabled.
https://gist.github.com/rolandg/b9427d1 … 4469e3f8c5
Can't find anything in it.

#8 2017-05-04 20:00:07

seth
Member
Registered: 2012-09-03
Posts: 49,951

Re: [solved] Random System freezes since new Hardware

The BIOS setting is unrelated (it just skips some self tests where the MS feature leaves HW in an undefined state)
Whatever is /dev/sdf reports a temperature > 100°C !

Either the device is already broken and reports junk or it's pretty hot in there - too hot for most things (notably RAM, Disks, it's even hot for a CPU - GPUs can usually take that much heat, but that's it)
Also, the airflow temperature is > 60° in what is apparently a desktop system/tower?

Something seems off about the temperatures. Either a device heats up too much or the fancontrol is badly configured.
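
To see which drive reports the implausible value, the raw temperature can be pulled from each drive's attribute table. This is a sketch under assumptions: it reads attribute 194 (Temperature_Celsius), while some drives report temperature under 190 (Airflow_Temperature_Cel) instead, and the function name `smart_temp_celsius` is hypothetical.

```shell
# Hypothetical helper: print the raw value of SMART attribute 194
# (Temperature_Celsius) from `smartctl -A` style output on stdin.
# Assumes the raw value starts in column 10, as in smartctl's table.
smart_temp_celsius() {
    awk '$1 == 194 { print $10; exit }'
}
```

A loop such as `for d in /dev/sd?; do echo "$d: $(sudo smartctl -A "$d" | smart_temp_celsius)"; done` would then list one reading per drive. A drive without attribute 194 simply shows an empty value.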

#9 2017-05-06 11:16:14

LandoR
Member
Registered: 2014-04-23
Posts: 16

Re: [solved] Random System freezes since new Hardware

Yes it's even a big tower.
The disk(s) both are a bit older and they get warm. I don't think they were 100°C when I touched them, even if smartctl said it. They were in the lower front corner of the tower where they don't get cooled very well.

I unplugged them for now, freezing persists.

Not sure if this is related:
I have these housing fans with 4 LEDs on each of them.
When I upgraded my system I clipped one of the cables to each LED.
I did not insulate them so it might be that one of them is connected to GND.

Could that be related?

Thanks

#10 2017-05-06 13:11:59

seth
Member
Registered: 2012-09-03
Posts: 49,951

Re: [solved] Random System freezes since new Hardware

I'd generally suggest ruling out *any* cause related to electrical stunts, yes.
Even a brief short circuit could cause entirely random things (ask me whether dust can turn conductive ;-)

#11 2017-05-06 16:41:49

ewaller
Administrator
From: Pasadena, CA
Registered: 2009-07-13
Posts: 19,739

Re: [solved] Random System freezes since new Hardware

LandoR wrote:

The disk(s) both are a bit older and they get warm. I don't think they were 100°C when I touched them,

Oh, you would know if they were.  It would leave blisters.

60°C exhaust air temperature for a tower is not unreasonable.  Maybe a bit high.   
100°C for a disk drive is too high.  Be aware that this is probably not the case temperature, but some ambient sensor on the circuit board.  It could also be an airflow temperature.

Temperature may be a red herring.
Have you installed and configured microcode updates for your processor?
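
One way to check this is to look at the microcode revision the kernel reports. The sketch below is an illustration only: the function name `microcode_revisions` is made up, and it relies on the `microcode` field that x86 kernels expose in /proc/cpuinfo.

```shell
# Hypothetical helper: list the distinct microcode revisions found in
# /proc/cpuinfo style text on stdin (x86 kernels expose a "microcode" field).
microcode_revisions() {
    awk -F': *' '/^microcode/ { print $2 }' | sort -u
}
```

Run it as `microcode_revisions < /proc/cpuinfo`. The revision number alone does not say whether an update was loaded at boot, so `dmesg | grep -i microcode` is worth checking as well.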


Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
Sometimes it is the people no one can imagine anything of who do the things no one can imagine. -- Alan Turing
---
How to Ask Questions the Smart Way

#12 2017-05-10 19:49:21

LandoR
Member
Registered: 2014-04-23
Posts: 16

Re: [solved] Random System freezes since new Hardware

Ok.
I think I can be sure the hard drive was the reason for the freezes.
No freeze for 3 days.

Thanks for your help.

Any idea about the graphic glitches after resume from suspend to ram?
picture 1 2 3
Restarting kwin_x11 from tty2 helps. Thanks

#13 2017-05-11 06:17:26

seth
Member
Registered: 2012-09-03
Posts: 49,951

Re: [solved] Random System freezes since new Hardware

May 01 16:13:04 wks-rd kernel: NVRM: Your system is not currently configured to drive a VGA console
May 01 16:13:04 wks-rd kernel: NVRM: on the primary VGA device. The NVIDIA Linux graphics driver
May 01 16:13:04 wks-rd kernel: NVRM: requires the use of a text-mode VGA console. Use of other console
May 01 16:13:04 wks-rd kernel: NVRM: drivers including, but not limited to, vesafb, may result in
May 01 16:13:04 wks-rd kernel: NVRM: corruption and stability problems, and is not supported.

https://wiki.archlinux.org/index.php/GR … ramebuffer

Also check the dmesg tail right after wakeup (though more than ten lines) and simply try to suspend the kwin compositor before the S3 and resume it after wakeup.

#14 2017-05-11 10:32:52

SmallAndSimple
Member
Registered: 2015-11-25
Posts: 50

Re: [solved] Random System freezes since new Hardware

LandoR wrote:

Ok.
I think I can be sure the hard drive was the reason for the freezes.
No freeze for 3 days.

Thanks for your help.

Any idea about the graphic glitches after resume from suspend to ram?
picture 1 2 3
Restarting kwin_x11 from tty2 helps. Thanks

I am intrigued that the error only occurred on Linux and not on Windows. Have you inspected the hard drive using tools described here: https://wiki.archlinux.org/index.php/S.M.A.R.T.?

And small edit: the microcode thing might be related, sooo, have you tried that?

Last edited by SmallAndSimple (2017-05-11 10:52:39)

#15 2017-05-12 11:56:20

LandoR
Member
Registered: 2014-04-23
Posts: 16

Re: [solved] Random System freezes since new Hardware

@seth
I'll try that in a minute.

@SmallAndSimple
Yes, microcode has been enabled since the reinstallation.
It was a drive formatted in ext4, so Windows has never accessed it.

Thank You guys

EDIT:
After setting the framebuffer the system resumes to a black screen or the lock screen. I can move the mouse for some seconds, then it freezes. No switching to tty2 or SysRq possible.
Here are the logs of two attempted resumes:
log 1
log 2

Last edited by LandoR (2017-05-12 12:15:18)

#16 2017-05-12 15:44:07

seth
Member
Registered: 2012-09-03
Posts: 49,951

Re: [solved] Random System freezes since new Hardware

What do you mean by "after setting the framebuffer"? The logs indicate that you're still using a framebuffer console, but the idea is not to use one (because that's not officially supported by the nvidia driver).
The system is btw. fully alive (SysRq is just not enabled, see https://wiki.archlinux.org/index.php/Sysctl) - the error seems to be

nvidia-modeset: ERROR: GPU:0: Idling display engine timed out

Piping that into google will get you here https://devtalk.nvidia.com/default/topi … ia-370-/11
There doesn't seem to be a known cause/solution for this, though :-(
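
Since a disabled SysRq looks exactly like a dead system, it helps to check the setting first. A minimal sketch, assuming only the documented meaning of `kernel.sysrq` (0 disables it, 1 enables everything, other values are a bitmask); the function name `sysrq_describe` is hypothetical.

```shell
# Hypothetical helper: explain a kernel.sysrq value.
# 0 disables SysRq, 1 enables every function, anything else is a bitmask.
sysrq_describe() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "bitmask $1 (only selected functions enabled)" ;;
    esac
}
```

Run it as `sysrq_describe "$(cat /proc/sys/kernel/sysrq)"`. To enable all functions until the next reboot, `sudo sysctl kernel.sysrq=1` should do; a `kernel.sysrq = 1` line in a file under /etc/sysctl.d/ makes it permanent.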

#17 2017-05-15 19:18:25

LandoR
Member
Registered: 2014-04-23
Posts: 16

Re: [solved] Random System freezes since new Hardware

I had two freezes out of nowhere yesterday. So it wasn't the disc. After one of them I was able to switch to tty2 and could read some I/O errors from my system partition. So I went to the shop and got myself a brand new SSD. I hope this helps now.
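
To tie such I/O errors to a specific device, the kernel log of the affected boot can be summarised per device. This is a sketch under assumptions: it expects messages containing `I/O error, dev <name>,`, which is how the block layer commonly words them but was not confirmed from the logs here, and the function name `io_errors_by_device` is made up.

```shell
# Hypothetical helper: count kernel "I/O error" lines per block device.
# Assumes messages of the form "... I/O error, dev <name>, sector ...";
# other drivers may word their errors differently.
io_errors_by_device() {
    grep -F 'I/O error, dev ' |
        sed -E 's/.*I\/O error, dev ([^, ]+).*/\1/' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Usage would be `journalctl -k -b -1 --no-pager | io_errors_by_device`, which prints one `device count` pair per line. If the system partition can no longer be written, those messages may never reach the journal, so an empty result is not conclusive.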

seth wrote:

What do you mean by "after setting the framebuffer"? The logs indicate that you're still using a framebuffer console, but the idea is not to use one (because that's not officially supported by the nvidia driver).

At least I set

GRUB_TERMINAL_OUTPUT=console

+ grub-mkconfig. After that the border of the grub screen disappeared. Might be a log from before it was set.
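
Whether the setting is really active in the file can be checked mechanically. A minimal sketch: the function name `grub_console_enabled` is hypothetical, and it only tests for an uncommented `GRUB_TERMINAL_OUTPUT=console` line, not for what the running kernel actually uses.

```shell
# Hypothetical helper: succeed if the given GRUB defaults file has an
# uncommented GRUB_TERMINAL_OUTPUT=console line (optionally quoted).
grub_console_enabled() {
    grep -Eq '^[[:space:]]*GRUB_TERMINAL_OUTPUT="?console"?[[:space:]]*$' "$1"
}
```

For example, `grub_console_enabled /etc/default/grub && echo "console output set"`. To see what the booted system uses, `cat /proc/fb` lists the registered framebuffer devices; empty output suggests a plain text console.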

seth wrote:

The system is btw. fully alive (SysRq is just not enabled

Oh. You are right. I didn't reenable sysrq after the installation. But if it were fully alive it would be possible to change to tty2. This is sometimes possible, sometimes the system correctly resumes after 30s. Sometimes tty2 freezes after switching to it. I already had all variations.

Thanks

#18 2017-05-15 19:30:57

seth
Member
Registered: 2012-09-03
Posts: 49,951

Re: [solved] Random System freezes since new Hardware

Framebuffer/graphics issue - you could try to ssh into it for an inspection.

Do you still load the nouveau module at some point? Eg. from/in the initramfs?
The tty2 output should "look like DOS" ie. only have 25 lines of text.

#19 2017-05-19 12:30:38

LandoR
Member
Registered: 2014-04-23
Posts: 16

Re: [solved] Random System freezes since new Hardware

OK, it seems fixed since I got the new SSD.
I also had no more problems resuming from suspend.
Even if I've got no idea why, I'll close this thread.
