#1 2016-06-04 12:19:05

avojevlavo
Member
Registered: 2010-05-27
Posts: 40

System freeze, faulty gpu?

Hello everyone, I need help pinpointing an issue.

My system is an Intel Core i7-5820K on a Gigabyte X99-UD4P with an Asus GeForce GTX 760 (rev a1); nothing is overclocked.

I am experiencing system freezes while gaming. I suspect either the NVIDIA drivers (I hope so, since that might be fixable), a faulty GPU (most likely) or the motherboard (unlikely, I hope, because it is new, but still).

However, I had the same GPU in my old system (a 10-year-old dual-core Conroe) running Arch and had none of these problems!

When it freezes I can't switch TTY and have to do a hard restart. The freeze does not kill the system entirely, though: the mouse usually (but not always) still responds, and if I wait I can sometimes switch to the desktop, because the system becomes responsive for a split second every 5 or so seconds. Even so, I have never managed to switch TTY and kill X.

I know of one time when the freeze occurred with Steam and Firefox as the only apps running (no games).
And even though the freeze locks everything up, it does not seem to affect YouTube (wtf?): the audio kept playing while the whole system was frozen (this happened once). SMPlayer, however, always freezes with the system if I have music or video playing in the background, and I get the usual "techno loop" :)

It is more common with 3D gaming (HL2: 20 s after starting a new game; Kerbal: 10-15 min, although I once managed 2 hours with no crash; Civ V: about once an hour). 2D indie games (Prison Architect and Project Zomboid) are much better: freezes occur only once or twice a week.

The freeze has never occurred while doing anything else, be it heavy Firefox usage, video transcoding, Blender rendering or installing with pacman, which is why I suspect the GPU or drivers.

I tried multiple versions of nvidia drivers and older kernels before I tried reinstalling the whole system, no luck.

Right now using: nvidia-lts 364.19-2 and linux-lts 4.4.11-1-lts

I can't downgrade as it is a new install, and I am using LTS because of ZFS (I know it's not necessary for ZFS, but it makes life easier).

As I said, I suspect a faulty GPU, but it worked OK on my old system with Arch. I would try a different GPU, but I don't have the money for one, nor the possibility of borrowing one temporarily.

Any suggestions? (Should I try a live Ubuntu session, downgrade the drivers, a different kernel, anything?)

System:    Host: Kernel: 4.4.11-1-lts x86_64 (64 bit gcc: 6.1.1) Desktop: Awesome 3.5.9
           Distro: Arch Linux
Machine:   System: Gigabyte product: N/A
           Mobo: Gigabyte model: X99-UD4P-CF v: x.x Bios: American Megatrends v: F21a date: 01/12/2016
CPU:       Hexa core Intel Core i7-5820K (-HT-MCP-) cache: 15360 KB
           flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) bmips: 39601
           clock speeds: max: 3600 MHz 1: 1199 MHz 2: 1210 MHz 3: 1199 MHz 4: 1200 MHz 5: 1212 MHz 6: 1199 MHz
           7: 1222 MHz 8: 1201 MHz 9: 1200 MHz 10: 1199 MHz 11: 1298 MHz 12: 1256 MHz
Graphics:  Card: NVIDIA GK104 [GeForce GTX 760] bus-ID: 03:00.0
           Display Server: N/A driver: nvidia Resolution: 141x55
Audio:     Card-1 Intel C610/X99 series HD Audio Controller driver: snd_hda_intel bus-ID: 00:1b.0
           Card-2 NVIDIA GK104 HDMI Audio Controller driver: snd_hda_intel bus-ID: 03:00.1
           Sound: Advanced Linux Sound Architecture v: k4.4.11-1-lts
Network:   Card: Intel Ethernet Connection (2) I218-V driver: e1000e v: 3.2.6-k port: f020 bus-ID: 00:19.0
           IF: eno1 state: up speed: 1000 Mbps duplex: full mac: <filter>
Drives:    HDD Total Size: 4268.6GB (0.0% used) ID-1: /dev/sda model: WDC_WD2000F9YZ size: 2000.4GB
           ID-2: /dev/sdb model: Samsung_SSD_840 size: 128.0GB
           ID-3: /dev/sdc model: WDC_WD2000F9YZ size: 2000.4GB
           ID-4: /dev/sdd model: Samsung_SSD_840 size: 128.0GB
           ID-5: USB /dev/sde model: Patriot_Memory size: 7.7GB
           ID-6: USB /dev/sdf model: STORE_N_GO size: 4.0GB
Partition: ID-1: / size: 224G used: 88G (40%) fs: zfs dev: N/A
           ID-2: /boot size: 3.8G used: 57M (2%) fs: vfat dev: /dev/sdf1
Sensors:   None detected - is lm-sensors installed and configured?
Info:      Processes: 964 Uptime: 4:36 Memory: 9155.4/15879.8MB Init: systemd Gcc sys: 6.1.1
           Client: Shell (bash 4.3.421) inxi: 2.3.0 

Last edited by avojevlavo (2016-06-04 12:31:25)

#2 2016-06-04 12:32:26

WorMzy
Forum Moderator
From: Scotland
Registered: 2010-06-16
Posts: 11,858

Re: System freeze, faulty gpu?

Check the journal entries from the end of a hung session. If there aren't any, enable sysrq and reboot "safely" the next time it happens, to make sure any journal entries from the end of the session are flushed to disk: https://wiki.archlinux.org/index.php/Ke … uts#Kernel
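
For the sysrq part, a rough way to check whether the current setting would permit a "safe" reboot: the kernel treats `kernel.sysrq` as 0 (off), 1 (all functions allowed) or a bitmask of allowed function groups. The sketch below assumes the bit values from the kernel's sysrq documentation (16 = sync, 32 = remount read-only, 128 = reboot/poweroff); the helper name `allows_safe_reboot` is made up for illustration.

```python
# Bit values as listed in the kernel's sysrq documentation (an assumption
# here; verify against the documentation shipped with your kernel).
SYSRQ_SYNC = 16        # Alt+SysRq+S: flush dirty buffers to disk
SYSRQ_REMOUNT_RO = 32  # Alt+SysRq+U: remount filesystems read-only
SYSRQ_REBOOT = 128     # Alt+SysRq+B: immediate reboot


def allows_safe_reboot(value: int) -> bool:
    """Return True if sync, remount read-only and reboot are all permitted."""
    if value == 1:  # 1 means "all sysrq functions enabled"
        return True
    needed = SYSRQ_SYNC | SYSRQ_REMOUNT_RO | SYSRQ_REBOOT
    return (value & needed) == needed


# Example values: 176 = 16 + 32 + 128, 16 = sync only.
for candidate in (0, 1, 16, 176):
    print(candidate, allows_safe_reboot(candidate))
```

On a running system the current value can be read from /proc/sys/kernel/sysrq and passed to the helper as an integer.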


#3 2016-06-04 13:07:35

avojevlavo
Member
Registered: 2010-05-27
Posts: 40

Re: System freeze, faulty gpu?

WorMzy wrote:

Check the journal entries from the end of a hung session. If there aren't any, enable sysrq and reboot "safely" the next time it happens, to make sure any journal entries from the end of the session are flushed to disk: https://wiki.archlinux.org/index.php/Ke … uts#Kernel

Thanks, journalctl is full of this (almost 2000 lines for 1 second):

kernel: pcieport 0000:00:02.0: can't find device of ID0010
kernel: pcieport 0000:00:02.0: AER: Corrected error received: id=0010
kernel: pcieport 0000:00:02.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0010(Transmitter ID)
kernel: pcieport 0000:00:02.0: device [8086:2f04] error status/mask=00001000/00002000
kernel: pcieport 0000:00:02.0: [12] Replay Timer Timeout  
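
To get an overview of a flood like this rather than scrolling through it, the lines can be summarized per device and error type. Here is a minimal Python sketch, assuming the log format shown above and the correctable-error bit layout of the PCIe AER capability; the names `CORRECTABLE_BITS`, `decode_status` and `summarize` are made up for illustration. The status value 00001000 has bit 12 set, which matches the "[12] Replay Timer Timeout" line the kernel prints, and the mask 00002000 is bit 13 (Advisory Non-Fatal).

```python
import re
from collections import Counter

# Correctable-error status bits of the PCIe AER capability (names are an
# assumption of this sketch; the kernel prints its own strings per bit).
CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
}

# Matches lines such as:
#   pcieport 0000:00:02.0: device [8086:2f04] error status/mask=00001000/00002000
STATUS_RE = re.compile(
    r"pcieport (?P<dev>[0-9a-f:.]+): .*error status/mask="
    r"(?P<status>[0-9a-f]{8})/(?P<mask>[0-9a-f]{8})"
)


def decode_status(status):
    """Return the names of all known bits set in a correctable status word."""
    return [name for bit, name in CORRECTABLE_BITS.items() if status & (1 << bit)]


def summarize(lines):
    """Count (device, error name) pairs over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            for name in decode_status(int(match.group("status"), 16)):
                counts[(match.group("dev"), name)] += 1
    return counts


# Two of the lines quoted above, used as sample input.
sample = [
    "kernel: pcieport 0000:00:02.0: AER: Corrected error received: id=0010",
    "kernel: pcieport 0000:00:02.0: device [8086:2f04] error status/mask=00001000/00002000",
]
print(summarize(sample))
```

In practice the input would probably be the kernel messages of the hung boot, for example the output of `journalctl -k -b -1` saved to a file and read line by line.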

After some googling, this forum thread came up:

http://www.overclock.net/t/1539708/ques … nd-to-poll

where BLinux suggests:

BLinux wrote:

1) Disable Message Signalled Interrupts (MSI) in the kernel with pci=nomsi option. This however, also disables MSI for all other devices so may be less than optimal.

2) Disable MMCONFIG (MMIO access method to PCI configuration space) in the kernel with pci=nommconf. I don't understand this enough to say what kind of drawback this has, but it is a way to workaround the PCIe bus errors while keeping MSI (option #1 above) enabled for other devices.

As I have no idea what these options do, I am a bit scared of trying them :)

#4 2016-06-04 16:50:08

mich41
Member
Registered: 2012-06-22
Posts: 796

Re: System freeze, faulty gpu?

You can start with pci=noaer which simply disables this "advanced error reporting" thing. If your freezes are caused by 100% CPU load resulting from those reports, it may help. If they are caused by the underlying hardware issues which trigger those reports, it probably won't.

The options you quoted are safe: MSI is a minor performance improvement, and MMCONFIG gives access to some extended PCIe configuration registers which shouldn't be needed for basic operation. It's even possible that those options help precisely because they have a side effect of disabling AER ;) But I'm not sure of that.

BTW, nvidia.NVreg_EnableMSI=0 is said to disable MSIs in the NVIDIA driver only.
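
After adding any of these to the boot loader's kernel command line and rebooting, it is worth confirming that the kernel actually received them. Here is a small Python sketch that pulls the `pci=` options and a module parameter out of a command-line string; the function names and the sample string are hypothetical, and it relies on `pci=` accepting several comma-separated options (e.g. `pci=noaer,nomsi`).

```python
def pci_options(cmdline):
    """Collect every comma-separated option passed via pci=... tokens."""
    options = set()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "pci" and sep:
            options.update(opt for opt in value.split(",") if opt)
    return options


def module_param(cmdline, name):
    """Return the last value given for a parameter such as
    nvidia.NVreg_EnableMSI, or None if it is not present."""
    found = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name and sep:
            found = value
    return found


# Hypothetical command line for illustration, not taken from this thread.
sample = "rw quiet pci=noaer nvidia.NVreg_EnableMSI=0"
print(pci_options(sample))
print(module_param(sample, "nvidia.NVreg_EnableMSI"))
```

On a running system the string to check would be the contents of /proc/cmdline.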

#5 2016-06-05 08:36:30

newbie1962
Member
From: italy
Registered: 2012-07-24
Posts: 137

Re: System freeze, faulty gpu?

Your NVIDIA graphics card is not configured; please refer to the NVIDIA, Xorg and OpenGL wiki pages.

Graphics:  Card-1: Intel 3rd Gen Core processor Graphics Controller
           bus-ID: 00:02.0
           Card-2: NVIDIA GF108M [GeForce GT 635M] bus-ID: 01:00.0
           Display Server: X.org 1.18.3 drivers: nouveau,intel
           tty size: 80x24 Advanced Data: N/A for root
Audio:     Card Intel 7 Series/C210 Series Family High Definition Audio Controller

Last edited by newbie1962 (2016-06-05 08:36:55)


hp-envy dv7

#6 2017-02-11 12:55:13

spoonie_aus
Member
From: Australia W.A
Registered: 2009-03-12
Posts: 47

Re: System freeze, faulty gpu?

avojevlavo, are you still having this issue?
I'm experiencing exactly the same issue with my GTX 760 (driver 375.26). The system locks up after between 20 minutes and 1 hour of playing games.
