
#1 2018-05-05 18:58:25

valniborf
Member
Registered: 2018-05-05
Posts: 6

Systematic freeze on video playback or opengl

Hi,

To upgrade an old PC, I bought new parts:
- Motherboard: Asus A320M-K
- Processor: AMD Ryzen3 1300X
- Ballistix Sport 4GB DDR4 RAM (2666MHz, 1.2V)

I kept my old MSI Nvidia GT710 graphics card, my old PSU and my old HDD (with Arch Linux installed and up to date).

I updated the BIOS to the latest firmware.

With the nvidia proprietary driver, whenever I started video playback with mplayer or mpv, the first frames were displayed and then the system froze (display no longer updated, only the mouse pointer moving, input ignored).
Sometimes, when I pressed the "q" key early enough, mplayer managed to exit, which avoided a hard reset. Launching mplayer with the x11 video output (mplayer -vo x11 <filename>) played the video normally.
When I launched glxinfo, it printed the display name, then froze for about 10 seconds before printing the rest of the information. glxgears froze the computer immediately.

I tried the nouveau driver: worse behaviour, the computer froze even without video playback.
With hardware acceleration disabled (Option "NoAccel" "True" in the Xorg configuration), there was no freeze.

I tried disabling the following options in the bios:
- EPU Power saving Mode
- Global C-state Control
- Opcache Control
I also tried combinations of the kernel parameters pcie_aspm=off, pcie=noacpi and acpi=off.
No improvement.
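
When testing kernel parameters like these, it can help to confirm that a parameter actually reached the running kernel. The following is a minimal sketch, not part of the original post: has_param is a hypothetical helper name, and the sample command line is made up so the snippet runs on any machine.

```shell
#!/bin/sh
# has_param CMDLINE PARAM: succeed if PARAM (e.g. "pcie_aspm=off") is one of
# the space-separated words in CMDLINE. Hypothetical helper for illustration.
has_param() {
    for word in $1; do
        [ "$word" = "$2" ] && return 0
    done
    return 1
}

# On a live system the string would come from the kernel itself:
#   cmdline=$(cat /proc/cmdline)
# A made-up sample is used here so the sketch runs anywhere.
cmdline="BOOT_IMAGE=/boot/vmlinuz-linux rw quiet pcie_aspm=off"

for p in pcie_aspm=off acpi=off; do
    if has_param "$cmdline" "$p"; then
        echo "$p: active"
    else
        echo "$p: not set"
    fi
done
```

Matching whole words avoids false positives such as "acpi" matching inside "acpi=off".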

Errors about DMA appeared in the journal, so I bought a new PSU: Corsair VS550.
Apparently there are no more DMA errors, but no improvement either.
 
I installed Windows7; the system hung while navigating the nvidia properties dialog (nvidia driver crash, recovered by Windows).

I swapped my graphics card with a Geforce405 from another PC (also running Arch Linux). Same behaviour. The GT710 works OK in the other PC (nvidia driver).

Yet another firmware update was available for the A320M-K, so I installed it and set the new option "Power Supply Idle Control" to "Typical Current Idle".

Now Windows7 is working correctly (video playback OK, can launch blender and do some mesh editing,...).

I reinstalled Arch Linux completely, but I still have the problem on Linux, with a small improvement: I can now sometimes launch the glxinfo command and it prints its output immediately. I can even launch one or two glxgears instances and the windows are rendered at 60fps (I output via HDMI to a TV monitor).
But suddenly the display freezes and cannot be recovered.
When I manage to start a video with mplayer and stop it before the system freezes completely, the following errors appear in dmesg:

 [mai 5 16:04] pcieport 0000:00:03.1: AER: Uncorrected (Non-Fatal) error received: id=0000
[  +0,000011] pcieport 0000:00:03.1: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0019(Requester ID)
[  +0,000008] pcieport 0000:00:03.1:   device [1022:1453] error status/mask=00100000/04400000
[  +0,000004] pcieport 0000:00:03.1:    [20] Unsupported Request    (First)
[  +0,000005] pcieport 0000:00:03.1:   TLP Header: 40000001 0700000f fee00000 00000000
[  +0,000007] pcieport 0000:00:03.1: broadcast error_detected message
[  +0,000033] pcieport 0000:00:03.1: AER: Device recovery failed
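
The hex fields in these AER lines can be decoded by hand. The sketch below is an illustration rather than a diagnosis; the helper names bdf_from_id and has_bit are hypothetical, and the bit layouts are assumed from the PCIe specification as commonly documented: requester ID = bus/device/function, uncorrectable status bit 20 = Unsupported Request, first header byte 0x40 = 32-bit memory write. Under those assumptions, the rejected packet is a one-dword write to 0xfee00000 coming from 07:00.0, the address the NVRM lines in this post give for the GPU. 0xfee00000 is the range x86 systems normally use for message-signalled interrupts, so the packet may be an MSI from the graphics card.

```shell
#!/bin/sh
# Decode the AER lines quoted above. Field layouts are assumptions taken from
# the PCIe spec; the result is an interpretation, not a diagnosis.

# bdf_from_id ID: print bus:device.function for a 16-bit ID given as 4 hex digits.
# Bits 15..8 = bus, bits 7..3 = device, bits 2..0 = function.
bdf_from_id() {
    id=$((0x$1))
    printf '%02x:%02x.%x\n' $(( (id >> 8) & 0xff )) $(( (id >> 3) & 0x1f )) $(( id & 0x7 ))
}

# has_bit HEX BIT: succeed if bit number BIT is set in the hex value HEX.
has_bit() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

status=00100000                          # from "error status/mask=00100000/04400000"
dw0=40000001 dw1=0700000f dw2=fee00000   # from "TLP Header: ..."

echo "reporting port:  $(bdf_from_id 0019)"
has_bit "$status" 20 && echo "status bit 20:   Unsupported Request"
printf 'TLP fmt/type:    0x%02x\n' $(( (0x$dw0 >> 24) & 0xff ))
echo "TLP length (DW): $(( 0x$dw0 & 0x3ff ))"
echo "TLP requester:   $(bdf_from_id "$(printf '%04x' $(( (0x$dw1 >> 16) & 0xffff )))")"
echo "TLP address:     0x$dw2"
```

The id=0019 field decodes to 00:03.1, which matches the pcieport device named at the start of each log line.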

I also usually have Xid=8 nvidia errors in the log when the freeze occurs:

mai 05 20:55:40 arch1 kernel: NVRM: GPU at PCI:0000:07:00: GPU-aea364b7-bc9c-d936-1c0b-105ba2e6d1b2
mai 05 20:55:40 arch1 kernel: NVRM: Xid (PCI:0000:07:00): 8, Channel 00000001
mai 05 20:55:42 arch1 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
mai 05 20:55:46 arch1 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
mai 05 20:55:48 arch1 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
mai 05 20:55:56 arch1 kernel: NVRM: Xid (PCI:0000:07:00): 8, Channel 00000001
mai 05 20:55:58 arch1 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
mai 05 20:56:01 arch1 dhcpcd[389]: enp5s0: fe80::1a1e:78ff:fe39:1732 is unreachable, expiring it
mai 05 20:56:02 arch1 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
mai 05 20:56:04 arch1 dhcpcd[389]: enp5s0: fe80::1a1e:78ff:fe39:1732 is reachable again
mai 05 20:56:04 arch1 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
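
To see at a glance which Xid codes occur and how often, the NVRM lines can be filtered out of the kernel log. This is a minimal sketch assuming the line format shown above; xid_codes is a hypothetical helper name, and the sample input is copied from this post so the snippet runs without an nvidia card.

```shell
#!/bin/sh
# xid_codes: read kernel log lines on stdin and print "PCI-address Xid-number"
# for every NVRM Xid line; all other lines are dropped.
xid_codes() {
    sed -n 's/.*NVRM: Xid (PCI:\([^)]*\)): \([0-9][0-9]*\),.*/\1 \2/p'
}

# Sample input taken from the log above; counts occurrences per device and code.
xid_codes <<'EOF' | sort | uniq -c
mai 05 20:55:40 arch1 kernel: NVRM: Xid (PCI:0000:07:00): 8, Channel 00000001
mai 05 20:55:42 arch1 kernel: NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
mai 05 20:55:56 arch1 kernel: NVRM: Xid (PCI:0000:07:00): 8, Channel 00000001
EOF

# On a live system the input would be the kernel messages of the current boot:
#   journalctl -k -b | xid_codes | sort | uniq -c
```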

I tried to stress the processor using "stress -c 8 -m 8 -i 8" and it did not produce any error.
I also tried to limit the memory frequency in the BIOS (tried with 2400, 2133 and even with 1333).

While booting, the following errors appear:

mai 05 20:59:52 arch1 kernel: ACPI Error: Needed [Integer/String/Buffer], found [Region]         (ptrval) (20180105/exresop-424)
mai 05 20:59:52 arch1 kernel: ACPI Error: AE_AML_OPERAND_TYPE, Could not execute arguments for [IOB2] (Region) (20180105/nsinit-426)
mai 05 20:59:52 arch1 kernel: ACPI: [Firmware Bug]: BIOS _OSI(Linux) query ignored
...
mai 05 20:59:54 arch1 kernel: sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver
mai 05 20:59:54 arch1 kernel: sp5100-tco sp5100-tco: I/O address 0x0cd6 already in use
mai 05 20:59:54 arch1 kernel: sp5100-tco: probe of sp5100-tco failed with error -16
mai 05 20:59:54 arch1 kernel: acpi PNP0C14:01: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:00)
mai 05 20:59:54 arch1 kernel: ccp 0000:08:00.2: enabling device (0000 -> 0002)
mai 05 20:59:54 arch1 kernel: ccp 0000:08:00.2: ccp enabled
mai 05 20:59:54 arch1 kernel: ccp 0000:08:00.2: psp initialization failed
mai 05 20:59:54 arch1 kernel: ccp 0000:08:00.2: enabled
...
mai 05 20:59:56 arch1 kernel: kvm: disabled by bios
...
mai 05 21:00:25 arch1 kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff wi>
mai 05 21:00:25 arch1 kernel: caller _nv000788rm+0xe4/0x1c0 [nvidia] mapping multiple BARs
mai 05 21:00:25 arch1 kernel: NVRM: Your system is not currently configured to drive a VGA console
mai 05 21:00:25 arch1 kernel: NVRM: on the primary VGA device. The NVIDIA Linux graphics driver
mai 05 21:00:25 arch1 kernel: NVRM: requires the use of a text-mode VGA console. Use of other console
mai 05 21:00:25 arch1 kernel: NVRM: drivers including, but not limited to, vesafb, may result in
mai 05 21:00:25 arch1 kernel: NVRM: corruption and stability problems, and is not supported.
mai 05 21:00:25 arch1 kernel: ------------[ cut here ]------------
mai 05 21:00:25 arch1 kernel: Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLUB object 'nvidia_stack_t' (offset 11864, size 3>
mai 05 21:00:25 arch1 kernel: WARNING: CPU: 2 PID: 524 at mm/usercopy.c:81 usercopy_warn+0x7e/0xa0
mai 05 21:00:25 arch1 kernel: Modules linked in: cfg80211 8021q mrp snd_hda_codec_hdmi snd_hda_codec_realtek nls_iso8859_1 nls_cp437 nvidia(PO) snd_hda_codec_ge>
mai 05 21:00:25 arch1 kernel: CPU: 2 PID: 524 Comm: Xorg Tainted: P           O     4.16.6-1-ARCH #1
mai 05 21:00:25 arch1 kernel: Hardware name: System manufacturer System Product Name/PRIME A320M-K, BIOS 4011 04/19/2018
mai 05 21:00:25 arch1 kernel: RIP: 0010:usercopy_warn+0x7e/0xa0
mai 05 21:00:25 arch1 kernel: RSP: 0018:ffff9cc800b17bb0 EFLAGS: 00010286
mai 05 21:00:25 arch1 kernel: RAX: 0000000000000000 RBX: ffff8c830db12e58 RCX: 0000000000000001
mai 05 21:00:25 arch1 kernel: RDX: 0000000080000001 RSI: ffffffff8de680bc RDI: 00000000ffffffff
mai 05 21:00:25 arch1 kernel: RBP: 0000000000000003 R08: 0000000000000094 R09: 0000000000000356
mai 05 21:00:25 arch1 kernel: R10: ffffffff8dea43b9 R11: 0000000000000001 R12: 0000000000000001
mai 05 21:00:25 arch1 kernel: R13: ffff8c830db12e5b R14: ffff8c830db12e58 R15: ffff8c830db12ea0
mai 05 21:00:25 arch1 kernel: FS:  00007f0f4b65f940(0000) GS:ffff8c831ec80000(0000) knlGS:0000000000000000
mai 05 21:00:25 arch1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
mai 05 21:00:25 arch1 kernel: CR2: 00007f0f42448000 CR3: 000000010d194000 CR4: 00000000003406e0
mai 05 21:00:25 arch1 kernel: Call Trace:
mai 05 21:00:25 arch1 kernel:  __check_object_size+0x130/0x1a0
mai 05 21:00:25 arch1 kernel:  os_memcpy_to_user+0x21/0x40 [nvidia]
mai 05 21:00:25 arch1 kernel:  _nv001372rm+0xa5/0x260 [nvidia]
mai 05 21:00:25 arch1 kernel:  ? _nv004784rm+0x4eba/0x5500 [nvidia]
mai 05 21:00:25 arch1 kernel:  ? _nv004331rm+0xec/0xf0 [nvidia]
mai 05 21:00:25 arch1 kernel:  ? _nv004326rm+0xca/0x650 [nvidia]
mai 05 21:00:25 arch1 kernel:  ? _nv015126rm+0x576/0x5c0 [nvidia]
mai 05 21:00:25 arch1 kernel:  ? _nv000694rm+0x2e/0x60 [nvidia]
mai 05 21:00:25 arch1 kernel:  ? _nv000789rm+0x5f5/0x8b0 [nvidia]
mai 05 21:00:25 arch1 kernel:  ? _raw_spin_unlock_irqrestore+0x20/0x40
mai 05 21:00:25 arch1 kernel:  ? rm_ioctl+0x73/0x100 [nvidia]
mai 05 21:00:25 arch1 kernel:  ? nvidia_ioctl+0x221/0x460 [nvidia]
mai 05 21:00:25 arch1 kernel:  ? nvidia_frontend_ioctl+0x2d/0x60 [nvidia]
mai 05 21:00:25 arch1 kernel:  ? nvidia_frontend_unlocked_ioctl+0x19/0x20 [nvidia]
mai 05 21:00:25 arch1 kernel:  ? do_vfs_ioctl+0xa4/0x630
mai 05 21:00:25 arch1 kernel:  ? __sb_end_write+0x42/0x60
mai 05 21:00:25 arch1 kernel:  ? vfs_write+0x131/0x1a0
mai 05 21:00:25 arch1 kernel:  ? SyS_ioctl+0x74/0x80
mai 05 21:00:25 arch1 kernel:  ? do_syscall_64+0x74/0x190
mai 05 21:00:25 arch1 kernel:  ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2
mai 05 21:00:25 arch1 kernel: Code: 48 c7 c0 81 23 e7 8d 48 0f 44 c2 41 50 51 41 51 48 89 f9 49 89 f1 4d 89 d8 4c 89 d2 48 89 c6 48 c7 c7 d8 23 e7 8d e8 42 aa e>
mai 05 21:00:25 arch1 kernel: ---[ end trace 3f3ad069147a9381 ]---

I would like to know which part of the hardware is failing: CPU, memory or motherboard. I may return it.
Now that the system seems to work under Windows, it may be a software issue.
I would appreciate any hint...

Last edited by valniborf (2018-05-05 19:55:51)

Offline

#2 2018-05-06 05:07:09

nesk
Member
Registered: 2011-03-31
Posts: 181

Re: Systematic freeze on video playback or opengl

Welcome to the forums.
This looks suspicious:

mai 05 21:00:25 arch1 kernel: NVRM: Your system is not currently configured to drive a VGA console
mai 05 21:00:25 arch1 kernel: NVRM: on the primary VGA device. The NVIDIA Linux graphics driver
mai 05 21:00:25 arch1 kernel: NVRM: requires the use of a text-mode VGA console. Use of other console
mai 05 21:00:25 arch1 kernel: NVRM: drivers including, but not limited to, vesafb, may result in
mai 05 21:00:25 arch1 kernel: NVRM: corruption and stability problems, and is not supported.
  • Please post your kernel cmdline (cat /proc/cmdline)

  • How do you start X? If DM - which DM, if startx - post your .xinitrc.

  • You mentioned outputting to HDMI - is that HDMI on Nvidia card?

  • Do you have any other cards installed (including an iGPU embedded in the CPU)?

  • Which bootloader are you using?

Offline

#3 2018-05-06 07:21:31

seth
Member
Registered: 2012-09-03
Posts: 49,992

Re: Systematic freeze on video playback or opengl

There's an uncorrected PCI error and xid 8 means the GPU stopped processing.
Try adding "rcutree.rcu_idle_gp_delay=1" to the kernel parameters, and also check how the lts kernel behaves.

Usually I'd say the circumstances *SCREAM* "underpowered", but you say you also upgraded the PSU - does the new one have more power than the old one?

@nesk, that warning is (unfortunately) "normal" on UEFI systems - it's "bad" and can have funny effects, but rather not the ones recorded here.
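
For anyone wanting to try the kernel parameter suggested above: with GRUB, one common way is to add it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and regenerate the config. The sketch below assumes GRUB and GNU sed; add_grub_param is a hypothetical helper name, and it is demonstrated on a temporary file rather than the real config.

```shell
#!/bin/sh
# add_grub_param FILE PARAM: append PARAM inside the quotes of
# GRUB_CMDLINE_LINUX_DEFAULT="..." in FILE, unless it is already there.
# Assumes GNU sed (for -i) and a double-quoted value on a single line.
add_grub_param() {
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$1" | grep -qF -- "$2"; then
        return 0
    fi
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $2\"|" "$1"
}

# Demonstration on a scratch copy, so nothing on the system is modified.
tmp=$(mktemp)
printf 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet"\n' > "$tmp"
add_grub_param "$tmp" "rcutree.rcu_idle_gp_delay=1"
grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$tmp"
rm -f "$tmp"

# On a real system (as root) the file would be /etc/default/grub, followed by:
#   grub-mkconfig -o /boot/grub/grub.cfg
# and a reboot.
```

For a one-off test, the parameter can also be typed at the GRUB menu by editing the boot entry, which leaves the configuration untouched.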

Offline

#4 2018-05-06 07:45:30

valniborf
Member
Registered: 2018-05-05
Posts: 6

Re: Systematic freeze on video playback or opengl

$cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=6cd94556-7d06-451b-ab8a-e2120c38c0b4 rw quiet

I also tried appending pcie_aspm=off.

Before, I was using lightdm, launched through the systemd service. Now I simply use startx.
My WM is fluxbox.

My .xinitrc (no special config in /etc/X11/xinit/)

#!/bin/sh

userresources=$HOME/.Xresources
usermodmap=$HOME/.Xmodmap
sysresources=/etc/X11/xinit/.Xresources
sysmodmap=/etc/X11/xinit/.Xmodmap

# merge in defaults and keymaps

if [ -f $sysresources ]; then
    xrdb -merge $sysresources
fi

if [ -f $sysmodmap ]; then
    xmodmap $sysmodmap
fi

if [ -f "$userresources" ]; then
    xrdb -merge "$userresources"
fi

if [ -f "$usermodmap" ]; then
    xmodmap "$usermodmap"
fi

# start some nice programs

if [ -d /etc/X11/xinit/xinitrc.d ] ; then
 for f in /etc/X11/xinit/xinitrc.d/?*.sh ; do
  [ -x "$f" ] && . "$f"
 done
 unset f
fi

exec startfluxbox

Yes, the HDMI is on the Nvidia card; there is no other card on the motherboard. Two SATA disks are plugged in.
The Ryzen3 1300X has no GPU embedded.

For the bootloader: before I reinstalled, the system booted from the MBR via GRUB. Now it is GRUB again, but with a UEFI install.

@Seth: I will try your kernel options. The old PSU had only 300W power. The new one is 550W. For both the motherboard and the MSI GT710, 350W is the minimum required according to the documentation.
The Nvidia GT710 (1GB) is a passive cooling one. The Geforce405 has a little fan.

Offline

#5 2018-05-06 08:42:25

valniborf
Member
Registered: 2018-05-05
Posts: 6

Re: Systematic freeze on video playback or opengl

So, I tried with the kernel parameter "rcutree.rcu_idle_gp_delay=1", same behaviour.

I installed and booted the linux-lts kernel, but it requires "nvidia-340xx-lts".

$pacman -Qs nvidia

local/libvdpau 1.1.1+3+ga21bf7a-1
    Nvidia VDPAU library
local/nvidia-340xx 340.106-32
    NVIDIA drivers for linux, 340xx legacy branch
local/nvidia-340xx-lts 340.106-9
    NVIDIA drivers for linux-lts
local/nvidia-340xx-utils 340.106-1
    NVIDIA drivers utilities

When I run "glxinfo" after the "startx" command, GLX seems to be down:

$glxinfo

X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  154 (GLX)
  Minor opcode of failed request:  24 (X_GLXCreateNewContext)
  Value in failed request:  0x0
  Serial number of failed request:  37
  Current serial number in output stream:  38

Offline

#6 2018-05-06 09:21:44

loqs
Member
Registered: 2014-03-06
Posts: 17,196

Re: Systematic freeze on video playback or opengl

Offline

#7 2018-05-06 10:00:21

valniborf
Member
Registered: 2018-05-05
Posts: 6

Re: Systematic freeze on video playback or opengl

That may indeed explain the problem with nvidia-340xx-lts.

I tried once more with xf86-video-nouveau, with both the latest kernel and the lts kernel, and it freezes very quickly (while scrolling in a terminal, for example) without any error in the journal.

Offline

#8 2018-05-06 10:17:31

nesk
Member
Registered: 2011-03-31
Posts: 181

Re: Systematic freeze on video playback or opengl

seth wrote:

@nesk, that warning is (unfortunately) "normal" on UEFI systems - it's "bad" and can have funny effects, but rather not the ones recorded here.

Interesting, I never had it on my UEFI system back when I had Nvidia. Cursory web search suggests severe incompatibilities between vesafb (and other framebuffer drivers) and Nvidia - maybe OP should disable those, what do you think?

Offline

#9 2018-05-06 10:49:14

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,427

Re: Systematic freeze on video playback or opengl

That only really has an "effect" on plain TTYs; it shouldn't matter in Xorg and definitely shouldn't have these effects during normal usage.

@valniborf FWIW that card is supported by the latest nvidia driver (or are you on the GT460 and not the 710 anymore?). As the error seems to be tied to the yield instruction, does it help if you set the following in e.g. your /etc/profile?

export __GL_YIELD=USLEEP

FWIW I'd also rule out the usual ryzen suspects, e.g. by explicitly disabling c-state 6 (see discussion here: https://bbs.archlinux.org/viewtopic.php?id=233304 ) and/or running https://github.com/suaefar/ryzen-test (edit the kill-ryzen.sh to remove the apt-gets in the beginning and the exit condition; you will want to have base-devel installed) to rule out a processor issue.
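
As an alternative to editing /etc/profile, the variable can be set for a single command, which makes it easy to compare behaviour with and without it. A minimal sketch follows; with_usleep_yield is a hypothetical helper name, and the demonstration only shows that the child process sees the variable.

```shell
#!/bin/sh
# with_usleep_yield CMD [ARGS...]: run CMD with __GL_YIELD=USLEEP set for that
# one process only (hypothetical helper name, not part of any driver tooling).
with_usleep_yield() {
    env __GL_YIELD=USLEEP "$@"
}

# Demonstration that the child process sees the variable:
with_usleep_yield sh -c 'echo "child sees __GL_YIELD=$__GL_YIELD"'

# On the affected machine one would run, for example:
#   with_usleep_yield glxinfo
```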

Offline

#10 2018-05-06 11:38:54

valniborf
Member
Registered: 2018-05-05
Posts: 6

Re: Systematic freeze on video playback or opengl

I still have the GT218 (Geforce 405) in the computer.

"Global C-state Control" is already disabled in the BIOS.
Nevertheless, I added the line to /etc/profile, rebooted, disabled the C6 state via the script mentioned in the link you provided and launched startx, but the PC froze on the first glxinfo command.

Offline

#11 2018-05-06 17:16:08

valniborf
Member
Registered: 2018-05-05
Posts: 6

Re: Systematic freeze on video playback or opengl

I used the kill-ryzen script (first the one from suaefar, then the one from Oxalin). It ran for 3h30 without error on all 4 cores before I finally stopped it.

Offline
