#1 2020-06-14 13:20:10

automne
Member
From: /dev/md/kumiko
Registered: 2020-06-14
Posts: 19
Website

System randomly freeze

Hello,

I recently installed Arch Linux on my custom NAS server.

However, the system freezes completely at random moments. I lose SSH access, the keyboard LEDs stay stuck in their current state, and the only way to get access back is to hold the power button for 10 seconds or unplug the PC.

My PC
* CPU: AMD Ryzen 3 3200G with Vega
* Mainboard: MSI MPG X570 GAMING PLUS
* RAM: HyperX Fury 2666MHz 8GB (2x4GB)

Before a crash happens, I get this in journalctl (output truncated):

04:04:49 Reina kernel: clocksource: timekeeping watchdog on CPU3: Marking clocksource 'tsc' as unstable because the skew is too large:
04:04:49 Reina kernel: clocksource:                       'hpet' wd_now: 5b4d62ff wd_last: 5acc8cb9 mask: ffffffff
04:04:49 Reina kernel: clocksource:                       'tsc' cs_now: 269ec9a60f4 cs_last: 26985db7b00 mask: ffffffffffffffff
04:04:49 Reina kernel: tsc: Marking TSC unstable due to clocksource watchdog
04:04:49 Reina kernel: TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
04:04:49 Reina kernel: sched_clock: Marking unstable (706191510027, 170240201)<-(706363924801, -2174678)
04:04:49 Reina kernel: clocksource: Switched to clocksource hpet
04:05:49 Reina kernel: ------------[ cut here ]------------
04:05:49 Reina kernel: NETDEV WATCHDOG: enp39s0 (r8169): transmit queue 0 timed out
04:05:49 Reina kernel: WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x26d/0x280
04:05:49 Reina kernel: Modules linked in: amdgpu rtl8821ae edac_mce_amd btcoexist rtl_pci joydev hid_generic rtlwifi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio kvm usbhid snd_hda_codec_hdmi snd_hda_intel wmi_bmof hid mac80211 irqbypass snd_intel_dspcfg snd_hda_codec nls_iso8859_1 gpu_sched snd_hda_core i2c_algo_bit snd_hwdep raid0 nls_cp437 crct10dif_pclmul crc32_pclmul vfat ttm md_mod ghash_clmulni_intel snd_pcm drm_kms_helper fat ccp snd_timer snd cec aesni_intel rc_core crypto_simd sp5100_tco syscopyarea sysfillrect cryptd sysimgblt glue_helper cfg80211 k10temp i2c_piix4 pcspkr fb_sys_fops soundcore r8169 rng_core realtek rfkill libphy libarc4 wmi pinctrl_amd evdev mac_hid drm agpgart ip_tables x_tables xfs libcrc32c crc32c_generic crc32c_intel xhci_pci xhci_hcd
04:05:49 Reina kernel: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.7.2-arch1-1 #1
04:05:49 Reina kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C37/MPG X570 GAMING PLUS (MS-7C37), BIOS A.70 01/09/2020
04:05:49 Reina kernel: RIP: 0010:dev_watchdog+0x26d/0x280
04:05:49 Reina kernel: Code: a7 e9 76 ff eb 85 4c 89 f7 c6 05 1c f2 e9 00 01 e8 68 cc fa ff 44 89 e9 4c 89 f6 48 c7 c7 18 4f e0 8d 48 89 c2 e8 9a cb 7f ff <0f> 0b e9 63 ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44
04:05:49 Reina kernel: RSP: 0018:ffffbfde4007ce50 EFLAGS: 00010282
04:05:49 Reina kernel: RAX: 0000000000000000 RBX: ffffa28cd2a33a00 RCX: 0000000000000000
04:05:49 Reina kernel: RDX: 0000000000000103 RSI: ffffa28cd8659ac8 RDI: 00000000ffffffff
04:05:49 Reina kernel: RBP: ffffa28cd552c3dc R08: 00000000000004a0 R09: 0000000000000001
04:05:49 Reina kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffffa28cd552c480
04:05:49 Reina kernel: R13: 0000000000000000 R14: ffffa28cd552c000 R15: ffffa28cd2a33a80
04:05:49 Reina kernel: FS:  0000000000000000(0000) GS:ffffa28cd8640000(0000) knlGS:0000000000000000
04:05:49 Reina kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
04:05:49 Reina kernel: CR2: 00007f61c0865060 CR3: 000000018183a000 CR4: 00000000003406e0
04:05:49 Reina kernel: Call Trace:
04:05:49 Reina kernel:  <IRQ>
04:05:49 Reina kernel:  ? pfifo_fast_dequeue+0x1d0/0x1d0
04:05:49 Reina kernel:  ? pfifo_fast_dequeue+0x1d0/0x1d0
04:05:49 Reina kernel:  call_timer_fn+0x2d/0x160
04:05:49 Reina kernel:  ? pfifo_fast_dequeue+0x1d0/0x1d0
04:05:49 Reina kernel:  __run_timers+0x193/0x2a0
04:05:49 Reina kernel:  run_timer_softirq+0x2b/0x50
04:05:49 Reina kernel:  __do_softirq+0x10f/0x358
04:05:49 Reina kernel:  irq_exit+0xab/0x120
04:05:49 Reina kernel:  smp_apic_timer_interrupt+0xa6/0x1b0
04:05:49 Reina kernel:  apic_timer_interrupt+0xf/0x20
04:05:49 Reina kernel:  </IRQ>
04:05:49 Reina kernel: RIP: 0010:native_safe_halt+0xe/0x10
04:05:49 Reina kernel: Code: f0 80 48 02 20 48 8b 00 a8 08 75 c3 e9 7a ff ff ff cc cc cc cc cc cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d d6 b0 41 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d c6 b0 41 00 f4 c3 cc cc 0f 1f 44 00
04:05:49 Reina kernel: RSP: 0018:ffffbfde40147ec0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
04:05:49 Reina kernel: RAX: 0000000000000001 RBX: ffffa28cd77a3c80 RCX: 0000000000000001
04:05:49 Reina kernel: RDX: 0000000000000001 RSI: 0000000000000087 RDI: ffffffff8ddf4a0d
04:05:49 Reina kernel: RBP: 0000000000000001 R08: 0000000000000046 R09: 0000000000000020
04:05:49 Reina kernel: R10: 0000000000000040 R11: 0000000000000000 R12: 0000000000000000
04:05:49 Reina kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
04:05:49 Reina kernel:  default_idle+0x18/0x170
04:05:49 Reina kernel:  do_idle+0x1f1/0x260
04:05:49 Reina kernel:  cpu_startup_entry+0x19/0x20
04:05:49 Reina kernel:  start_secondary+0x190/0x1e0
04:05:49 Reina kernel:  secondary_startup_64+0xb6/0xc0
04:05:49 Reina kernel: ---[ end trace 99e196efed36f7cb ]---
04:06:47 Reina kernel: hrtimer: interrupt took 26197047 ns

Any ideas?
Thanks

_________

SOLVED:
Replace your AMD component with an Intel one (if CPU/APU) or an NVIDIA one (if GPU).
Ryzen 3 3200G → Core i7-7700K, and no more freezes or kernel panics o//

Last edited by automne (2020-07-01 21:12:34)

Offline

#2 2020-06-15 07:51:50

archixxx
Member
Registered: 2012-10-17
Posts: 40

Re: System randomly freeze

I have a similar problem. My PC was perfectly stable for about 9 months, but for about 2 weeks now it has been freezing randomly about once a day. No SSH access, no direct access, blank screen. I can only power cycle or press the reset button. Sadly I don't get any (error) messages that would point me even roughly to where the problem is. Nothing unusual in "dmesg", the systemd journal or other logs.
Yesterday I was logged in via SSH when such a freeze happened. I had "top", "dmesg -w" and "journalctl -f" running at the time, but I still didn't get any errors or useful messages before the freeze. So I suspect that something changed at a "lower level": kernel, driver, CPU microcode, power states or something like that.
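
Since the box has to be power cycled after a freeze, the lines worth looking at are in the journal of the previous boot, not the current one. Below is a minimal sketch for picking out the kind of kernel messages posted in this thread. The function name `freeze_hints` is made up, and the pattern list is only an assumption based on the logs shown here, not a complete list. It also assumes the journal is stored persistently.

```shell
# Hypothetical helper: reads kernel log text on stdin and keeps only the lines
# that look like the timer, NIC or GPU trouble seen in this thread.
freeze_hints() {
    grep -E 'clocksource|TSC|NETDEV WATCHDOG|hrtimer: interrupt took|general protection fault'
}
```

Typical use would be `journalctl -k -b -1 | freeze_hints`, where `-k` limits the output to kernel messages and `-b -1` selects the previous boot. If nothing comes back, that matches what is described above: the freeze left no trace in the log.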

My PC:
* AMD Ryzen 9 3900X
* Mainboard: ASUS ROG STRIX X570-E GAMING (BIOS 1409 05/12/2020)
* RAM: Corsair Vengeance RGB Pro 64GB DDR4 Kit 3200 (4x16GB)
* GFX: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]

Since the system was stable for months, I don't think it's a hardware issue. I also didn't change anything in that regard, but I guess one of the next things I'll do is a RAM test.

So the first thing I did was update the BIOS to the latest release, but that didn't change anything. Freezes still happen. Meanwhile I'm not so sure that was a good idea at all, because if the problem is microcode related then the new BIOS would contain that broken code too, I guess, and downgrading the BIOS isn't something I really want to do... In general I haven't changed much in the BIOS settings. No overclocking and stuff like that. I only activated SVM and SMT and changed a few fan settings.

So today I downgraded kernel/kernel-headers to "linux-5.6.4.arch1-1" (from "linux-5.7.2.arch1-1"), amd-ucode to "amd-ucode-20200316.8eb0b28-1-any" and linux-firmware to "linux-firmware-20200316.8eb0b28-1". I picked "linux-5.6.4" because I'm pretty sure that this kernel worked perfectly fine. I'm not so sure about the later ones. All of these packages were released in the first half of April. So let's see what happens.

If that doesn't help, the next thing I'll try is the RAM test already mentioned above.
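
For reference, a downgrade like the one above is usually done from pacman's package cache. The sketch below only looks up the cached files. `cached_pkg` is a made-up helper name, and it assumes the old versions are still in the default cache directory /var/cache/pacman/pkg. If the cache has been cleaned, the Arch Linux Archive is the usual fallback.

```shell
# Hypothetical helper: print the cached package file(s) for NAME at VERSION.
# An optional third argument overrides the cache directory
# (default: pacman's standard cache location).
cached_pkg() {
    name="$1"
    version="$2"
    cache_dir="${3:-/var/cache/pacman/pkg}"
    # Signature files (*.sig) sit next to the packages, so filter them out.
    ls "$cache_dir"/"$name"-"$version"-*.pkg.tar.* 2>/dev/null | grep -v '\.sig$'
}
```

The files found this way can then be handed to `pacman -U`, for example `sudo pacman -U $(cached_pkg linux 5.6.4.arch1-1) $(cached_pkg linux-headers 5.6.4.arch1-1)`, and likewise for amd-ucode and linux-firmware at 20200316.8eb0b28-1.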

Offline

#3 2020-06-15 10:55:32

automne
Member
From: /dev/md/kumiko
Registered: 2020-06-14
Posts: 19
Website

Re: System randomly freeze

archixxx wrote:

I have a similar problem. My PC was perfectly stable for about 9 months, but for about 2 weeks now it has been freezing randomly about once a day. No SSH access, no direct access, blank screen. I can only power cycle or press the reset button. Sadly I don't get any (error) messages that would point me even roughly to where the problem is. Nothing unusual in "dmesg", the systemd journal or other logs.
Yesterday I was logged in via SSH when such a freeze happened. I had "top", "dmesg -w" and "journalctl -f" running at the time, but I still didn't get any errors or useful messages before the freeze. So I suspect that something changed at a "lower level": kernel, driver, CPU microcode, power states or something like that.

My PC:
* AMD Ryzen 9 3900X
* Mainboard: ASUS ROG STRIX X570-E GAMING (BIOS 1409 05/12/2020)
* RAM: Corsair Vengeance RGB Pro 64GB DDR4 Kit 3200 (4x16GB)
* GFX: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]

Since the system was stable for months, I don't think it's a hardware issue. I also didn't change anything in that regard, but I guess one of the next things I'll do is a RAM test.

So the first thing I did was update the BIOS to the latest release, but that didn't change anything. Freezes still happen. Meanwhile I'm not so sure that was a good idea at all, because if the problem is microcode related then the new BIOS would contain that broken code too, I guess, and downgrading the BIOS isn't something I really want to do... In general I haven't changed much in the BIOS settings. No overclocking and stuff like that. I only activated SVM and SMT and changed a few fan settings.

So today I downgraded kernel/kernel-headers to "linux-5.6.4.arch1-1" (from "linux-5.7.2.arch1-1"), amd-ucode to "amd-ucode-20200316.8eb0b28-1-any" and linux-firmware to "linux-firmware-20200316.8eb0b28-1". I picked "linux-5.6.4" because I'm pretty sure that this kernel worked perfectly fine. I'm not so sure about the later ones. All of these packages were released in the first half of April. So let's see what happens.

If that doesn't help, the next thing I'll try is the RAM test already mentioned above.

After posting this topic I tried tweaking the C-state and P-state settings, and the system crashes more often than before, so I guess it's related.

If the LTS kernel solves my issue, I will open a bug report on the kernel bug tracker.

Offline

#4 2020-06-15 11:07:35

turmoni
Member
Registered: 2020-06-15
Posts: 3

Re: System randomly freeze

I've also had significant stability issues with 5.7 - I'm wondering what we have in common (or if they're separate issues that have come up).

Do either of you use ZFS? That was my first instinct, although my root isn't on ZFS so I'd expect log files to continue to be writeable.
My next one, based purely on having had many issues up until 5.6, was GPU - we do all seem to have AMD in common, but different architectures (I have a 5700 XT, and in its previous freezing behaviour I could still ssh into the box to look at dmesg).

My CPU is an i7 4770, so apart from basic architecture, nothing really in common there.

Offline

#5 2020-06-15 11:13:49

automne
Member
From: /dev/md/kumiko
Registered: 2020-06-14
Posts: 19
Website

Re: System randomly freeze

turmoni wrote:

I've also had significant stability issues with 5.7 - I'm wondering what we have in common (or if they're separate issues that have come up).

Do either of you use ZFS? That was my first instinct, although my root isn't on ZFS so I'd expect log files to continue to be writeable.
My next one, based purely on having had many issues up until 5.6, was GPU - we do all seem to have AMD in common, but different architectures (I have a 5700 XT, and in its previous freezing behaviour I could still ssh into the box to look at dmesg).

My CPU is an i7 4770, so apart from basic architecture, nothing really in common there.

Nope, I use XFS as my filesystem.

Offline

#6 2020-06-15 11:30:26

archixxx
Member
Registered: 2012-10-17
Posts: 40

Re: System randomly freeze

turmoni wrote:

I've also had significant stability issues with 5.7 - I'm wondering what we have in common (or if they're separate issues that have come up).

Do either of you use ZFS? That was my first instinct, although my root isn't on ZFS so I'd expect log files to continue to be writeable.
My next one, based purely on having had many issues up until 5.6, was GPU - we do all seem to have AMD in common, but different architectures (I have a 5700 XT, and in its previous freezing behaviour I could still ssh into the box to look at dmesg).

My CPU is an i7 4770, so apart from basic architecture, nothing really in common there.

I use ext4 everywhere. Yesterday I did a large rsync copying thousands of files from a remote host, and it seems that this triggered the freeze. The freeze also seems to happen more often when my backups are running at night or when "updatedb" updates its database, which also happens at night. So one of my thoughts was a corrupt filesystem or something like that. But I've never seen such behavior with ext4 in the past, in contrast to XFS. And even in those cases I at least got an error message that told me something useful, whereas in my current case I don't get anything useful. So personally I'm not investigating in that direction. I also never had a crash during the last few months, before the freezes began, that might have left my filesystems in a corrupted state.

Offline

#7 2020-06-15 21:28:01

automne
Member
From: /dev/md/kumiko
Registered: 2020-06-14
Posts: 19
Website

Re: System randomly freeze

Kernel 5.7 seems to be the culprit. No crash for 3h. I'll wait 10 more hours and then create the issue.

Offline

#8 2020-06-15 21:38:38

archixxx
Member
Registered: 2012-10-17
Posts: 40

Re: System randomly freeze

I already had the freezes with later 5.6 kernels (maybe around 5.6.10, can't remember). That's why I upgraded to 5.7.2, but I had the impression that this made it even worse. Kernel 5.6.4 with the downgraded firmware, ucode and so on, as mentioned above, has been running for around 15h now, and I did quite a lot of things. That gives me at least some hope that this combination is more stable. But to be sure, the system needs to run stable for at least another 48hrs.

Offline

#9 2020-06-15 21:46:51

archixxx
Member
Registered: 2012-10-17
Posts: 40

Re: System randomly freeze

Maybe one additional note: I normally tune power consumption by tweaking some "Tunables" with "powertop" and change the CPU scheduler with "corectrl". This worked perfectly well for the last few months without any issues and saved a little bit of energy. But so as not to mess around with power savings, I've currently left everything untouched. So all these power-saving settings are at their default values ATM.

Offline

#10 2020-06-15 22:44:26

automne
Member
From: /dev/md/kumiko
Registered: 2020-06-14
Posts: 19
Website

Re: System randomly freeze

This may be linked to our issue: https://bugs.archlinux.org/task/66991

Offline

#11 2020-06-16 01:04:08

Zorbik
Member
Registered: 2016-08-09
Posts: 42

Re: System randomly freeze

Same issue here. Ryzen CPU with a Vega 64, but it's definitely the GPU that's causing the crash.

Kernel: 5.7.2
AMDGPU: 19.1.0

Kernel panic from journalctl:

Jun 15 20:51:56 olympus kernel: general protection fault, probably for non-canonical address 0xb72b1951bbaf746f: 0000 [#1] PREEMPT SMP NOPTI
Jun 15 20:51:56 olympus kernel: CPU: 2 PID: 278 Comm: kworker/u32:11 Not tainted 5.7.2-arch1-1 #1
Jun 15 20:51:56 olympus kernel: Hardware name: Gigabyte Technology Co., Ltd. X470 AORUS GAMING 5 WIFI/X470 AORUS GAMING 5 WIFI-CF, BIOS F2 03/14/2018
Jun 15 20:51:56 olympus kernel: Workqueue: events_unbound commit_work [drm_kms_helper]
Jun 15 20:51:56 olympus kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2aa/0x2310 [amdgpu]
Jun 15 20:51:56 olympus kernel: Code: 4f 08 8b 81 e0 02 00 00 41 83 c5 01 44 39 e8 0f 87 46 ff ff ff 48 83 bd f0 fc ff ff 00 0f 84 03 01 00 00 48 8b bd f0 fc ff ff <80> bf b0 01 00 00 01 0f 86 ac 00 00 00 48 b9 00 00 00 00 01 00 00
Jun 15 20:51:56 olympus kernel: RSP: 0018:ffff9870c0827af8 EFLAGS: 00010286
Jun 15 20:51:56 olympus kernel: RAX: 0000000000000006 RBX: ffff8e3540608800 RCX: ffff8e3581e48000
Jun 15 20:51:56 olympus kernel: RDX: ffff8e357f6cde00 RSI: ffffffffc068b1a0 RDI: b72b1951bbaf746f
Jun 15 20:51:56 olympus kernel: RBP: ffff9870c0827e60 R08: 0000000000000001 R09: 0000000000000001
Jun 15 20:51:56 olympus kernel: R10: 0000000000000178 R11: 0000000000038fc5 R12: 0000000000000000
Jun 15 20:51:56 olympus kernel: R13: 0000000000000006 R14: ffff8e3540608c00 R15: ffff8e354bb4b000
Jun 15 20:51:56 olympus kernel: FS:  0000000000000000(0000) GS:ffff8e358ec80000(0000) knlGS:0000000000000000
Jun 15 20:51:56 olympus kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 15 20:51:56 olympus kernel: CR2: 00007ff8cd42b000 CR3: 00000003c051c000 CR4: 00000000003406e0
Jun 15 20:51:56 olympus kernel: Call Trace:
Jun 15 20:51:56 olympus kernel:  commit_tail+0x94/0x130 [drm_kms_helper]
Jun 15 20:51:56 olympus kernel:  process_one_work+0x1da/0x3d0
Jun 15 20:51:56 olympus kernel:  worker_thread+0x4d/0x3e0
Jun 15 20:51:56 olympus kernel:  ? rescuer_thread+0x3f0/0x3f0
Jun 15 20:51:56 olympus kernel:  kthread+0x13e/0x160
Jun 15 20:51:56 olympus kernel:  ? __kthread_bind_mask+0x60/0x60
Jun 15 20:51:56 olympus kernel:  ret_from_fork+0x22/0x40
Jun 15 20:51:56 olympus kernel: Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc ccm iwlmvm mac80211 libarc4 nls_i>
Jun 15 20:51:56 olympus kernel:  blake2b_generic libcrc32c crc32c_generic xor raid6_pq crc32c_intel xhci_pci xhci_hcd amdgpu gpu_sched i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm agpgart
Jun 15 20:51:56 olympus kernel: ---[ end trace a6b05dfbb5cd9dfc ]---
Jun 15 20:51:56 olympus kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2aa/0x2310 [amdgpu]
Jun 15 20:51:56 olympus kernel: Code: 4f 08 8b 81 e0 02 00 00 41 83 c5 01 44 39 e8 0f 87 46 ff ff ff 48 83 bd f0 fc ff ff 00 0f 84 03 01 00 00 48 8b bd f0 fc ff ff <80> bf b0 01 00 00 01 0f 86 ac 00 00 00 48 b9 00 00 00 00 01 00 00
Jun 15 20:51:56 olympus kernel: RSP: 0018:ffff9870c0827af8 EFLAGS: 00010286
Jun 15 20:51:56 olympus kernel: RAX: 0000000000000006 RBX: ffff8e3540608800 RCX: ffff8e3581e48000
Jun 15 20:51:56 olympus kernel: RDX: ffff8e357f6cde00 RSI: ffffffffc068b1a0 RDI: b72b1951bbaf746f
Jun 15 20:51:56 olympus kernel: RBP: ffff9870c0827e60 R08: 0000000000000001 R09: 0000000000000001
Jun 15 20:51:56 olympus kernel: R10: 0000000000000178 R11: 0000000000038fc5 R12: 0000000000000000
Jun 15 20:51:56 olympus kernel: R13: 0000000000000006 R14: ffff8e3540608c00 R15: ffff8e354bb4b000
Jun 15 20:51:56 olympus kernel: FS:  0000000000000000(0000) GS:ffff8e358ec80000(0000) knlGS:0000000000000000
Jun 15 20:51:56 olympus kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 15 20:51:56 olympus kernel: CR2: 00007ff8cd42b000 CR3: 00000003c051c000 CR4: 00000000003406e0

No logs for an entire minute before the crash. I'd be happy to provide more info if it'd be useful. Anyone have luck with downgrading their kernel?

Last edited by Zorbik (2020-06-16 01:08:34)

Offline

#12 2020-06-16 07:35:42

archixxx
Member
Registered: 2012-10-17
Posts: 40

Re: System randomly freeze

Zorbik wrote:

Anyone have luck with downgrading their kernel?

My PC has now been running for more than 24hrs without issues with kernel 5.6.4 and the other downgraded packages mentioned above. That looks promising at least. If it runs another 24hrs then I would say it's likely that there is a problem/bug in one of the newer versions of those packages. But as I don't get any useful (error) messages it could be anything...

Offline

#13 2020-06-16 07:51:40

automne
Member
From: /dev/md/kumiko
Registered: 2020-06-14
Posts: 19
Website

Re: System randomly freeze

No issue on 5.6.4, I opened a ticket on kernel's bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208205

Offline

#14 2020-06-16 12:21:45

turmoni
Member
Registered: 2020-06-15
Posts: 3

Re: System randomly freeze

automne wrote:

No issue on 5.6.4, I opened a ticket on kernel's bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208205

I think if it's definitely an amdgpu issue (I don't think I have a way to tell for myself, but it seems plausible), then https://gitlab.freedesktop.org/drm/amd/-/issues might be the best place to put a bug.

Offline

#15 2020-06-16 12:50:13

automne
Member
From: /dev/md/kumiko
Registered: 2020-06-14
Posts: 19
Website

Re: System randomly freeze

turmoni wrote:
automne wrote:

No issue on 5.6.4, I opened a ticket on kernel's bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=208205

I think if it's definitely an amdgpu issue (I don't think I have a way to tell for myself, but it seems plausible), then https://gitlab.freedesktop.org/drm/amd/-/issues might be the best place to put a bug.

Thanks, I also opened a ticket on this platform: https://gitlab.freedesktop.org/drm/amd/-/issues/1172

Offline

#16 2020-06-16 12:54:22

loqs
Member
Registered: 2014-03-06
Posts: 17,192

Offline

#17 2020-06-16 18:29:03

archixxx
Member
Registered: 2012-10-17
Posts: 40

Re: System randomly freeze

If it's an amdgpu thingy then good luck! The chance that someone from AMD will care about it at https://gitlab.freedesktop.org is basically zero. It's only users helping users there, and that's it (which sometimes really helps, of course). As kernel 5.6.4 runs without issues for me, I've added

IgnorePkg   = amd-ucode
IgnorePkg   = linux
IgnorePkg   = linux-firmware
IgnorePkg   = linux-headers

to /etc/pacman.conf for now. As I don't see any error messages, I can't contribute anything anyway. I'll stay with these settings until kernel 5.8 is stable, then try that one, and do the usual "sit and wait" thingy when it comes to amdgpu bugs...
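
A quick way to double-check which packages are being held back is to read the IgnorePkg lines back out of the file. This is a small sketch: `ignored_pkgs` is a made-up name, and it assumes the entries are uncommented lines in the [options] section (the stock pacman.conf ships the IgnorePkg line commented out).

```shell
# Hypothetical helper: print every package named on an active IgnorePkg line,
# one per line. An optional argument overrides the config file path.
ignored_pkgs() {
    conf="${1:-/etc/pacman.conf}"
    # Strip the "IgnorePkg =" prefix from matching lines (commented-out lines
    # start with '#' and therefore do not match), then split on spaces.
    sed -n 's/^[[:space:]]*IgnorePkg[[:space:]]*=[[:space:]]*//p' "$conf" | tr -s ' ' '\n'
}
```

With the four lines above in place, `ignored_pkgs` should list amd-ucode, linux, linux-firmware and linux-headers. pacman also prints an "ignoring package upgrade" warning for each of them during a system upgrade, so the hold is easy to spot later.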

Offline

#18 2020-06-16 20:55:55

Ropid
Member
Registered: 2015-03-09
Posts: 1,069

Re: System randomly freeze

If you can find the commit that started the issue, then AMD on gitlab.freedesktop.org will help you immediately.

For example, when I tried kernel 5.7-rc1, something broke for me because of the amdgpu kernel module. I then decided to find out how bisecting the kernel works and found the commit between 5.6 and 5.7 that introduced the problem. I then made a bug report that mentioned that commit. I got an answer from someone from AMD within two hours. Later on the same day he already shared a patch that fixed the problem for me.

That said, your problem with a random freeze sounds really annoying to bisect. Is there a way to force the crash fast, without having to wait days for it to happen randomly? For the problem I had with amdgpu, I could test it immediately after boot, I didn't have to wait for my bug to happen.
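
For anyone who wants to try it, the bisect workflow described above looks roughly like this. It's only a sketch: `start_bisect` is a made-up wrapper, it assumes a clone of the mainline kernel tree in which the tags v5.6 and v5.7 exist, and building and booting each candidate kernel is left out entirely.

```shell
# Hypothetical wrapper: begin a bisect in the git tree at TREE between a
# known-good and a known-bad revision. Git then checks out a commit roughly
# halfway between the two for testing.
start_bisect() {
    tree="$1"
    good="$2"
    bad="$3"
    git -C "$tree" bisect start "$bad" "$good"
}

# After building and booting the checked-out kernel, report the result with
#   git -C "$tree" bisect good    (no freeze)
#   git -C "$tree" bisect bad     (freeze reproduced)
# and repeat until git names the first bad commit.
# "git bisect reset" ends the session and restores the original checkout.
```

In this thread's case the call would be something like `start_bisect ~/linux v5.6 v5.7`. The hard part stays the same as noted above: with a freeze that only shows up once or twice a day, every "good" verdict needs a long soak before it can be trusted.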

Offline

#19 2020-06-16 22:08:44

archixxx
Member
Registered: 2012-10-17
Posts: 40

Re: System randomly freeze

TBH I don't want to waste my time debugging amdgpu things anymore. In my experience AMD isn't really interested in bug reports. The list of open issues at gitlab.freedesktop.org is basically endless. Not all users are able to find the exact commit that causes the problem. And if you can't provide that commit, you're on your own.

If this problem here is a general issue, it may get fixed sooner or later (by accident maybe). Who knows ;-) The freeze "only" happens once or twice a day. It looks like it happens more often when lots of I/O (HDD and/or network) is going on. So the first question would be how this relates to amdgpu. Maybe it's something completely different. Without an error message it's hard to tell. I'll stay with 5.6.4 for now. It works perfectly. If the problem still exists in 5.10 I may invest the time and try this bisect thingy ;-)

Offline

#20 2020-06-17 00:39:39

automne
Member
From: /dev/md/kumiko
Registered: 2020-06-14
Posts: 19
Website

Re: System randomly freeze

I just compiled the latest rc of the kernel to see if the bug was solved by AMD™
SGMl8eC.png

Offline

#21 2020-06-17 07:30:09

automne
Member
From: /dev/md/kumiko
Registered: 2020-06-14
Posts: 19
Website

Re: System randomly freeze

5.8-rc1 seems to solve the issue
xhB0uq1.png

EDIT: Never mind, the server hung. fml

Last edited by automne (2020-06-17 09:16:39)

Offline

#22 2020-06-17 20:41:55

loqs
Member
Registered: 2014-03-06
Posts: 17,192

Re: System randomly freeze

@automne, can you update https://gitlab.freedesktop.org/drm/amd/-/issues/1172 with the requested information?
You could also add that it is the same report as https://bugzilla.kernel.org/show_bug.cgi?id=208205 and that 5.8-rc1 also has the issue.

Please do not post images of text.

Offline

#23 2020-06-18 17:39:16

automne
Member
From: /dev/md/kumiko
Registered: 2020-06-14
Posts: 19
Website

Re: System randomly freeze

No reboot since I enabled HPET on 5.8: https://bbs.archlinux.org/viewtopic.php … 2#p1600722

Offline

#24 2020-06-18 18:14:35

loqs
Member
Registered: 2014-03-06
Posts: 17,192

Re: System randomly freeze

What about 5.7.3 / 5.7.4 with clocksource=hpet or tsc=unstable?
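
To see what the kernel is doing right now before changing anything, the active clocksource and the boot parameters can be read back from sysfs and /proc. This is a minimal sketch: the two function names are made up, while the default paths are the standard ones.

```shell
# Hypothetical helper: print the clocksource in use and the ones the kernel
# could switch to. An optional argument overrides the sysfs directory.
show_clocksource() {
    dir="${1:-/sys/devices/system/clocksource/clocksource0}"
    printf 'current: %s\n' "$(cat "$dir/current_clocksource")"
    printf 'available: %s\n' "$(cat "$dir/available_clocksource")"
}

# Hypothetical helper: succeed if PARAM (e.g. tsc=unstable) is present on the
# running kernel's command line. An optional second argument overrides the file.
cmdline_has() {
    param="$1"
    file="${2:-/proc/cmdline}"
    # One parameter per line, then look for an exact, literal match.
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}
```

After adding `clocksource=hpet` or `tsc=unstable` to the kernel command line in the boot loader configuration and rebooting, `show_clocksource` and `cmdline_has tsc=unstable && echo set` would confirm that the setting took effect. The clocksource can also be switched at runtime by writing a name from the available list to current_clocksource as root, which avoids a reboot for a first test.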

Offline

#25 2020-06-18 20:05:15

automne
Member
From: /dev/md/kumiko
Registered: 2020-06-14
Posts: 19
Website

Re: System randomly freeze

loqs wrote:

What about 5.7.3 / 5.7.4 with clocksource=hpet or tsc=unstable?

I'll give it a try and tell you whether it crashes or not.

Offline
