I know, yes, ok and figured as much
Offline
So I'm going to test the kernel parameter as described here:
cat /etc/cmdline.d/amdgpu.conf
#reset_method:GPU reset method (-1 = auto (default), 0 = legacy, 1 = mode0, 2 = mode1, 3 = mode2, 4 = baco/bamaco) (int)
amdgpu.reset_method=1
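To confirm after a reboot that the parameter actually made it into the UKI's command line, one can look for it in /proc/cmdline. Below is a minimal sketch of that check; the function names are made up for illustration, and it splits on whitespace only, so it assumes no quoted values containing spaces.

```python
def kernel_param(cmdline, name):
    """Return the last value given for `name` on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val  # a later occurrence replaces an earlier one
    return value


def running_kernel_param(name):
    """Look up `name` on the command line of the running kernel (Linux only)."""
    with open("/proc/cmdline") as f:
        return kernel_param(f.read(), name)
```

Calling running_kernel_param("amdgpu.reset_method") should return "1" with the config above. As far as I know the value the driver is using can also be read from /sys/module/amdgpu/parameters/reset_method, but treat that path as an assumption and check it on your system.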
Offline
This is most likely not going to work - what's your bootloader?
Offline
I use systemd-boot and Unified kernel image
System:
Firmware: UEFI 2.90 (INSYDE Corp. 0.771)
Firmware Arch: x64
Secure Boot: enabled (user)
TPM2 Support: yes
Measured UKI: yes
Boot into FW: supported
Current Boot Loader:
Product: systemd-boot 256.6-1-arch
Features: ✓ Boot counting
✓ Menu timeout control
✓ One-shot menu timeout control
✓ Default entry control
✓ One-shot entry control
✓ Support for XBOOTLDR partition
✓ Support for passing random seed to OS
✓ Load drop-in drivers
✓ Support Type #1 sort-key field
✓ Support @saved pseudo-entry
✓ Support Type #1 devicetree field
✓ Enroll SecureBoot keys
✓ Retain SHIM protocols
✓ Menu can be disabled
✓ Boot loader sets ESP information
Stub: systemd-stub 256.7-1-arch
Features: ✓ Stub sets ESP information
✓ Picks up credentials from boot partition
✓ Picks up system extension images from boot partition
✓ Picks up configuration extension images from boot partition
✓ Measures kernel+command line+sysexts
✓ Support for passing random seed to OS
✓ Pick up .cmdline from addons
✓ Pick up .cmdline from SMBIOS Type 11
✓ Pick up .dtb from addons
ESP: /dev/disk/by-partuuid/605c70c9-5343-42ad-b907-ea8413db40c5
File: └─/EFI/systemd/systemd-bootx64.efi
Offline
Ok, then it's indeed going to work
(UKIs just aren't all that common and I worried you had googled that up somewhere)
Offline
The wiki is extremely well organized, I don't even need Google
I wanted to test UKI, because the concept sounded very interesting.
I wish I knew how to reproduce the problem so that I could test it better.
Offline
So far it hasn't happened again, so option 1 seems to help. Or I've just been lucky/unlucky so far.
Offline
Are there already any (now successful) resets logged in the journal?
Offline
So the last reset was on Nov 8.
sudo journalctl | grep "amdgpu: GPU"
Sep 12 19:54:08 FrameWork kernel: amdgpu 0000:c5:00.0: amdgpu: GPU reset begin!
Sep 12 19:54:11 FrameWork kernel: amdgpu 0000:c5:00.0: amdgpu: GPU reset succeeded, trying to resume
Sep 12 19:54:11 FrameWork kernel: amdgpu 0000:c5:00.0: amdgpu: GPU reset(2) succeeded!
Sep 15 17:14:56 FrameWork kernel: amdgpu 0000:c5:00.0: amdgpu: GPU reset begin!
Sep 15 17:14:59 FrameWork kernel: amdgpu 0000:c5:00.0: amdgpu: GPU reset succeeded, trying to resume
Sep 15 17:14:59 FrameWork kernel: amdgpu 0000:c5:00.0: amdgpu: GPU reset(2) succeeded!
Sep 20 14:27:39 FrameWork kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Sep 20 14:27:41 FrameWork kernel: amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
Sep 20 14:27:41 FrameWork kernel: amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
Sep 20 14:27:42 FrameWork kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
Sep 20 14:27:42 FrameWork kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset(2) succeeded!
Sep 22 22:15:20 FrameWork kernel: amdgpu 0000:c5:00.0: amdgpu: GPU reset begin!
Sep 22 22:15:22 FrameWork kernel: amdgpu 0000:c5:00.0: amdgpu: GPU reset succeeded, trying to resume
Sep 22 22:15:23 FrameWork kernel: amdgpu 0000:c5:00.0: amdgpu: GPU reset(2) succeeded!
Oct 11 14:03:54 FrameWork kernel: amdgpu 0000:c5:00.0: amdgpu: GPU reset begin!
Oct 11 14:03:54 FrameWork kernel: amdgpu 0000:c5:00.0: amdgpu: GPU reset succeeded, trying to resume
Oct 11 14:03:55 FrameWork kernel: amdgpu 0000:c5:00.0: amdgpu: GPU reset(1) succeeded!
Nov 08 14:43:05 FrameWork kernel: amdgpu 0000:c5:00.0: amdgpu: GPU reset begin!
Nov 08 14:43:08 FrameWork kernel: amdgpu 0000:c5:00.0: amdgpu: GPU reset succeeded, trying to resume
Nov 08 14:43:08 FrameWork kernel: amdgpu 0000:c5:00.0: amdgpu: GPU reset(2) succeeded!
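To judge whether the frequency really changed, the "GPU reset begin!" lines can be counted per month instead of eyeballing the grep output. This is only a rough sketch: it assumes the default journalctl short format shown above, and the function names are invented for illustration.

```python
import re
from collections import Counter

# Matches e.g.
# "Sep 12 19:54:08 FrameWork kernel: amdgpu 0000:c5:00.0: amdgpu: GPU reset begin!"
RESET_BEGIN = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"amdgpu (?P<device>\S+): amdgpu: GPU reset begin!$"
)


def reset_events(lines):
    """Return (month, day, time, pci_device) for every 'GPU reset begin!' line."""
    events = []
    for line in lines:
        m = RESET_BEGIN.match(line.strip())
        if m:
            events.append(
                (m.group("month"), int(m.group("day")), m.group("time"), m.group("device"))
            )
    return events


def resets_per_month(lines):
    """Count reset events per month abbreviation."""
    return Counter(month for month, _day, _time, _device in reset_events(lines))
```

Feeding it the output of the journalctl | grep command above (for example via sys.stdin) gives one count per month. The short format carries no year, so a journal spanning more than twelve months would need a different output format.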
Offline
The frequency in general seems to have gone down drastically?
Temperature? Humidity (when did you turn on the radiators)?
Offline
Actually, nothing has changed except the parameter; neither the temperature nor the humidity is different, at least according to my thermometer.
Offline
Today it happened again, shortly after I unplugged the monitor (USB-C). The computer had been in standby and showed me the login screen. The clock showed 8:36, but in the journal I only see entries from 8:38.
Offline
Nov 18 08:38:24 FrameWork gnome-shell[3818]: Could not release device '/dev/input/event4' (13,68): GDBus.Error:org.freedesktop.login1.DeviceNotTaken: Device not taken
Nov 18 08:38:24 FrameWork gnome-shell[3818]: Could not release device '/dev/input/event25' (13,89): GDBus.Error:org.freedesktop.login1.DeviceNotTaken: Device not taken
Nov 18 08:38:24 FrameWork gnome-shell[3818]: Could not release device '/dev/input/event26' (13,90): GDBus.Error:org.freedesktop.login1.DeviceNotTaken: Device not taken
Nov 18 08:38:24 FrameWork gnome-shell[3818]: Could not release device '/dev/input/event3' (13,67): GDBus.Error:org.freedesktop.login1.DeviceNotTaken: Device not taken
…
Nov 18 08:38:26 FrameWork systemd-coredump[201929]: Process 3818 (gnome-shell) of user 1000 dumped core.
Stack trace of thread 3818:
#0 0x000078d48891e847 g_hash_table_iter_next (libglib-2.0.so.0 + 0x3d847)
#1 0x000078d4883ca0ff n/a (libmutter-15.so.0 + 0x1ca0ff)
#2 0x000078d4883b7183 n/a (libmutter-15.so.0 + 0x1b7183)
#3 0x000078d4883eeb6a n/a (libmutter-15.so.0 + 0x1eeb6a)
#4 0x000078d4883ef2d9 meta_thread_run_impl_task_sync (libmutter-15.so.0 + 0x1ef2d9)
#5 0x000078d4883bcb72 n/a (libmutter-15.so.0 + 0x1bcb72)
#6 0x000078d488286317 n/a (libmutter-15.so.0 + 0x86317)
#7 0x000078d488ee382a g_closure_invoke (libgobject-2.0.so.0 + 0x1182a)
#8 0x000078d488f14565 n/a (libgobject-2.0.so.0 + 0x42565)
#9 0x000078d488f04ca9 n/a (libgobject-2.0.so.0 + 0x32ca9)
#10 0x000078d488f04f32 g_signal_emit_valist (libgobject-2.0.so.0 + 0x32f32)
#11 0x000078d488f04ff4 g_signal_emit (libgobject-2.0.so.0 + 0x32ff4)
#12 0x000078d488eefd16 n/a (libgobject-2.0.so.0 + 0x1dd16)
#13 0x000078d488ee4010 n/a (libgobject-2.0.so.0 + 0x12010)
#14 0x000078d488ef8016 g_object_setv (libgobject-2.0.so.0 + 0x26016)
#15 0x000078d488ef8242 g_object_set_property (libgobject-2.0.so.0 + 0x26242)
#16 0x000078d48824dec4 n/a (libmutter-15.so.0 + 0x4dec4)
#17 0x000078d488b3889f n/a (libgio-2.0.so.0 + 0x10789f)
#18 0x000078d48893e559 n/a (libglib-2.0.so.0 + 0x5d559)
#19 0x000078d4889a1157 n/a (libglib-2.0.so.0 + 0xc0157)
#20 0x000078d48893f287 g_main_loop_run (libglib-2.0.so.0 + 0x5e287)
#21 0x000078d4882d11fa meta_context_run_main_loop (libmutter-15.so.0 + 0xd11fa)
#22 0x000078d487785596 n/a (libffi.so.8 + 0x7596)
#23 0x000078d48778200e n/a (libffi.so.8 + 0x400e)
#24 0x000078d487784bd3 ffi_call (libffi.so.8 + 0x6bd3)
#25 0x000078d48878f851 n/a (libgjs.so.0 + 0x4e851)
#26 0x000078d488790c3f n/a (libgjs.so.0 + 0x4fc3f)
#27 0x000078d486960725 n/a (libmozjs-128.so + 0x1560725)
#28 0x000078d4869f6157 n/a (libmozjs-128.so + 0x15f6157)
#29 0x000078d486a662b2 _ZN2JS4CallEP9JSContextNS_6HandleINS_5ValueEEES4_RKNS_16HandleValueArrayENS_13MutableHandleIS3_EE (libmozjs-128.so + 0x16662b2)
#30 0x000078d4887c43a4 n/a (libgjs.so.0 + 0x833a4)
#31 0x000078d4887cc6cf gjs_context_eval_module (libgjs.so.0 + 0x8b6cf)
#32 0x000078d4887cc911 gjs_context_eval_module_file (libgjs.so.0 + 0x8b911)
#33 0x000061e76f627575 n/a (gnome-shell + 0x2575)
#34 0x000078d488034e08 n/a (libc.so.6 + 0x25e08)
#35 0x000078d488034ecc __libc_start_main (libc.so.6 + 0x25ecc)
#36 0x000061e76f6279e5 n/a (gnome-shell + 0x29e5)
GNOME segfaults.
That also leads to a stall in logind
Nov 18 08:38:30 FrameWork kernel: amdgpu 0000:c5:00.0: [drm] REG_WAIT timeout 1us * 100 tries - dcn31_program_compbuf_size line:141
Nov 18 08:38:30 FrameWork kernel: ------------[ cut here ]------------
Nov 18 08:38:30 FrameWork kernel: WARNING: CPU: 13 PID: 2952 at drivers/gpu/drm/amd/amdgpu/../display/dc/hubbub/dcn31/dcn31_hubbub.c:151 dcn31_program_compbuf_size+0xd1/0x230 [amdgpu]
Nov 18 08:38:30 FrameWork kernel: Modules linked in: hid_logitech_hidpp tun hid_magicmouse ccm rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi uhid cmac algif_hash algif_skcipher af_alg bnep typec_displayport ext4 mbcache vfat jbd2 fat snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci hid_sensor_als snd_sof_xtensa_dsp hid_sensor_trigger industrialio_triggered_buffer kfifo_buf snd_sof mt7921e hid_sensor_iio_common leds_cros_ec snd_sof_utils cros_usbpd_charger industrialio gpio_cros_ec led_class_multicolor cros_ec_chardev cros_ec_sysfs cros_charge_control cros_ec_debugfs mt7921_common cros_ec_hwmon cros_usbpd_logger cros_usbpd_notify snd_pci_ps mt792x_lib snd_amd_sdw_acpi mousedev soundwire_amd mt76_connac_lib soundwire_generic_allocation amd_atl intel_rapl_msr mt76 soundwire_bus intel_rapl_common cros_ec_dev snd_soc_core snd_hda_intel snd_usb_audio snd_compress snd_intel_dspcfg mac80211 ac97_bus
Nov 18 08:38:30 FrameWork kernel: snd_intel_sdw_acpi snd_pcm_dmaengine snd_usbmidi_lib spd5118 snd_ump snd_rpl_pci_acp6x snd_hda_codec snd_acp_pci snd_rawmidi btusb libarc4 snd_acp_legacy_common snd_hda_core snd_seq_device cros_ec_lpcs hid_sensor_hub hid_multitouch btrtl cros_ec snd_pci_acp6x mc snd_hwdep btintel sp5100_tco snd_pci_acp5x snd_rn_pci_acp3x snd_pcm btbcm ucsi_acpi cfg80211 snd_acp_config typec_ucsi i2c_piix4 snd_timer btmtk amd_pmf kvm_amd snd_soc_acpi kvm bluetooth rapl wmi_bmof pcspkr thunderbolt typec amdtee k10temp snd i2c_smbus snd_pci_acp3x soundcore rfkill joydev roles i2c_hid_acpi amd_sfh amd_pmc platform_profile i2c_hid serio mac_hid pkcs8_key_parser sg crypto_user loop nfnetlink zram ip_tables x_tables dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod uas usb_storage hid_generic usbhid amdgpu crc16 crct10dif_pclmul amdxcp crc32_pclmul i2c_algo_bit polyval_clmulni drm_ttm_helper polyval_generic ttm ghash_clmulni_intel drm_exec sha512_ssse3 gpu_sched sha256_ssse3 sha1_ssse3 drm_suballoc_helper aesni_intel
Nov 18 08:38:30 FrameWork kernel: drm_buddy nvme gf128mul drm_display_helper crypto_simd nvme_core cryptd xhci_pci ccp video cec nvme_auth xhci_pci_renesas wmi btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq
Nov 18 08:38:30 FrameWork kernel: CPU: 13 UID: 0 PID: 2952 Comm: systemd-logind Tainted: G W 6.11.8-arch1-2 #1 1400000003000000474e550014adde1903f711f0
Nov 18 08:38:30 FrameWork kernel: Tainted: [W]=WARN
but the main problem seems to be GNOME…
https://wiki.archlinux.org/title/Debuginfod and https://wiki.archlinux.org/title/Core_d … _core_dump - see whether you can get a better backtrace of that crash (w/ symbols)
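To see which libraries the missing symbols belong to, the "n/a" frames in the stack trace above can be tallied per library. A small sketch, assuming the systemd-coredump frame layout as pasted here; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "#1 0x000078d4883ca0ff n/a (libmutter-15.so.0 + 0x1ca0ff)"
FRAME = re.compile(
    r"^#(?P<index>\d+)\s+0x(?P<addr>[0-9a-f]+)\s+(?P<symbol>\S+)\s+"
    r"\((?P<module>\S+) \+ 0x(?P<offset>[0-9a-f]+)\)$"
)


def unresolved_frames(lines):
    """Count stack frames without a symbol name ('n/a') per module."""
    counts = Counter()
    for line in lines:
        m = FRAME.match(line.strip())
        if m and m.group("symbol") == "n/a":
            counts[m.group("module")] += 1
    return counts
```

The modules with the highest counts are the ones where debug symbols would add the most to a new backtrace.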
Offline