
#1 2024-11-02 03:25:17

safe049
Member
From: Shanxi,China
Registered: 2024-05-02
Posts: 77

ROCm Stable Diffusion causes screen glitches and crashes amdgpu

I recently started running Stable Diffusion on ROCm.
A few seconds after the SD web UI starts, the screen freezes and then glitches, and the keyboard no longer responds. The GPU status looks normal (checked over SSH).
The only thing I can do is connect over SSH and reboot with sudo reboot -f.
fastfetch:

                  -`                      DISTRO  Arch Linux x86_64
                 .o+`                    │ ├ Linux 6.11.3-273-tkg-bore
                `ooo/                    │ ├ 2153 (pacman), 8 (flatpak)
               `+oooo:                   │ └ zsh 5.9
              `+oooooo:                   DE/WM  bspwm (X11)
              -+oooooo+:                 │ ├ Colloid-dark [GTK3/4]
            `/:-:++oooo+:                │ ├ WhiteSur (24px)
           `/++++/+++++++:               │ ├ alacritty (11pt)
          `/++++++++++++++:              │ └ alacritty 0.14.0
         `/+++ooooooooooooo/`            │ ├ Intel(R) Core(TM) i5-6600K (4) @ 3.90 GHz
        ./ooosssso++osssssso+`           │ ├ AMD Radeon RX Vega
       .oossssso-````/ossssss+`          │ ├ HD Graphics 530
      -osssssso.      :ssssssso.         │ ├ 1920x1080 @ 60Hz
     :osssssss/        osssso+++.        │ ├ 2.38 GiB / 15.49 GiB (15%)
    /ossssssss/        +ssssooo/-        │ ├ 0 B / 16.00 GiB (0%)
  `/ossssso+/:-        -:/+osssso+-      │ ├ 5 mins
 `+sso+:-`                 `.-/+oso:     │ └ 1920x1080 @ 60Hz
`++:.                           `-/+/     AUDIO  Built-in Audio Analog Stereo
.`                                 `/

log:
0x0.st/XGSl.txt

Relevant part of the log:

Nov 02 11:17:49 archyyds kernel: amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0, error code: 0xff
Nov 02 11:17:49 archyyds kernel: amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0, error code: 0xff
Nov 02 11:17:49 archyyds kernel: amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0, error code: 0xff
Nov 02 11:17:49 archyyds kernel: [drm] kiq ring mec 2 pipe 1 q 0
Nov 02 11:17:49 archyyds kernel: [drm] UVD and UVD ENC initialized successfully.
Nov 02 11:17:49 archyyds kernel: [drm] VCE initialized successfully.
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 8
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring page0 uses VM inv eng 1 on hub 8
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 4 on hub 8
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring page1 uses VM inv eng 5 on hub 8
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring uvd_0 uses VM inv eng 6 on hub 8
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring uvd_enc_0.0 uses VM inv eng 7 on hub 8
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring uvd_enc_0.1 uses VM inv eng 8 on hub 8
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring vce0 uses VM inv eng 9 on hub 8
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring vce1 uses VM inv eng 10 on hub 8
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: ring vce2 uses VM inv eng 11 on hub 8
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: recover vram bo from shadow start
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: recover vram bo from shadow done
Nov 02 11:17:49 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset(1) succeeded!
Nov 02 11:17:49 archyyds systemd-coredump[3174]: Process 3017 (pt_main_thread) of user 1000 terminated abnormally with signal 6/ABRT, processing...
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: amdgpu: The CS has cancelled because the context is lost. This context is innocent.
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE)
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) Backtrace:
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) unw_get_proc_name failed: no unwind info found [-10]
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) 0: /usr/lib/Xorg (?+0x0) [0x642eb850c89c]
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) unw_get_proc_name failed: no unwind info found [-10]
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) 1: /usr/lib/libc.so.6 (?+0x0) [0x7c4a256121d0]
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) unw_get_proc_name failed: no unwind info found [-10]
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) 2: /usr/lib/libc.so.6 (?+0x0) [0x7c4a2566b3f4]
Nov 02 11:17:49 archyyds systemd[1]: Created slice Slice /system/drkonqi-coredump-processor.
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) 3: /usr/lib/libc.so.6 (gsignal+0x20) [0x7c4a25612120]
Nov 02 11:17:49 archyyds systemd[1]: Created slice Slice /system/systemd-coredump.
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) 4: /usr/lib/libc.so.6 (abort+0xdf) [0x7c4a255f94c3]
Nov 02 11:17:49 archyyds systemd[1]: Started Process Core Dump (PID 3174/UID 0).
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) unw_get_proc_name failed: no unwind info found [-10]
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) 5: /usr/lib/libgallium-24.2.6-arch1.1.so (?+0x0) [0x7c4a22db0643]
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) unw_get_proc_name failed: no unwind info found [-10]
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) 6: /usr/lib/libgallium-24.2.6-arch1.1.so (?+0x0) [0x7c4a22db3b33]
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) unw_get_proc_name failed: no unwind info found [-10]
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) 7: /usr/lib/libgallium-24.2.6-arch1.1.so (?+0x0) [0x7c4a224ab8d4]
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) unw_get_proc_name failed: no unwind info found [-10]
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) 8: /usr/lib/libgallium-24.2.6-arch1.1.so (?+0x0) [0x7c4a224cf1dd]
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) unw_get_proc_name failed: no unwind info found [-10]
Nov 02 11:17:49 archyyds systemd[1]: Started Pass systemd-coredump journal entries to relevant user for potential DrKonqi handling.
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) 9: /usr/lib/libc.so.6 (?+0x0) [0x7c4a2566939d]
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) unw_get_proc_name failed: no unwind info found [-10]
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) 10: /usr/lib/libc.so.6 (?+0x0) [0x7c4a256ee49c]
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE)
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE)
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: Fatal server error:
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) Caught signal 6 (Aborted). Server aborting
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE)
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE)
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: Please consult the The X.Org Foundation support
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]:          at http://wiki.x.org
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]:  for help.
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE) Please also check the log file at "/home/dynamo/.local/share/xorg/Xorg.0.log" for additional information.
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (EE)
Nov 02 11:17:49 archyyds /usr/lib/gdm-x-session[1975]: (II) AIGLX: Suspending AIGLX clients for VT switch
Nov 02 11:17:51 archyyds dnsmasq[650]: reading /etc/resolv.conf
Nov 02 11:17:51 archyyds dnsmasq[650]: using nameserver 127.2.0.17#53
Nov 02 11:17:51 archyyds dnsmasq[650]: using nameserver 119.29.29.29#53
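
To pick the interesting events out of the full log, something along these lines might help. It is only a rough sketch: the pattern names and regular expressions are my own guesses based on the messages in the excerpt above, not an official list of amdgpu or Xorg error strings, and the sample lines are copied from the excerpt with the timestamp prefix dropped.

```python
import re
from collections import Counter

# Hypothetical patterns, chosen only to match messages seen in the excerpt.
PATTERNS = {
    "powerplay_failed": re.compile(r"amdgpu: \[powerplay\] Failed message: (0x[0-9a-f]+)"),
    "gpu_reset": re.compile(r"amdgpu: GPU reset\((\d+)\) succeeded"),
    "context_lost": re.compile(r"The CS has cancelled because the context is lost"),
    "xorg_abort": re.compile(r"Caught signal 6 \(Aborted\)"),
    "coredump": re.compile(r"Process \d+ \((\S+)\) .* terminated abnormally"),
}


def summarize(lines):
    """Count how many lines match each pattern; timestamps are ignored."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Sample lines taken from the excerpt (timestamp prefix removed).
SAMPLE = [
    "archyyds kernel: amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0, error code: 0xff",
    "archyyds kernel: amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0, error code: 0xff",
    "archyyds kernel: amdgpu: [powerplay] Failed message: 0x24, input parameter: 0x0, error code: 0xff",
    "archyyds kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset(1) succeeded!",
    "archyyds systemd-coredump[3174]: Process 3017 (pt_main_thread) of user 1000 terminated abnormally with signal 6/ABRT, processing...",
    "archyyds /usr/lib/gdm-x-session[1975]: amdgpu: The CS has cancelled because the context is lost. This context is innocent.",
    "archyyds /usr/lib/gdm-x-session[1975]: (EE) Caught signal 6 (Aborted). Server aborting",
    "archyyds dnsmasq[650]: reading /etc/resolv.conf",
]

if __name__ == "__main__":
    for name, n in sorted(summarize(SAMPLE).items()):
        print(f"{name}: {n}")
```

To run it on the real log, the lines of the saved log file can be passed to summarize() instead of SAMPLE.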

lspci -k:

$ lspci -k
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
	Subsystem: ASRock Incorporation Device 191f
	Kernel driver in use: skl_uncore
	Kernel modules: ie31200_edac
00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 07)
	Subsystem: ASRock Incorporation Device 1901
	Kernel driver in use: pcieport
00:02.0 Display controller: Intel Corporation HD Graphics 530 (rev 06)
	Subsystem: ASRock Incorporation Device 1912
	Kernel driver in use: i915
	Kernel modules: i915
00:14.0 USB controller: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller (rev 31)
	Subsystem: ASRock Incorporation Device a12f
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
00:14.2 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem (rev 31)
	Subsystem: ASRock Incorporation Device a131
	Kernel driver in use: intel_pch_thermal
	Kernel modules: intel_pch_thermal
00:16.0 Communication controller: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 (rev 31)
	Subsystem: ASRock Incorporation Device a13a
	Kernel driver in use: mei_me
	Kernel modules: mei_me
00:17.0 SATA controller: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] (rev 31)
	Subsystem: ASRock Incorporation Device a102
	Kernel driver in use: ahci
00:1c.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 (rev f1)
	Subsystem: ASRock Incorporation Device a114
	Kernel driver in use: pcieport
00:1d.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 (rev f1)
	Subsystem: ASRock Incorporation Device a118
	Kernel driver in use: pcieport
00:1d.3 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #12 (rev f1)
	Subsystem: ASRock Incorporation Device a11b
	Kernel driver in use: pcieport
00:1f.0 ISA bridge: Intel Corporation B150 Chipset LPC/eSPI Controller (rev 31)
	Subsystem: ASRock Incorporation Device a148
00:1f.2 Memory controller: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller (rev 31)
	Subsystem: ASRock Incorporation Device a121
00:1f.3 Audio device: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller (rev 31)
	Subsystem: ASRock Incorporation Device 1151
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel, snd_soc_avs
00:1f.4 SMBus: Intel Corporation 100 Series/C230 Series Chipset Family SMBus (rev 31)
	Subsystem: ASRock Incorporation Device a123
	Kernel driver in use: i801_smbus
	Kernel modules: i2c_i801
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Vega 10 PCIe Bridge (rev c3)
	Kernel driver in use: pcieport
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Vega 10 PCIe Bridge
	Subsystem: Advanced Micro Devices, Inc. [AMD] Vega 10 PCIe Bridge
	Kernel driver in use: pcieport
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] (rev c3)
	Subsystem: ASUSTeK Computer Inc. Device 0555
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64]
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
05:00.0 Ethernet controller: Qualcomm Atheros Killer E2400 Gigabit Ethernet Controller (rev 10)
	Subsystem: ASRock Incorporation Device e0a1
	Kernel driver in use: alx
	Kernel modules: alx
06:00.0 Network controller: Intel Corporation Wireless 7260 (rev bb)
	Subsystem: Intel Corporation Dual Band Wireless-AC 7260 [Wilkins Peak 2]
	Kernel driver in use: iwlwifi
	Kernel modules: iwlwifi

Last edited by safe049 (2024-11-02 03:27:07)


std::cout << "I use Arch BTW" << endl;
