#1 2021-02-22 11:13:40

Peterix
Member
Registered: 2008-05-05
Posts: 30

[SOLVED] Powerdevil sends junk to AMDGPU i2c devices and causes issues

Hello,

I'm getting really strange unexplained hexdumps in `dmesg` output:

[ 6757.565842] data: 89 54 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 6757.565843] data: 00 00 00 00 00 00 00 00
[ 6757.586841] data: 89 6a 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 6757.586842] data: 00 00 00 00 00 00 00 00
[ 6757.604071] data: 89 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 6757.604072] data: 00 00 00 00 00 00 00 00
[ 6757.625087] data: 89 96 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 6757.625087] data: 00 00 00 00 00 00 00 00
[ 6757.646092] data: 89 ac 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 6757.646092] data: 00 00 00 00 00 00 00 00
[ 6757.667074] data: 89 c2 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 6757.667074] data: 00 00 00 00 00 00 00 00
[ 6757.683070] data: 89 d8 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 6757.683070] data: 00 00 00 00 00 00 00 00
[ 6757.699087] data: 89 ee 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 6757.699088] data: 00 00 00 00 00 00 00 00

When decoded, this sometimes contains other random-looking data: what appear to be corrupted URLs and strings from Slack.

https://°6slack.com/beacon/timin°Lg?user_id=UA1KD0XC3&te°bam_id=T04NRHYNL&ver=16
°x13783992&session_age=3°Ž4&session_id=78cb099c-°¤d9da-4fd7-bc0d-0f82725°ºb45c5&
sub_app_name=cli°Ðent&data=initializing_°æregular_socket_v1%7Cco°üunt%3A1%3Bsm_f
low_conn±ect_prov_10%7Ccount%3A±(1%3Bsonic_boot%7Ccount±>%3A1%3Bsonic_boot_desk
±Ttop_preload%7Ctiming%3±jA180%3Bsonic_boot_wind±€ow__responseStart_warm±–%7Ctim
ing%3A5%3Bsonic_±¬boot_window__responseE±Ând_warm%7Ctiming%3A6%3±ØBsonic_boot_wi
ndow__do±îmLoading_warm%7Ctiming²%3A7%3Bsonic_boot_phas²e_1_warm%7Ctiming%3A10
²08%3Bsonic_boot_phase_1²F_5_warm%7Ctiming%3A127²\%3Bsonic_boot_phase_2_²r2_warm
%7Ctiming%3A679%²ˆ3Bsonic_boot_phase_2_2²ž_1_warm%7Ctiming%3A599²´%3Bsonic_boot_
phase_2_²Ê2_1_1_warm%7Ctiming%3A²à599%3Bsonic_boot_phase²ö_2_2_1_2_warm%7Ctiming
³%3A37%3Bsonic_boot_pha³"se_2_2_1_3_warm%7Ctimi³8ng%3A13%3Bsonic_boot_p³Nhase_2_
2_2_warm%7Ctimi³dng%3A63%3Bsonic_boot_p³zhase_2_2_3_warm%7Ctimi³ng%3A17%3Bsonic
_boot_p³¦hase_2_warm%7Ctiming%3³¼A680%3Bchannel_sidebar³Ò_force_collapsed%7Ccou³
ènt%3A1%3Bttfmp_sonic_w³þarm%7Ctiming%3A980%3Bc´lient_boot_mount_succe´*ss%7Cco
unt%3A1%3Bprq_t´@ime_to_visible%7Ctimin´Vg%3A1241.945%3Bprq_tim´le_to_visible_wa
rm%7Cti´‚ming%3A1241.945%3Bclie´˜nt_theme_light%7Ccount´®%3A1%3Bclient_system_t´
Äheme_sync_off%7Ccount%´Ú3A1%3Bsonic_boot_phase´ð_3_warm%7Ctiming%3A315µ%3Bsm_f
low_connected_pµrov_connected%7Ccount%µ23A1%3Bsm_flow_finalizeµH_prov_10%7Ccoun
t%3A1%3µ^Bsm_flow_primary_conneµtcted_10%7Ccount%3A1%3BµŠsm_flow_fina•l[ˆ5vclv_µ
ok_10%7Ccount%3A1%3Ba1µ¶1y_client_stats_v1_a11µÌy_animation_os_settingµâ_vs_sla
ck_pref_track_vµøalue%7Ctiming%3A2%3Bme¶mbership-update%7Ctimi
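For anyone who wants to repeat the decoding step, here is a minimal Python sketch. It joins the 16-byte and 8-byte rows from `dmesg` back into one 24-byte buffer per write. In the dumps above, the first two bytes go up by 0x16 (22) from one write to the next, which is exactly the number of bytes that follow them. Reading those two bytes as a big-endian offset is only an assumption based on that pattern; nothing in the thread confirms it. The helper names are made up for illustration.

```python
import re

# One dmesg hexdump row, e.g. "[ 6757.565842] data: 89 54 00 00 ...".
ROW_RE = re.compile(r"^\[\s*\d+\.\d+\]\s+data:((?:\s[0-9a-f]{2})+)\s*$")
ROW_SIZE = 16  # bytes per full row in the dumps quoted in this thread


def parse_rows(log_text):
    """Return one bytes object per 'data:' row found in log_text."""
    rows = []
    for line in log_text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            rows.append(bytes.fromhex("".join(match.group(1).split())))
    return rows


def group_writes(rows):
    """Join rows into buffers; a row shorter than ROW_SIZE ends a buffer."""
    writes, current = [], b""
    for row in rows:
        current += row
        if len(row) < ROW_SIZE:
            writes.append(current)
            current = b""
    if current:  # trailing full rows with no short row after them
        writes.append(current)
    return writes


def split_write(buf):
    """ASSUMPTION: the first two bytes are a big-endian offset, the rest is payload."""
    return int.from_bytes(buf[:2], "big"), buf[2:]


if __name__ == "__main__":
    sample = (
        "[ 6757.565842] data: 89 54 00 00 00 00 00 00 00 00 00 00 00 00 00 00\n"
        "[ 6757.565843] data: 00 00 00 00 00 00 00 00\n"
    )
    for buf in group_writes(parse_rows(sample)):
        offset, payload = split_write(buf)
        print(hex(offset), len(payload))  # prints "0x8954 22"
```

Grouping on "short row ends the buffer" would misfire on buffers whose length is an exact multiple of 16, so treat it as a heuristic that happens to fit these 24-byte dumps.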

This is something that has started happening recently.

The notable changes in my system:

  • Switched to an AMD Zen 3 processor

  • Switched from NVIDIA to AMD graphics (AMDGPU)

  • Changed from using legacy boot + MBR to GPT and UEFI

Because of the large number of changes and the unclear nature of those kernel logs, I'm not sure where this is coming from.

Any ideas?

Last edited by Peterix (2021-02-23 14:21:05)

#2 2021-02-22 11:28:03

frostschutz
Member
Registered: 2013-11-15
Posts: 1,409

Re: [SOLVED] Powerdevil sends junk to AMDGPU i2c devices and causes issues

Any drm.debug stuff in your kernel parameters?

There are various "data:" printers in the kernel, but most of them are only active with debug flags:

linux-5.11 $ grep -r '"data:' .
./tools/testing/selftests/rtc/rtctest.c:156:	TH_LOG("data: %lx", data);
./tools/testing/selftests/rtc/rtctest.c:258:	TH_LOG("data: %lx", data);
./drivers/usb/core/devio.c:506:		print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_NONE, 32, 1,
./drivers/usb/core/devio.c:520:		print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_NONE, 32, 1,
./drivers/usb/core/devio.c:527:		print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_NONE, 32, 1,
./drivers/usb/c67x00/c67x00-sched.c:153:	dev_dbg(dev, "data: %*ph\n", td_length(td), td->data);
./drivers/scsi/lpfc/lpfc_els.c:2254:				 "data: x%x\n",
./drivers/s390/net/ctcm_mpc.c:281:	ctcm_pr_debug("data: ");
./drivers/platform/x86/samsung-laptop.c:408:				"data:{0x%08x, 0x%08x, 0x%04x, 0x%02x}",
./drivers/net/wireless/marvell/mwifiex/wmm.c:191:		mwifiex_dbg(adapter, DATA, "data: ralist %p: is_11n_enabled=%d\n",
./drivers/net/wireless/marvell/mwifiex/wmm.c:849:		mwifiex_dbg(adapter, DATA, "data: drop packet in disconnect\n");
./drivers/net/wireless/marvell/mwifiex/wmm.c:1090:	mwifiex_dbg(priv->adapter, DATA, "data: WMM: Pkt Delay: %d ms,\t"
./drivers/net/wireless/marvell/mwifiex/wmm.c:1256:		mwifiex_dbg(adapter, DATA, "data: nothing to send\n");
./drivers/net/wireless/marvell/mwifiex/wmm.c:1264:		    "data: dequeuing the packet %p %p\n", ptr, skb);
./drivers/net/wireless/marvell/mwifiex/wmm.c:1376:		mwifiex_dbg(adapter, ERROR, "data: -EBUSY is returned\n");
./drivers/net/wireless/marvell/mwifiex/wmm.c:1431:	mwifiex_dbg(adapter, DATA, "data: tid=%d\n", tid);
./drivers/net/wireless/marvell/mwifiex/uap_txrx.c:162:			    "data: Tx: insufficient skb headroom %d\n",
./drivers/net/wireless/marvell/mwifiex/txrx.c:54:			    "data: priv not found. Drop RX packet\n");
./drivers/net/wireless/marvell/mwifiex/txrx.c:132:		mwifiex_dbg(adapter, DATA, "data: -ENOSR is returned\n");
./drivers/net/wireless/marvell/mwifiex/txrx.c:141:		mwifiex_dbg(adapter, ERROR, "data: -EBUSY is returned\n");
./drivers/net/wireless/marvell/mwifiex/txrx.c:177:			    "data: priv not found. Drop TX packet\n");
./drivers/net/wireless/marvell/mwifiex/txrx.c:196:		mwifiex_dbg(adapter, ERROR, "data: -ENOSR is returned\n");
./drivers/net/wireless/marvell/mwifiex/txrx.c:211:		mwifiex_dbg(adapter, ERROR, "data: -EBUSY is returned\n");
./drivers/net/wireless/marvell/mwifiex/sta_tx.c:207:			    "data: %s: host_to_card succeeded\n",
./drivers/net/wireless/marvell/mwifiex/sdio.c:1129:		    "data: mp_rd_bitmap=0x%08x\n", rd_bitmap);
./drivers/net/wireless/marvell/mwifiex/sdio.c:1144:			    "data: port=%d mp_rd_bitmap=0x%08x\n",
./drivers/net/wireless/marvell/mwifiex/sdio.c:1160:		    "data: port=%d mp_rd_bitmap=0x%08x -> 0x%08x\n",
./drivers/net/wireless/marvell/mwifiex/sdio.c:1180:		    "data: mp_wr_bitmap=0x%08x\n", wr_bitmap);
./drivers/net/wireless/marvell/mwifiex/sdio.c:1206:		    "data: port=%d mp_wr_bitmap=0x%08x -> 0x%08x\n",
./drivers/net/wireless/marvell/mwifiex/sdio.c:2211:			    "data: %s: precopy current buffer\n",
./drivers/net/wireless/marvell/mwifiex/sdio.c:2223:			    "data: %s: send aggr buffer: %d %d\n",
./drivers/net/wireless/marvell/mwifiex/sdio.c:2267:			    "data: %s: send current buffer %d\n",
./drivers/net/wireless/marvell/mwifiex/sdio.c:2275:			    "data: %s: postcopy current buffer\n",
./drivers/net/wireless/marvell/mwifiex/main.c:872:		    "data: %lu BSS(%d-%d): Data <= kernel\n",
./drivers/net/wireless/marvell/mwifiex/main.c:889:			    "data: Tx: insufficient skb headroom %d\n",
./drivers/net/wireless/marvell/mwifiex/11n_aggr.c:285:		mwifiex_dbg(adapter, ERROR, "data: -EBUSY is returned\n");
./drivers/net/wireless/marvell/mwifiex/11n.c:762:		mwifiex_dbg(priv->adapter, DATA, "data: %s tid=%d\n",
./drivers/net/wireless/intersil/orinoco/scan.c:206:				       "data: %zu\n", priv->ndev->name,
./drivers/net/wireless/ath/carl9170/rx.c:985:		"data:%d, rx:%d, pending:%d ]\n", clen, wlen, tlen,
./drivers/net/ethernet/dnet.c:497:	printk(KERN_DEBUG PFX "data:");
./drivers/net/ethernet/cadence/macb_main.c:1367:		print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_ADDRESS, 16, 1,
./drivers/net/ethernet/cadence/macb_main.c:2155:	print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_OFFSET, 16, 1,
./drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:1093:			BNX2X_ERR("data: %x %x %x\n",
./drivers/net/ethernet/altera/altera_tse_main.c:420:			print_hex_dump(KERN_ERR, "data: ", DUMP_PREFIX_OFFSET,
./drivers/net/can/peak_canfd/peak_canfd.c:152:		   "data: brp=%u tseg1=%u tseg2=%u sjw=%u\n",
./drivers/media/rc/imon_raw.c:40:	dev_dbg(imon->dev, "data: %*ph", 8, &imon->ir_buf);
./drivers/iommu/io-pgtable-arm.c:1107:	pr_err("data: %d levels, 0x%zx pgd_size, %u pg_shift, %u bits_per_level, pgd @ %p\n",
./drivers/input/touchscreen/tsc2007_core.c:47:	dev_dbg(&tsc->client->dev, "data: 0x%x, val: 0x%x\n", data, val);
./drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c:2437:		print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_NONE,
./drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c:2471:		print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_NONE,
./drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c:1903:		print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_NONE,
./drivers/gpu/drm/amd/pm/swsmu/smu11/arcturus_ppt.c:1937:		print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_NONE,
./drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c:237:		print_hex_dump(KERN_INFO, "data: ", DUMP_PREFIX_NONE,
./drivers/gpu/drm/amd/amdgpu/smu_v11_0_i2c.c:393:		print_hex_dump(KERN_INFO, "data: ", DUMP_PREFIX_NONE,
./arch/um/os-Linux/skas/mem.c:95:				       "data:");
./arch/s390/kvm/trace-s390.h:196:		      "data:%08llx %016llx",
./arch/arm/boot/dts/at91-tse850-3.dts:126:			label = "data:red";
./arch/arm/boot/dts/at91-tse850-3.dts:130:			label = "data:green";
./Documentation/s390/s390dbf.rst:416:  #define UNKNOWNSTR "data: %08x"
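The list above can be narrowed down mechanically. The dumps in the first post have no offset or address column and show 16 bytes per full row, so a matching call site would be a `print_hex_dump` with `DUMP_PREFIX_NONE` and a row size of 16. The Python sketch below filters `grep` output on that basis. It assumes the `path:line:code` format shown above, and it keeps call sites whose row size is cut off at the end of the grep line, since those cannot be ruled out. The function name is hypothetical.

```python
import re

# Matches grep lines of the form:
#   path:line:<tabs>print_hex_dump(KERN_x, "data: ", DUMP_PREFIX_y, [rowsize,]
CALL_RE = re.compile(
    r'^(?P<path>[^:]+):(?P<line>\d+):\s*print_hex_dump\('
    r'(?P<level>KERN_\w+),\s*"data: ",\s*(?P<prefix>DUMP_PREFIX_\w+),'
    r'(?:\s*(?P<rowsize>\d+),)?'
)


def candidate_call_sites(grep_output, rowsize=16):
    """Return (path, line, level) for call sites that could produce the dumps."""
    hits = []
    for line in grep_output.splitlines():
        match = CALL_RE.match(line)
        if not match or match.group("prefix") != "DUMP_PREFIX_NONE":
            continue
        seen = match.group("rowsize")
        # Skip only when the row size is visible and differs; if the grep
        # line is cut off before the row size, keep it as a candidate.
        if seen is not None and int(seen) != rowsize:
            continue
        hits.append((match.group("path"), int(match.group("line")), match.group("level")))
    return hits
```

Run against the output above, this should leave only the call sites under `drivers/gpu/drm/amd/`. The two in `smu_v11_0_i2c.c` log at `KERN_INFO` rather than `KERN_DEBUG`.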

Last edited by frostschutz (2021-02-22 11:31:20)

#3 2021-02-22 16:00:36

Peterix
Member
Registered: 2008-05-05
Posts: 30

Re: [SOLVED] Powerdevil sends junk to AMDGPU i2c devices and causes issues

Well, I think it's AMDGPU, because whatever was logging this broke rather spectacularly:

[ 1528.182530] data: df 64 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1528.182531] data: 00 00 00 00 00 00 00 00
[ 1528.198312] data: df 7a 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1528.198313] data: 00 00 00 00 00 00 00 00
[ 1528.228473] amdgpu 0000:0b:00.0: amdgpu: failed send message: TransferTableDram2Smu (19)     param: 0x00000009 response 0xfffffffb
[ 1528.228474] amdgpu 0000:0b:00.0: amdgpu: sienna_cichlid_i2c_write- error occurred :fffffffb
[ 2046.267244] audit: type=1100 audit(1614007873.474:98): pid=5451 uid=1000 auid=1000 ses=2 msg='op=PAM:authentication grantors=pam_faillock,pam_permit,pam_faillock acct="peterix" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/1 res=success'
[ 2046.268186] audit: type=1101 audit(1614007873.474:99): pid=5451 uid=1000 auid=1000 ses=2 msg='op=PAM:accounting grantors=pam_unix,pam_permit,pam_time acct="peterix" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/1 res=success'
[ 2046.268361] audit: type=1110 audit(1614007873.474:100): pid=5451 uid=1000 auid=1000 ses=2 msg='op=PAM:setcred grantors=pam_faillock,pam_permit,pam_faillock acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/1 res=success'
[ 2046.270267] audit: type=1105 audit(1614007873.478:101): pid=5451 uid=1000 auid=1000 ses=2 msg='op=PAM:session_open grantors=pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/1 res=success'

It makes the system lag a lot too when it happens.
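A side note on the `response 0xfffffffb` value in that log: read as a signed 32-bit integer it is -5, and 5 is the errno value `EIO`. Whether the driver actually means it as a negative errno is an assumption; the thread does not say. The arithmetic itself can be checked with a few lines of Python (helper names are illustrative):

```python
import errno


def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as two's-complement signed."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def errno_name(code):
    """Map a negative return code to its errno name, or None if there is none."""
    return errno.errorcode.get(-code) if code < 0 else None


if __name__ == "__main__":
    code = to_signed32(0xFFFFFFFB)
    print(code, errno_name(code))  # prints "-5 EIO"
```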

#4 2021-02-23 13:37:53

Peterix
Member
Registered: 2008-05-05
Posts: 30

Re: [SOLVED] Powerdevil sends junk to AMDGPU i2c devices and causes issues

OK. The plot thickens.

I've built a custom kernel where I changed a bunch of these "data:" prefixes so that each call site is identifiable.

[   22.215325] sienna_cichlid_i2c_write_data data: 00 00 71 33 79 23 dc e0
[   22.231329] sienna_cichlid_i2c_write_data data: 1f 74 f6 51 00 00 00 00 00 00 00 00 f1 37 79 23
[   22.231331] sienna_cichlid_i2c_write_data data: dc e0 f6 21 5f 66 74 72
[   22.248315] sienna_cichlid_i2c_write_data data: 1f 8a 61 63 65 5f 41 31 79 23 dc e0 f6 31 0a 00
[   22.248316] sienna_cichlid_i2c_write_data data: 00 00 00 00 00 00 61 38

So, SOMETHING is writing weird garbage to the GPU over i2c?

The kernel driver logs junk while this is happening, and eventually, whatever is reading this junk on the GPU side can't take it anymore and stops responding.

The system becomes unstable afterwards, because the legitimate use of this i2c interface isn't working anymore either.

#5 2021-02-23 13:41:50

Peterix
Member
Registered: 2008-05-05
Posts: 30

Re: [SOLVED] Powerdevil sends junk to AMDGPU i2c devices and causes issues

Next up is figuring out where all these writes come from, and adding logging for the ones that actually cause it to fail... because the code doesn't seem to log those for some reason...

#6 2021-02-23 14:02:04

Peterix
Member
Registered: 2008-05-05
Posts: 30

Re: [SOLVED] Powerdevil sends junk to AMDGPU i2c devices and causes issues

It is the KDE powerdevil service...

 ~  $  lsof /dev/i2c-*
COMMAND    PID    USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
org_kde_p 1223 peterix   14u   CHR   89,4      0t0  561 /dev/i2c-4
org_kde_p 1223 peterix   15u   CHR   89,5      0t0  562 /dev/i2c-5
org_kde_p 1223 peterix   16u   CHR   89,6      0t0  563 /dev/i2c-6
 ~  $  less /sys/dev/char/89\:4/name 
 ~  $  cat /sys/dev/char/89\:4/name 
AMDGPU DM i2c hw bus 3
 ~  $  cat /sys/dev/char/89\:5/name 
AMDGPU DM aux hw bus 0
 ~  $  cat /sys/dev/char/89\:6/name 
AMDGPU DM aux hw bus 1

Why is powerdevil sending random junk from Slack to an RX 6900XT?

This is some prime software gore.
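The lookup above (`lsof` on the i2c device nodes, then the major:minor pair resolved through `/sys/dev/char`) can be scripted. This is a rough Python sketch that only parses `lsof` text in the column layout shown above and builds the matching sysfs path. It does not run `lsof` itself, and the function names are hypothetical.

```python
def parse_lsof(output):
    """Return (command, pid, major, minor, path) for each character-device row."""
    holders = []
    for line in output.splitlines():
        fields = line.split()
        # Expected columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME.
        # The header row is skipped because its TYPE column is not "CHR".
        if len(fields) < 9 or fields[4] != "CHR":
            continue
        major, minor = (int(part) for part in fields[5].split(","))
        holders.append((fields[0], int(fields[1]), major, minor, fields[8]))
    return holders


def sysfs_name_path(major, minor):
    """Path of the sysfs file that holds the device's name."""
    return f"/sys/dev/char/{major}:{minor}/name"


if __name__ == "__main__":
    sample = (
        "COMMAND    PID    USER   FD   TYPE DEVICE SIZE/OFF NODE NAME\n"
        "org_kde_p 1223 peterix   14u   CHR   89,4      0t0  561 /dev/i2c-4\n"
    )
    for command, pid, major, minor, path in parse_lsof(sample):
        print(command, pid, path, sysfs_name_path(major, minor))
```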

#7 2021-02-23 14:19:44

Peterix
Member
Registered: 2008-05-05
Posts: 30

Re: [SOLVED] Powerdevil sends junk to AMDGPU i2c devices and causes issues

I removed powerdevil, and the kernel log is now nice and quiet.

Further investigation will be needed on both sides... AMDGPU because it can't recover from this error, and powerdevil, because it sends garbage to random i2c devices.

#8 2021-02-23 14:28:26

Peterix
Member
Registered: 2008-05-05
Posts: 30

Re: [SOLVED] Powerdevil sends junk to AMDGPU i2c devices and causes issues

As a bonus, the system no longer hangs on shutdown / reboot, and I can use sensors to probe the GPU temperatures and fan speeds again.

Yay.

#9 2021-02-23 14:40:54

frostschutz
Member
Registered: 2013-11-15
Posts: 1,409

Re: [SOLVED] Powerdevil sends junk to AMDGPU i2c devices and causes issues

Fascinating. I have amdgpu but no powerdevil... glad you could track it down anyway.
