#1 2024-06-01 09:57:46

ECC83
Member
Registered: 2022-12-11
Posts: 32

Occasional system freeze [solved]

My system occasionally freezes during an upgrade or at shutdown. The kernel version is 6.6.32-1-lts. Some odd messages also appear in the system journal.
By downgrading the kernel I found that the freezes do not occur with 6.1-lts.
Any help is very much appreciated.

Last edited by ECC83 (2024-06-08 10:21:07)

#2 2024-06-01 10:06:00

gromit
Administrator
From: Germany
Registered: 2024-02-10
Posts: 1,523

Re: Occasional system freeze [solved]

Ouch... This sounds like a possible kernel regression, but it's hard to narrow down when it only happens occasionally. Hmm.

Did you already check if you also get this type of error with the "linux" package?

#3 2024-06-01 11:34:04

ECC83
Member
Registered: 2022-12-11
Posts: 32

Re: Occasional system freeze [solved]

gromit wrote:

Did you already check if you also get this type of error with the "linux" package?

I'll check it out and report results.

#4 2024-06-02 09:58:44

ECC83
Member
Registered: 2022-12-11
Posts: 32

Re: Occasional system freeze [solved]

After some testing with the linux package, I found that the same trace still appears. The problem is that it does not happen every time.

#5 2024-06-02 11:27:27

gromit
Administrator
From: Germany
Registered: 2024-02-10
Posts: 1,523

Re: Occasional system freeze [solved]

Could you also test the latest mainline release?

sudo pacman -U https://pkgbuild.com/~gromit/linux-bisection-kernels/linux-mainline-6.10rc1-1-x86_64.pkg.tar.zst

If it's also a problem there, it might be a regression in the upstream Linux kernel, which would need to be bisected down to the specific commit that's causing the issue.

Are you confident doing the bisection on your own, or do you need some help?
If you want, we could also provide prebuilt kernel images for you to test (see for example this exchange: https://gitlab.archlinux.org/archlinux/ … te_187653)

Good resources to get you started:
- https://docs.kernel.org/admin-guide/rep … sions.html
- https://wiki.archlinux.org/title/Kernel … egressions
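
The guides above cover the full procedure. As a rough sketch, starting a bisection in a git clone of the mainline kernel tree could look like the bash function below. Assumptions: `start_kernel_bisect` is a hypothetical helper name, and v6.1 (good) and v6.6 (bad) are assumed endpoints chosen because 6.1-lts worked and 6.6-lts did not; the LTS point releases themselves are not commits in the mainline tree.

```shell
# Hypothetical helper: start bisecting between a known-good and a known-bad ref.
# Run it inside a git clone of the mainline kernel tree.
start_kernel_bisect() {
  if [ $# -ne 2 ]; then
    echo "usage: start_kernel_bisect <good-ref> <bad-ref>" >&2
    return 2
  fi
  local good=$1 bad=$2
  # git checks out a commit roughly halfway between the two refs.
  git bisect start "$bad" "$good" || return
  # Print the commit to build, install and boot next.
  git rev-parse --short HEAD
}

# Example with the assumed endpoints:
# start_kernel_bisect v6.1 v6.6
```

After booting each build, the result is recorded with `git bisect good` or `git bisect bad` until git names the first bad commit, and `git bisect reset` returns to the original branch. Since the problem here only shows up on some boots, a build should survive several clean boots before it is marked good.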

#6 2024-06-02 13:40:04

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,122

Re: Occasional system freeze [solved]

When downgrading the LTS kernel, you probably downgraded the nvidia driver w/ it? To a version before 550xx?
https://bbs.archlinux.org/viewtopic.php … 2#p2173652

#7 2024-06-02 14:25:03

ECC83
Member
Registered: 2022-12-11
Posts: 32

Re: Occasional system freeze [solved]

gromit wrote:

Could you also test the latest mainline release?

Ok, I'll test it.

gromit wrote:

Are you confident doing the bisection on your own, or do you need some help?

I guess I do need some help because I've never done it before.

seth wrote:

When downgrading the LTS kernel, you probably downgraded the nvidia driver w/ it? To a version before 550xx?

That's right: to 535.104.05-7, to be more specific.

I would like to clarify the situation. First, my PC is a laptop, if that matters. Second, the freezes only happen during shutdown or during an update (when the configuration gets reloaded). Also, the Caps Lock indicator blinks when the system freezes.

#8 2024-06-02 14:37:08

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,122

Re: Occasional system freeze [solved]

Try to use the 535xx-dkms package from the AUR along with a recent kernel and see whether you can still reproduce the problem.

#9 2024-06-03 18:03:07

ECC83
Member
Registered: 2022-12-11
Posts: 32

Re: Occasional system freeze [solved]

seth wrote:

Try to use the 535xx-dkms package from the AUR along with a recent kernel and see whether you can still reproduce the problem.

The result is that the trace mentioned above is still present. What should I do now? Test mainline with the latest driver?

Last edited by ECC83 (2024-06-03 18:05:31)

#10 2024-06-03 18:05:19

gromit
Administrator
From: Germany
Registered: 2024-02-10
Posts: 1,523

Re: Occasional system freeze [solved]

Could you temporarily switch to the Nouveau drivers so we can do the bisection with an untainted kernel?
https://wiki.archlinux.org/title/Nouveau
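
The "untainted" requirement refers to the kernel's taint mask: loading a proprietary, out-of-tree module such as nvidia sets taint flags, and upstream developers often decline to debug reports from such kernels. A small bash sketch that decodes a few documented bits of `/proc/sys/kernel/tainted` (bit 0 P, bit 9 W, bit 12 O, bit 13 E, per the kernel's tainted-kernels documentation); `decode_taint` is a hypothetical helper and deliberately ignores all other bits.

```shell
# Hypothetical helper: decode a few documented bits of the kernel taint mask.
# With no argument it reads the running kernel's value from /proc.
decode_taint() {
  local value=${1:-$(cat /proc/sys/kernel/tainted)}
  local -a flags=()
  (( value & (1 << 0) ))  && flags+=("P: proprietary module loaded")
  (( value & (1 << 9) ))  && flags+=("W: kernel issued a warning")
  (( value & (1 << 12) )) && flags+=("O: out-of-tree module loaded")
  (( value & (1 << 13) )) && flags+=("E: unsigned module loaded")
  if (( ${#flags[@]} == 0 )); then
    echo "none of the checked bits set ($value)"
  else
    printf '%s\n' "${flags[@]}"
  fi
}
```

For example, `decode_taint 12289` (bits 0, 12 and 13) prints the P, O and E lines. A WARNING such as the ucsi one posted later in the thread sets bit 9 by itself, so a kernel can report W even with nouveau.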

#11 2024-06-03 18:10:11

gromit
Administrator
From: Germany
Registered: 2024-02-10
Posts: 1,523
Website

Re: Occasional system freeze [solved]

Yes, and please also test with the mainline kernel and post the dmesg output from your tests there.

#12 2024-06-03 18:10:36

ECC83
Member
Registered: 2022-12-11
Posts: 32

Re: Occasional system freeze [solved]

gromit wrote:

Could you temporarily switch to the Nouveau drivers so we can do the bisection with an untainted kernel?
https://wiki.archlinux.org/title/Nouveau

Sure. Should I test the mainline or the linux package? In my previous post I meant the linux package along with 535xx-dkms.

#13 2024-06-03 18:11:54

gromit
Administrator
From: Germany
Registered: 2024-02-10
Posts: 1,523
Website

Re: Occasional system freeze [solved]

The idea of testing with the mainline kernel is to rule out that the bug has already been fixed upstream in the linux kernel.

sudo pacman -U https://pkgbuild.com/~gromit/linux-bisection-kernels/linux-mainline-6.10rc2-1-x86_64.pkg.tar.zst

#14 2024-06-03 20:09:21

ECC83
Member
Registered: 2022-12-11
Posts: 32

Re: Occasional system freeze [solved]

It looks like the trace does not appear when the mainline kernel is used with the Nouveau drivers. Here is the dmesg output for this case.

#15 2024-06-03 21:01:54

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,122

Re: Occasional system freeze [solved]

Did the system still "freeze during upgrade or shutdown"?
(The dmesg only covers 54 seconds.)

Edit: if not, does the otherwise affected kernel freeze w/ nouveau instead of nvidia?

Last edited by seth (2024-06-03 21:02:28)

#16 2024-06-04 09:21:39

ECC83
Member
Registered: 2022-12-11
Posts: 32

Re: Occasional system freeze [solved]

It looks like I jumped to conclusions in my previous post. Here is the updated dmesg output for the mainline kernel with the nouveau drivers; the trace can be found in it. I haven't managed to reproduce the freeze itself yet, though. I guess I need more time for testing.

#17 2024-06-04 10:56:31

gromit
Administrator
From: Germany
Registered: 2024-02-10
Posts: 1,523

Re: Occasional system freeze [solved]

Yeah, the freeze might be a different bug; if you want, we can have a look at the trace first. Do you know the last version in which the trace did not appear?

#18 2024-06-04 11:41:37

ECC83
Member
Registered: 2022-12-11
Posts: 32

Re: Occasional system freeze [solved]

gromit wrote:

Yeah, the freeze might be a different bug; if you want, we can have a look at the trace first. Do you know the last version in which the trace did not appear?

That's pretty hard to say, maybe two months ago. The point is that I didn't check the system journal before the freezes began. After a freeze happened I checked the journal and saw the trace; that's why I thought they were connected. Since the freezes only happen occasionally, I ignored them for a while. Then I downgraded the kernel to linux-lts-6.1.50-1 and everything was fine (no traces, no freezes). Finally, I decided to try the latest LTS kernel again and found that the problem is still present. That's when I created this topic.
I can go through the LTS versions, but it will take some time.

#19 2024-06-04 13:22:16

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,122

Re: Occasional system freeze [solved]

[    5.306160] nvidia-gpu 0000:01:00.3: i2c timeout error e0000000
[    5.306173] ucsi_ccg 7-0008: i2c_transfer failed -110
[    5.306181] ucsi_ccg 7-0008: ucsi_ccg_init failed - -110
[    5.306189] ucsi_ccg 7-0008: probe with driver ucsi_ccg failed with error -110
…
[    9.218602] ------------[ cut here ]------------
[    9.218610] WARNING: CPU: 1 PID: 116 at drivers/usb/typec/ucsi/ucsi.c:1326 ucsi_reset_ppm+0x20c/0x220 [typec_ucsi]
[    9.218637] Modules linked in: ccm uhid cmac algif_hash algif_skcipher af_alg snd_sof_pci_intel_cnl snd_sof_intel_hda_generic intel_uncore_frequency soundwire_intel intel_uncore_frequency_common bnep soundwire_cadence snd_sof_intel_hda_common snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils soundwire_generic_allocation soundwire_bus snd_soc_avs snd_soc_hda_codec snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core snd_soc_sst_ipc intel_tcc_cooling snd_soc_sst_dsp iwlmvm snd_soc_acpi_intel_match snd_hda_codec_realtek x86_pkg_temp_thermal snd_soc_acpi snd_hda_codec_generic intel_powerclamp mousedev coretemp snd_hda_scodec_component crct10dif_pclmul mac80211 crc32_pclmul snd_soc_core polyval_clmulni snd_hda_codec_hdmi polyval_generic snd_compress ac97_bus gf128mul libarc4 ledtrig_netdev snd_pcm_dmaengine ghash_clmulni_intel ptp snd_hda_intel sha512_ssse3 pps_core sha256_ssse3 snd_intel_dspcfg iTCO_wdt snd_intel_sdw_acpi sha1_ssse3 btusb intel_pmc_bxt aesni_intel snd_hda_codec r8169
[    9.218782]  btrtl processor_thermal_device_pci_legacy hid_multitouch ee1004 iTCO_vendor_support mei_pxp mei_hdcp ucsi_ccg intel_rapl_msr btintel processor_thermal_device crypto_simd iwlwifi snd_hda_core cryptd btbcm spi_nor processor_thermal_wt_hint realtek btmtk vfat processor_thermal_rfim snd_hwdep rapl fat intel_cstate snd_pcm intel_uncore i2c_i801 mdio_devres bluetooth asus_nb_wmi pcspkr ucsi_acpi processor_thermal_rapl wmi_bmof mtd cfg80211 i2c_smbus typec_ucsi mei_me snd_timer intel_rapl_common libphy intel_lpss_pci i2c_mux processor_thermal_wt_req snd mei intel_lpss processor_thermal_power_floor typec idma64 processor_thermal_mbox soundcore i2c_nvidia_gpu i2c_hid_acpi roles intel_soc_dts_iosf intel_pch_thermal i2c_hid intel_pmc_core int3403_thermal joydev int340x_thermal_zone intel_vsec pmt_telemetry int3400_thermal pinctrl_cannonlake acpi_tad pmt_class asus_wireless acpi_thermal_rel acpi_pad mac_hid crypto_user loop dm_mod nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_asus asus_wmi
[    9.218966]  sparse_keymap platform_profile rfkill hid_generic usbhid nouveau i915 drm_ttm_helper serio_raw gpu_sched atkbd drm_gpuvm libps2 vivaldi_fmap cec drm_exec nvme drm_buddy i2c_algo_bit mxm_wmi ttm nvme_core intel_gtt crc32c_intel spi_intel_pci drm_display_helper xhci_pci nvme_auth spi_intel xhci_pci_renesas i8042 serio video wmi
[    9.219043] CPU: 1 PID: 116 Comm: kworker/1:1 Not tainted 6.10.0-rc2-1-mainline #1 71def6f3bd7226827735e98156bf131376a1ffe9
[    9.219055] Hardware name: ASUSTeK COMPUTER INC. Zephyrus S GX502GW_GX502GW/GX502GW, BIOS GX502GW.310 04/24/2020
[    9.219061] Workqueue: events_long ucsi_init_work [typec_ucsi]
[    9.219079] RIP: 0010:ucsi_reset_ppm+0x20c/0x220 [typec_ucsi]
[    9.219093] Code: 54 24 0c 81 e2 00 00 00 08 0f 85 12 ff ff ff 4c 89 6c 24 20 48 8b 05 93 81 19 fd 49 39 c4 79 80 b8 92 ff ff ff e9 f7 fe ff ff <0f> 0b e9 32 ff ff ff e8 78 9a fb fb 0f 1f 84 00 00 00 00 00 90 90
[    9.219100] RSP: 0018:ffffae5d004afd98 EFLAGS: 00010206
[    9.219109] RAX: 0000000008000000 RBX: ffff956206a86600 RCX: 0000000008000000
[    9.219115] RDX: 00000000fffeab00 RSI: ffffae5d01065004 RDI: ffffae5d004afda4
[    9.219120] RBP: ffffae5d004afda4 R08: 0000000008000000 R09: 0000000000000000
[    9.219126] R10: ffffffffbd725b80 R11: 0000000000000045 R12: 00000000fffeaaf9
[    9.219131] R13: ffff956200df6c00 R14: ffffae5d004afda8 R15: ffff956206a866c0
[    9.219136] FS:  0000000000000000(0000) GS:ffff95695d880000(0000) knlGS:0000000000000000
[    9.219144] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.219149] CR2: 0000565204e052d8 CR3: 00000002d7620006 CR4: 00000000003706f0
[    9.219155] Call Trace:
[    9.219160]  <TASK>
[    9.219164]  ? ucsi_reset_ppm+0x20c/0x220 [typec_ucsi 00de668eeb01d77497a0f63c13d6c42ea01c5f1c]
[    9.219179]  ? __warn.cold+0x8e/0xe8
[    9.219192]  ? ucsi_reset_ppm+0x20c/0x220 [typec_ucsi 00de668eeb01d77497a0f63c13d6c42ea01c5f1c]
[    9.219211]  ? report_bug+0xff/0x140
[    9.219226]  ? handle_bug+0x3c/0x80
[    9.219236]  ? exc_invalid_op+0x17/0x70
[    9.219245]  ? asm_exc_invalid_op+0x1a/0x20
[    9.219259]  ? ucsi_reset_ppm+0x20c/0x220 [typec_ucsi 00de668eeb01d77497a0f63c13d6c42ea01c5f1c]
[    9.219274]  ? ucsi_reset_ppm+0xfc/0x220 [typec_ucsi 00de668eeb01d77497a0f63c13d6c42ea01c5f1c]
[    9.219287]  ? finish_task_switch.isra.0+0x99/0x2e0
[    9.219305]  ucsi_init_work+0x3c/0xb20 [typec_ucsi 00de668eeb01d77497a0f63c13d6c42ea01c5f1c]
[    9.219320]  ? process_one_work+0x186/0x340
[    9.219328]  ? kfree+0x2ca/0x2f0
[    9.219343]  process_one_work+0x186/0x340
[    9.219353]  worker_thread+0x2eb/0x410
[    9.219362]  ? __pfx_worker_thread+0x10/0x10
[    9.219369]  kthread+0xcf/0x100
[    9.219379]  ? __pfx_kthread+0x10/0x10
[    9.219389]  ret_from_fork+0x31/0x50
[    9.219401]  ? __pfx_kthread+0x10/0x10
[    9.219410]  ret_from_fork_asm+0x1a/0x30
[    9.219426]  </TASK>
[    9.219430] ---[ end trace 0000000000000000 ]---
[    9.367173] typec port0: bound usb1-port4 (ops connector_ops)
[    9.367207] typec port0: bound usb2-port1 (ops connector_ops)

This is the usb controller on the nvidia GPU, https://download.nvidia.com/XFree86/Lin … sAndW6426e
Not sure whether the Oops there would lead to "freeze during upgrade or shutdown" - was that actually still the case with any of the recent tests (535xx, nouveau) or did you just go by the ucsi timeout?

#20 2024-06-04 19:00:14

ECC83
Member
Registered: 2022-12-11
Posts: 32

Re: Occasional system freeze [solved]

seth wrote:

"freeze during upgrade or shutdown" - was that actually still the case with any of the recent tests (535xx, nouveau) or did you just go by the ucsi timeout?

Just the timeout. It happens in roughly 1 of 10 power on/off cycles.
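
To put a firmer number on a rate like "1 of 10", the kernel logs of recent boots can be scanned for the warning. A minimal bash sketch, assuming journald keeps a persistent journal and that `journalctl` exits non-zero when asked for a boot it has no record of; `count_affected_boots` is a hypothetical name, and it only looks for the `ucsi_reset_ppm` symbol from the trace above.

```shell
# Hypothetical helper: report how many of the most recent boots logged the
# ucsi_reset_ppm warning, as "hits/boots checked".
# Reading kernel messages of past boots may require root or the systemd-journal group.
count_affected_boots() {
  local max=${1:-10} checked=0 hits=0 i log
  for ((i = 0; i > -max; i--)); do
    # Offset 0 is the current boot, -1 the previous one, and so on.
    # Stop once journalctl has no record of an older boot.
    log=$(journalctl -k -b "$i" 2>/dev/null) || break
    checked=$((checked + 1))
    if printf '%s\n' "$log" | grep -q 'ucsi_reset_ppm'; then
      hits=$((hits + 1))
    fi
  done
  echo "$hits/$checked"
}
```

With a rate near one in ten, a clean result needs many boots to mean much: the chance of 20 clean boots in a row at that rate is still 0.9^20 ≈ 0.12.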

#21 2024-06-04 20:08:00

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,122

Re: Occasional system freeze [solved]

Please re-test the current kernel w/ the 535xx drivers then - the 550xx are on the record, and the oops in and of itself doesn't seem very likely to be the cause.
You can, however, also remove the device as shown in the nvidia readme.
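
For removing it at runtime, the generic kernel mechanism is the PCI `remove` attribute in sysfs; this bash sketch uses that interface and may differ in detail from the procedure in the NVIDIA readme. It assumes the controller is PCI function 0000:01:00.3, the address in the dmesg lines above; `remove_pci_function` and the `SYSFS_ROOT` override (only there so the function can be tried against a fake directory tree) are hypothetical.

```shell
# Hypothetical helper: detach one PCI function through sysfs (needs root).
# SYSFS_ROOT is a hypothetical override for trying it on a fake tree.
remove_pci_function() {
  local addr=${1:-0000:01:00.3}
  local node="${SYSFS_ROOT:-/sys}/bus/pci/devices/$addr/remove"
  if [ ! -e "$node" ]; then
    echo "no such PCI device: $addr" >&2
    return 1
  fi
  # Writing 1 to 'remove' hot-unplugs the device until the next boot
  # (or until 'echo 1 > /sys/bus/pci/rescan' brings it back).
  echo 1 > "$node"
}
```

Run as root, e.g. `remove_pci_function 0000:01:00.3`. The removal does not persist across reboots.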

#22 2024-06-07 16:35:53

ECC83
Member
Registered: 2022-12-11
Posts: 32

Re: Occasional system freeze [solved]

seth wrote:

Please re-test the current kernel w/ the 535xx drivers

I took a few days to test the latest linux package with the 535xx drivers. I did not encounter the system freeze, but the message still occurs. It's a pretty rare event, though; it shows no regular pattern, and I couldn't find what triggers it.

#23 2024-06-07 19:42:57

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,122

Re: Occasional system freeze [solved]

The one posted occurs during the setup phase, and the nvidia document warns about power-management-related issues - you could try moving the nvidia drivers into the initramfs to give them a head start.
If you don't have or use a USB-C output (VR headset?), or experience no issues with it, I'd frankly just ignore it.
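
On Arch, moving the driver into the initramfs means listing its modules in the MODULES array of /etc/mkinitcpio.conf and regenerating the images. A bash sketch of a helper for that edit; the helper name is hypothetical, it assumes the array sits on a single line as in the stock file, and the four module names in the example are the ones commonly given for NVIDIA early loading (check the Arch wiki NVIDIA page before relying on them). Editing the file by hand works just as well.

```shell
# Hypothetical helper: append modules to the MODULES=(...) array of an
# mkinitcpio config, skipping any that are already listed.
# Assumes the array sits on a single line, as in the stock /etc/mkinitcpio.conf.
add_early_modules() {
  local conf=$1 mod
  shift
  for mod in "$@"; do
    # \< and \> are GNU grep word boundaries, so "nvidia" does not match "nvidia_drm".
    grep -Eq "^MODULES=\(.*\<${mod}\>" "$conf" && continue
    # Insert the module before the closing parenthesis, then drop a leading space.
    sed -i -E "s/^MODULES=\((.*)\)/MODULES=(\1 ${mod})/; s/^MODULES=\( /MODULES=(/" "$conf"
  done
}

# Example (as root), then rebuild every preset:
# add_early_modules /etc/mkinitcpio.conf nvidia nvidia_modeset nvidia_uvm nvidia_drm
# mkinitcpio -P
```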

Please always remember to mark resolved threads by editing your initial post's subject, so others will know that there's no task left, but maybe a solution to find.
Thanks.
