You are not logged in.

#1 2025-03-05 09:29:37

Zsar
Member
Registered: 2023-02-12
Posts: 10

How to find out what a hanging udev worker is trying to do?

Today I updated egl-wayland via `sudo pacman -Syu` and in post-installation udev hung.

It eventually timed out and everything seems to work even after a restart, but - a bit too late - I wonder: As the task was killed while waiting for the device, could that mean that something that should have happened post-installation did not happen?

Unfortunately neither the Wiki nor the man page seem to spell out (in a way I recognise) how to go from udev worker ID (424) or task SEQNUM (3748) to rule, so I can look up what it was supposed to do.

Is such an udev hang guaranteed to be harmless and if not, how can I react if it reoccurs in the future?

journalctl entry here. I can see where it hung, I can see what it was doing at the time, but I see nothing that would allow me to find the rule, which caused it to try.

Mar 05 07:30:50 Nr7 sudo[783]:     Zsar : TTY=tty1 ; PWD=/home/Zsar ; USER=root ; COMMAND=/usr/bin/pacman -Syu
Mar 05 07:30:50 Nr7 sudo[783]: pam_unix(sudo:session): session opened for user root(uid=0) by Zsar(uid=1000)
Mar 05 07:31:05 Nr7 systemd-udevd[421]: 0000:01:00.0: Worker [424] processing SEQNUM=3748 is taking a long time
Mar 05 07:32:03 Nr7 systemd-logind[625]: Watching system buttons on /dev/input/event3 (Power Button)
Mar 05 07:32:03 Nr7 systemd[759]: Reached target Sound Card.
Mar 05 07:32:03 Nr7 systemd-logind[625]: Watching system buttons on /dev/input/event1 (Sleep Button)
Mar 05 07:32:03 Nr7 systemd-logind[625]: Watching system buttons on /dev/input/event0 (Power Button)
Mar 05 07:32:03 Nr7 systemd-logind[625]: Watching system buttons on /dev/input/event2 (Lid Switch)
Mar 05 07:32:03 Nr7 systemd-logind[625]: Watching system buttons on /dev/input/event8 (Intel HID events)
Mar 05 07:32:03 Nr7 systemd-logind[625]: Watching system buttons on /dev/input/event5 (AT Translated Set 2 keyboard)
Mar 05 07:33:15 Nr7 systemd-udevd[421]: 0000:01:00.0: Worker [424] processing SEQNUM=3748 killed
Mar 05 07:33:18 Nr7 systemd[1]: Started Getty on tty2.
Mar 05 07:33:23 Nr7 login[919]: pam_unix(login:session): session opened for user Zsar(uid=1000) by Zsar(uid=0)
Mar 05 07:33:23 Nr7 login[919]: LOGIN ON tty2 BY Zsar
Mar 05 07:33:23 Nr7 systemd-logind[625]: New session 3 of user Zsar.
Mar 05 07:33:23 Nr7 systemd[1]: Started Session 3 of User Zsar.
Mar 05 07:34:02 Nr7 kernel: INFO: task (udev-worker):424 blocked for more than 122 seconds.
Mar 05 07:34:02 Nr7 kernel:       Tainted: G           OE      6.13.5-arch1-1 #1
Mar 05 07:34:02 Nr7 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 05 07:34:02 Nr7 kernel: task:(udev-worker)   state:D stack:0     pid:424   tgid:424   ppid:421    flags:0x00004006
Mar 05 07:34:02 Nr7 kernel: Call Trace:
Mar 05 07:34:02 Nr7 kernel:  <TASK>
Mar 05 07:34:02 Nr7 kernel:  __schedule+0x425/0x12b0
Mar 05 07:34:02 Nr7 kernel:  ? _kgspRpcDrainEvents+0x1d4/0x1780 [nvidia b3ebcc7d18d66b6df0f6b6669fa874235a2b1e09]
Mar 05 07:34:02 Nr7 kernel:  ? os_acquire_spinlock+0x12/0x30 [nvidia b3ebcc7d18d66b6df0f6b6669fa874235a2b1e09]
Mar 05 07:34:02 Nr7 kernel:  schedule+0x27/0xf0
Mar 05 07:34:02 Nr7 kernel:  schedule_preempt_disabled+0x15/0x30
Mar 05 07:34:02 Nr7 kernel:  rwsem_down_write_slowpath+0x1d3/0x660
Mar 05 07:34:02 Nr7 kernel:  ? _tlsThreadEntryGet+0x82/0x90 [nvidia b3ebcc7d18d66b6df0f6b6669fa874235a2b1e09]
Mar 05 07:34:02 Nr7 kernel:  down_write+0x5a/0x60
Mar 05 07:34:02 Nr7 kernel:  os_acquire_rwlock_write+0x2b/0x40 [nvidia b3ebcc7d18d66b6df0f6b6669fa874235a2b1e09]
Mar 05 07:34:02 Nr7 kernel:  portSyncRwLockAcquireWrite+0x26/0x80 [nvidia b3ebcc7d18d66b6df0f6b6669fa874235a2b1e09]
Mar 05 07:34:02 Nr7 kernel:  rmapiLockAcquire+0x274/0x350 [nvidia b3ebcc7d18d66b6df0f6b6669fa874235a2b1e09]
Mar 05 07:34:02 Nr7 kernel:  kgspInitRm_IMPL+0xb15/0x1560 [nvidia b3ebcc7d18d66b6df0f6b6669fa874235a2b1e09]
Mar 05 07:34:02 Nr7 kernel:  ? rm_get_uefi_console_status+0x32/0x40 [nvidia b3ebcc7d18d66b6df0f6b6669fa874235a2b1e09]
Mar 05 07:34:02 Nr7 kernel:  RmInitAdapter+0x1282/0x1d20 [nvidia b3ebcc7d18d66b6df0f6b6669fa874235a2b1e09]
Mar 05 07:34:02 Nr7 kernel:  ? os_get_current_tick+0x3b/0xa0 [nvidia b3ebcc7d18d66b6df0f6b6669fa874235a2b1e09]
Mar 05 07:34:02 Nr7 kernel:  rm_init_adapter+0xad/0xc0 [nvidia b3ebcc7d18d66b6df0f6b6669fa874235a2b1e09]
Mar 05 07:34:02 Nr7 kernel:  nv_open_device+0x2e9/0x9c0 [nvidia b3ebcc7d18d66b6df0f6b6669fa874235a2b1e09]
Mar 05 07:34:02 Nr7 kernel:  nvidia_dev_get+0x77/0xd0 [nvidia b3ebcc7d18d66b6df0f6b6669fa874235a2b1e09]
Mar 05 07:34:02 Nr7 kernel:  nvkms_open_gpu+0x4b/0x90 [nvidia_modeset 00dcb4193012b4b2990cc98b6b6db04c7b630048]
Mar 05 07:34:02 Nr7 kernel:  AllocateDevice+0x48/0x680 [nvidia_modeset 00dcb4193012b4b2990cc98b6b6db04c7b630048]
Mar 05 07:34:02 Nr7 kernel:  ? _printk+0x6c/0x90
Mar 05 07:34:02 Nr7 kernel:  nv_drm_load+0xb5/0x81c [nvidia_drm ac9bbedd7e5c76883078d1ee65403b0ad6e2f03b]
Mar 05 07:34:02 Nr7 kernel:  ? __remove_hrtimer+0x39/0x90
Mar 05 07:34:02 Nr7 kernel:  ? __pfx_nv_drm_event_callback+0x10/0x10 [nvidia_drm ac9bbedd7e5c76883078d1ee65403b0ad6e2f03b]
Mar 05 07:34:02 Nr7 kernel:  ? hrtimer_try_to_cancel.part.0+0x50/0xe0
Mar 05 07:34:02 Nr7 kernel:  ? sched_balance_newidle+0x2c8/0x460
Mar 05 07:34:02 Nr7 kernel:  ? sched_clock+0x10/0x30
Mar 05 07:34:02 Nr7 kernel:  ? sched_clock_cpu+0xf/0x1d0
Mar 05 07:34:02 Nr7 kernel:  ? psi_group_change+0x13b/0x310
Mar 05 07:34:02 Nr7 kernel:  drm_dev_register+0x122/0x250
Mar 05 07:34:02 Nr7 kernel:  nv_drm_register_drm_device+0xa2/0x180 [nvidia_drm ac9bbedd7e5c76883078d1ee65403b0ad6e2f03b]
Mar 05 07:34:02 Nr7 kernel:  nv_drm_probe_devices+0x99/0xd0 [nvidia_drm ac9bbedd7e5c76883078d1ee65403b0ad6e2f03b]
Mar 05 07:34:02 Nr7 kernel:  ? __pfx_nv_linux_drm_init+0x10/0x10 [nvidia_drm ac9bbedd7e5c76883078d1ee65403b0ad6e2f03b]
Mar 05 07:34:02 Nr7 kernel:  do_one_initcall+0x58/0x310
Mar 05 07:34:02 Nr7 kernel:  do_init_module+0x60/0x230
Mar 05 07:34:02 Nr7 kernel:  init_module_from_file+0x89/0xe0
Mar 05 07:34:02 Nr7 kernel:  idempotent_init_module+0x115/0x310
Mar 05 07:34:02 Nr7 kernel:  __x64_sys_finit_module+0x65/0xc0
Mar 05 07:34:02 Nr7 kernel:  do_syscall_64+0x82/0x190
Mar 05 07:34:02 Nr7 kernel:  ? clear_bhb_loop+0x25/0x80
Mar 05 07:34:02 Nr7 kernel:  ? clear_bhb_loop+0x25/0x80
Mar 05 07:34:02 Nr7 kernel:  ? clear_bhb_loop+0x25/0x80
Mar 05 07:34:02 Nr7 kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Mar 05 07:34:02 Nr7 kernel: RIP: 0033:0x7c412332588d
Mar 05 07:34:02 Nr7 kernel: RSP: 002b:00007fff649aa9b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Mar 05 07:34:02 Nr7 kernel: RAX: ffffffffffffffda RBX: 000062da942de950 RCX: 00007c412332588d
Mar 05 07:34:02 Nr7 kernel: RDX: 0000000000000004 RSI: 00007c412394b2f2 RDI: 0000000000000013
Mar 05 07:34:02 Nr7 kernel: RBP: 00007fff649aaa50 R08: 0000000000000000 R09: 000062da942df110
Mar 05 07:34:02 Nr7 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007c412394b2f2
Mar 05 07:34:02 Nr7 kernel: R13: 0000000000020000 R14: 000062da942dab50 R15: 000062da942de950
Mar 05 07:34:02 Nr7 kernel:  </TASK>

Offline

#2 2025-03-05 14:56:38

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,375

Re: How to find out what a hanging udev worker is trying to do?

I assume 0000:01:00.0 is your nvidia GPU (because it also fails to init later on)?

For an idea what's generally wrong with it, please post your complete system journal for the boot:

sudo journalctl -b | curl -F 'file=@-' 0x0.st

Offline

#3 2025-03-07 07:17:02

Zsar
Member
Registered: 2023-02-12
Posts: 10

Re: How to find out what a hanging udev worker is trying to do?

So far it only seems to happen when updating packages that nvidia-utils (transitively) depends on. Graphics work.

Offline

#4 2025-03-07 08:33:34

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,375

Re: How to find out what a hanging udev worker is trying to do?

It's a hybrid graphics system - can you "prime-run glxgears"?
Does this also happen w/ nvidia instead of nvidia-open?

https://bugs.archlinux.org/task/77789.html
https://gitlab.archlinux.org/archlinux/ … 97e8b3baff

sudo touch /etc/systemd/do-not-udevadm-trigger-on-update

Offline

#5 2025-03-08 08:10:59

Zsar
Member
Registered: 2023-02-12
Posts: 10

Re: How to find out what a hanging udev worker is trying to do?

I installed nvidia-prime to get prime-run. It has a wide range of behaviours:

  • when the udev-worker hangs, everywhere:

    $ prime-run glxinfo
    Error: unable to open display
    $ prime-run glxgears
    Error: unable to open display (null)
    $ vulkaninfo
    (hangs indefinitely without output)
  • when the udev-worker does not hang, in tty or Weston: same output for prime-run, but vulkaninfo provides expected output

  • when the udev-worker does not hang, in gdm:

    $ prime-run glxinfo
    (a lot of correct-looking but mostly arcane info about the nVidia card; no errors)
    $ prime-run glxgears
    Running synchronized to the vertical refresh.  The framerate should be
    approximately the same as the monitor refresh rate.
    288 frames in 5.0 seconds = 57.433 FPS
    301 frames in 5.0 seconds = 60.051 FPS
    318 frames in 5.0 seconds = 63.437 FPS
    (etc.; a separate window displays the gears as expected)
    $ vulkaninfo
    (same output as on tty and Weston)

Today I also got the same udev-worker hang right on login, without a pacman update as the trigger, so apparently it has a risk to hang on any udev update, not just on the ones triggered by a package change.

I will try the nvidia package for a number of days and see whether it happens again. If not, should I report this to the nvidia-open maintainer and/or what information should I sensibly collect/include?

... Could you perhaps come back to the original question, how I can find the rule that is the one which hangs?

  • My /etc/udev/rules.d directory is empty, my /etc/udev/udev.conf is commented out completely, so is my /etc/udev/iocost.conf.

  • The stacktrace in journalctl does not tell me, what it was trying to do, only which specific instruction locked it up.

  • The only actionable advice on the udev Wiki page seems to be to enable hardware debugging, everything else seems to assume that I myself would have created many rules for it to hang on.
    ... Should I have created such rules? Then I would have accidentally skipped over that step when following the installation instructions and a heads-up would be appreciated.

Even if I resolve this particular issue, which seems to be solely related to the graphics card, so far I have not come any closer to proficiency in handling general udev-worker failures.

Offline

#6 2025-03-08 08:19:15

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,375

Re: How to find out what a hanging udev worker is trying to do?

Forget about the udev hang, your GPU doesn't respond on the bus.
That's why usr/lib/udev/rules.d/60-nvidia.rules from nvidia-utils stalls (i don't think there's a way to infer the originating file, but the kernel tells you where udev-worker is hanging - which is obviously probing your nvidia GPU) but that's really just a symptom of the much bigger problem.

In case it's not nvidia-open, is there a parallel OS (windows)?

Offline

#7 2025-03-13 22:20:50

Zsar
Member
Registered: 2023-02-12
Posts: 10

Re: How to find out what a hanging udev worker is trying to do?

There is no parallel OS, no.

I replaced nvidia-open on the same day and today, five days later, it happened again, right on startup.
I had the computer (and the nVidia card) in use every day in the meanwhile, without issue.

- Feels "less often than nvidia-open", but evidently does not solve the issue completely.

Here is today's log. Stacktrace looks differently (of course) but all (nvidia-open and nvidia) end at

entry_SYSCALL_64_after_hwframe+0x76/0x7e

... So far this has only ever happened after the computer had been shutdown completely and separated from the electric grid for several hours, e.g. at the first start of the day. Restarting had a 100% success chance to fix the issue. Shutting down the computer and restarting it only an hour or two later has exhibited caused the issue.
- Could it be that the job queries the card before it is ready? Could I test this by reordering the job to start later or adding a delay to the job itself?

Last edited by Zsar (2025-03-13 22:38:38)

Offline

#8 2025-03-14 09:34:42

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,375

Re: How to find out what a hanging udev worker is trying to do?

Mar 13 22:59:13 Nr7 kernel: simple-framebuffer simple-framebuffer.0: [drm] Registered 1 planes with drm panic

Add "nvidia_drm.modeset=1" to the https://wiki.archlinux.org/title/Kernel_parameters to block the simpledrm device and wrt

Could it be that the job queries the card before it is ready?

try to enable https://wiki.archlinux.org/title/Kernel … _KMS_start (ie. move the nvidiazd  modules into the initramfs)
But

Mar 13 22:59:16 Nr7 kernel: nvidia: loading out-of-tree module taints kernel.
Mar 13 22:59:18 Nr7 kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Mar 13 23:03:13 Nr7 kernel: INFO: task (udev-worker):403 blocked for more than 122 seconds.

the 2 minute timeout is logged 4 minutes after the driver loads - this isn't very likely to be a matter of "the device just takes 2-3s to wakeup"

Try to add "pcie_aspm=off" to the https://wiki.archlinux.org/title/Kernel_parameters, if this helps we may look for a way to disable aspm for the nvidia GPU specifically.

Offline

Board footer

Powered by FluxBB