Hello,
I have recently had a problem with a hanging process, which also makes reboot/poweroff extremely slow. The following message repeats every 2 minutes in dmesg:
[ 982.666986] INFO: task systemd-udevd:388 blocked for more than 120 seconds.
[ 982.667025] Tainted: G O 4.17.12-arch1-1-ARCH #1
[ 982.667053] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 982.667087] systemd-udevd D 0 388 376 0x80000124
[ 982.667090] Call Trace:
[ 982.667097] ? __schedule+0x282/0x890
[ 982.667100] ? preempt_count_add+0x68/0xa0
[ 982.667102] schedule+0x32/0x90
[ 982.667108] __sev_do_cmd_locked+0xd7/0x270 [ccp]
[ 982.667111] ? wait_woken+0x80/0x80
[ 982.667113] ? 0xffffffffc0584000
[ 982.667117] __sev_platform_init_locked+0x2f/0x80 [ccp]
[ 982.667118] ? _raw_write_unlock_irqrestore+0x1c/0x30
[ 982.667121] sev_platform_init+0x1d/0x30 [ccp]
[ 982.667125] psp_pci_init+0x40/0xe0 [ccp]
[ 982.667126] ? 0xffffffffc0584000
[ 982.667129] sp_mod_init+0x16/0x1000 [ccp]
[ 982.667131] do_one_initcall+0x46/0x1f5
[ 982.667134] ? free_unref_page_commit+0x70/0xf0
[ 982.667136] ? kmem_cache_alloc_trace+0x181/0x1d0
[ 982.667138] ? do_init_module+0x22/0x210
[ 982.667139] do_init_module+0x5a/0x210
[ 982.667141] load_module+0x247a/0x29f0
[ 982.667144] ? vmap_page_range_noflush+0x276/0x350
[ 982.667146] ? __se_sys_init_module+0x10c/0x170
[ 982.667147] __se_sys_init_module+0x10c/0x170
[ 982.667150] do_syscall_64+0x5b/0x170
[ 982.667152] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 982.667153] RIP: 0033:0x7f86c41f426e
[ 982.667154] RSP: 002b:00007ffd89d57858 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 982.667155] RAX: ffffffffffffffda RBX: 000055e7ef8150b0 RCX: 00007f86c41f426e
[ 982.667156] RDX: 00007f86c3a8decd RSI: 000000000002c8a8 RDI: 000055e7f0045f90
[ 982.667157] RBP: 00007f86c3a8decd R08: 0000000000000006 R09: 0000000000000005
[ 982.667157] R10: 000055e7ef7f2010 R11: 0000000000000246 R12: 000055e7f0045f90
[ 982.667158] R13: 000055e7ef818ea0 R14: 0000000000020000 R15: 000055e7ef8150b0
I'm not even sure it's related, but I found the following with systemctl --all | grep udev:
initrd-udevadm-cleanup-db.service loaded inactive dead Cleanup udevd DB
I have no clue about that. I tried searching for information but I really don't know what to look for.
Any advice?
Update:
1) After upgrading to kernel 4.17.14 I get a kernel panic during boot.
2) To fix the above, I booted a USB stick with the Arch install media, downgraded the kernel and virtualbox, and rebooted.
Now the worst part (for me): with the install media I have the same problem described above. The kernel is 4.16.12. The thing is, I didn't have any problem during the Arch install using that same USB stick.
By exclusion, I suppose the problem is related to my hardware... does anyone have an idea about it? Is it related to the CPU or something else?
Note that I didn't change anything in my hardware since first install.
Update 2:
The kernel panic is not related to the kernel version; I get it with 4.16.12, 4.17.13 and 4.17.14. Here is the dump:
Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.17.14-arch1-1-ARCH #1
Hardware name: To be Filled by O.E.M. To be Filled by O.E.M./X470 Master SLI, BIOS P1.40 07/04/2018
Call Trace:
dump_stack+0x5c/0x80
panic+0xe7/0x247
panic_if_irq_remap.cold.0+0x5/0xe
setup_IO_APIC_pin+0xb8/0x110
x86_late_time_init+0x17/0x1c
start_kernel+0x473/0x535
secondary_startup_64+0xa5/0xb0
---[ end Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC ]---
Can anybody make sense of that?
Last edited by ghetto.ch (2018-08-12 09:16:32)
Offline
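The hung-task report in the first post can be triaged mechanically: the header line names the blocked task, and the call-trace frames tagged with a module name in square brackets (here `[ccp]`) show which loadable module it is waiting in. Below is a minimal sketch, assuming dmesg text in the format quoted above; the function name `summarize_hung_tasks` and the regular expressions are illustrative, not part of any existing tool.

```python
import re

# Header of a hung-task report, e.g.
# "INFO: task systemd-udevd:388 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# A call-trace frame inside a loadable module, e.g.
# "__sev_do_cmd_locked+0xd7/0x270 [ccp]"
FRAME_RE = re.compile(
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(?P<module>\w+)\]"
)


def summarize_hung_tasks(dmesg_text):
    """Return one dict per hung-task report, with its module frames in trace order."""
    reports = []
    current = None
    for line in dmesg_text.splitlines():
        header = HUNG_RE.search(line)
        if header:
            current = {
                "task": header["name"],
                "pid": int(header["pid"]),
                "seconds": int(header["secs"]),
                "module_frames": [],
            }
            reports.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None:
            # Frames without a "[module]" tag belong to the core kernel and are skipped.
            current["module_frames"].append((frame["module"], frame["func"]))
    return reports


# A few lines copied from the report above.
sample = """\
[ 982.666986] INFO: task systemd-udevd:388 blocked for more than 120 seconds.
[ 982.667102] schedule+0x32/0x90
[ 982.667108] __sev_do_cmd_locked+0xd7/0x270 [ccp]
[ 982.667117] __sev_platform_init_locked+0x2f/0x80 [ccp]
"""
for report in summarize_hung_tasks(sample):
    print(report["task"], report["pid"], report["module_frames"])
```

The first module frame after the header is the innermost one, so it is usually the most useful line to search for in a bug tracker.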
Welcome to the Arch Linux forums, ghetto.ch. initrd-udevadm-cleanup-db.service is probably unrelated: it should not be executed outside of the initrd, and on Arch only if the systemd hook is used in mkinitcpio.conf.
Is the system using an AMD ryzen based CPU?
Offline
Hi loqs,
thank you for the answer. Yes, it's a Ryzen 2700X. In fact I made a little progress and found the following:
systemctl status 388:
systemd-udevd.service - udev Kernel Device Manager
Loaded: loaded (/usr/lib/systemd/system/systemd-udevd.service; static; vendor preset: disabled)
Active: active (running) since Mon 2018-08-06 19:40:28 CEST; 2h 3min ago
Docs: man:systemd-udevd.service(8)
man:udev(7)
Main PID: 376 (systemd-udevd)
Status: "Processing with 40 children at max"
Tasks: 25
Memory: 57.8M
CGroup: /system.slice/systemd-udevd.service
├─376 /usr/lib/systemd/systemd-udevd
├─381 /usr/lib/systemd/systemd-udevd
├─382 /usr/lib/systemd/systemd-udevd
├─384 /usr/lib/systemd/systemd-udevd
├─385 /usr/lib/systemd/systemd-udevd
├─388 /usr/lib/systemd/systemd-udevd
├─389 /usr/lib/systemd/systemd-udevd
├─392 /usr/lib/systemd/systemd-udevd
├─393 /usr/lib/systemd/systemd-udevd
├─395 /usr/lib/systemd/systemd-udevd
├─396 /usr/lib/systemd/systemd-udevd
├─397 /usr/lib/systemd/systemd-udevd
├─400 /usr/lib/systemd/systemd-udevd
├─402 /usr/lib/systemd/systemd-udevd
├─403 /usr/lib/systemd/systemd-udevd
├─407 /usr/lib/systemd/systemd-udevd
├─408 /usr/lib/systemd/systemd-udevd
├─411 /usr/lib/systemd/systemd-udevd
├─412 /usr/lib/systemd/systemd-udevd
├─415 /usr/lib/systemd/systemd-udevd
├─417 /usr/lib/systemd/systemd-udevd
├─418 /usr/lib/systemd/systemd-udevd
├─419 /usr/lib/systemd/systemd-udevd
├─420 /usr/lib/systemd/systemd-udevd
└─421 /usr/lib/systemd/systemd-udevd
Aug 06 19:43:29 gondolin systemd-udevd[376]: worker [406] terminated by signal 9 (KILL)
Aug 06 19:43:29 gondolin systemd-udevd[376]: worker [406] failed while handling '/devices/system/cpu/cpu7'
Aug 06 19:43:29 gondolin systemd-udevd[376]: worker [409] terminated by signal 9 (KILL)
Aug 06 19:43:29 gondolin systemd-udevd[376]: worker [409] failed while handling '/devices/system/cpu/cpu8'
Aug 06 19:43:29 gondolin systemd-udevd[376]: worker [410] terminated by signal 9 (KILL)
Aug 06 19:43:29 gondolin systemd-udevd[376]: worker [410] failed while handling '/devices/system/cpu/cpu5'
Aug 06 19:43:29 gondolin systemd-udevd[376]: worker [413] terminated by signal 9 (KILL)
Aug 06 19:43:29 gondolin systemd-udevd[376]: worker [413] failed while handling '/devices/system/cpu/cpu2'
Aug 06 19:43:29 gondolin systemd-udevd[376]: worker [416] terminated by signal 9 (KILL)
Aug 06 19:43:29 gondolin systemd-udevd[376]: worker [416] failed while handling '/devices/system/cpu/cpu14'
Everything was working fine until a few days ago (I don't poweroff/reboot very often, so I don't know exactly when it appeared).
In mkinitcpio.conf I added the resume hook months ago and haven't changed anything else.
Do you think it is some issue with Ryzen that I need to wait for a fix for?
Offline
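The journal excerpt in the post above pairs each killed worker with the device it was handling, and in every quoted case that device is a CPU. Below is a minimal sketch that extracts those pairs, assuming journal lines in the format quoted above; the function name `failed_workers` is hypothetical.

```python
import re

# "worker [406] terminated by signal 9 (KILL)"
KILLED_RE = re.compile(r"worker \[(\d+)\] terminated by signal (\d+)")
# "worker [406] failed while handling '/devices/system/cpu/cpu7'"
FAILED_RE = re.compile(r"worker \[(\d+)\] failed while handling '([^']+)'")


def failed_workers(journal_text):
    """Map each worker PID to the signal that killed it and the device it was handling."""
    workers = {}
    for line in journal_text.splitlines():
        killed = KILLED_RE.search(line)
        if killed:
            entry = workers.setdefault(int(killed.group(1)), {"signal": None, "device": None})
            entry["signal"] = int(killed.group(2))
        failed = FAILED_RE.search(line)
        if failed:
            entry = workers.setdefault(int(failed.group(1)), {"signal": None, "device": None})
            entry["device"] = failed.group(2)
    return workers


# Two lines copied from the journal excerpt above.
journal = """\
Aug 06 19:43:29 gondolin systemd-udevd[376]: worker [406] terminated by signal 9 (KILL)
Aug 06 19:43:29 gondolin systemd-udevd[376]: worker [406] failed while handling '/devices/system/cpu/cpu7'
"""
for pid, info in failed_workers(journal).items():
    print(pid, info["signal"], info["device"])
```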
This looks similar to https://bugs.archlinux.org/task/59483
Offline
Yes, it looks like it. Thank you for your help! I'll try the LTS kernel and report the results.
Offline
Same here with a Ryzen 2700X and an NVIDIA 1060 on the 396 and 390 drivers.
EDIT: The LTS kernel does not show this behaviour for me. However, the Ryzen k10 temperature module does not work correctly on the LTS kernel yet. I haven't found any bugs posted upstream so far (kernel.org or NVIDIA).
Last edited by dp (2018-08-08 06:15:42)
The impossible missions are the only ones which succeed.
Offline
Radeon RX560 here.
I found the same information about the K10 temperature module; this is the first reason I haven't switched to LTS yet. The second is that virtualbox requires the latest version, and I would rather keep the long reboots than be without virtualbox (I don't reboot/poweroff daily anyway).
So far I haven't noticed other problems, so I'm thinking of just waiting for a kernel update to fix this.
Offline
Update:
1) After upgrading to kernel 4.17.14 I get a kernel panic during boot.
2) To fix the above, I booted a USB stick with the Arch install media, downgraded the kernel and virtualbox, and rebooted.
Now the worst part (for me): with the install media I have the same problem described above. The kernel is 4.16.12. The thing is, I didn't have any problem during the Arch install using that same USB stick.
By exclusion, I suppose the problem is related to my hardware... does anyone have an idea about it? Is it related to the CPU or something else?
Note that I didn't change anything in my hardware since first install.
Offline
Update 2:
The kernel panic is not related to the kernel version; I get it with 4.16.12, 4.17.13 and 4.17.14. Here is the dump:
Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.17.14-arch1-1-ARCH #1
Hardware name: To be Filled by O.E.M. To be Filled by O.E.M./X470 Master SLI, BIOS P1.40 07/04/2018
Call Trace:
dump_stack+0x5c/0x80
panic+0xe7/0x247
panic_if_irq_remap.cold.0+0x5/0xe
setup_IO_APIC_pin+0xb8/0x110
x86_late_time_init+0x17/0x1c
start_kernel+0x473/0x535
secondary_startup_64+0xa5/0xb0
---[ end Kernel panic - not syncing: timer doesn't work through Interrupt-remapped IO-APIC ]---
Can anybody make sense of that?
Offline
Do the boot parameters intremap=off or amd_iommu=off have any effect on the panic?
Edit:
grammar Do not Does
Last edited by loqs (2018-08-11 20:00:46)
Offline
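When testing boot parameters such as intremap=off or amd_iommu=off, it helps to confirm that they actually reached the running kernel, since a misspelled name is easy to miss. Below is a minimal sketch, assuming a Linux system where /proc/cmdline is readable; the function names `check_boot_params` and `read_cmdline` are hypothetical.

```python
def check_boot_params(cmdline, wanted):
    """Report, for each wanted token, whether it appears verbatim on the command line."""
    present = set(cmdline.split())
    return {param: param in present for param in wanted}


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as f:
        return f.read().strip()
```

On the affected machine this could be called as `check_boot_params(read_cmdline(), ["intremap=off", "amd_iommu=off"])`. The match is deliberately exact, so a token with a transposed letter is reported as missing.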
No effect with intremap=off and amd_iommu=off. Interestingly, I now can't boot at all, even from the USB stick that worked before, as if something is degrading. I don't know how to confirm it, but the only explanation I can see is that the CPU is dying or something is wrong with the motherboard.
Another thing: when editing the grub parameters, input is very slow, with a lag of 1-2 seconds after each key press. The same happens if I boot from USB and try to use HDT; everything is incredibly slow.
Earlier I also ran a memory test, which passed without errors.
Offline
https://www.kernel.org/doc/Documentatio … meters.txt
Perhaps acpi=off or pci=noioapicreroute
Offline
So, I experimented a bit with parameters from the link you sent me. I didn't find a solution, but I do have some new information:
When I start editing in grub the system is very slow, but if I wait long enough it becomes normal. Then I boot and it's OK, without changing any parameter.
I still have the original behaviour described in the first post, though.
It's the same if I boot from USB: I just wait a few minutes, the system becomes responsive and I can boot.
Offline
I had the same problem. I downgraded to a UEFI version with AGESA 1.0.0.2 Patch C, which fixed it. It seems 1.0.0.4 breaks something.
Offline
Hi schnilch, thank you for your answer. I tried the original BIOS (the first release, which had worked perfectly for a month or so) with no luck. With the version you suggested I can now boot without problems, and the original issue has disappeared too!
In the end I have no clue what's going on, but... back to normality.
Thank you very much to you and loqs for your help!
Offline
Same problem at boot just after upgrading the BIOS to AGESA 1.0.0.4. It has to be a bug in that version.
Last edited by 4ronie4 (2018-08-18 15:39:47)
Offline
AMD AGESA 1.0.0.6 is out, and I see it as an upgrade option on my X470 Taichi Ultimate. Has anyone tried it already? I am still on the beta BIOS 1.36 with 1.0.0.2, which is stable, so I have no strong urge to upgrade unless this instability is fixed in 1.0.0.6.
The impossible missions are the only ones which succeed.
Offline
The kernel now contains https://github.com/torvalds/linux/commi … d6005b72c1 which will limit the PSP module device probing to 5 seconds and command execution to 100 seconds.
Last edited by loqs (2019-01-31 22:17:17)
Offline
Sweet, thank you for the pointer!
I switched to 1.0.0.6 (BIOS 2.0 for X470 ASRock) and it works without hangs.
The impossible missions are the only ones which succeed.
Offline