Here are the logs for the last 3 hours: https://0x0.st/s/ywu1x8nCIND9ID0Et8vs5g/H2E5.dbg (lfc-interval = 1h).
Offline
Does it still crash?
You might want to log the parameters, e.g. by adding
echo "$@" >> /tmp/kea-lfc.parms
before "exec kea-lfc.bin" and see whether you can make it crash on explicit invocation.
Because then you can crash it in gdb.
Offline
I've only had one crash so far today, but at the time I hadn't yet deployed the script you advised.
Aug 03 06:47:51 home kernel: kea-lfc[1504]: segfault at 7f5e94000020 ip 00007f5e9c0aa170 sp 00007f5e9b240db0 error 6 likely on CPU 0 (core 0, socket 0)
Aug 03 06:47:51 home kernel: Code: ee 48 89 df e8 b1 f3 06 00 85 c0 0f 85 41 01 00 00 48 8b 05 4a 71 14 00 48 83 e8 01 48 39 e8 0f 82 b5 00 00 00 66 0f 6f 0c 24 <4c> 89 6b 20 0f 11 4b 10 90 48 83 c4 18 48 89 d8 5b 5d 41 5c 41 5d
You mean like this?
❯ cat /usr/bin/kea-lfc
#!/bin/sh
echo "$@" >> /tmp/kea-lfc.parms
exec /usr/bin/kea-lfc.bin -d "$@" >> /tmp/kea-lfc.dbg 2>&1
Offline
Yup.
Offline
Okay, I got the parameters:
❯ cat kea-lfc.parms
-4 -x /var/lib/kea/dhcp4.leases.2 -i /var/lib/kea/dhcp4.leases.1 -o /var/lib/kea/dhcp4.leases.output -f /var/lib/kea/dhcp4.leases.completed -p /var/lib/kea/dhcp4.leases.pid -c ignored-path
Offline
See whether you can make
kea-lfc.bin -4 -x /var/lib/kea/dhcp4.leases.2 -i /var/lib/kea/dhcp4.leases.1 -o /var/lib/kea/dhcp4.leases.output -f /var/lib/kea/dhcp4.leases.completed -p /var/lib/kea/dhcp4.leases.pid -c ignored-path
crash
Last edited by seth (2023-08-03 13:10:54)
Offline
I ran the command repeatedly, but there was no crash. Or at least I never see any trace.
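One way to make a crash during repeated manual runs visible is to check the exit status of every run instead of watching for a trace. This is only a minimal sketch: `run_until_fail` is a made-up helper name, and it assumes a POSIX shell, where a status above 128 means the process was killed by a signal (status minus 128, so a segfault shows up as 139).

```shell
#!/bin/sh
# Hypothetical helper: run a command up to N times and stop at the first
# non-zero exit status. A status above 128 means "killed by signal
# (status - 128)"; SIGSEGV is signal 11, so a segfault is reported as 139.
run_until_fail() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        "$@"
        status=$?
        if [ "$status" -ne 0 ]; then
            if [ "$status" -gt 128 ]; then
                echo "run $i: killed by signal $((status - 128))"
            else
                echo "run $i: exit status $status"
            fi
            return "$status"
        fi
        i=$((i + 1))
    done
    echo "no failure in $max runs"
}
```

Usage would be `run_until_fail 100 /usr/bin/kea-lfc.bin -4 -x ...` followed by the remaining parameters from kea-lfc.parms.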
Offline
ISC Kea support keeps its hands off Arch Linux because of the rolling release model. I understand their approach. https://gitlab.isc.org/isc-projects/kea … ote_393555
The DHCP server itself is still working; it is "just" producing those segfault errors. It never stopped or needed a restart, so if I didn't check the logs, I wouldn't even know about it.
I've been using ISC Kea for about 2 years without problems. This issue has dragged on for the last 2-3 months or so. I have tried different kernel and systemd versions, but nothing has solved it.
Offline
You're not noticing anything because
kea-lfc is a service process that removes redundant information from the files used to provide persistent storage for the Memfile database backend. This service is written to run as a standalone process.
The annoying part is the non-deterministic behavior; it'll likely be down to some entry in that storage.
Did you meanwhile get a crash with the debug script in place?
Edit: the other user in the upstream bug has the DHCP servers themselves crashing. That is a completely different process, so it is likely an entirely unrelated problem.
Last edited by seth (2023-08-03 15:19:23)
Offline
"Did you meanwhile get a crash with the debug script in place?"
Not yet
Offline
today's news
I have recorded two crashes: one without the script and one with it. Unfortunately, again no core dump was generated.
❯ uptime
09:15:18 up 1 day, 11:27, 1 user, load average: 0.00, 0.00, 0.00
First crash without script:
Aug 03 06:47:51 home kernel: kea-lfc[1504]: segfault at 7f5e94000020 ip 00007f5e9c0aa170 sp 00007f5e9b240db0 error 6 likely on CPU 0 (core 0, socket 0)
Aug 03 06:47:51 home kernel: Code: ee 48 89 df e8 b1 f3 06 00 85 c0 0f 85 41 01 00 00 48 8b 05 4a 71 14 00 48 83 e8 01 48 39 e8 0f 82 b5 00 00 00 66 0f 6f 0c 24 <4c> 89 6b 20 0f 11 4b 10 90 48 83 c4
Second crash with deployed script from seth:
Aug 03 23:48:02 home kernel: kea-lfc.bin[3145]: segfault at 7fcef8000020 ip 00007fcf014aa170 sp 00007fcefef11db0 error 6 likely on CPU 1 (core 1, socket 0)
Aug 03 23:48:02 home kernel: Code: ee 48 89 df e8 b1 f3 06 00 85 c0 0f 85 41 01 00 00 48 8b 05 4a 71 14 00 48 83 e8 01 48 39 e8 0f 82 b5 00 00 00 66 0f 6f 0c 24 <4c> 89 6b 20 0f 11 4b 10 90 48 83 c4
❯ coredumpctl list
No coredumps found.
kea-lfc.dbg: https://0x0.st/s/EA3AgLKfaA32o4kMNo_tPg/H2D6.dbg
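For reference, some fields of the two kernel lines above can be decoded by hand. This is a minimal sketch, assuming the x86 page-fault error-code layout (bit 0: protection violation vs. page not present, bit 1: write, bit 2: user mode) and that the kernel prints the code in hex. The function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: decode the "error N" field of a kernel segfault line.
# Assumes the x86 page-fault error code, printed in hex by the kernel.
decode_segv_error() {
    code=$((0x$1))
    if [ $((code & 1)) -ne 0 ]; then p="protection violation"; else p="page not present"; fi
    if [ $((code & 2)) -ne 0 ]; then w="write"; else w="read"; fi
    if [ $((code & 4)) -ne 0 ]; then u="user mode"; else u="kernel mode"; fi
    echo "$w, $u, $p"
}

# Hypothetical helper: print the offset within a 4 KiB page (low 12 bits)
# of a hex address. ASLR moves mappings in page-sized steps, so this part
# of an address stays the same across runs.
page_offset() {
    printf '%x\n' "$((0x$1 & 0xfff))"
}

decode_segv_error 6            # prints "write, user mode, page not present"
page_offset 00007f5e9c0aa170   # prints "170" (ip of the first crash)
page_offset 00007fcf014aa170   # prints "170" (ip of the second crash)
```

Both crashes are therefore user-mode writes to an unmapped page, and the matching ip offsets plus the identical Code: bytes suggest the same instruction faulted both times. As far as I can read the bytes, the marked instruction `<4c> 89 6b 20` is a 64-bit store to offset 0x20 from a base register, which would fit fault addresses ending in ...000020.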
Offline
None of the debug outputs stands out, and you have one every hour.
Might be a crash-on-exit
Offline
Any other ideas, or should I just give up, stop dealing with it, and expect an update to something to resolve it over time?
Offline
There's probably not much you could or need to do (the process isn't critical, seems to do its job before segfaulting, and also doesn't pester the system with coredumps).
Upstream likely wants to take a closer look at their process tear-down routines and/or install a signal handler for some internal debugging routines.
Stack corruptions can be nasty to track down, though.
Offline
Thank you all for your help ... especially @seth has given me a huge amount of time. I learned new things and thank you for that.
@"segfault errors" - F*ck you - I'm done with you.
closing...
Offline