@cfr: I see no file corruption when running 'journalctl --verify'; all the journal files pass the test without error.
# journalctl --verify
PASS: /var/log/journal/fc81bd99216cf2b07c45befd49ca9ff8/system@0004ce4868f13d20-f0e7271c788631a7.journal~
PASS: /var/log/journal/fc81bd99216cf2b07c45befd49ca9ff8/system@0004ce20a303bdf0-8da3a5fefe57b1e2.journal~
PASS: /var/log/journal/fc81bd99216cf2b07c45befd49ca9ff8/system@0004ce991dd5391b-f3315acf4447a6b5.journal~
...
PASS: /var/log/journal/fc81bd99216cf2b07c45befd49ca9ff8/system.journal
PASS: /var/log/journal/fc81bd99216cf2b07c45befd49ca9ff8/system@0004cd6e2e157ae8-4f147c6751f5d67e.journal~
PASS: /var/log/journal/fc81bd99216cf2b07c45befd49ca9ff8/system@0004cd1b03e3b276-ac33100401c62342.journal~
PASS: /var/log/journal/fc81bd99216cf2b07c45befd49ca9ff8/system@0004ce87e96a0721-d3b64809e899fb43.journal~
...
I have 31 rotated files with the '.journal~' suffix, the current system.journal (recreated at each reboot), and one file with a different name format:
system@8e716738599f481c989455a9cacf0444-0000000000000001-0004cd56757713eb.journal
I don't know why, but it passes the verify command without error.
Today I have:
$ journalctl --disk-usage
Journals take up 22.7M on disk.
It grows continually.
I also added the 'MaxRetentionSec' option in /etc/systemd/journald.conf:
# grep -v '^#' journald.conf
[Journal]
SystemMaxUse=16M
MaxRetentionSec=10day
ForwardToSyslog=no
But this morning it had no effect on the cleanup of the journal directory, even though there are logged messages more than 10 days old.
I'd like feedback from other systemd users: how do you manage the disk usage of the systemd-journald service?
Can you confirm this problem? Thanks for the answers.
Offline
Files suffixed with a tilde are files which were marked as not closed cleanly. I suggest you figure out why this happens for you at shutdown (separate /usr or encrypted root? No shutdown hook?). My guess is that the journal counts these corrupted files for disk usage but doesn't factor them in at all for rotation.
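A quick way to see which files this covers is a find over the journal directory. Here is a minimal sketch (mine, not from the posters), using a scratch directory with hypothetical file names as a stand-in for /var/log/journal, and a 10-day cutoff mirroring the MaxRetentionSec=10day setting from earlier in the thread:

```shell
# Scratch directory standing in for /var/log/journal (file names hypothetical).
dir=$(mktemp -d)
touch "$dir/system@aaaa-bbbb.journal~"                   # recent unclean-shutdown file
touch -d '20 days ago' "$dir/system@cccc-dddd.journal~"  # stale unclean-shutdown file
# Delete tilde files older than ~10 days (mirrors MaxRetentionSec=10day):
find "$dir" -name '*.journal~' -mtime +10 -delete
ls "$dir"    # prints: system@aaaa-bbbb.journal~
rm -r "$dir"
```

On the real directory one would check with 'journalctl --verify' first; and as later posts in this thread show, once journald shuts down cleanly it vacuums these files itself.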
Offline
It's probably a good guess. I can confirm that I also had lots of ~ ending files when this used to be an issue for me, but none now.
Offline
Files suffixed with a tilde are files which were marked as not closed cleanly. I suggest you figure out why this happens for you at shutdown (separate /usr or encrypted root? No shutdown hook?). My guess is that the journal counts these corrupted files for disk usage but doesn't factor them in at all for rotation.
I have about 8 of these.
I have an encrypted root but I definitely have the shutdown hook:
$ grep shutdown /etc/mkinitcpio.conf
# usr, fsck and shutdown hooks.
HOOKS="base udev autodetect pata scsi sata encrypt lvm2 resume filesystems usbinput fsck timestamp shutdown"
However, there are independent reasons to not be terribly surprised if some of mine were not cleanly closed. (Almost certainly my office electricity was triggering emergency shutdowns by my laptop. Since this is obviously not the first explanation which would spring to mind, this took quite a while - and a number of such sudden shutdowns - to figure out. It is still not absolutely confirmed.)
Is it safe to delete these and files which journalctl reports as corrupted? Or is there another procedure for this?
I have:
total 132M
-rw-r----- 1 root adm 516K Tach 27 00:35 system.journal
-rw-r----- 1 root adm 1.5M Hyd 26 14:21 system@0004ccf631c67d61-dad75dfba905fb47.journal~
-rw-r----- 1 root adm 1.1M Hyd 26 14:44 system@0004ccf684d619fd-a35650edb5ca058c.journal~
-rw-r----- 1 root adm 2.0M Hyd 26 17:46 system@0004ccf90e8136a9-16d2b4136e7a6b97.journal~
-rw-r----- 1 root adm 2.3M Hyd 28 21:52 system@0004cd25912f5847-c188e6000eb864ed.journal~
-rw-r----- 1 root adm 3.0M Tach 1 18:09 system@0004cd72ecac6fc2-5b001278e9a0980a.journal~
-rw-r----- 1 root adm 1.3M Tach 1 18:52 system@0004cd738629bb23-985fd7216a96030b.journal~
-rw-r----- 1 root adm 912K Tach 10 00:35 system@0004ce194051c901-2713482b762547c8.journal~
-rw-r----- 1 root adm 1.3M Tach 12 18:59 system@0004ce50e5e117a7-14701a4973e958b9.journal~
-rw-r----- 1 root adm 3.5M Tach 1 22:12 system@1973d1a5c8d04d7bab50f7b7652e2051-0000000000000001-0004cd73855ee4fd.journal
-rw-r----- 1 root adm 4.2M Tach 1 22:16 system@1973d1a5c8d04d7bab50f7b7652e2051-0000000000001149-0004cd765035c9fc.journal
-rw-r----- 1 root adm 6.3M Tach 6 00:55 system@1973d1a5c8d04d7bab50f7b7652e2051-00000000000021f3-0004cd765f3796b9.journal
-rw-r----- 1 root adm 6.3M Tach 6 11:21 system@1973d1a5c8d04d7bab50f7b7652e2051-0000000000004889-0004cdc90ea660ba.journal
-rw-r----- 1 root adm 5.1M Tach 7 21:34 system@1973d1a5c8d04d7bab50f7b7652e2051-000000000000728d-0004cdd1cf828e55.journal
-rw-r----- 1 root adm 4.4M Tach 7 22:13 system@1973d1a5c8d04d7bab50f7b7652e2051-0000000000009542-0004cdee7add9c15.journal
-rw-r----- 1 root adm 4.2M Tach 7 22:18 system@1973d1a5c8d04d7bab50f7b7652e2051-000000000000a62b-0004cdef0648350c.journal
-rw-r----- 1 root adm 3.7M Hyd 28 02:17 system@2e630b6e53b34043ba9f6f4c6ed96ccb-0000000000000001-0004ccf90cc2af82.journal
-rw-r----- 1 root adm 4.2M Hyd 28 02:30 system@2e630b6e53b34043ba9f6f4c6ed96ccb-00000000000013a0-0004cd15269604af.journal
-rw-r----- 1 root adm 3.4M Hyd 28 16:59 system@2e630b6e53b34043ba9f6f4c6ed96ccb-0000000000002438-0004cd15560ffa19.journal
-rw-r----- 1 root adm 3.3M Tach 13 18:47 system@4f69ffa89e954faa9c0d7c38b6c0e26e-0000000000000001-0004ce50e4f727bd.journal
-rw-r----- 1 root adm 4.2M Tach 14 23:34 system@4f69ffa89e954faa9c0d7c38b6c0e26e-0000000000001441-0004ce64d96aa590.journal
-rw-r----- 1 root adm 4.0M Tach 18 04:01 system@4f69ffa89e954faa9c0d7c38b6c0e26e-0000000000003001-0004ce7cf95237be.journal
-rw-r----- 1 root adm 4.2M Tach 18 04:17 system@4f69ffa89e954faa9c0d7c38b6c0e26e-0000000000004539-0004cebd0cc975f3.journal
-rw-r----- 1 root adm 3.8M Tach 19 23:39 system@4f69ffa89e954faa9c0d7c38b6c0e26e-00000000000055d9-0004cebd46d38113.journal
-rw-r----- 1 root adm 4.2M Tach 24 01:15 system@4f69ffa89e954faa9c0d7c38b6c0e26e-00000000000069b9-0004cee1a00025b3.journal
-rw-r----- 1 root adm 1.0M Tach 24 01:19 system@4f69ffa89e954faa9c0d7c38b6c0e26e-00000000000080ca-0004cf336ee1299b.journal
-rw-r----- 1 root adm 660K Tach 24 02:39 system@4f69ffa89e954faa9c0d7c38b6c0e26e-00000000000085b2-0004cf337ed5a8c2.journal
-rw-r----- 1 root adm 572K Tach 24 20:39 system@4f69ffa89e954faa9c0d7c38b6c0e26e-000000000000899f-0004cf349c9dd65d.journal
-rw-r----- 1 root adm 720K Tach 25 01:16 system@4f69ffa89e954faa9c0d7c38b6c0e26e-0000000000008ce4-0004cf43b44ab626.journal
-rw-r----- 1 root adm 692K Tach 25 03:59 system@4f69ffa89e954faa9c0d7c38b6c0e26e-0000000000009203-0004cf47907c366e.journal
-rw-r----- 1 root adm 748K Tach 25 05:01 system@4f69ffa89e954faa9c0d7c38b6c0e26e-000000000000963f-0004cf49da317ca2.journal
-rw-r----- 1 root adm 700K Tach 25 05:05 system@4f69ffa89e954faa9c0d7c38b6c0e26e-0000000000009a10-0004cf4ab6bc0543.journal
-rw-r----- 1 root adm 708K Tach 25 05:07 system@4f69ffa89e954faa9c0d7c38b6c0e26e-0000000000009d0b-0004cf4ac4e81540.journal
-rw-r----- 1 root adm 680K Tach 25 05:13 system@4f69ffa89e954faa9c0d7c38b6c0e26e-000000000000a002-0004cf4acc083d8a.journal
-rw-r----- 1 root adm 680K Tach 25 05:13 system@4f69ffa89e954faa9c0d7c38b6c0e26e-000000000000a301-0004cf4ae0140cdd.journal
-rw-r----- 1 root adm 680K Tach 25 05:15 system@4f69ffa89e954faa9c0d7c38b6c0e26e-000000000000a5f5-0004cf4ae2090d21.journal
-rw-r----- 1 root adm 672K Tach 25 05:15 system@4f69ffa89e954faa9c0d7c38b6c0e26e-000000000000a8eb-0004cf4ae69ba4dc.journal
-rw-r----- 1 root adm 668K Tach 25 05:16 system@4f69ffa89e954faa9c0d7c38b6c0e26e-000000000000abe0-0004cf4ae904e1ee.journal
-rw-r----- 1 root adm 668K Tach 25 05:17 system@4f69ffa89e954faa9c0d7c38b6c0e26e-000000000000aed4-0004cf4aeae9c3bf.journal
-rw-r----- 1 root adm 664K Tach 25 05:18 system@4f69ffa89e954faa9c0d7c38b6c0e26e-000000000000b1c8-0004cf4aed38aef5.journal
-rw-r----- 1 root adm 684K Tach 25 07:23 system@4f69ffa89e954faa9c0d7c38b6c0e26e-000000000000b4be-0004cf4af10c1547.journal
-rw-r----- 1 root adm 720K Tach 25 12:02 system@4f69ffa89e954faa9c0d7c38b6c0e26e-000000000000b8a9-0004cf4cb16b5655.journal
-rw-r----- 1 root adm 724K Tach 25 16:33 system@4f69ffa89e954faa9c0d7c38b6c0e26e-000000000000bdc9-0004cf5098eaafb9.journal
-rw-r----- 1 root adm 720K Tach 25 21:17 system@4f69ffa89e954faa9c0d7c38b6c0e26e-000000000000c2eb-0004cf5461f237b8.journal
-rw-r----- 1 root adm 1.0M Tach 26 00:22 system@4f69ffa89e954faa9c0d7c38b6c0e26e-000000000000c815-0004cf585689d74c.journal
-rw-r----- 1 root adm 612K Tach 26 01:10 system@4f69ffa89e954faa9c0d7c38b6c0e26e-000000000000cf0c-0004cf5aed0fbfc0.journal
-rw-r----- 1 root adm 564K Tach 26 14:11 system@4f69ffa89e954faa9c0d7c38b6c0e26e-000000000000d262-0004cf5b9b35a87f.journal
-rw-r----- 1 root adm 504K Tach 26 20:22 system@4f69ffa89e954faa9c0d7c38b6c0e26e-000000000000d57b-0004cf668134c0e2.journal
-rw-r----- 1 root adm 596K Tach 26 22:29 system@4f69ffa89e954faa9c0d7c38b6c0e26e-000000000000d840-0004cf6bb151bfc5.journal
-rw-r----- 1 root adm 3.5M Tach 11 00:34 system@9281656745904d03947a69addc4a68ef-0000000000000001-0004ce193fd4e67d.journal
-rw-r----- 1 root adm 3.8M Tach 11 04:25 system@9281656745904d03947a69addc4a68ef-0000000000001848-0004ce2d57b2ba9c.journal
-rw-r----- 1 root adm 4.2M Tach 11 04:33 system@9281656745904d03947a69addc4a68ef-0000000000002db4-0004ce30917983de.journal
-rw-r----- 1 root adm 4.2M Tach 11 16:14 system@9281656745904d03947a69addc4a68ef-0000000000003e68-0004ce30b0b4b585.journal
-rw-r----- 1 root adm 3.5M Tach 12 15:02 system@9281656745904d03947a69addc4a68ef-0000000000004f2a-0004ce3a7a8bb7eb.journal
-rw-r----- 1 root adm 3.5M Hyd 26 00:01 system@d7c85d791846420dbd2a54ed61ec2617-000000000000c4df-0004ccea30056b1d.journal
-rw-r----- 1 root adm 3.2M Hyd 29 16:05 system@e95150f1850c4ee1888ac96b52ffc936-0000000000000001-0004cd25901c5dd8.journal
-rw-r----- 1 root adm 3.7M Hyd 30 14:06 system@e95150f1850c4ee1888ac96b52ffc936-0000000000001444-0004cd34d5817a6d.journal
-rw-r----- 1 root adm 3.5M Hyd 31 23:57 system@e95150f1850c4ee1888ac96b52ffc936-0000000000002b30-0004cd474b5e1183.journal
-rw-r-----+ 1 root adm 184K Hyd 22 00:15 user-1000.journal
(Should be capped at 50M.)
Then again, I have more corruption than yesterday and I haven't had any emergency shutdowns:
...
File corruption detected at /var/log/journal/997d1b9ffaa8efe300b8ff4100003004/system@4f69ffa89e954faa9c0d7c38b6c0e26e-00000000000080ca-0004cf336ee1299b.journal:1045208 (of 1048576, 99%).
...
File corruption detected at /var/log/journal/997d1b9ffaa8efe300b8ff4100003004/system@4f69ffa89e954faa9c0d7c38b6c0e26e-000000000000c815-0004cf585689d74c.journal:1047312 (of 1048576, 99%).
Is it odd that these are at the same point? These files have recent timestamps, so I assume the corruption isn't caused by the abrupt shutdowns. (I've been avoiding my office electricity until they fit a filtering socket.)
Last edited by cfr (2012-11-27 00:45:25)
CLI Paste | How To Ask Questions
Arch Linux | x86_64 | GPT | EFI boot | refind | stub loader | systemd | LVM2 on LUKS
Lenovo x270 | Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz | Intel Wireless 8265/8275 | US keyboard w/ Euro | 512G NVMe INTEL SSDPEKKF512G7L
Offline
From falconindy's post, and from looking into the systemd journal source code, I found out that my problem was most likely an unclean shutdown of the systemd-journald service.
That problem caused the creation of the '*.journal~' files at startup, and prevented the cleaning of the journal directory (from the source code I don't understand why the cleaning is prevented by that, but observation shows it is).
As I have neither a separate /usr nor an encrypted root, it was the shutdown hook in the initial ramdisk that was missing.
So I added it, and after the first reboot the journald process took up 100% CPU usage of one core and kept running like that indefinitely.
I rebooted again; this time the machine failed to boot and dropped me at the systemd emergency prompt. Not good at this point.
Then I manually powered off and restarted the machine using the power button, and it rebooted apparently without problem.
And at last the cleaning of the journal directory was done! It removed all the files containing log messages older than 10 days.
I now have:
$ journalctl --disk-usage
Journals take up 7.8M on disk.
So in my case the solution was not the re-creation of the /var/log/journal directory, but the addition of the shutdown hook in the initial ramdisk, which consequently seems required for the systemd journal to work properly.
I will wait several days to confirm that nothing nasty happens after this change, and will post again then.
Offline
Am I supposed to do anything else to handle the encrypted root and keep systemd happy? I don't think I actually use systemd to handle the encryption, because systemd only starts after the root is decrypted. Is this the source of the problem?
Offline
So in my case the solution is not the re-creation of the /var/log/journal directory, but the addition of the shutdown hook in the initial ramdisk, which consequently seems required for the systemd journal to work properly.
I will wait several days to confirm that nothing nasty happens after that change, and will post again about that.
The shutdown hook is a requirement of having /usr on a separate partition. systemd is just the messenger.
Offline
I've basically stopped the service, removed the old journal directory, restarted the service, realised I forgot to recreate the directory, recreated it, restarted the service etc. Time will tell.
I'm thinking the abrupt shutdowns caused by the electricity in my office may have played havoc. If so, it should be OK now. (They are meant to be putting a workaround in place but it is dragging on. In the meantime, I don't use that electricity, so no more sudden shutdowns.) I'd still like to know if that is why...
Offline
This morning, at the restart of my desktop computer, the journald process again got stuck permanently at 100% CPU usage of one core.
The command
# systemctl stop systemd-journald
stopped the 100% CPU usage, but when I ran
# systemctl start systemd-journald
the 100% CPU usage started again.
I then decided to re-create the /var/log/journal:
# systemctl stop systemd-journald
# mv /var/log/journal /var/log/journal.bak
# mkdir /var/log/journal
# systemctl start systemd-journald
The journald process is now quiet again. But is this solved?
I will have to see at the next reboots whether the issue reappears; as cfr wrote, 'Time will tell'.
The shutdown hook is a requirement of having /usr on a separate partition. systemd is just the messenger.
My /usr is not on a separate partition. And the journald service doesn't seem to shut down cleanly without this hook.
Offline
In my case, I'm not sure it is clean even with it. (I need the hook, though, as / is encrypted - I'm just not sure what more might be needed.)
Offline
Since I added the shutdown hook, trying to solve the cleaning problem of the journald files, I have seen some disturbing consequences:
at the machine restart today, I was again placed at the recovery prompt of the initial ramdisk fs;
I pressed the button for a hot reboot (without poweroff) and the machine seemed to start normally, as did the systemd-journald service;
then I wanted to test another reboot using the 'reboot' command in a console;
the machine restarted, but the journald process again took up 100% CPU usage on one core;
I stopped the journal and wanted to recreate the initial ramdisk with mkinitcpio,
but as I saw an available kernel update, I chose to do a full upgrade with 'pacman -Syu' instead;
I rebooted, but only after a complete poweroff, using the 'poweroff' command in a console;
after that, everything is working quietly again.
But all this is not very reassuring, and I don't know what to make of it.
$ grep -v '^#' mkinitcpio.conf
MODULES=""
BINARIES=""
FILES=""
HOOKS="base udev autodetect pata scsi sata filesystems usbinput shutdown"
Last edited by berbae (2012-11-29 15:15:10)
Offline
Maybe try adding fsck? Not sure...
Offline
Thank you for the answers.
This morning, at restart, the journald process again took up 100% CPU usage of one core.
When this happens it doesn't log anything to the journal files, and there are no failing services that I can see: everything works normally apart from the 100% CPU usage of the journald process. Nothing appears abnormal to me in the logged journal messages.
If I stop/start the systemd-journald service, it again hits 100% CPU usage; here are the logged messages:
nov. 30 09:42:29 arch64 systemd[1]: Stopping Journal Service...
nov. 30 09:42:29 arch64 systemd-journal[120]: Journal stopped
nov. 30 10:10:08 arch64 systemd-journal[1301]: Allowing system journal files to grow to 16.0M.
nov. 30 10:10:08 arch64 systemd-journald[120]: Sleeping for 691728286 ms
nov. 30 10:10:08 arch64 systemd-journald[120]: Sleeping for 691728286 ms
... (72 such lines)
nov. 30 10:10:08 arch64 systemd-journald[120]: Sleeping for 691728286 ms
nov. 30 10:10:08 arch64 systemd-journald[120]: Sleeping for 691728285 ms
nov. 30 10:10:08 arch64 systemd-journald[120]: Received SIGTERM
nov. 30 10:10:08 arch64 systemd[1]: Stopped Journal Service.
nov. 30 10:10:08 arch64 systemd[1]: Starting Journal Service...
nov. 30 10:10:08 arch64 systemd[1]: Started Journal Service.
nov. 30 10:10:08 arch64 systemd-journal[1301]: Journal started
I stopped the journal at 09:42 and restarted it at 10:10.
Then I stopped the journal again at 10:11, and just after that I logged out of the GUI, which caused journald to start again automatically; here are the logged messages for that:
nov. 30 10:11:16 arch64 systemd[1]: Stopping Journal Service...
nov. 30 10:11:16 arch64 systemd-journal[1301]: Journal stopped
nov. 30 10:11:43 arch64 systemd-journal[1307]: Allowing system journal files to grow to 16.0M.
nov. 30 10:11:43 arch64 systemd-journald[1301]: Sleeping for 690001483 ms
nov. 30 10:11:43 arch64 systemd-journald[1301]: Sleeping for 690001483 ms
nov. 30 10:11:43 arch64 systemd-journald[1301]: Received SIGTERM
nov. 30 10:11:43 arch64 systemd[1]: Stopped Journal Service.
nov. 30 10:11:43 arch64 systemd[1]: Starting Journal Service...
nov. 30 10:11:43 arch64 systemd[1]: Started Journal Service.
nov. 30 10:11:43 arch64 systemd-journal[1307]: Journal started
nov. 30 10:11:43 arch64 su[1100]: pam_unix(su-l:session): session closed for user root
nov. 30 10:12:47 arch64 polkitd[1082]: Registered Authentication Agent for unix-process:1312:284743 (system bus name :1.9 [/usr/bin/pkt
nov. 30 10:12:47 arch64 systemd-logind[310]: System is rebooting.
Then I rebooted at 10:12 with the 'reboot' command, and the systemd-journald service is quiet again.
And there is no failed service dumping error messages to the journal while the journald process consumes 100% CPU.
From the present observations the systemd-journald service works normally every other boot/reboot, but I don't know whether this is a rule here.
Edit at 11:58: that may indeed be the rule, because I had to power off the computer, and when I restarted it I again saw the 100% CPU usage of the journald process; after another 'reboot' command, it is quiet again.
I am stumped here presently...
Last edited by berbae (2012-11-30 11:11:48)
Offline
Thank you for the answers.
This morning, at restart, the journald process took up again 100% cpu usage of one core.
When this happens it doesn't log anything in the journal files and there is no failing services I can see: all is working normally apart from the 100% cpu usage of the journald process. Nothing appears abnormal to me in the journal logged messages.
strace is your friend.
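For what it's worth (my note, not falconindy's): strace does not have to start the process itself. With -p it attaches to an already-running PID, so a systemd-managed daemon is reachable; as root, something like strace -f -p "$(pidof systemd-journald)" -o /tmp/journald.trace would do. A tiny self-contained illustration that a running process can be located and inspected by PID:

```shell
# Start a stand-in background process (instead of journald, which needs root):
sleep 5 &
pid=$!
# /proc/<pid>/comm names the running program, as pidof would find it:
cat /proc/$pid/comm    # prints: sleep
kill "$pid"
```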
Offline
strace is used at the command line, but systemd-journald is managed by systemd, not started by me at the CLI.
And am I required to study the C source code of journald in detail to use it properly?
It seems almost impossible to master source code written by another developer well enough to correct a malfunction.
I don't think I can do that in this case, and it should not be a requirement for merely using software rather than developing it.
So what are my options now, if I cannot do what you suggest?
Offline
Is there an automated or routine way to deal with journal files left by unclean shutdowns or which contain corruption?
(Apparently the problem with my laptop shutting down, but only in my office, is not fixed... I'm not sure if this is responsible for the corruption, but probably so. It is surely responsible for the ~ files.)
Offline
@cfr: From my observations it seems that the cleaning of the journal is prevented only when the last journald shutdown was unclean: something happens when the uncleanly closed journal is opened that prevents the cleaning procedure from running; the corrupted journal file is renamed with the '.journal~' suffix and a new journal file is started, but no cleaning is executed.
But if the last shutdown was clean, the cleaning seems to work and to be applied to all the files in the journal directory, even the '.journal~' files left over from previous unclean shutdowns. And the parameters in /etc/systemd/journald.conf are used for the cleaning.
This is only what I have observed on my machine and understood myself.
It also seems that a rotation/cleaning of the journal files can be forced using the SIGUSR2 signal:
see 'man systemd-journald.service',
and the content of '/usr/lib/systemd/system/systemd-journal-flush.service', where the SIGUSR1 signal is used to force a flush.
I will post again about my situation, but for clarity I preferred to answer cfr's preceding post first.
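The signal mechanics mentioned above can be sketched. Sending the rotation signal to the real daemon would be, as root, kill -USR2 "$(pidof systemd-journald)" (per the man page cited in the thread). A self-contained demo of delivering SIGUSR2 to a process that traps it:

```shell
# Trap SIGUSR2 in this shell, then signal ourselves, as one would signal journald:
trap 'echo "got SIGUSR2 - journald would rotate its files here"' USR2
kill -USR2 $$    # the trap fires and prints the message
```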
Last edited by berbae (2012-12-01 14:58:14)
Offline
short story:
1) Since the last kernel upgrade, I don't seem to have the startup issue that put me in recovery mode anymore.
2) I returned to the default value '#MaxRetentionSec=' instead of 'MaxRetentionSec=10day' in /etc/systemd/journald.conf,
and since then I have not seen the 100% CPU usage of the journald process every other boot/reboot.
longer story:
I used the strace utility while the journald process was stuck at full usage of one core;
I let it run for 30 s and got a >40M trace file with ~185000 lines resembling these:
writev(6, [{"<46>", 4}, {"systemd-journald", 16}, {"[154]: ", 7}, {"Sleeping for 603758694 ms", 25}, {"\n", 1}], 5) = 53
epoll_wait(7, {?} 0x7fffdaf01280, 1, 603758694) = 1
read(8, "46,15231914,104321394,-;systemd-"..., 8192) = 73
So the process seemed trapped in an infinite loop, indefinitely writing the message "Sleeping for xxxxxx ms" to the journal file; most of these messages were not visible because there is a limit on the number of messages written in a given time.
Looking into the C source code of 'journald.c', I saw this part:
if (server.max_retention_usec > 0 && server.oldest_file_usec > 0) {

        /* The retention time is reached, so let's vacuum! */
        if (server.oldest_file_usec + server.max_retention_usec < n) {
                log_info("Retention time reached.");
                server_rotate(&server);
                server_vacuum(&server);
                continue;
        }

        /* Calculate when to rotate the next time */
        t = (int) ((server.oldest_file_usec + server.max_retention_usec - n + USEC_PER_MSEC - 1) / USEC_PER_MSEC);
        log_info("Sleeping for %i ms", t);
}
where it appears that the "Sleeping for" message is written only when the MaxRetentionSec parameter has a value.
So I worked around the problem by removing its value. That is not a solution, it only masks a possible bug;
I tried to study the source code, but I could not understand the exact mechanism involved, i.e. why the process was trapped in an infinite loop and why it worked every other boot/reboot.
I am sorry for not going further into the study of this dysfunction, but for now I want to do something other than this debugging task.
If someone is interested, maybe there is a bug to fix here in the code...
Also, I don't know whether the absence of cleaning of the journal directory after an unclean shutdown of the systemd-journald service is intended, or whether it can be considered a dysfunction/bug...
Finally, I confirm that since I added the shutdown hook to the initial ramdisk, the shutdown of the systemd-journald service is clean,
and I no longer get '.journal~' files.
So the shutdown hook seems required for the systemd journal, even without /usr on a separate partition.
Last edited by berbae (2012-12-02 16:13:03)
Offline
@berbae, have you at least filed a bug report?
Offline
Yeah, that's a pretty clear bug. If epoll_wait() continually returns a positive number, it means there's an unhandled event on the epoll FD.
Feel free to file a bug: https://bugs.freedesktop.org/enter_bug. … ct=systemd
Offline
'SystemMaxUse' now seems to be respected, as of systemd 198-2 with the patch fixing the vacuum problem added to it.
I have:
SystemMaxUse=16M
and
$ cd /var/log/journal/
$ du -h
16M ./xxxxxxxxxxxxxxxxxxxxx (number is masked here)
16M .
Offline
I also noticed that my logs are no longer corrupted every 2 seconds (<-- exaggeration)!
I think this means that mine will end up being respected as well. For a while I was just deleting the old logs periodically to keep the size down. I also have no tildes after any of my journals. I did see some discussion on the systemd mailing list about flushing the journal to disk at some sane interval, which may have helped... I didn't read those threads closely enough to say whether that is what it was handling.
Offline