You are not logged in.

#1 2013-03-04 02:54:29

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 7,130

journald corruption

There have been several threads about this but the usual diagnosis seems to be that corruption is caused by lack of a shutdown hook or by unclean shutdowns of the system.

I'm seeing regular journal corruption even when I've had no unclean shutdowns and despite using the shutdown hook.

My question is whether this is considered "normal". That is, I assume it is not considered how things are meant to work but if it is a known issue experienced by everyone or a number of people, that's one thing. If not, I would like to know how I might investigate what might be causing it. Since the files are binary, I'm not sure how to tell what is occurring at the points journald identifies as problematic or quite what I'm meant to do with the information journalctl reports as the nature of the corruption of any particular file.

As far as I can tell, this isn't part of file system corruption more generally. (I see fsck clean stuff if I get an unclean shutdown, of course, but that's only expected. Otherwise everything seems OK.) That is, I don't think it is a hardware problem - at least, not a disk problem - or a general problem with the filesystem.

If this is "normal" is there any service which will remove corrupted journal files automatically? Checking for them and removing them is now routine but I don't see much good in doing it manually since the information I get about the corruption doesn't help to resolve the underlying problem.

Of course, I wouldn't want to bury the evidence if this is not expected as then I'd like to address the underlying cause if possible.

Note: Please refrain from turning this into a discussion of the relative merits or demerits of systemd. Experience unfortunately suggests it is not safe to take such courtesy for granted.


CLI Paste | How To Ask Questions

Arch Linux | x86_64 | GPT | EFI boot | refind | stub loader | systemd | LVM2 on LUKS
Lenovo x270 | Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz | Intel Wireless 8265/8275 | US keyboard w/ Euro | 512G NVMe INTEL SSDPEKKF512G7L

Offline

#2 2013-03-04 10:40:28

berbae
Member
From: France
Registered: 2007-02-12
Posts: 1,302

Re: journald corruption

There are these bug reports with status NEW:
journald doesn't rotate uncleanly closed files : last change 2012-12-20
systemd-journald eats 100% CPU : last change 2013-02-27
But they are not even assigned...

But I could see nothing about log file corruptions. Maybe you can try to open a new bug;
or post in the mailing list where developers are active and answer posts.

You could also use the 'SplitMode' parameter of /etc/systemd/journald.conf to try to see where the corruption comes from, user login or system.

Last edited by berbae (2013-03-04 10:41:02)

Offline

#3 2013-03-04 15:44:06

ninian
Member
From: United Kingdom
Registered: 2008-02-24
Posts: 726
Website

Re: journald corruption

cfr wrote:

There have been several threads about this but the usual diagnosis seems to be that corruption is caused by lack of a shutdown hook or by unclean shutdowns of the system.

I'm seeing regular journal corruption even when I've had no unclean shutdowns and despite using the shutdown hook.

My question is whether this is considered "normal" ...

Several times I have deleted my journals and started afresh, but still on all my systems there are numerous messages about logs being corrupted when I issue the command:

journalctl --verify

I really don't think there is anything much amiss - there are no other signs of trouble on my systems, but I will have a look at berbae's suggestions (eg SplitMode) to see whether I can pin things down a bit. Just for the record, in case anyone else is curious, here's the kind of message I get:

FAIL: /var/log/journal/673a82dd9c2a93960ce98f2a49d8c441/system@9c8ac2c72d85411b95f30b8971a4da8c-000000000000088c-0004d66756569b6c.journal (Bad message)
Invalid object contents at 296864██████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  49%
File corruption detected at /var/log/journal/673a82dd9c2a93960ce98f2a49d8c441/system@9c8ac2c72d85411b95f30b8971a4da8c-0000000000000e29-0004d67ae7f00466.journal:296864 (of 303104, 97%).

Curious ...

Offline

#4 2013-03-04 17:08:38

sano
Member
Registered: 2012-02-11
Posts: 114

Re: journald corruption

berbae wrote:

You could also use the 'SplitMode' parameter of /etc/systemd/journald.conf to try to see where the corruption comes from, user login or system.

'SplitMode' is set to 'login' by default, so I went through the output of 'journactl --verify'. All my corrupted files come from system.

When this problem first appeared (after enabling persisten storage) I remember a message flashing by sometimes on reboot/shutdown about "too many arguments" or "argument too long", which I never was able to find in the logs. I tried to catch those messages by shutting down with 'systemctl halt' for a while, but gave up on this after some unsuccessful attempts.

Offline

#5 2013-03-04 22:07:02

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 7,130

Re: journald corruption

Yes, those are the type of messages. I cleaned up yesterday, but I already have another 2 files corrupted:

...
Invalid object contents at 84064░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░   0%
File corruption detected at /var/log/journal/997d1b9ffaa8efe300b8ff4100003004/system@2e9601889caf4e4ea08de54ea07ba3bf-0000000000015d73-0004d7096aeede08.journal:84064 (of 86016, 97%).
FAIL: /var/log/journal/997d1b9ffaa8efe300b8ff4100003004/system@2e9601889caf4e4ea08de54ea07ba3bf-0000000000015d73-0004d7096aeede08.journal (Bad message)
...
Invalid object contents at 38352░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░   0%
File corruption detected at /var/log/journal/997d1b9ffaa8efe300b8ff4100003004/system@2e9601889caf4e4ea08de54ea07ba3bf-0000000000015dd0-0004d709b66ab773.journal:38352 (of 712704, 5%).
FAIL: /var/log/journal/997d1b9ffaa8efe300b8ff4100003004/system@2e9601889caf4e4ea08de54ea07ba3bf-0000000000015dd0-0004d709b66ab773.journal (Bad message)
...

It is annoying because the journal does not seem to manage the amount of disk space it uses correctly if corrupt logs are present.

I am very pleased to here that others are seeing similar issues with no signs of any other trouble. That is, I'm not glad you are having issues per se but I am pleased that it suggests some sort of bug rather than, say, that my hard drive is dying!

I don't quite understand what the SplitMode being set to login by default does. All the cases I've seen start system@...  but the same is true of all my journal logs so I don't understand what is being split? Or are there other journal logs elsewhere which I'm not aware of?


CLI Paste | How To Ask Questions

Arch Linux | x86_64 | GPT | EFI boot | refind | stub loader | systemd | LVM2 on LUKS
Lenovo x270 | Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz | Intel Wireless 8265/8275 | US keyboard w/ Euro | 512G NVMe INTEL SSDPEKKF512G7L

Offline

#6 2013-03-05 00:50:06

WonderWoofy
Member
From: Los Gatos, CA
Registered: 2012-05-19
Posts: 8,414

Re: journald corruption

Just to put you one post closer to being a bit more at ease with this, I too have noticed this issue. It is rather annoying that this happens consistently.

I was just never sure what to do about it and since it east hurting anything, I just left it and lodged a note in the back of my mind to keep an eye out for a solution.

Maybe the systemd mailing list would be the best place to ask about this....

Offline

#7 2013-03-05 00:55:07

richli
Member
From: USA
Registered: 2012-03-20
Posts: 15

Re: journald corruption

I was about to make my own thread, but I'll just chime in here instead. I, too, have some journal files that are corrupted. My main problem, however, is that occasionally systemd-journal takes up 100% cpu. If I kill it, it just respawns and still uses 100% cpu.

I'm not sure what triggers it, it may only happen after I resume from suspend/hibernation.

Offline

#8 2013-03-05 01:05:24

WonderWoofy
Member
From: Los Gatos, CA
Registered: 2012-05-19
Posts: 8,414

Re: journald corruption

richli, I think that is a whole separate issue.  Corrupt journal files may or may not be a result of that, but the thread here is regarding corrupted journals without further issues.

Search the threads, there area  couple other threads with your exact issue around here.

Offline

#9 2013-03-05 03:24:23

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 7,130

Re: journald corruption

WonderWoofy wrote:

Just to put you one post closer to being a bit more at ease with this, I too have noticed this issue. It is rather annoying that this happens consistently.

I was just never sure what to do about it and since it east hurting anything, I just left it and lodged a note in the back of my mind to keep an eye out for a solution.

Maybe the systemd mailing list would be the best place to ask about this....

Thank you. That is reassuring - especially coming from somebody who seems as well-versed in systemd as you are.

Perhaps the systemd mailing list would be a good place. I'll procrastinate a little more just in case somebody happens along with a solution but I am beginning to think this must be a bug. At least I can rule out any link with my ongoing hardware issues.


CLI Paste | How To Ask Questions

Arch Linux | x86_64 | GPT | EFI boot | refind | stub loader | systemd | LVM2 on LUKS
Lenovo x270 | Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz | Intel Wireless 8265/8275 | US keyboard w/ Euro | 512G NVMe INTEL SSDPEKKF512G7L

Offline

#10 2013-03-08 22:44:48

ninian
Member
From: United Kingdom
Registered: 2008-02-24
Posts: 726
Website

Re: journald corruption

Adding the 'shutdown' hook to /etc/mkinitcpio.conf does seem to be working for me now on my main fully updated system. No corruption has occurred since I added the hook several days ago.
Will now persevere with other systems ... fingers crossed.

Offline

#11 2013-03-08 23:40:20

WonderWoofy
Member
From: Los Gatos, CA
Registered: 2012-05-19
Posts: 8,414

Re: journald corruption

Yeah, I have the shutdown hook and corruption seems to happen to my journals more often than not.  I hope it works for you ninian, but don't be surprised if it doesn't.

Offline

#12 2013-03-09 00:21:21

tekstryder
Member
Registered: 2013-02-14
Posts: 114

Re: journald corruption

Another me-too for the OP. Journals are regularly corrupted without unclean shutown instances. Conversly, an unclean shutdown does not always result in corrupt journals.

As the size limit (SystemMaxUse) is ignored, apparently due to the corrupt journal files, I regularly stop the journald service, delete all but the current system and user log files and restart the service. I've never seen the journal files grow to the 20MB limit I've set sans corruption, so I don't know if they would ever be properly rotated.

All other systemd/journal functions are nominal. It would be nice to not need to babysit journald though!

Offline

#13 2013-03-09 00:30:05

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 7,130

Re: journald corruption

I've started deleting them without even stopping the service. As I understand it, systemd starts a new journal when corruption occurs so I assume it cannot be accessing them at the time.


CLI Paste | How To Ask Questions

Arch Linux | x86_64 | GPT | EFI boot | refind | stub loader | systemd | LVM2 on LUKS
Lenovo x270 | Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz | Intel Wireless 8265/8275 | US keyboard w/ Euro | 512G NVMe INTEL SSDPEKKF512G7L

Offline

#14 2013-03-09 05:22:25

WonderWoofy
Member
From: Los Gatos, CA
Registered: 2012-05-19
Posts: 8,414

Re: journald corruption

I thought that it started a new journal upon corruption as well, but the last time I cleaned them out, I did a verify after deleting everything but the present journal and found that the only remaining (active) journal was actually corrupt!  So I haven't a clue what it is that we do or do not do to cause this.  Keep in mind though that if you want to force rotation, you can send it the -USR2 signal with kill[all].

Offline

#15 2013-03-09 23:30:18

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 7,130

Re: journald corruption

I've been wondering about just automating it with a cron job. (Though it would be wonderful to do it as a systemd service, I think.) In that case, I would stop journald and restart it. However, I hesitate to actually do this in case there are compelling reasons not to. It is just that the corrupt files seem to be useless and given no useful information whatsoever. They just stop the journal limiting the space it eats up. Not sure...

This is the biggest downside of systemd for me. I didn't get this problem with plain text log files! (And I could read them with the most basic tools - less, cat, more, whatever.)


CLI Paste | How To Ask Questions

Arch Linux | x86_64 | GPT | EFI boot | refind | stub loader | systemd | LVM2 on LUKS
Lenovo x270 | Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz | Intel Wireless 8265/8275 | US keyboard w/ Euro | 512G NVMe INTEL SSDPEKKF512G7L

Offline

#16 2013-03-10 01:10:54

WonderWoofy
Member
From: Los Gatos, CA
Registered: 2012-05-19
Posts: 8,414

Re: journald corruption

Yeah, I understand what you mean by "the biggest downside of systemd" as it is just kind of frustrating that this is one area in which you aren't provided any clear information about what is going awry.

When I set up systemd on my home media server, I installed syslog-ng as I didn't fully trust binary logs either.  Interestingly though, I set the logs to persistent on my server for a bit and they don't seem to corrupt.  But since I already have syslog-ng running perfectly fine, I just set it back to volatile after a few days.  Though this machine is not shutdown unless a kernel update occurs though (which has been dreadful lately because it is LVM2, but that is a whole different story).

Offline

#17 2013-03-10 16:46:43

berbae
Member
From: France
Registered: 2007-02-12
Posts: 1,302

Re: journald corruption

I have not found a solution, but I looked into the source code of the journalctl verify feature.

The 'Invalid object contents at xxx' message comes from the 'journal_file_object_verify()' function in 'journal-verify.c' source code;
it happens at the very beginning of the verification procedure;
here are the comments regarding that, in the source code:

        /* First iteration: we go through all objects, verify the
         * superficial structure, headers, hashes. */

and in the 'journal_file_object_verify()' function which is called for each objects in a journal file:

        /* This does various superficial tests about the length and
         * possible field values. It does not follow any references to
         * other objects. */

Inside that function there are 23 possible causes to generate an error resulting in the ' Invalid object contents at xxx' message;
but no debug feature is integrated to tell which one is implicated.

I try to find a way to show the entire object in the position xxx in a journal file, so as to see something corrupted there; but I don't see how to use that value; if someone would mind to help me with that, I would be glad...
or to add debug outputs in the 'journal_file_object_verify()' function...

I'm seeking ideas to study and possibly contribute to clarify that issue.

Last edited by berbae (2013-03-10 16:52:06)

Offline

#18 2013-03-10 22:47:40

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 7,130

Re: journald corruption

Maybe the systemd mailing list would be a good place to ask about that? That would also tell you if anybody else is working on the issue.


CLI Paste | How To Ask Questions

Arch Linux | x86_64 | GPT | EFI boot | refind | stub loader | systemd | LVM2 on LUKS
Lenovo x270 | Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz | Intel Wireless 8265/8275 | US keyboard w/ Euro | 512G NVMe INTEL SSDPEKKF512G7L

Offline

#19 2013-03-11 14:22:52

Lekensteyn
Member
From: Netherlands
Registered: 2012-06-19
Posts: 192
Website

Re: journald corruption

FYI, journald does not flush its contents to disk unless you rotate logs (USR2) or shut down normally. I lost some kernel logs due to this fact, see http://unix.stackexchange.com/q/67394/8250 for details (and the Arch Wiki article that is linked in it: https://wiki.archlinux.org/index.php/Rs … _messages)

Offline

#20 2013-03-11 15:41:27

berbae
Member
From: France
Registered: 2007-02-12
Posts: 1,302

Re: journald corruption

That's interesting, thanks for the infos Lekensteyn;
but the log file corruptions apparently happen during normal operations of journald, and are then found in already flushed and rotated log files.
So I don't find this no regular flush issue evidently related to the thread subject; at least it seems to me...

Nevertheless it shows that the systemd-journald log service has several serious limitations presently, and needs many improvements for future releases.

Offline

#21 2013-07-07 13:01:31

zbyszek
Member
Registered: 2013-07-07
Posts: 2

Re: journald corruption

It'd be great if you could try with recent version of systemd (e.g. 205). You can run journalctl directly from the build directory, without installation, if that is easier. It is possible that the "curruptions" you are seeing are just journalctl being overzealous.

Offline

#22 2013-07-07 14:23:01

WonderWoofy
Member
From: Los Gatos, CA
Registered: 2012-05-19
Posts: 8,414

Re: journald corruption

zbyszek wrote:

It'd be great if you could try with recent version of systemd (e.g. 205). You can run journalctl directly from the build directory, without installation, if that is easier. It is possible that the "curruptions" you are seeing are just journalctl being overzealous.

Zbigniew, are you saying to grab the systemd 205 sources, build it (but don't make install), then just point systemd-journald.service to that built binary instead of /lib/systemd/systemd-journald?

Offline

#23 2013-07-07 14:45:16

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 7,130

Re: journald corruption

I can't test this right now because I am curiously free from corrupt journal files for once (with systemd 204).

I thought that Zbigniew was just suggesting running e.g. journalctl --verify from the build directory. That is, journalctl 205 might not see files as corrupt which journalctl 204 identifies as corrupt if 205 is less zealous than 204. That's why I'm thinking I'm not currently in a position to test.


CLI Paste | How To Ask Questions

Arch Linux | x86_64 | GPT | EFI boot | refind | stub loader | systemd | LVM2 on LUKS
Lenovo x270 | Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz | Intel Wireless 8265/8275 | US keyboard w/ Euro | 512G NVMe INTEL SSDPEKKF512G7L

Offline

#24 2013-07-07 15:30:20

WonderWoofy
Member
From: Los Gatos, CA
Registered: 2012-05-19
Posts: 8,414

Re: journald corruption

The new systemd breaks user sessions for now (while the cgroup slice stuff gets worked out)... so I was a bit apprehensive about trying it.  But if I can run journald from the build directory, then I think I would have no problem testing this out.  Maybe I can get to it this afternoon.

The way I read Zbigniew's post was that journald was being overzealous is the way it functioned in general... not in the way it was simply checking for corruption.  This is why I was asking, as I was really not sure what he meant by that exactly. 

But in any case, I think that either way should not be terribly hard to try.  In fact, my system is still set up to use "startx" if I please, so maybe I will just upgrade to 205 and see what happens.  The worst that can happen is that I come back to 204 (and with btrfs snapshots... there is no risk!).

Offline

#25 2013-07-07 15:59:00

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 7,130

Re: journald corruption

Are you seeing corruption with 204, then? I figured that was necessary for comparison purposes.


CLI Paste | How To Ask Questions

Arch Linux | x86_64 | GPT | EFI boot | refind | stub loader | systemd | LVM2 on LUKS
Lenovo x270 | Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz | Intel Wireless 8265/8275 | US keyboard w/ Euro | 512G NVMe INTEL SSDPEKKF512G7L

Offline

Board footer

Powered by FluxBB