My firefox process is crashing regularly and leaving core dumps. I think it has something to do with an interaction between a few of my plugins and extensions, but that's only a theory so far.
I didn't know where the corefiles went and I noticed core_pattern was a pipe!
% cat /proc/sys/kernel/core_pattern
|/usr/lib/systemd/systemd-coredump %p %u %g %s %t %e
So systemd is taking all cores instead of just the daemons/procs it runs. This is ... definitely a bit controversial for me, but I'll worry about that later. I've found the systemd-coredumpctl util, and I can find the firefox core files inside of it:
% systemd-coredumpctl
TIME PID UID GID SIG EXE
[... snipped some lines ...]
Sun 2014-04-20 23:15:33 PDT 21858 1000 100 11 /usr/lib/firefox/firefox
Thu 2014-04-24 21:55:17 PDT 10059 1000 100 11 /usr/lib/firefox/firefox
Mon 2014-04-28 16:17:37 PDT 25162 1000 100 11 /usr/lib/firefox/firefox
Tue 2014-04-29 18:14:13 PDT 5607 1000 100 11 /usr/lib/firefox/firefox
Wed 2014-04-30 13:22:20 PDT 30645 1000 100 11 /usr/lib/firefox/firefox
Ok sweet, so it's there, let's try and use it..
% systemd-coredumpctl gdb
TIME PID UID GID SIG EXE
Wed 2014-04-30 13:22:20 PDT 30645 1000 100 11 /usr/lib/firefox/firefox
Failed to retrieve COREDUMP field: No such file or directory
This *works* for corefiles other than firefox. Just call systemd-coredumpctl gdb blah and it brings up the proper gdb session. Not so for any firefox core. My next thought was to extract the core file, in case the firefox binary was actually a shell script wrapper or something and gdb was being pointed at the wrong object.
% systemd-coredumpctl dump
TIME PID UID GID SIG EXE
Wed 2014-04-30 13:22:20 PDT 30645 1000 100 11 /usr/lib/firefox/firefox
Refusing to dump core to tty
% systemd-coredumpctl dump > ~/firefox.core
TIME PID UID GID SIG EXE
Wed 2014-04-30 13:22:20 PDT 30645 1000 100 11 /usr/lib/firefox/firefox
Failed to retrieve COREDUMP field: No such file or directory
Ok, so now I'm getting upset - let's look at the systemd-coredumpctl source code.
From: https://github.com/systemd/systemd/blob … ctl.c#L402
r = sd_journal_get_data(j, "COREDUMP", (const void**) &data, &len);
if (r < 0) {
        log_error("Failed to retrieve COREDUMP field: %s", strerror(-r));
        return r;
}
Ok, so getting the data out of the journal is failing with ENOENT, it seems. Let's look at sd_journal_get_data:
https://github.com/systemd/systemd/blob … al.c#L1956
_public_ int sd_journal_get_data(sd_journal *j, const char *field, const void **data, size_t *size) {
        JournalFile *f;
        uint64_t i, n;
        size_t field_length;
        int r;
        Object *o;

        assert_return(j, -EINVAL);
        assert_return(!journal_pid_changed(j), -ECHILD);
        assert_return(field, -EINVAL);
        assert_return(data, -EINVAL);
        assert_return(size, -EINVAL);
        assert_return(field_is_valid(field), -EINVAL);

        f = j->current_file;
        if (!f)
                return -EADDRNOTAVAIL;

        if (f->current_offset <= 0)
                return -EADDRNOTAVAIL;

        r = journal_file_move_to_object(f, OBJECT_ENTRY, f->current_offset, &o);
        if (r < 0)
                return r;

        field_length = strlen(field);

        n = journal_file_entry_n_items(o);
        for (i = 0; i < n; i++) {
                uint64_t p, l;
                le64_t le_hash;
                size_t t;

                p = le64toh(o->entry.items[i].object_offset);
                le_hash = o->entry.items[i].hash;
                r = journal_file_move_to_object(f, OBJECT_DATA, p, &o);
                if (r < 0)
                        return r;

                if (le_hash != o->data.hash)
                        return -EBADMSG;

                l = le64toh(o->object.size) - offsetof(Object, data.payload);

                if (o->object.flags & OBJECT_COMPRESSED) {
#ifdef HAVE_XZ
                        if (uncompress_startswith(o->data.payload, l,
                                                  &f->compress_buffer, &f->compress_buffer_size,
                                                  field, field_length, '=')) {
                                uint64_t rsize;

                                if (!uncompress_blob(o->data.payload, l,
                                                     &f->compress_buffer, &f->compress_buffer_size, &rsize,
                                                     j->data_threshold))
                                        return -EBADMSG;

                                *data = f->compress_buffer;
                                *size = (size_t) rsize;

                                return 0;
                        }
#else
                        return -EPROTONOSUPPORT;
#endif
                } else if (l >= field_length+1 &&
                           memcmp(o->data.payload, field, field_length) == 0 &&
                           o->data.payload[field_length] == '=') {

                        t = (size_t) l;

                        if ((uint64_t) t != l)
                                return -E2BIG;

                        *data = o->data.payload;
                        *size = t;

                        return 0;
                }

                r = journal_file_move_to_object(f, OBJECT_ENTRY, f->current_offset, &o);
                if (r < 0)
                        return r;
        }

        return -ENOENT;
}
Ok - now I'm officially lost. I'm way too inexperienced with the systemd source code to make decent progress down this route.
Either journal_file_entry_n_items returned 0 items (which would be confusing, since the entry does show up in the systemd-coredumpctl listing), or one of these calls to journal_file_move_to_object is returning -ENOENT. I can't find anywhere in its call graph where that actually happens, so I'm going to assume journal_file_entry_n_items returned 0.
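One way to check what actually made it into the journal is to bypass coredumpctl and dump the entry's fields directly. COREDUMP_PID is one of the metadata fields systemd-coredump attaches to each crash entry, so matching on it should find the entry (using the firefox PID from the listing above):

```shell
# Print every field of the journal entry for the crash of PID 30645.
# If COREDUMP_PID, COREDUMP_EXE etc. show up but there is no COREDUMP
# field, the core payload itself was never stored in the journal.
journalctl -o verbose COREDUMP_PID=30645
```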
What does this mean? How do I fix this?
As an aside - I'd like to debug my larger firefox issue without changing how corefiles are handled in a default arch install, as that seems a bit much.. but if anyone knows how to disable the systemd corefile handling on any process not launched by systemd, but keep using it for daemons (I can totally see the need for better corefile handling with these auto-started processes) please let me know!
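(The bluntest workaround I know of would be to reset core_pattern globally - a sketch, though it changes the default for daemons too, since core_pattern is a single system-wide setting:)

```shell
# Revert core_pattern to the kernel default: write a plain file named
# "core" in the crashing process's cwd. Needs root; lost on reboot.
# To persist it, put kernel.core_pattern=core in a file under /etc/sysctl.d/.
sysctl -w kernel.core_pattern=core
```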
Last edited by codemac (2014-04-30 23:38:30)
Offline
I want to mention that I *still* haven't fixed this!
I'm at a loss as to what to do, and this bug is still present in systemd-coredumpctl. Would any Arch devs like to clarify why we're using this broken piece of systemd for the time being? Or am I overreacting, and I'm the only one who's ever seen this?
If I fix this I will post back here, but it's looking pretty unlikely.
Offline
Exact same problem here with qemu.
[root@snuggles alexis]# systemd-coredumpctl gdb 2366
TIME PID UID GID SIG EXE
Wed 2014-06-04 21:57:01 MDT 2366 0 0 11 /usr/bin/qemu-system-x86_64
Failed to retrieve COREDUMP field: No such file or directory
Offline
Well, here's another "me too." With chrome this time:
bspar@bspararch:/x/src/ > out/Release/chrome --user-data-dir=/home/bspar/Documents/tmp --no-sandbox | tools/valgrind/asan/asan_symbolize.py| c++filt
...
zsh: abort (core dumped) out/Release/chrome --user-data-dir=/home/bspar/Documents/tmp |
zsh: done                 tools/valgrind/asan/asan_symbolize.py | c++filt
bspar@bspararch:/x/src/ > sudo systemd-coredumpctl gdb 6061
TIME PID UID GID SIG EXE
Fri 2014-06-06 00:15:38 EDT 6061 1000 1000 6 /x/src/out/Release/chrome
Failed to retrieve COREDUMP field: No such file or directory
Offline
I am not an expert, and have only played with core dumps from crashware that I write. It seems the three programs mentioned here probably have really big core files. Perhaps the system does not keep really big cores by default?
Nothing is too wonderful to be true, if it be consistent with the laws of nature -- Michael Faraday
Sometimes it is the people no one can imagine anything of who do the things no one can imagine. -- Alan Turing
---
How to Ask Questions the Smart Way
Offline
Yep, looks like that's the problem. journalctl:
Jun 06 01:11:25 bspararch systemd-coredump[10820]: Core too large, core will not be stored.
Jun 06 01:11:25 bspararch systemd-coredump[10820]: Process 10814 (chrome) dumped core.
I don't know how I missed that before... Thanks for the tip
Now, how would I go about fixing that? I've done a little research on my own, but I'll have to look into it more tomorrow - it's getting late over here. I'm not too familiar with ulimit, but I'm still having problems even with this configuration:
bspar@bspararch:/x/BITS/src/ > ulimit -a
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) unlimited
-c: core file size (blocks) unlimited
-m: resident set size (kbytes) unlimited
-u: processes 2000
-n: file descriptors 4096
-l: locked-in-memory size (kbytes) 64
-v: address space (kbytes) unlimited
-x: file locks unlimited
-i: pending signals 94126
-q: bytes in POSIX msg queues 819200
-e: max nice 20
-r: max rt priority 0
-N 15: unlimited
Offline
Looks like systemd is overriding the login ulimit settings - which is completely unexpected. (Then again, core(5) says that when core_pattern is a pipe the kernel doesn't enforce RLIMIT_CORE on the piped dump at all, so the limit must be coming from systemd-coredump itself.)
Looking at more systemd-coredump source..
Offline
In coredump.c:
/* Make sure to not make this larger than the maximum journal entry
* size. See ENTRY_SIZE_MAX in journald-native.c. */
#define COREDUMP_MAX (767*1024*1024u)
assert_cc(COREDUMP_MAX <= ENTRY_SIZE_MAX);
FUCKING FUCK ASS.
Ok. Looks like they don't store anything bigger than 767 MiB. It's a compiled-in value; there's no way to change it. So now I have to make sure no deployed systems are relying on this fucking coredumpctl bullshit.. because, nerd-alert: most Linux systems running today have more than one process at a time with that much mapped memory.
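For scale, the cap works out to just under 768 MiB:

```shell
# COREDUMP_MAX = 767*1024*1024 bytes; systemd-coredump silently refuses
# to store any core bigger than this in the journal (that's the
# "Core too large, core will not be stored." line above).
echo $((767 * 1024 * 1024))    # prints 804257792
```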
Offline
https://bugs.archlinux.org/task/40737
I waited a day and calmed down, and filed this bug.
Offline
codemac wrote:
    https://bugs.archlinux.org/task/40737
    I waited a day and calmed down, and filed this bug.
Thank you. On both counts.
I'm glad I nudged you in the correct direction. Well done on researching the code.
Offline
Thanks codemac and ewaller!
Offline
As a heads up, this is fixed in systemd-214-2.
cat'ing /proc/sys/kernel/core_pattern should just show the word 'core' again.
Offline
Kudos and cookies and thumbs up for everyone involved here!
Makes me a proud Arch'er, with good practice examples like this one
. Main: Intel Core i5 6600k @ 4.4 Ghz, 16 GB DDR4 XMP, Gefore GTX 970 (Gainward Phantom) - Arch Linux 64-Bit
. Server: Intel Core i5 2500k @ 3.9 Ghz, 8 GB DDR2-XMP RAM @ 1600 Mhz, Geforce GTX 570 (Gainward Phantom) - Arch Linux 64-Bit
. Body: Estrogen @ 90%, Testestorone @ 10% (Not scientific just out-of-my-guesstimate-brain)
Offline