Memory allocation errors on 6.1.1

eightball · 2022-12-30 13:52:32

When trying to start ghci via stack on a ~1k module Haskell project I repeatedly get a segfault while trying to allocate 4 KiB of memory.

This only happens on kernel 6.1.1. The problem goes away if I roll back to 6.0.12.

I've definitely got free memory available (~40GB available)

The error is repeatable but occurs at inconsistent times during the REPL launching (i.e. at different modules).

A full compilation (stack build) works fine.

I have smaller Haskell projects which don't exhibit this problem so I'm having a hard time producing a minimal example. It could be a GHC bug, I suppose, but the kernel seems suspect since it only happens on 6.1.1

The segfault in dmesg looks like

ghc:w[648474]: segfault at 0 ip 00007f4df1657f64 sp 00007f4cf1ff6de8 error 6 in libHSghci-9.0.2-ghc9.0.2.so[7f4df14b9000+242000] likely on CPU 8 (core 8, socket 0)

and from ghc

ghc: mmap 4096 bytes at (nil): Cannot allocate memory
ghc: Try specifying an address with +RTS -xm<addr> -RTS
Segmentation fault (core dumped)

I've run a full memtest86+ test and all my RAM is fine.

CPU: AMD Ryzen 9 5950X 16-Core Processor

My next step is to try this against the linux-mainline from AUR (at 6.1.1) and then report it once I know if it's Arch specific or not.

Other suggestions are welcome.

eightball · 2023-01-05 14:31:43

This occurs on 6.1.0-mainline, too.

I guess I'll try to bisect to determine which kernel version it was introduced in so I can submit a useful bug to the kernel tracker.

eightball · 2023-01-05 14:32:42

Also worth noting that this occurs on a teammate's machine and my laptop under the same circumstances.

My laptop is an i7 so I suppose it isn't something AMD specific.

seth · 2023-01-06 10:00:30

ghc:w[648474]: segfault at 0 ip

nullptr dereference.

Not sure what or how the kernel update causes this, but it's not a memory issue and given it's haskell and

A full compilation (stack build) works fine.

it's probably just haskell
Also a rebuild seems on the way, https://archlinux.org/packages/communit … /ghc-libs/

(Do NOT enable the community-staging repo!!!)
https://wiki.archlinux.org/title/Offici … positories

eightball · 2023-01-07 19:28:51

I'm not using the Arch haskell libs for building anything - this happens with GHC from ghcup installed directly or via the docker images I normally use for building.

I agree this feels like a Haskell issue but it only occurs on Linux 6.1 and up. I went back to 6.0 on mainline and the issue goes away. On arch 6.1.x, mainline 6.1.x or zen 6.1.x the issue happens consistently. The issue also goes away with linux-lts.

I'll start with reporting this to GHC, though. While the kernel is the only thing I'm changing it's hard to imagine what the kernel could be doing here that's only causing GHC to freak out.

artisdom · 2023-01-12 06:57:04

When playing with "hyper-haskell", I encountered some similar errors.

Got Haskell expression, evaluating
hyper-haskell-server: mmap 73728 bytes at (nil): Cannot allocate memory
hyper-haskell-server: Try specifying an address with +RTS -xm<addr> -RTS
Wrote result
Waiting for Haskell expression
UnknownError "loadObj \"/home/user/.ghcup/ghc/8.8.4/lib/ghc-8.8.4/transformers-0.5.6.2/HStransformers-0.5.6.2.o\": failed"

$ uname -a
Linux arch 6.1.4-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 07 Jan 2023 15:10:07 +0000 x86_64 GNU/Linux
$ stack --version
Version 2.9.1, Git revision 409d56031b4240221d656db09b2ba476fe6bb5b1 x86_64 hpack-0.35.0
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 8.8.4

Downgrading the kernel from 6.1.4 to 6.0.8 fixed the error; not sure what's causing it.

$ uname -a
Linux arch 6.0.8-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 10 Nov 2022 21:14:24 +0000 x86_64 GNU/Linux

Last edited by artisdom (2023-01-12 07:12:36)

swt2c · 2023-02-02 22:44:35

Hi all, seeing similar issues on Debian with kernel 6.1 and GHC (but downgrading to 6.0 resolves the issue). Did anyone report this to GHC? I couldn't find an existing bug.

eightball · 2023-02-06 23:38:33

FYI I'm experiencing this on GHC 9.0.2. I haven't tried it on another GHC as it's some work to move the project in question there and it doesn't happen on smaller projects I have.

I haven't submitted a GHC bug with this - I haven't had time to collect enough details on this to submit a meaningful bug report.

swt2c · 2023-02-07 21:28:06

Yes, GHC 9.0.2 here as well.

So, I set out to bisect the kernel between 6.0 and 6.1 to see if that might point to a clue for GHC. However, I'm now observing that this problem seems to be fixed (or at least I can't reproduce it) in kernel 6.1.10. I can still reproduce the problem with 6.1.9. Can anyone else confirm this?

What's weird is that I can't find anything obviously relevant in the kernel changelog...

eightball · 2023-02-11 14:44:34

Shoot, I replicated it with whatever the latest kernel was yesterday but I didn't pay attention to the minor version.

I have also noticed that it does not occur on GHC 9.4. I haven't tried 9.2, though.

eightball · 2023-02-11 16:15:22

I do get the segfault on GHC 9.2

janust · 2023-03-17 23:42:22

TL;DR It's a kernel bug. Not fixed in 6.1.10 (and most likely not in 6.2.7 either).

I have copied the content of the pastes below to the bottom of this post.

22:59 -!- Topic for #ghc: GHC Development | GHC 9.4.4 Released! | GitLab: https://gitlab.haskell.org/ghc | Please ask user questions in #haskell
22:59 -!- Topic set by bgamari[m] [] [Sat Dec 24 23:40:52 2022]
22:59 -!- Irssi: #ghc: Total of 149 nicks [0 ops, 0 halfops, 0 voices, 149 normal]
22:59 -!- Channel #ghc created Wed May 19 15:50:35 2021
22:59 -!- Irssi: Join to #ghc was synced in 8 secs
23:00 <........... janus> is there a way to make ./hadrian/ghci load modules interpreted instead of compiled?
23:01 <........... janus> i am trying to trigger a crash which i think only happens if you load many interpreted modules
23:09 <........... int-e> janus: afaics it uses -fno-code to prevent compilation (while still generating .hi files)
23:12 <........... int-e> Oh I did get a segfault, let's see if I can do it again from scratch.
23:12 <........... janus> oooh great!
23:12 <........... janus> on the ghc codebase?
23:12 <........... int-e> yeah
23:19 <........... int-e> janus: So what I did was run `hadrian/ghci`, extract the command line using `ps axuwwwwww | grep ghc` (surely there's a better way) and then drop the -fno-code, and run the resulting command. And that seems to crash around 100-200 modules loaded, just as you said.
23:20 <.... geekosaur[m]> edit hadrian/ghci-cabal (or hadrian/ghci-cabal.in and regenerate)
23:20 <........... janus> i made 1000 modules with 'f$i :: (); f$i = ()' and that loads with no crash
23:21 <........... janus> but great that you reproduced with ghc!
23:21 <........... int-e> So the call that fails for me is an ordinary-looking `mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0)`
23:29 <.......... adamse> int-e: that call returns error? or segfaults? MAP_32BIT is probably a limited resource
23:30 <........... int-e> Ah, that is an excellent point.
23:32 <........... int-e> That said though, I can easily satisfy 64k such requests on a 5.10 kernel and my 6.1 kernel fails after 1k - 4k or so.
23:34 <........... int-e> janus: http://paste.debian.net/1274452/ if you want to see that in action. There seems to be some address randomization going on now? The old kernel happily hands out consecutive addresses; the new one is all over the place.
23:35 <........... int-e> Oh no, that is because it's filling gaps first...
23:38 <........... int-e> To clarify, even on the old kernel the first allocations are all over the place, seemingly random. But at some point it switches from that behavior to filling up memory all the way to 0x7ffff000. And that no longer happens with the 6.1 kernel. Instead it just stops there.
23:43 <........... int-e> It's even better, retrying the `mmap` call generally succeeds.
23:48 <........... int-e> LD_PRELOAD-ing this ridiculous hack http://paste.debian.net/1274454/ makes it load all 775 modules for me.
23:49 <........... int-e> janus: At this point it's safe to say that this is a kernel bug.
23:51 <........... janus> int-e: thanks for researching! very interesting
23:55 <........... janus> int-e: so you think it is a mere coincidence that it happens to work on ghc 9.4? that said, i havn't verified myself, maybe it is possible to trigger there also, just with lower probability
23:57 <........... int-e> Hmm
23:58 <........... janus> but yeah, given that it fails first and then succeeds, that doesn't seem sensible
23:58 <........... int-e> I'll try what happens for me with 9.4.4
23:59 <........... int-e> (and if it works, whether I can spot a difference in the `mmap` calls)
Day changed to 18 Mar 2023
00:11 <........... int-e> 9.4.4 uses MAP_32BIT just as heavily but passes preferred addresses to `mmap` rather than NULL.
00:24 <........... int-e> janus: Oh a lot of things happened there on the ghc side, for various reasons, I'll stop digging. But yes, 9.4.4 is different in its use of mmap and that's likely why it works on Linux 6.1.
00:28 <........... int-e> The issue has been reported to LKML, though with little discussion so far: https://marc.info/?t=167777133200002&r=1&w=2
00:30 <........... janus> oh excellent! so they know about it
00:30 <........... janus> can i copy this discussion to the arch forum? because people would wanna know
00:32 <........... int-e> sure
00:33 <........... int-e> beware that my pastes expire in 90 days... you may want to duplicate them for use outside of IRC (which I consider to be ephemeral)
00:33 <........... int-e> (though hopefully this whole topic will be mostly irrelevant by then)

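Paste 1 (http://paste.debian.net/1274452/), the mmap test program:

#include <sys/mman.h>
#include <stddef.h>
#include <stdio.h>

/* Request up to 65536 single-page MAP_32BIT mappings and print each address.
   On the first failure, print how many requests succeeded and stop. */
int main(void)
{
    for (int i = 0; i < (1 << 16); i++) {
        void *p = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0);
        printf("%p\n", p);
        if (p == MAP_FAILED) {  /* MAP_FAILED is (void *)-1 */
            printf("%d\n", i);
            return 0;
        }
    }
    return 0;
}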

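Paste 2 (http://paste.debian.net/1274454/), the LD_PRELOAD hack:

/*
Build: gcc -O -Wall -fPIC -shared -o mmap.so mmap.c -ldl
Use:   LD_PRELOAD=./mmap.so <command>
*/

#define _GNU_SOURCE
#define _XOPEN_SOURCE

#include <sys/mman.h>
#include <dlfcn.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

/* Pointer to the real mmap, resolved with dlsym(RTLD_NEXT, ...). */
static void *(*orig_mmap)(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

static void init(void) __attribute__((constructor));

static void init(void)
{
    orig_mmap = dlsym(RTLD_NEXT, "mmap");
}

/* Wrapper: retry a failed mmap up to three times, since an identical
   retry generally succeeds on the affected kernels. */
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)
{
    if (!orig_mmap)
        init();  /* in case mmap is called before the constructor has run */
    void *p = orig_mmap(addr, length, prot, flags, fd, offset);
    int retry = 0;
    while (retry < 3 && p == MAP_FAILED) {
        retry++;
        p = orig_mmap(addr, length, prot, flags, fd, offset);
    }
    return p;
}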
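
For anyone curious about the 9.4.4 difference mentioned in the log (it passes preferred addresses to `mmap` rather than NULL), here is a rough sketch of what a hinted call looks like. This is not GHC's code: the names `map_low_with_hint`, `map_pages_hinted` and `LOW_HINT_START` are made up for illustration, and the starting hint is just an arbitrary address inside the first 2 GiB, which is where MAP_32BIT places mappings on x86-64.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#ifndef MAP_32BIT
#define MAP_32BIT 0  /* x86-64 only; elsewhere this degrades to a plain hinted mmap */
#endif

/* Hypothetical starting hint, chosen only for illustration: an address
   inside the first 2 GiB of the address space. */
#define LOW_HINT_START ((uintptr_t)0x40000000)

/* Map `length` bytes of anonymous low memory, passing `hint` as the
   preferred address instead of NULL. Without MAP_FIXED the hint is only
   advisory, so the caller must use the address that is returned. */
void *map_low_with_hint(void *hint, size_t length)
{
    assert(length > 0);
    return mmap(hint, length, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
}

/* Map up to `count` pages, moving the hint just past each successful
   mapping. Returns the number of pages actually mapped. */
int map_pages_hinted(int count)
{
    uintptr_t hint = LOW_HINT_START;
    int mapped = 0;
    for (int i = 0; i < count; i++) {
        void *p = map_low_with_hint((void *)hint, 4096);
        if (p == MAP_FAILED)
            break;
        hint = (uintptr_t)p + 4096;  /* next preferred address */
        mapped++;
    }
    return mapped;
}
```

Calling `map_pages_hinted(1 << 16)` from a small `main` and comparing the count with the NULL-hint test program would show whether hinting changes anything on a given kernel. The sketch itself does not establish that; the log only says the different use of mmap is likely why 9.4.4 works.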

loqs · 2023-03-17 23:52:17

If I understand correctly the fix is https://git.kernel.org/pub/scm/linux/ke … 079a7ea480 which is not in mainline yet.

Last edited by loqs (2023-03-18 00:54:44)

janust · 2023-03-19 18:24:15

loqs wrote:

If I understand correctly the fix is https://git.kernel.org/pub/scm/linux/ke … 079a7ea480 which is not in mainline yet.

I applied this patch and it did fix the issue as reproduced on the GHC repo. Thanks for the link.

loqs · 2023-03-28 21:34:10

The fix is queued for 6.2.9 https://git.kernel.org/pub/scm/linux/ke … 1deb8d7db0

nuclearpidgeon · 2023-04-07 15:02:08

I ran into this issue just today when trying to bring up an old project that makes use of a build tool written in Haskell that does a fair bit of dynamic compilation (see [1]). Output was as follows:

Scanning directory tree...
Creating Makefile...
Evaluating 79 Hakefiles...
hake: mmap 4096 bytes at (nil): Cannot allocate memory
hake: Try specifying an address with +RTS -xm<addr> -RTS
include/Hakefile:
loadArchive "/usr/lib/ghc/base-4.9.1.0/libHSbase-4.9.1.0.a": failed
CallStack (from HasCallStack):
  error, called at libraries/ghci/GHCi/ObjLink.hs:91:21 in ghci-8.0.2:GHCi.ObjLink

I can confirm that the issue was present on kernel 6.2.8 and that updating to 6.2.9 fixed it.

Thanks all for documenting everything. I was able to find this post easily and get to a fix.

[1] The program in question is the 'Hake' build tool that's part of the toolchain for the Barrelfish research operating system, which slurps up many 'Hakefiles' to dynamically evaluate them all into one big Haskell expression that gets mapped/translated out into a big Makefile. https://github.com/BarrelfishOS/barrelf … in.hs#L206

It may not really matter, but I was running Hake inside an Ubuntu-based Docker image and still getting the issue. Containers share the host kernel, so this is consistent with it being a kernel issue and explains why the host kernel update fixed things. What a lovely little bug...

0m3 · 2023-05-04 15:52:44

Hello, all.

Ok, 19 modules loaded.
Loaded GHCi configuration from /tmp/haskell-stack-ghci/58d67de3/ghci-script
ghc: mmap 4096 bytes at (nil): Cannot allocate memory
ghc: Try specifying an address with +RTS -xm<addr> -RTS
[1]    32258 segmentation fault (core dumped)  stack ghci

❯ uname -a
Linux arch 6.3.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Mon, 01 May 2023 17:42:39 +0000 x86_64 GNU/Linux

PixelHamster · 2023-05-07 21:29:59

0m3 wrote:

Hello, all.

Ok, 19 modules loaded.
Loaded GHCi configuration from /tmp/haskell-stack-ghci/58d67de3/ghci-script
ghc: mmap 4096 bytes at (nil): Cannot allocate memory
ghc: Try specifying an address with +RTS -xm<addr> -RTS
[1]    32258 segmentation fault (core dumped)  stack ghci

❯ uname -a
Linux arch 6.3.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Mon, 01 May 2023 17:42:39 +0000 x86_64 GNU/Linux

If you can, as a workaround you can try updating your GHC to 9.4 or later:
https://gitlab.haskell.org/ghc/ghc/-/issues/19421
I updated from 9.2.7 to 9.4.4 and the issue went away.

kode54 · 2023-05-11 07:52:59

janust wrote:

TL;DR It's a kernel bug. Not fixed in 6.1.10 (and most likely neither in 6.2.7).

Seems to be broken in 6.3.0 too, even though the fix linked above is already included in the commit tree.

janust wrote:

#include <sys/mman.h>
#include <stddef.h>
#include <stdio.h>

int main()
{
    for (int i = 0; i < (1 << 16); i++) {
        void *p = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_32BIT, -1, 0);
        printf("%p\n", p);
        if (p == (void *)-1) {
            printf("%d\n", i);
            return 0;
        }
    }
}

This snippet manages to get between a dozen and a thousand allocations here before failing, completely randomly.

fommil · 2023-05-11 08:42:47

I'm also experiencing this. To those who downgraded their kernels, do you have a link to a download of a linux kernel before the issue arose? e.g. I see that folk are saying it is fixed in 6.0.8 and 6.2.9, so it seems intermittent.

Unfortunately I'm also experiencing it on the LTS kernel, 6.1.27; running LTS would otherwise have been a convenient workaround.

UPDATE: I had a copy of 6.2.9 in my pacman cache, downgrading to that, but it won't help you out if you just arrived here looking to do the same. I can confirm that it did fix it! Looking forward to when I can return to following the latest releases.

Last edited by fommil (2023-05-11 08:51:21)

seth · 2023-05-11 08:50:12

https://wiki.archlinux.org/title/Arch_Linux_Archive

N.B. if you're using any out-of-tree modules (nvidia, virtualbox, r8168, maybe some others) you'll have to either downgrade them as well or switch to the dkms version.

loqs · 2023-05-11 09:58:23

Possibly fixed by https://git.kernel.org/pub/scm/linux/ke … 7db21b67a8

fommil · 2023-05-18 14:48:17

still broken in 6.3.2

loqs · 2023-05-18 15:16:02

fommil wrote:

still broken in 6.3.2

Have you tried a kernel with https://git.kernel.org/pub/scm/linux/ke … ca1f273fe6 applied such as the one linked below?

https://drive.google.com/file/d/1uazu9w … share_link linux-6.3.2.arch1-1.2-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1tneREs … share_link linux-headers-6.3.2.arch1-1.2-x86_64.pkg.tar.zst

fommil · 2023-05-22 15:07:12

loqs wrote:

fommil wrote:
still broken in 6.3.2
Have you tried a kernel with https://git.kernel.org/pub/scm/linux/ke … ca1f273fe6 applied such as the one linked below?
https://drive.google.com/file/d/1uazu9w … share_link linux-6.3.2.arch1-1.2-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1tneREs … share_link linux-headers-6.3.2.arch1-1.2-x86_64.pkg.tar.zst

Nope, I've not got the bandwidth to apply a patch to a kernel at the moment... I'm betting on the fix being known and queued but the lack of the fix in 6.3.3 is starting to get me spooked.
