Why does my system let GHC almost take it down?

Enrico1989 · Today 13:54:28

This is the output of lscpu

Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           43 bits physical, 48 bits virtual
Byte Order:                              Little Endian
CPU(s):                                  32
On-line CPU(s) list:                     0-31
Vendor ID:                               AuthenticAMD
Model name:                              AMD Ryzen Threadripper 1950X 16-Core Processor
CPU family:                              23
Model:                                   1
Thread(s) per core:                      2
Core(s) per socket:                      16
Socket(s):                               1
Stepping:                                1
Microcode version:                       0x8001129
Frequency boost:                         enabled
CPU(s) scaling MHz:                      66%
CPU max MHz:                             3400.0000
CPU min MHz:                             2200.0000
BogoMIPS:                                6786.53
Flags:                                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev
L1d cache:                               512 KiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                8 MiB (16 instances)
L3 cache:                                32 MiB (4 instances)
NUMA node(s):                            2
NUMA node0 CPU(s):                       0-7,16-23
NUMA node1 CPU(s):                       8-15,24-31
Vulnerability Gather data sampling:      Not affected
Vulnerability Ghostwrite:                Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Old microcode:             Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Mitigation; untrained return thunk; SMT vulnerable
Vulnerability Spec rstack overflow:      Vulnerable: Safe RET, no microcode
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; Retpolines; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Vulnerable

and this is the output of lsmem

RANGE                                  SIZE  STATE REMOVABLE  BLOCK
0x0000000000000000-0x000000007fffffff    2G online       yes   0-15
0x0000000100000000-0x000000087fffffff   30G online       yes 32-271

Memory block size:                128M
Total online memory:               32G
Total offline memory:               0B
Memmap on memory parameter:         no

I'm working on a Haskell project, and when I build it, I go for

cabal build all -jN

As for N: if I use 32, because my system has 32 logical CPUs, the system becomes fairly unresponsive, and other applications suffer not only in responsiveness; parts of them also start "crashing".

Here is a report of some browser tabs being SIGKILLed. It turned out to be the OOM killer doing its job.

But my question is: why is my system allowing this?

Yes, I'm the one passing -j32, but my intention is to tell the system "don't limit yourself to fewer than 32 cores", without encouraging it to steal those cores from other applications.

Am I just misunderstanding what's expected of a program when telling it to use all cores? Or is it that GHC is too aggressively stealing computational resources? Or something else?

Last edited by Enrico1989 (Today 13:56:30)

seth · Today 14:17:05

Enrico1989 wrote: "But my question is: why is my system allowing this?"

Because it doesn't know what you want.
https://wiki.archlinux.org/title/Cgroups
https://wiki.archlinux.org/title/Improv … conditions
https://www.baeldung.com/linux/memory-o … oom-killer

Raising the (io)niceness of the process means it steals less (or no) CPU time from other processes, but it will still use everything it's given.
Note that the bigger problem with massively parallel execution will be the RAM limitation: if you're running out of memory, the kernel has to kill processes to free memory for the other consumers. But if you constrain the memory available to a process (group) and glibc can no longer obtain the memory it requests, there's a very good chance the process will just crash/abort.
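
As a rough sketch of the niceness suggestion (the wrapper name lowprio is made up for illustration, and -j32 in the example is just the value from the original post), the build could be started at the lowest CPU priority and in the idle I/O class:

```shell
# lowprio is a hypothetical helper name, not an existing tool.
# nice -n 19   -> lowest CPU scheduling priority
# ionice -c 3  -> "idle" I/O class; whether this has any effect
#                 depends on the I/O scheduler in use
lowprio() {
    nice -n 19 ionice -c 3 "$@"
}

# Example (not run here): lowprio cabal build all -j32
```

This only changes who wins when CPUs or disks are contended. It does not reduce memory use, so it will not keep the OOM killer away by itself.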

marcoe · Today 15:34:53

Try running GHC in its own cgroup:

systemd-run --scope --slice-inherit cabal build all -jN ...
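
If the main symptom is the OOM killer hitting other applications, the scope can also be given memory limits. A sketch, assuming cgroup v2 and a systemd user session with the memory controller delegated; the helper name build_limited and the 20G/24G values in the example are placeholders, not recommendations:

```shell
# build_limited is a hypothetical helper:
#   $1 = MemoryHigh, $2 = MemoryMax, remaining arguments = command to run.
build_limited() {
    high=$1
    max=$2
    shift 2
    # MemoryHigh: above this the kernel throttles the group and reclaims memory.
    # MemoryMax:  hard cap; when it is hit, the OOM killer acts inside this
    #             cgroup instead of picking victims system-wide.
    systemd-run --user --scope -p MemoryHigh="$high" -p MemoryMax="$max" -- "$@"
}

# Example (not run here): build_limited 20G 24G cabal build all -j32
```

With a hard cap it is the build that gets throttled or killed rather than the browser, which is the trade-off seth describes above: a constrained process may simply abort.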

If that doesn't appear to help, check if it's memory bandwidth related:

for i in {1..8}; do memhog -r40 200M & done

Then slowly increase the number of memhog workers. For example, my system barely holds 13 workers; the responsiveness of everything drops dramatically after that. I can also observe it with ffmpeg encoding: it's just not worth spawning more than 16 ffmpeg encoding threads, because they are bottlenecked by memory bandwidth, and as far as I know there is no (software) remedy for that. Are you sure using all 32 threads helps with compile times?
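
One way to check is to time clean builds at a few job counts. A rough sketch (time_builds is a made-up helper name; it runs cabal clean before each build, so only use it where a full rebuild of the project is acceptable):

```shell
# time_builds is a hypothetical helper: each argument is a job count to try.
time_builds() {
    for n in "$@"; do
        cabal clean >/dev/null 2>&1          # start every run from scratch
        start=$(date +%s)
        cabal build all -j"$n" >/dev/null 2>&1
        end=$(date +%s)
        echo "-j$n: $(( end - start ))s"     # wall-clock seconds for this run
    done
}

# Example (not run here): time_builds 8 16 24 32
```

If the times stop improving well before 32, the extra jobs only cost memory and responsiveness.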

Last edited by marcoe (Today 15:41:23)

Lone_Wolf · Today 18:07:49

Enrico1989 wrote:
Thread(s) per core:                      2
Core(s) per socket:                      16

Total online memory:               32G

I had a Threadripper 1920X with 12/24 cores/threads and 16 GiB of RAM.
I had to add 24 GiB of swap and reduce the number of cores used for compiling to 18 to avoid OOM crashes.

I'm currently using a Ryzen 9 9950X with 16/32 cores/threads and 64 GiB of RAM.
There's no need to reduce the number of cores used anymore.

Comments online about 'ninja being very greedy and bringing the system down' suggest each job takes 1 to 2 GiB of memory.
automake is / used to be better at managing lots of jobs, but most projects have switched to cmake or meson.

How much swap space does your system have?
If it's less than 16 GiB, increase it to that and test. (Don't be afraid to increase it to 32 GiB if it still crashes.)
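
To turn a per-job memory estimate into a job count, something like the following could work. It is only a sketch: safe_jobs and suggest_jobs are made-up helpers, and the 2048 MiB per job is borrowed from the ninja comments above as an assumption. GHC's real per-job usage depends on the project and should be measured.

```shell
# safe_jobs (hypothetical): cap the job count by CPU count and by memory.
#   $1 = logical CPUs, $2 = memory budget in MiB, $3 = assumed MiB per job
safe_jobs() {
    cpus=$1
    mem_mib=$2
    per_job_mib=$3
    by_mem=$(( mem_mib / per_job_mib ))
    if [ "$by_mem" -lt 1 ]; then
        by_mem=1                      # always allow at least one job
    fi
    if [ "$by_mem" -lt "$cpus" ]; then
        echo "$by_mem"
    else
        echo "$cpus"
    fi
}

# suggest_jobs (hypothetical): feed the live system values into safe_jobs.
# MemAvailable is reported in kiB and does not include swap.
suggest_jobs() {
    avail_mib=$(awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo)
    safe_jobs "$(nproc)" "$avail_mib" 2048
}

# Example (not run here): cabal build all -j"$(suggest_jobs)"
```

With 32 CPUs, a 32768 MiB budget and 2048 MiB per job, safe_jobs gives 16; other applications already using memory lower that further.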
