#1 2023-03-09 12:21:45

endoriazel
Member
Registered: 2023-03-09
Posts: 1

Kernel compiled with KCFLAGS doesn't run at all.

Greetings,

I'm compiling my own kernel, mostly because I need a single line of code patched for it to better harmonize with my hardware, but also because I want to take advantage of a few config options.

While I'm at it, I'm trying to compile it with CPU-specific optimizations. I took the original PKGBUILD and added my patch as well as the KCFLAGS environment variable.

I originally generated the flags on the target machine like so:

$ gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
... -march=skylake -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mno-avx -mno-avx2 -mno-sse4a -mno-fma4 -mno-xop -mno-fma -mno-avx512f -mno-bmi -mno-bmi2 -maes -mpclmul -mno-avx512vl -mno-avx512bw -mno-avx512dq -mno-avx512cd -mno-avx512er -mno-avx512pf -mno-avx512vbmi -mno-avx512ifma -mno-avx5124vnniw -mno-avx5124fmaps -mno-avx512vpopcntdq -mno-avx512vbmi2 -mno-gfni -mno-vpclmulqdq -mno-avx512vnni -mno-avx512bitalg -mno-avx512bf16 -mno-avx512vp2intersect -mno-3dnow -mno-adx -mabm -mno-cldemote -mclflushopt -mno-clwb -mno-clzero -mcx16 -mno-enqcmd -mno-f16c -mfsgsbase -mfxsr -mno-hle -msahf -mno-lwp -mlzcnt -mmovbe -mno-movdir64b -mno-movdiri -mno-mwaitx -mno-pconfig -mno-pku -mno-prefetchwt1 -mprfchw -mno-ptwrite -mno-rdpid -mrdrnd -mrdseed -mno-rtm -mno-serialize -msgx -mno-sha -mno-shstk -mno-tbm -mno-tsxldtrk -mno-vaes -mno-waitpkg -mno-wbnoinvd -mxsave -mxsavec -mxsaveopt -mxsaves -mno-amx-tile -mno-amx-int8 -mno-amx-bf16 -mno-uintr -mno-hreset -mno-kl -mno-widekl -mno-avxvnni -mno-avx512fp16 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=3072 -mtune=skylake ...

The full PKGBUILD can be viewed here: https://pastebin.com/q2xzcZf3
The .config can be viewed here: https://pastebin.com/cqGBss0t

lscpu reports the following, and I don't see any striking discrepancy between the compiler arguments and the CPU's capabilities:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Pentium(R) CPU G4560 @ 3.50GHz
    CPU family:          6
    Model:               158
    Thread(s) per core:  2
    Core(s) per socket:  2
    Socket(s):           1
    Stepping:            9
    CPU(s) scaling MHz:  36%
    CPU max MHz:         3500.0000
    CPU min MHz:         800.0000
    BogoMIPS:            7002.48
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust smep erms invpcid mpx rdseed smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   64 KiB (2 instances)
  L1i:                   64 KiB (2 instances)
  L2:                    512 KiB (2 instances)
  L3:                    3 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-3
Vulnerabilities:         
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                   Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:              Mitigation; PTI
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
  Srbds:                 Mitigation; Microcode
  Tsx async abort:       Not affected
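To make that comparison less error-prone, the enabled flags can be pulled out of the cc1 line one per line. This is only a rough sketch: enabled_isa_flags is a hypothetical helper name, and the sample input is a shortened version of the gcc output above.

```shell
# Hypothetical helper: read a cc1 command line on stdin and print
# one enabled -m<feature> flag per line. Flags with a value
# (-march=..., -mtune=...) and disabled ones (-mno-...) are skipped.
enabled_isa_flags() {
    tr ' ' '\n' | grep -E '^-m[a-z0-9.-]+$' | grep -v -e '^-mno-'
}

# Shortened sample of the gcc output shown earlier in this post.
printf '%s\n' '-march=skylake -mmmx -mpopcnt -mno-avx -maes -mtune=skylake' \
    | enabled_isa_flags
```

On the real machine the input would be the output of the gcc command above instead of the sample string. Note that gcc and lscpu do not always use the same name for a feature, so the lists still need a manual look.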

The kernel produced this way doesn't run at all. According to a friend of mine who looked at the problem, it doesn't even reach the entry point, and gdb never shows anything helpful when running it in QEMU.

Fair enough, I thought, and reduced those arguments to export KCFLAGS="-mmmx -msse -msse2", which, from my research, should be what a generic x86_64 build implicitly enables anyway, but still no dice.

Another interesting point: if I leave out KCFLAGS completely, I get a working kernel.

I know there's a well-tested patchset at https://github.com/graysky2/kernel_compiler_patch/, but it doesn't include an option compatible with my gelded Kaby Lake Pentium.

I'm compiling with gcc 12.2.1-2 (and perhaps nasm 2.15.05-1), as they come out of base-devel, on a more powerful machine in WSL2 with Windows-interop disabled (so there cannot be any weird interactions with build tools present on the host).

Why is this happening?
What am I doing wrong?
What's the correct way to get a fully optimized kernel built?

#2 2025-02-12 16:59:51

Hanabishi
Member
Registered: 2020-08-07
Posts: 45

Re: Kernel compiled with KCFLAGS doesn't run at all.

A somewhat late response, but for anyone who happens to stumble upon this topic:

Vector extensions are forbidden during kernel compilation and are explicitly disabled by the x86 Makefile. Re-enabling them via KCFLAGS overrides that and breaks the resulting kernel, so don't do that.
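A toy model of why this happens, assuming that the build appends KCFLAGS after the architecture flags and that the compiler lets the last conflicting -m option win. final_state is a hypothetical helper, and the -mno-* flags below only stand in for what the x86 Makefile sets.

```shell
# Hypothetical model: report whether a feature ends up on or off
# when flags are applied left to right and the last one wins.
final_state() {
    feature=$1
    shift
    state=unset
    for flag in "$@"; do
        case "$flag" in
            "-m$feature")    state=on ;;
            "-mno-$feature") state=off ;;
        esac
    done
    printf '%s\n' "$state"
}

# Makefile-style flags first, then the KCFLAGS from the first post.
final_state sse -mno-sse -mno-mmx -mno-sse2 -mmmx -msse -msse2
```

With the KCFLAGS part left out, the same call reports the feature as off, which matches the observation that the kernel works without KCFLAGS.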
You are allowed to set -march/-mtune, but it will not yield any significant gains, because, again, the vector extensions that are usually responsible for the performance improvements are not used during kernel compilation anyway.
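If you still want to start from the gcc -march=native output, one way is to keep only the -march/-mtune entries and drop everything else. A minimal sketch; filter_kcflags is a hypothetical helper, not something the kernel build provides.

```shell
# Hypothetical helper: keep only -march=... and -mtune=... from a
# list of compiler flags and print them on one line.
filter_kcflags() {
    out=""
    for flag in "$@"; do
        case "$flag" in
            -march=*|-mtune=*) out="${out:+$out }$flag" ;;
        esac
    done
    printf '%s\n' "$out"
}

# A few of the flags from the first post.
filter_kcflags -march=skylake -mmmx -msse -msse2 -mno-avx -maes -mtune=skylake
```

The result could then be used as the value of KCFLAGS in the PKGBUILD in place of the full list.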
Speaking of which, the kernel detects CPU features at runtime and tends to use them in places where they provide real benefits, regardless of the flags it was compiled with. So you shouldn't worry about your hardware capabilities being underutilized by the kernel.
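To see which features the running kernel has detected, you can look at the flags it reports. A small sketch: has_cpu_flag is a hypothetical helper, and the flag list is an excerpt of the lscpu output from the first post.

```shell
# Hypothetical helper: succeed if feature $1 appears as a whole word
# in the space-separated flag list $2.
has_cpu_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Excerpt of the Flags line from the lscpu output in the first post.
flags="sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand"

for f in aes avx2; do
    if has_cpu_flag "$f" "$flags"; then
        echo "$f: present"
    else
        echo "$f: absent"
    fi
done
```

On a live system the list can be read from /proc/cpuinfo instead, for example with flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2).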
