#1 2023-03-04 18:03:32

teobouvard
Member
Registered: 2023-03-04
Posts: 2

dlsym() returns illegal (FMA4) version of sin() in conda's libm 2.17

Hello,

On my workstation, resolving sin() from libm 2.17 with dlsym() returns the FMA4 implementation (__sin_fma4), which my CPU (AMD Ryzen 7 3700X) does not support. Running the resulting binary triggers an illegal instruction (vfmaddsd).
This version of libm ships with the sysroot_linux-64 conda package, which is included in most conda environments that bundle a compilation toolchain.

To reproduce the issue, you first need to get this specific version of libm. You do not need to activate the environment; it is only created to retrieve the shared library.

conda create --name libm_issue 'sysroot_linux-64=2.17'

Then, you can compile the following C program

#include <stdio.h>
#include <dlfcn.h>

int main(int argc, char** argv) {
    void *handle;
    double (*sin_func)(double);
    char *error;

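    /* Without this check, a missing argument passes NULL to dlopen() */
    if (argc < 2) {
        fprintf(stderr, "Usage: %s /path/to/libm.so.6\n", argv[0]);
        return 1;
    }

    handle = dlopen(argv[1], RTLD_LAZY);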
    if (!handle) {
        fprintf(stderr, "Error: %s\n", dlerror());
        return 1;
    }

    dlerror(); /* Clear any existing error */
    sin_func = dlsym(handle, "sin");
    error = dlerror();
    if (error != NULL) {
        fprintf(stderr, "Error: %s\n", error);
        dlclose(handle);
        return 1;
    }

    printf("sin(1) = %f\n", (*sin_func)(1.0));
    dlclose(handle);
    return 0;
}

with

gcc -g illegal.c

and then execute it, giving it the path to the libm shared library.

./a.out ~/.conda/envs/libm_issue/x86_64-conda-linux-gnu/sysroot/lib64/libm.so.6

On my host, this results in an illegal instruction. Running it through gdb shows that the cause is the FMA4 instruction.

Program received signal SIGILL, Illegal instruction.
0x00007ffff7a500e2 in __sin_fma4 () from /home/user/.conda/envs/libm_issue/x86_64-conda-linux-gnu/sysroot/lib64/libm.so.6

Some system information:

$ uname -a
Linux HoG 6.1.12-arch1-1 #1 SMP PREEMPT_DYNAMIC Tue, 14 Feb 2023 22:08:08 +0000 x86_64 GNU/Linux

$ gcc --version
gcc (GCC) 12.2.1 20230201

$ ld --version
GNU ld (GNU Binutils) 2.40

$ cat /proc/cpuinfo | head -n 28
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 23
model		: 113
model name	: AMD Ryzen 7 3700X 8-Core Processor
stepping	: 0
microcode	: 0x8701013
cpu MHz		: 3876.885
cache size	: 512 KB
physical id	: 0
siblings	: 16
core id		: 0
cpu cores	: 8
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 16
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
bugs		: sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass retbleed smt_rsb
bogomips	: 7189.97
TLB size	: 3072 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
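
The flags line above lists fma (FMA3) but not fma4. The same information can be read directly from CPUID. A minimal sketch, assuming GCC or Clang on x86 (cpu_has_fma3 and cpu_has_fma4 are hypothetical helper names, not part of any library):

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Returns 1 if CPUID reports FMA3, 0 if not, -1 if it cannot be determined. */
int cpu_has_fma3(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* Standard leaf 1: ECX bit 12 is the FMA (FMA3) feature flag. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    return (ecx & (1u << 12)) ? 1 : 0;
#else
    return -1; /* Not an x86 build */
#endif
}

/* Returns 1 if CPUID reports FMA4, 0 if not, -1 if it cannot be determined. */
int cpu_has_fma4(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* Extended leaf 0x80000001: ECX bit 16 is the FMA4 feature flag. */
    if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
        return -1;
    return (ecx & (1u << 16)) ? 1 : 0;
#else
    return -1; /* Not an x86 build */
#endif
}
```

Given the flags listed above, cpu_has_fma4() would be expected to return 0 on this machine, which is consistent with the SIGILL on vfmaddsd.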

Note that this procedure works on CentOS 7, which uses an older version of the linker (2.27), so I have a few questions:

  • Is this supposed to work? Should I be able to call into a version of libm that is older than the linker? I would think so, but if that assumption is wrong, it would explain a lot.

  • Could this be a linker/loader bug? I tried stepping through dl-lookup.c, but I don't know enough about the subject to understand why it picks the wrong implementation of sin().

Thanks for your help!

Last edited by teobouvard (2023-03-05 13:02:39)


#2 2023-03-05 10:35:32

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 15,147

Re: dlsym() returns illegal (FMA4) version of sin() in conda's libm 2.17

I couldn't find a libm package, so I ran pacman -F libm.so.6, which made clear that it's part of glibc.

Please change the title (edit the first post) to reflect that you have issues with using an older glibc provided by conda.

Also check https://wiki.archlinux.org/title/Conda to see which steps are recommended for using conda on Arch Linux (hint: out of the box, conda often fails or causes issues).


Welcome to archlinux forums

Last edited by Lone_Wolf (2023-03-05 10:36:14)




#3 2023-03-05 13:28:58

teobouvard
Member
Registered: 2023-03-04
Posts: 2

Re: dlsym() returns illegal (FMA4) version of sin() in conda's libm 2.17

Lone_Wolf wrote:

Welcome to archlinux forums

Thanks!

Lone_Wolf wrote:

Please change the title to reflect you have issues with using an older glibc provided by conda

I added to the title that this version of libm is provided by conda. There was not enough space to also mention that libm is part of glibc, but that is easy to figure out.

Lone_Wolf wrote:

Also check https://wiki.archlinux.org/title/Conda to see what steps are recommended to use conda on archlinux.

Note that I don't have any issues directly related to conda, which works perfectly fine on my system. My question is more about the backwards compatibility of symbols in older versions of libm.

I checked whether the issue could be reproduced by building libm 2.17 directly from the glibc source, which would remove conda from the equation.
When I run my test case against the built library, it fails at the dlopen() step:

dlopen: ../glibc/build/math/libm.so.6: symbol __get_cpu_features, version GLIBC_PRIVATE not defined in file libc.so.6 with link time reference

That might explain why the libm provided by the conda package selects an incorrect implementation of sin(), if there is an issue related to __get_cpu_features.
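
Whether the running libc exports that symbol at all can be checked directly. A minimal sketch, assuming a glibc system where libc.so.6 can be opened by name (libc_exports_symbol is a hypothetical helper). Plain dlsym() ignores the symbol version, so this does not verify the GLIBC_PRIVATE version tag; the GNU-specific dlvsym() could be used for that.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Returns 1 if libc.so.6 exports `symbol`, 0 if it does not,
 * and -1 if libc.so.6 could not be opened by that name. */
int libc_exports_symbol(const char *symbol) {
    void *libc = dlopen("libc.so.6", RTLD_LAZY);
    int found;

    if (libc == NULL)
        return -1;

    dlerror(); /* Clear any stale error before the lookup */
    (void)dlsym(libc, symbol);
    /* A NULL return from dlsym() is ambiguous, so rely on dlerror() */
    found = (dlerror() == NULL);

    dlclose(libc);
    return found;
}
```

Calling libc_exports_symbol("__get_cpu_features") on the affected host would show whether the system libc still provides the symbol that the self-built libm 2.17 asks for. On glibc versions older than 2.34, the program has to be linked with -ldl.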

I should probably open an issue on the conda package itself, but I would like to keep this one open in case someone with more knowledge about glibc's internals has an explanation for this particular situation. It's not a critical issue because the current version of libm works fine, but I'm just curious about why this happens.

