
#1 2024-01-20 18:41:39

Goddard Guryon
Member
Registered: 2024-01-20
Posts: 3

Cannot run slurmd: "hybrid mode is not supported"

I'm trying to run slurm on my local machine to test some scripts I wrote for an HPC. For this purpose, I have enabled both slurmctld and slurmd services on the same device so that my 'login node' and 'compute node' are the same. Although the slurmctld service runs fine, starting the slurmd service gives me:

slurmd: error: Thread count (16) not multiple of core count (12)
slurmd: fatal: Hybrid mode is not supported. Mounted cgroups are: 1:net_cls:/
0::/user.slice/user-1000.slice/session-2.scope

While the first error seems to be because [apparently] not all my cores support hyperthreading (and is not an issue as far as I can tell), the second one is what's causing me trouble. Based on what I could find in slurm's documentation, this error occurs when systemd is running both cgroups v1 and v2 on the device, which is something slurm cannot work with... except that that's not the case here. I don't understand the details of what cgroups really are, but I found a shell command on SO:

[ $(stat -fc %T /sys/fs/cgroup/) = "cgroup2fs" ] && echo "unified" || ( [ -e /sys/fs/cgroup/unified/ ] && echo "hybrid" || echo "legacy")

which also confirms that only cgroup v2 is active in my case. I'm not even sure that this is what the 'hybrid mode' in the log above is talking about, but I haven't found any other post discussing this. Thanks in advance for any help.
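For reference: the one-liner above only looks at the filesystem type of /sys/fs/cgroup/, so it would not notice a single v1 controller mounted somewhere else. The "Mounted cgroups" list in the fatal message has the same hierarchy-ID:controllers:path layout as /proc/self/cgroup, where the v2 entry always has ID 0 and any other ID is a v1 hierarchy. A minimal sketch of a check along those lines, assuming slurmd builds its list from that file (not verified against the slurm source); the function name list_v1_hierarchies is made up for this example:

```shell
#!/bin/sh
# list_v1_hierarchies [FILE]
# Print every cgroup v1 hierarchy listed in FILE (default: /proc/self/cgroup).
# Each line has the form hierarchy-ID:controller-list:path. The cgroup v2
# entry uses hierarchy ID 0 with an empty controller list, so any entry
# with a non-zero ID belongs to a v1 hierarchy.
list_v1_hierarchies() {
    awk -F: 'NF >= 3 && $1 != 0 { print $2 " (hierarchy " $1 ")" }' \
        "${1:-/proc/self/cgroup}"
}
```

Running list_v1_hierarchies with no argument inspects the current shell's cgroups. Empty output means no v1 hierarchy is attached; with the two entries from the log above it would report net_cls.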


#2 2024-01-20 19:11:35

Raynman
Member
Registered: 2011-10-22
Posts: 1,539

Re: Cannot run slurmd: "hybrid mode is not supported"


#3 2024-01-21 08:55:31

Goddard Guryon
Member
Registered: 2024-01-20
Posts: 3

Re: Cannot run slurmd: "hybrid mode is not supported"

Turns out the only thing I didn't try, setting the kernel parameter, was what's needed. Thanks!

But I'm facing a different problem now: getting slurmd up and rebooting suddenly broke, of all things, the internet connection. Running ping 1.1.1.1 gives me "sendmsg: operation not permitted"; iptables has an ACCEPT policy for all incoming and outgoing traffic, 'nmcli c s' shows I have no tun0 connection, and I currently have no VPN blocking my internet (I can't link to what I found online from my phone, but these are more or less the only possible fixes I've come across).

Could this somehow be related to setting up both slurmctld and slurmd on the same machine (since nothing else changed before and after the reboot)?


#4 2024-01-21 09:26:58

Raynman
Member
Registered: 2011-10-22
Posts: 1,539

Re: Cannot run slurmd: "hybrid mode is not supported"

Goddard Guryon wrote:

Turns out the only thing I didn't try, setting the kernel parameter, was what's needed. Thanks!

Interesting, because the bug reporter says it didn't help them, so I wondered whether maybe slurmd's cgroup detection was off.

Goddard Guryon wrote:

Could this somehow be related to setting up both slurmctld and slurmd on the same machine (since nothing else changed before and after the reboot)?

Or it could be a side effect of adding that kernel parameter. I don't know anything about slurm and not that much about cgroups, so I can't say too much about this, but you could at least check if networking is still broken if you boot with all slurm services disabled while leaving that kernel parameter in place.
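Before running that test it helps to confirm the parameter really is active on the running kernel. A minimal sketch, assuming the parameter name is passed in as an argument (the thread never names it, so none is filled in here); has_kernel_param is a made-up helper name:

```shell
#!/bin/sh
# has_kernel_param NAME [FILE]
# Succeed if NAME appears on the kernel command line in FILE
# (default: /proc/cmdline), either as a bare flag or as NAME=value.
has_kernel_param() {
    awk -v p="$1" '
        {
            # Tokens are whitespace-separated; compare the part before "=".
            for (i = 1; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == p) found = 1
            }
        }
        END { exit !found }
    ' "${2:-/proc/cmdline}"
}
```

With the parameter confirmed, something like 'systemctl disable slurmd slurmctld' followed by a reboot would give the "parameter on, slurm off" state to test networking in.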


#5 2024-01-21 12:01:35

Goddard Guryon
Member
Registered: 2024-01-20
Posts: 3

Re: Cannot run slurmd: "hybrid mode is not supported"

Welp, turns out mullvad in the background was blocking internet access. Why it suddenly started acting this way (when I haven't used it in months), I have no clue. But thanks for all the help :)

