
#1 2024-04-06 12:22:14

Crushable7908
Member
Registered: 2024-04-06
Posts: 2

Full system crash when bringing up docker, task blocked for 122 seconds

Hi all,

I've been using Arch for a few years, but this is the first time an issue has completely stumped me, so Newbie Corner seems like the most appropriate place to ask. Hopefully someone here can help. Maybe this is a docker issue or a "rogue container" issue, but I'm hoping that at the very least someone could point me to a part of the logs that would help me figure this out.

Context: Desktop PC used as a server. Pretty much everything runs in docker containers managed with docker-compose. Everything had been running stably for a few years, but suddenly I've started getting major crashes.

The problem: After booting, all docker containers are started with the docker service. This causes the load to shoot up and the PC to become completely unresponsive within about 2 minutes of booting.

When I connect a screen through HDMI, the following errors invariably appear 2 minutes after the PC becomes unresponsive: "Task x blocked for more than 122 seconds. Not tainted 6.8.2-arch-2-1 #1", where x is kworker/kswapd0/btrfs-transacti/containerd/dockerd.
See photo:
WRDOQW.signal-2024-04-06-135709-002.md.jpeg

Here's the full journalctl output from a boot where this happened.
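
For anyone digging through a journal dump like this one, a minimal sketch of how the hung-task reports could be pulled out and counted per task name. It assumes the kernel lines look like "INFO: task NAME:PID blocked for more than N seconds", which matches the message quoted above but is otherwise an assumption about the exact wording. The names `hung_tasks` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Assumed shape of the kernel's hung-task line:
#   "... INFO: task kswapd0:87 blocked for more than 122 seconds."
# The task name itself may contain colons (e.g. kworker/u16:3), so the
# name is matched lazily and the PID is the last number before " blocked".
HUNG_TASK_RE = re.compile(
    r"task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(lines):
    """Return (task name, pid, seconds) for every hung-task line found."""
    hits = []
    for line in lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            hits.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits


def summarize(lines):
    """Count hung-task reports per task name."""
    return Counter(comm for comm, _pid, _secs in hung_tasks(lines))
```

If the journal is persistent, the kernel messages of the previous boot can be saved with `journalctl -k -b -1 --no-pager > kernel.log`, and the file can then be passed in with `summarize(open("kernel.log"))`. Which task names dominate the count may hint at whether the stall is on the storage side (btrfs-transacti) or the memory side (kswapd0), but that is only a starting point, not a diagnosis.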

Some more information:
- Server is running 80+ docker containers, all of which have their own docker network.
- I've already tried completely removing and reinstalling docker and docker-compose, wiping /var/lib/docker, and re-downloading all containers, without success.
- I've connected a keyboard directly to the PC in case the problem was SSH related, but as soon as the system becomes unresponsive, direct input becomes impossible as well.
- In htop, the 1m load always shoots up. Sometimes it gets to 200+ before the system becomes unresponsive, sometimes that already happens at around 80. Here's a screenshot of the last info sent by htop before the system became unresponsive:
WRIMhS.htop-fullcrash.md.png
- The problem seems to happen consistently while bringing all containers up together.
- Bringing the containers up one at a time seems to slightly improve things. In one case the system stayed up for 2 days, in others it crashed after 1 minute, but it does go down every time.
- The issue first happened while bringing up one specific container. I've tried completely removing this container from the stack, but even then the problem persists.
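
The "one at a time" approach from the list above could be automated roughly like this: start one stack, wait until the 1-minute load average drops below a threshold, then start the next. This is only a sketch, not a fix for whatever causes the stall. It assumes each stack has its own compose file, and `staggered_up`, `start_stack` and the threshold values are made up for illustration.

```python
import os
import subprocess
import time


def start_stack(compose_file):
    """Bring up one stack in the background with docker-compose."""
    subprocess.run(["docker-compose", "-f", compose_file, "up", "-d"], check=True)


def staggered_up(
    compose_files,
    max_load=8.0,        # hypothetical threshold for the 1-minute load average
    poll_seconds=5.0,    # how long to sleep between load checks
    timeout=600.0,       # give up if the load never settles for one stack
    start=start_stack,
    load=lambda: os.getloadavg()[0],
    sleep=time.sleep,
):
    """Start stacks one by one, waiting for the load to settle in between.

    Returns the list of compose files that were actually started.
    """
    started = []
    for compose_file in compose_files:
        waited = 0.0
        while load() >= max_load:
            if waited >= timeout:
                # Load never dropped: stop here instead of piling on more work.
                return started
            sleep(poll_seconds)
            waited += poll_seconds
        start(compose_file)
        started.append(compose_file)
    return started
```

The return value shows how far the sequence got, so the stack that was started right before the load stopped settling is the first one to look at. The `start`, `load` and `sleep` parameters are only there so the loop can be dry-run without touching docker.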

Let me know if you need any more logs or information to help figure this out - Thanks in advance!!

Last edited by Crushable7908 (2024-04-06 12:25:01)
