Just to add to this thread: after reading here, I killed mysqld on my system running the Plasma desktop with the most recent kernel, and that also resolved the rogue core showing 100%, at least according to the Simple System Monitor widget on the Plasma desktop. The process was related to Akonadi:
$ ps -eaf| grep mysql
mike 1215 1209 0 13:25 ? 00:00:03 /usr/bin/mysqld --defaults-file=/home/mike/.local/share/akonadi/mysql.conf --datadir=/home/mike/.local/share/akonadi/db_data/ --socket=/run/user/1000/akonadi/mysql.socket --pid-file=/run/user/1000/akonadi/mysql.pid
All I then did was run killall mysqld, and the 100% CPU went to near zero.
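As an aside, since this mysqld instance is started and owned by Akonadi, a gentler alternative to killall (assuming nothing else is using that Akonadi database) is to stop the Akonadi server itself, which takes its mysqld down with it:
$ akonadictl stop
# and later, to bring the PIM services and their mysqld back up:
$ akonadictl start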
Another bit of evidence:
$ mpstat -P ALL
Linux 6.4.4-stable-1 (lenovo2) 22/07/23 _x86_64_ (8 CPU)
21:55:25 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
21:55:25 all 2.31 0.00 0.76 11.98 0.19 0.04 0.00 0.00 0.00 84.71
21:55:25 0 2.51 0.00 0.77 0.08 0.11 0.04 0.00 0.00 0.00 96.50
21:55:25 1 1.89 0.00 0.59 0.05 0.38 0.04 0.00 0.00 0.00 97.05
21:55:25 2 2.43 0.00 0.80 0.07 0.09 0.03 0.00 0.00 0.00 96.59
21:55:25 3 2.05 0.00 0.57 94.46 0.07 0.04 0.00 0.00 0.00 2.81
21:55:25 4 2.57 0.01 0.77 0.06 0.09 0.03 0.00 0.00 0.00 96.47
21:55:25 5 2.62 0.00 1.20 0.08 0.14 0.04 0.00 0.00 0.00 95.92
21:55:25 6 2.29 0.00 0.83 0.98 0.13 0.11 0.00 0.00 0.00 95.66
21:55:25 7 2.14 0.00 0.59 0.05 0.48 0.02 0.00 0.00 0.00 96.71
$ killall mysqld
$ mpstat -P ALL
Linux 6.4.4-stable-1 (lenovo2) 22/07/23 _x86_64_ (8 CPU)
21:55:46 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
21:55:46 all 2.31 0.00 0.76 11.98 0.19 0.04 0.00 0.00 0.00 84.71
21:55:46 0 2.51 0.00 0.77 0.08 0.11 0.04 0.00 0.00 0.00 96.50
21:55:46 1 1.89 0.00 0.59 0.05 0.38 0.04 0.00 0.00 0.00 97.05
21:55:46 2 2.43 0.00 0.80 0.07 0.09 0.03 0.00 0.00 0.00 96.59
21:55:46 3 2.05 0.00 0.57 94.40 0.07 0.04 0.00 0.00 0.00 2.88
21:55:46 4 2.57 0.01 0.77 0.06 0.09 0.03 0.00 0.00 0.00 96.47
21:55:46 5 2.62 0.00 1.20 0.08 0.14 0.04 0.00 0.00 0.00 95.92
21:55:46 6 2.29 0.00 0.83 1.03 0.13 0.11 0.00 0.00 0.00 95.61
21:55:46 7 2.14 0.00 0.59 0.05 0.48 0.02 0.00 0.00 0.00 96.71
So the ~94% iowait on core 3 has not gone away after mysqld was killed, even though the Simple Monitor is no longer showing the rogue core at 100%. So in my case something else must be keeping core 3 at nearly 100% iowait. However, the Simple System Monitor widget is no longer showing 100% despite mpstat showing that the iowait is still high.
On the other hand, on a second machine that showed the same behaviour, if I am not logged in to a Plasma Wayland session and instead ssh in and run the mpstat command, there is no high iowait on any of the cores. So in my case this looks related to running kernel 6.4.4 with the Plasma desktop running.
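Worth noting: mpstat with no interval argument, as above, reports averages since boot, which is why the two outputs are nearly identical and why the core 3 figure barely moved within the few seconds after killing mysqld. To see the current per-core iowait it is better to sample over a short window, e.g.:
$ mpstat -P ALL 2 5
# five samples at 2-second intervals; the final "Average:" block then covers
# only that 10-second window instead of everything since boot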
Last edited by mcloaked (2023-07-22 21:26:25)
Mike C
I downgraded the kernel back to 6.4.3 and the issue is no longer present, so it is clearly a bug introduced in kernel 6.4.4.
Mike C
OK, I will report this to Bugzilla tomorrow or later, and I will link it here.
I reported at https://bugzilla.kernel.org/show_bug.cgi?id=217700 - please add any further info to that bug report to help the developers get this fixed.
Mike C
It is my first time filing a kernel bug, but I think the report looks reasonable. We don't have much more information about it, just that the mysqld process pegs a whole core with iowait.
Meanwhile, for people who have the same problem, you can use the linux-lts 6.1.38 package or Linux version 6.4.3.
Downgrading packages - Downgrading the kernel
I will leave direct links to Linux 6.4.3 in case someone needs them:
https://archive.archlinux.org/repos/202 … kg.tar.zst
https://archive.archlinux.org/repos/202 … kg.tar.zst
Linux LTS:
https://archive.archlinux.org/repos/202 … kg.tar.zst
https://archive.archlinux.org/repos/202 … kg.tar.zst
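For completeness, a rough sketch of installing the downloaded packages and holding them back (the filenames below are placeholders; use whatever files you actually fetched from the archive links above):
$ sudo pacman -U linux-6.4.3.arch1-1-x86_64.pkg.tar.zst linux-headers-6.4.3.arch1-1-x86_64.pkg.tar.zst
# then add "IgnorePkg = linux linux-headers" to the [options] section of
# /etc/pacman.conf so pacman does not immediately upgrade back to 6.4.4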
Last edited by Fijxu (2023-07-23 08:06:34)
mcloaked wrote: I reported at https://bugzilla.kernel.org/show_bug.cgi?id=217700 - please add any further info to that bug report to help the developers get this fixed.
Whoops, I was writing post #30 and then your post appeared. Anyway, should we use your bug report or mine?
No problem - I guess upstream will link the two bugs - or one of us can link from one to the other?
I have linked yours to mine upstream now.
Last edited by mcloaked (2023-07-23 07:45:03)
Mike C
mcloaked wrote: or one of us can link from one to the other?
Link your bug with mine; mine is linked to this thread.
linux-lts has the same problem
linux-lts has the same problem
I will test it right now; the last time I tried, Linux 6.1.38 did not have the bug.
It is at version 6.1.39-1 now
It is at version 6.1.39-1 now
You are right, linux-lts 6.1.39 also has this bug... I will update my Bugzilla report and my post #30. Thanks for reporting it.
@Fijxu I would suggest adding to the bug report that the issue has been bisected to bd4f737b145d85c7183ec879ce46b57ce64362e1 which is 8a796565cec3601071cbbd27d6304e202019d014 in mainline.
Edit:
There is a Bisected commit-id field.
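For anyone who wants to reproduce or verify a bisection like this, a minimal sketch against the stable tree (the good/bad tags are assumptions based on this thread):
$ git clone https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
$ cd linux
$ git bisect start
$ git bisect bad v6.4.4     # first stable release showing the 100% iowait
$ git bisect good v6.4.3    # last known good release
# build, install and boot each commit git checks out, then mark it with
# "git bisect good" or "git bisect bad" until git names the first bad commit
$ git bisect reset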
Last edited by loqs (2023-07-23 08:02:40)
loqs wrote: @Fijxu I would suggest adding to the bug report that the issue has been bisected to bd4f737b145d85c7183ec879ce46b57ce64362e1 which is 8a796565cec3601071cbbd27d6304e202019d014 in mainline.
Edit:
There is a Bisected commit-id field.
Thanks, I will add it.
loqs wrote: @Fijxu I would suggest adding to the bug report that the issue has been bisected to bd4f737b145d85c7183ec879ce46b57ce64362e1 which is 8a796565cec3601071cbbd27d6304e202019d014 in mainline.
Edit:
There is a Bisected commit-id field.
Thanks, and I have added that to my bug upstream now that the regression flag has been changed from No to Yes.
Mike C
This should have been reported directly to maintainers using email instead of BZ (almost no one reads the BZ): [1].
uname == latest pf-kernel
But is this high CPU iowait really affecting the performance of the whole system, or only individual processes?
@post-factum Thanks for the push to email - faster than waiting for Thorsten to tag the BZ.
I can confirm that Linus' tree as of now (commit c2782531397f5c) still has the same problem.
The email is on the LKML and in two kernel BZ entries now, so hopefully this will get resolved upstream. One question: 6.4.5 is out now but still contains the problematic commit, so should there be an Arch Flyspray bug report about this issue, so that if the Arch 6.4.5 kernel is built, the listed bad commit is reverted by the kernel maintainer before release?
I have reported it at https://bugs.archlinux.org/task/79177 so this can be tracked by the Arch kernel maintainer(s).
Last edited by mcloaked (2023-07-23 13:52:34)
Mike C
https://lore.kernel.org/lkml/538065ee-4 … kernel.dk/
Jens Axboe wrote: ...this is very much expected. It's now just correctly reflecting that one thread is waiting on IO. IO wait being 100% doesn't mean that one core is running 100% of the time, it just means it's WAITING on IO 100% of the time.
Last edited by loqs (2023-07-23 16:16:11)
Yes, but it is not clear to me that seeing iowait pegged at 100% being the 'correct' behaviour means it is fine for mariadb/mysql to do so, whether this means that I/O to the block device(s) will eventually clear, or whether there could be loss of data if the flush doesn't complete. Perhaps mariadb has always had this issue and the new commit merely exposed it? I don't know whether there is still a real problem here and whether a bug in userspace code needs to be found and fixed.
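One quick way to check whether there really is unwritten data at risk, rather than a thread simply being accounted as waiting, is to look at the kernel's writeback counters (a generic check, not specific to mariadb):
$ grep -E 'Dirty|Writeback' /proc/meminfo
# Dirty and Writeback sitting near zero mean there is no backlog of data
# waiting to reach the disk, so the high iowait figure would be accounting
# rather than a stuck flush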
Mike C
https://lore.kernel.org/lkml/538065ee-4 … kernel.dk/
Jens Axboe wrote:...this is very much expected. It's now just correctly reflecting that one thread is waiting on IO. IO wait being 100% doesn't mean that one core is running 100% of the time, it just means it's WAITING on IO 100% of the time.
Right, so this indeed sounds like not-a-bug.
Question: when this came to my attention I had an unused mariadb instance running, so it should have been doing nothing - at least nothing useful.
It would be good to understand what mariadb is doing to cause the iowait to persist for such a long time (many hours since the machine was booted).
I was checking mpstat with a 2-second interval (mpstat -P ALL 2 10), so in theory things could have flushed from kernel buffers out to the physical device, and then more I/O comes in and the core goes back to iowait.
And this is to a spinning disk, in which case mpstat might indeed show 100% iowait as observed by a (slow to update) user-space tool.
I would also think that after mariadb is shut down, the block layer would eventually flush through and the iowait would die away.
On 6.4.5 stable with the revert discussed here, 'iotop -bo | grep maria' shows nothing at all - seemingly no I/O happening.
Can anyone help explain why the iowait triggered by mariadb remains high even with an idle mariadb process apparently doing no I/O?
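One way to see whether the iowait comes from a thread that is merely parked waiting for completions rather than actually moving data (a generic sketch; the wait-channel names will vary by kernel, and substitute mariadbd for mysqld if that is the process name on your system):
$ pid=$(pgrep -o mysqld)
$ ps -L -o tid,stat,wchan:32,comm -p "$pid"
# a thread shown sleeping in a completion-wait function (the WCHAN column),
# with nothing for it in iotop, is just waiting on I/O completions - which,
# per the quote above, the 6.4.4 change now reports as iowait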
According to strace, mariadb seems to be doing a lot of futex() and not much else. I'm not sure how that translates to iowait…
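For reference, a rough way to repeat that check and get a per-syscall summary (attach for ten seconds or so, then press Ctrl-C; again, the process may be named mariadbd instead of mysqld):
$ sudo strace -c -f -p "$(pgrep -o mysqld)"
# the -c summary table shows which syscalls dominate; lots of futex() with
# little read/write/fsync activity matches the "waiting, not doing I/O" picture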
We need to file a bug report with KDE that their CPU monitor reports I/O wait as actual CPU utilization. Right now I get two cores pegged at 100%, lol.
I don't know whether the issue I posted at https://github.com/dhabyx/plasma-simple … /issues/61 is the right place to put this, or whether it should go somewhere else?
Mike C