Observed Intermittent skipping, poor performance, ONLY on Kernel 5.4+!

Zaxth · 2020-02-21 19:00:12

There's more to this bug than just emulation. I've noticed that running anything remotely intensive makes the whole system very sluggish, much like the bug from a couple of years back where copying a lot of files to an HDD made the system sluggish and unresponsive. This is exactly the same behaviour.

Zaxth · 2020-02-21 19:51:39

I made a video showcasing the issue; sometimes pictures and sound are better at conveying a message than words.
https://www.youtube.com/watch?v=YLiYzNg6CJE

Zaxth · 2020-02-21 22:15:42

I don't get it. The last good commit is 0b3e797 on the 17th.
The next one on the 17th is 16208cd.
So I run:
git bisect start
git bisect good 0b3e797
git bisect bad 16208cd
Bisecting: a merge base must be tested
[a55aa89aab90fae7c815b0551b07be37db359d76] Linux 5.3-rc6
But this is no help at all, because 5.3 is confirmed working. I am lost and out of patience. I've spent a week on this, which is more than a user should be expected to do trying to fix an operating system. I've compiled 28 different kernels trying to diagnose this.
If no one has an inkling of what's going on, and no one else is experiencing this themselves, then I guess I can't do anything.

Last edited by Zaxth (2020-02-21 22:16:44)

progandy · 2020-02-21 23:11:29

This is the bisection strategy for branches at work. 0b3e797 is not a direct merge onto the torvalds master branch, so both of your chosen commits are on different branches, and the bisection algorithm goes back to the common ancestor. The algorithm wants to make sure the bug did not already exist before the branches diverged and has been inadvertently fixed in the "good" branch.
Your frustration seems to stem from your manual bisection process and choosing commits that won't help you in finding the issue.
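
To make the "merge base must be tested" message less mysterious, here is a minimal Python sketch of the check described above. It is only a toy model: the commit names and the parent graph are hypothetical stand-ins, not the real kernel history, and git's actual implementation is more involved. The idea is that when the "good" commit is not an ancestor of the "bad" one, the common ancestor has to be tested first. Bisecting between two points on the same line of history (for example the release tags v5.3 and v5.4) avoids the detour, since one is an ancestor of the other.

```python
from collections import deque

# Hypothetical toy history: each commit maps to its list of parents.
# "good" and "bad" sit on different branches that both start at "rc6".
PARENTS = {
    "rc6": [],
    "topic1": ["rc6"],
    "good": ["topic1"],   # stand-in for a commit on a side branch
    "main1": ["rc6"],
    "bad": ["main1"],     # stand-in for a commit on the other branch
}


def ancestors(commit, parents):
    """Return the commit itself plus everything reachable via parent links."""
    seen = set()
    queue = deque([commit])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(parents[current])
    return seen


def commits_to_test_first(good, bad, parents):
    """Return the merge bases that need testing before bisection can start.

    If good is an ancestor of bad, the list is empty and a plain
    bisection between the two commits is possible.
    """
    good_anc = ancestors(good, parents)
    bad_anc = ancestors(bad, parents)
    if good in bad_anc:
        return []
    common = good_anc & bad_anc
    # A merge base is a common ancestor that is not itself an ancestor
    # of another common ancestor.
    return sorted(
        c for c in common
        if not any(c != other and c in ancestors(other, parents) for other in common)
    )


print(commits_to_test_first("good", "bad", PARENTS))
print(commits_to_test_first("rc6", "bad", PARENTS))
```

The first call reports the shared starting point as the commit to test, which mirrors git jumping back to 5.3-rc6 in the post above; the second call returns an empty list because the good commit already lies on the bad commit's history.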

This is a good read:
https://mirrors.edge.kernel.org/pub/sof … k2009.html

Last edited by progandy (2020-02-21 23:14:09)

Zaxth · 2020-02-22 00:20:42

progandy wrote:

This is the bisection strategy for branches at work. 0b3e797 is not a direct merge onto the torvalds master branch, so both of your chosen commits are on different branches, and the bisection algorithm goes back to the common ancestor. The algorithm wants to make sure the bug did not already exist before the branches diverged and has been inadvertently fixed in the "good" branch.
Your frustration seems to stem from your manual bisection process and choosing commits that won't help you in finding the issue.
This is a good read:
https://mirrors.edge.kernel.org/pub/sof … k2009.html

Yes, that is a good read, but I'm afraid I'm going to have to leave it here. I have spent too much time on this already.

Ropid · 2020-02-22 04:04:25

I remembered I started to get machine-check-event warnings/errors about the PCIe bus recently. I had to add "pcie_aspm=off" to the kernel command line to fix those warnings/errors. I blamed an update to the motherboard's BIOS for this, but now that I think about it some more, the problem might have coincided with the update from kernel 5.4.x to 5.5.x.

For me, the errors showed up when the machine was under stress, for example when compiling something on all cores or when testing with "stressapptest" or with "mprime -t". There were no errors when the machine was mostly idle, for example when just using the web browser.

Maybe check in your logs to see if you have a similar problem? You might have overlooked those kinds of error messages. Here's an example of the error messages:

Feb 13 19:13:32 hostname kernel: pcieport 0000:00:03.1: AER: Multiple Corrected error received: 0000:00:00.0
Feb 13 19:13:32 hostname kernel: pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Feb 13 19:13:32 hostname kernel: pcieport 0000:00:03.1: AER:   device [1022:1453] error status/mask=00000040/00006000
Feb 13 19:13:32 hostname kernel: pcieport 0000:00:03.1: AER:    [ 6] BadTLP
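
If the log is long, a small filter can help spot lines like these. The following is a rough Python sketch, assuming the kernel log has been saved as text in the same syslog-style format as the example above; the function name and the regular expression are my own and only match the "pcieport ...: AER:" shape shown here.

```python
import re

# Matches lines of the form "... kernel: pcieport <address>: AER: <message>".
# This pattern is an assumption based on the sample lines above.
AER_LINE = re.compile(r"kernel: pcieport (?P<port>[0-9a-f:.]+): AER: (?P<msg>.*)")


def find_aer_messages(lines):
    """Return (port, message) pairs for every line that looks like an AER report."""
    hits = []
    for line in lines:
        match = AER_LINE.search(line)
        if match:
            hits.append((match.group("port"), match.group("msg").strip()))
    return hits


# The sample lines quoted in this post.
SAMPLE = [
    "Feb 13 19:13:32 hostname kernel: pcieport 0000:00:03.1: AER: Multiple Corrected error received: 0000:00:00.0",
    "Feb 13 19:13:32 hostname kernel: pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)",
    "Feb 13 19:13:32 hostname kernel: pcieport 0000:00:03.1: AER:   device [1022:1453] error status/mask=00000040/00006000",
    "Feb 13 19:13:32 hostname kernel: pcieport 0000:00:03.1: AER:    [ 6] BadTLP",
]

for port, msg in find_aer_messages(SAMPLE):
    print(port, "|", msg)
```

To use it on a real log, pass the lines of the saved log file to find_aer_messages instead of SAMPLE. An empty result only means no line matched this particular pattern, not that the hardware is fine.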

Last edited by Ropid (2020-02-22 04:04:37)

Zaxth · 2020-02-23 00:31:27

Ropid wrote:

I remembered I started to get machine-check-event warnings/errors about the PCIe bus recently. I had to add "pcie_aspm=off" to the kernel command line to fix those warnings/errors. I blamed an update to the motherboard's BIOS for this, but now that I think about it some more, the problem might have coincided with the update from kernel 5.4.x to 5.5.x.
The way things showed up for me here was, the errors showed up when the machine was under stress, for example when compiling something on all cores or when testing with "stressapptest" or with "mprime -t". There were no errors when the machine was mostly idle, for example when just using the web browser.
Maybe check in your logs to see if you have a similar problem? You might have overlooked those kinds of error messages. Here's an example of the error messages:
Feb 13 19:13:32 hostname kernel: pcieport 0000:00:03.1: AER: Multiple Corrected error received: 0000:00:00.0
Feb 13 19:13:32 hostname kernel: pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Feb 13 19:13:32 hostname kernel: pcieport 0000:00:03.1: AER:   device [1022:1453] error status/mask=00000040/00006000
Feb 13 19:13:32 hostname kernel: pcieport 0000:00:03.1: AER:    [ 6] BadTLP                

Thank you for your suggestion, but I didn't have this issue.
I have found a workaround, however. Turning off C-states in the BIOS gave me my performance back on kernels 5.4 through 5.6-rc2.
I suppose that limits the possible bad commits, given that 5.3 didn't require me to turn off C-states.
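
For anyone who wants to compare what the kernel sees with C-states on and off, here is a small Python sketch that lists the idle states exposed through the cpuidle sysfs directory. It assumes the usual layout under /sys/devices/system/cpu/cpu0/cpuidle (one stateN directory per idle state, each with files such as name, usage and disable); the function name is my own, and on a system without cpuidle support it simply returns an empty list.

```python
from pathlib import Path


def _read(path):
    """Return the stripped contents of a sysfs file, or None if it is absent."""
    return path.read_text().strip() if path.is_file() else None


def read_idle_states(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """List the idle states found under cpuidle_dir, ordered by state number."""
    base = Path(cpuidle_dir)
    if not base.is_dir():
        return []
    state_dirs = sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))
    return [
        {
            "state": d.name,
            "name": _read(d / "name"),        # e.g. the C-state label
            "usage": _read(d / "usage"),      # how often the state was entered
            "disable": _read(d / "disable"),  # "1" if disabled via sysfs
        }
        for d in state_dirs
    ]


for state in read_idle_states():
    print(state)
```

Running it once with C-states enabled in the BIOS and once with them disabled should show whether the deeper states disappear, which may help narrow down which idle state is involved.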
