I just got my new laptop (BTO, i9 at 5 GHz with 8 cores and 2 threads per core, 4K, Intel UHD 660/NVIDIA RTX 2070, 32 GB) and immediately installed Arch with i3 on it.
On my previous BTO, when the machine was idle, playing video, or doing other light work, the fans never really spun up. It was a super silent machine. It only started to make some noise when I started gaming.
But on this laptop, the fans spin up even with nothing running, and they make a lot of noise when they do. The CPU temperature is constantly higher than I think it should be, fluctuating between 63 and 75 degrees Celsius (yes, while idling).
The load average is around 0.1 when idle, but in htop I see that one core, always the same one (the 9th), is constantly at around 65%. The bar is red, which, if I'm not mistaken, means kernel usage.
In the processes, I don't see anything special except that /usr/lib/Xorg :0 -seat seat0 -auth /run/lightdm/root/:0 -nolisten tcp vt7 -notvswitch is almost always on top with 3-8% cpu usage (which still isn't 65%).
I don't know what that is. I don't remember seeing it on my other laptop.
I don't really see any other things that may cause it.
Edit: the normal top command tells me: %Cpu(s): ~1.1 us, ~0.3 sy, 0.0 ni, ~95.0 id, 0.0 wa, ~4.0 hi, 0.0 si, 0.0 st. The values are approximate (~) because I typed them in one at a time, so they won't add up exactly.
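For anyone who wants to see which core the "hi" time lands on, here is a minimal Python sketch. It assumes the usual /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal), and the function names are made up for this example. As a sanity check on the numbers above: 65% on one of 16 logical CPUs averages out to roughly 4% overall, which would be consistent with the ~4.0 hi that top reports.

```python
def parse_proc_stat(text):
    """Return {cpu_name: [jiffy counters]} for the per-CPU lines (cpu0, cpu1, ...)."""
    cpus = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and everything that is not a CPU line.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            cpus[fields[0]] = [int(value) for value in fields[1:]]
    return cpus


def hardirq_share(before, after):
    """Percentage of elapsed time each CPU spent in hardware interrupt handlers."""
    shares = {}
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue
        # Assumed order: user, nice, system, idle, iowait, irq, softirq, steal.
        # Guest time is already included in user, so only the first 8 are summed.
        delta = [n - o for n, o in zip(new[:8], old[:8])]
        if len(delta) < 6:
            continue
        total = sum(delta)
        shares[name] = 100.0 * delta[5] / total if total else 0.0
    return shares
```

To use it, read /proc/stat twice a couple of seconds apart, run both texts through parse_proc_stat, and pass the two results to hardirq_share. A core that sits far above the others is the one servicing the interrupts.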
Since I run i3 with i3blocks and have configured the kernel to always use the NVIDIA GPU (so the CPU shouldn't be heating up for graphics), I would expect this system to stay really cool at idle. My desktop PC holds a steady 38 °C at idle.
I have been looking for answers for a week now and I figured it was time to ask for help. Anyone?
Last edited by scippie (2020-01-17 07:09:16)
There must be at least some checks you guys can tell me to do, no?
Have I missed a wiki page?
Check your CPU Frequency Scaling.
I believe you can set the scaling for each individual core.
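A quick way to see what each core is currently set to is to read the cpufreq files in sysfs. This is a minimal Python sketch, assuming the common layout under /sys/devices/system/cpu/cpuN/cpufreq/ with the files scaling_governor and scaling_cur_freq; the function name is made up for this example, and it only reads values, it does not change them.

```python
from pathlib import Path


def read_cpufreq(base="/sys/devices/system/cpu"):
    """Return {cpu_name: (governor, current_khz)} for every CPU exposing cpufreq."""
    result = {}
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not the cpufreq or cpuidle directories.
    for cpu_dir in Path(base).glob("cpu[0-9]*"):
        governor = cpu_dir / "cpufreq" / "scaling_governor"
        current = cpu_dir / "cpufreq" / "scaling_cur_freq"
        if governor.is_file() and current.is_file():
            result[cpu_dir.name] = (
                governor.read_text().strip(),
                int(current.read_text()),
            )
    return result
```

Calling read_cpufreq() with no arguments reads the real sysfs tree. A core that stays at a high frequency while the rest idle is worth a closer look.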
"I believe you can set the scaling for each individual core."
I have been doing this to lower the fan noise when I don't need high performance.
But that is not my question...
I need to find out where this high kernel cpu usage is coming from and I don't know how to do that.
I don't have this on any other Arch installation (I have 4) and I use about the same software on all of them.
Profile the kernel; maybe you're hitting something like https://bbs.archlinux.org/viewtopic.php?id=252066 (i.e. idle polling for unknown reasons), and we can look for patterns…
I didn't know I could do that (can't find a wiki page on that topic). I'll certainly look into it later today! Thanks!
About that red color in htop's CPU usage graph, that could be caused by a "kernel thread" or by a hardware device sending a lot of interrupts.
For kernel threads, type Shift+K in htop to toggle the listing of kernel threads on or off. In the tree view, the kernel thread lines will show up at the end, after the normal processes. The lines will show up in a different color than normal processes.
About hardware interrupts, try to research this by running the following in a terminal window:
watch -n.5 cat /proc/interrupts
See if the table that gets displayed has a number that is increasing fast. That line may then point to the device that's causing the CPU usage. The end of the line shows the name of the driver that is serving the interrupt.
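If the table is too long to watch by eye, the same comparison can be scripted. This is a minimal Python sketch, assuming the usual /proc/interrupts layout (a header row of CPU names, then one row per interrupt with a count per CPU followed by a description); the function names are made up for this example.

```python
def parse_interrupts(text):
    """Return {irq_label: (total_count, description)} from /proc/interrupts text."""
    lines = text.splitlines()
    if not lines:
        return {}
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        # Leading numeric fields are per-CPU counts; the rest is the description.
        for field in fields[:n_cpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def fastest_growing(before, after, top=5):
    """Rank interrupt sources by how much their total grew between two snapshots."""
    growth = []
    for label, (count, description) in after.items():
        delta = count - before.get(label, (0, ""))[0]
        if delta > 0:
            growth.append((delta, label, description))
    growth.sort(reverse=True)
    return growth[:top]
```

Read /proc/interrupts twice, a second or so apart, parse both texts, and pass them to fastest_growing. Unlike watch, this covers every row of the file, not just the ones that fit on the screen.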
Sorry that it took me so long.
"Profile the kernel; maybe you're hitting something like https://bbs.archlinux.org/viewtopic.php?id=252066 (i.e. idle polling for unknown reasons), and we can look for patterns…"
I was unsuccessful at profiling the kernel. I added profile=2 to the kernel parameters, which caused a /proc/profile file to be created.
But when I then tried readprofile, it complained that no /boot/kernel....map file was found. Don't remember the exact filename of it, but nothing that looked like it was there.
I am booting with EFI. Is that maybe a problem?
I did try explicitly setting the kernel parameter idle=halt. It didn't change anything.
"For kernel threads, type Shift+K in htop to toggle the listing of kernel threads on or off. In the tree view, the kernel thread lines will show up at the end, after the normal processes. The lines will show up in a different color than normal processes."
Found them. What should I look into? I don't see anything 'odd' there...
"About hardware interrupts, try to research this by running the following in a terminal window:
watch -n.5 cat /proc/interrupts
See if the table that gets displayed has a number that is increasing fast. That line may then point to the device that's causing the CPU usage. The end of the line shows the name of the driver that is serving the interrupt."
I also tried watching the /proc/interrupts file suggested above and it indeed shows something that increases fast:
10: 0 0 0 0 0 0 0 0 366093551 0 0 0 0 0 0 0 IR-IO-APIC 10-fasteoi tpm0
The high number is on CPU8, which is core 9 in htop, so it matches what I saw there, and the count doesn't look normal.
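Counting columns by hand is error-prone with 16 CPUs, so here is a minimal Python sketch that picks the busiest column out of a single /proc/interrupts row. It assumes the row starts with a label and a colon, followed by one count per CPU; the function name is made up for this example.

```python
def busiest_cpu(row):
    """Return (cpu_index, count, total) for one /proc/interrupts row, or None."""
    _label, _sep, rest = row.partition(":")
    counts = []
    # Per-CPU counts come first; stop at the first non-numeric field.
    for field in rest.split():
        if not field.isdigit():
            break
        counts.append(int(field))
    if not counts:
        return None
    cpu = max(range(len(counts)), key=counts.__getitem__)
    return cpu, counts[cpu], sum(counts)
```

The index is zero-based like the CPU0, CPU1, ... header, so add 1 to get the core number that htop displays.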
But I don't know what IR-IO-APIC is. I know the APIC is the advanced programmable interrupt controller (I have written my own OS).
Does this line say where the problem lies, or do I need to research further?
Edit: I noticed that watch prevents me from viewing below the first n lines, so I watched it manually by repeating the command quickly.
This line also increased faster than the others:
RES: 908849 453870 438798 181506 176933 138664 118404 106964 11854957 110938 103016 113868 90854 91028 91673 89905 Rescheduling interrupts
The high number (not as high as the first one) is also on CPU8.
Last edited by scippie (2020-01-16 11:21:05)
That "tpm0" word at the very end of the line is the interesting one. It points to a module or device with that name.
This "tpm" name is probably about the "trusted platform module" hardware in your laptop. It's a chip that's for saving encryption keys.
Last edited by Ropid (2020-01-16 20:28:49)
"This "tpm" name is probably about the "trusted platform module" hardware in your laptop. It's a chip that's for saving encryption keys."
Thanks. I didn't know about the existence of this hardware and that it was installed in my laptop.
I have done some research about it on the wiki so now I know what it does.
So I went to the BIOS and disabled it and hooray... you are completely right. The high kernel usage is gone.
Is this TPM hardware important? I mean, does it make my system more secure or something?
As I didn't know I had it, I never missed it, of course. At least, I never used it deliberately.
But maybe the kernel uses it and it is just better that way for some reasons?
Can you elaborate on that?
If it is not really necessary to have it enabled, I really don't care.
But still, I guess that it should just work without overloading the system with interrupts of course, so why doesn't it work as it should on my computer?
Is there a way for me to find out?
Big thanks for helping me find the problem!
It's okay that the TPM device is missing. This device is only standard on laptops. On desktop PCs, it's generally not there and things run fine.
Perhaps the most interesting thing you can do with the TPM is have it hold the key for disk encryption. When you type your key's passphrase at boot, you get the key to unlock the encrypted drive out of the TPM.
You can also make the TPM share the key automatically, without needing any passphrase at boot. You then have full encryption without any hassle: the laptop behaves just like a non-encrypted system and boots to the normal login screen where you type your user's password. But when someone tries to boot from a USB flash drive, the TPM will refuse to share the key without the passphrase and the encrypted drive can't be unlocked, which makes this kind of setup somewhat safe.
As for the interrupt problem, I'd try to find the exact name of the kernel module for your TPM hardware, then search for bug reports for that module and see what people are discussing.
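One way to look up that name is to follow the driver link in sysfs. This is a minimal Python sketch, assuming the device shows up as /sys/class/tpm/tpm0 and that its device/driver entry is a symlink to the bound driver, which is the usual sysfs convention; the function name and its parameters are made up for this example.

```python
import os


def tpm_driver(tpm="tpm0", sysfs="/sys"):
    """Return the name of the kernel driver bound to a TPM device, or None."""
    # Assumed layout: <sysfs>/class/tpm/<tpm>/device/driver -> .../drivers/<name>
    link = os.path.join(sysfs, "class", "tpm", tpm, "device", "driver")
    if not os.path.exists(link):
        return None
    return os.path.basename(os.path.realpath(link))
```

The TPM has to be enabled in the BIOS for the device to appear at all. The returned name is what to search for in bug trackers, together with terms like "interrupt storm".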
Ok, thanks Ropid for the extensive help. I'll mark this thread as solved.