#1 2015-07-10 23:33:17

abstrakct
Member
Registered: 2009-07-28
Posts: 35

[Solved, in a way] CPU goes nuts, can't find the reason!

Hello fellow archers. Last night I did a pacman -Syu for the first time in about a month. It upgraded the kernel to 4.0.7-2, among other things. After rebooting I noticed that my system used a lot less CPU while idling than before, and I thought that was great. CPU temp was nice and cool. After a few hours of uptime, though, I noticed that the CPU was going wild, and the fans with it, for no apparent reason! htop showed nothing out of the ordinary in the process list, but the bar for each CPU core showed at least 50% usage, and at least half of each bar was kernel time. I rebooted, and things seemed to go back to normal: low idle CPU usage. Today the problem has come back several times, most recently after about 6 hours of uptime (see screenshot below). I can't find any pattern in what triggers this behavior! I've tried googling for explanations/solutions, but haven't found anything useful, and I didn't find anything here in the kernel forum either.

Here's a screenshot from htop showing the situation:
[Screenshot: htop during one of the high-CPU episodes]

There's no way the processes as htop lists them add up to that kind of CPU usage! I've tried ending all open programs and stopping background services like transmission and such, with no change.
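One way to confirm where the time goes, independently of htop's process list, is to sample /proc/stat directly: its per-core counters split time into user, system, irq and softirq, so kernel time that no process accounts for still shows up there. A minimal Python sketch, assuming the usual Linux /proc/stat layout (the function names are made up for illustration):

```python
import time

def parse_cpu_times(text):
    """Map 'cpu0', 'cpu1', ... to [user, nice, system, idle, iowait, irq, softirq, steal]."""
    times = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu0 ..."; skip the aggregate "cpu" line.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            # Guest time is already counted in user, so only the first 8 counters are kept.
            times[fields[0]] = [int(x) for x in fields[1:9]]
    return times

def read_cpu_times(path="/proc/stat"):
    with open(path) as f:
        return parse_cpu_times(f.read())

def usage_split(before, after):
    """Per core: percent of elapsed ticks spent in user mode and in the kernel."""
    result = {}
    for cpu, old in before.items():
        new = after.get(cpu)
        if new is None:
            continue
        delta = [n - o for n, o in zip(new, old)]
        total = sum(delta)
        if total <= 0:
            continue  # no ticks elapsed for this core
        user = delta[0] + delta[1]               # user + nice
        kernel = delta[2] + delta[5] + delta[6]  # system + irq + softirq
        result[cpu] = (100.0 * user / total, 100.0 * kernel / total)
    return result

def sample(interval=1.0):
    """Take two snapshots `interval` seconds apart and compare them."""
    first = read_cpu_times()
    time.sleep(interval)
    return usage_split(first, read_cpu_times())
```

sample() returns a dict of (user %, kernel %) per core. A high kernel share while the process list stays quiet usually points at kernel threads, interrupts or a driver rather than a userspace program.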

Here's a list of upgraded packages:
http://pastebin.com/HHiLif8M

Does anyone know what could cause this, or where I could start looking for the cause? I can't remember having had this problem before (I've used Linux for >15 years so I'm reasonably familiar with it).

I started thinking I might have been infected by a rootkit or something like that, so I installed rkhunter and unhide, but rkhunter won't work ("Invalid BINDIR configuration option: Invalid directory found: .").
"unhide brute" found a whole bunch of hidden PIDs saying "... maybe a transitory process".
"unhide proc" found nothing.
"unhide procall" found nothing.
"unhide procfs" found a couple of "... maybe a transitory process".
"unhide quick" found nothing
"unhide reverse" found nothing
"unhide sys" found nothing
These unhide tests were done when system was acting normally.
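For what it's worth, "maybe a transitory process" is what you'd expect from a brute-force scan on a busy system: the scan probes every possible PID, and processes that start or exit between the probe and the /proc listing look "hidden" for a moment. The Python sketch below illustrates that idea. It is a simplified illustration, not unhide's actual algorithm, and all names in it are hypothetical:

```python
import os

def listed_ids(proc="/proc"):
    """PIDs and thread IDs visible when walking /proc, i.e. what ps and htop see."""
    ids = set()
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        ids.add(int(entry))
        try:
            # Thread IDs appear only under /proc/<pid>/task/, not at the top level.
            ids.update(int(tid) for tid in os.listdir(os.path.join(proc, entry, "task")))
        except OSError:
            pass  # the process exited while we were looking at it
    return ids

def id_exists(pid):
    """Probe with signal 0: nothing is delivered, but the kernel checks that the ID exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # it exists; we just aren't allowed to signal it
    return True

def classify(before, probed, after):
    """Split IDs that answered the probe but weren't listed in both /proc walks."""
    unlisted = probed - (before & after)
    transitory = {pid for pid in unlisted if pid in before or pid in after}
    never_listed = unlisted - transitory
    return transitory, never_listed

def scan(max_pid=None):
    """Hypothetical brute-force pass: list, probe every PID, list again."""
    if max_pid is None:
        with open("/proc/sys/kernel/pid_max") as f:
            max_pid = int(f.read())  # allocated PIDs are always below this value
    before = listed_ids()
    probed = {pid for pid in range(1, max_pid) if id_exists(pid)}
    after = listed_ids()
    return classify(before, probed, after)
```

scan() probes every PID up to pid_max, so it can take a while. Short-lived processes can still land in `never_listed` if they start and exit between the two walks, so an ID there is only worth a second look if it shows up again on a repeated run.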


Thanks for any help solving this!!


-- mod edit: read the Forum Etiquette and only post thumbnails http://wiki.archlinux.org/index.php/For … s_and_Code [jwr] --

edit: sorry about that, fixed it.

final edit: Well, nothing seems to have worked, but I decided to try the linux-ck kernel, and now the problem is gone. So I don't know exactly what caused it, but it's somehow solved. Also, the -ck kernel (with BFQ) seems to have made my system noticeably more responsive (linux-ck-k10, AMD Phenom II X4 965 BE CPU).

Last edited by abstrakct (2015-07-15 22:32:30)


#2 2015-07-11 05:20:44

ooo
Member
Registered: 2013-04-10
Posts: 1,638

Re: [Solved, in a way] CPU goes nuts, can't find the reason!

If there's no process using the CPU, this is most likely an issue in the kernel (or in an external module like the binary nvidia drivers). Did you try downgrading your kernel, or installing linux-lts to see whether the issue persists?
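One quick check for the external-module angle: /proc/sys/kernel/tainted is a bitmask the kernel sets when, among other things, a proprietary or out-of-tree module (such as the binary nvidia driver) is loaded, or when the kernel has logged a warning or oops. A small Python decoder, with bit meanings taken from the kernel's tainted-kernels documentation (only bits 0-15 are listed; the function names are hypothetical):

```python
# Bit number -> (flag letter, meaning), bits 0-15 of the kernel taint mask.
TAINT_FLAGS = {
    0: ("P", "proprietary module loaded"),
    1: ("F", "module force-loaded"),
    2: ("S", "SMP kernel on hardware not certified for SMP"),
    3: ("R", "module force-unloaded"),
    4: ("M", "machine check exception"),
    5: ("B", "bad page referenced"),
    6: ("U", "taint requested by userspace"),
    7: ("D", "kernel died recently (oops or BUG)"),
    8: ("A", "ACPI table overridden"),
    9: ("W", "kernel issued a warning"),
    10: ("C", "staging driver loaded"),
    11: ("I", "platform firmware bug workaround applied"),
    12: ("O", "out-of-tree module loaded"),
    13: ("E", "unsigned module loaded"),
    14: ("L", "soft lockup occurred"),
    15: ("K", "kernel live patched"),
}

def decode_taint(value):
    """Return the (letter, meaning) pairs for every bit set in value."""
    flags = []
    for bit in range(value.bit_length()):
        if (value >> bit) & 1:
            flags.append(TAINT_FLAGS.get(bit, ("?", f"unlisted bit {bit}")))
    return flags

def read_taint(path="/proc/sys/kernel/tainted"):
    with open(path) as f:
        return int(f.read().strip())
```

With the nvidia blob loaded you'd expect decode_taint(read_taint()) to include P and O. A W or D that only appears after the CPU goes wild would be a good reason to go through dmesg.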


#3 2015-07-11 10:27:16

Soukyuu
Member
Registered: 2014-04-08
Posts: 854

Re: [Solved, in a way] CPU goes nuts, can't find the reason!

I see btrfs-progs. If you're using btrfs and have run duperemove, the CPU will spike while the kernel is processing the dedupe requests. Could take a while depending on your data.
Saw that behavior yesterday on my system, with ~90% of CPU used but no processes actually using that much.


[ Arch x86_64 | linux | Framework 13 | AMD Ryzen™ 5 7640U | 32GB RAM | KDE Plasma Wayland ]


#4 2015-07-11 21:33:24

abstrakct
Member
Registered: 2009-07-28
Posts: 35

Re: [Solved, in a way] CPU goes nuts, can't find the reason!

Thanks for your suggestions. I've downgraded the kernel and the nvidia drivers to the previous version I had installed, but the problem is still there. This time it started less than 10 minutes after boot.

I do use btrfs, on my /home partition, but I haven't run duperemove. Maybe btrfs is doing something anyway? There are a lot of btrfs-* kernel threads in the process list; I'm not sure if that's normal, since I only started using btrfs a couple of months ago. Something might be up, though: available space on /home has dropped by about 10 GB over the last few days, but I haven't added anything special to the file system (yeah, I know free space calculation is weird on btrfs, but still, it might be related?).
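If you want to put a number on "a lot of btrfs kernel threads", you can count them by name from /proc and compare a normal boot with one where the CPU is pegged. A Python sketch; it only groups names and says nothing about what count is normal (btrfs starts its own threads, such as btrfs-cleaner and btrfs-transaction, for each mounted filesystem). The function names are hypothetical:

```python
import os
from collections import Counter

def thread_names(proc="/proc"):
    """Read the short command name (comm, max 15 chars) of every process in /proc."""
    names = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "comm")) as f:
                names.append(f.read().strip())
        except OSError:
            continue  # process exited mid-walk
    return names

def count_matching(names, prefix="btrfs"):
    """Count names starting with prefix, e.g. 'btrfs-cleaner', 'btrfs-transacti'."""
    return Counter(name for name in names if name.startswith(prefix))
```

count_matching(thread_names()) returns a Counter keyed by thread name; saving its output once while things are fine and once during an episode makes any difference easy to spot.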

I'll investigate more, and appreciate any more input! Gotta reboot again now...

Last edited by abstrakct (2015-07-11 21:47:42)


#5 2015-07-12 00:29:54

Soukyuu
Member
Registered: 2014-04-08
Posts: 854

Re: [Solved, in a way] CPU goes nuts, can't find the reason!

I'm not too experienced with btrfs, but when I used it for /home I had hangs every now and then, especially under heavy I/O (more than I'd expect from "just" heavy disk I/O). If you are using ANY databases (baloo, akonadi) or any other files with lots of random write access, you should disable COW on those directories by setting the C attribute:

chattr +C /path/to/dir

Otherwise you will have performance issues: with COW, every write to such a file goes to a new location on disk, so files that get lots of small random writes end up heavily fragmented.
But simply setting the attribute is not enough. Only newly created files get the C attribute, so you have to copy the files to another dir and then copy them back; moving them does not work.
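The copy-then-copy-back step can be scripted. This Python sketch does the equivalent inside the directory itself: after chattr +C on the directory, each file is copied to a temporary file (created inside the +C directory, so it inherits the attribute) and then renamed over the original. Assumptions: the directory is on btrfs, the programs using it (browser, baloo, akonadi) are closed, and the function name and the .nocow-tmp suffix are made up for illustration. Ownership isn't preserved, so run it as the user who owns the files.

```python
import os
import shutil
import subprocess

TMP_SUFFIX = ".nocow-tmp"  # hypothetical name for the temporary copies

def recreate_as_nocow(directory, set_attribute=True):
    """Give every regular file directly in `directory` a fresh inode.

    Once the directory has +C, files created in it inherit NOCOW, so a copy
    renamed over the original ends up NOCOW too. Mode and timestamps are kept
    via copystat; ownership is not.
    """
    if set_attribute:
        subprocess.run(["chattr", "+C", directory], check=True)
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if name.endswith(TMP_SUFFIX) or os.path.islink(path) or not os.path.isfile(path):
            continue  # skip leftovers, symlinks and subdirectories
        tmp = path + TMP_SUFFIX
        # Copy the bytes through read/write so the data is written into the new file.
        with open(path, "rb") as src, open(tmp, "wb") as dst:
            shutil.copyfileobj(src, dst)
        shutil.copystat(path, tmp)
        os.replace(tmp, path)  # rename: the path now points at the new inode
```

Pass set_attribute=False if you've already run chattr +C on the directory by hand. Subdirectories are left alone, so call it once per directory you care about.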

I have a lot of btrfs processes myself, but there have been no hangs so far. I'm using it for / only though, /home is plain ext4.

Another possibility is that you don't have your /home mounted with autodefrag, in which case there might be some heavily fragmented files (chromium profile/cache, baloo db) which according to btrfs wiki lead to spikes in CPU usage/hangs. That could explain why the problem appeared recently - the files weren't fragmented at first.

You could run a "btrfs filesystem defragment", but beware that if you have any snapshots, they will take their FULL space after the defragmentation, because defrag is currently not snapshot aware. You could run duperemove on your btrfs-root to dedupe the snapshots after the defrag operation though.
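To see whether anything in /home actually is badly fragmented before defragmenting, filefrag (from e2fsprogs) reports an extent count per file. A Python sketch that runs it over a directory tree and lists the worst files; it assumes filefrag's default "PATH: N extents found" output, and the helper names are hypothetical. btrfs-compressed files report many small extents, so treat the counts as a rough guide.

```python
import os
import re
import subprocess

LINE = re.compile(r"^(?P<path>.*): (?P<count>\d+) extents? found$")

def parse_filefrag(output):
    """Map each path in filefrag's output to its extent count."""
    counts = {}
    for line in output.splitlines():
        match = LINE.match(line)
        if match:
            counts[match.group("path")] = int(match.group("count"))
    return counts

def most_fragmented(root, top=10, batch=200):
    """Run filefrag over every regular file under root and return the worst, highest first."""
    paths = [os.path.join(d, f) for d, _dirs, files in os.walk(root) for f in files]
    paths = [p for p in paths if os.path.isfile(p) and not os.path.islink(p)]
    counts = {}
    for i in range(0, len(paths), batch):
        # Batches keep the command line well under the argument-length limit.
        result = subprocess.run(["filefrag", *paths[i:i + batch]],
                                capture_output=True, text=True, errors="replace")
        counts.update(parse_filefrag(result.stdout))
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

Running most_fragmented on a browser profile or the baloo index directory would show whether those are the heavily fragmented files the btrfs wiki warns about.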




#6 2015-07-13 12:04:19

abstrakct
Member
Registered: 2009-07-28
Posts: 35

Re: [Solved, in a way] CPU goes nuts, can't find the reason!

Thanks again!
Looking at iotop when the problem is happening shows that kworker is doing something about once a second. "jbd2/sda5-8" is also doing something quite often. sda5 is my root partition, using ext4. edit: this behavior seems normal, it's like this when the problem is not happening as well.

I've tried disabling COW on the Firefox/Chromium cache directories and cleaned out the caches, but that didn't do anything. Not much is running in the background, no databases or such. Btrfs is not mounted with the autodefrag option. Is it on by default? Also, to better describe what's happening: it's not a hang or a short-term spike in CPU usage. Once it happens, CPU usage gets very high (40-75% on each core), about half of it is used by the kernel, and it stays like that until I reboot (see screenshot in first post).

Edit: I'm thinking maybe systemd/journald is to blame? I had set journald to use max 3 GB, and the journal directory was almost 3 GB, so maybe that's it? The journal gets full, and journald has to do a lot of work to clean out old logs? I also noticed that stuff like transmission-daemon and sonarr was writing to the journal quite often. I don't know how intelligent journald is, but maybe my system got into a kind of permanent "write to journal -> oh no it's full -> delete old stuff -> hey, also write this -> oh wait, i must delete more -> and so on ..." loop?!
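To check the "journal is at its cap" theory, you can compare what's on disk with the configured SystemMaxUse. A Python sketch, assuming the default /var/log/journal location and that the limit is set in /etc/systemd/journald.conf itself (drop-ins in journald.conf.d are ignored here); journalctl --disk-usage gives the authoritative figure. The helper names are hypothetical.

```python
import configparser
import os

# journald.conf sizes use 1024-based suffixes.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5, "E": 1024**6}

def parse_size(text):
    """Turn '3G', '500M' or a plain byte count into bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)

def configured_max_use(conf_text):
    """SystemMaxUse= from journald.conf text, or None if it is unset or commented out."""
    # strict=False tolerates repeated keys; the last occurrence wins.
    cfg = configparser.ConfigParser(interpolation=None, strict=False)
    cfg.read_string(conf_text)
    value = cfg.get("Journal", "SystemMaxUse", fallback=None)
    return parse_size(value) if value else None

def directory_size(root):
    """Sum of apparent file sizes under root."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass  # rotated or vacuumed away during the walk
    return total

def journal_fill_ratio(conf_path="/etc/systemd/journald.conf", journal_dir="/var/log/journal"):
    """Fraction of SystemMaxUse currently used, or None if no limit is set."""
    with open(conf_path) as f:
        limit = configured_max_use(f.read())
    if not limit:
        return None  # journald then applies its own default limit
    return directory_size(journal_dir) / limit
```

A journal_fill_ratio() close to 1.0 would confirm the journal is sitting right at its limit. Reading the journal directory may need root or membership in the systemd-journal group.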

I also realized I installed "snapper" a while ago and kinda forgot that I'd enabled it. It takes hourly snapshots of my /home btrfs, maybe that's somehow the cause of the problem? I disabled it for now, let's see if that makes any difference.

Last edited by abstrakct (2015-07-13 13:23:43)

