Hi,
this is more of a curiosity question rather than a problem, although it is in fact a problem.
What's the deal with all the crappy linux I/O and CPU scheduling? Or am I just missing something?
I mean how come untarring a full gzipped server backup (say (30)10GB of (un)compressed data, tens of thousands of files) from a secondary hard drive to (the same) secondary hard drive on a dedicated(!) controller slows down the entire system?
Using the CFQ elevator makes the system virtually unusable - mouse, keyboard, audio, video, heck, even scrolling in a terminal and window switching lag on the order of tenths of a second (sometimes even up to a second). On an i7 (4-core) system with 8GB RAM. Switching to BFQ makes things a bit better - the lags shrink about tenfold but they are still there - nicely visible in videos or even small animated gifs. The sluggish feel is just there. And it won't stop until the tar archive is extracted.
BFQ seems to be a bit more CPU-intensive (a rough estimate made just by looking at sensors plugin gadget). Did I mention I'm currently running 3.12.2-1-ck kernel with BFS CPU scheduler?
So, is it a hardware problem? Like, is the x86_64 architecture design so flawed it just makes one I/O channel influence all the buses there are? Are 8 CPUs (ok, four CPUs, but there is this HyperThreading thingie) not enough to handle a couple of gigs of data? Or is it more of a software problem? And if so, where is it and what can I do to help fix it?
Yes, I did google (and yandexed, ddg-ed, heck even binged) and I found many links, about 80% of which were more or less spam and the rest just blamed the all-inferior CFQ.
I do agree with many people that CFS and CFQ are bad choices for most desktop-like scenarios (who the hell needs 8192 CPUs scaling?) but for me not even the famous BFQ did a decent job.
Why? (this is not a rhetorical question)
-miky
What happened to Arch's KISS? systemd sure is stupid but I must have missed the simple part ...
... and who is general Failure and why is he reading my harddisk?
Offline

Well, there's lots of anecdotal evidence there as to what is going on, but no hard evidence of a reason.
Have a look at top (particularly the "%cpu(s)" line) to see whether the CPU is being overloaded while the system is in its lagged state. Also, iostat and iotop may give you some indication of where exactly the issue is. Does dmesg show anything out of the ordinary? Have you fsck'ed the filesystem? What filesystem are you using? It could be as simple as a bad hard drive cable or a failing hard drive. Check with smartmontools to see if the hard drive is having a lot of re-reads or reallocated sectors.
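To put a number on "is the CPU busy or just waiting for the disk", here is a minimal sketch that estimates the iowait share from two samples of the aggregate "cpu" line in /proc/stat. The helper names iowait_pct and sample_iowait are made up for illustration; top and iostat report the same kind of figure in their own way.

```shell
#!/bin/sh
# Sketch only: estimate the percentage of CPU time spent waiting on I/O
# between two samples of the aggregate "cpu" line from /proc/stat.
# Columns after the "cpu" label: user nice system idle iowait irq softirq steal ...

# iowait_pct "<first cpu line>" "<second cpu line>"   (hypothetical helper)
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i   # user .. steal
            t[NR] = total
            w[NR] = $6                                        # iowait column
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print 0; exit }
            printf "%d\n", (w[2] - w[1]) * 100 / dt
        }'
}

# sample_iowait [seconds]   (hypothetical helper)
# Takes two live samples; run it while the system is lagging.
sample_iowait() {
    a=$(grep '^cpu ' /proc/stat)
    sleep "${1:-5}"
    b=$(grep '^cpu ' /proc/stat)
    iowait_pct "$a" "$b"
}
```

A high value while the per-core load stays low would point at the disk path rather than at the CPU scheduler.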
Are you familiar with our Forum Rules, and How To Ask Questions The Smart Way?
BlueHackers // fscanary // resticctl
Offline
OK, I give up. I did all the tests I could think of before posting, mostly what you suggested:
Have a look at top (particularly the "%cpu(s)" line) to see whether the CPU is being overloaded while the system is in its lagged state.
CPUs are pretty much bored, occasional spikes on one or two CPUs up to 60-70% for far less than a second.
Also iostat and iotop may give you some indication of where exactly the issue is.
Not really. Although I've never heard of iostat before (hey, thanks!) its output doesn't seem to provide any leads, looks quite normal. iotop says exactly what I expected it to say - it shows just one process (gzip or dd, see below) consuming about 95% IO.
Does dmesg show anything out of the ordinary? Have you fsck'ed the filesystem? What filesystem are you using?
nope, yes (no errors), ext4 (relatime,data=ordered)
It could be as simple as a bad hard drive cable or a failing hard drive. Check with smartmontools to see if the hard drive is having a lot of re-reads or reallocated sectors.
nope, drive's healthy like a fresh born baby. Even ran a short SMART self-test, no errors whatsoever.
Funnily enough, after running these tests again, just to be sure, plus some more tests (notably the dd and cat ones, below), the problem seems to have vanished.
For the other tests, here's the playset: the drive is a WDC WD20EARS (2TB Caviar Green Eco-something - the one that's supposed to be energy-efficient, well, it's slow and quiet all right, one of the quietest drives I've had. It uses 4k blocks internally but emulates 512B LBA), with MBR (yes, I know..) and two partitions (properly aligned - at least according to the partition table, only God and Western Digital know how they are stored internally) - the first being some 50 gigs of Windows remnants, not even mounted, and the other one filling up the rest of the drive, encrypted using LUKS. On the LUKS partition there is LVM set up with currently four volumes: swap, two ext4's and one freshly created, unformatted. The original problem was noticeable when using one of the ext4 partitions; the read-extract-write process was isolated to that one partition (while the other was also mounted but not used actively).
The other tests involved crazy stuff like
cat /dev/sdb2 > /dev/null
dd if=/proc/kcore of=/dev/mapper/volgroup-test
and variations thereupon. All run as root (I'm hardcore).
In no test since posting the OP have I encountered a similar problem. Beats me. The only noticeable lag was when I tried to play a 1080p video from that very same drive (during all of the tests), which resulted in playback that was not choppy but occasionally interrupted (every 5 secs or so), suggesting bad cache management rather than IO problems. Smaller videos played flawlessly.
Now what to do? 
No way I'm switching back to CFQ though! Not today that is, must have some sleep...
-m.
Offline

MikyMaus, a possible cause might be the HDD controller you use.
I use a dual-socket server-type motherboard (AMD Opteron) for my main desktop, with an Nvidia MCP55Pro chipset.
4 SATA drives + 2 optical drives are connected to it, and I don't notice those problems when I copy large stuff on the same drive.
Other possible causes are the WM/DE and the init system.
I use openrc as init system, NOT systemd, and LXDE/Openbox.
Also the program/command you use might have an influence; I typically use Krusader.
Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.
clean chroot building not flexible enough ?
 Try clean chroot manager by graysky
Offline

Did you have a look into sysctl.conf ?
There are a lot of switches to fine tune a particular system.
Offline

It's a famous problem.
Personally, I mount ext4 with:
mount -o remount,rw,noatime,nobarrier,commit=60 /
This turns ext4 barriers off and sets "commit" to 60 seconds rather than the default 5.
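A remount like this can be checked afterwards by reading the option string back from /proc/mounts. The sketch below does that with two made-up helpers, mount_opts and has_opt. The exact spelling the kernel reports for the barrier setting is an assumption here and may differ between kernel versions. Bear in mind that running without barriers trades crash safety for speed.

```shell
#!/bin/sh
# Sketch only: read back the mount options of a mount point.

# mount_opts <mountpoint> [mounts-file]   (hypothetical helper)
# Prints the option string of the last entry mounted at <mountpoint>.
# The second argument defaults to /proc/mounts; it exists so the helper
# can be tried against a saved copy of that file.
mount_opts() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' \
        "${2:-/proc/mounts}"
}

# has_opt <option-string> <option>   (hypothetical helper)
# Succeeds if <option> appears as a whole comma-separated entry.
has_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example:
#   opts=$(mount_opts /)
#   has_opt "$opts" noatime && echo "noatime is active on /"
```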
Offline

itman's post reminded me i had changed sysctl.conf a few months ago :
# Contains, as a percentage of total system memory, the number of pages at which
# a process which is generating disk writes will start writing out dirty data.
vm.dirty_ratio = 3
# Contains, as a percentage of total system memory, the number of pages at which
# the background kernel flusher threads will start writing out dirty data.
vm.dirty_background_ratio = 2
My system has 16 GB of RAM, so 3% is about 480 MB and 2% about 320 MB.
That setting may very well be the reason my system is a lot less sluggish.
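The arithmetic behind those figures, as a small sketch. It treats 16 GB as 16000 MB and applies the percentage to total RAM; the kernel actually applies these ratios to available (free plus reclaimable) memory, so the real thresholds are somewhat lower. pct_of_mb is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: rough upper bound for the dirty-page thresholds.

# pct_of_mb <total_mb> <percent>   (hypothetical helper)
pct_of_mb() {
    echo $(( $1 * $2 / 100 ))
}

total_mb=16000   # assumption: 16 GB counted as 16000 MB
echo "vm.dirty_ratio = 3            -> about $(pct_of_mb "$total_mb" 3) MB"
echo "vm.dirty_background_ratio = 2 -> about $(pct_of_mb "$total_mb" 2) MB"

# The values can be tried at runtime, without editing any file (needs root):
#   sysctl -w vm.dirty_ratio=3
#   sysctl -w vm.dirty_background_ratio=2
```

Runtime changes made with sysctl -w are lost on reboot, which makes them a low-risk way to test whether the setting helps before making it permanent.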
Offline
MikyMaus, a possible cause might be the HDD controller you use.
I use a dual-socket server-type motherboard (AMD Opteron) for my main desktop, with an Nvidia MCP55Pro chipset.
4 SATA drives + 2 optical drives are connected to it, and I don't notice those problems when I copy large stuff on the same drive. Other possible causes are the WM/DE and the init system.
I use openrc as init system, NOT systemd, and LXDE/Openbox. Also the program/command you use might have an influence; I typically use Krusader.
Well, I have 6 or 7 hard disk drives, and I found that when I copy large amounts of data between them, the quad-core CPU (AMD Phenom II X4 955 on an Asus MA770 motherboard, which also has an AMD chipset) runs at around 80-90% of its capacity. So it can in fact slow down the system, but I copy files via ntfs-3g on NTFS disks and partitions.
Last edited by firekage (2013-12-05 23:08:00)
Offline
itman's post reminded me i had changed sysctl.conf a few months ago :
Where is this conf file located? I tried to search for it, but I don't have it.
Offline

It's a famous problem.
I don't suffer from this issue whatsoever... but that is a neat article. Thanks for the link brebs.
Offline
Did you have a look into sysctl.conf ?
Offline
Guys, I'm impressed how rational this thread is. Frankly, I was afraid of this turning into a flamewar 
Anyway, I had this problem before, even on different hardware, mostly PATA-based way before EXT4 and even on some server hardware, it just wasn't bad enough. I lost my nerves one day though, googled around and then posted this question. I'm pretty sure now it's not a hardware-related problem. Not directly that is.
For reasons I have yet to discover the problem disappeared after second reboot with the elevator=bfq kernel parameter - I'm pretty sure there was no other change involved. So far I did not try any tweak suggested here, but I will, as soon as the problem reappears (if it does).
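For anyone wanting to confirm which elevator is actually active after booting with such a parameter, a minimal sketch follows. The kernel marks the active scheduler with square brackets in /sys/block/<dev>/queue/scheduler. The helper names active_sched and show_sched are made up, and sdb is used only because that is the drive in this thread.

```shell
#!/bin/sh
# Sketch only: find the active I/O scheduler of a block device.

# active_sched "<contents of the scheduler file>"   (hypothetical helper)
# The active entry is the one in square brackets, e.g. "noop deadline [cfq]".
active_sched() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# show_sched <dev>   (hypothetical helper), e.g. show_sched sdb
show_sched() {
    active_sched "$(cat "/sys/block/$1/queue/scheduler")"
}

# Switching at runtime (needs root, and only works if the running kernel
# was built with that scheduler):
#   echo bfq > /sys/block/sdb/queue/scheduler
```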
Just some info about my box:
i7 3770 Ivy Bridge - factory clock settings (UEFI/BIOS set to "auto")
some 8 gigs of DDR3 at 1833MHz
integrated graphics HD 4000 (dynamic shared memory as far as I know)
Asus P7V series mainboard based on Z77 chipset with two SATA controllers:
00:1f.2 SATA controller: Intel Corporation 7 Series/C210 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
03:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 01) //this is where the hard drive in question is connected
No special controller settings (like RAID or anything).
Using pretty much default Arch configuration (systemd) with full-blown XFCE, no special hardware-related tweaks, no compositor (as if that was related but one never knows). The only programs that were running save from XFCE-related stuff was Firefox with about 20 tabs (with Flash), couple of idle terminal windows and the I/O consuming tar in one of the terminals (plus later some stress-testing apps like mplayer), no KDE or QT-related services. Pulseaudio was up though.
Running the ivy kernel from the pre-compiled package at http://repo-ck.com/. I vaguely remember this problem might have been introduced with CFQ but I wouldn't vouch for that. I remember the 2.4 kernel being noticeably faster than 2.6 in its infancy (<2.6.10) but I have no idea if that may even be related to the current problem.
-miky
Offline

Firefox
Firefox uses sqlite, which uses fsync - recompile sqlite with fsync off.
Offline
mr.MikyMaus wrote:Firefox
Firefox uses sqlite, which uses fsync - recompile sqlite with fsync off.
Good point, but not in this case. Firefox profile directory is on a different physical hard drive on a different physical controller.
-m.
Offline
It may not be an I/O issue. On the 3.12 kernel I have huge performance issues under system load; see the filed bug:
https://bugzilla.kernel.org/show_bug.cgi?id=66141
Offline