The lore I found while googling is that cfq is the best I/O scheduler to use for HDDs, so I went into this with the preconception that cfq would win. You can see in the data that this is NOT the case under most of the tested conditions.
Experiment design
Iozone was used via the following script to evaluate the effect of different I/O schedulers on a single-HDD system (a standard workstation). In all, the three standard I/O schedulers were tested (cfq, deadline, and noop) under 1, 2, or 3 threads (that is, 1, 2, or 3 writer apps). Since the machine has 8 GB of physical memory, the test file size was chosen to be a little beyond this @ 8.3 GB to prevent operations from being served out of the page cache. The max size of the buffer on this HDD is 16,384 bytes, so that is the value chosen for the transfer size. Note that I set up this experiment in consultation with Don Capps, one of the iozone devs.
Test system details
Intel X3360 CPU under Arch x86_64 (linux-3.0-2 package) with an up-to-date system as of 31-July-2011
Mobo is an Intel P45-based board w/ 8 GB of RAM.
The HDD used for the tests is a Seagate ST31000528AS with AHCI and NCQ enabled.
I ran the benchmarks writing to the 2nd partition (about 80 GB into the disk), which was formatted to ext4 and mounted with the Arch defaults + noatime.
#!/bin/bash

# unpack a local iozone build into tmpfs so the benchmark binary never
# has to be read from the disk under test
if [ ! -d /dev/shm/iozone ]; then
  tar Jxf /root/bin/iozone.tar.xz -C /dev/shm
fi

for pass in 1 2 3; do # run the whole set three times
  # bfq is omitted since no patch has been released for 3.0.x kernels yet
  for i in noop deadline cfq; do
    echo $i >/sys/block/sda/queue/scheduler
    cd /home
    /dev/shm/iozone/iozone -R -i 0 -i 1 -i 2 -i 8 -s 4404020 -r 16384 -b /home/$i-t1-$pass.xls -l 1 -u 1 -F /home/1
    /dev/shm/iozone/iozone -R -i 0 -i 1 -i 2 -i 8 -s 4404020 -r 16384 -b /home/$i-t2-$pass.xls -l 2 -u 2 -F /home/1 /home/2
    /dev/shm/iozone/iozone -R -i 0 -i 1 -i 2 -i 8 -s 4404020 -r 16384 -b /home/$i-t3-$pass.xls -l 3 -u 3 -F /home/1 /home/2 /home/3
  done
done
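For anyone reproducing this: the kernel shows the active scheduler in brackets, so a quick sanity check after each switch looks like this (assuming the disk under test is sda):

cat /sys/block/sda/queue/scheduler
# prints e.g.: noop deadline [cfq]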
Data
Note that I just ran this set with an n=1, so the reported values do not have error bars. This raises the question of whether one can statistically differentiate between close results. I will re-run the entire benchmark 3 times tonight and average the results. Where appropriate, I will indicate the standard error on the graphs to show statistical significance. Stay tuned...
Conclusion
In general, for systems with similar specs, either noop or deadline gives superior results to cfq for both mixed-workload and write scenarios, while cfq gives slightly better results for heavy read scenarios. I switched over to noop after running this experiment, given my typical system usage. I think this reflects that, for this HDD anyway, the drive's onboard queuing via NCQ does a better job than the schedulers in the Linux kernel. Hope others find this insightful!
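(For the curious: whether NCQ is actually in play can be checked from sysfs; a queue_depth of 1 means NCQ is effectively off. Device name sda assumed here.)

cat /sys/block/sda/device/queue_depth
hdparm -I /dev/sda | grep -i 'queue depth'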
Last edited by graysky (2011-08-01 18:59:14)
CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs
Offline
no bfq?
i'll look into this ~ and get back to you later on
Last edited by triplesquarednine (2011-07-30 14:31:40)
Offline
What is the significance of threads when it comes to disk I/O?
I think of it in terms of CPU threads, but most things are single-threaded when it comes to the CPU. Are all disk operations single-threaded? Does testing multiple threads equate to testing multiple different apps accessing the HDD?
Offline
@triple - It's not clear to me that the 3D plots are all that telling (perhaps my ignorance). I guess what I'd like to see is a relevant read of performance as I have presented them... throughput vs. threads. Also, no bfq because Paolo has yet to release one for the 3.0.x series of kernels.
@pog - I think threads refers to how many different apps are accessing the disk at any given timepoint.
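Concretely, iozone's -l and -u flags set the lower and upper bound on the number of child processes, and -F names one test file per process, so the two-thread run in my script is really two independent writers:

# two iozone children, each hammering its own file
/dev/shm/iozone/iozone -R -i 0 -i 1 -i 2 -i 8 -s 4404020 -r 16384 -l 2 -u 2 -F /home/1 /home/2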
Last edited by graysky (2011-07-31 21:08:37)
CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs
Offline
graysky wrote: @triple - It's not clear to me that the 3D plots are all that telling (perhaps my ignorance). I guess what I'd like to see is a relevant read of performance as I have presented them... throughput vs. threads
Hey graysky, sorry i haven't gotten back to you sooner. I've emailed an acquaintance of mine who is very knowledgeable about this kind of thing, as a big part of his work is system analysis. i also posted a link to this thread for him to look at ~ i haven't heard back from him yet, but hopefully will soon. I also haven't had a chance to test your script out (i haven't been home all weekend, but i should be home tomorrow)..
i am not positive on whether or not this would be the best test. hopefully the guy i know (or possibly some other guru) might be able to shed some light on the subject. *wink* *wink* - if there is such an archer reading this post
about the 3D plots ~ on top of specific tests, i would also like a nice big overview of each scheduler's benchmarks. there may be other considerations and/or strengths/weaknesses that i think would be nice to have a look at.
graysky wrote: @pog - I think threads refers to how many different apps are accessing the disk at any given timepoint.
this is what i also believe.
cheerz
Offline
Some quick notes:
- Be sure to drop caches before each test: echo 3 > /proc/sys/vm/drop_caches
- Use a test size of at least 2x RAM
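Something along these lines wrapped around every iozone invocation should do it (a sketch, untested):

sync                              # flush dirty pages to disk first
echo 3 > /proc/sys/vm/drop_caches # then drop page cache, dentries and inodes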
Offline
OK guys, after several emails with iozone dev Don Capps, I re-ran the benchmarks under more relevant conditions and edited the first post of this thread accordingly.
Last edited by graysky (2011-07-31 21:09:17)
CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs
Offline
Sweet, testing this on my HTPC now. (which also runs other things like squid etc)
Offline
graysky wrote: OK guys, after several emails with iozone dev Don Capps, I re-ran the benchmarks under more relevant conditions and edited the first post of this thread accordingly.
Nice one graysky, you went directly to the source (the developer). Interesting results ~ it would seem in most situations noop is the winner. i think i will switch over and test it out when i get home.
I was pretty sure iozone was going to be the tool of choice for this type of benchmarking... thanks for all of your work and research on this; i'm sure others will find it interesting and useful. I would have liked to contribute more, but on short notice i got invited up to my friend's cottage for the long weekend, so i've only got an old craptop with me, hardly worth testing this kind of thing on.
cheerz
Offline
Hm, how come my xls files look like this with no charts:
iozone -R -i 0 -i 2 -i 8 -s 8703181 -r 16384 -b /dev/shm/cfq-t1.xls -l 1 -u 1 -F /mnt/1
Throughput report: Y-axis is type of test, X-axis is number of processes
Record size = 16384 Kbytes
Output is in Kbytes/sec

Initial write     200036,9063
Rewrite           206328,8125
Random read       174993,6406
Mixed workload    175314,7813
Random write      193413,8906
Or do you create the charts yourself?
Offline
Hey Graysky,
i just got an email from the guy i was talking about, Dale, who does system analysis as a main part of his job... he took a look at the thread and said your results are pretty much spot on, and what he would have expected to find as well. He also commented on cfq and the folklore surrounding it around the web ~ he basically said it is politics and ego that have presented cfq as the best io scheduler, whereas in reality it's not as good as some people think.
he didn't get into too much detail but he did say that iozone was a nice choice for this type of benchmark.
cheerz
Last edited by triplesquarednine (2011-08-01 01:28:38)
Offline
It would be interesting to see how these results compare to BFQ.
Offline
An advantage of CFQ (and BFQ these days, so I hear) is that the ionice command can be used to set per-process I/O priorities. Everyone interested in performance, responsiveness, etc. should check it out.
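For example (paths and PID are hypothetical):

ionice -c 3 rsync -a /home /mnt/backup # idle class: only gets disk time nobody else wants
ionice -c 2 -n 7 -p 1234               # demote running PID 1234 to best-effort, lowest priority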
Offline
First, I started to modify the script (not done yet; it doesn't work for RAID, where you have to set the scheduler per underlying device):
$ cat iozone-scheduler.sh
#!/bin/bash
ARGS=$#
if [[ "$ARGS" -ne "0" ]]; then
    DEV=$1
    DIR=$2
    if [ ! -d "$DIR" ]; then
        echo "error: $DIR not found"
        exit 1
    fi
    SIZE=$(($3/3*1024*1024)) # size in KB per worker (total GB split over 3 workers)
    FREE=`df -m $DIR | grep -vE '^Filesystem|tmpfs|cdrom|none' | awk '{ print $3 }' | grep -v $DIR | grep -v dev`
    [[ -z "$FREE" ]] && FREE=`df -m $DIR | grep -vE '^Filesystem|tmpfs|cdrom|none' | awk '{ print $4 }'`
    if [[ -z "$FREE" ]]; then echo "Is $DIR a sane filesystem? Unable to check free space."; exit 1; fi
    echo "free is $FREE"
    exit 1 # temporary stop while the free-space check is being debugged
else
    echo "usage: ./iozone-scheduler.sh <short device name> <test directory> <total test size in GB>"
    echo "example: ./iozone-scheduler.sh sda /mnt 18"
    exit 1
fi
for i in noop deadline cfq; do
    echo $i > /sys/block/$DEV/queue/scheduler # use the device argument, not a hardcoded disk
    echo $i
    cd "$DIR"
    iozone -R -i 0 -i 2 -i 8 -s $SIZE -r 16384 -b /dev/shm/$i-t1.xls -l 1 -u 1 -F "$DIR"/1
    iozone -R -i 0 -i 2 -i 8 -s $SIZE -r 16384 -b /dev/shm/$i-t2.xls -l 2 -u 2 -F "$DIR"/1 "$DIR"/2
    iozone -R -i 0 -i 2 -i 8 -s $SIZE -r 16384 -b /dev/shm/$i-t3.xls -l 3 -u 3 -F "$DIR"/1 "$DIR"/2 "$DIR"/3
done
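For the RAID case, one approach might be to walk the array's member devices in sysfs; a sketch, assuming the array is md0 and sdXN-style member names:

for slave in /sys/block/md0/slaves/*; do
    disk=$(basename "$slave" | sed 's/[0-9]*$//') # strip the partition number, e.g. sda1 -> sda
    echo deadline > "/sys/block/$disk/queue/scheduler"
done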
I attempted to benchmark my degraded RAID6 array during the night. I benchmarked noop, deadline, cfq & bfq. System specs:
4GB RAM, root and swap on Corsair F60 SSD
Zotac motherboard with an Intel Atom (dual-core, 1.8GHz), Nvidia Ion chipset, kernel26-ck (2.6.39.x)
RAID6 array consisting of 5x WD20EARS and 2x SAMSUNG HD204UI (of which one is broken and disabled)
That's right, the RAID6 array is degraded (missing 1 disk). No other process was using the filesystem at this time.
LVM setup on top of md0. MD Chunk size: 64KB. ext4 on top of LVM, mount options: /dev/mapper/lvstorage-storage /raid6volume ext4 rw,noatime,errors=remount-ro,nouser_xattr,barrier=1,stripe=80,data=ordered 0 0
Read-ahead for md0 is 16MB, for the LV it's 2MB, and per actual HDD it's 256KB.
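(Read-ahead can be queried and set with blockdev; values are in 512-byte sectors, so 32768 sectors = 16MB:)

blockdev --getra /dev/md0       # query current read-ahead
blockdev --setra 32768 /dev/md0 # set read-ahead to 16MB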
Here's tune2fs -l for the filesystem:
tune2fs 1.41.14 (22-Dec-2010)
Filesystem volume name: storage
Last mounted on: /raid6volume
Filesystem UUID: 0ca82f13-680f-4b0d-a5d0-08c246a838e5
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: journal_data_writeback
Mount options: data=ordered
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 609484800
Block count: 2437938944
Reserved block count: 243793
Free blocks: 204760897
Free inodes: 609016425
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 442
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
RAID stride: 16
RAID stripe width: 80
Flex block group size: 16
Filesystem created: Tue Oct 19 09:13:23 2010
Last mount time: Sun Jul 31 22:41:02 2011
Last write time: Sun Jul 31 22:41:02 2011
Mount count: 5
Maximum mount count: 15
Last checked: Sun Jul 31 19:41:52 2011
Check interval: 15552000 (6 months)
Next check after: Fri Jan 27 18:41:52 2012
Lifetime writes: 5986 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: 55ace4c8-fa05-49b7-99fe-d49249ace525
Journal backup: inode blocks
Here are the results:
(graphs for 1, 2, and 3 threads were posted here as images)
MOD EDIT -- Please follow image posting rules -- Inxsible
Conclusion: I need to rerun everything; it looks like something went wrong. No?
Last edited by Inxsible (2011-08-01 13:51:26)
Offline
It looks like BFQ pwned them all.
Offline
maybe the RAID is to blame for that? I just have one HDD (non-SSD), I'll try to run the same benchmark as you guys and post the results
btw, how can I get the most complete spec list for my HDD?
Last edited by el mariachi (2011-08-01 12:44:25)
Offline
It looks like BFQ pwned them all.
I'm rerunning the tests now; the results for BFQ do not look realistic.
Last edited by Fackamato (2011-08-01 13:08:55)
Offline
el mariachi wrote: I just have one HDD (non-SSD), I'll try to run the same benchmark as you guys and post the results
I found that on my Intel SSD (X25-M G2, 160GB) with ext4, cfq was actually the fastest of noop, cfq, and deadline (no bfq tested) in all tests except when there was only 1 thread. noop was fastest with only 1 thread, but fell behind deadline and cfq on "mixed workload" and "rewrite". cfq was almost always fastest on "initial writes".
Mount options for the above test:
/dev/sdb4 /media/linuxmint ext4 rw,relatime,user_xattr,acl,barrier=1,data=ordered 0 0
el mariachi wrote: btw, how can I get the most complete spec list for my HDD?
Find the model number (hdparm -I or -i, or smartctl -a), then google.
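e.g. (assuming the drive is sda):

hdparm -I /dev/sda   # full ATA identify info: model, firmware, supported features
smartctl -a /dev/sda # SMART data plus model, serial and capacity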
Offline
I found that on my Intel SSD (X25-M G2, 160GB) with ext4, cfq was actually the fastest of noop, cfq, and deadline (no bfq tested) in all tests except when there was only 1 thread. noop was fastest with only 1 thread, but fell behind deadline and cfq on "mixed workload" and "rewrite". cfq was almost always fastest on "initial writes".
Mount options for the above test:
/dev/sdb4 /media/linuxmint ext4 rw,relatime,user_xattr,acl,barrier=1,data=ordered 0 0
Odd. cfq is the least suitable scheduler for an SSD. When I tested all 3 schedulers after purchasing my SSD, I ended up sticking with noop.
There's something weird in your tests.
Also, the most important test for an SSD (people don't use an SSD for storage, but for the OS) is random reads (I'm interested in 1k and 4k). You don't say anything about that test.
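For reference, a small-record random read/write run with iozone might look like this (file size and path are just examples; -i 0 is needed to create the test file first):

iozone -R -i 0 -i 2 -s 1048576 -r 4 -F /mnt/ssd/testfile # 4 KB records on a 1 GB file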
Last edited by Viper_Scull (2011-08-01 13:42:33)
Athlon II X4 620 + Gigabyte 785GPM-UD2H + 4GB DDR3 + SSD OCZ Vertex2 60GB
Archlinux x86_64 + Openbox
Offline
el mariachi wrote: I found that on my Intel SSD (X25-M G2, 160GB) with ext4, cfq was actually the fastest of noop, cfq, and deadline (no bfq tested) in all tests except when there was only 1 thread. noop was fastest with only 1 thread, but fell behind deadline and cfq on "mixed workload" and "rewrite". cfq was almost always fastest on "initial writes".
Mount options for the above test:
/dev/sdb4 /media/linuxmint ext4 rw,relatime,user_xattr,acl,barrier=1,data=ordered 0 0
Odd. cfq is the least suitable scheduler for an SSD. When I tested all 3 schedulers after purchasing my SSD, I ended up sticking with noop.
There's something weird in your tests.
Also, the most important test for an SSD (people don't use an SSD for storage, but for the OS) is random reads (I'm interested in 1k and 4k). You don't say anything about that test.
I agree, I didn't expect that result. I'll see if I can post the graphs (like above) later today.
Offline
I agree, I didn't expect that result. I'll see if I can post the graphs (like above) later today.
When you do, please follow the image posting rules. Make use of an image hosting service and post only thumbnails here.
There's no such thing as a stupid question, but there sure are a lot of inquisitive idiots !
Offline
New results here: http://stuff3.imgur.com/iozone_on_intel_x25g2_partition
They are somewhat skewed because the partition wasn't big enough (RAM/cache kicked in).
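(A quick way to derive a safe test size, 2x RAM as suggested earlier in the thread, straight from /proc/meminfo; MemTotal is reported in KB:)

awk '/MemTotal/ {print $2 * 2}' /proc/meminfo # prints 2x RAM in KB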
tune2fs -l test-partition
tune2fs 1.41.14 (22-Dec-2010)
Filesystem volume name: <none>
Last mounted on: /media/linuxmint
Filesystem UUID: 9970457a-ef0d-4750-88fa-c8d6a526ba90
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 1305600
Block count: 5218304
Reserved block count: 260915
Free blocks: 3990428
Free inodes: 1119792
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 1022
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8160
Inode blocks per group: 510
Flex block group size: 16
Filesystem created: Fri May 13 17:23:50 2011
Last mount time: Mon Aug 1 15:58:29 2011
Last write time: Mon Aug 1 15:58:29 2011
Mount count: 30
Maximum mount count: 32
Last checked: Fri May 13 17:23:50 2011
Check interval: 15552000 (6 months)
Next check after: Wed Nov 9 16:23:50 2011
Lifetime writes: 256 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: ce646643-2d04-441a-bfed-54ade07fa6ba
Journal backup: inode blocks
mount options:
/dev/sdb4 /media/linuxmint ext4 rw,relatime,user_xattr,acl,barrier=1,data=ordered 0 0
Offline
Second test run on the degraded RAID6 array with the low power Atom:
(graphs for 1, 2, and 3 threads were posted here as images)
(if someone has a nice spreadsheet where I can just paste numbers, that would be nice; Excel/Open/LibreOffice Calc graphs aren't my main strength...)
I can say one thing: cfq totally kills the system as far as interactivity goes. I'm doing these benchmarks on the server, my laptop, and my workstation. On all machines, when the cfq tests kick in, everything becomes sluggish. When bfq, noop, or deadline is being tested, everything flies. (The OS is not on the same physical device where the tests are being run.)
Edit: I'm switching to deadline on my RAID6 array for now. I'll test the OS partition on the same server now as well (Corsair F60)
Last edited by Fackamato (2011-08-01 16:47:53)
Offline
This took 4 hours to test; ntfs-3g pegged the CPU at around 70% or so. This was run on a Lenovo T510, 4GB RAM, Core i5 @ 2.4GHz (performance governor), HT enabled, kernel 2.6.39-ck1 running Lubuntu.
The OS exists on a different physical drive, so the test target was never accessed by anything other than iozone. ntfs-3g version 2010.8.8-0ubuntu1
Mount options:
/dev/sda3 /media/128GB_NTFS ntfs-3g rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other,blksize=4096 0 0
Commands used and results (values in Kbytes/sec; copy-pasted from a spreadsheet):

Scheduler  Processes  Size (KB)  Record size (KB)  Initial write  Rewrite  Random read  Mixed workload  Random write
noop       1          5242880    16384             60607          59153    58355        54175           58322
deadline   1          5242880    16384             41707          46849    36726        43154           42981
cfq        1          5242880    16384             51445          47955    53049        54107           49371
noop       2          5242880    16384             57015          57508    20404        45679           52105
deadline   2          5242880    16384             45308          45226    19769        47239           39975
cfq        2          5242880    16384             50050          51582    21229        52960           44332
noop       3          5242880    16384             50737          54546    20754        17267           44535
deadline   3          5242880    16384             38156          37001    18843        16932           39908
cfq        3          5242880    16384             44698          41136    20625        19232           45020

Command line for each run (N = 1, 2 or 3 processes):
iozone -R -i 0 -i 2 -i 8 -s 5242880 -r 16384 -b /dev/shm/sda-<scheduler>-tN.xls -l N -u N -F /media/128GB_NTFS/1 ... /media/128GB_NTFS/N
Conclusion: For this particular HDD and configuration, noop wins hands down.
Edit: you can find the files here: http://stuff.dyndns.org/logs/, in a few minutes when the SSD is done testing. (files are on the SSD)
Last edited by Fackamato (2011-08-01 17:42:25)
Offline
Here are results for the ext4 partition (root) on a Corsair F60. The fs is directly on the partition, nothing in between.
Mount options:
/dev/sda3 / ext4 rw,noatime,errors=remount-ro,nouser_xattr,acl,barrier=1,data=ordered,discard 0 0
tune2fs -l :
tune2fs 1.41.14 (22-Dec-2010)
Filesystem volume name: ssd_root
Last mounted on: /
Filesystem UUID: 6e812ed7-01c4-4a76-ae31-7b3d36d847f5
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 3530752
Block count: 14103781
Reserved block count: 705065
Free blocks: 8487245
Free inodes: 3351246
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 956
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
Filesystem created: Tue Apr 19 15:58:24 2011
Last mount time: Sun Jul 31 22:41:01 2011
Last write time: Sun Jul 31 22:35:37 2011
Mount count: 21
Maximum mount count: 22
Last checked: Tue Jun 28 23:32:19 2011
Check interval: 15552000 (6 months)
Next check after: Sun Dec 25 22:32:19 2011
Lifetime writes: 9 TB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
First orphan inode: 800087
Default directory hash: half_md4
Directory Hash Seed: 1f9f7535-1a3d-4e38-95b5-5072eb06db00
Journal backup: inode blocks
Conclusion: noop wins big on this SSD setup.
Offline