@all - running this test just once does not give an accurate picture... I will post some data when I have time. Reproducibility is a key factor. In other words, when you run each test 3 times (that's 36 times for each scheduler if using 1-3 threads), and you average the result and plot the error bars, you may be UNpleasantly surprised that all three schedulers are statistically equivalent.
I'm repeating now with minimal daemons in run level 3 and will update when I have results.
Last edited by graysky (2011-08-01 19:03:05)
CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs
@all - running this test just once does not give an accurate picture... I will post some data when I have time. Reproducibility is a key factor. In other words, when you run each test 3 times (that's 36 times for each scheduler if using 1-3 threads), and you average the result and plot the error bars, you may be UNpleasantly surprised that all three schedulers are statistically equivalent.
I'm repeating now with minimal daemons in run level 3 and will update when I have results.
True. I want to polish the script so it will run 3 times, output into different dirs etc. Meanwhile, here's a test I just did, took 2 hours:
cfq seems to be fast here.
/dev/sda1 /media/3TB ntfs-3g rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other,blksize=4096 0 0
Intel 2600K @ 4.4GHz, 8GB RAM, spreadsheet here.
@Fackamato,
why do you test with ntfs-3g? That isn't exactly native filesystem and may produce weird results... I think results on drives running ext4 would be more significant.
@Fackamato,
why do you test with ntfs-3g? That isn't exactly native filesystem and may produce weird results... I think results on drives running ext4 would be more significant.
I'm just testing what I have available at the moment!
So, here's a slightly "refined" script.
time sudo ./iozone-scheduler.sh sdb /media/linuxmint/ 102400 /dev/shm/ 3
The above will test all available schedulers on sdb, with /media/linuxmint/ as the test path; the total combined file size is 100GiB, and logs are stored in /dev/shm/, with /dev/shm/iozone-summary-sdb.log holding all the output from stdout.
There be bugs. Todo: Check that test path is actually on the device.
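One way to close that todo item might be to compare df's source device for the test path against the device under test. A minimal sketch, assuming a reasonably recent GNU coreutils for `df --output` (this helper is not part of the script below):

```shell
# Hypothetical helper: succeed only if DIR is mounted from /dev/DEV*
# (e.g. /dev/sdb1 when DEV=sdb). Requires GNU coreutils df.
dir_on_device() {
    dir=$1; dev=$2
    src=$(df --output=source "$dir" | tail -n 1)   # e.g. /dev/sdb1
    case "$src" in
        /dev/"$dev"*) return 0 ;;
        *) echo "warning: $dir is on $src, not /dev/$dev" >&2; return 1 ;;
    esac
}
```

Something like `dir_on_device "$DIR" "$DEV" || exit 1` near the top of the script would then abort early on a mismatch.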
#!/bin/bash
# Test schedulers with iozone
# by fackamato, Aug 1, 2011
if [ "$EUID" -ne "0" ]; then echo "Needs su, exiting"; exit 1; fi
unset ARGS;ARGS=$#
if [ ! $ARGS -lt "5" ]; then
DEV=$1
DIR=`echo $2 | sed 's/\/$//g'`
OUTPUTDIR=`echo $4 | sed 's/\/$//g'`
if [ ! -d "$DIR" -o ! -d "$OUTPUTDIR" ]; then
echo "Error: are $DIR and/or $OUTPUTDIR directories?"
exit 1
fi
# Find available schedulers
declare -a SCHEDULERS
SCHEDULERS=`cat /sys/block/$DEV/queue/scheduler | sed 's/\[//g' | sed 's/\]//g'`
if [ -z "$SCHEDULERS" ]; then
echo "No schedulers found! Wrong device specified?"
exit 1
else
SUMMARY="$OUTPUTDIR/iozone-summary-$DEV.log" # define before first use below
echo "Schedulers found: $SCHEDULERS" | tee -a $SUMMARY
SIZE=$(($3/3*1024)) # Per-worker size in KB (total MiB split across 3 workers)
unset RUNS; declare -i RUNS;RUNS=$5
fi
RECORDSIZE=$6
[ -z "$RECORDSIZE" ] && RECORDSIZE="16384" # Set default to 16MB
else
echo "Usage:"
echo "`basename $0` <short device name> <test directory> <total test size in MiB> <output root directory> <#runs> <record size>"
echo "time ./iozone-scheduler.sh sda /mnt 18 /dev/shm/server1 3 16384"
echo "The above command will test sda 3 times per scheduler with 18GiB of data (16MiB record size) and save logs in /dev/shm/server1/"
exit 1
fi
cd "$DIR"
unset ITERATIONS; declare -i ITERATIONS; ITERATIONS=0
until [ "$ITERATIONS" -ge "$RUNS" ]; do
for SCHEDULER in $SCHEDULERS; do
echo $SCHEDULER > /sys/block/$DEV/queue/scheduler
echo | tee -a $SUMMARY
echo "Testing $SCHEDULER with 1 thread:" | tee -a $SUMMARY
echo 3 > /proc/sys/vm/drop_caches
time iozone -R -i 0 -i 2 -i 8 -s $SIZE -r $RECORDSIZE -b $OUTPUTDIR/$SCHEDULER-t1.xls -l 1 -u 1 -F "$DIR"/iozone-temp-1 | tee -a $SUMMARY
echo | tee -a $SUMMARY
echo "Testing $SCHEDULER with 2 threads:" | tee -a $SUMMARY
echo 3 > /proc/sys/vm/drop_caches
time iozone -R -i 0 -i 2 -i 8 -s $SIZE -r $RECORDSIZE -b $OUTPUTDIR/$SCHEDULER-t2.xls -l 2 -u 2 -F "$DIR"/iozone-temp-1 "$DIR"/iozone-temp-2 | tee -a $SUMMARY
echo | tee -a $SUMMARY
echo "Testing $SCHEDULER with 3 threads:" | tee -a $SUMMARY
echo 3 > /proc/sys/vm/drop_caches
time iozone -R -i 0 -i 2 -i 8 -s $SIZE -r $RECORDSIZE -b $OUTPUTDIR/$SCHEDULER-t3.xls -l 3 -u 3 -F "$DIR"/iozone-temp-1 "$DIR"/iozone-temp-2 "$DIR"/iozone-temp-3 | tee -a $SUMMARY
done
let ITERATIONS=$ITERATIONS+1
done
echo
echo "Done! Files saved in $OUTPUTDIR, summary at $SUMMARY" | tee -a $SUMMARY
Nice script... add a -i 1 to the list of tests though.... also, if you can process all of the xls files into a single sheet, I will generate some nice plots of your data. Here is the format I need:
Test Throughput (KB/s) I/O Scheduler Threads n
Where:
Test = test name (read, workload, etc.)
Throughput = values from iozone
I/O Scheduler = name of scheduler (noop, etc.)
Threads = threads from iozone
n = the number of the iteration (i.e. if you ran it through a loop 5 times: 1, 2, 3, etc.)
Example:
http://pastebin.com/u50ntfcN
Last edited by graysky (2011-08-02 00:30:53)
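In the meantime, here's a sketch of the log-to-spreadsheet step, assuming result files named like `cfq-t2-i1.txt` and iozone report lines of the form `        Initial write    253630.00` (the awk field number is a guess that may need adjusting for your iozone version):

```shell
# Hypothetical converter: emit one tab-separated row per metric in the
# Test / Throughput / Scheduler / Threads / n layout requested above.
to_row() {   # to_row <label> <grep-pattern> <awk-field> <file>
    base=$(basename "$4")
    sched=$(echo "$base" | awk -F'-' '{print $1}')
    threads=$(echo "$base" | awk -F'-' '{print $2}' | tr -d 't')
    n=$(echo "$base" | awk -F'-' '{print $3}' | sed 's/^i//; s/\.txt$//')
    val=$(grep "$2" "$4" | awk -v f="$3" '{print $f}')
    printf '%s\t%s\t%s\t%s\t%s\n' "$1" "$val" "$sched" "$threads" "$n"
}
```

For the sample line above, `to_row "Initial write" " Initial write " 3 cfq-t2-i1.txt` would print `Initial write`, `253630.00`, `cfq`, `2`, `1` separated by tabs.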
Nice script... add a -i 1 to the list of tests though.... also, if you can process all of the xls files into a single sheet, I will generate some nice plots of your data. Here is the format I need:
Test Throughput (KB/s) I/O Scheduler Threads n
Where:
Test = test name (read, workload, etc.)
Throughput = values from iozone
I/O Scheduler = name of scheduler (noop, etc.)
Threads = threads from iozone
n = the number of the iteration (i.e. if you ran it through a loop 5 times: 1, 2, 3, etc.)
Example:
http://pastebin.com/u50ntfcN
Ok, I just noticed the script only saves the last run (overwrites each run), will fix that. Do we want the average number per test, or should all iterations be saved separately?
Here's a run of 3 iterations per scheduler. The device is an OCZ Agility 3, limited by the SATA bus (SATA2 only). Mount options:
/dev/mapper/xubuntu-home on /home type ext4 (rw,noatime,errors=remount-ro,commit=0)
Note that it's SSD > partition > LUKS container > LVM > filesystem on LV
Still, it appears that CFQ performs nicely with this setup.
@Fack - don't average them, and don't save them in different files... you need to calculate some statistics on them and plot those on the graphs. The standard error is what's key for understanding between-run variability. Format the output as I suggested (see my post with the pastebin link for an example) and I will generate the stats and graphs for you.
Last edited by graysky (2011-08-02 08:28:39)
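For reference, the mean and standard error (SE = sd/√n) per (test, scheduler, threads) group can be computed straight from a file in that layout with awk. A sketch with made-up numbers; `results.tsv` and the sample values exist only for this example:

```shell
# A tiny sample in the requested format (3 runs of one test on one
# scheduler), then the per-group stats.
printf 'Test\tThroughput (KB/s)\tI/O Scheduler\tThreads\tn\n' >  results.tsv
printf 'Initial write\t10\tcfq\t1\t1\n'                       >> results.tsv
printf 'Initial write\t20\tcfq\t1\t2\n'                       >> results.tsv
printf 'Initial write\t30\tcfq\t1\t3\n'                       >> results.tsv

awk -F'\t' 'NR > 1 {
    k = $1 FS $3 FS $4            # group key: test, scheduler, threads
    s[k]  += $2                   # running sum of throughputs
    ss[k] += $2 * $2              # running sum of squares
    c[k]++                        # number of runs in this group
} END {
    for (k in s) {
        m   = s[k] / c[k]
        var = (ss[k] - c[k] * m * m) / (c[k] - 1)   # sample variance
        printf "%s\tmean=%.1f\tSE=%.1f\n", k, m, sqrt(var / c[k])
    }
}' results.tsv
# prints: Initial write<TAB>cfq<TAB>1<TAB>mean=20.0<TAB>SE=5.8
```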
@Fack - don't average them, and don't save them in different files... you need to calculate some statistics on them and plot those on the graphs. The standard error is what's key for understanding between-run variability. Format the output as I suggested (see my post with the pastebin link for an example) and I will generate the stats and graphs for you.
Yeah, just saw that post. How are you outputting all results to a single file? Can iozone append in some way, or do I have to grep/awk the thing?
edit: I'll make the script output that format into a separate file.
Last edited by Fackamato (2011-08-02 09:46:12)
I'm currently testing on an HP host.
Hardware: HP: ProLiant DL380 G7
CPU: 2 x Intel Xeon 2933/133 1.4v (1536k L2, 12288k L3)
RAM: 98304MB in 12 DIMMs at 1333 MHz (0.8 ns)
Firmware: HP P67 [05/14/2010]
Smart Array P812, 146GB SAS HDDs
RAID10 on 24 HDDs (1 might be a spare, not sure)
RHEL 5.3 64-bit
Test size is ~ 114GB (size of partition)
# time /tmp/iozone-scheduler-bench.sh c0d0 /local/temp/ 116736 /dev/shm 3 16384
Schedulers found: noop anticipatory deadline cfq
Edit: No go, the host is needed for other stuff. Testing on a similar host, but with 16GB RAM, 32-bit (PAE), and a ~440GB partition.
Last edited by Fackamato (2011-08-04 10:47:02)
Fixed the script; it now works on HP controllers as well (different device names).
It should output in the format requested by graysky.
Last edited by Fackamato (2011-08-04 10:47:25)
@Fack - don't average them, and don't save them in different files... you need to calculate some statistics on them and plot those on the graphs. The standard error is what's key for understanding between-run variability. Format the output as I suggested (see my post with the pastebin link for an example) and I will generate the stats and graphs for you.
Could you test this? http://pastebin.com/Rdr1qcrW
/dev/cciss/c0d1p1 /mnt ext3 rw,data=ordered 0 0
# tune2fs -l /dev/cciss/c0d1p1
tune2fs 1.39 (29-May-2006)
Filesystem volume name: /arch-01
Last mounted on: <not available>
Filesystem UUID: 189c27fc-3a90-4d6e-bd88-90743f90fd0d
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal resize_inode dir_index filetype needs_recovery sparse_super large_file
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 53755904
Block count: 107504967
Reserved block count: 1075049
Free blocks: 105767275
Free inodes: 53755893
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 998
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 16384
Inode blocks per group: 512
Filesystem created: Sat Jul 23 06:57:57 2011
Last mount time: Tue Aug 2 10:42:22 2011
Last write time: Tue Aug 2 10:42:22 2011
Mount count: 2
Maximum mount count: -1
Last checked: Sat Jul 23 06:57:57 2011
Check interval: 0 (<none>)
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal inode: 8
Default directory hash: tea
Directory Hash Seed: b76600c7-e4e9-4aab-8666-e7ba0ff3689e
Journal backup: inode blocks
Last edited by Fackamato (2011-08-02 14:37:25)
Test results on an Intel 2600K @ 4.4GHz with 8GB RAM and an ST3750640NS (old 750GB Seagate). The file system is btrfs on a 180GB partition.
Mount options:
/dev/sdc7 on /mnt/btrfs_180GB type btrfs (rw,nodatacow,nobarrier,compress=lzo,noacl,space_cache)
I think there's a "bug" in the script, or in the way I'm testing. If I'm testing with 1 thread, it should use the full size (for example 20GB); if I'm testing 2 threads, each thread should get 10GB, and so on. I think that's why the results are weird in the 1-thread test above, because I have 8GB of RAM.
Thoughts?
edit:
The max size of the buffer on this HDD is 16,384 KB, so that is the value chosen for the transfer size. Note that I set up this experiment in consultation with Don Capps, one of the iozone devs.
How do you find this buffer size?
Last edited by Fackamato (2011-08-02 22:33:48)
I think the file size is constant: 20x1 for 1 thread, 20x2 for 2, and so on... the buffer size on my hdd was 16k because, when I tried making it larger, it told me it would be limited. I'll work up your data tomorrow... too busy with work right now
I think the file size is constant: 20x1 for 1 thread, 20x2 for 2, and so on... the buffer size on my hdd was 16k because, when I tried making it larger, it told me it would be limited. I'll work up your data tomorrow... too busy with work right now
I'll change my script; currently it divides the file size by 3 (3 threads), fixing that...
You can find the buffer size in an ugly way like this:
$ hdparm -i /dev/sdc | grep BuffSize | awk '{print $2}' | sed 's/kB,//g' | awk -F'=' '{print $2}'
New script below. Cleaned up: now you can pass everything on the command line (iterations, threads, etc.) and it will generate log files (and XLS) accordingly, including the summary file in graysky's format.
edit: use new one below
Last edited by Fackamato (2011-08-04 10:48:11)
@fackamato. I agree there's gotta be something wrong with the 1-thread test. Just check the speeds for random read and mixed workload in the graphic; it's impossible for your Seagate to achieve such transfer speeds (400,000 KB/s ≈ 390 MB/s).
The 2-thread and 3-thread graphics show transfer speeds close to what's expected.
Another weird thing in the 1- and 2-thread tests: noop beats cfq (though not by much) in almost every scenario, but when we focus on the mixed workload, cfq is by far better than noop.
Athlon II X4 620 + Gigabyte 785GPM-UD2H + 4GB DDR3 + SSD OCZ Vertex2 60GB
Archlinux x86_64 + Openbox
I must say I'm much happier with noop than cfq.
I don't have any test data yet, but my initial impression is that overall it may be a little slower, yet it doesn't make my system start dying on me under heavy I/O like cfq does; in fact it stays very responsive.
I must say I'm much happier with noop than cfq.
I don't have any test data yet, but my initial impression is that overall it may be a little slower, yet it doesn't make my system start dying on me under heavy I/O like cfq does; in fact it stays very responsive.
Agreed!
Last edited by Fackamato (2011-08-03 22:19:52)
Now added support for Linux MD devices, use like this:
./iozone-scheduler.sh md0 /raid6volume/temp/ 8192 /root/iozone-degraded-raid6/ 3 16 3
command device test-dir filesize-in-MB log-output-dir iterations record-size threads
(i.e. as for any device)
#!/bin/bash
# Test schedulers with iozone
# See https://bbs.archlinux.org/viewtopic.php?pid=969117
# by fackamato, Aug 1, 2011
# changelog:
# 03082011
# Added: Support for Linux MD devices
# Added/fixed: take no. of threads as argument and test accordingly (big rewrite)
# 02082011
# Added: Should now output to a file with the syntax requested by graysky
# Fixed: Add support for HP RAID devices
# Fixed: Drop caches before each test run
if [ "$EUID" -ne "0" ]; then echo "Needs su, exiting"; exit 1; fi
unset ARGS;ARGS=$#
if [ ! $ARGS -lt "5" ]; then
DEV=$1
DIR=`echo $2 | sed 's/\/$//g'` # Remove trailing slashes from path
OUTPUTDIR=`echo $4 | sed 's/\/$//g'` # Remove trailing slashes from path
# Create the log file directory if it doesn't exist
if [ ! -d "$OUTPUTDIR" ]; then mkdir -p $OUTPUTDIR;fi
# Check the test directory
if [ ! -d "$DIR" ]; then
echo "Error: Is $DIR a directory?"
exit 1
fi
# Check the device name
MDDEV="md*"
HPDEV="c?d?"
case "$DEV" in
$HPDEV ) # HP RAID
unset SYSDEV;SYSDEV="/sys/block/cciss!$DEV/queue/scheduler"
unset MD;declare -i MD;MD=0
;;
$MDDEV ) # mdadm RAID
echo "Found a Linux MD device, checking for schedulers..."
unset MD;declare -i MD;MD=1
unset SYSDEV
SYSDEV=$(mdadm -D /dev/$DEV | grep active | awk -F '/' '{print $3}' | sed 's/[0-9]//g') # member disks, e.g. "sdb sdc"
;;
* )
unset SYSDEV;SYSDEV="/sys/block/$DEV/queue/scheduler"
unset MD;declare -i MD;MD=0
;;
esac
# Check for the output log
unset OUTPUTLOG;OUTPUTLOG="$OUTPUTDIR/iozone-$DEV-all-results.log"
if [ -e "$OUTPUTLOG" ]; then echo "$OUTPUTLOG exists, aborting"; exit 1;fi
# Find available schedulers
if [ $MD -eq 0 ]; then
echo "not md device"
declare -a SCHEDULERS
SCHEDULERS=`cat $SYSDEV | sed 's/\[//g' | sed 's/\]//g'`
else
declare -a SCHEDULERS; unset MDMEMBER
# All MD member disks expose the same scheduler list, so read it from the first member
MDMEMBER=`echo $SYSDEV | awk '{print $1}'`
SYSDEVMD="/sys/block/$MDMEMBER/queue/scheduler"
SCHEDULERS=`cat $SYSDEVMD | sed 's/\[//g' | sed 's/\]//g'`
fi
if [ -z "$SCHEDULERS" ]; then
echo "No schedulers found! Wrong device specified? Tried looking in $SYSDEV"
exit 1
else
echo "Schedulers found under $DEV: "$SCHEDULERS
SIZE=$(($3*1024)) # Per-thread size in KB (argument is MiB per thread)
unset RUNS; declare -i RUNS;RUNS=$5
fi
# Set record size
if [ -z "$6" ]; then
echo "Using the default record size of 16MiB"
RECORDSIZE="16384" # Set default to 16MB
else
RECORDSIZE=$6"m"
fi
# Set no. threads
if [ -z "$7" ]; then
echo "Testing with 1, 2 & 3 threads (default)"
THREADS=3
else
THREADS=$7
fi
SHELL=`which bash`
else
echo "# Usage:"
echo "`basename $0` <dev name> <test dir> <test size in MiB> <log dir> <#runs> <record size> <threads>"
echo "time ./iozone-scheduler.sh sda /mnt 20480 /dev/shm/server1 3 16 3"
echo "# The above command will test sda with 1, 2 & 3 threads 3 times per scheduler with 20GiB of data using"
echo "# 16MiB record size and save logs in /dev/shm/server1/ ."
echo "# If the record size is omitted the default of 16MiB will be used. (should be buffer size of device)"
echo "# For HP RAID controllers use device name format c0d0 or c1d2 etc."
exit 1
fi
function createOutputLog () {
unset FILE
echo -e "Test\tThroughput (KB/s)\tI/O Scheduler\tThreads\tn" > $OUTPUTLOG
for FILE in $OUTPUTDIR/$DEV*.txt; do
# results
unset WRITE;unset REWRITE; unset RREAD; unset MIXED; unset RWRITE
# Scheduler, threads, iteration
unset SCHED;unset T; unset I;unset IT
# Parse $DEV-$SCHEDULER-tN-iN.txt (use basename so dashes in $OUTPUTDIR don't shift the fields)
BASE=`basename "$FILE"`
SCHED=`echo "$BASE" | awk -F'-' '{print $2}'`
T=`echo "$BASE" | awk -F'-' '{print $3}' | sed 's/t//g'`
I=`echo "$BASE" | awk -F'-' '{print $4}' | sed 's/^i//;s/\.txt$//'`
# Get values
WRITE=`grep " Initial write " $FILE | awk '{print $5}'`
REWRITE=`grep " Rewrite " $FILE | awk '{print $4}'`
RREAD=`grep " Random read " $FILE | awk '{print $5}'`
MIXED=`grep " Mixed workload " $FILE | awk '{print $5}'`
RWRITE=`grep " Random write " $FILE | awk '{print $5}'`
# echo "iwrite $WRITE rwrite $REWRITE rread $RREAD mixed $MIXED random $RWRITE"
# Print to the file
if [ -z "$WRITE" -o -z "$REWRITE" -o -z "$RREAD" -o -z "$MIXED" -o -z "$RWRITE" ]; then
# Something's wrong with our input file, or bug in script
echo "BUG, unable to parse result:"
echo "write $WRITE rewrite $REWRITE random read $RREAD mixed $MIXED random write $RWRITE"
exit 1
else
echo -e "Initial write\t$WRITE\t$SCHED\t$T\t$I" >> $OUTPUTLOG
echo -e "Rewrite\t$REWRITE\t$SCHED\t$T\t$I" >> $OUTPUTLOG
echo -e "Random read\t$RREAD\t$SCHED\t$T\t$I" >> $OUTPUTLOG
echo -e "Mixed workload\t$MIXED\t$SCHED\t$T\t$I" >> $OUTPUTLOG
echo -e "Random write\t$RWRITE\t$SCHED\t$T\t$I" >> $OUTPUTLOG
fi
done
}
unset ITERATIONS; declare -i ITERATIONS; ITERATIONS=0
unset CURRENTTHREADS; declare -i CURRENTTHREADS
unset IOZONECMD
cd "$DIR"
echo "Using iozone at `which iozone`"
until [ "$ITERATIONS" -ge "$RUNS" ]; do
let ITERATIONS=$ITERATIONS+1
for SCHEDULER in $SCHEDULERS; do
# Change the scheduler
if [ $MD -eq 1 ]; then
unset MEMBER
for MEMBER in $SYSDEV; do
echo $SCHEDULER > /sys/block/$MEMBER/queue/scheduler
done
else
echo $SCHEDULER > $SYSDEV
fi
CURRENTTHREADS=1
# Repeat until we've tested with all requested threads
until [ $CURRENTTHREADS -gt $THREADS ]; do
unset IOZONECMDAPPEND
IOZONECMDAPPEND="$OUTPUTDIR/$DEV-$SCHEDULER-t$CURRENTTHREADS-i$ITERATIONS.txt"
#echo "iozonecmdappend is $IOZONECMDAPPEND"
# Append all test files to the command line (threads/processes)
unset I; unset IOZONECMD_FILES
for I in `seq 1 $CURRENTTHREADS`; do
IOZONECMD_FILES="$IOZONECMD_FILES$DIR/iozone-temp-$I "
done
# Drop caches
echo 3 > /proc/sys/vm/drop_caches
echo "Testing $SCHEDULER with $CURRENTTHREADS thread(s), run #$ITERATIONS"
IOZONECMD="iozone -R -i 0 -i 2 -i 8 -s $SIZE -r $RECORDSIZE -b $OUTPUTDIR/$DEV-$SCHEDULER-t$CURRENTTHREADS-i$ITERATIONS.xls -l $CURRENTTHREADS -u $CURRENTTHREADS -F $IOZONECMD_FILES"
# Run the command
echo time $IOZONECMD
time $IOZONECMD | tee -a $IOZONECMDAPPEND
# Done testing $CURRENTTHREADS threads/processes, increase to test one more in the loop (if applicable)
let CURRENTTHREADS=$CURRENTTHREADS+1
done
done
echo "Run #$ITERATIONS done" | tee -a $IOZONECMDAPPEND
done
echo
createOutputLog
echo "Done, logs saved in $OUTPUTDIR"
exit 0
Hm, why is no one else benchmarking?
I added iozone to https://wiki.archlinux.org/index.php/Benchmarking . It takes a looooong time to benchmark with 3 iterations (been going at it for >14 hours now) if you have lots of RAM.
I suppose you could boot with mem=1G or similar.
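For what it's worth, the stock kernel parameter for capping usable RAM is `mem=`. A sketch of a GRUB defaults entry; the file path and variable name are the usual GRUB2 defaults rather than anything from this thread, so adjust for your bootloader and regenerate the config afterwards:

```shell
# /etc/default/grub -- cap usable RAM to 1GiB so the page cache
# can't absorb the whole test file
GRUB_CMDLINE_LINUX_DEFAULT="quiet mem=1G"
```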
Why should this be unexpected? On an HDD, cfq is better when 3 or more threads write/read to/from the disk; that's not unexpected to me. In daily use, the situation where only 2 or fewer threads write/read at the same time is not so common, so it doesn't help to switch to noop/deadline on HDDs.
Why should this be unexpected? On an HDD, cfq is better when 3 or more threads write/read to/from the disk; that's not unexpected to me. In daily use, the situation where only 2 or fewer threads write/read at the same time is not so common, so it doesn't help to switch to noop/deadline on HDDs.
A thing to remember is that when your disk is extremely busy and you're using CFQ, your desktop will become unresponsive. This doesn't happen with any other I/O scheduler.
Of course, there's ionice, but AFAIK there's no nice way to automate it.
Hm, why is no one else benchmarking?
I added iozone to https://wiki.archlinux.org/index.php/Benchmarking . It takes a looooong time to benchmark with 3 iterations (been going at it for >14 hours now) if you have lots of RAM.
I suppose you could boot with mem=1G or similar.
Yes, I am guilty of this... lol... it's hard for me to find time to have my main machine down, essentially. I'm gonna have a stab at this some time, I just don't know when... maybe I can set up another machine for this or something; the only limitation would be that it's not the hardware I use day to day...