I set up a test rig for benchmarking my Arch Linux Sandy Bridge system using the Phoronix Test Suite. By default Phoronix uses its own (outdated) versions of system software, so for these tests I pointed each test's "bin" directory at /usr so that the current Arch Linux versions were used for lzma, pbzip2, openssl, sqlite3, gnupg, povray, graphicsmagick, ogg, flac, mp3 encoding, and ffmpeg. The performance results are on openbenchmarking.org under Archlinux-optimize-sandybridge.
I chose the Liquorix kernel as my preferred low-latency kernel. I ran the interbench latency benchmarks on the stock, ck, and lqx kernels for linux-3.0.8 and 3.1.1; the Liquorix kernel (linux-lqx from the AUR) showed the lowest latencies. The ck and lqx kernels were both compiled with BFS and a 1000 Hz tick; both show lower latency than the stock kernel, with lqx slightly lower.
In the Phoronix performance benchmarks the Liquorix kernel "wins" the majority of tests. My plan now is to compile all of the core software from ABS, optimized for the native architecture (corei7-avx), but before I proceed I'd like some input on the sqlite results:
Sqlite benchmark time:
Stock kernel: 5.81 ± 0.1 sec
Linux-ck: 6.97 ± 0.05 sec
Linux-lqx: 6.90 ± 0.01 sec
The sqlite test was run on an SSD with the deadline scheduler. Why do the ck and lqx kernels give lower performance on this test than the stock kernel? Is there any way to configure them to get the low latency AND higher performance on sqlite?
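For anyone who wants to poke at this outside the full suite, here is a rough small-scale version of an sqlite insert benchmark (this is NOT the Phoronix test itself; the table name, row count, and pragma are my own choices). Synchronous commits force an fsync per transaction, which is exactly where kernel and I/O scheduler behavior shows up:

```python
# Minimal sqlite insert timing, loosely in the spirit of the Phoronix
# sqlite test (not the actual test). Each commit is a separate,
# fsync-ed transaction.
import sqlite3
import tempfile
import time

def time_inserts(rows=200, synchronous="FULL"):
    """Time `rows` single-row transactions with the given synchronous mode."""
    with tempfile.NamedTemporaryFile(suffix=".db") as f:
        con = sqlite3.connect(f.name)
        con.execute(f"PRAGMA synchronous = {synchronous}")
        con.execute("CREATE TABLE t (i INTEGER)")
        start = time.perf_counter()
        for i in range(rows):
            con.execute("INSERT INTO t VALUES (?)", (i,))
            con.commit()          # one fsync-ed transaction per row
        elapsed = time.perf_counter() - start
        n = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
        con.close()
        return elapsed, n

elapsed, n = time_inserts()
print(f"{n} synchronous inserts in {elapsed:.2f} sec")
```

Comparing synchronous=FULL against synchronous=OFF on the same kernel shows how much of the total time is fsync waiting rather than CPU work.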
To me the most interesting result in the Phoronix tests is the CPU utilization during H.264 playback. If you look at those graphs carefully, notice how low the CPU utilization is under the lqx kernel. Wow! The best playback performance is obtained using VA-API (requires libva, libva-intel-driver, and mplayer-vaapi):
H.264 playback with VA-API:
Stock kernel: 3.3% cpu
Linux-ck: 2.2% cpu
Linux-lqx: 1.4% cpu
Very interesting graphs. -ck seems to be slower than the other two, but you can't measure responsiveness...
Is it too much to ask you to repeat the same test with linux-pf? Although it should behave the same as -ck, the added BFQ might make a difference.
For measuring responsiveness I use interbench as a metric to guide me. Some typical results with 3.1 kernel:
Stock ARCH
--- Benchmarking simulated cpu of X in the presence of simulated ---
Load Latency +/- SD (ms) Max Latency % Desired CPU % Deadlines Met
None 0.0 +/- 0.1 1.0 100 99.2
Video 0.0 +/- 0.1 1.0 100 99.2
Burn 25.2 +/- 40.3 101.0 26.4 15.7
Write 0.0 +/- 0.1 1.0 100 99.2
Read 0.0 +/- 0.1 1.0 100 99.2
Compile 26.6 +/- 42.1 101.0 25.3 14.8
Memload 0.0 +/- 0.2 2.0 99.7 98.7
--- Benchmarking simulated cpu of Gaming in the presence of simulated ---
Load Latency +/- SD (ms) Max Latency % Desired CPU
None 0.1 +/- 0.1 0.3 99.9
Video 0.1 +/- 0.1 0.7 99.9
X 0.1 +/- 0.1 0.1 99.9
Burn 21.0 +/- 44.1 113.0 82.7
Write 0.1 +/- 0.1 0.3 99.9
Read 0.1 +/- 0.1 0.7 99.9
Compile 38.4 +/- 60.9 111.9 72.2
Memload 0.8 +/- 0.8 1.0 99.2
linux-ck
--- Benchmarking simulated cpu of X in the presence of simulated ---
Load Latency +/- SD (ms) Max Latency % Desired CPU % Deadlines Met
None 0.0 +/- 0.1 1.0 100 99.2
Video 0.0 +/- 0.1 1.0 100 99.2
Burn 5.8 +/- 19.5 98.0 46.1 42.5
Write 0.0 +/- 0.1 1.0 100 99.2
Read 0.0 +/- 0.1 1.0 100 99.2
Compile 4.7 +/- 11.5 58.0 59.1 50.3
Memload 0.0 +/- 0.2 2.0 98.7 97.8
--- Benchmarking simulated cpu of Gaming in the presence of simulated ---
Load Latency +/- SD (ms) Max Latency % Desired CPU
None 0.2 +/- 0.2 0.2 99.8
Video 0.2 +/- 0.2 0.2 99.8
X 0.2 +/- 0.2 0.2 99.8
Burn 28.8 +/- 49.6 102.2 77.6
Write 0.3 +/- 0.3 0.8 99.7
Read 0.4 +/- 0.4 1.3 99.6
Compile 27.6 +/- 29.5 68.8 78.4
Memload 1.3 +/- 1.4 7.0 98.8
linux-lqx
--- Benchmarking simulated cpu of X in the presence of simulated ---
Load Latency +/- SD (ms) Max Latency % Desired CPU % Deadlines Met
None 0.0 +/- 0.1 1.0 100 99.1
Video 0.0 +/- 0.1 1.0 100 99.1
Burn 3.2 +/- 14.8 100.0 49.5 46.8
Write 0.0 +/- 0.1 1.0 100 99.1
Read 0.0 +/- 0.1 1.0 100 99.1
Compile 5.1 +/- 12.4 87.0 56.1 47.3
Memload 0.0 +/- 0.2 2.0 98.5 98
--- Benchmarking simulated cpu of Gaming in the presence of simulated ---
Load Latency +/- SD (ms) Max Latency % Desired CPU
None 0.1 +/- 0.1 0.2 99.9
Video 0.1 +/- 0.1 0.2 99.9
X 0.1 +/- 0.1 0.2 99.9
Burn 23.4 +/- 44.9 102.2 81
Write 0.2 +/- 0.3 1.1 99.8
Read 0.3 +/- 0.4 0.8 99.7
Compile 28.2 +/- 30.7 84.4 78
Memload 1.2 +/- 1.3 1.9 98.8
I have BFQ compiled into the Liquorix kernel, but I've been using CFQ recently because BFQ broke large file transfers onto any JFS filesystem. Maybe that bug is gone; I don't know. Yes, BFQ is nice in my experience, but the tests above all use CFQ, as do the Phoronix benchmarks I published.
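For reference, you can see which I/O scheduler each block device is actually using from sysfs; the active one is the entry shown in brackets. A small Linux-only sketch:

```python
# List the I/O schedulers for each block device by reading sysfs.
# The kernel marks the active scheduler in [brackets], e.g.
# "noop deadline [cfq]". Linux-only; needs no special privileges to read.
from pathlib import Path

def io_schedulers():
    """Return {device: scheduler line} for every block device in /sys/block."""
    result = {}
    for sched in Path("/sys/block").glob("*/queue/scheduler"):
        dev = sched.parts[3]   # /sys/block/<dev>/queue/scheduler
        result[dev] = sched.read_text().strip()
    return result

for dev, scheds in io_schedulers().items():
    print(dev, scheds)
```

Switching at runtime is just a root-privileged write to the same file, e.g. echo deadline into /sys/block/sda/queue/scheduler.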
Uhhh... I was hoping nobody would ask for pf benchmarks. I'm sorry, I don't have numbers. I used linux-pf on and off with the 2.6.38 kernels and convinced myself that ck was better, although all kernels with the BFS patch are very similar. IMO there are a few tweaks in the Liquorix kernel that give it the (slight) edge in both performance and responsiveness.
Well, I'm more (happily) surprised that the -ck and -lqx kernels are beating the stock kernel, given that the BFS scheduler is optimized for responsiveness. The stock kernel, being less tuned toward responsiveness, 'should' logically have better throughput, right?
I've concluded that the stock kernel does indeed give better throughput on some database-type transactions, where the BFS scheduler "fails" to give preference to the I/O threads that would, if allowed, use up more of the CPU (but degrade responsiveness). Hence the stock kernel IS faster on the sqlite benchmark. For a server, use the default, well-tested, and highly optimized CFS. For a desktop system I much prefer kernels with the Brain Fuck Scheduler: it gives lower latencies and fewer stalls and glitches in audio and video, even when the CPU is heavily loaded with work like my frequent big compile jobs.
Heck, I used to write "operating systems" for embedded control systems. I got incredible performance using the simplest "scheduler" of all -- none. We used round-robin scheduling, which you can only do well when one team writes ALL of the software that will run on the system. Every task ran with complete control of the CPU until it did I/O or voluntarily relinquished the CPU. Our rule of thumb was that code we wrote should never hold the CPU for more than 1 ms. We got hundreds of real-time tasks running big industrial machinery, with dozens of motors under software control, hundreds of I/O points being monitored and controlled, and two operator interfaces. On an i386 CPU! "Low latency" was a very hard requirement: failing to respond to a switch within 100 ms could crash a 5-ton, rapidly rotating granite grinding wheel into a 50-ton hard roll from a steel rolling mill. The best way to get low latency was not to complicate the scheduler by letting it waste thousands of CPU cycles sorting through "priorities," but to have no scheduler at all and to invest all CPU cycles in getting work done. So I have a strong preference for simple schedulers and well-written software, neither of which we have in Linux, but I think BFS is a move in the right direction.
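The scheme above -- every task runs a short burst, then voluntarily yields, in a fixed circular order, with no priorities at all -- can be sketched with generators. This is a toy illustration, not the original embedded code:

```python
# Toy cooperative round-robin "scheduler": no priorities, no preemption.
# Each task is a generator that does a small slice of work per step and
# then yields, mirroring the "never hold the CPU for more than 1 ms" rule.
def task(name, steps, log):
    for i in range(steps):
        log.append((name, i))   # a short burst of "work"
        yield                   # voluntarily relinquish the CPU

def _step(t):
    """Run one slice of task t; return False once it has finished."""
    try:
        next(t)
        return True
    except StopIteration:
        return False

def round_robin(tasks):
    """Call each live task in fixed order until all have finished."""
    while tasks:
        tasks = [t for t in tasks if _step(t)]

log = []
round_robin([task("motor", 2, log), task("io", 3, log)])
print(log)  # → [('motor', 0), ('io', 0), ('motor', 1), ('io', 1), ('io', 2)]
```

Note there is no sorting or priority logic anywhere in the loop; the "scheduler" is just a list walk, so every cycle not spent in that walk goes to the tasks themselves.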