I set up a test rig for benchmarking my Arch Linux Sandy Bridge system using the Phoronix Test Suite. By default Phoronix uses its own (outdated) versions of system software, so for these tests I pointed each test's "bin" directory at /usr so that the current Arch Linux versions were used for lzma, pbzip2, openssl, sqlite3, gnupg, povray, graphicsmagick, ogg, flac, mp3 encoding, and ffmpeg. The performance results are on openbenchmarking.org under Archlinux-optimize-sandybridge.
I chose the Liquorix kernel as my preferred low-latency kernel. I ran the interbench latency benchmarks on the stock, ck, and lqx kernels for linux-3.0.8 and 3.1.1; the Liquorix kernel (linux-lqx from the AUR) showed the lowest latencies. The ck and lqx kernels were both compiled with BFS and a 1000 Hz tick; both show lower latency than the stock kernel, with lqx slightly lower.
In the Phoronix performance benchmarks the Liquorix kernel "wins" the majority of tests. My plan now is to compile all of the core software from ABS, optimized for the native architecture (corei7-avx), but before I proceed I'd like some input on the sqlite results:
Sqlite benchmark time:
Stock kernel: 5.81 ± 0.1 sec
Linux-ck: 6.97 ± 0.05 sec
Linux-lqx: 6.90 ± 0.01 sec
The sqlite test was run on an SSD with the deadline scheduler. Why do the ck and lqx kernels give lower performance on this test than the stock kernel? Is there any way to configure them to get the low latency AND higher performance on sqlite?
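For anyone who wants to poke at this outside the full suite, here is a rough small-scale version of an sqlite insert benchmark (this is NOT the Phoronix test itself; the table name, row count, and pragma are my own choices). Synchronous commits force an fsync per transaction, which is exactly where kernel and I/O scheduler behavior shows up:

```python
# Minimal sqlite insert timing, loosely in the spirit of the Phoronix
# sqlite test (not the actual test). Each commit is a separate,
# fsync-ed transaction.
import sqlite3
import tempfile
import time

def time_inserts(rows=200, synchronous="FULL"):
    """Time `rows` single-row transactions with the given synchronous mode."""
    with tempfile.NamedTemporaryFile(suffix=".db") as f:
        con = sqlite3.connect(f.name)
        con.execute(f"PRAGMA synchronous = {synchronous}")
        con.execute("CREATE TABLE t (i INTEGER)")
        start = time.perf_counter()
        for i in range(rows):
            con.execute("INSERT INTO t VALUES (?)", (i,))
            con.commit()          # one fsync-ed transaction per row
        elapsed = time.perf_counter() - start
        n = con.execute("SELECT COUNT(*) FROM t").fetchone()[0]
        con.close()
        return elapsed, n

elapsed, n = time_inserts()
print(f"{n} synchronous inserts in {elapsed:.2f} sec")
```

Comparing synchronous=FULL against synchronous=OFF on the same kernel shows how much of the total time is fsync waiting rather than CPU work.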
To me the most interesting result in the Phoronix tests is the CPU utilization during H.264 playback. If you look at those graphs carefully, notice how low the CPU utilization is under the lqx kernel. Wow! The best playback performance is obtained using VA-API (requires libva, libva-intel-driver, and mplayer-vaapi):
H.264 playback with VA-API:
Stock kernel: 3.3% cpu
Linux-ck: 2.2% cpu
Linux-lqx: 1.4% cpu
Very interesting graphs. -ck seems to be slower than the other two, but you can't measure responsiveness...
Is it too much to ask you to repeat the same test with linux-pf? Although it should behave the same as -ck, the added BFQ might make a difference.
For measuring responsiveness I use interbench as a metric to guide me. Some typical results with 3.1 kernel:
Stock ARCH
--- Benchmarking simulated cpu of X in the presence of simulated ---
Load Latency +/- SD (ms) Max Latency % Desired CPU % Deadlines Met
None 0.0 +/- 0.1 1.0 100 99.2
Video 0.0 +/- 0.1 1.0 100 99.2
Burn 25.2 +/- 40.3 101.0 26.4 15.7
Write 0.0 +/- 0.1 1.0 100 99.2
Read 0.0 +/- 0.1 1.0 100 99.2
Compile 26.6 +/- 42.1 101.0 25.3 14.8
Memload 0.0 +/- 0.2 2.0 99.7 98.7
--- Benchmarking simulated cpu of Gaming in the presence of simulated ---
Load Latency +/- SD (ms) Max Latency % Desired CPU
None 0.1 +/- 0.1 0.3 99.9
Video 0.1 +/- 0.1 0.7 99.9
X 0.1 +/- 0.1 0.1 99.9
Burn 21.0 +/- 44.1 113.0 82.7
Write 0.1 +/- 0.1 0.3 99.9
Read 0.1 +/- 0.1 0.7 99.9
Compile 38.4 +/- 60.9 111.9 72.2
Memload 0.8 +/- 0.8 1.0 99.2
linux-ck
--- Benchmarking simulated cpu of X in the presence of simulated ---
Load Latency +/- SD (ms) Max Latency % Desired CPU % Deadlines Met
None 0.0 +/- 0.1 1.0 100 99.2
Video 0.0 +/- 0.1 1.0 100 99.2
Burn 5.8 +/- 19.5 98.0 46.1 42.5
Write 0.0 +/- 0.1 1.0 100 99.2
Read 0.0 +/- 0.1 1.0 100 99.2
Compile 4.7 +/- 11.5 58.0 59.1 50.3
Memload 0.0 +/- 0.2 2.0 98.7 97.8
--- Benchmarking simulated cpu of Gaming in the presence of simulated ---
Load Latency +/- SD (ms) Max Latency % Desired CPU
None 0.2 +/- 0.2 0.2 99.8
Video 0.2 +/- 0.2 0.2 99.8
X 0.2 +/- 0.2 0.2 99.8
Burn 28.8 +/- 49.6 102.2 77.6
Write 0.3 +/- 0.3 0.8 99.7
Read 0.4 +/- 0.4 1.3 99.6
Compile 27.6 +/- 29.5 68.8 78.4
Memload 1.3 +/- 1.4 7.0 98.8
linux-lqx
--- Benchmarking simulated cpu of X in the presence of simulated ---
Load Latency +/- SD (ms) Max Latency % Desired CPU % Deadlines Met
None 0.0 +/- 0.1 1.0 100 99.1
Video 0.0 +/- 0.1 1.0 100 99.1
Burn 3.2 +/- 14.8 100.0 49.5 46.8
Write 0.0 +/- 0.1 1.0 100 99.1
Read 0.0 +/- 0.1 1.0 100 99.1
Compile 5.1 +/- 12.4 87.0 56.1 47.3
Memload 0.0 +/- 0.2 2.0 98.5 98
--- Benchmarking simulated cpu of Gaming in the presence of simulated ---
Load Latency +/- SD (ms) Max Latency % Desired CPU
None 0.1 +/- 0.1 0.2 99.9
Video 0.1 +/- 0.1 0.2 99.9
X 0.1 +/- 0.1 0.2 99.9
Burn 23.4 +/- 44.9 102.2 81
Write 0.2 +/- 0.3 1.1 99.8
Read 0.3 +/- 0.4 0.8 99.7
Compile 28.2 +/- 30.7 84.4 78
Memload 1.2 +/- 1.3 1.9 98.8
I have BFQ compiled into the Liquorix kernel, but I've been using CFQ recently because BFQ broke large file transfers onto any JFS filesystem. Maybe that bug is gone; I don't know. Yes, BFQ is nice in my experience, but the tests above all use CFQ, as do the Phoronix benchmarks I published.
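For reference, you can see which I/O scheduler each block device is actually using from sysfs; the active one is the entry shown in brackets. A small Linux-only sketch:

```python
# List the I/O schedulers for each block device by reading sysfs.
# The kernel marks the active scheduler in [brackets], e.g.
# "noop deadline [cfq]". Linux-only; needs no special privileges to read.
from pathlib import Path

def io_schedulers():
    """Return {device: scheduler line} for every block device in /sys/block."""
    result = {}
    for sched in Path("/sys/block").glob("*/queue/scheduler"):
        dev = sched.parts[3]   # /sys/block/<dev>/queue/scheduler
        result[dev] = sched.read_text().strip()
    return result

for dev, scheds in io_schedulers().items():
    print(dev, scheds)
```

Switching at runtime is just a root-privileged write to the same file, e.g. echo deadline into /sys/block/sda/queue/scheduler.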
Uhhh... I was hoping nobody would ask for pf benchmarks. I'm sorry, I don't have numbers. I used linux-pf on and off with the 2.6.38 kernels and convinced myself that ck was better, although all kernels with the BFS patch are very similar. IMO there are a few tweaks in the Liquorix kernel that give it the (slight) edge in both performance and responsiveness.
Well, I'm more (happily) surprised that the -ck and -lqx kernels are beating the stock kernel, given that the BFS scheduler is optimized for responsiveness. The stock kernel, being less tuned toward responsiveness, 'should' logically have better throughput, right?
I've concluded that the stock kernel does indeed give better throughput on some database-type transactions, where the BFS scheduler "fails" to give preference to the I/O threads that would, if allowed, use up more of the CPU (but degrade responsiveness). Hence the stock kernel IS faster on the sqlite benchmark. For a server, use the default, well-tested, and highly optimized CFS. For a desktop system I much prefer kernels with the Brain Fuck Scheduler: it gives lower latencies and fewer stalls and glitches in audio and video, even when the CPU is heavily loaded with work like my frequent big compile jobs.
Heck, I used to write "operating systems" for embedded control systems. I got incredible performance using the simplest "scheduler" of all -- none. We used round-robin scheduling, which you can only do well when one team writes ALL of the software that will run on the system. Every task ran with complete control of the CPU until it did I/O or voluntarily relinquished the CPU. Our rule of thumb was that code we wrote should never hold the CPU for more than 1 ms. We got hundreds of real-time tasks running big industrial machinery, with dozens of motors under software control, hundreds of I/O points being monitored and controlled, and two operator interfaces. On an i386 CPU! "Low latency" was a very hard requirement: failing to respond to a switch within 100 ms could crash a 5-ton, rapidly rotating granite grinding wheel into a 50-ton hard roll from a steel rolling mill. The best way to get low latency was not to complicate the scheduler by letting it waste thousands of CPU cycles sorting through "priorities," but to have no scheduler at all and to invest all CPU cycles in getting work done. So I have a strong preference for simple schedulers and well-written software, neither of which we have in Linux, but I think BFS is a move in the right direction.
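The scheme above -- every task runs a short burst, then voluntarily yields, in a fixed circular order, with no priorities at all -- can be sketched with generators. This is a toy illustration, not the original embedded code:

```python
# Toy cooperative round-robin "scheduler": no priorities, no preemption.
# Each task is a generator that does a small slice of work per step and
# then yields, mirroring the "never hold the CPU for more than 1 ms" rule.
def task(name, steps, log):
    for i in range(steps):
        log.append((name, i))   # a short burst of "work"
        yield                   # voluntarily relinquish the CPU

def _step(t):
    """Run one slice of task t; return False once it has finished."""
    try:
        next(t)
        return True
    except StopIteration:
        return False

def round_robin(tasks):
    """Call each live task in fixed order until all have finished."""
    while tasks:
        tasks = [t for t in tasks if _step(t)]

log = []
round_robin([task("motor", 2, log), task("io", 3, log)])
print(log)  # → [('motor', 0), ('io', 0), ('motor', 1), ('io', 1), ('io', 2)]
```

Note there is no sorting or priority logic anywhere in the loop; the "scheduler" is just a list walk, so every cycle not spent in that walk goes to the tasks themselves.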