- Tested app: Dillo 0.8.4 (413.2 KB tar.bz2). Language: C
- Computer: Pentium III 846 MHz, 256 KB cache, kernel 2.6.11, GCC 3.4.3, march=i686 (Arch Linux)
Size in KB of the Arch packages generated by makepkg:
Flags                                                           Size (KB)
-O1 -march=pentium3 -s -mfpmath=sse -fomit-frame-pointer -pipe  189.4
-O2 -march=pentium3 -s -mfpmath=sse -fomit-frame-pointer -pipe  211.3
-O3 -march=pentium3 -s -mfpmath=sse -fomit-frame-pointer -pipe  241.7
-Os -march=pentium3 -s -mfpmath=sse -fomit-frame-pointer -pipe  183.6
-Os -march=pentium3 -s -fomit-frame-pointer -pipe               183.6
-Os -march=pentium3 -s -mfpmath=sse -pipe                       185.0
-Os -march=pentium3 -mfpmath=sse -fomit-frame-pointer -pipe     189.4
-Os -march=i686 -s -mfpmath=sse -fomit-frame-pointer -pipe      183.6
-Os -march=i586 -s -fomit-frame-pointer -pipe                   183.0
-Os -march=i386 -s -fomit-frame-pointer -pipe                   182.4
-Os -mtune=i686 -s -fomit-frame-pointer -pipe                   183.5
-Os -mtune=i586 -s -fomit-frame-pointer -pipe                   183.0
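To put the sizes in proportion, they can be expressed relative to the -Os -march=pentium3 package (183.6 KB). This is only a rough sketch: pct_diff is a made-up helper name, not something makepkg provides, and it assumes awk is available.

```sh
#!/bin/sh
# pct_diff BASE SIZE: print how much larger SIZE is than BASE, in percent.
# Hypothetical helper for illustration; both arguments are package sizes in KB.
pct_diff() {
    awk -v base="$1" -v size="$2" \
        'BEGIN { printf "%.1f\n", (size - base) / base * 100 }'
}

pct_diff 183.6 211.3   # -O2 package compared with the -Os package
pct_diff 183.6 241.7   # -O3 package compared with the -Os package
```

A negative result would mean the second package is smaller than the first.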
Now we have to check the execution time of some of those pkgs:
(foo.sh):
echo "Start: "
date +%s%n%N
dillo http://localhost/html/a.php
Where http://localhost/html/a.php is a page to open. The page contains:
<?php
echo microtime();
?>
- Launch Dillo with ./foo.sh
Time reported by PHP minus time printed on the console ≈ execution time.
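The subtraction can be sketched like this. It assumes the script printed seconds and nanoseconds on separate lines (as date +%s%n%N does) and that PHP's microtime() returned its default "msec sec" string; elapsed_ms is a hypothetical helper name and the sample values are made up.

```sh
#!/bin/sh
# elapsed_ms START_SEC START_NSEC PHP_FRAC PHP_SEC
# START_SEC / START_NSEC: the two lines printed by `date +%s%n%N`.
# PHP_FRAC / PHP_SEC: the two fields of PHP's microtime() string.
# Prints the difference in whole milliseconds (hypothetical helper).
elapsed_ms() {
    awk -v s="$1" -v ns="$2" -v pf="$3" -v ps="$4" \
        'BEGIN { printf "%.0f\n", (ps - s) * 1000 + (pf * 1e9 - ns) / 1e6 }'
}

# Made-up sample: console shows 1118750000 / 250000000, page shows "0.75000000 1118750000"
elapsed_ms 1118750000 250000000 0.75000000 1118750000
```

Subtracting the whole seconds first keeps the numbers small, so no precision is lost on the fractional part.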
Results (graphs)
http://www.linuks.rk.edu.pl/binary.php?id=74
http://www.linuks.rk.edu.pl/binary.php?id=75
http://www.linuks.rk.edu.pl/binary.php?id=76
"Baza" (base): -Os -march=pentium3 -s -mfpmath=sse -fomit-frame-pointer -pipe
"Bez" (without): the base flags without the flag marked *
NOTES:
- -mfpmath=sse is a big plus for the Pentium III and newer processors (those with the sse CPU flag), especially if the application does a lot of mathematical operations (Dillo isn't a calculator, but the effect is visible)
- -fomit-frame-pointer also affects execution time
- if -march or -mtune is set to your processor family, the generated code is very similar to that generated by -march=my_cpu. An i686 build (Arch) is fine for a P3; an i586 build (SuSE) less so.
- -Os is the best size/speed optimization flag, unless the files are very big and contain a lot of code (then check -O2)
Offline
Does using the -mfpmath=sse flag cause the code to run correctly only on CPUs that have SSE support? If not, then why isn't it used by default?
Offline
sse only works if your proc supports sse - some i686s don't so this can't be done for all of arch
this sounds like a gentoo post to me... a gain of 0.05 seconds and 2KB of space isn't warranted in my book
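On the SSE point: whether a given machine has it can be checked from the flags line of /proc/cpuinfo. A minimal sketch, assuming x86 Linux; has_cpu_flag is a made-up helper name.

```sh
#!/bin/sh
# has_cpu_flag FLAG [FILE]
# Succeeds if FLAG is listed on the first "flags" line of FILE
# (default: /proc/cpuinfo). Hypothetical helper, x86 Linux assumed.
has_cpu_flag() {
    grep '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null | head -n 1 |
        tr ' \t' '\n\n' | grep -qx "$1"
}

if has_cpu_flag sse; then
    echo "sse: supported"
else
    echo "sse: not supported (or no flags line found)"
fi
```

Matching the whole word matters here, otherwise a check for sse would also succeed on a line that only contained sse2.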
Offline
SSE is already used when -march=pentium3 is set.
Anyway, it's not really a Gentoo post, if you ignore the notes. Why? Because there are benchmark results, telling us nicely how much sense some settings make in this case. It would be if he was demanding specific cpu model optimized binaries or other nonsense, but he doesn't.
Offline
fetching performance results for binary builds by using http requests from a php instance is silly. Not to mention that rendering still occurs after php gets its microtime value.
Also, system load at the time, as well as if the tests were run consecutively, or after a clean boot, would all be determining factors.
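One common way to dampen the effect of system load and run order is to repeat each measurement and compare medians rather than single runs. A rough sketch; median and time_ms are made-up helper names, and time_ms assumes GNU date (for %N).

```sh
#!/bin/sh
# median: read one number per line on stdin and print the median.
median() {
    sort -n | awk '{ v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            if (NR % 2) print v[(NR + 1) / 2]
            else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# time_ms CMD...: run CMD once and print the wall-clock time in milliseconds.
# Assumes GNU date, which supports %N (nanoseconds).
time_ms() {
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}
```

Usage would look something like `for i in 1 2 3 4 5; do time_ms ./some_binary; done | median`, where ./some_binary stands for whatever build is being compared.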
I agree with phrak. Further, I could understand doing a web benchmark against different http servers, or benchmarking execution time of a statically compiled binary using different compiler optimizations, but using a web server to test compiler optimizations seems...very strange.
Also, anything above -O2 is silly most of the time.
*shrug*
oh. did I say silly in this post yet? just...silly..
"Be conservative in what you send; be liberal in what you accept." -- Postel's Law
"tacos" -- Cactus' Law
"t̥͍͎̪̪͗a̴̻̩͈͚ͨc̠o̩̙͈ͫͅs͙͎̙͊ ͔͇̫̜t͎̳̀a̜̞̗ͩc̗͍͚o̲̯̿s̖̣̤̙͌ ̖̜̈ț̰̫͓ạ̪͖̳c̲͎͕̰̯̃̈o͉ͅs̪ͪ ̜̻̖̜͕" -- -̖͚̫̙̓-̺̠͇ͤ̃ ̜̪̜ͯZ͔̗̭̞ͪA̝͈̙͖̩L͉̠̺͓G̙̞̦͖O̳̗͍
Offline
SSE is already used when -march=pentium3 is set.
yeah, even though I think the performance increase may be nice, it's not going to be noticeable... and then you'll start losing support for some processors. Yes, you'd start phasing out older processors, but then where does it stop? Would the next step be MMX, then SSE2, then maybe we only allow AGP video cards (don't ask...)? I think it's important that Arch is baselined at i686. Once the "additional" repos become a big thing, then it might be worthwhile to make an "sse2" repo or something... *Shrug*
Offline
-mfpmath is not a good idea, even if you have SSE support.
Use -msse instead. And if you have a PIII just use -march=pentium3.
-O3 is not a good idea for big compilations (KDE)
-O2 is the best compromise
-O1 is stupid
-Os can be useful on very slow computers
Generally, avoid using exotic flags in gcc. Especially with 3.4. And also know that some programs (OpenOffice) don't compile with gcc 3.3.x.
Offline
-omgod-optimized
Offline
good idea for the testing ...
as a web browser is not doing much computing, i would suggest that you try to do your test with some apps that need computing ... i'll prepare a test-case ...
The impossible missions are the only ones which succeed.
Offline
ok, here something to try:
in this file:
http://daperi.home.solnet.ch/uni/bk/mb/ … pled.fasta
there are 2 nucleic acid sequences (DNA) that can be aligned with muscle or clustalw (both available in extra)
whereas clustalw cannot be compiled with optimisation flags (at least not with gcc 3.4.x), i compiled muscle with the standard arch flags for extra (see PKGBUILD here)
you can change the PKGBUILD and try better (=more optimised) ones, if you like
i measured it with "time" and here the example (on 2ghz pentium4 with arch flags):
[damir@Asteraceae mb]$ time muscle -in cc3285_and_cc1842_crippled.fasta -out out.fasta
MUSCLE v3.52 by Robert C. Edgar
http://www.drive5.com/muscle
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.
cc3285_and_cc1842_crippled 2 seqs, max length 7070, avg length 7070
00:00:00 11 MB(1%) Iter 1 100.00% K-mer dist pass 1
00:00:00 11 MB(1%) Iter 1 100.00% K-mer dist pass 2
00:00:03 234 MB(30%) Iter 1 100.00% Align node
00:00:03 234 MB(30%) Iter 1 100.00% Root alignment
real 0m3.714s
user 0m2.927s
sys 0m0.536s
NOTE: as you can see, muscle needs a LOT of RAM (234 MB for this short piece of genes). It is also a good test of whether your swap works and your kernel handles it correctly (btw: that's why i use kernel26mm). If you have less than 230 MB of free RAM for this experiment, simply open the example fasta file in an editor and shorten both sequences in similar ways. The original genes (full version) are on my site under uni/bk/mb/cc3285_and_cc1842.fasta (DON'T try to align them with muscle: it will need more RAM than you have; i have 768 MB and it is not able to finish and stops).
Offline
ummm... dp sometimes your knowledge scares me
Offline
that benchmark would only capture a limited slice of performance.
For a good test, you would want something that tested floating point arithmetic time, integer arithmetic time, etc.
Something like this: http://shootout.alioth.debian.org/great … rt=fullcpu
but with performance comparisons, output file size differences, and other information based on different compiler flags instead of different language implementations, would be very cool. Their apps for C might be useful, as they are meant to cover a wide range of things.
Offline
Enabling sse by default is silly indeed. Much better to focus on linker settings like -Wl,--as-needed and -Wl,-O1.
Offline
phrakture: was that meant ironically? i study biology, so it's not really unusual to know how to use clustalw ... and muscle is a really cool app for short sequences, but for longer ones it is no more than a memory benchmark ;-)
cactus: thanx for the link
i3839: exactly! (i still don't know how --as-needed works in detail, but from what i read it makes sense)
Offline
just found out that i use the right fasta:
http://shootout.alioth.debian.org/great … rt=fullcpu
Offline
(i still don't know how --as-needed works in detail, but from what i read it makes sense)
It is very simple, really. Normally the app is linked to all the libs you tell it to link to, whether they are needed or not. With --as-needed it only links to the libs which are really required.
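To see the difference on a real binary, compare the NEEDED entries that readelf -d reports for a build with and without the option. A small sketch; needed_libs is a made-up helper name, and the commands in the comments assume gcc and binutils are installed and that main.c is some program that calls nothing from libm.

```sh
#!/bin/sh
# needed_libs: reduce `readelf -d` output (on stdin) to the names of the
# shared libraries the binary is recorded as needing. Hypothetical helper.
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}

# Possible usage (main.c is a placeholder):
#   gcc main.c -lm -o plain
#   gcc -Wl,--as-needed main.c -lm -o trimmed
#   readelf -d plain   | needed_libs
#   readelf -d trimmed | needed_libs
```

The option has to come before the libraries it should apply to on the command line. Some distributions' toolchains already pass it by default, in which case both lists may come out the same.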
Offline
thank you - that's exactly what i had meant, but my confusion is that this is not the normal behaviour (in my eyes, it would be more logical to have this behaviour as the default and a flag to set if you need superfluous libs linked)
... the question i ask myself: what good is it to have something linked if it is never used
Offline
Yes, I had that struggle too: Why oh why isn't it the default behaviour? I guess because you told ld to link to something, so it wouldn't be polite to ignore that request. Though in practice it's probably good to have it enabled, as most people don't know or use ld options. Thus it would make sense if gcc passed those obviously good options to ld by default.
Offline