
#51 2011-12-01 12:07:22

Vamp898
Member
From: 東京
Registered: 2009-01-03
Posts: 891

Re: Do you have to leave Archlinux for the AMD bulldozer

Here are the results from my tests (lower is better): http://www.ignaz.org/bench/bulldozeri7.png

Btw, the project now has a homepage: http://www.ignaz.org/bench (I'm working on getting it its own domain).

Last edited by Vamp898 (2011-12-01 17:49:13)


#52 2011-12-01 18:27:20

korkadapa
Member
Registered: 2008-08-27
Posts: 32

Re: Do you have to leave Archlinux for the AMD bulldozer

Vamp898 wrote:

Here are the results from my tests (lower is better): http://www.ignaz.org/bench/bulldozeri7.png

Btw, the project now has a homepage: http://www.ignaz.org/bench (I'm working on getting it its own domain).

Did you compile them all with the settings in compile.sh? How much faster would the Bulldozer be with the bdver1 option?


#53 2011-12-01 19:51:29

Vamp898
Member
From: 東京
Registered: 2009-01-03
Posts: 891

Re: Do you have to leave Archlinux for the AMD bulldozer

I'm working on version 0.9.0, which has a completely new algorithm. It is a bit more complex and does a lot more work, so that processor-specific optimisation (-march=native) has more effect.

The current development versions (>=0.8.90) already have this algorithm, and I used one of them to compare -march=bdver1 against -march=x86-64 (generic):

http://www.ignaz.org/bench/tibs_dev.png

You can get the dev version from the homepage: http://www.ignaz.org/bench


#54 2011-12-02 00:46:16

Grinch
Member
Registered: 2010-11-07
Posts: 265

Re: Do you have to leave Archlinux for the AMD bulldozer

I like where you are going with this, Vamp898, but I think there are some problems with your benchmark. I downloaded the dev version and saw that you set -O0 (which is NO optimization), which seemed kind of weird. However, looking at the benchmarks, I saw that you used constants for the values to be calculated in the threads, which means the compiler can figure out the end result at compile time and simply optimize away the calculations. This is what happened when I compiled it with GCC at -O2 and -O3; ICC did this as well, Clang/LLVM did not. Anyway, to prevent the compilers from optimizing away the calculations, and thus allow you to benchmark at higher optimization levels, you should generate the NUM and NUMM values at runtime instead of defining them as constants.

Edit: actually, looking at the code, I see you are calculating the constants against an uninitialized float array. So the reason the compilers (GCC, ICC) optimized the entire loop of calculations away is that you don't do anything with the result of the calculations done on the stack array. One simple way to prevent the compiler from optimizing away the whole calculation would be to store one of the bla[] floats in a global variable at the end of the calc thread.
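
A minimal sketch of the suggestion above, not the benchmark's actual code. The names calc_thread_body and g_sink, the array size, and the update formula are assumptions made for illustration. The array is initialized from a value the caller is expected to obtain at runtime, and one element is stored in a volatile global at the end so the result counts as used.

```c
#include <stddef.h>

/* Hypothetical sink. volatile forces the final store to be performed. */
volatile float g_sink;

/* Hypothetical stand-in for the body of the calc thread.
 * `seed` should come from the runtime (argv, a clock, a file) so the
 * compiler cannot compute the end result at compile time. */
float calc_thread_body(float seed, size_t iterations)
{
    /* Initialized, unlike the array described in the post. */
    float bla[4] = { seed, seed + 1.0f, seed + 2.0f, seed + 3.0f };

    for (size_t i = 0; i < iterations; ++i) {
        for (size_t k = 0; k < 4; ++k) {
            /* Bounded update: values move toward 2.0 and stay finite. */
            bla[k] = bla[k] * 0.5f + 1.0f;
        }
    }

    g_sink = bla[0];   /* the result is now observable */
    return bla[0];
}
```

The compiler remains free to choose how it executes the loop. The volatile store only guarantees that the final value is needed, which is what stops the whole calculation from being discarded as dead code.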

Last edited by Grinch (2011-12-02 01:13:23)


#55 2011-12-02 11:36:00

Vamp898
Member
From: 東京
Registered: 2009-01-03
Posts: 891

Re: Do you have to leave Archlinux for the AMD bulldozer

Grinch wrote:

I like where you are going with this, Vamp898, but I think there are some problems with your benchmark. I downloaded the dev version and saw that you set -O0 (which is NO optimization), which seemed kind of weird. However, looking at the benchmarks, I saw that you used constants for the values to be calculated in the threads, which means the compiler can figure out the end result at compile time and simply optimize away the calculations. This is what happened when I compiled it with GCC at -O2 and -O3; ICC did this as well, Clang/LLVM did not. Anyway, to prevent the compilers from optimizing away the calculations, and thus allow you to benchmark at higher optimization levels, you should generate the NUM and NUMM values at runtime instead of defining them as constants.

Edit: actually, looking at the code, I see you are calculating the constants against an uninitialized float array. So the reason the compilers (GCC, ICC) optimized the entire loop of calculations away is that you don't do anything with the result of the calculations done on the stack array. One simple way to prevent the compiler from optimizing away the whole calculation would be to store one of the bla[] floats in a global variable at the end of the calc thread.

I use -march=native in the new code, which enables architecture-specific optimisation; that makes a huge difference compared to -march=x86-64.

But anyway, storing the contents of bla in a global variable doesn't help _that_ much, because I reset it to 1 after every iteration.

So the 999999 * 5 iterations would be reduced to a single run, after which the compiler would store the result in the global variable.

I think -O0 is the best way to do these benchmarks; other benchmarks like Hardinfo also strongly recommend using -O0.

Processor optimisation is used in the >=0.8.90 cycle, but I think -O1 or higher would not be helpful because the code would be optimised. I don't want the code to be optimised; the processor should stupidly do exactly what I wrote.

And I think it's quite a good way to test processor-only performance.

Btw, I uploaded the results of the i7 vs. AMD comparison with the >=0.8.90 version. Quite interesting results: http://www.ignaz.org/bench/bulldozeri7_2.png

Last edited by Vamp898 (2011-12-02 12:06:07)


#56 2011-12-03 00:03:33

Grinch
Member
Registered: 2010-11-07
Posts: 265

Re: Do you have to leave Archlinux for the AMD bulldozer

Vamp898 wrote:

But anyway, storing the contents of bla in a global variable doesn't help _that_ much, because I reset it to 1 after every iteration.

So the 999999 * 5 iterations would be reduced to a single run, after which the compiler would store the result in the global variable.

Yes, probably, but you could just avoid resetting it.

Vamp898 wrote:

I think -O0 is the best way to do these benchmarks; other benchmarks like Hardinfo also strongly recommend using -O0.

I don't really know anything about benchmarking CPU vs. CPU; I've only benchmarked compiler vs. compiler. However, I believe some of the optimizations that affect code generation for a particular CPU's characteristics are only enabled at higher optimization levels. I could very well be totally wrong though; as I said, I have no experience with this type of benchmarking.
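
One way to check what a given set of flags actually turned on is to ask the compiler through its predefined macros. The sketch below assumes GCC or a compatible compiler: __OPTIMIZE__ is defined whenever an optimization level above -O0 is in effect, and __AVX__ is defined when the selected -march (or -mavx) allows AVX instructions. The function names are made up for illustration. For context, the GCC manual states that most optimizations are completely disabled at -O0, even if individual optimization flags are specified.

```c
#include <stdio.h>
#include <stddef.h>

/* Returns 1 if the compiler reports an optimizing build (GCC-style macro). */
int build_is_optimized(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the target selected by -march/-m flags allows AVX. */
int build_has_avx(void)
{
#ifdef __AVX__
    return 1;
#else
    return 0;
#endif
}

/* Writes a one-line summary into buf; returns the snprintf result. */
int format_build_info(char *buf, size_t n)
{
    return snprintf(buf, n, "optimized=%d avx=%d",
                    build_is_optimized(), build_has_avx());
}
```

Compiling this once with -O0 -march=native and once with -O2 -march=native should change only the first value, which shows that the instruction set selection and the optimization level are separate switches.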


#57 2011-12-03 00:43:27

Vamp898
Member
From: 東京
Registered: 2009-01-03
Posts: 891

Re: Do you have to leave Archlinux for the AMD bulldozer

If I avoid resetting, it will give a lot of overflows and unexpected results, and may cause an invalid calculation or something like that. To prevent this I used the resetting. Because I don't need the result, I didn't want to take that risk.
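
A possible middle ground, sketched under assumptions: keep the reset at the start of every round, but fold each round's result into a checksum that ends up in a volatile global. The names run_rounds and g_checksum are hypothetical. The reset value here depends on the round number, which deviates from the reset to 1 described above; the intent is that the rounds are not identical, so the compiler cannot compute one round and reuse it.

```c
#include <stddef.h>

/* Hypothetical sink for the combined result of all rounds. */
volatile float g_checksum;

float run_rounds(size_t rounds, size_t inner, float factor)
{
    float checksum = 0.0f;

    for (size_t r = 0; r < rounds; ++r) {
        /* Reset at the start of each round, so values cannot grow
         * across rounds. Seeded from r (an assumption, see above). */
        float x = (float)(r + 1);

        for (size_t i = 0; i < inner; ++i)
            x = x * factor + 1.0f;

        checksum += x;   /* every round contributes to the output */
    }

    g_checksum = checksum;
    return checksum;
}
```

With a factor below 1.0 the inner loop stays bounded, so the overflow concern does not arise in this sketch; other update formulas would need their own check.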

You can use the -S switch to get the assembly code that GCC generates, and as far as I have seen, -O only affects the code itself and doesn't seem to influence how well it is optimised for the processor.

But I could also be wrong. It's not a final 1.0 release yet and a lot can happen :) I'm happy about every new thing that gets discovered :)

In compiler benchmarks, -O is one of the most important things ;) and it is just awesome how "clever" GCC is.

If you write a benchmark and the compiler optimises, you have to outsmart the compiler. But that's not as easy as it looks because, as I said, GCC is just awesome. It will always find something you didn't think about, and BABAM, the whole benchmark is just a piece of crap xD
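
The -S comparison mentioned in this post can be tried on a small loop. The function and the file name scale.c are assumptions for illustration; the commands in the comment use only standard GCC options. With GCC, the -O3 output for a loop like this typically uses packed SIMD instructions for the target CPU, while the -O0 output handles one float at a time, but the exact result depends on the compiler version and the machine.

```c
#include <stddef.h>

/* Generate assembly at two optimization levels and compare:
 *   gcc -O0 -march=native -S scale.c -o scale_O0.s
 *   gcc -O3 -march=native -S scale.c -o scale_O3.s
 *   diff scale_O0.s scale_O3.s
 */

/* Multiplies every element of src by k and writes it to dst. */
void scale(float *dst, const float *src, size_t n, float k)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}
```

Because the inputs arrive as function arguments, the compiler cannot fold the work away when this file is compiled on its own, which makes the two listings directly comparable.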

Last edited by Vamp898 (2011-12-03 00:44:52)


#58 2012-01-03 14:17:14

korkadapa
Member
Registered: 2008-08-27
Posts: 32

Re: Do you have to leave Archlinux for the AMD bulldozer

Vamp898 wrote:

Kernel: arch/x86/boot/bzImage is ready  (#9)

real    0m56.753s

But I used make -j512 to compile it this time xDD. It really doesn't seem to be faster than with -j12 or -j16, but if you insist on the job count I will recompile it with -j12.

I just bought a Bulldozer for myself, and I'm curious about what frequency you ran the CPU at when you did this test? Standard clock? Turbo Core enabled?

At 4.2 GHz (which should be the maximum Turbo Core frequency of an FX-8150) I get 1m23s. This is on an FX-8120 though, but the only difference should be the frequency.

Do you use GCC from the repos or something you compiled yourself with optimizations?

How fast is your memory? I've got 16GB DDR3-1600 so I actually doubt that it's my bottleneck.

Did you compile it on an SSD or HDD? I used an Intel 320 80GB SSD.

That was a lot of questions, but I just can't seem to reproduce your results, unless you run it at a higher frequency than stock.
