
#1 2005-12-07 13:48:15

kozaki
Member
From: London >. < Paris
Registered: 2005-06-13
Posts: 671

benchmarked using nbench with different CFLAGS

nbench is an old single-threaded CPU benchmark: BYTE magazine's BYTEmark, ported to Unix/Linux by Uwe F. Mayer. It reports indexes for integer, floating-point, and memory performance. A description and the source can be found on linux.softpedia.com.

nbench comes with a configurable Makefile, so I just changed the CFLAGS and ran 'make' between runs.
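For anyone wanting to repeat this, the "change CFLAGS, run make" loop can be scripted. Below is a minimal sketch, assuming it is run from the nbench source directory, that GNU make is installed, and that the Makefile has a `clean` target (an assumption, check your copy). The names `CFLAGS_SETS`, `build_commands`, `run_all` and the `results/` directory are my own, not part of nbench.

```python
"""Rebuild nbench once per CFLAGS set and save each run's report.

Sketch only. Assumes: run from the nbench source directory, GNU make
available, and a `clean` target in the Makefile (hypothetical here).
"""
import subprocess
from pathlib import Path

# CFLAGS sets taken from this thread; edit to taste.
CFLAGS_SETS = {
    "generic": "-s -static -Wall -O3",
    "unroll": "-s -static -Wall -O3 -fomit-frame-pointer -funroll-loops",
    "k8": "-O2 -march=k8 -pipe",
}


def build_commands(cflags: str) -> list[list[str]]:
    """Commands for one clean rebuild with the given CFLAGS.

    In GNU make, a variable assigned on the command line overrides
    the value assigned inside the Makefile.
    """
    return [
        ["make", "clean"],             # assumed target: drop old objects
        ["make", f"CFLAGS={cflags}"],  # rebuild with the new flags
    ]


def run_all(outdir: str = "results") -> None:
    """Build and run nbench for every CFLAGS set, one report file each."""
    Path(outdir).mkdir(exist_ok=True)
    for name, cflags in CFLAGS_SETS.items():
        for cmd in build_commands(cflags):
            subprocess.run(cmd, check=True)
        # Run the freshly built binary and keep its full text report.
        report = subprocess.run(["./nbench"], check=True,
                                capture_output=True, text=True).stdout
        Path(outdir, f"{name}.txt").write_text(report)
```

Calling `run_all()` from the nbench directory leaves one report per CFLAGS set under `results/`. Passing CFLAGS on the make command line avoids editing the Makefile between runs.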
The test machine is an AMD Athlon 64 3200+ running Arch Linux 0.7.1 with the latest vanilla kernel and gcc from testing (Linux 2.6.14-ARCH, gcc 4.1.0 20051112 (experimental), libc-2.3.5.so), on Enlightenment 0.16 with two xterms open (55 processes).

Highest results are marked bold.
I'd appreciate your input on these indexes, as well as any other benchmark and CFLAGS tips :)


· 1st test, with nbench's generic gcc CFLAGS = -s -static -Wall -O3

TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :           653.6  :      16.76  :       5.50
STRING SORT         :          125.04  :      55.87  :       8.65
BITFIELD            :      3.9123e+08  :      67.11  :      14.02
FP EMULATION        :          147.92  :      70.98  :      16.38
FOURIER             :           20521  :      23.34  :      13.11
ASSIGNMENT          :          20.908  :      79.56  :      20.64
IDEA                :          4396.5  :      67.24  :      19.96
HUFFMAN             :          1354.1  :      37.55  :      11.99
NEURAL NET          :          32.191  :      51.71  :      21.75
LU DECOMPOSITION    :          1065.1  :      55.18  :      39.84
===================ORIGINAL BYTEMARK RESULTS===================
INTEGER INDEX       : 50.989
FLOATING-POINT INDEX: 40.532
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
=======================LINUX DATA BELOW========================
MEMORY INDEX        : 13.575
INTEGER INDEX       : 12.121
FLOATING-POINT INDEX: 22.480
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
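As far as I can tell, each summary index is the geometric mean of a group of per-test indexes. I inferred the grouping below from the numbers rather than from nbench's source, so treat it as an assumption. With it, the 1st test's figures come out within rounding of what nbench reported.

```python
from math import prod


def geometric_mean(values):
    """n-th root of the product of n positive numbers."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


# Per-test indexes from the 1st test (-s -static -Wall -O3).
OLD_INDEX = {  # Pentium 90 baseline
    "NUMERIC SORT": 16.76, "STRING SORT": 55.87, "BITFIELD": 67.11,
    "FP EMULATION": 70.98, "FOURIER": 23.34, "ASSIGNMENT": 79.56,
    "IDEA": 67.24, "HUFFMAN": 37.55, "NEURAL NET": 51.71,
    "LU DECOMPOSITION": 55.18,
}
NEW_INDEX = {  # AMD K6/233 baseline
    "NUMERIC SORT": 5.50, "STRING SORT": 8.65, "BITFIELD": 14.02,
    "FP EMULATION": 16.38, "FOURIER": 13.11, "ASSIGNMENT": 20.64,
    "IDEA": 19.96, "HUFFMAN": 11.99, "NEURAL NET": 21.75,
    "LU DECOMPOSITION": 39.84,
}

# Assumed grouping (inferred from the numbers, not read from nbench's source).
FP_TESTS = ["FOURIER", "NEURAL NET", "LU DECOMPOSITION"]
ORIGINAL_GROUPS = {
    "INTEGER": [t for t in OLD_INDEX if t not in FP_TESTS],  # 7 tests
    "FLOATING-POINT": FP_TESTS,
}
LINUX_GROUPS = {
    "MEMORY": ["STRING SORT", "BITFIELD", "ASSIGNMENT"],
    "INTEGER": ["NUMERIC SORT", "FP EMULATION", "IDEA", "HUFFMAN"],
    "FLOATING-POINT": FP_TESTS,
}

original = {name: geometric_mean(OLD_INDEX[t] for t in tests)
            for name, tests in ORIGINAL_GROUPS.items()}
linux = {name: geometric_mean(NEW_INDEX[t] for t in tests)
         for name, tests in LINUX_GROUPS.items()}

for label, result in (("original", original), ("linux", linux)):
    for name, value in result.items():
        print(f"{label:8s} {name:15s}: {value:.3f}")
```

One consequence of the geometric mean: a single test that improves by a factor k raises a four-test index by only k^(1/4), so one big jump moves the summary index much less than the per-test figure.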

· 2nd test with CFLAGS = -s -static -Wall -O3 -fomit-frame-pointer -funroll-loops

--------------------:------------------:-------------:------------
NUMERIC SORT        :          1028.4  :      26.37  :       8.66
STRING SORT         :          126.72  :      56.62  :       8.76
BITFIELD            :      3.5229e+08  :      60.43  :      12.62
FP EMULATION        :          142.72  :      68.48  :      15.80
FOURIER             :           20401  :      23.20  :      13.03
ASSIGNMENT          :          27.756  :     105.62  :      27.39
IDEA                :          4628.6  :      70.79  :      21.02
HUFFMAN             :          1583.1  :      43.90  :      14.02
NEURAL NET          :            37.3  :      59.92  :      25.20
LU DECOMPOSITION    :          1078.4  :      55.87  :      40.34
===================ORIGINAL BYTEMARK RESULTS===================
INTEGER INDEX       : 57.302
FLOATING-POINT INDEX: 42.665
=======================LINUX DATA BELOW========================
MEMORY INDEX        : 14.471
INTEGER INDEX       : 14.171
FLOATING-POINT INDEX: 23.664

· 3rd test with i686 optimized CFLAGS = -s -static -O3 -fomit-frame-pointer -Wall -march=i686
       -fforce-addr -fforce-mem -falign-loops=2 -falign-functions=2
       -falign-jumps=2 -funroll-loops

--------------------:------------------:-------------:------------
NUMERIC SORT        :          1098.8  :      28.18  :       9.25
STRING SORT         :             128  :      57.19  :       8.85
BITFIELD            :      3.5392e+08  :      60.71  :      12.68
FP EMULATION        :          178.68  :      85.74  :      19.78
FOURIER             :           20338  :      23.13  :      12.99
ASSIGNMENT          :          26.444  :     100.62  :      26.10
IDEA                :          4526.6  :      69.23  :      20.56
HUFFMAN             :          1625.1  :      45.06  :      14.39
NEURAL NET          :          34.543  :      55.49  :      23.34
LU DECOMPOSITION    :          1075.4  :      55.71  :      40.23
===================ORIGINAL BYTEMARK RESULTS===================
INTEGER INDEX       : 59.479
FLOATING-POINT INDEX: 41.505
=======================LINUX DATA BELOW========================
MEMORY INDEX        : 14.309
INTEGER INDEX       : 15.255
FLOATING-POINT INDEX: 23.020

· 4th test with Athlon XP optimized CFLAGS = -s -static -O3 -fomit-frame-pointer -Wall -march=athlon-xp
       -fforce-addr -fforce-mem -falign-loops=2 -falign-functions=2
       -falign-jumps=2 -funroll-loops

--------------------:------------------:-------------:------------
NUMERIC SORT        :          1056.6  :      27.10  :       8.90
STRING SORT         :          129.92  :      58.05  :       8.99
BITFIELD            :      3.6259e+08  :      62.20  :      12.99
FP EMULATION        :          266.16  :     127.72  :      29.47  <<<< +++
FOURIER             :           20409  :      23.21  :      13.04
ASSIGNMENT          :          28.463  :     108.31  :      28.09
IDEA                :          4670.4  :      71.43  :      21.21
HUFFMAN             :          1529.6  :      42.42  :      13.54
NEURAL NET          :          34.545  :      55.49  :      23.34
LU DECOMPOSITION    :          1071.2  :      55.49  :      40.07
===================ORIGINAL BYTEMARK RESULTS===================
INTEGER INDEX       : 63.363
FLOATING-POINT INDEX: 41.500
=======================LINUX DATA BELOW========================
MEMORY INDEX        : 14.857
INTEGER INDEX       : 16.567
FLOATING-POINT INDEX: 23.0



Seeded last month: Arch 50 gig, derivatives 1 gig
Desktop @3.3GHz 8 gig RAM, linux-ck
laptop #1 Atom 2 gig RAM, Arch linux stock i686 (6H w/ 6yrs old battery :)) #2: ARM Tegra K1, 4 gig RAM, ChrOS
Atom Z520 2 gig RAM, OMV (Debian 7) kernel 3.16 bpo on SDHC | PGP Key: 0xFF0157D9


#2 2005-12-07 20:48:26

Gullible Jones
Member
Registered: 2004-12-29
Posts: 4,863

Re: benchmarked using nbench with different CFLAGS

Why -O3 everywhere? IIRC, -O3 produces worse results than -O2 in a lot of instances, due to the increase in binary size (or perhaps some other factors). And -O3 is quite unstable on some systems, not surprisingly.
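The -O2 versus -O3 question is easier to reason about when you can see exactly which optimizations -O3 adds. Newer gcc releases can list the optimizer flags a level turns on with `-Q --help=optimizers` (I believe this arrived in gcc 4.3, so the 4.1 snapshot used in this thread probably lacks it). A sketch in Python that diffs two levels; the function names are my own.

```python
"""Compare which -f optimizer flags gcc enables at two -O levels.

Assumes a gcc that supports `-Q --help=optimizers`.
"""
import re
import subprocess

# Report lines look like "  -funroll-loops      [disabled]".
FLAG_LINE = re.compile(r"^\s*(-f\S+)\s+\[(enabled|disabled)\]")


def enabled_flags(help_text: str) -> set[str]:
    """Return the set of -f flags the report marks as [enabled]."""
    flags = set()
    for line in help_text.splitlines():
        m = FLAG_LINE.match(line)
        if m and m.group(2) == "enabled":
            flags.add(m.group(1))
    return flags


def optimizer_report(level: str, gcc: str = "gcc") -> str:
    """Ask gcc which optimizers a level such as "-O2" turns on."""
    return subprocess.run([gcc, "-c", "-Q", level, "--help=optimizers"],
                          check=True, capture_output=True,
                          text=True).stdout


def extra_flags(low: str, high: str, gcc: str = "gcc") -> set[str]:
    """Flags enabled at level `high` but not at level `low`."""
    return (enabled_flags(optimizer_report(high, gcc))
            - enabled_flags(optimizer_report(low, gcc)))
```

On a machine with a recent gcc, `sorted(extra_flags("-O2", "-O3"))` lists what -O3 adds on top of -O2. The same report, grepped for `-fomit-frame-pointer`, shows whether a given gcc enables it at -O2 for your target.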


#3 2005-12-07 21:00:58

phrakture
Arch Overlord
From: behind you
Registered: 2003-10-29
Posts: 7,879

Re: benchmarked using nbench with different CFLAGS

Also, "-static" is a bit misleading: while static linking is best for speed, it's just not practical. Try it without static linkage.


#4 2005-12-07 21:39:56

Gullible Jones
Member
Registered: 2004-12-29
Posts: 4,863

Re: benchmarked using nbench with different CFLAGS

Oogh, missed that. Yep, that's definitely not realistic. :shock:


#5 2005-12-07 22:28:35

kozaki
Member
From: London >. < Paris
Registered: 2005-06-13
Posts: 671

Re: benchmarked using nbench with different CFLAGS

Guess what the Makefile says:

# Makefile for nbench, December 11, 1997, Uwe F. Mayer <mayer@tux.org>
# Updated February 18, 2003

default: nbench

##########################################################################
#   If you are using gcc-2.7.2.3 or earlier:
#   The optimizer of gcc has a bug and in general you should not specify
#   -funroll-loops together with -O (or -O2, -O3, etc.)
#   This bug is supposed to be fixed with release 2.8 of gcc.
#
#   This bug does NOT seem to have an effect on the correct compilation
#   of this benchmark suite on my Linux box. However, it leads to
#   the dreaded "internal compiler error" message on our alpha
#   running DEC Unix 4.0b. The Linux-binary that was used to obtain
#   the baseline results was nevertheless compiled with
#   CFLAGS = -s -static -Wall -O3 -fomit-frame-pointer -funroll-loops
#
# You should leave -static in the CFLAGS so that your sysinfo can be
# compiled into the executable.

What I tried first was simply to follow the FLAGS provided in the old Makefile.
I'll give the new ones a try ASAP ;)



#6 2005-12-07 23:07:53

Gullible Jones
Member
Registered: 2004-12-29
Posts: 4,863

Re: benchmarked using nbench with different CFLAGS

But that's old, old stuff... We use GCC 4!

Oh, and from Arch's makepkg.conf:

export CFLAGS="-march=i686 -O2 -pipe"
export CXXFLAGS="-march=i686 -O2 -pipe"

IIRC, -O2 does include -fomit-frame-pointer as of GCC 3.4.x and later, since it no longer breaks debugging. Still, that's not -O3, and there's no mention of -static anywhere.

[Edited for spelling error.]


#7 2005-12-07 23:54:54

kozaki
Member
From: London >. < Paris
Registered: 2005-06-13
Posts: 671

Re: benchmarked using nbench with different CFLAGS

Well Gullible Jones, I'm following this old benchmark because I've used it on many computers before, and also because I've got no clue yet about CFLAGS.

See what happens with CFLAGS = -s -O2 -Wall -march=athlon-xp
         -fforce-addr -falign-loops=2 -falign-functions=2
         -falign-jumps=2 -funroll-loops
The Floating-point index stays high, but the Memory and Integer indexes drop well below those from nbench's default Athlon XP CFLAGS.
Maybe I messed up the last CFLAGS? I'll try the ones in Arch's /etc/makepkg.conf later.

--------------------:------------------:-------------:------------
NUMERIC SORT        :          999.52  :      25.63  :       8.42
STRING SORT         :          119.36  :      53.33  :       8.26
BITFIELD            :      2.5154e+08  :      43.15  :       9.01
FP EMULATION        :          147.08  :      70.58  :      16.29
FOURIER             :           19717  :      22.42  :      12.59
ASSIGNMENT          :           23.19  :      88.24  :      22.89
IDEA                :          3204.5  :      49.01  :      14.55
HUFFMAN             :          1538.8  :      42.67  :      13.63
NEURAL NET          :          35.797  :      57.51  :      24.19
LU DECOMPOSITION    :          1072.2  :      55.55  :      40.11
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 49.881
FLOATING-POINT INDEX: 41.528
==============================LINUX DATA BELOW===============================
libc                :
MEMORY INDEX        : 11.942
INTEGER INDEX       : 12.840
FLOATING-POINT INDEX: 23.033



#8 2005-12-08 12:57:30

MNKyDeth
Member
From: MI
Registered: 2003-09-13
Posts: 89

Re: benchmarked using nbench with different CFLAGS

Slap k8 in there for your optimization: -march=k8, as you have a K8 CPU. Since your CHOST specifies i686, as set by Arch's package builders, all packages will then be compiled as 32-bit code optimized for K8. There's no sense in staying with athlon-xp if you're going to explore all the options your CPU handles.


#9 2005-12-08 14:47:38

kozaki
Member
From: London >. < Paris
Registered: 2005-06-13
Posts: 671
Website

Re: benchmarked using nbench with different CFLAGS

Well, as long as my real understanding of gcc stuff such as CFLAGS fits in fewer than 10 words, I'll keep playing with benchmark tools (or user-only stuff) that I can't break anything with lol

Eventually I'd like to understand where some of the differences between the tests come from, like why -march=k8 shows lower indexes than the old generic gcc CFLAGS :?:

Or is this benchmark just too old to show the actual effect of CFLAGS when compiling an app?
Or maybe the results below only reflect the Linux binary that was used when they built/tested this app (the baseline results were nevertheless compiled with CFLAGS = -s -static -Wall -O3 -fomit-frame-pointer -funroll-loops)?

Some more indexes:

CFLAGS = -O2 -march=k8 -pipe shows the poorest indexes, i.e. lower than with the generic -s -static -Wall -O3:

==================ORIGINAL BYTEMARK RESULTS==================
INTEGER INDEX       : 46.807
FLOATING-POINT INDEX: 39.478
======================LINUX DATA BELOW=======================
MEMORY INDEX        : 11.828
INTEGER INDEX       : 11.571
FLOATING-POINT INDEX: 21.89

Defaults (but with -march=k8), CFLAGS = -s -static -O3 -fomit-frame-pointer -Wall -march=k8
       -fforce-addr -falign-loops=2 -falign-functions=2
       -falign-jumps=2 -funroll-loops
show no improvement over athlon-xp:

==================ORIGINAL BYTEMARK RESULTS==================
INTEGER INDEX       : 63.346
FLOATING-POINT INDEX: 41.282
======================LINUX DATA BELOW=======================
MEMORY INDEX        : 14.592
INTEGER INDEX       : 16.784
FLOATING-POINT INDEX: 22.89



#10 2005-12-08 20:50:33

Gullible Jones
Member
Registered: 2004-12-29
Posts: 4,863

Re: benchmarked using nbench with different CFLAGS

A bit of advice: don't over-optimize. Just optimize for the basic architecture (i686 on a P4, K8 on an Athlon 64, etc.), and use -O2 or -Os for code optimization.

(IIRC, you should NOT use -Os on a K8, but that might have been fixed - not sure, I don't have a K8 machine.)


#11 2005-12-08 21:19:53

kozaki
Member
From: London >. < Paris
Registered: 2005-06-13
Posts: 671
Website

Re: benchmarked using nbench with different CFLAGS

Gullible Jones > I got that for sure, and I won't run makemyworld every week on my main production box lol
Nevertheless, I'd like to understand the basics, so I'll use some harmless benchmarks or single-app compilations (video-related, like mplayer) with different FLAGS to get a bit into it.
I feel I should, when I look at the huge difference in the Integer and Memory indexes between the two sets of gcc CFLAGS.



#12 2005-12-08 21:41:57

Gullible Jones
Member
Registered: 2004-12-29
Posts: 4,863

Re: benchmarked using nbench with different CFLAGS

Why do you keep saying '-s -Wall -static -O3' is generic? -s (and also -Wall IIRC) is generic, but -static and -O3 are definitely not defaults!

Optimize your system if you want, but please understand that the real-world performance enhancement will be completely invisible, and that Arch's binary packages will never be -O3. -O3 may provide a slight performance boost under some circumstances (which I've certainly never seen), and give better benchmarks, but it also makes things less stable, and it bloats up the binaries, cutting HDD performance.


#13 2005-12-09 00:53:37

kozaki
Member
From: London >. < Paris
Registered: 2005-06-13
Posts: 671

Re: benchmarked using nbench with different CFLAGS

Gullible Jones > from the very beginning I was talking about optimizing a benchmark/app:

kozaki wrote:

nbench is an old CPU benchmark (...)

but from the beginning you've been talking about optimizing the whole system. Why is that :?:

I guess the developer of nbench thinks (or thought, back then) that those are generic options for gcc (from nbench's Makefile):

"# generic options for gcc
CFLAGS = -s -static -Wall -O3"

As for the indexes generated by nbench, they are best with -march=athlon-xp / -march=k8, while the other CFLAGS were 10 to 35% lower.
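To put numbers on that, the Linux-data indexes posted in this thread can be compared against the 4th test (-march=athlon-xp). A small sketch; the run labels are my own. By my arithmetic, the Integer index gap runs from about 8% (i686 run) to about 30% (-O2 -march=k8 -pipe), the Memory gap from about 3% (2nd test) to about 20%, the -O3 -march=k8 run is about 1% ahead on Integer, and the Floating-point index hardly moves at all (21.89 to 23.66).

```python
# Linux-data indexes reported in this thread: (memory, integer, floating-point).
# The athlon-xp floating-point value was posted as "23.0" and may be truncated.
RUNS = {
    "generic -O3 (1st test)": (13.575, 12.121, 22.480),
    "-O3 + unroll (2nd test)": (14.471, 14.171, 23.664),
    "-march=i686 (3rd test)": (14.309, 15.255, 23.020),
    "-march=athlon-xp (4th test)": (14.857, 16.567, 23.0),
    "-O2 -march=athlon-xp": (11.942, 12.840, 23.033),
    "-O2 -march=k8 -pipe": (11.828, 11.571, 21.89),
    "-O3 -march=k8": (14.592, 16.784, 22.89),
}
REFERENCE = "-march=athlon-xp (4th test)"


def percent_change(value: float, reference: float) -> float:
    """Relative change of `value` against `reference`, in percent."""
    return 100.0 * (value - reference) / reference


ref = RUNS[REFERENCE]
for name, indexes in RUNS.items():
    mem, integer, fp = (percent_change(v, r) for v, r in zip(indexes, ref))
    print(f"{name:28s} mem {mem:+6.1f}%  int {integer:+6.1f}%  fp {fp:+6.1f}%")
```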



#14 2005-12-09 01:51:46

Gullible Jones
Member
Registered: 2004-12-29
Posts: 4,863

Re: benchmarked using nbench with different CFLAGS

Oh... D'oh.

(Question: if the benchmark app is the only one you're optimizing, and the benchmarks are on itself, what the heck does this prove? :? )


#15 2005-12-09 16:04:57

kozaki
Member
From: London >. < Paris
Registered: 2005-06-13
Posts: 671
Website

Re: benchmarked using nbench with different CFLAGS

'Cause noobs always ask silly questions. Didn't you know that yet? ;)

More seriously, it proves nothing (and never tried to). The goal is to keep a record of the CFLAGS giving the best indexes when compiling and testing single apps like nbench, then try the best of them on some *real* apps that should care, like audio and video transcoding apps.
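Keeping that record can be automated by pulling the three Linux-data indexes out of each nbench report. A sketch, relying on the report layout shown in this thread (a "LINUX DATA BELOW" banner followed by "NAME INDEX : value" lines); the CSV file name and column names are hypothetical choices of mine.

```python
"""Extract nbench's Linux-data indexes and append them to a CSV log."""
import csv
import re
from pathlib import Path

INDEX_LINE = re.compile(r"^(MEMORY|INTEGER|FLOATING-POINT) INDEX\s*:\s*([\d.]+)")


def linux_indexes(report: str) -> dict[str, float]:
    """Return {'MEMORY': ..., 'INTEGER': ..., 'FLOATING-POINT': ...}.

    Only lines after the LINUX DATA banner count, so the original
    BYTEmark INTEGER / FLOATING-POINT indexes earlier in the report
    are skipped.
    """
    found = {}
    in_linux_section = False
    for line in report.splitlines():
        if "LINUX DATA BELOW" in line:
            in_linux_section = True
            continue
        m = INDEX_LINE.match(line.strip())
        if in_linux_section and m:
            found[m.group(1)] = float(m.group(2))
    return found


def append_record(csv_path: str, cflags: str, report: str) -> None:
    """Append one row: CFLAGS, memory, integer, floating-point index."""
    idx = linux_indexes(report)
    new_file = not Path(csv_path).exists()
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["cflags", "memory", "integer", "floating_point"])
        writer.writerow([cflags, idx.get("MEMORY"), idx.get("INTEGER"),
                         idx.get("FLOATING-POINT")])
```

For example, `append_record("cflags.csv", "-O2 -march=k8 -pipe", Path("k8.txt").read_text())` (hypothetical file names) adds one row per run. Sorting the CSV by the integer column then shows the best-scoring CFLAGS to try on a real app.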

Whoa, am I really crap for trying it this way?


