[SOLVED] gcc, -march=native VS -march=broadwell

nTia89 · 2016-03-29 20:07:23

Hi folks,
today I read our wiki page about makepkg
and decided to try this command

gcc -march=native -v -Q --help=target

as described in the wiki.
Then I compared its output with the output of

gcc -march=broadwell -v -Q --help=target

(broadwell is my CPU, as gcc reports...).
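A minimal sketch of how the two listings could be saved and compared. The file names native.txt and broadwell.txt match the ones used in this post; the helper changed_flags is a hypothetical name introduced only for illustration. The -v switch is left out because its extra output goes to stderr and does not end up in the saved files.

```shell
#!/bin/sh
# Print every -m option whose reported state differs between two saved
# `gcc -Q --help=target` listings. (changed_flags is a made-up helper name.)
changed_flags() {
    # $1, $2: files holding the two listings
    awk '$1 !~ /^-m/ { next }                 # skip headings and value lists
         NR == FNR   { state[$1] = $2; next } # first file: remember each state
         ($1 in state) && state[$1] != $2 {   # second file: report changes
             print $1, state[$1], "->", $2
         }' "$1" "$2"
}

# Only run the comparison when gcc is installed and accepts both -march values.
if command -v gcc >/dev/null 2>&1 &&
   gcc -march=native    -Q --help=target > native.txt &&
   gcc -march=broadwell -Q --help=target > broadwell.txt
then
    changed_flags native.txt broadwell.txt
fi
```

Compared with a plain diff, this prints one line per option, which is easier to scan when many flags change at once.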

Surprisingly, the two outputs are different!
The command diff native.txt broadwell.txt returns:

12c12
<   -mabm                       		[enabled]
---
>   -mabm                       		[disabled]
15,16c15,16
<   -madx                       		[enabled]
<   -maes                       		[enabled]
---
>   -madx                       		[disabled]
>   -maes                       		[disabled]
26,27c26,27
<   -mavx                       		[enabled]
<   -mavx2                      		[enabled]
---
>   -mavx                       		[disabled]
>   -mavx2                      		[disabled]
40,41c40,41
<   -mbmi                       		[enabled]
<   -mbmi2                      		[enabled]
---
>   -mbmi                       		[disabled]
>   -mbmi2                      		[disabled]
49c49
<   -mcx16                      		[enabled]
---
>   -mcx16                      		[disabled]
52c52
<   -mf16c                      		[enabled]
---
>   -mf16c                      		[disabled]
55c55
<   -mfma                       		[enabled]
---
>   -mfma                       		[disabled]
60c60
<   -mfsgsbase                  		[enabled]
---
>   -mfsgsbase                  		[disabled]
62c62
<   -mfxsr                      		[enabled]
---
>   -mfxsr                      		[disabled]
76c76
<   -mlzcnt                     		[enabled]
---
>   -mlzcnt                     		[disabled]
79,80c79,80
<   -mmmx                       		[enabled]
<   -mmovbe                     		[enabled]
---
>   -mmmx                       		[disabled]
>   -mmovbe                     		[disabled]
89c89
<   -mno-sse4                   		[disabled]
---
>   -mno-sse4                   		[enabled]
95c95
<   -mpclmul                    		[enabled]
---
>   -mpclmul                    		[disabled]
97c97
<   -mpopcnt                    		[enabled]
---
>   -mpopcnt                    		[disabled]
101c101
<   -mprfchw                    		[enabled]
---
>   -mprfchw                    		[disabled]
103,104c103,104
<   -mrdrnd                     		[enabled]
<   -mrdseed                    		[enabled]
---
>   -mrdrnd                     		[disabled]
>   -mrdseed                    		[disabled]
112c112
<   -msahf                      		[enabled]
---
>   -msahf                      		[disabled]
116,117c116,117
<   -msse                       		[enabled]
<   -msse2                      		[enabled]
---
>   -msse                       		[disabled]
>   -msse2                      		[disabled]
119,122c119,122
<   -msse3                      		[enabled]
<   -msse4                      		[enabled]
<   -msse4.1                    		[enabled]
<   -msse4.2                    		[enabled]
---
>   -msse3                      		[disabled]
>   -msse4                      		[disabled]
>   -msse4.1                    		[disabled]
>   -msse4.2                    		[disabled]
126c126
<   -mssse3                     		[enabled]
---
>   -mssse3                     		[disabled]
135c135
<   -mtune=                     		broadwell
---
>   -mtune=                     		
142c142
<   -mxsave                     		[enabled]
---
>   -mxsave                     		[disabled]
144c144
<   -mxsaveopt                  		[enabled]
---
>   -mxsaveopt                  		[disabled]

My question is: why are the two outputs different?
As I understand it, -march=native makes gcc detect the system's processor architecture at compilation time[1] and then use it.
In short, -march=native on my system should be equivalent to -march=broadwell, so I expected the same output.
What am I missing?
After reading a bit on the web, I wondered: could it be that broadwell was defined for earlier versions of Broadwell CPUs?
Then I realized that this could not be true, and indeed I checked the gcc page that documents the broadwell option[2].

For reference, here is the complete output of both commands.

native:

The following options are target specific:
  -m128bit-long-double        		[disabled]
  -m16                        		[disabled]
  -m32                        		[disabled]
  -m3dnow                     		[disabled]
  -m3dnowa                    		[disabled]
  -m64                        		[enabled]
  -m80387                     		[enabled]
  -m8bit-idiv                 		[disabled]
  -m96bit-long-double         		[enabled]
  -mabi=                      		sysv
  -mabm                       		[enabled]
  -maccumulate-outgoing-args  		[disabled]
  -maddress-mode=             		short
  -madx                       		[enabled]
  -maes                       		[enabled]
  -malign-data=               		compat
  -malign-double              		[disabled]
  -malign-functions=          		0
  -malign-jumps=              		0
  -malign-loops=              		0
  -malign-stringops           		[enabled]
  -mandroid                   		[disabled]
  -march=                     		broadwell
  -masm=                      		att
  -mavx                       		[enabled]
  -mavx2                      		[enabled]
  -mavx256-split-unaligned-load 	[disabled]
  -mavx256-split-unaligned-store 	[disabled]
  -mavx512bw                  		[disabled]
  -mavx512cd                  		[disabled]
  -mavx512dq                  		[disabled]
  -mavx512er                  		[disabled]
  -mavx512f                   		[disabled]
  -mavx512ifma                		[disabled]
  -mavx512pf                  		[disabled]
  -mavx512vbmi                		[disabled]
  -mavx512vl                  		[disabled]
  -mbionic                    		[disabled]
  -mbmi                       		[enabled]
  -mbmi2                      		[enabled]
  -mbranch-cost=              		0
  -mcld                       		[disabled]
  -mclflushopt                		[disabled]
  -mclwb                      		[disabled]
  -mcmodel=                   		32
  -mcpu=                      		
  -mcrc32                     		[disabled]
  -mcx16                      		[enabled]
  -mdispatch-scheduler        		[disabled]
  -mdump-tune-features        		[disabled]
  -mf16c                      		[enabled]
  -mfancy-math-387            		[enabled]
  -mfentry                    		[enabled]
  -mfma                       		[enabled]
  -mfma4                      		[disabled]
  -mforce-drap                		[disabled]
  -mfp-ret-in-387             		[enabled]
  -mfpmath=                   		387
  -mfsgsbase                  		[enabled]
  -mfused-madd                		
  -mfxsr                      		[enabled]
  -mglibc                     		[enabled]
  -mhard-float                		[enabled]
  -mhle                       		[disabled]
  -mieee-fp                   		[enabled]
  -mincoming-stack-boundary=  		0
  -minline-all-stringops      		[disabled]
  -minline-stringops-dynamically 	[disabled]
  -mintel-syntax              		
  -mlarge-data-threshold=     		0x10000
  -mlong-double-128           		[disabled]
  -mlong-double-64            		[disabled]
  -mlong-double-80            		[enabled]
  -mlwp                       		[disabled]
  -mlzcnt                     		[enabled]
  -mmemcpy-strategy=          		
  -mmemset-strategy=          		
  -mmmx                       		[enabled]
  -mmovbe                     		[enabled]
  -mmpx                       		[disabled]
  -mms-bitfields              		[disabled]
  -mmwaitx                    		[disabled]
  -mno-align-stringops        		[disabled]
  -mno-default                		[disabled]
  -mno-fancy-math-387         		[disabled]
  -mno-push-args              		[disabled]
  -mno-red-zone               		[disabled]
  -mno-sse4                   		[disabled]
  -mnop-mcount                		[disabled]
  -momit-leaf-frame-pointer   		[disabled]
  -mpc32                      		[disabled]
  -mpc64                      		[disabled]
  -mpc80                      		[disabled]
  -mpclmul                    		[enabled]
  -mpcommit                   		[disabled]
  -mpopcnt                    		[enabled]
  -mprefer-avx128             		[disabled]
  -mpreferred-stack-boundary= 		0
  -mprefetchwt1               		[disabled]
  -mprfchw                    		[enabled]
  -mpush-args                 		[enabled]
  -mrdrnd                     		[enabled]
  -mrdseed                    		[enabled]
  -mrecip                     		[disabled]
  -mrecip=                    		
  -mrecord-mcount             		[disabled]
  -mred-zone                  		[enabled]
  -mregparm=                  		0
  -mrtd                       		[disabled]
  -mrtm                       		[disabled]
  -msahf                      		[enabled]
  -msha                       		[disabled]
  -mskip-rax-setup            		[disabled]
  -msoft-float                		[disabled]
  -msse                       		[enabled]
  -msse2                      		[enabled]
  -msse2avx                   		[disabled]
  -msse3                      		[enabled]
  -msse4                      		[enabled]
  -msse4.1                    		[enabled]
  -msse4.2                    		[enabled]
  -msse4a                     		[disabled]
  -msse5                      		
  -msseregparm                		[disabled]
  -mssse3                     		[enabled]
  -mstack-arg-probe           		[disabled]
  -mstack-protector-guard=    		tls
  -mstackrealign              		[enabled]
  -mstringop-strategy=        		[default]
  -mtbm                       		[disabled]
  -mtls-dialect=              		gnu
  -mtls-direct-seg-refs       		[enabled]
  -mtune-ctrl=                		
  -mtune=                     		broadwell
  -muclibc                    		[disabled]
  -mveclibabi=                		[default]
  -mvect8-ret-in-mem          		[disabled]
  -mvzeroupper                		[disabled]
  -mx32                       		[disabled]
  -mxop                       		[disabled]
  -mxsave                     		[enabled]
  -mxsavec                    		[disabled]
  -mxsaveopt                  		[enabled]
  -mxsaves                    		[disabled]

  Known assembler dialects (for use with the -masm-dialect= option):
    att intel

  Known ABIs (for use with the -mabi= option):
    ms sysv

  Known code models (for use with the -mcmodel= option):
    32 kernel large medium small

  Valid arguments to -mfpmath=:
    387 387+sse 387,sse both sse sse+387 sse,387

  Known data alignment choices (for use with the -malign-data= option):
    abi cacheline compat

  Known vectorization library ABIs (for use with the -mveclibabi= option):
    acml svml

  Known address mode (for use with the -maddress-mode= option):
    long short

  Known stack protector guard (for use with the -mstack-protector-guard= option):
    global tls

  Valid arguments to -mstringop-strategy=:
    byte_loop libcall loop rep_4byte rep_8byte rep_byte unrolled_loop
    vector_loop

  Known TLS dialects (for use with the -mtls-dialect= option):
    gnu gnu2

broadwell:

The following options are target specific:
  -m128bit-long-double        		[disabled]
  -m16                        		[disabled]
  -m32                        		[disabled]
  -m3dnow                     		[disabled]
  -m3dnowa                    		[disabled]
  -m64                        		[enabled]
  -m80387                     		[enabled]
  -m8bit-idiv                 		[disabled]
  -m96bit-long-double         		[enabled]
  -mabi=                      		sysv
  -mabm                       		[disabled]
  -maccumulate-outgoing-args  		[disabled]
  -maddress-mode=             		short
  -madx                       		[disabled]
  -maes                       		[disabled]
  -malign-data=               		compat
  -malign-double              		[disabled]
  -malign-functions=          		0
  -malign-jumps=              		0
  -malign-loops=              		0
  -malign-stringops           		[enabled]
  -mandroid                   		[disabled]
  -march=                     		broadwell
  -masm=                      		att
  -mavx                       		[disabled]
  -mavx2                      		[disabled]
  -mavx256-split-unaligned-load 	[disabled]
  -mavx256-split-unaligned-store 	[disabled]
  -mavx512bw                  		[disabled]
  -mavx512cd                  		[disabled]
  -mavx512dq                  		[disabled]
  -mavx512er                  		[disabled]
  -mavx512f                   		[disabled]
  -mavx512ifma                		[disabled]
  -mavx512pf                  		[disabled]
  -mavx512vbmi                		[disabled]
  -mavx512vl                  		[disabled]
  -mbionic                    		[disabled]
  -mbmi                       		[disabled]
  -mbmi2                      		[disabled]
  -mbranch-cost=              		0
  -mcld                       		[disabled]
  -mclflushopt                		[disabled]
  -mclwb                      		[disabled]
  -mcmodel=                   		32
  -mcpu=                      		
  -mcrc32                     		[disabled]
  -mcx16                      		[disabled]
  -mdispatch-scheduler        		[disabled]
  -mdump-tune-features        		[disabled]
  -mf16c                      		[disabled]
  -mfancy-math-387            		[enabled]
  -mfentry                    		[enabled]
  -mfma                       		[disabled]
  -mfma4                      		[disabled]
  -mforce-drap                		[disabled]
  -mfp-ret-in-387             		[enabled]
  -mfpmath=                   		387
  -mfsgsbase                  		[disabled]
  -mfused-madd                		
  -mfxsr                      		[disabled]
  -mglibc                     		[enabled]
  -mhard-float                		[enabled]
  -mhle                       		[disabled]
  -mieee-fp                   		[enabled]
  -mincoming-stack-boundary=  		0
  -minline-all-stringops      		[disabled]
  -minline-stringops-dynamically 	[disabled]
  -mintel-syntax              		
  -mlarge-data-threshold=     		0x10000
  -mlong-double-128           		[disabled]
  -mlong-double-64            		[disabled]
  -mlong-double-80            		[enabled]
  -mlwp                       		[disabled]
  -mlzcnt                     		[disabled]
  -mmemcpy-strategy=          		
  -mmemset-strategy=          		
  -mmmx                       		[disabled]
  -mmovbe                     		[disabled]
  -mmpx                       		[disabled]
  -mms-bitfields              		[disabled]
  -mmwaitx                    		[disabled]
  -mno-align-stringops        		[disabled]
  -mno-default                		[disabled]
  -mno-fancy-math-387         		[disabled]
  -mno-push-args              		[disabled]
  -mno-red-zone               		[disabled]
  -mno-sse4                   		[enabled]
  -mnop-mcount                		[disabled]
  -momit-leaf-frame-pointer   		[disabled]
  -mpc32                      		[disabled]
  -mpc64                      		[disabled]
  -mpc80                      		[disabled]
  -mpclmul                    		[disabled]
  -mpcommit                   		[disabled]
  -mpopcnt                    		[disabled]
  -mprefer-avx128             		[disabled]
  -mpreferred-stack-boundary= 		0
  -mprefetchwt1               		[disabled]
  -mprfchw                    		[disabled]
  -mpush-args                 		[enabled]
  -mrdrnd                     		[disabled]
  -mrdseed                    		[disabled]
  -mrecip                     		[disabled]
  -mrecip=                    		
  -mrecord-mcount             		[disabled]
  -mred-zone                  		[enabled]
  -mregparm=                  		0
  -mrtd                       		[disabled]
  -mrtm                       		[disabled]
  -msahf                      		[disabled]
  -msha                       		[disabled]
  -mskip-rax-setup            		[disabled]
  -msoft-float                		[disabled]
  -msse                       		[disabled]
  -msse2                      		[disabled]
  -msse2avx                   		[disabled]
  -msse3                      		[disabled]
  -msse4                      		[disabled]
  -msse4.1                    		[disabled]
  -msse4.2                    		[disabled]
  -msse4a                     		[disabled]
  -msse5                      		
  -msseregparm                		[disabled]
  -mssse3                     		[disabled]
  -mstack-arg-probe           		[disabled]
  -mstack-protector-guard=    		tls
  -mstackrealign              		[enabled]
  -mstringop-strategy=        		[default]
  -mtbm                       		[disabled]
  -mtls-dialect=              		gnu
  -mtls-direct-seg-refs       		[enabled]
  -mtune-ctrl=                		
  -mtune=                     		
  -muclibc                    		[disabled]
  -mveclibabi=                		[default]
  -mvect8-ret-in-mem          		[disabled]
  -mvzeroupper                		[disabled]
  -mx32                       		[disabled]
  -mxop                       		[disabled]
  -mxsave                     		[disabled]
  -mxsavec                    		[disabled]
  -mxsaveopt                  		[disabled]
  -mxsaves                    		[disabled]

  Known assembler dialects (for use with the -masm-dialect= option):
    att intel

  Known ABIs (for use with the -mabi= option):
    ms sysv

  Known code models (for use with the -mcmodel= option):
    32 kernel large medium small

  Valid arguments to -mfpmath=:
    387 387+sse 387,sse both sse sse+387 sse,387

  Known data alignment choices (for use with the -malign-data= option):
    abi cacheline compat

  Known vectorization library ABIs (for use with the -mveclibabi= option):
    acml svml

  Known address mode (for use with the -maddress-mode= option):
    long short

  Known stack protector guard (for use with the -mstack-protector-guard= option):
    global tls

  Valid arguments to -mstringop-strategy=:
    byte_loop libcall loop rep_4byte rep_8byte rep_byte unrolled_loop
    vector_loop

  Known TLS dialects (for use with the -mtls-dialect= option):
    gnu gnu2

---

[1] https://wiki.gentoo.org/wiki/GCC_optimization#-march
[2] https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

EDIT: OP's title now more specific
EDIT2: clarify text
EDIT3: fixed a reference and more

Last edited by nTia89 (2016-04-01 13:04:04)

lahwaacz · 2016-03-31 16:33:12

nTia89 wrote:

(broadwell is my CPU, as gcc reports...).

Which command have you used to make this conclusion?

nTia89 · 2016-03-31 17:28:57

Maybe I could have chosen better words...

* The CPU model is listed in many places; e.g. I identified it with

grep -m1 -A3 "vendor_id" /proc/cpuinfo

* The CPU architecture that gcc wants (-march), on the other hand, was identified using the -march=native command. You can see that both outputs have "broadwell" as "-march".
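As a rough cross-check that does not involve gcc at all, the feature flags the kernel reports can be inspected directly. This is only a sketch: cpu_has_flag is a hypothetical helper, and it assumes the x86 layout of /proc/cpuinfo with a "flags" line and kernel spellings such as avx2 and abm.

```shell
#!/bin/sh
# Succeed if the first "flags" line on stdin (cpuinfo format) lists the
# given feature as a whole word. (cpu_has_flag is a made-up helper name.)
cpu_has_flag() {
    # $1: feature name as the kernel spells it, e.g. avx2 or abm
    awk -v want="$1" -F: '
        $1 ~ /^flags/ {
            n = split($2, f, " ")
            for (i = 1; i <= n; i++) if (f[i] == want) found = 1
            exit
        }
        END { exit !found }'
}

if [ -r /proc/cpuinfo ]; then
    grep -m1 'model name' /proc/cpuinfo || echo "model name not reported"
    for flag in avx2 abm; do
        if cpu_has_flag "$flag" < /proc/cpuinfo; then
            echo "$flag: yes"
        else
            echo "$flag: no"
        fi
    done
fi
```

This only shows what the hardware offers; it says nothing about which -march name gcc maps the CPU to.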

lahwaacz · 2016-03-31 17:53:00

There are many ways in which detection of the correct architecture could go wrong; for instance, how do you know it's "broadwell" and not "corei7-avx" or similar? Please be exhaustive and tell me exactly how you leapt from -march=native to -march=broadwell. Saying "as gcc reports" or "identified using -march=native" is not clear at all.

Last edited by lahwaacz (2016-03-31 17:54:19)

nTia89 · 2016-03-31 19:36:18

the command

gcc -march=native -v -Q --help=target

returns -march: broadwell on line 24.
Indeed, the GCC documentation (https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html) lists broadwell too.

Last edited by nTia89 (2016-03-31 19:39:09)

lahwaacz · 2016-03-31 21:36:19

nTia89 wrote:

the command
gcc -march=native -v -Q --help=target
returns -march: broadwell, line 24

This command works as you expected with -march=native, but not with -march=broadwell (or anything else).

The "Find CPU-specific options" section of the Gentoo wiki (which is linked from the makepkg ArchWiki page) shows an alternative approach, which helps to explain the behaviour of the above command. Basically, the difference between the generated files march.s and native.s (without any cleanup with sed) looks something like this:

$ diff march.s native.s
1c1
< 	.file	"march.cc"
---
> 	.file	"native.cc"
6c6,16
< # options passed:  -D_GNU_SOURCE march.cc -march=sandybridge -fverbose-asm
---
> # options passed:  -D_GNU_SOURCE native.cc -march=sandybridge -mmmx
> # -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf
> # -mno-movbe -maes -mno-sha -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma
> # -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2
> # -msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd -mno-f16c -mno-fsgsbase
> # -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f
> # -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1
> # -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw
> # -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-clwb -mno-pcommit
> # -mno-mwaitx --param l1-cache-size=32 --param l1-cache-line-size=64
> # --param l2-cache-size=3072 -mtune=sandybridge -fverbose-asm

Note that on my system, there is a difference only in "options passed", not in "options enabled". So, I think we can conclude that -march=native is expanded into a list of additional options before the command line is fully parsed, whereas -march=sandybridge (or broadwell in your case) is not. The --help=target option evidently reports only what follows from the command line options at that point, but options shown as disabled there may still be enabled at a later stage.
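For anyone who wants to reproduce this, here is a rough sketch of how the two assembly listings might be generated. It assumes g++ is installed (the .cc extension needs the C++ front end) and uses broadwell as the explicit architecture; options_passed is a hypothetical helper, and the exact layout of the comment header depends on the GCC version, so newer releases may print it differently or not at all.

```shell
#!/bin/sh
# Print the "options passed" comment block from a -fverbose-asm listing
# read on stdin. (options_passed is a made-up helper name.)
options_passed() {
    awk '/^# options passed:/  { on = 1 }
         /^# options enabled:/ { on = 0 }
         !/^#/                 { on = 0 }  # header comments are over
         on'
}

ARCH=broadwell   # explicit architecture to compare against native

# Compile two empty sources to assembly only if every step succeeds.
if command -v g++ >/dev/null 2>&1 &&
   : > march.cc && : > native.cc &&
   g++ -S -fverbose-asm -march="$ARCH" march.cc  -o march.s &&
   g++ -S -fverbose-asm -march=native  native.cc -o native.s
then
    options_passed < march.s
    options_passed < native.s
fi
```

Running diff march.s native.s afterwards gives the kind of comparison shown above.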

TL;DR: you need to go through a compilation phase to find out which options are really enabled.

nTia89 · 2016-04-01 13:03:47

aaah, now it's clear, finally!

[SOLVED]

For the sake of completeness: on my PC (core i5-5600U), to make -march=broadwell match -march=native, I have to add -mabm.
That way (I remind you again, for my system only) I get the same set of optimizations.

Now it is totally clear what switching away from the default makepkg.conf configuration implies.
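One way to double-check such an equivalence is to compare the macros the compiler predefines once the options have been fully processed. Note that this is a different technique from the -fverbose-asm approach discussed in this thread. The helper names and the macro pattern below are illustrative assumptions, and the pattern is hand-picked rather than exhaustive.

```shell
#!/bin/sh
# Reduce `#define` lines on stdin to a sorted list of ISA-extension macro
# names. (Helper names and the pattern are illustrative, not exhaustive.)
isa_macro_names() {
    awk '$1 == "#define" &&
         $2 ~ /^__(MMX|SSE|SSSE3|AVX|AES|ABM|ADX|BMI|FMA|F16C|LZCNT|POPCNT|PCLMUL|RDRND|RDSEED)/ {
             print $2
         }' | sort
}

# Print the ISA macros gcc predefines for the given options.
isa_macros_for() {
    gcc "$@" -dM -E - < /dev/null | isa_macro_names
}

if command -v gcc >/dev/null 2>&1; then
    isa_macros_for -march=native          > native.macros
    isa_macros_for -march=broadwell -mabm > broadwell.macros
    if diff native.macros broadwell.macros; then
        echo "same ISA macros"
    fi
fi
```

If diff prints nothing before the final message, the two option sets enable the same instruction set extensions as far as these macros can tell; tuning parameters such as cache sizes are not covered by this check.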

Last edited by nTia89 (2016-04-01 13:04:23)

Arch Linux
