Hi folks,
today I read our wiki page about makepkg and decided to try the command
gcc -march=native -v -Q --help=target
as described in the wiki.
Then I compared its output with the output of
gcc -march=broadwell -v -Q --help=target
(broadwell is my CPU, as gcc reports...).
Surprisingly, the two outputs are different!
The command diff native.txt broadwell.txt returns:
12c12
< -mabm [enabled]
---
> -mabm [disabled]
15,16c15,16
< -madx [enabled]
< -maes [enabled]
---
> -madx [disabled]
> -maes [disabled]
26,27c26,27
< -mavx [enabled]
< -mavx2 [enabled]
---
> -mavx [disabled]
> -mavx2 [disabled]
40,41c40,41
< -mbmi [enabled]
< -mbmi2 [enabled]
---
> -mbmi [disabled]
> -mbmi2 [disabled]
49c49
< -mcx16 [enabled]
---
> -mcx16 [disabled]
52c52
< -mf16c [enabled]
---
> -mf16c [disabled]
55c55
< -mfma [enabled]
---
> -mfma [disabled]
60c60
< -mfsgsbase [enabled]
---
> -mfsgsbase [disabled]
62c62
< -mfxsr [enabled]
---
> -mfxsr [disabled]
76c76
< -mlzcnt [enabled]
---
> -mlzcnt [disabled]
79,80c79,80
< -mmmx [enabled]
< -mmovbe [enabled]
---
> -mmmx [disabled]
> -mmovbe [disabled]
89c89
< -mno-sse4 [disabled]
---
> -mno-sse4 [enabled]
95c95
< -mpclmul [enabled]
---
> -mpclmul [disabled]
97c97
< -mpopcnt [enabled]
---
> -mpopcnt [disabled]
101c101
< -mprfchw [enabled]
---
> -mprfchw [disabled]
103,104c103,104
< -mrdrnd [enabled]
< -mrdseed [enabled]
---
> -mrdrnd [disabled]
> -mrdseed [disabled]
112c112
< -msahf [enabled]
---
> -msahf [disabled]
116,117c116,117
< -msse [enabled]
< -msse2 [enabled]
---
> -msse [disabled]
> -msse2 [disabled]
119,122c119,122
< -msse3 [enabled]
< -msse4 [enabled]
< -msse4.1 [enabled]
< -msse4.2 [enabled]
---
> -msse3 [disabled]
> -msse4 [disabled]
> -msse4.1 [disabled]
> -msse4.2 [disabled]
126c126
< -mssse3 [enabled]
---
> -mssse3 [disabled]
135c135
< -mtune= broadwell
---
> -mtune=
142c142
< -mxsave [enabled]
---
> -mxsave [disabled]
144c144
< -mxsaveopt [enabled]
---
> -mxsaveopt [disabled]
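The comparison above can be condensed into a list of option names. Below is a minimal sketch, assuming a POSIX shell with awk, sort, comm and mktemp available; the function names enabled_flags and only_in_first are hypothetical helpers, not part of gcc or makepkg. It only looks at lines ending in [enabled] in a -Q --help=target dump.

```shell
#!/bin/sh
# Print, one per line and sorted, the option names that a
# "gcc -Q --help=target" dump (read on stdin) marks as [enabled].
enabled_flags() {
    awk '$NF == "[enabled]" { print $1 }' | sort
}

# Print the options that are [enabled] in dump $1 but not in dump $2.
only_in_first() {
    first=$(mktemp) second=$(mktemp)
    enabled_flags < "$1" > "$first"
    enabled_flags < "$2" > "$second"
    comm -23 "$first" "$second"   # keep lines unique to the first file
    rm -f "$first" "$second"
}
```

For example (-v only adds diagnostic output on stderr, so it is dropped here):
gcc -march=native -Q --help=target > native.txt
gcc -march=broadwell -Q --help=target > broadwell.txt
only_in_first native.txt broadwell.txt
With the dumps in this post, that should print the options marked [enabled] on the < side of the diff above.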
My question is: why are the two outputs different?
According to the documentation[1], -march=native makes gcc detect the system's processor architecture at compilation time and then use it.
In a few words, on my system -march=native should be equivalent to -march=broadwell, so I expected both commands to return the same output.
What am I missing?
Reading a bit on the web, I wondered: could it be that the broadwell target was created for earlier versions of Broadwell CPUs?
Then I realized that this could not be the case, and indeed I checked the gcc page that documents the broadwell target[2].
Furthermore, here is the complete output of both commands:
native:
The following options are target specific:
-m128bit-long-double [disabled]
-m16 [disabled]
-m32 [disabled]
-m3dnow [disabled]
-m3dnowa [disabled]
-m64 [enabled]
-m80387 [enabled]
-m8bit-idiv [disabled]
-m96bit-long-double [enabled]
-mabi= sysv
-mabm [enabled]
-maccumulate-outgoing-args [disabled]
-maddress-mode= short
-madx [enabled]
-maes [enabled]
-malign-data= compat
-malign-double [disabled]
-malign-functions= 0
-malign-jumps= 0
-malign-loops= 0
-malign-stringops [enabled]
-mandroid [disabled]
-march= broadwell
-masm= att
-mavx [enabled]
-mavx2 [enabled]
-mavx256-split-unaligned-load [disabled]
-mavx256-split-unaligned-store [disabled]
-mavx512bw [disabled]
-mavx512cd [disabled]
-mavx512dq [disabled]
-mavx512er [disabled]
-mavx512f [disabled]
-mavx512ifma [disabled]
-mavx512pf [disabled]
-mavx512vbmi [disabled]
-mavx512vl [disabled]
-mbionic [disabled]
-mbmi [enabled]
-mbmi2 [enabled]
-mbranch-cost= 0
-mcld [disabled]
-mclflushopt [disabled]
-mclwb [disabled]
-mcmodel= 32
-mcpu=
-mcrc32 [disabled]
-mcx16 [enabled]
-mdispatch-scheduler [disabled]
-mdump-tune-features [disabled]
-mf16c [enabled]
-mfancy-math-387 [enabled]
-mfentry [enabled]
-mfma [enabled]
-mfma4 [disabled]
-mforce-drap [disabled]
-mfp-ret-in-387 [enabled]
-mfpmath= 387
-mfsgsbase [enabled]
-mfused-madd
-mfxsr [enabled]
-mglibc [enabled]
-mhard-float [enabled]
-mhle [disabled]
-mieee-fp [enabled]
-mincoming-stack-boundary= 0
-minline-all-stringops [disabled]
-minline-stringops-dynamically [disabled]
-mintel-syntax
-mlarge-data-threshold= 0x10000
-mlong-double-128 [disabled]
-mlong-double-64 [disabled]
-mlong-double-80 [enabled]
-mlwp [disabled]
-mlzcnt [enabled]
-mmemcpy-strategy=
-mmemset-strategy=
-mmmx [enabled]
-mmovbe [enabled]
-mmpx [disabled]
-mms-bitfields [disabled]
-mmwaitx [disabled]
-mno-align-stringops [disabled]
-mno-default [disabled]
-mno-fancy-math-387 [disabled]
-mno-push-args [disabled]
-mno-red-zone [disabled]
-mno-sse4 [disabled]
-mnop-mcount [disabled]
-momit-leaf-frame-pointer [disabled]
-mpc32 [disabled]
-mpc64 [disabled]
-mpc80 [disabled]
-mpclmul [enabled]
-mpcommit [disabled]
-mpopcnt [enabled]
-mprefer-avx128 [disabled]
-mpreferred-stack-boundary= 0
-mprefetchwt1 [disabled]
-mprfchw [enabled]
-mpush-args [enabled]
-mrdrnd [enabled]
-mrdseed [enabled]
-mrecip [disabled]
-mrecip=
-mrecord-mcount [disabled]
-mred-zone [enabled]
-mregparm= 0
-mrtd [disabled]
-mrtm [disabled]
-msahf [enabled]
-msha [disabled]
-mskip-rax-setup [disabled]
-msoft-float [disabled]
-msse [enabled]
-msse2 [enabled]
-msse2avx [disabled]
-msse3 [enabled]
-msse4 [enabled]
-msse4.1 [enabled]
-msse4.2 [enabled]
-msse4a [disabled]
-msse5
-msseregparm [disabled]
-mssse3 [enabled]
-mstack-arg-probe [disabled]
-mstack-protector-guard= tls
-mstackrealign [enabled]
-mstringop-strategy= [default]
-mtbm [disabled]
-mtls-dialect= gnu
-mtls-direct-seg-refs [enabled]
-mtune-ctrl=
-mtune= broadwell
-muclibc [disabled]
-mveclibabi= [default]
-mvect8-ret-in-mem [disabled]
-mvzeroupper [disabled]
-mx32 [disabled]
-mxop [disabled]
-mxsave [enabled]
-mxsavec [disabled]
-mxsaveopt [enabled]
-mxsaves [disabled]
Known assembler dialects (for use with the -masm-dialect= option):
att intel
Known ABIs (for use with the -mabi= option):
ms sysv
Known code models (for use with the -mcmodel= option):
32 kernel large medium small
Valid arguments to -mfpmath=:
387 387+sse 387,sse both sse sse+387 sse,387
Known data alignment choices (for use with the -malign-data= option):
abi cacheline compat
Known vectorization library ABIs (for use with the -mveclibabi= option):
acml svml
Known address mode (for use with the -maddress-mode= option):
long short
Known stack protector guard (for use with the -mstack-protector-guard= option):
global tls
Valid arguments to -mstringop-strategy=:
byte_loop libcall loop rep_4byte rep_8byte rep_byte unrolled_loop
vector_loop
Known TLS dialects (for use with the -mtls-dialect= option):
gnu gnu2
broadwell:
The following options are target specific:
-m128bit-long-double [disabled]
-m16 [disabled]
-m32 [disabled]
-m3dnow [disabled]
-m3dnowa [disabled]
-m64 [enabled]
-m80387 [enabled]
-m8bit-idiv [disabled]
-m96bit-long-double [enabled]
-mabi= sysv
-mabm [disabled]
-maccumulate-outgoing-args [disabled]
-maddress-mode= short
-madx [disabled]
-maes [disabled]
-malign-data= compat
-malign-double [disabled]
-malign-functions= 0
-malign-jumps= 0
-malign-loops= 0
-malign-stringops [enabled]
-mandroid [disabled]
-march= broadwell
-masm= att
-mavx [disabled]
-mavx2 [disabled]
-mavx256-split-unaligned-load [disabled]
-mavx256-split-unaligned-store [disabled]
-mavx512bw [disabled]
-mavx512cd [disabled]
-mavx512dq [disabled]
-mavx512er [disabled]
-mavx512f [disabled]
-mavx512ifma [disabled]
-mavx512pf [disabled]
-mavx512vbmi [disabled]
-mavx512vl [disabled]
-mbionic [disabled]
-mbmi [disabled]
-mbmi2 [disabled]
-mbranch-cost= 0
-mcld [disabled]
-mclflushopt [disabled]
-mclwb [disabled]
-mcmodel= 32
-mcpu=
-mcrc32 [disabled]
-mcx16 [disabled]
-mdispatch-scheduler [disabled]
-mdump-tune-features [disabled]
-mf16c [disabled]
-mfancy-math-387 [enabled]
-mfentry [enabled]
-mfma [disabled]
-mfma4 [disabled]
-mforce-drap [disabled]
-mfp-ret-in-387 [enabled]
-mfpmath= 387
-mfsgsbase [disabled]
-mfused-madd
-mfxsr [disabled]
-mglibc [enabled]
-mhard-float [enabled]
-mhle [disabled]
-mieee-fp [enabled]
-mincoming-stack-boundary= 0
-minline-all-stringops [disabled]
-minline-stringops-dynamically [disabled]
-mintel-syntax
-mlarge-data-threshold= 0x10000
-mlong-double-128 [disabled]
-mlong-double-64 [disabled]
-mlong-double-80 [enabled]
-mlwp [disabled]
-mlzcnt [disabled]
-mmemcpy-strategy=
-mmemset-strategy=
-mmmx [disabled]
-mmovbe [disabled]
-mmpx [disabled]
-mms-bitfields [disabled]
-mmwaitx [disabled]
-mno-align-stringops [disabled]
-mno-default [disabled]
-mno-fancy-math-387 [disabled]
-mno-push-args [disabled]
-mno-red-zone [disabled]
-mno-sse4 [enabled]
-mnop-mcount [disabled]
-momit-leaf-frame-pointer [disabled]
-mpc32 [disabled]
-mpc64 [disabled]
-mpc80 [disabled]
-mpclmul [disabled]
-mpcommit [disabled]
-mpopcnt [disabled]
-mprefer-avx128 [disabled]
-mpreferred-stack-boundary= 0
-mprefetchwt1 [disabled]
-mprfchw [disabled]
-mpush-args [enabled]
-mrdrnd [disabled]
-mrdseed [disabled]
-mrecip [disabled]
-mrecip=
-mrecord-mcount [disabled]
-mred-zone [enabled]
-mregparm= 0
-mrtd [disabled]
-mrtm [disabled]
-msahf [disabled]
-msha [disabled]
-mskip-rax-setup [disabled]
-msoft-float [disabled]
-msse [disabled]
-msse2 [disabled]
-msse2avx [disabled]
-msse3 [disabled]
-msse4 [disabled]
-msse4.1 [disabled]
-msse4.2 [disabled]
-msse4a [disabled]
-msse5
-msseregparm [disabled]
-mssse3 [disabled]
-mstack-arg-probe [disabled]
-mstack-protector-guard= tls
-mstackrealign [enabled]
-mstringop-strategy= [default]
-mtbm [disabled]
-mtls-dialect= gnu
-mtls-direct-seg-refs [enabled]
-mtune-ctrl=
-mtune=
-muclibc [disabled]
-mveclibabi= [default]
-mvect8-ret-in-mem [disabled]
-mvzeroupper [disabled]
-mx32 [disabled]
-mxop [disabled]
-mxsave [disabled]
-mxsavec [disabled]
-mxsaveopt [disabled]
-mxsaves [disabled]
Known assembler dialects (for use with the -masm-dialect= option):
att intel
Known ABIs (for use with the -mabi= option):
ms sysv
Known code models (for use with the -mcmodel= option):
32 kernel large medium small
Valid arguments to -mfpmath=:
387 387+sse 387,sse both sse sse+387 sse,387
Known data alignment choices (for use with the -malign-data= option):
abi cacheline compat
Known vectorization library ABIs (for use with the -mveclibabi= option):
acml svml
Known address mode (for use with the -maddress-mode= option):
long short
Known stack protector guard (for use with the -mstack-protector-guard= option):
global tls
Valid arguments to -mstringop-strategy=:
byte_loop libcall loop rep_4byte rep_8byte rep_byte unrolled_loop
vector_loop
Known TLS dialects (for use with the -mtls-dialect= option):
gnu gnu2
---
[1] https://wiki.gentoo.org/wiki/GCC_optimization#-march
[2] https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
EDIT: OP's title now more specific
EDIT2: clarify text
EDIT3: fixes, references and more
Last edited by nTia89 (2016-04-01 13:04:04)
+pc: custom | AMD Opteron 175 | nForce4 Ultra | 2GB ram DDR400 | nVidia 9800GT 1GB | ArchLinux x86_64 w/ openbox
+laptop: Apple | MacBook (2,1) | 2GB ram | Mac OS X 10.4 -> DIED
+ultrabook: Dell | XPS 13 (9343) | 8GB ram | 256GB ssd | FullHD display | Windows 8.1 64bit ArchLinux x86_64 w/ Gnome
(broadwell is my CPU, as gcc reports...).
Which command did you use to reach this conclusion?
Maybe I could have used a better word...
* The CPU model is listed in many places; e.g. I identified it with
grep -m1 -A3 "vendor_id" /proc/cpuinfo
* The CPU architecture that gcc wants (-march), instead, was identified with the -march=native command. You can see that both outputs show "broadwell" for "-march".
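The two identification steps described above can be sketched as small filters. This is a minimal sketch, assuming /proc/cpuinfo uses the usual "key<TAB>: value" layout and that gcc prints the -march= line as shown in the dumps in this thread; cpu_model and detected_march are hypothetical helper names. As both dumps here show, the -march= value alone is the same for -march=native and -march=broadwell, so it cannot reveal the differing sub-options.

```shell
#!/bin/sh
# Print the "model name" of the first CPU in /proc/cpuinfo-style text on stdin.
cpu_model() {
    awk -F': ' '$1 ~ /^model name/ { print $2; exit }'
}

# Print the value shown for "-march=" in "gcc -Q --help=target" output on stdin.
detected_march() {
    awk '$1 == "-march=" { print $2; exit }'
}
```

Usage:
cpu_model < /proc/cpuinfo
gcc -march=native -Q --help=target | detected_march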
There are many ways the detection of the correct architecture could go wrong; for instance, how do you know it's "broadwell" and not "corei7-avx" or similar? Please be exhaustive and tell me exactly how you got from -march=native to -march=broadwell. Saying "as gcc reports" or "identified using -march=native" is not clear at all.
Last edited by lahwaacz (2016-03-31 17:54:19)
The command
gcc -march=native -v -Q --help=target
returns "-march= broadwell" (line 24 of the output).
Indeed, the GCC documentation (https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html) lists it too.
Last edited by nTia89 (2016-03-31 19:39:09)
the command
gcc -march=native -v -Q --help=target
returns -march: broadwell, line 24
This command works as you expected with -march=native, but not with -march=broadwell (or anything else).
The "Find CPU-specific options" section of the Gentoo wiki (which is linked from the makepkg ArchWiki page) shows an alternative approach, which helps to explain the behaviour of the above command. Basically, the difference between the generated files march.s and native.s (without any cleanup with sed) looks something like this:
$ diff march.s native.s
1c1
< .file "march.cc"
---
> .file "native.cc"
6c6,16
< # options passed: -D_GNU_SOURCE march.cc -march=sandybridge -fverbose-asm
---
> # options passed: -D_GNU_SOURCE native.cc -march=sandybridge -mmmx
> # -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf
> # -mno-movbe -maes -mno-sha -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma
> # -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2
> # -msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mno-rdrnd -mno-f16c -mno-fsgsbase
> # -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f
> # -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1
> # -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw
> # -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-clwb -mno-pcommit
> # -mno-mwaitx --param l1-cache-size=32 --param l1-cache-line-size=64
> # --param l2-cache-size=3072 -mtune=sandybridge -fverbose-asm
Note that on my system there is a difference only in the "options passed", not in the "options enabled". So I think we can conclude that -march=native is translated into a list of additional options before the command line is fully parsed, whereas -march=sandybridge (or broadwell in your case) is not. The --help=target option obviously reports only information about the command-line options, but this does not mean that further options will not be enabled at a later stage.
TL;DR: you need to go through a compilation phase to find out which options are really enabled.
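A compilation-based check along the lines of the Gentoo approach can be sketched as follows. This is a minimal sketch, assuming the -fverbose-asm comment layout seen in the diff above: a "# options passed:" line, "#" continuation lines, then either "# options enabled:" or the first non-comment line. The helper names options_passed and only_in_second are hypothetical. It prints the tokens that appear only in the second file's "options passed" list, i.e. roughly what -march=native expands to.

```shell
#!/bin/sh
# Print the "options passed" recorded by -fverbose-asm in assembly file $1,
# one whitespace-separated token per line, sorted.
options_passed() {
    awk '
        /^# options enabled:/ || !/^#/ { on = 0 }
        /^# options passed:/ { on = 1; sub(/^# options passed:[ \t]*/, ""); print; next }
        on { sub(/^#[ \t]*/, ""); print }
    ' "$1" | tr -s ' \t' '\n' | sed '/^$/d' | sort
}

# Print tokens present in the "options passed" of $2 but not of $1.
only_in_second() {
    a=$(mktemp) b=$(mktemp)
    options_passed "$1" > "$a"
    options_passed "$2" > "$b"
    comm -13 "$a" "$b"   # keep lines unique to the second file
    rm -f "$a" "$b"
}
```

Usage (empty.c is a hypothetical empty source file):
touch empty.c
gcc -march=broadwell -S -fverbose-asm empty.c -o march.s
gcc -march=native -S -fverbose-asm empty.c -o native.s
only_in_second march.s native.s
A few tokens unrelated to the CPU, such as file names, may also show up in the result.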
Aaah, now it's finally clear!
[SOLVED]
For the sake of completeness: on my PC (Core i5-5600U), to bring -march=broadwell level with -march=native, I have to add -mabm.
This way (again, for my system only) I get the same optimization level.
Now it is totally clear what switching away from the default makepkg.conf configuration implies.
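In makepkg.conf terms, that conclusion could look like the hypothetical excerpt below. It assumes makepkg.conf is sourced by bash (as makepkg does); the "-O2 -pipe" part is only a placeholder for whatever else your existing CFLAGS line contains, and -march=broadwell -mabm is only known to match -march=native on the i5-5600U discussed here.

```shell
# Hypothetical excerpt from /etc/makepkg.conf.
# -march=broadwell -mabm matched -march=native on the i5-5600U in this thread;
# "-O2 -pipe" stands in for the rest of your existing CFLAGS.
CFLAGS="-march=broadwell -mabm -O2 -pipe"
CXXFLAGS="${CFLAGS}"
```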
Last edited by nTia89 (2016-04-01 13:04:23)