[Nottingham] Compiler optimisation flags (gcc, g++)
Martin
martin at ml1.co.uk
Thu Nov 6 22:59:18 GMT 2003
Robert Davies wrote:
[...]
>
> There's been a recent article in Gentoo news, comparing some optimisation
> flags. One of the issues was -O3 slowing code down (due to bloat), if you
> want non-explicit function inlining then consider -finline-limit=N where N is
> some size smaller than the default. Altering alignments from defaults, also
> did not help, except if made much larger.
>
> The problem with flags is, they are very processor and program specific, it's
> doubtful you get enough of a benefit over -O2 -march=athlon-xp (or -O3 with
> low -finline-limit) to warrant the effort, and no one else can tell you the
> 'right' options to use for your software.
Lots of surfing later, and the details for the optimisation flags seem
rather confused!
The best of the comparisons that I've found thus far are appended,
although even these are flawed because no mention is made of the
supporting hardware, or even of whether it is the same users refining
their tweaks.
So far, the good options seem to be:
export CFLAGS="-march=athlon-xp -O3 -fexpensive-optimizations
-funroll-loops -frerun-cse-after-loop -frerun-loop-opt
-fomit-frame-pointer -fschedule-insns2 -minline-all-stringops
-mfancy-math-387 -mfp-ret-in-387 -m3dnow -msse -mfpmath=sse -mmmx
-malign-double -falign-functions=4 -mpreferred-stack-boundary=4
-fforce-addr -pipe"
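A flag list this long is easy to mistype, so it may be worth asking the compiler about each flag separately before trusting the whole set. The following is only a rough sketch: check_flags is a made-up helper name, and it assumes the compiler exits non-zero for a flag it does not recognise, which may not hold for every gcc version (some only warn).

```shell
#!/bin/sh
# Sketch: report which flags a compiler accepts, one flag at a time.
# check_flags is a hypothetical helper, not part of gcc.
# Usage: check_flags COMPILER FLAG...
check_flags() {
    cc=$1
    shift
    for flag in "$@"; do
        # Syntax-check an empty C file with just this flag; a non-zero
        # exit status is taken to mean the flag was rejected.
        if "$cc" "$flag" -fsyntax-only -x c /dev/null >/dev/null 2>&1; then
            printf '%-8s %s\n' ok "$flag"
        else
            printf '%-8s %s\n' rejected "$flag"
        fi
    done
}

# Example (not run here; needs gcc on the PATH):
#   check_flags gcc -march=athlon-xp -O3 -funroll-loops -fforce-addr
```

Testing flags one at a time will not show up combinations that conflict with each other, so it is a first filter rather than a guarantee.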
Notes:
Should the aligns be =5 rather than =4?
Inlining and loop unrolling can be detrimental if the expanded loops then
exceed the CPU cache size...
How extravagant is -funroll-loops for poisoning the L1 cache?
-ffast-math can give significant speedups... However, I don't like the
idea of throwing away IEEE checks/conventions on the float results.
Good/bad/indifferent/inconsequential?
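On the =4 versus =5 question, part of the confusion may be that the two kinds of option use different units. As I read the GCC manual, the -falign-* options take a byte count that is rounded up to a power of two, whereas -mpreferred-stack-boundary takes an exponent. The arithmetic can be sketched as below; the function names are made up for illustration, and the rounding rule should be checked against the manual for the gcc version in use.

```shell
#!/bin/sh
# falign_bytes N: effective alignment in bytes for -falign-functions=N
# (likewise -falign-loops / -falign-jumps), ASSUMING N is a byte count
# that gcc rounds up to a power of two. N=0 (machine default) is not
# handled here.
falign_bytes() {
    n=$1
    bytes=1
    while [ "$bytes" -lt "$n" ]; do
        bytes=$((bytes * 2))
    done
    echo "$bytes"
}

# stack_boundary_bytes N: alignment in bytes for
# -mpreferred-stack-boundary=N, where N is an exponent (2^N bytes).
stack_boundary_bytes() {
    n=$1
    bytes=1
    i=0
    while [ "$i" -lt "$n" ]; do
        bytes=$((bytes * 2))
        i=$((i + 1))
    done
    echo "$bytes"
}

for n in 4 5 64; do
    echo "-falign-functions=$n -> $(falign_bytes "$n") bytes"
done
echo "-mpreferred-stack-boundary=4 -> $(stack_boundary_bytes 4) bytes"
```

If that reading is right, -falign-functions=5 behaves like an 8-byte alignment, and only values such as 16, 32 or 64 would line functions up with cache-line-sized boundaries.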
> Things like -msse and -m3dnow are meant to be switched on by -march, anyway,
> if you check gcc in verbose mode, it actually seems to turn off these
> settings (gcc-3.{1,2,3}) immediately after you select them.
To be investigated...
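One possible way to investigate: ask the compiler which instruction-set feature macros it predefines for a given -march, using -dM -E on empty input. This is a sketch only. The helper names are made up, and it assumes a gcc release that defines __MMX__, __SSE__, __3dNOW__ and friends, which the 3.2-era compilers may not.

```shell
#!/bin/sh
# filter_isa_macros: read a "#define NAME VALUE" dump on stdin and print
# the names of the MMX/SSE/3DNow! feature macros found in it.
filter_isa_macros() {
    awk '$1 == "#define" &&
         ($2 == "__MMX__" || $2 == "__SSE__" || $2 == "__SSE2__" ||
          $2 == "__3dNOW__" || $2 == "__3dNOW_A__") { print $2 }' | sort
}

# show_isa COMPILER FLAG...: dump the predefined macros the compiler
# reports for the given flags and keep only the feature macros.
# (hypothetical helper; -dM -E are standard gcc preprocessor options)
show_isa() {
    cc=$1
    shift
    "$cc" "$@" -dM -E -x c /dev/null | filter_isa_macros
}

# Example (not run here; needs an x86 gcc):
#   show_isa gcc -march=athlon-xp
#   show_isa gcc -march=athlon-xp -mno-sse
```

Comparing the output with and without an explicit -msse or -m3dnow should show whether those flags add anything beyond what -march already selects.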
All comments/advice welcome.
(About to compile gcc 3.3.2 with the presently installed gcc 3.2.2.)
Regards,
Martin
Optimisations roughly from fastest to slowest for AthlonXP CPUs:
http://www.freebench.org/cgi-bin/showdetails.pl?13=13&anabase=1&fourbase=1&masonbase=1&pcbase=1&pibase=1&distbase=1&neuralbase=1&fpmeanbase=1&intmeanbase=1&totmeanbase=1
>>>
Flags - Mason: -march=athlon-xp -O3 -fomit-frame-pointer -pipe
-funroll-loops -finline-limit=100000 -fforce-addr -falign-functions=5
-malign-double -fbranch-probabilities
Flags - PiFFT: -march=athlon-xp -O3 -fomit-frame-pointer -pipe
-funroll-loops -falign-loops=4 -falign-jumps=4
-mpreferred-stack-boundary=4 -fprefetch-loop-arrays -falign-functions=5
Flags - Neural: -march=athlon-xp -O3 -fomit-frame-pointer -pipe
-funroll-loops -fprefetch-loop-arrays -falign-loops=4 -falign-jumps=4
-falign-functions=5 -fforce-addr -fbranch-probabilities
>>>
http://www.freebench.org/cgi-bin/showdetails.pl?35=35&anabase=1&fourbase=1&masonbase=1&pcbase=1&pibase=1&distbase=1&neuralbase=1&fpmeanbase=1&intmeanbase=1&totmeanbase=1
>>>
Flags - Analyzer: -march=athlon-xp -O3 -fomit-frame-pointer -pipe
-funroll-loops -falign-loops=5 -falign-jumps=5 -falign-functions=64
-fforce-addr
Flags - FourInARow: -march=athlon-xp -O3 -fomit-frame-pointer -pipe
-funroll-loops -falign-loops=5 -falign-jumps=5 -falign-functions=64
-fforce-addr
Flags - Mason: -march=athlon-xp -O3 -fomit-frame-pointer -pipe
-funroll-loops -falign-loops=5 -falign-jumps=5 -falign-functions=64
-fforce-addr
Flags - pCompress2: -march=athlon-xp -O3 -fomit-frame-pointer -pipe
-funroll-loops -falign-loops=5 -falign-jumps=5 -falign-functions=64
-fforce-addr
Flags - PiFFT: -march=athlon-xp -O3 -fomit-frame-pointer -pipe
-funroll-loops -falign-loops=5 -falign-jumps=5 -falign-functions=64
-fforce-addr
Flags - DistRay: -march=athlon-xp -O3 -fomit-frame-pointer -pipe
-funroll-loops -falign-loops=5 -falign-jumps=5 -falign-functions=64
-fforce-addr
Flags - Neural: -march=athlon-xp -O3 -fomit-frame-pointer -pipe
-funroll-loops -falign-loops=5 -falign-jumps=5 -falign-functions=64
-fforce-addr
>>>
http://www.freebench.org/cgi-bin/showdetails.pl?38=38&anabase=1&fourbase=1&masonbase=1&pcbase=1&pibase=1&distbase=1&neuralbase=1&fpmeanbase=1&intmeanbase=1&totmeanbase=1
>>>
Flags - Analyzer: -march=athlon-xp -O3 -fomit-frame-pointer -pipe
Flags - FourInARow: -march=athlon-xp -O3 -fomit-frame-pointer -pipe
Flags - Mason: -march=athlon-xp -O3 -fomit-frame-pointer -pipe
Flags - pCompress2: -march=athlon-xp -O3 -fomit-frame-pointer -pipe
Flags - PiFFT: -march=athlon-xp -O3 -fomit-frame-pointer -pipe
Flags - DistRay: -march=athlon-xp -O3 -fomit-frame-pointer -pipe
Flags - Neural: -march=athlon-xp -O3 -fomit-frame-pointer -pipe
>>>
http://www.freebench.org/cgi-bin/showdetails.pl?77=77&anabase=1&fourbase=1&masonbase=1&pcbase=1&pibase=1&distbase=1&neuralbase=1&fpmeanbase=1&intmeanbase=1&totmeanbase=1
>>>
Flags - Analyzer: -march=athlon-xp -O3 -pipe -fomit-frame-pointer
-fforce-addr -falign-functions=64 -maccumulate-outgoing-args -ffast-math
-fprefetch-loop-arrays
Flags - FourInARow: -march=athlon-xp -O3 -pipe -fomit-frame-pointer
-fforce-addr -falign-functions=64 -maccumulate-outgoing-args -ffast-math
-fprefetch-loop-arrays
Flags - Mason: -march=athlon-xp -O3 -pipe -fomit-frame-pointer
-fforce-addr -falign-functions=64 -maccumulate-outgoing-args -ffast-math
-fprefetch-loop-arrays
Flags - pCompress2: -march=athlon-xp -O3 -pipe -fomit-frame-pointer
-fforce-addr -falign-functions=64 -maccumulate-outgoing-args -ffast-math
-fprefetch-loop-arrays
Flags - PiFFT: -march=athlon-xp -O3 -pipe -fomit-frame-pointer
-fforce-addr -falign-functions=64 -maccumulate-outgoing-args -ffast-math
-fprefetch-loop-arrays
Flags - DistRay: -march=athlon-xp -O3 -pipe -fomit-frame-pointer
-fforce-addr -falign-functions=64 -maccumulate-outgoing-args -ffast-math
-fprefetch-loop-arrays
Flags - Neural: -march=athlon-xp -O3 -pipe -fomit-frame-pointer
-fforce-addr -falign-functions=64 -maccumulate-outgoing-args -ffast-math
-fprefetch-loop-arrays
>>>
http://www.freebench.org/cgi-bin/showdetails.pl?40=40&anabase=1&fourbase=1&masonbase=1&pcbase=1&pibase=1&distbase=1&neuralbase=1&fpmeanbase=1&intmeanbase=1&totmeanbase=1
>>>
Flags - Analyzer: -O3 -pipe -march=athlon-xp -m3dnow -msse -mfpmath=sse
-mmmx -fforce-addr -fomit-frame-pointer -funroll-loops
-frerun-cse-after-loop -frerun-loop-opt -falign-functions=4
-maccumulate-outgoing-args -ffa
Flags - FourInARow: -O3 -pipe -march=athlon-xp -m3dnow -msse
-mfpmath=sse -mmmx -fforce-addr -fomit-frame-pointer -funroll-loops
-frerun-cse-after-loop -frerun-loop-opt -falign-functions=4
-maccumulate-outgoing-args -ffa
Flags - Mason: -O3 -pipe -march=athlon-xp -m3dnow -msse -mfpmath=sse
-mmmx -fforce-addr -fomit-frame-pointer -funroll-loops
-frerun-cse-after-loop -frerun-loop-opt -falign-functions=4
-maccumulate-outgoing-args -ffa
Flags - pCompress2: -O3 -pipe -march=athlon-xp -m3dnow -msse
-mfpmath=sse -mmmx -fforce-addr -fomit-frame-pointer -funroll-loops
-frerun-cse-after-loop -frerun-loop-opt -falign-functions=4
-maccumulate-outgoing-args -ffa
Flags - PiFFT: -O3 -pipe -march=athlon-xp -m3dnow -msse -mfpmath=sse
-mmmx -fforce-addr -fomit-frame-pointer -funroll-loops
-frerun-cse-after-loop -frerun-loop-opt -falign-functions=4
-maccumulate-outgoing-args -ffa
Flags - DistRay: -O3 -pipe -march=athlon-xp -m3dnow -msse -mfpmath=sse
-mmmx -fforce-addr -fomit-frame-pointer -funroll-loops
-frerun-cse-after-loop -frerun-loop-opt -falign-functions=4
-maccumulate-outgoing-args -ffa
Flags - Neural: -O3 -pipe -march=athlon-xp -m3dnow -msse -mfpmath=sse
-mmmx -fforce-addr -fomit-frame-pointer -funroll-loops
-frerun-cse-after-loop -frerun-loop-opt -falign-functions=4
-maccumulate-outgoing-args -ffa
>>>
http://www.freebench.org/cgi-bin/showdetails.pl?21=21&anabase=1&fourbase=1&masonbase=1&pcbase=1&pibase=1&distbase=1&neuralbase=1&fpmeanbase=1&intmeanbase=1&totmeanbase=1
>>>
Flags - Analyzer: -O3 -march=athlon-xp -fmove-all-movables
-fprefetch-loop-arrays -funroll-loops -fomit-frame-pointer -ffast-math
-mmmx -msse -m3dnow -mfpmath=sse,387 -pipe
Flags - FourInARow: -O3 -march=athlon-xp -fmove-all-movables
-fprefetch-loop-arrays -funroll-loops -fomit-frame-pointer -ffast-math
-mmmx -msse -m3dnow -mfpmath=sse,387 -pipe
Flags - Mason: -O3 -march=athlon-xp -fmove-all-movables
-fprefetch-loop-arrays -funroll-loops -fomit-frame-pointer -ffast-math
-mmmx -msse -m3dnow -mfpmath=sse,387 -pipe
Flags - pCompress2: -O3 -march=athlon-xp -fmove-all-movables
-fprefetch-loop-arrays -funroll-loops -fomit-frame-pointer -ffast-math
-mmmx -msse -m3dnow -mfpmath=sse,387 -pipe
Flags - PiFFT: -O3 -march=athlon-xp -fmove-all-movables
-fprefetch-loop-arrays -funroll-loops -fomit-frame-pointer -ffast-math
-mmmx -msse -m3dnow -mfpmath=sse,387 -pipe
Flags - DistRay: -O3 -march=athlon-xp -fmove-all-movables
-fprefetch-loop-arrays -funroll-loops -fomit-frame-pointer -ffast-math
-mmmx -msse -m3dnow -mfpmath=sse,387 -pipe
Flags - Neural: -O3 -march=athlon-xp -fmove-all-movables
-fprefetch-loop-arrays -funroll-loops -fomit-frame-pointer -ffast-math
-mmmx -msse -m3dnow -mfpmath=sse,387 -pipe
>>>
http://www.freebench.org/cgi-bin/showdetails.pl?90=90&anabase=1&fourbase=1&masonbase=1&pcbase=1&pibase=1&distbase=1&neuralbase=1&fpmeanbase=1&intmeanbase=1&totmeanbase=1
>>>
Flags - Analyzer: -O3
Flags - FourInARow: -O3
Flags - Mason: -O3
Flags - pCompress2: -O3
Flags - PiFFT: -O3
Flags - DistRay: -O3
Flags - Neural: -O3
>>>
--
----------------
Martin Lomas
martin at ml1.co.uk
----------------