Silverfrost Forums

Welcome to our forums

Suggested additional optimization

17 Mar 2012 2:17 #9839

I think that it is possibly worth automatically optimizing source code of the form real**real to real**integer when the exponent is in actual fact an integer.

I have been looking at the Polyhedron benchmark mp_prop_design, which is not only the slowest but also the one on which FTN95 does worst. Run times improve by around 10% if you do this optimization by hand. There are lots of instances of it, to powers 2, 3, 4 and 5, all coded in the following form (simplified by me):

beta = alpha**2.0D0

It still doesn't bring the executable anywhere near the speed of the leaders, but it makes a comparatively large difference. On my machine, using the timer in the program, it reduces run times from 10.8 to 9.8 minutes.

I am aware that repeated multiplication can also be quicker than raising to a small power, but I do not know where the cutoff occurs.

Eddie

17 Mar 2012 7:21 #9842

Thanks for the suggestion, Eddie, but I would be surprised if any compiler did this. Raising to a real power should imply that the programmer requires a logarithmic process rather than repeated multiplication. Maybe the compiler could provide a warning or a comment, but even then testing a real for equality to an integer may not be trivial.

19 Mar 2012 3:48 #9851

Hi Paul,

It (the compiler) doesn't have any problem in knowing that a floating point entity is an integer when it tells me off for testing equality to zero (threads passim)!

In the case of mp_prop_design, the code looks like it was written by a novice, and after passing it through SPAG it loses the nice formatting it had in the original.

I wasn't suggesting checking for say

a**4.00000000001D0

only for a**4.0D0 or a**4.D0 (or the equivalent single precision versions)

and not a**b where b = 4.0D0 either (which would need a run-time check).

A programmer who writes a**2.0D0 clearly doesn't intend it to be done one way or another - he/she simply doesn't understand the difference (as in the case of mp_prop_design)! The check would have to be done at source code level, and would require checking (say) for '.' or '.0' and possibly stopping there, as more decimal places would, in my view, point to a wish to do it logarithmically.

It isn't a show-stopper for me, as I would always use an integer exponent if I could; or repeated multiplications, especially for 2nd and 3rd powers.

Eddie

19 Mar 2012 4:41 #9853

OK. I will see if I can at least provide a warning message for power 2.0.

22 Mar 2012 5:02 #9885

I have found that a number of the Polyhedron benchmarks can be improved substantially for FTN95 with some simple changes, including:

  • shifting large local arrays from the stack to static declarations (their selected FTN95 compilation used a very large stack);
  • changing DO loop order for sequential memory access.

That being said, there must be some substantial optimisation bias (targeted at the examples) in some of the other compilers to achieve the performance they report. Relatively speaking, my codes do not perform as badly as the benchmark examples do. Some of the examples use a small memory footprint, which is better suited to cache optimisation, but that advantage can be lost on larger problems. Some of my code shows very poor caching with FTN95, especially when working backwards through a large array.

New vector instructions also appear to work well for the other compilers.

John

22 Mar 2012 9:12 #9887

John,

Other business interrupted my exploration of these issues, so thanks for highlighting them. The last time I spent time on the benchmarks I did wonder if running SPAG on them worsened FTN95's performance, but on reflection I think it lies more in the programming styles of the benchmarks themselves.

The issue of the performance of the two leading compilers (in speed terms) on the Polyhedron benchmarks comes up again and again in their support forums, so I agree that there could be optimization tuned to suit the benchmarks. Like you, I have no complaints about the speed of my own applications. However, I was appalled to find mp_prop_design taking over 10 minutes, as the application doesn't look as if it does anything particularly complex.

I wonder if FTN95 doesn't optimize as aggressively as other compilers as a historical relic from the time when FTN77/DBOS was a speed leader?

It would be useful to have a consolidated list of hand optimizations and programming-style hints for speed with FTN95. I'm not surprised that a Fortran 77 style runs faster than a more modern style, or am I just prejudiced?

Eddie
