Mecej4
I found your post fascinating on several levels, not least because FTN95 usually has excellent diagnostics. It left me wondering about some of the compilers I have used in the past, which would have said ‘Error’ and little more: perhaps an inscrutable number or, if one was lucky, the hint ‘V’ or an indication of where the compiler had got to in the offending line. Presumably FTN95 had seen ‘v(i)’ and was prepared to believe that it was a function from then on.
I’m quite happy to accept that this is a cut-down and ‘tweaked’ example to show the principle you observed. However, it made me think about compiler optimisation, particularly regarding common subexpressions and loop-invariant calculations. Does partly replacing common subexpressions by hand, for example, foul up the optimiser in FTN95? Do generic functions execute more slowly than type-specific ones? Is the convenience of referring to an array element multiple times slower than adding an assignment, so that a single variable can be used instead?
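To put that last question concretely, these are the two alternatives I have in mind, using the same names as the fragment further down:

c     referring to the array element twice ...
      djac(i,1) = -t(i) * dexp(-a1*t(i))
c     ... or copying it to a scalar first
      ti = t(i)
      djac(i,1) = -ti * dexp(-a1*ti)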
I might add, for good measure, the questions of how often the routine is executed, and therefore whether any time savings really matter, and finally whether the optimiser can do a better job than the programmer. I have to admit that I have avoided /OPT but still find FTN95 acceptably fast, and also that I don’t derive huge confidence from the description of optimisation in FTN95.CHM.
As many of my programming habits were formed in the distant past (which also means before optimising compilers), I would probably be looking to replace ‘raise to the power’ by multiplication, take care of common subexpressions myself, remove loop-invariant calculations from loops, and so on. Moreover, as my habit is to use COMMON blocks instead of subroutine arguments to keep stack sizes as small as possible (so no SAVE is needed, but INTENTs are now not possible), the short array ALF might be passed to this routine as A1, A2 and A3, so that I don’t have assignments to local variables to take care of.
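By way of example, the sort of hand transformation I mean for the powers is:

c     the generic power ...
      a3sq = a3**2
c     ... done instead with a single multiply
      a3sq = a3*a3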
Just so that you can see how much of a museum piece I am, I’ve changed the constant 2 and the EXP to type-specific forms, and converted the DO loop to traditional form, as an outdent is as good as an indent as far as I’m concerned, although I left the lower case for comparability. Was changing the 2 a sensible thing, given that FTN95 could arguably handle it a different way (say, keeping it in a register and doing an add)? Would reordering the parts of the assignment improve efficiency?
c     (fragment; declarations assumed, as implicit typing would
c     otherwise leave these variables single precision)
      double precision a1, a2, a3, a3sq, coef1, coef2, ti
      double precision t(*), djac(ndata,*)
      common /alphas/ a1, a2, a3
c     a3sq is loop invariant, so compute it once, before the loop
      a3sq = a3*a3
      do 20 i = 1, ndata
      ti = t(i)
      coef1 = ti - a2
      coef2 = coef1 / a3sq
      djac(i,1) = -ti * dexp(-a1*ti)
      djac(i,2) = 2.0D0 * coef2 * dexp(-coef1*coef2)
   20 continue
Assuming that I have translated the intention of your original Fortran code properly, I find that the addition of a lot of white space and the removal of a forest of brackets helps me enormously, and, for that matter, helps everything fit easily within the 72-column card-format limit. Certainly, doing the common-subexpression work myself makes the logic a bit more convoluted, but the removal of so many brackets evens the score.
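To show what I mean about the brackets, folding my temporaries back into the second assignment gives something like this (my reconstruction, not necessarily your original line):

      djac(i,2) = 2.0D0*((t(i)-a2)/(a3*a3))*
     &   dexp(-(t(i)-a2)*((t(i)-a2)/(a3*a3)))

One statement, but it no longer fits within 72 columns without a continuation line.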
Now you see how we dinosaurs survive in the Lost World of northern Hampshire even if we are extinct elsewhere!
I wonder if, instead of A3SQ, I should have calculated its inverse, so that inside the DO loop I could have a multiply and not a divide? Even if I coded these two assignments as statement functions, I imagine that I would have precomputed A3SQ.
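Something like this, assuming FTN95 doesn’t already make that transformation itself (RA3SQ is a name of my own invention):

c     one divide before the loop instead of ndata divides inside it
      ra3sq = 1.0D0 / (a3*a3)
c     ... and then, inside the loop:
      coef2 = coef1 * ra3sq

On most machines a floating-point divide costs several times a multiply, so with a large NDATA the difference might just be measurable.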
None of this, of course, implies any criticism, and if a compiler is going to give generally helpful diagnostics it might as well give correct ones instead of misleading ones. Perhaps looking at this from my own particular programming perspective has shown me why I seemed to get so little out of /OPT when I tried it last ... maybe my programming habits defeat some of the optimisations built into FTN95.
Eddie