Parallelization with FTN95
PaulLaidler (Site Admin)
Posted: Mon Jun 23, 2008 8:15 am

As I understand it, optimisation does not reorder Fortran statements as such, but it does optimise the way in which a given Fortran statement is represented in assembly code. Optimisations can include removing repeated expressions and holding certain intermediate values in registers rather than writing them back to memory, but only in ways that do not change or reorder the expressed intention of the programmer.
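For illustration, a minimal sketch (with invented variable names) of the kind of rewrite this covers: the source below evaluates (b + c) twice, but the compiler may compute it once and hold the result in a register, with no change to observable behaviour.

Code:
program cse_example
  implicit none
  real*8 :: b, c, d, e, x, y
  b = 1.0d0; c = 2.0d0; d = 3.0d0; e = 4.0d0
  x = (b + c) * d    ! first use of the repeated expression
  y = (b + c) / e    ! second use: may reuse the register copy
  print *, x, y      ! results are identical either way
end program cse_example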
JohnCampbell
Posted: Mon Jun 23, 2008 9:42 am

This thread includes something of interest to me.

While I am a strong supporter of FTN95, and acknowledge its strengths in CheckMate, debugging and ClearWin+, there are some aspects of run-time performance which could be improved, if run-time benchmarks are a true indicator.

I would like an option where array operations could be implemented using automatic optimisation. I don't like the way dot_product is implemented as in-line code, so that its performance can change depending on the compiler options.
I typically compile with /debug and avoid /opt, due to problems with the latter in many past compilers. My past experience is that general optimisation does not always work best, but neither do selective optimisation levels.
I am waiting for the results of the work on memory management for /3gb, and hope this addresses some of the performance problems with real*8 calculations.
As with some of DanRRight's comments, a lot of our bad impressions are based on past experience, which may no longer be correct for the current compiler.

I saw some of the results from the test procedures from equation.com for driving multiple processors. It would certainly be interesting if this approach could be applied to some basic (large) vector operations. Dan may be right that "parallelization is our unavoidable future". It's worth watching.

Regards, John
PaulLaidler (Site Admin)
Posted: Mon Jun 23, 2008 2:33 pm

There are 48 optimisations for which we have internal documentation.
I will investigate to see if this documentation might be released in some form.

/INHIBIT_OPTIMISATION <n>

inhibits a given optimisation, and number 41 is documented as "dot product detection".

Please note that many optimisations are applied even when /OPT does not appear on the command line.
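For example (the file name is invented, and this assumes the number follows the option as the <n> placeholder suggests), /OPT could be combined with suppression of the dot product optimisation like this:

Code:
ftn95 solver.f95 /opt /inhibit_optimisation 41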
Andrew
Posted: Tue Jun 24, 2008 12:00 am

Quote:
While I am a strong supporter of FTN95, and acknowledge its strengths in CheckMate, debugging and ClearWin+, there are some aspects of run-time performance which could be improved, if run-time benchmarks are a true indicator.

Indeed, while I do not doubt that many compilers produce faster runtime code, the difference in performance with real-world code may be rather different from what you find with benchmarks that have been around for a long time. Performance of individual codes is of course highly dependent on a range of factors.

The holy grail of compilers is hard to reach - the compilers producing the fastest binaries are mostly those with the weakest diagnostic capabilities. There are some that generally do well on both performance and diagnostics, but from past impressions, their compilation times can be extremely slow.

Some compilers fit particular requirements better than others, depending on where the focus of development lies. Horses for courses.
JohnCampbell
Posted: Wed Jun 25, 2008 5:07 am

I have, for a long time, been trying to identify how I can improve the calculation performance of the equation solver in my finite element program.
I checked my past emails to Salford, and a lot of the problems I identified were reported in 2002, so I can't confirm they are still the case.
There is a vague indication that other compilers have better performance in this area, but I don't have any definite proof.
Certainly in 2002, I was getting results where the run-time performance of "dot_product" could vary by a factor of 2, and at the time I assumed that real*8 arithmetic should account for a substantial part of the compute time. I was asking myself what was happening in this extra processing time, as the mathematical computation itself does not change. My conclusion was that it was associated either with unnecessary transfers of data to and from memory, or with the more confusing movement of data between memory, the "secondary cache" and the processor.
For the last few years I have not been able to run benchmarks that reliably indicate performance, or that show performance improvements relating to the programming strategies of the 70's and 80's. I put this down to the vagaries of Intel cache management.

The problem now gets more complicated with larger problem sizes. I have been trying to improve performance where the active matrix size is in the range of 1GB to 3GB. Any disk I/O now carries a huge performance penalty, which can be compounded by virtual memory mapping, even where there is adequate physical memory.

The equation solver I use is a skyline solver for large sets of (symmetric) linear simultaneous equations, which was a preferred direct solver from the 70's to the 90's. It has two basic array processes:
dot_product ( vector_A, vector_B ) and
vector_A = vector_A - beta * vector_B
These vectors are typically 0 to 20,000 elements long. (A sketch of both kernels is given below.)
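A minimal sketch of the two kernels, with invented names, assuming contiguous real*8 vectors:

Code:
subroutine solver_kernels ( a, b, beta, n, dot )
  implicit none
  integer, intent(in)    :: n
  real*8,  intent(in)    :: b(n), beta
  real*8,  intent(inout) :: a(n)
  real*8,  intent(out)   :: dot
  ! Kernel 1: the dot product of the two vectors.
  dot = dot_product ( a, b )
  ! Kernel 2: the DAXPY-style update, a = a - beta*b.
  a = a - beta * b
end subroutine solver_kernels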

My holy grail is a procedure for these two which optimises performance. The three problem areas I have identified are:
1) unnecessary variable shifts (as in 2002)
2) not utilising multiple processors
3) unnecessary disk transfers

To me the basic mission is simple: dot_product gets the starting address and byte step of 2 vectors in memory, then produces the a.b answer. What puzzles me is why it is so difficult to optimise.

I look forward to the improvements to memory management, especially in SLINK, when the /3gb switch is addressed.

Keep up the good work.

John

PS: I wonder what I would do next if this problem had a solution?
PaulLaidler (Site Admin)
Posted: Wed Jun 25, 2008 7:26 am

John

If you would like to post a sample calculation I would like to take a look at it when I can. I don't know when that will be, but if I had your code to hand I might be able to find a minute to look at it.
JohnCampbell
Posted: Thu Jun 26, 2008 2:58 am

Paul,

Thanks. I will review some of the emails I sent in 2002, see if I can summarise a later one that still identifies the problem, and email it in a cleaner form. It is useful to look at these emails after some time and see how (un)clearly I described the problem.

Typical of the problems with in-line expansion of dot_product is

x = dot_product ( a(i1:i1+n-1), b(j1:j1+n-1) )

which is a fairly good example of where compilation with /debug produces a poor result. Even replacing this by an intermediate call,

x = vec_sum ( a(i1:i1+n-1), b(j1:j1+n-1) )   (F95) or
x = vec_sum ( a(i1), b(j1), n )              (F77)

where vec_sum is only a call to dot_product, produces a much better result.
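A minimal sketch of such a wrapper (the F77-style interface, assuming contiguous real*8 vectors):

Code:
real*8 function vec_sum ( a, b, n )
  implicit none
  integer, intent(in) :: n
  real*8,  intent(in) :: a(n), b(n)
  ! Nothing but a forward to dot_product: the point is that this
  ! one routine can be compiled with /opt while its callers keep /debug.
  vec_sum = dot_product ( a, b )
end function vec_sum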

Also, I saw your comment on selective omission of optimisations. Lahey had something similar, but I never found it useful: it became difficult to use it selectively and to remember which parts of the code could cause which problems.
When my programs have many files, I do use different compilation options (in .bat files; a sketch is given below), being:
/check for data reading and reporting,
/debug for most code, and
/opt for routines that are stable and use a high proportion of the run time.
I do have vec_sum compiled with /opt in my library file.
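A hypothetical .bat fragment illustrating this split (file names invented):

Code:
rem different options for different parts of the program
ftn95 read_data.f95 /check
ftn95 main_code.f95 /debug
ftn95 vec_sum.f95   /opt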

I would like to see automatic implementation of /opt in "safe" code areas, such as:-
array functions like dot_product, and
DO loops where there are no unusual exits, such as calls to subroutines or non-pure procedures.
I suppose what I am saying is that I'm lazy, and I want you to put the effort into improving optimisation, rather than me trying to understand which optimisation approaches give me trouble.

John
PaulLaidler (Site Admin)
Posted: Thu Jun 26, 2008 8:04 am

There is a lot of optimisation that is carried out by default.
/opt provides extra optimisation that could be less safe in certain extreme circumstances.