forums.silverfrost.com Welcome to the Silverfrost forums
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 7931 Location: Salford, UK
Posted: Sun Mar 22, 2015 8:59 am
Eddie,
There is a good chance that your best salflibc.dll will work with FTN77.
|
LitusSaxonicum
Joined: 23 Aug 2005 Posts: 2388 Location: Yateley, Hants, UK
Posted: Sun Mar 22, 2015 10:36 am
Paul, I suspected as much. Now, I wonder what the benchmarks do with FTN77? (Or does it have the same back end as FTN95?)
Eddie
|
mecej4
Joined: 31 Oct 2006 Posts: 1888
Posted: Sun Mar 22, 2015 12:08 pm
LitusSaxonicum wrote:
> Now, I wonder what the benchmarks do with FTN77? (Or does it have the same back end as FTN95?)

The current Polyhedron benchmarks are in F90+, so if you want to use FTN77 you will need to dig up older versions of the benchmarks written in F77.
Of course, there are lots of other benchmarks, in F77 as well as in F90+.
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2556 Location: Sydney
Posted: Sun Mar 22, 2015 12:11 pm
Dan wrote:
> John, You are doing and have done great job speeding up the not optimized codes but your idea that you can beat the compiler optimization goes against the basic trend
I think your point is valid. However, I have always considered it useful to identify where compilers perform poorly and then understand why this is the case.
The latest test I have reviewed is FATIGUE2:
* The best time reported was for Lahey GNU, at 31.09 seconds.
* FTN95 is reported at 263.88 seconds.
* My test was 324.75 seconds.
* My revised test is 168.88 seconds.
Again this is an improvement of about 50%, but still significantly slower than the Lahey compilation.
For FTN95, I reviewed the run times and found that a significant amount of time is spent managing the call to subroutine perdida. There are 585,898,984 calls to this routine!
Code:
call perdida (dt, lambda, mu, yield_stress, R_infinity, b, X_infinity, &
gamma, eta, plastic_strain_threshold, stress_tensor(:,:,n), &
strain_tensor(:,:,n), plastic_strain_tensor(:,:,n), &
strain_rate_tensor(:,:,n), accumulated_plastic_strain(n), &
back_stress_tensor(:,:,n), isotropic_hardening_stress(n), &
damage(n), failure_threshold, crack_closure_parameter)
...
subroutine perdida (dt, lambda, mu, yield_stress, R_infinity, b, X_infinity, gamma, &
eta, plastic_strain_threshold, stress_tensor, strain_tensor, &
plastic_strain_tensor, strain_rate_tensor, &
accumulated_plastic_strain, back_stress_tensor, &
isotropic_hardening_stress, damage, failure_threshold, &
crack_closure_parameter)
!
real (kind = LONGreal), intent(in) :: dt, yield_stress, lambda, mu, R_infinity, b, &
X_infinity, gamma, eta, failure_threshold, &
plastic_strain_threshold, &
crack_closure_parameter
real (kind = LONGreal), dimension(:,:), intent(in) :: strain_rate_tensor, &
strain_tensor
real (kind = LONGreal), dimension(:,:), intent(inout) :: plastic_strain_tensor, &
back_stress_tensor
real (kind = LONGreal), dimension(:,:), intent(out) :: stress_tensor
real (kind = LONGreal), intent(inout) :: damage, accumulated_plastic_strain, &
isotropic_hardening_stress
!
The main change I made was to declare the arrays with explicit dimension (3,3), which they all are, and to use F77-style addressing in the call.
What is interesting in this example is the combination: dimension(:,:), intent(in).
I assume FTN95 makes copies of the intent(in) arguments and does not copy them back, while for intent(out) it updates from the copy on return. FTN95 spends about 150 seconds of run time just manipulating these temporary copies of the 3x3 arrays. I am not sure if FTN95 is enforcing the intent, or if the intent is a rule that should merely be checked.
The other compilers benefit from SSE instructions, which could bring the run time down to 80 seconds, but Lahey's 31 seconds must reflect other efficiencies as well.
Array sections are one of FTN95's Achilles heels.
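To make the change concrete, here is a minimal sketch (hypothetical routine and variable names; the real perdida has many more arguments) of the difference between an assumed-shape dummy, which obliges the compiler to pass a descriptor and possibly build temporary copies, and an explicit-shape (3,3) dummy, which needs only an address, F77 style:

```fortran
! Assumed-shape dummy: requires an explicit interface, and FTN95 may
! create a temporary copy of the 3x3 section at every call
subroutine scale_slow (a, s)
   real*8, dimension(:,:), intent(inout) :: a
   real*8, intent(in) :: s
   a = a * s
end subroutine scale_slow

! Explicit-shape dummy: only the array address crosses the call boundary
subroutine scale_fast (a, s)
   real*8, dimension(3,3), intent(inout) :: a
   real*8, intent(in) :: s
   a = a * s
end subroutine scale_fast
```

At the call site the F77-style form passes the first element of the section, call scale_fast (tensor(1,1,n), s), instead of the section itself, call scale_slow (tensor(:,:,n), s).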
John
|
LitusSaxonicum
Joined: 23 Aug 2005 Posts: 2388 Location: Yateley, Hants, UK
Posted: Sun Mar 22, 2015 7:46 pm
Thanks mecej4, it's not obvious from the snippets that they are Fortran 90.
Over half a billion calls to a routine with all those parameters? You'd make a really big improvement if they were in COMMON! And that's without all the Fortran 90 stuff that John thinks is costing so much time.
Plus, if you are making over half a billion calls to a subroutine in the first place, the program structure is probably all to cock ...
|
John-Silver
Joined: 30 Jul 2013 Posts: 1520 Location: Aerospace Valley
Posted: Sun Mar 22, 2015 8:10 pm
I was thinking along the same lines as you, Eddie, when I saw the half billion.
What came into my mind was that codes' overall runtimes are not linear, are they?
The case of half a billion calls is a pretty extreme example, for which the optimum optimisation may be very interesting to the 0.1% of the computing world that needs it, but for Joe Bloggs with a 'normal' size (whatever that is) program, FTN95 will probably perform much closer to the best.
I like your analogy of the 2 min and 20 min runtimes, by the way. That's reality!
I think the big practical thing missing is a good tome about optimising a program's construction. I've already learned a lot being on here, the most basic being getting the order of the DO loops right, something which hadn't even occurred to me before, to be honest, simply because it has never been important. A 30-second runtime is the same as a 10-minute runtime for most people; both are a cup of coffee long, and if you're running say 50 of those a day then something is much more amiss than the program's optimisation.
It's analogous to FE modelling and the size of models: people get lazy and start meshing like billy-o, just because they can, and end up with a mesh 10 times too fine, hence a model 100 times too big globally, and hence 1000+ times longer to run than need be. An extreme example, I know, but even a mesh 2 times too fine would easily result in around 20 times the runtime.
The problem with Fortran is that it's not always obvious where the reductions could be made.
I think less-extreme benchmarks are equally valid in comparing compilers for this sort of reason, because the aim should be to be 'optimum' for the highest percentage of users, not measured against the extreme programs only. It's unfair on the compilers to consider just the extreme case.
Of course the real problem is, just as for FE models, that computers are too powerful today; most computing is way over the top as a result and creates more problems than it solves! That's a fact.
|
LitusSaxonicum
Joined: 23 Aug 2005 Posts: 2388 Location: Yateley, Hants, UK
Posted: Sun Mar 22, 2015 8:36 pm
John-Silver,
Kreitzberg & Shneiderman: 'The Elements of FORTRAN Style' (Harcourt Brace Jovanovich) is where I started. It is still available on the internet. Probably less than half of it is relevant today, sadly. My first copy was loaned and never returned, then I got another ... I suspect you can't pop round to my house to borrow it!
There are needs for speed: you are about to crash onto the Moon? You want tomorrow's weather forecast, and the run time is 25 hours? You need speed then.
But mostly you don't, and speed ratios of even 300 to one mean nothing if FTN95-compiled code executes in less than a second! Then there's the business I already alluded to of useful speed.
Eddie
|
DanRRight
Joined: 10 Mar 2008 Posts: 2826 Location: South Pole, Antarctica
Posted: Mon Mar 23, 2015 12:40 am
John,
Which high-res timer do you use here? I am confused now; so many were discussed before. Can you please post its whole text and usage again? I need it for tuning some of my own stuff (unfortunately there is no time even to bend the painful nail in my shoe, let alone for anything else like the Polyhedron stuff).
Paul, what does the English word "backend" mean? I have some very wrong associations with it, but I am not a native English speaker. By the way, I have some third-party 32-bit parallel algebra libraries, compiled by ancient MS and recent Intel Fortran, which somehow work with FTN95; will they work under 64 bits?
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2556 Location: Sydney
Posted: Mon Mar 23, 2015 11:03 am
There are a number of good timers available. mecej4 recently posted a simple routine giving integer*8 access to RDTSC, which works like CPU_CLOCK@. The good timers are:
* call system_clock (count_start, count_rate, count_max)
* STDCALL QUERYPERFORMANCECOUNTER 'QueryPerformanceCounter' (REF):LOGICAL*4
* STDCALL QUERYPERFORMANCEFREQUENCY 'QueryPerformanceFrequency' (REF):LOGICAL*4
* cpu_clock@ (which uses the RDTSC instruction)
integer*8 function rdtsc_tick ()
   integer*8 cnt1
!
!  get rdtsc value
   code
      rdtsc
      mov cnt1,eax
      mov cnt1[4],edx
   edoc
!
   rdtsc_tick = cnt1
end function rdtsc_tick
Both RDTSC and CPU_CLOCK@ tick at the processor clock rate and have a small call overhead. The problem is that you need to calibrate them, which can be achieved by timing against SYSTEM_CLOCK and accumulating the ticks.
With FTN95, SYSTEM_CLOCK is accurate and easy to use, although rdtsc is much better for timing shorter duration events.
All these timers are elapsed time timers.
For each timer routine, I have developed 3 function types:
* integer*8 function RDTSC_TICK () returns the tick count
* integer*8 function RDTSC_RATE () returns the tick rate in ticks per second
* real*8 function RDTSC_SECONDS () returns the time in seconds
I have similar functions for SYSTEM_CLOCK_xxx and QUERYPERFORMANCE_xxx.
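As a sketch, the seconds-returning function can be built from the other two, calibrating once on the first call (this assumes the RDTSC_TICK and RDTSC_RATE routines shown in this thread; note that the later version of RDTSC_RATE takes a cycle-count argument):

```fortran
real*8 function rdtsc_seconds ()
   integer*8 rdtsc_tick, rdtsc_rate
   external rdtsc_tick, rdtsc_rate
   integer*8 :: rate = -1       ! saved between calls (implicit SAVE)
!
!  calibrate on the first call only, then reuse the saved rate
   if ( rate <= 0 ) rate = rdtsc_rate ()
   rdtsc_seconds = dble (rdtsc_tick ()) / dble (rate)
end function rdtsc_seconds
```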
------------------------------------------------------
All the other timers, including all the CPU timers, are hopeless: they report a tick value that is updated only 64 times per second. If your event is of the order of seconds these would be OK, but for accurate timing they are no good. (It depends on what you want.)
These timers include:
cpu_time (intrinsic)
date_and_time (intrinsic)
high_res_clock@ (ftn95)
dclock@ (ftn95)
clock@ (ftn95)
GetLocalTime (winapi)
GetTickCount (winapi)
GetProcessTimes (winapi)
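The 64-updates-per-second granularity is easy to demonstrate: poll one of these timers until its value changes and print the step. A minimal sketch using the CPU_TIME intrinsic (the measured step may come out slightly under the full 1/64 s = 0.015625 s, since the first reading can land mid-tick):

```fortran
program probe_tick
   real*8 t0, t1
!
!  spin until the reported time changes; cpu_time advances while we spin
   call cpu_time (t0)
   do
      call cpu_time (t1)
      if ( t1 > t0 ) exit
   end do
   write (*,*) 'timer step =', t1 - t0, ' seconds'
end program probe_tick
```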
I hope this answers your question.
John
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2556 Location: Sydney
Posted: Mon Mar 23, 2015 11:07 am
The following is my example of Function RDTSC_RATE, which shows how to utilise the tick rate of each routine. Only RDTSC and CPU_CLOCK@ give a different value on each call; all the other timers can be called faster than their tick values update.
Code:
integer*8 rdtsc_rate, rate
external rdtsc_rate
!
rate = rdtsc_rate (10)
rate = rdtsc_rate (100)
rate = rdtsc_rate (1000)
rate = rdtsc_rate (2500)
rate = rdtsc_rate (10000)
rate = rdtsc_rate (25000)
rate = rdtsc_rate (100000)
rate = rdtsc_rate (1000000)
end

integer*8 function rdtsc_rate (num_cycle)
!
!  initialises the rdtsc pointer and estimates the rdtsc tick rate
!
!  calibrate using num_cycle ticks of QueryPerformanceCounter
!
   integer*4 num_cycle
!  integer*4, parameter :: num_cycle = 1000
   integer*8 rd_list(0:2), last_rdtsc, rd_tick, ticks, call_rate
   integer*8 qu_list(0:2), last_query, query_tick, query_rate
   real*8 secs
   integer*4 i, kk, nt, i_list(0:2), calls
!
   integer*8 :: known_rate = -1    ! or 2666701126 ticks per second ~ processor clock rate
   integer*8 rdtsc_tick, QueryPerformance_tick, QueryPerformance_rate
   external rdtsc_tick, QueryPerformance_tick, QueryPerformance_rate
!
!  if ( known_rate <= 0 ) then
   write (*,11) 'rdtsc_rate Initialise ', known_rate
   write (*,11) 'target number of cycles ', num_cycle
!
   Query_rate = QueryPerformance_rate ()
!
!  Run both clocks to get time
   last_rdtsc = rdtsc_tick ()
   last_Query = QueryPerformance_tick ()
   kk = -1
   nt = 0
   do i = 0, huge(i)
      rd_tick = rdtsc_tick ()
      Query_tick = QueryPerformance_tick ()
      if ( Query_tick == last_Query ) cycle
      if ( kk < 2 ) then
         kk = kk+1
      else
         nt = nt+1
      end if
      i_list(kk) = i
      rd_list(kk) = rd_tick
      last_rdtsc = rd_tick
      qu_list(kk) = Query_tick
      last_query = Query_tick
      if ( nt > num_cycle ) exit
   end do
!
!  number of ticks of RDTSC
   calls = i_list(2) - i_list(1)
   ticks = rd_list(2) - rd_list(1)
   query_tick = qu_list(2) - qu_list(1)
!
   secs = dble (query_tick) / dble (query_rate)
   rdtsc_rate = dble (ticks) / secs
   call_rate = dble (calls) / secs
!
!  rdtsc_rate = known_rate
!
   write (*,11) 'rdtsc_tick cycles   =', calls, ' calls'
   write (*,11) 'Number of cycles    =', nt, ' ticks'
   write (*,11) 'query perform ticks =', query_tick, ' ticks'
   write (*,12) 'initialise duration =', secs, ' seconds'
   write (*,11) 'rdtsc_tick duration =', ticks, ' ticks'
   write (*,11) 'rdtsc_tick rate     =', call_rate, ' calls per second'
!
   write (*,11) 'rdtsc rate          =', rdtsc_rate, ' ticks per second'
   write (*,11) 'change in rate      =', rdtsc_rate-known_rate, ' ticks per second'
   write (*,10) 'rdtsc_tick initialised'
   write (*,10) ' '
   known_rate = rdtsc_rate
!  else
!     rdtsc_rate = known_rate
!  end if
10 format (3x,a)
11 format (3x,a,i15,a)
12 format (3x,a,f15.7,a)
!
end function rdtsc_rate
JohnCampbell
Joined: 16 Feb 2006 Posts: 2556 Location: Sydney
Posted: Mon Mar 23, 2015 11:09 am
The QueryPerformance routines are:
Code:
! QueryPerformanceCounter Windows API routine
real*8 function QueryPerformance_sec ()
   integer*8 :: tick
   real*8 :: tick_rate = -1
   integer*8 QueryPerformance_rate, QueryPerformance_tick
   external QueryPerformance_rate, QueryPerformance_tick
!
   if ( tick_rate < 0 ) &
      tick_rate = QueryPerformance_rate ()
   tick = QueryPerformance_tick ()
   QueryPerformance_sec = dble(tick) / tick_rate
end function QueryPerformance_sec

integer*8 function QueryPerformance_tick ()
   STDCALL QUERYPERFORMANCECOUNTER 'QueryPerformanceCounter' (REF):LOGICAL*4
   logical*4 ll
   integer*8 tick
!
   ll = QUERYPERFORMANCECOUNTER (tick)
   QueryPerformance_tick = tick
end function QueryPerformance_tick

integer*8 function QueryPerformance_rate ()
   STDCALL QUERYPERFORMANCEFREQUENCY 'QueryPerformanceFrequency' (REF):LOGICAL*4
   logical*4 ll
   integer*8 tick_rate
!
   ll = QUERYPERFORMANCEFREQUENCY (tick_rate)
   write (*,*) 'QueryPerformance', tick_rate, ' ticks per second'
   QueryPerformance_rate = tick_rate
end function QueryPerformance_rate

integer*8 function rdtsc_tick ()
   integer*8 cnt1
!
!  get rdtsc value
   code
      rdtsc
      mov cnt1,eax
      mov cnt1[4],edx
   edoc
!
   rdtsc_tick = cnt1
end function rdtsc_tick
Dan, I hope this answers your question.
|
LitusSaxonicum
Joined: 23 Aug 2005 Posts: 2388 Location: Yateley, Hants, UK
Posted: Mon Mar 23, 2015 12:59 pm
Dan,
Good enough explanations of 'front end' and 'back end' are in http://en.wikipedia.org/wiki/Compiler under the section 'Structure of a compiler'.
The back end is the part that is machine- and OS-dependent, I think.
Eddie
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2556 Location: Sydney
Posted: Tue Mar 24, 2015 4:47 am
Eddie,
Well, I guess this post is about the FTN95 front end.
I have now reviewed 9 of the test files and I shall update the results soon.
The last test example I have reviewed is mdbx.f90. This one has sapped my enthusiasm, and I must admit I would prefer an optimising compiler to do all the changes this one needs. Dan is right: this is a job for the compiler.
mdbx.f90 is full of lines of lengthy calculations. It is like a finite element program in which the element stiffness matrices are being generated but not solved. The majority of the calculation time involves lengthy formulas. Manually restructuring the code to group repeated calculations would be a dangerous approach for such an extensive number of code lines and should probably not be attempted.
In this case the formulas are replicated as they were originally defined, and the optimisation of grouping repeated formula snippets, and of moving loop-invariant calculations outside the inner loop, has not been done by the programmer.
I actually agree with this programming approach, as it documents the theory being applied. I am not sure whether this program requires optimum run time, but an optimising compiler would help.
I would expect that ifort's vectorisation, cache utilisation and inner loop smarts are very useful in this case.
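For what it is worth, the hand optimisation being described, hoisting a repeated subexpression out of the inner loop, looks like this in miniature (hypothetical names, not code from mdbx.f90):

```fortran
! As written: lambda + 2*mu is recomputed on every pass
do i = 1, n
   r(i) = (lambda + 2.0d0*mu) * strain(i)
end do

! Hoisted: the loop-invariant factor is computed once
c1 = lambda + 2.0d0*mu
do i = 1, n
   r(i) = c1 * strain(i)
end do
```

An optimising compiler does this automatically; doing it by hand across thousands of lines of mdbx.f90 is exactly the dangerous exercise described above.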
The most important result from this program would be the correct answer, which FTN95 would provide.
I shall summarise the other tests in the next post. There are some useful results that identify the coding approaches that need better attention in FTN95, or that can easily be avoided by the programmer.
John
|
LitusSaxonicum
Joined: 23 Aug 2005 Posts: 2388 Location: Yateley, Hants, UK
Posted: Tue Mar 24, 2015 5:42 pm
John,
The programmer is in front of the front end! (And sometimes slows everything down, like the man with the red flag who was at one time required to walk in front of a steam-powered road vehicle.)
I looked at some of the Polyhedron stuff ages ago, decided that I didn't like it, and moved on. Perhaps you can tell us whether compiling for .NET slows things down further, and indeed, if it is a WINAPP, what impact that has. I find these things fascinating in a quasi-theological sense, because I only need so many angels to be able to dance simultaneously on the head of a pin ... usually one, sometimes two, and theologians agree that the limit is higher even if they don't agree on what it is.
The compiler I most hated, the one that produced a different answer from the rest, is one of the fastest now, but the version I have (and don't use) is many versions old, so I haven't named it. FTN77 with DBOS worked straight out of the box, and Clearwin+ is without equal. I'm old enough that the phrase 'on different computers' actually means on radically different hardware, e.g. IBM, ICL, CDC, Univac, Burroughs, VAX, Elliott/NCR, Pr1me, PC ...
Eddie
|