
Fortran timings

 
Anonymous
Guest





Posted: Fri Aug 04, 2006 4:14 am    Post subject: Fortran timings

Hello all,

I recently did some research and testing to get accurate timing information about the execution times of Fortran code (for optimization purposes, of course).

I used the following routines:
DATE_AND_TIME (standard Fortran-95),
SYSTEM_CLOCK (standard Fortran-95), and
HIGH_RES_CLOCK@ (a Salford extension)

I tested my programs for both Salford's FTN95 and Intel's Visual Fortran Compiler for Windows.

By the way, I am using Windows XP Professional on an Intel-based machine with a Pentium 4 2.8 GHz processor (3 years old), in case this is of any relevance.

What I discovered, and what puzzles me(!), are the following things.

1. When running the Salford programs, the SYSTEM_CLOCK COUNT argument counts "down" instead of "up"! Intel's counterpart, however, counts "up" as expected.
I would like an explanation for this strange behaviour, if possible, but once you are aware of it the problem is easy to work around: swap your start and stop values or, even better, take ABS(stop - start) to get your execution times. No great harm done.

2. The SYSTEM_CLOCK COUNT_RATE argument returns the value 10000 with both compilers (10000 ticks per second?).
This seems ok at first sight but...

3. When I calculated the elapsed times from the COUNT and COUNT_RATE values, I found that in the Salford version the elapsed time was about 1.8703 times as high as it should be! In other words, COUNT_RATE was not 10000; it is (or seems to be) about 18703. Again, Intel was doing just fine here.

4. The result above was also perfectly consistent with the HIGH_RES_CLOCK@ results. This function (only available in FTN95) does indeed give results to the microsecond.
BUT! When calculating elapsed times by subtracting start from stop times the results are AGAIN a factor 1.8703 too high (approx).

5. And finally, the results of the HIGH_RES_CLOCK@ function are always negative, and again, counting down (or up in absolute value).
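For reference, here is a minimal sketch of the kind of measurement described in points 1 to 3. It assumes the compiler accepts 64-bit integer arguments to SYSTEM_CLOCK (standard Fortran 95 only guarantees default integers). The program name, the workload loop and the variable names are illustrative and are not taken from the original test programs. Instead of ABS(stop - start), it adds COUNT_MAX + 1 when the difference is negative, which is the usual correction for a counter that has wrapped once. That correction would not help with a clock that genuinely counts down, or with a COUNT_RATE that is itself wrong.

```fortran
program elapsed_sketch
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)   ! 64-bit integers
  integer, parameter :: dp = kind(1.0d0)
  integer(i8) :: start, finish, rate, cmax, ticks
  real(dp) :: x
  integer :: i

  call system_clock(start, rate, cmax)
  if (rate <= 0) stop 'No usable system clock'

  ! Illustrative workload, only here to have something to time
  x = 0.0_dp
  do i = 1, 10000000
     x = x + sqrt(real(i, dp))
  end do

  call system_clock(finish)

  ! Correct for a single wrap of an up-counting clock
  ticks = finish - start
  if (ticks < 0) ticks = (ticks + cmax) + 1

  print *, 'workload result =', x
  print *, 'count_rate      =', rate
  print *, 'elapsed seconds =', real(ticks, dp) / real(rate, dp)
end program elapsed_sketch
```

Comparing the printed elapsed time against a stopwatch or DATE_AND_TIME over a run of several seconds is one simple way to check whether the reported COUNT_RATE can be trusted.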



Surely there must be a relation between the HIGH_RES_CLOCK@ and the SYSTEM_CLOCK in Salford's FTN95.
Both are giving me the wrong answers (but consistently wrong, that's the advantage).

Can somebody please explain these results to me, the reason why, or can somebody please do some experiments on his/her machines?

I would be very grateful!

Many thanks in advance!
Lucas.

PS: It's a shame really because my Salford programs are running about 10 to 15 percent faster than Intel's (on an Intel machine by the way)!!
That's why I'm worried.
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 7927
Location: Salford, UK

Posted: Fri Aug 04, 2006 5:38 am    Post subject: Fortran timings

Lucas

There are a couple of existing threads on this forum relating to this problem.
Basically it is a known issue that we need to address.
Please note that it may be necessary to use INTEGER*8 values for the arguments of SYSTEM_CLOCK.
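As a rough illustration of the INTEGER*8 suggestion, the sketch below calls SYSTEM_CLOCK once with default integers and once with 64-bit integers and prints what each call reports. It assumes the compiler allows non-default integer kinds here; the program and variable names are made up for the example.

```fortran
program clock_kinds
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)   ! 64-bit integers
  integer     :: count4, rate4, max4
  integer(i8) :: count8, rate8, max8

  ! Same intrinsic, two integer kinds
  call system_clock(count4, rate4, max4)
  call system_clock(count8, rate8, max8)

  print *, 'default integer: rate =', rate4, ' max =', max4
  print *, '64-bit integer : rate =', rate8, ' max =', max8
end program clock_kinds
```

The reported rate and maximum may differ between the two kinds, and the behaviour is compiler dependent, so the output is worth checking on each compiler being compared.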
Anonymous
Guest





Posted: Fri Aug 04, 2006 6:10 am    Post subject: Fortran timings

Hello Paul,

Thanks for looking into my thread!
I have just changed the SYSTEM_CLOCK arguments to INTEGER*8.
However, the test results are exactly the same, except that COUNT_MAX is now (correctly) displayed as 2^63-1 instead of 2^31-1.

But the main frustrating "error factor" of about 1.8703 remains.
Is this factor machine or even OS dependent?

At least I'm pleased to hear that the results are a known issue to Salford.

Kind regards, Lucas.
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Wed Aug 09, 2006 9:24 pm    Post subject: Fortran timings

Lucas,

I have done a lot of testing of CPU and elapsed-time timing routines in the past. I have a program which provides summaries for various routines on the particular machine (Intel processor) being used.
There are differences between P3, P4 and P4m which can affect the relative performance of the timing routines.

SYSTEM_CLOCK has an error on recent fast processors, where count_rate equals the processor clock rate and overflows an INTEGER*4, so it counts backwards. You can work out a negative conversion factor to calibrate it and still use it. I think the count_rate comes from Microsoft's WinAPI routine.
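The overflow can be illustrated with a little arithmetic. The sketch below is only one possible reading, not a confirmed diagnosis: it assumes a true counter frequency of 2,798,600,000 Hz (a hypothetical value chosen to be close to a nominal 2.8 GHz clock, not a measured one) and assumes that the library stores this value in a signed 32-bit integer. Under those assumptions the stored rate becomes negative, and the ratio between the true rate and the magnitude of the wrapped rate comes out at about 1.87, which is in the same range as the 1.8703 factor reported earlier in the thread.

```fortran
program wrap_sketch
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)   ! 64-bit integers
  integer, parameter :: dp = kind(1.0d0)
  integer(i8), parameter :: two32 = 4294967296_i8    ! 2**32
  integer(i8), parameter :: max32 = 2147483647_i8    ! 2**31 - 1
  integer(i8) :: true_rate, wrapped

  ! Hypothetical counter frequency in Hz (assumption, not measured)
  true_rate = 2798600000_i8

  ! Emulate storing the value in a signed 32-bit integer
  wrapped = true_rate
  if (wrapped > max32) wrapped = wrapped - two32

  print *, 'true rate    =', true_rate
  print *, 'wrapped rate =', wrapped
  print *, 'error factor =', real(true_rate, dp) / real(abs(wrapped), dp)
end program wrap_sketch
```

A negative wrapped rate would also account for elapsed times that come out negative and appear to count down, as long as the conversion from ticks to seconds uses the wrapped value.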

Basically each timing routine has an accuracy and a calling overhead. A lot of routines are accurate to 0.015 seconds.
You need to consider these issues and convenience when choosing a routine.
Some record elapsed time and some record processor time (which can sometimes exclude I/O time, but not always). There are significant differences between desktop and notebook performance. What you get for CPU process time can vary a lot between different processors.

The fortran standard routines also vary between fortran compilers.

The text below is a recent summary of results.
The first machine (Aug04) did not have the system_clock problem, but all my present PCs (Aug06) do.
Neither high_res_clock@ nor system_clock works in the latest test.

I hope these results are of use.

Regards John Campbell

! Results of timing call tests using c:genaudittiming.f95
!
! ELAPSED TIME CALLS
! fastest gettickcount winapi
! accurate QueryPerform winapi
!
! CPU TIME CALLS
! fastest cpu_clock@ salford
! accurate cpu_clock@ salford
!
! P4 2553 mhz using ftn95 ver 4.50
! Run on Tuesday, 24 August 2004 at 09:50:06
! cpu_clock : MHz = 2.535E+09 ( 1118676 cycles)
! Run on Tuesday, 24 August 2004 at 09:50:35
!
! # indicates routine time: only call time is uniformly timed
!
! Call                  Total time  call time  unique      min dif      max dif     accuracy  src  Description
!                        # seconds     mu.sec  number     # mu.sec     # mu.sec     # mu.sec
! call gettickcount         28.360      0.142       2     16000.00     16000.00     16000.00  api  GetTickCount WINAPI
! call dclock@              28.360      0.482       6     15000.00     16000.00     15666.67  sal  elapsed time
! call getlocaltime         28.360      0.527       7     15000.00     16000.00     15714.29  api  GetLocalTime WINAPI
! call QueryPerform         28.371      1.593  199999         1.40       411.23         1.59  api  is the fastest and most accurate
! call system_clock         28.371      3.176    6332       100.00       600.00       100.38  f95  a real-time clock
! date_and_time             28.360      1.738      22     15000.00     16000.00     15636.36  f95  real-time date and clock time
! seconds_since_1980@       29.000      0.476       1   1000000.00   1000000.00   1000000.00  sal
!
! cpu_clock@ ()             28.355      0.096  199999         0.09       203.05         0.10  sal  not reliable on NT + ??
! high_res_clock@           28.371      2.948  199999         2.79       383.01         2.95  sal  CPU time accurate to 1 microsecond
! GetProcessTimes           28.297      0.762      10     15625.00     15625.00     15625.00  api  GetProcessTimes WINAPI
! call cpu_time             28.297      0.909      11     15625.00     15625.00     15625.00  f95  To get the processor time.
!
! dummy call                28.227      0.019
!
! Total time the time obtained from this routine for the full test run using timing.f95
! call time the estimate for the time of each call ( using cpu_clock@ )
! number unique the number of calls that returned a different time out of 100,000 calls
! min dif the smallest difference in the time estimate from successive calls, ignoring same
! max dif the maximum difference in the call value
! accuracy the average of the different calls
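The "number unique", "min dif" and "max dif" columns can be approximated for any timer with a simple polling loop. The following is a minimal sketch for SYSTEM_CLOCK only, and it is not the timing.f95 program used to produce the results above. The call count, the program name and the variable names are illustrative, and it again assumes that 64-bit integer arguments are accepted. It does not estimate the call time, which in the table was measured separately with cpu_clock@.

```fortran
program resolution_sketch
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)   ! 64-bit integers
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: ncalls = 100000              ! illustrative call count
  integer(i8) :: rate, prev, now, dif, min_dif, max_dif
  integer :: i, n_unique

  call system_clock(prev, rate)
  if (rate <= 0) stop 'No usable system clock'

  n_unique = 0
  min_dif = huge(min_dif)
  max_dif = 0
  do i = 1, ncalls
     call system_clock(now)
     dif = abs(now - prev)
     ! Only count calls where the clock value actually changed
     if (dif /= 0) then
        n_unique = n_unique + 1
        min_dif = min(min_dif, dif)
        max_dif = max(max_dif, dif)
        prev = now
     end if
  end do

  print *, 'calls made      =', ncalls
  print *, 'unique values   =', n_unique
  if (n_unique > 0) then
     print *, 'min step (us)   =', 1.0e6_dp * real(min_dif, dp) / real(rate, dp)
     print *, 'max step (us)   =', 1.0e6_dp * real(max_dif, dp) / real(rate, dp)
  else
     print *, 'clock did not advance during the test'
  end if
end program resolution_sketch
```

The step sizes are converted using the reported COUNT_RATE, so they inherit any error in that value.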


Run TODAY
desktop P4 3.2 ghz
Run on Thursday, 10 August 2006 at 13:12:4
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Wed Aug 09, 2006 9:29 pm    Post subject: Fortran timings

Apologies for the poor layout of my previous email.
Also, all tests were done on a win32 environment.
I have no idea of how they might work in a .net environment.
weaverwb



Joined: 04 Aug 2005
Posts: 37
Location: Monterey

Posted: Sat Aug 12, 2006 4:44 pm    Post subject: Fortran timings

We recently did some timing tests of a very compute intensive program that does radiative transfers in turbulent NLTE plasmas (stellar atmospheres). This is a program that runs for hours to days. We were somewhat surprised to find that FTN95 was, at first, faster than a popular compiler benchmarked by Polyhedron as much faster than ftn95. After a lot of tweaking, we got the other compiler to generate somewhat faster code but we have not yet returned to ftn95 to do the same. Some smaller quick and dirty tests revealed similar results. It raises questions of compiler writers being a bit too aware of standard benchmarks.

As usual, it is a bit hard to predict how any code will do with a given compiler. Also, perhaps the folks at Silverfrost should not write off number-crunching applications quite so quickly. What I'm lobbying for here is more attention to MMX, 3DNow!, SSE - SSE3, etc. I think they are only a few short steps away from being recognized as a competitive compiler for number crunching.



Bruce Weaver
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Mon Aug 14, 2006 12:37 am    Post subject: Fortran timings

Bruce,

I'm not sure what you mean by "MMX, 3DNow!, SSE - SSE3, etc.", but if they are part of the P4 instruction set, I would certainly add my vote. It would be good if the array operations could make use of some P4 instructions, although given the complexity of cache performance, it is difficult to identify where such changes would yield the greatest gains.

My limited testing of the processor-based compiler options did not show any significant improvement on my P4, though I have not tried them for a couple of years. /Pentium and /P6 should be superseded by /P4 and /P4m. Does anyone know if the existing /Pentium4 option does anything substantial?

Are there any areas where the P4 instruction set could be more easily utilised in FTN95?



John Campbell
weaverwb



Joined: 04 Aug 2005
Posts: 37
Location: Monterey

Posted: Mon Aug 14, 2006 2:33 am    Post subject: Fortran timings

Hi,

They are FPU modes for hardware math for extended precision that replace/extend the x87 set. My understanding is that ftn95 only uses the x87 set which, I think, is becoming deprecated in the latest chips. It is my understanding that some chips are even peeling off some of the 80 bits to provide functionality for other uses. These modes I mentioned are supposed to be faster and permit higher precision. Some are designed to take advantage of hardware pipelining, which would presumably significantly improve array operation speed. I'm not sure at what point they start appearing in chips, but my Athlon and Xeon CPUs support them, as does my Prescott, which is a P4 chip.

Most of my code uses at least double precision (it turns out that 32 bits is a really bad choice for physics calculations... it would have been much better if the 36-bit model had survived instead) and I do a lot of array calculations as well.



Bruce Weaver
All times are GMT + 1 Hour