Topic: Pure, forall, and realkind in Support

weaverwb

Posts: 30 Monterey

26 May 2006 2:31 #690

Hi,

I wrote a test program that called a couple of simple subroutines declared as PURE. All three program units (the main program and the two subroutines) dimensioned three real arrays of size (1000,1000). The first subroutine assigned zero to one matrix, then computed the cosine of another, a matrix multiply, and sqrt(abs()) of a third. One matrix, filled with random numbers, was passed in.

The second routine performed a FORALL inside a modest DO loop.

I ran these programs with KIND=1,2, and 3.
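For readers who want to reproduce this, a test of the kind described might look roughly like the sketch below. It is not the original program: every name (pure_demo, work_pure, work_forall, wp, n) is made up for illustration, the arrays are smaller than 1000 x 1000, and the kind is chosen with SELECTED_REAL_KIND rather than the FTN95 kind numbers 1, 2 and 3.

```fortran
! Hypothetical reconstruction of the test described above.
! All names and sizes are illustrative assumptions, not the original code.
module pure_demo
  implicit none
  ! Double precision here; change the arguments to try other kinds.
  integer, parameter :: wp = selected_real_kind(15, 307)
contains

  ! First routine: zero a matrix, take a cosine, multiply, sqrt(abs()).
  pure subroutine work_pure(a, b, c)
    real(wp), intent(in)  :: a(:,:)
    real(wp), intent(out) :: b(:,:), c(:,:)
    b = 0.0_wp
    b = cos(a)
    c = matmul(a, b)
    c = sqrt(abs(c))
  end subroutine work_pure

  ! Second routine: a FORALL inside a modest DO loop.
  pure subroutine work_forall(a, b, nrep)
    real(wp), intent(in)  :: a(:,:)
    real(wp), intent(out) :: b(:,:)
    integer,  intent(in)  :: nrep
    integer :: i, j, k
    b = 0.0_wp
    do k = 1, nrep
      forall (i = 1:size(a,1), j = 1:size(a,2))
        b(i,j) = b(i,j) + a(i,j) * real(k, wp)
      end forall
    end do
  end subroutine work_forall

end module pure_demo

program test_pure
  use pure_demo
  implicit none
  integer, parameter :: n = 200          ! smaller than the 1000 in the post
  real(wp), allocatable :: a(:,:), b(:,:), c(:,:)

  allocate(a(n,n), b(n,n), c(n,n))
  call random_number(a)                  ! the matrix that is passed in
  call work_pure(a, b, c)
  call work_forall(a, b, 10)
  ! Print something so the work cannot be optimised away.
  print *, 'check values:', sum(b), maxval(c)
end program test_pure
```

Putting the routines in a module gives the main program the explicit interface that the assumed-shape arguments need.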

With single threads off, the FORALL never invoked both processors on either a dual-core machine or a dual-processor machine. What do I have to do to have it employ more than one processor?
Going from KIND=1 to KIND=2 cost between 5 and 20%, depending on the functions (that's good), but going from KIND=2 to KIND=3 cost 50% to 80% (that is not so good). Does this mean that FTN95 is not using the full capabilities of current co-processor hardware? The description claims 80-bit precision for KIND=3 but the on-line documentation claims 64-bit. Which is true?
Most peculiar is the timing change when the program is run over a network. The two subroutines run at the same speed (using the timing analysis v.1.0.3) but the calling main program, which only assigns random numbers to one matrix and calls the two subroutines, goes from tenths of a second to 7.5 cpu (?) seconds! It makes no significant difference if I assign a fixed number to the elements of the matrix. The timing analysis claims no page faults but the disk is certainly being accessed. I thought I had 100 megabytes before I had to worry about allocation of heaps and such; the computer has 2 GB of memory. Could this be paging? How do I prevent that?

Thanks for any guidance here.

Bruce Weaver

PaulLaidler

Posts: 7974 Salford, UK

26 May 2006 5:26 #691

Bruce

In itself FTN95 only uses one processor. Hence there is no computational advantage in using FORALL. In fact you could easily end up with less efficient code. You may be able to make use of a dual processor via a third-party utility (I think this has been mentioned elsewhere on this forum) but only in the sense of managing the FTN95 executable. If the high performance features of Fortran 95 are important to you then you will need to use a different compiler.
Under Win32, FTN95 uses 80-bit precision with KIND=3. I do not know why you are finding that it is significantly slower. As I understand it, FTN95 will be using the co-processor.
If the disk is being accessed then presumably paging is taking place. As I understand it, all you can do is close down all other tasks and/or add more RAM. The only other option is to redesign your application with this problem in mind.
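The 80-bit versus 64-bit question can be checked on any given compiler with the standard inquiry intrinsics. The sketch below is illustrative only: the names are made up, and it assumes the compiler offers an extended kind (if SELECTED_REAL_KIND(18, 4931) returns a negative value, the declaration of xe will not compile). One possible reading of the discrepancy, which this thread does not confirm, is that the 80-bit x87 format holds a 64-bit significand, and that is the number DIGITS reports.

```fortran
! Hypothetical sketch: report what each real kind actually provides.
! Assumes an extended kind exists; kind numbers differ between compilers.
program kind_report
  implicit none
  integer, parameter :: k_single = selected_real_kind(6, 37)
  integer, parameter :: k_double = selected_real_kind(15, 307)
  integer, parameter :: k_ext    = selected_real_kind(18, 4931)
  real(k_single) :: xs
  real(k_double) :: xd
  real(k_ext)    :: xe

  ! DIGITS    = number of binary digits in the significand
  ! PRECISION = number of decimal digits of precision
  print *, 'single  : kind', kind(xs), ' digits', digits(xs), &
           ' precision', precision(xs)
  print *, 'double  : kind', kind(xd), ' digits', digits(xd), &
           ' precision', precision(xd)
  print *, 'extended: kind', kind(xe), ' digits', digits(xe), &
           ' precision', precision(xe)
end program kind_report
```

Selecting kinds by required precision rather than by number also keeps the test portable, since the numbers 1, 2 and 3 are specific to FTN95.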

JohnCampbell

Posts: 2526 Sydney

28 May 2006 10:14 #694

Bruce,

I agree with Paul. I avoid FORALL: a DO loop has far more flexibility, and FORALL gives no performance benefit in FTN95.
I have also had problems with 80-bit precision performance. The problem can also be with memory footprint.
What is the disk doing? What is being transferred over the network? Hopefully not paging. I avoid running over a network, as any network I/O can carry a big penalty for yourself and everyone else.
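To illustrate the FORALL point, here is the same element-wise update written both ways. This is only a sketch with made-up names and an arbitrary formula; on a compiler that runs FORALL on one processor, the two forms do the same work and should give the same result.

```fortran
! Hypothetical sketch: one update written as nested DO loops and as FORALL.
program do_vs_forall
  implicit none
  integer, parameter :: n = 100
  real :: a(n,n), b_do(n,n), b_fa(n,n)
  integer :: i, j

  call random_number(a)

  ! Nested DO loops, column-major order (first index varies fastest).
  do j = 1, n
    do i = 1, n
      b_do(i,j) = 2.0 * a(i,j) + 1.0
    end do
  end do

  ! The same assignment as a FORALL construct.
  forall (i = 1:n, j = 1:n)
    b_fa(i,j) = 2.0 * a(i,j) + 1.0
  end forall

  print *, 'max difference:', maxval(abs(b_do - b_fa))
end program do_vs_forall
```

The DO form can also carry IF tests, early exits and calls to non-pure procedures, none of which a FORALL body allows.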

In both B & C there is only a 25% increase in memory requirement from R8 to R10, so it would be unlikely that you passed a significant paging milestone. Would it be possible to send me your test program? I am at present trying to improve the memory management in my programs to get better performance with 1 GB+ of memory. At these memory sizes, disk transfers for paging or for actively saving information to disk have a significant time penalty. I have a finite element program and I am trying to identify unnecessary disk I/O and remove it. Paging and I/O buffers can be affected by what other programs are running, and that can be difficult to control and compare between runs.
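One way to see whether paging or other I/O is involved is to record CPU time and elapsed time around the same block of work: waiting on disk or network adds to elapsed time but not much to CPU time. The sketch below uses only the standard CPU_TIME and SYSTEM_CLOCK intrinsics; the names, the array size and the workload are illustrative assumptions, and clock wrap-around is ignored.

```fortran
! Hypothetical sketch: compare CPU time with elapsed (wall-clock) time.
program time_split
  implicit none
  integer, parameter :: wp = selected_real_kind(15, 307)
  integer, parameter :: n = 300
  real(wp), allocatable :: a(:,:), b(:,:)
  real    :: cpu0, cpu1
  integer :: t0, t1, rate

  allocate(a(n,n), b(n,n))

  call cpu_time(cpu0)
  call system_clock(t0, rate)      ! rate = clock ticks per second

  call random_number(a)            ! stand-in for the real workload
  b = matmul(a, a)

  call cpu_time(cpu1)
  call system_clock(t1)

  print *, 'cpu seconds     :', cpu1 - cpu0
  print *, 'elapsed seconds :', real(t1 - t0) / real(rate)
  print *, 'check value     :', sum(b)
end program time_split
```

If elapsed time is much larger than CPU time for the main program only, that points at paging or network transfer rather than arithmetic.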

regards John Campbell