Hi,
I wrote a test program which called a couple of simple subroutines that were declared PURE. All three program units dimensioned three real arrays of size (1000,1000). The first subroutine assigned zero to one matrix, computed the cos of another, did a matrix multiply, and took sqrt(abs()) of a third. One matrix, filled with random numbers, was passed in.
The second routine performed a FORALL inside a modest DO loop.
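For illustration, here is a minimal sketch of routines of the kind described above. It is not the original test program: the names (sketch_routines, first_routine, second_routine, wp, n), the order of the operations, the loop count of 10, and the body of the FORALL are all assumptions.

```fortran
module sketch_routines
  implicit none
  ! Assumption: the working kind is chosen portably; 6, 15 and 18 decimal
  ! digits presumably correspond to the KIND=1, 2 and 3 runs.
  integer, parameter :: wp = selected_real_kind(6)
  integer, parameter :: n  = 1000
contains

  ! Hypothetical first routine: zero one matrix, take the cos of another,
  ! do a matrix multiply, then sqrt(abs()) of the result.
  pure subroutine first_routine(a, b, c)
    real(wp), intent(in)  :: a(:,:)          ! passed in, filled with random numbers
    real(wp), intent(out) :: b(:,:), c(:,:)
    b = 0.0_wp
    c = cos(a)
    b = matmul(a, c)
    c = sqrt(abs(b))
  end subroutine first_routine

  ! Hypothetical second routine: a FORALL inside a modest DO loop.
  pure subroutine second_routine(a, b)
    real(wp), intent(in)    :: a(:,:)
    real(wp), intent(inout) :: b(:,:)
    integer :: i, j, k
    do k = 1, 10                             ! loop count is an assumption
      forall (i = 1:size(a,1), j = 1:size(a,2))
        b(i,j) = a(i,j) + real(k, wp)
      end forall
    end do
  end subroutine second_routine

end module sketch_routines

program sketch_main
  use sketch_routines
  implicit none
  real(wp), allocatable :: a(:,:), b(:,:), c(:,:)

  allocate(a(n,n), b(n,n), c(n,n))
  call random_number(a)                      ! main program only fills one matrix
  call first_routine(a, b, c)
  call second_routine(a, b)
  print *, 'checksums:', sum(b), sum(c)      ! use the results so they are not optimised away
end program sketch_main
```

Changing the argument of selected_real_kind in the module switches all three arrays to another precision in one place.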
I ran these programs with KIND=1, 2, and 3.
With single threads off, the FORALL never used both processors on either a dual-core machine or a dual-processor machine. What do I have to do to make it use more than one processor?
Going from KIND=1 to KIND=2 cost between 5% and 20%, depending on the functions used (that's good), but going from KIND=2 to KIND=3 cost 50% to 80% (that is not so good). Does this mean that FTN95 is not using the full capabilities of current co-processor hardware? The description claims 80-bit precision for KIND=3 but the on-line documentation claims 64-bit. Which is true?
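One way to see what each kind actually provides is to ask the compiler through the standard inquiry intrinsics. The sketch below is only that: the names sp, dp and xp are made up, and it assumes the compiler offers a real kind with at least 18 decimal digits (otherwise selected_real_kind returns a negative value and the declaration of x3 will not compile).

```fortran
program kind_inquiry
  implicit none
  ! Assumption: these requests map onto the single, double and extended kinds.
  integer, parameter :: sp = selected_real_kind(6)
  integer, parameter :: dp = selected_real_kind(15)
  integer, parameter :: xp = selected_real_kind(18)
  real(sp) :: x1
  real(dp) :: x2
  real(xp) :: x3

  ! Inquiry functions depend only on the type of the argument,
  ! so the variables do not need to be given values.
  print *, 'kind, significand bits, decimal digits, decimal exponent range'
  print *, kind(x1), digits(x1), precision(x1), range(x1)
  print *, kind(x2), digits(x2), precision(x2), range(x2)
  print *, kind(x3), digits(x3), precision(x3), range(x3)
end program kind_inquiry
```

For reference, the x87 80-bit extended format carries a 64-bit significand, so an "80-bit" figure and a "64-bit" figure might both be describing the same type; that is only a guess about the documentation, which the output of digits() would help confirm or rule out.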
Most peculiar is the timing change when the program is run over a network. The two subroutines run at the same speed (using the timing analysis v.1.0.3), but the calling main program, which only assigns random numbers to one matrix and calls the two subroutines, goes from tenths of a second to 7.5 cpu (?) seconds! It makes no significant difference if I assign a fixed number to the elements of the matrix instead. The timing analysis claims no page faults, but the disk is certainly being accessed. I thought I had 100 megabytes before I had to worry about allocation of heaps and such; the computer has 2 GB of memory. Could this be paging? How do I prevent that?
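To tell whether those seconds are CPU time or time spent waiting (on disk or network, for example), the fill step can be timed with both cpu_time and system_clock. This is a minimal sketch with a hypothetical program name, timing only the random-number fill of a single default-real matrix:

```fortran
program time_fill
  implicit none
  integer, parameter :: n = 1000
  real, allocatable :: a(:,:)
  real    :: cpu0, cpu1
  integer :: c0, c1, rate

  allocate(a(n,n))

  call cpu_time(cpu0)                ! processor time used so far
  call system_clock(c0, rate)        ! wall-clock ticks and ticks per second

  call random_number(a)              ! the step being timed

  call cpu_time(cpu1)
  call system_clock(c1)

  print *, 'cpu seconds  :', cpu1 - cpu0
  print *, 'wall seconds :', real(c1 - c0) / real(rate)
  print *, 'checksum     :', sum(a)  ! use the data so the fill is not skipped
end program time_fill
```

If the wall-clock figure grows when the program is run over the network while the CPU figure stays small, the extra time is being spent waiting rather than computing.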
Thanks for any guidance here.
Bruce Weaver