forums.silverfrost.com Forum Index forums.silverfrost.com
Welcome to the Silverfrost forums
 

Pure, forall, and realkind

 
weaverwb



Joined: 04 Aug 2005
Posts: 37
Location: Monterey

Posted: Fri May 26, 2006 3:31 am    Post subject: Pure, forall, and realkind

Hi,

I wrote a test program which called a couple of simple subroutines that were declared PURE. All three programs dimensioned three real arrays of size (1000,1000). The first subroutine assigned zero to a matrix, computed the cos of another, a matrix multiply, and sqrt(abs()) of a third. One matrix, filled with random numbers, was passed in.

The second routine performed a FORALL inside a modest DO loop.

I ran these programs with KIND=1,2, and 3.

A. With single threads off, the FORALL never invoked both processors on either a dual-core machine or a dual processor machine. What do I have to do to have it employ more than one processor?

B. Going from KIND=1 to KIND=2 cost between 5 and 20%, depending on functions (that's good) but going from KIND=2 to KIND=3 cost 50% to 80% (that is not so good). Does this mean that FTN95 is not using the full capabilities of current co-processor hardware? The description claims 80-bit precision for KIND=3 but the on-line documentation claims 64-bit. Which is true?

C. Most peculiar is the timing change when the program is run over a network. The two subroutines run at the same speed (using the timing analysis v.1.0.3) but the calling main program, which only assigns random numbers to one matrix and calls the two subroutines, goes from tenths of a second to 7.5 cpu (?) seconds! It makes no significant difference if I assign a fixed number to the elements of the matrix. The timing analysis claims no page faults but the disk is certainly being accessed. I thought I had 100 megabytes before I had to worry about allocation of heaps and such; the computer has 2 Gig of memory. Could this be paging? How do I prevent that?

Thanks for any guidance here.

Bruce Weaver
_________________
Bruce+Weaver
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 7924
Location: Salford, UK

Posted: Fri May 26, 2006 6:26 am    Post subject: Pure, forall, and realkind

Bruce

A. In itself FTN95 only uses one processor. Hence there is no computational advantage in using FORALL. In fact you could easily end up with less efficient code. You may be able to make use of a dual processor via a third party utility (I think this has been mentioned elsewhere on this forum) but only in the sense of managing the FTN95 executable. If the high performance features of Fortran 95 are important to you then you will need to use a different compiler.

B. Under Win32, FTN95 uses 80-bit precision with KIND=3. I do not know why you are finding that it is significantly slower. As I understand it, FTN95 will be using the co-processor.

C. If the disk is being accessed then presumably paging is taking place. As I understand it, all you can do is close down all other tasks and/or add more RAM. The only other option is to redesign your application with this problem in mind.
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Sun May 28, 2006 11:14 pm    Post subject: Pure, forall, and realkind

Bruce,

A. I agree with Paul and I avoid FORALL as a DO loop has far more flexibility with no performance benefit in FTN95.

B. I have also had problems with 80-bit precision performance. The problem can also be with memory footprint.

C. What is the disk doing? What is being transferred through the network? Hopefully it is not paging. I avoid network operation as any network I/O can have a big penalty for yourself and everyone else.

In both B & C there is only a 25% increase in memory requirement from R*8 to R*10, so it would be unlikely you passed a significant paging milestone.
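
As a rough check on that 25% figure, here is a small Python sketch of the arithmetic for the three (1000,1000) arrays described in the first post. It assumes KIND=1, 2 and 3 take 4, 8 and 10 bytes per element (R*4, R*8, R*10); the helper name footprint_mb is only for illustration, and a compiler that pads the 10-byte type would give a larger number.

```python
# Rough memory footprint of the test arrays at each real kind.
# Assumption: KIND=1, 2, 3 occupy 4, 8 and 10 bytes per element
# (R*4, R*8, R*10); padding of the 10-byte type is not modelled.
BYTES_PER_ELEMENT = {1: 4, 2: 8, 3: 10}
N_ARRAYS = 3            # three real arrays per program, as in the first post
ROWS, COLS = 1000, 1000

def footprint_mb(kind):
    """Total size of all arrays in megabytes (1 MB = 10**6 bytes)."""
    return N_ARRAYS * ROWS * COLS * BYTES_PER_ELEMENT[kind] / 1e6

for kind in (1, 2, 3):
    print(f"KIND={kind}: {footprint_mb(kind):.0f} MB")

# Relative growth when moving from R*8 to R*10.
growth = footprint_mb(3) / footprint_mb(2) - 1
print(f"KIND=2 -> KIND=3 growth: {growth:.0%}")
```

Under those assumptions the arrays come to roughly 12, 24 and 30 MB, which is small next to 2 GB of RAM.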
Would it be possible to send me your test program? I am at present trying to improve the memory management in my programs to get better performance with 1 GB+ of memory. At these memory sizes, disk transfers for paging or actively saving information to disk have a significant time penalty. I have a finite element program and I am trying to identify unnecessary disk I/O and remove it. Paging and I/O buffers can be affected by what other programs are running, and that can be difficult to control and compare between runs.

regards John Campbell
All times are GMT + 1 Hour