forums.silverfrost.com Forum Index
Welcome to the Silverfrost forums
 

Calculating the outer vector product
forums.silverfrost.com Forum Index -> Support
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 5437
Location: Salford, UK

Posted: Tue Jul 03, 2018 11:00 am

Klaus

There has been no attempt to "fix" FTN95 with respect to the initial program that you posted on this thread.

FTN95 does not process the following line as you would like...

Code:
c = spread(a,dim=2,ncopies=n) * spread(b,dim=1,ncopies=m )


If you want to multiply two matrices together then FTN95 expects you to call MATMUL.

Code:
c = MATMUL(spread(a,dim=2,ncopies=n), spread(b,dim=1,ncopies=m))
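For comparison, here is a small self-checking sketch of the forms discussed here, assuming the usual outer-product definition c(i,j) = a(i)*b(j); the program name and test data are illustrative only. In standard Fortran, `*` between two arrays is elementwise, so the SPREAD product computes the outer product directly. MATMUL of the two SPREADs is only conformant when the arrays are square, and then each element is the sum of n identical terms (which is why Klaus divides by m in his reply). A MATMUL form that is conformant for any m and n multiplies an m x 1 column by a 1 x n row:

```fortran
program outer_product_demo
  ! Illustrative sketch: forming c(i,j) = a(i)*b(j) in different ways.
  implicit none
  integer, parameter :: m = 3, n = 4, k = 4
  real :: a(m), b(n), c1(m,n), c2(m,n)
  real :: x(k), y(k), s(k,k)
  integer :: i, j

  a = [(real(i), i = 1, m)]
  b = [(real(10*j), j = 1, n)]

  ! (1) Elementwise product of two SPREADs ("*" on arrays is elementwise).
  c1 = spread(a, dim=2, ncopies=n) * spread(b, dim=1, ncopies=m)

  ! (2) MATMUL of an (m x 1) column and a (1 x n) row: conformant for any m, n.
  c2 = matmul(reshape(a, [m, 1]), reshape(b, [1, n]))

  do j = 1, n
     do i = 1, m
        if (c1(i,j) /= a(i)*b(j)) error stop "SPREAD form is wrong"
        if (c2(i,j) /= a(i)*b(j)) error stop "MATMUL column*row form is wrong"
     end do
  end do

  ! (3) MATMUL of the two SPREADs needs square shapes (k x k here) and
  !     sums k identical terms, so every element equals k*x(i)*y(j).
  x = [(real(i), i = 1, k)]
  y = [(real(10*j), j = 1, k)]
  s = matmul(spread(x, dim=2, ncopies=k), spread(y, dim=1, ncopies=k))
  do j = 1, k
     do i = 1, k
        if (s(i,j) /= real(k)*x(i)*y(j)) error stop "unexpected MATMUL(SPREAD) result"
     end do
  end do

  print '(a)', 'ok'
end program outer_product_demo
```

The test data are small integers, so every product and sum is exact in single precision and the equality checks are safe.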
KL



Joined: 16 Nov 2009
Posts: 124

Posted: Tue Jul 03, 2018 12:15 pm

Thank you very much, Paul. I misunderstood what was meant by "the problem had been fixed".

Your proposal works well for m = n (if the result is divided by m), but for m /= n the two array shapes are not conformant for MATMUL. It is also not the fastest method: the method mentioned earlier in this thread (eliminating the inner DO loop) is faster by a factor of 2-3.
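The earlier pages of the thread are not part of this excerpt, so the exact code of that method is not shown here. A plausible form, assumed rather than quoted, keeps one DO loop over the columns and replaces the inner loop over the rows with an array expression on the contiguous column c(:,j). Unlike the MATMUL(SPREAD) version it works unchanged for m /= n:

```fortran
program column_loop_outer
  ! Sketch (assumed form): outer product with the inner DO loop replaced
  ! by an array expression on each column.
  implicit none
  integer, parameter :: m = 5, n = 3   ! m /= n on purpose
  real :: a(m), b(n), c(m,n)
  integer :: i, j

  a = [(real(i), i = 1, m)]
  b = [(0.5*real(j), j = 1, n)]

  ! One loop over columns; Fortran is column-major, so c(:,j) is contiguous.
  do j = 1, n
     c(:, j) = a * b(j)
  end do

  ! Check against the explicit double loop.
  do j = 1, n
     do i = 1, m
        if (c(i,j) /= a(i)*b(j)) error stop "mismatch"
     end do
  end do
  print '(a)', 'ok'
end program column_loop_outer
```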

I have rerun the case with Code::Blocks and the GNU compiler. With this compiler, the "spread solution" is by far the fastest method. I have no idea why, but evidently the SPREAD intrinsics of FTN95 and GNU are implemented differently. As mentioned earlier, I do not have the insight to take this any further.

Klaus
John-Silver



Joined: 30 Jul 2013
Posts: 858

Posted: Wed Jul 04, 2018 12:10 pm

The ftn.enh file would confirm that it was SPREAD that was fixed, but it isn't included in this beta 279 release; only clrwin.enh is.
In future it would be good to include all the relevant .enh files.
_________________
"This is the triumph of folly.
The machine, which knows no rest, ought to remain a tool,
... but instead becomes our master and will swallow up our life and soul"
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 5437
Location: Salford, UK

Posted: Wed Jul 04, 2018 1:09 pm

There is/was no evidence of a fault in SPREAD.
So SPREAD has not been fixed.
JohnCampbell



Joined: 16 Feb 2006
Posts: 1968
Location: Sydney

Posted: Thu Jul 05, 2018 4:51 am

Paul,

Run mecej4's test program using FTN95 and gFortran. The following adaptation gives an indication that gFortran is running.
I ran it in PLATO, selecting Release Win32 and then Release x64, with Tools > Options > "Use gFortran/gcc for x64" enabled.
Code:
program v2m
  ! Construct matrix A(n,n) with elements A(i,j) = i for i = 1:n, j = 1:n,
  ! and time the SPREAD call for n = 25, 50, 100, 200
  implicit none
  integer :: i, k, n
  real, allocatable :: v(:), A(:,:)
  integer*8 :: t1, t2, rate      ! 8-byte counters for SYSTEM_CLOCK
  real*4    :: sec
  !
  n = 25
  do k = 1, 4
     allocate(v(n), A(n,n))
     v = (/ (i, i=1,n) /)
     call system_clock(t1, rate)
     A = spread(v, DIM=2, NCOPIES=n)   ! the only statement being timed
     call system_clock(t2, rate)
     deallocate(v, A)
     sec = real(t2-t1)/real(rate) ; write (*,*) sec
     write(*,'(I4,2x,F8.4,1x,A)') n, sec, 's'
     n = n*2
  end do
  !
end program v2m

There may not be a fault, but there is definitely a performance problem.
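To see how much of that time is attributable to SPREAD itself, one option is to time the same construction as a plain double loop alongside it. The sketch below is hypothetical (the program name v2m_baseline and the side-by-side layout are not from the thread); it uses SELECTED_INT_KIND(18) in place of the non-standard INTEGER*8 and checks that both methods produce the same matrix:

```fortran
program v2m_baseline
  ! Hypothetical companion to v2m: build A(n,n) with A(i,j) = i once with
  ! SPREAD and once with explicit DO loops, and print both times.
  implicit none
  integer :: i, j, k, n
  real, allocatable :: v(:), A(:,:), B(:,:)
  integer(kind=selected_int_kind(18)) :: t0, t1, t2, rate

  n = 25
  do k = 1, 4
     allocate(v(n), A(n,n), B(n,n))
     v = [(real(i), i = 1, n)]

     call system_clock(t0, rate)
     A = spread(v, dim=2, ncopies=n)
     call system_clock(t1)
     do j = 1, n            ! column-major order: j outer, i inner
        do i = 1, n
           B(i,j) = v(i)
        end do
     end do
     call system_clock(t2)

     if (any(A /= B)) error stop "SPREAD and loop results differ"
     ! columns: n, SPREAD time, loop time (seconds)
     write(*,'(I4,2(2x,F10.6),A)') n, real(t1-t0)/real(rate), &
                                    real(t2-t1)/real(rate), ' s'
     deallocate(v, A, B)
     n = n*2
  end do
end program v2m_baseline
```

The timings themselves depend on the compiler, options and clock resolution, so no particular output is implied here.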
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 5437
Location: Salford, UK

Posted: Thu Jul 05, 2018 6:30 am

John

There was a fault in FTN95 relating to the way it called SPREAD in certain contexts and this was demonstrated in Mecej4's sample program. This has been fixed in the latest beta download. If there is still a performance hit compared with gFortran then it should be negligible.
mecej4



Joined: 31 Oct 2006
Posts: 933

Posted: Thu Jul 05, 2018 12:38 pm

PaulLaidler wrote:
John

There was a fault in FTN95 relating to the way it called SPREAD in certain contexts and this was demonstrated in Mecej4's sample program. This has been fixed in the latest beta download. If there is still a performance hit compared with gFortran then it should be negligible.

Paul, I agree that there is no justification for blaming SPREAD itself. The problem is that FTN95 compiles some Fortran expressions containing SPREAD in such a way that the resulting program is extremely inefficient and slow, because a large number of calls to SPREAD are made where just a single call would suffice.

Let us note at the outset that no matrix multiplication is involved in the following (or in the example codes that were posted earlier). My v2m example was constructed to show the existence of the inefficiency in the simplest way that I could think of. The v8.30.279 beta release fixes that.

Unfortunately, the inefficiency is still a major problem if the expression containing SPREAD involves anything beyond a simple reference to SPREAD. Klaus and John C. provided example codes where the expression was the elementwise product of two references to SPREAD. Below I give timings from an adaptation of John's example with that expression split into two statements. In place of
Code:
c = spread ( a,dim=2,ncopies=n ) * spread ( b,dim=1,ncopies=m )
write
Code:
c = spread ( a,dim=2,ncopies=n )
c = c * spread ( b,dim=1,ncopies=m ) ! This is NOT a matrix multiply operation

Examination of the generated code using /64 /opt /explist shows the problem clearly:

    (i) Calculation of each element of the result matrix C involves the MULSD instruction, which occurs only once in the listing;
    (ii) The MULSD is in a loop, and a call to SPREAD is located in the same loop. As a result, calculating the final result C involves making m × n calls to SPREAD, which is extremely expensive.
    (iii) A single call to SPREAD should suffice for the second Fortran statement above, since C has already been allocated and initialised in the Fortran statement preceding it.
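Following point (iii), one possible user-side workaround is to materialise the second SPREAD into its own array so that the multiply statement contains no SPREAD at all. This is a sketch under an assumption not verified here: that FTN95 8.30.279 evaluates a plain assignment of a single SPREAD with one call, which is the simple case the beta reportedly fixed. The temporary t and the program name are illustrative; double precision is used to match the MULSD in the listing:

```fortran
program hoisted_spread
  ! Workaround sketch (assumption, not verified against FTN95 8.30.279):
  ! evaluate each SPREAD once into its own array, then multiply elementwise.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: m = 4, n = 6
  real(dp), allocatable :: a(:), b(:), c(:,:), t(:,:)
  integer :: i, j

  allocate(a(m), b(n), c(m,n), t(m,n))
  a = [(real(i, dp), i = 1, m)]
  b = [(real(j, dp), j = 1, n)]

  c = spread(a, dim=2, ncopies=n)   ! c(i,j) = a(i)
  t = spread(b, dim=1, ncopies=m)   ! t(i,j) = b(j), materialised once
  c = c * t                         ! elementwise: c(i,j) = a(i)*b(j)

  do j = 1, n
     do i = 1, m
        if (c(i,j) /= a(i)*b(j)) error stop "mismatch"
     end do
  end do
  print '(a)', 'ok'
end program hoisted_spread
```

The cost is an extra m x n array for t; whether it actually avoids the per-element SPREAD calls would need to be confirmed with /explist as above.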

Here are the timing results (2.1 GHz Intel T4300, Windows 10 x64, FTN95 8.30.279):
Code:
System_clock rate =  10000

   n    time (s)

  10    0.004900
  50    0.047500
 100    0.697100
 200   11.048800

And from gfortran 7.3 with -O2:
Code:
System_clock rate =  1000000000

   n    time (s)

  10    0.001192
  50    0.000013
 100    0.000057
 200    0.000585
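A rough consistency check of point (ii) against the FTN95 table, under two assumptions the post does not state: that the benchmark uses m = n, and that each of the m x n SPREAD calls costs time proportional to the m x n elements it returns. The running time would then grow like n^4, so doubling n should multiply it by about 16:

```latex
\begin{aligned}
T(n) &\propto \underbrace{mn}_{\text{SPREAD calls}} \times \underbrace{mn}_{\text{elements per call}} = n^4 \quad (m = n),
\qquad \frac{T(2n)}{T(n)} \approx 2^4 = 16,\\
\frac{T(200)}{T(100)} &= \frac{11.0488}{0.6971} \approx 15.8,
\qquad \frac{T(100)}{T(50)} = \frac{0.6971}{0.0475} \approx 14.7.
\end{aligned}
```

The step from n = 10 to n = 50 (ratio about 9.7 for a fivefold increase in n) is far below the 5^4 = 625 this model predicts, which suggests that fixed costs dominate at small n.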
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 5437
Location: Salford, UK

Posted: Thu Jul 05, 2018 5:58 pm

mecej4

My understanding of this issue differs from yours.
I will take another look at it when I get a moment.
All times are GMT + 1 Hour
Page 3 of 3