
forums.silverfrost.com Welcome to the Silverfrost forums

mecej4
Joined: 31 Oct 2006 Posts: 1072

Posted: Fri Jun 22, 2018 12:23 pm Post subject: Poor performance of SPREAD function 


Here is a simplified version of John Campbell's test program, with only the SPREAD based version of the outer product computation being included. I hope that this version succeeds in convincing readers that the FTN95 implementation of SPREAD needs improvement.
Code: 
MODULE DataTypes
INTEGER , PARAMETER :: I4B = SELECTED_INT_KIND (9)
INTEGER , PARAMETER :: I2B = SELECTED_INT_KIND (4)
INTEGER , PARAMETER :: I1B = SELECTED_INT_KIND (2)
INTEGER , PARAMETER :: SP = KIND (1.0)
INTEGER , PARAMETER :: dp = KIND (1.0D0)
INTEGER , PARAMETER :: LGT = KIND (.true.)
END MODULE DataTypes
program test
call Test_OuterProduct ( 10, 10)
call Test_OuterProduct ( 50, 50)
call Test_OuterProduct ( 100,100)
call Test_OuterProduct ( 200,200)
end program
subroutine Test_OuterProduct (m,n)
Use DataTypes
Implicit None
Integer :: m,n
Real (dp) , allocatable, Dimension (:) :: a (:), b(:), c(:,:)
Integer :: seed = 123
Real (dp) :: time1, time2
allocate ( a(m), b(n), c(m,n) )
!
call random_seed (seed)
call random_number (a)
call random_number (b)
Call elapse_Time ( time1 )
c = spread ( a,dim=2,ncopies=n ) * spread ( b,dim=1,ncopies=m )
Call elapse_Time ( time2 )
write (*,11) n,time2-time1
11 format ( I4,f12.6)
End subroutine Test_OuterProduct
Subroutine elapse_time ( sec )
real*8 sec
integer*8 tick, rate
integer*4 :: kk = 0
call system_clock ( tick, rate )
if (kk == 0 ) then
write (*,'(A,T15,I12,//,A,/)') 'System_clock rate =',rate,' n time (s)'
kk = rate
end if
sec = dble(tick)/dble(rate)
end subroutine elapse_time 
Here are the outputs from FTN95 8.30 with /opt /64:
Code: 
System_clock rate = 10000
n time (s)
10 0.004300
50 0.091000
100 1.417600
200 25.589800 
and Gfortran 7.2, 64 bit:
Code: 
System_clock r 3579545
n time (s)
10 0.006306
50 0.000151
100 0.000203
200 0.001016 
The results were from running the program on a desktop PC with an AMD Athlon 64 4200+, running 64-bit Windows 10.
For n = 200, the run time of the EXE generated by FTN95 is ~25,000 times that of the EXE generated by Gfortran.
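For comparison, the same outer product can be written with explicit DO loops. The following is a minimal sketch, not part of John Campbell's original test program; the program name and the arrays c_loop and c_spread are made up for illustration. It fills one matrix by loops and one by the SPREAD expression, then prints the largest difference, which is expected to be zero or at rounding level.

```fortran
program outer_loop_sketch
  ! Sketch only: explicit-loop outer product, checked against the SPREAD form.
  ! All names here are illustrative and do not come from the original test.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: m = 200, n = 200
  real(dp), allocatable :: a(:), b(:), c_loop(:,:), c_spread(:,:)
  integer :: i, j

  allocate (a(m), b(n), c_loop(m,n), c_spread(m,n))
  call random_number (a)
  call random_number (b)

  ! Fortran arrays are column-major, so the first index varies fastest.
  do j = 1, n
    do i = 1, m
      c_loop(i,j) = a(i) * b(j)
    end do
  end do

  ! Same result through SPREAD: element (i,j) is a(i) * b(j).
  c_spread = spread (a, dim=2, ncopies=n) * spread (b, dim=1, ncopies=m)

  write (*,*) 'max abs difference =', maxval (abs (c_loop - c_spread))
end program outer_loop_sketch
```

Timing the loop nest and the SPREAD line separately with the elapse_time routine above would show how much of the run time is attributable to SPREAD itself.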
Last edited by mecej4 on Fri Jun 22, 2018 3:17 pm; edited 1 time in total 



JohnCampbell
Joined: 16 Feb 2006 Posts: 2053 Location: Sydney

Posted: Fri Jun 22, 2018 12:42 pm Post subject: 


Paul,
Your suggestion that FTN95 may be treating SPREAD as "elemental" is certainly consistent with the timing performance. I hope it can be fixed.
Imagine if Dot_Product had the same problem; imagine the improvement we could get.
John 



PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 5759 Location: Salford, UK

Posted: Fri Jun 22, 2018 7:05 pm Post subject: 


I may be wrong, but as far as I know the operator '*' is not overloaded to mean matrix multiplication in this context. If matrix multiplication is intended then FTN95 requires a call to MATMUL. This then results in two calls to SPREAD. Otherwise FTN95 makes a vast number of calls to SPREAD, presumably doing the whole calculation repeatedly for each element in turn.
There is no implied problem with SPREAD but FTN95 is not doing what the programmer intended. 



mecej4
Joined: 31 Oct 2006 Posts: 1072

Posted: Sat Jun 23, 2018 1:28 am Post subject: 


The Fortran 95 standard documents the properties of the arguments and result value of intrinsic procedures in section 13.14. For each such procedure, the second attribute listed is "Class". For functions such as ABS, SQRT, etc., the listed class is "Elemental Function". For SPREAD, MATMUL, etc., the listed class is "Transformational Function".
From this I conclude that SPREAD is not an elemental function and I think that the implementation of SPREAD in FTN95 should not be forced to be elemental, given the severe performance penalty that has been documented in this thread.
Unless use cases can be presented for which the "Elemental" attribute is beneficial, that attribute should, perhaps, not be added by a particular compiler vendor. 
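To make the distinction concrete, here is a small sketch (the program and variable names are illustrative only). An elemental function such as SQRT is applied element by element and returns a result of the same shape as its argument, whereas SPREAD takes the whole array as a single argument and returns an array of rank one higher.

```fortran
program spread_shape_sketch
  ! Sketch only: contrasts an elemental intrinsic (SQRT) with a
  ! transformational one (SPREAD) by printing the result shapes.
  implicit none
  real :: v(3) = (/ 1.0, 2.0, 3.0 /)
  real :: s(3,2)

  ! Elemental: the result has the same shape as the argument, here (3).
  write (*,*) 'shape of sqrt(v)   =', shape (sqrt (v))

  ! Transformational: one call consumes the whole vector and
  ! produces a rank-2 result, here of shape (3,2).
  write (*,*) 'shape of spread(v) =', shape (spread (v, dim=2, ncopies=2))

  ! Each column of s is a copy of v.
  s = spread (v, dim=2, ncopies=2)
  write (*,*) 'column 1 =', s(:,1)
  write (*,*) 'column 2 =', s(:,2)
end program spread_shape_sketch
```

A single evaluation of SPREAD therefore already yields every element of the result; nothing in the standard's description requires it to be re-evaluated per element.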



PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 5759 Location: Salford, UK

Posted: Sat Jun 23, 2018 7:19 am Post subject: 


I understand that SPREAD is not elemental and that FTN95 is not doing what the programmer intended. At the moment I am not clear whether FTN95 can do anything to correct this. It may not even be able to provide an error or warning.
I will aim to look at it when I get a moment. 



mecej4
Joined: 31 Oct 2006 Posts: 1072

Posted: Sun Jun 24, 2018 12:16 pm Post subject: Re: 


PaulLaidler wrote:  I understand that SPREAD is not elemental and that FTN95 is not doing what the programmer intended. At the moment I am not clear whether FTN95 can do anything to correct this. It may not even be able to provide an error or warning. 
Paul, I hope that this problem can be addressed because, otherwise, I regret to say, SPREAD is unusable in FTN95. To see this, please consider the following test program, which contains nothing more than the most basic use of SPREAD: constructing a square matrix, each of whose columns contains a copy of a given vector.
Code: 
program v2m
! construct matrix A(n,n) with elements A(i,j) = i for i = 1:n, j=1:n
implicit none
integer :: i,k,n
real, allocatable :: v(:),A(:,:)
real :: t1,t2
!
n=50
do k=1,3
allocate(v(n),A(n,n))
v = (/ (i, i=1,n) /)
call cpu_time(t1)
A = spread(v, DIM=2, NCOPIES = n)
call cpu_time(t2)
deallocate(v,A)
write(*,'(I4,2x,F6.2,1x,A)')n,t2-t1,'s'
n=n*2
end do
end program 
For n = 200, this code took 63 seconds with 32-bit FTN95 and 17 seconds with 64-bit FTN95, both compiled with /opt specified.



LitusSaxonicum
Joined: 23 Aug 2005 Posts: 1945 Location: Yateley, Hants, UK

Posted: Sun Jun 24, 2018 3:19 pm Post subject: 


I never even heard of the SPREAD function before reading this thread, so I have learnt something. As it suffers the same(ish) fault in the 32-bit compiler version, I wonder just how widely it is used.
While I am an enthusiast for things working as they should (and in this context, standard-conforming must also include working efficiently as well as properly), in practical terms it is a matter of supreme indifference to me, as someone who is rather old-fashioned in programming, whether it works or not. I suspect that I am not entirely alone in this view.
I wondered what people did before such functions were part of Fortran.
I therefore had a go at writing a routine to do pretty much what mecej4 described with his code example. Here it is:
Code: 
PROGRAM V2M_GERIATRIC
DIMENSION A(500,500)
DO 30 K = 50, 500, 50
CALL CPU_TIME (T1)
DO 20 J = 1,K
DO 10 I = 1,K
A(I,J) = I
10 CONTINUE
20 CONTINUE
CALL CPU_TIME (T2)
WRITE(*,'(I4,F15.8,1X,A)') K, T2-T1, 's'
30 CONTINUE
END 
Rather interestingly, the whole thing takes, as closely as I can judge, 6 seconds, which is the length of the nag screen introduced into the PE when one uses /LGO in version 8.30! (32 bit only).
I am reminded of the opening line from L. P. Hartley's 'The Go-Between': "The past is a foreign country, they do things differently there."
Emigrating to the present is not always an advance.
The book is a good read too ...
Eddie 



JohnSilver
Joined: 30 Jul 2013 Posts: 1035 Location: Aerospace Valley

Posted: Mon Jun 25, 2018 6:26 am Post subject: 


Can I cast some doubt on the timings for gfortran as given at the bottom of mecej4's 22 Jun comment at the top of this, p. 2 of the post .....
I note that for N=10, ftn95 takes about 68% of the time of gfortran. Faster !!!!
Now, between N=10 and N=50, gfortran achieves the impossible ..... it REDUCES the time by a factor of about 40, from 0.006306 to 0.000151
How can this be so ???



PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 5759 Location: Salford, UK

Posted: Mon Jun 25, 2018 6:53 am Post subject: 


Yes, there is certainly a problem in FTN95 when calling SPREAD. It works correctly when used within MATMUL, but mecej4's program demonstrates a bug that has only just come to light after more than 20 years.
I have noted that this needs fixing. 



mecej4
Joined: 31 Oct 2006 Posts: 1072

Posted: Mon Jun 25, 2018 9:09 am Post subject: Re: 


JohnSilver wrote:  Can I cast some doubt on the timings for gfortran as given at the bottom of mecej4's 22 Jun comment at the top of this, p. 2 of the post .....
I note that for N=10, ftn95 takes about 68% of the time of gfortran. Faster !!!!
Now, between N=10 and N=50, gfortran achieves the impossible ..... it REDUCES the time by a factor of about 40, from 0.006306 to 0.000151
How can this be so ??? 
I did notice the inconsistent timings, but investigating them struck me as something that would be definitely off-topic here. Were there a forum for Gfortran, other than Usenet:comp.lang.fortran, this subproblem could go there.
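One detail is visible in the posted elapse_time routine itself: on its first call, SYSTEM_CLOCK is read before the one-time header WRITE, so the first timed interval (n = 10) also contains the time spent printing the header. That may account for part of the n = 10 figure with both compilers, although other first-call overheads cannot be ruled out. A minimal sketch of a variant that keeps all output outside the timed region follows; the function wall_seconds and the other names are made up for illustration.

```fortran
program warmup_sketch
  ! Sketch only: same SPREAD-based outer product, but the header is
  ! written before any clock reading, and the timer itself never prints.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: sizes(4) = (/ 10, 50, 100, 200 /)
  real(dp), allocatable :: a(:), b(:), c(:,:)
  real(dp) :: t1, t2
  integer :: k, n

  write (*,'(A)') '   n    time (s)'
  do k = 1, size (sizes)
    n = sizes(k)
    allocate (a(n), b(n), c(n,n))
    call random_number (a)
    call random_number (b)
    t1 = wall_seconds ()
    c = spread (a, dim=2, ncopies=n) * spread (b, dim=1, ncopies=n)
    t2 = wall_seconds ()
    ! Printing one element of c discourages the compiler from
    ! optimising the computation away.
    write (*,'(I4,F12.6,ES12.4)') n, t2 - t1, c(n,n)
    deallocate (a, b, c)
  end do

contains

  function wall_seconds () result (sec)
    ! Wall-clock time in seconds from SYSTEM_CLOCK; no I/O in here.
    real(dp) :: sec
    integer(i8) :: tick, rate
    call system_clock (tick, rate)
    sec = real (tick, dp) / real (rate, dp)
  end function wall_seconds

end program warmup_sketch
```

For intervals this short, the clock resolution reported by SYSTEM_CLOCK also limits how much can be read into the smallest timings.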



JohnCampbell
Joined: 16 Feb 2006 Posts: 2053 Location: Sydney

Posted: Mon Jun 25, 2018 10:32 am Post subject: 


I think the fact that it has taken 20 years to identify the problem with SPREAD indicates how little this function is used. I would prefer to write the code using a DO loop approach, rather than check the documentation for SPREAD, probably each time I reviewed the code.
It is good to identify the problem in case it has a more general effect (for example, in the assumptions about which functions are elemental?)
I am not suggesting it is a high priority fix, although we all have different priorities. 



PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 5759 Location: Salford, UK

Posted: Mon Jun 25, 2018 12:22 pm Post subject: 


This bug has now been fixed for the next release of FTN95. It turns out that FTN95 already works correctly for fixed size arrays but not (as here) with ALLOCATE.
With this fix elapsed times are no longer significant. 
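Readers who want to check the fixed-size versus ALLOCATE difference on their own installation could use something along these lines. This is a sketch only, with illustrative names, and is not the test that was used to verify the fix.

```fortran
program fixed_vs_alloc_sketch
  ! Sketch only: times the same SPREAD call on a fixed-size array
  ! and on an allocatable array of the same extent.
  implicit none
  integer, parameter :: n = 200
  real :: vf(n), af(n,n)                 ! fixed-size arrays
  real, allocatable :: va(:), aa(:,:)    ! allocatable arrays
  real :: t1, t2
  integer :: i

  vf = (/ (real (i), i = 1, n) /)
  call cpu_time (t1)
  af = spread (vf, dim=2, ncopies=n)
  call cpu_time (t2)
  write (*,'(A,F10.4,A)') 'fixed-size  : ', t2 - t1, ' s'

  allocate (va(n), aa(n,n))
  va = vf
  call cpu_time (t1)
  aa = spread (va, dim=2, ncopies=n)
  call cpu_time (t2)
  write (*,'(A,F10.4,A)') 'allocatable : ', t2 - t1, ' s'

  ! Both matrices should hold identical values.
  write (*,*) 'results agree: ', all (af == aa)
end program fixed_vs_alloc_sketch
```

On a compiler without the problem, both timings would be expected to be too small to measure reliably with CPU_TIME at this size.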



PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 5759 Location: Salford, UK




mecej4
Joined: 31 Oct 2006 Posts: 1072

Posted: Sun Jul 01, 2018 8:20 am Post subject: 


The updated FTN95 8.3 EXE and DLLs in the Dropbox download fix the SPREAD problem that was exhibited with my test program v2m (see above).
Thanks to Silverfrost and Paul for the fast response and effective remedy.
I hope that Klaus L. and John Campbell will be able to confirm that the same fixes also work for their test programs as well as Klaus's real application in which matrix outer products are computed.
Last edited by mecej4 on Tue Jul 03, 2018 1:11 pm; edited 1 time in total 



KL
Joined: 16 Nov 2009 Posts: 141

Posted: Tue Jul 03, 2018 9:19 am Post subject: 


I have copied the files of the beta version into their corresponding directories, but I see no improvement, neither with Visual Studio 2015 nor with Plato. I will look further for a potential error on my side, but the updated files are in their correct places. Could someone who stated that the problem has been fixed just run my little program? That would help me a lot.
Many thanks, Klaus 





