|
forums.silverfrost.com Welcome to the Silverfrost forums
|
mecej4
Joined: 31 Oct 2006 Posts: 1895
|
Posted: Fri Jun 22, 2018 12:23 pm Post subject: Poor performance of SPREAD function |
|
|
Here is a simplified version of John Campbell's test program, with only the SPREAD based version of the outer product computation being included. I hope that this version succeeds in convincing readers that the FTN95 implementation of SPREAD needs improvement.
Code: |
MODULE DataTypes
   INTEGER , PARAMETER :: I4B = SELECTED_INT_KIND (9)
   INTEGER , PARAMETER :: I2B = SELECTED_INT_KIND (4)
   INTEGER , PARAMETER :: I1B = SELECTED_INT_KIND (2)
   INTEGER , PARAMETER :: SP = KIND (1.0)
   INTEGER , PARAMETER :: dp = KIND (1.0D0)
   INTEGER , PARAMETER :: LGT = KIND (.true.)
END MODULE DataTypes

program test
   call Test_OuterProduct ( 10, 10)
   call Test_OuterProduct ( 50, 50)
   call Test_OuterProduct ( 100,100)
   call Test_OuterProduct ( 200,200)
end program

subroutine Test_OuterProduct (m,n)
   Use DataTypes
   Implicit None
   Integer :: m,n
   Real (dp) , allocatable, Dimension (:) :: a (:), b(:), c(:,:)
   Integer :: seed = 123
   Real (dp) :: time1, time2
   allocate ( a(m), b(n), c(m,n) )
!
   ! Note: the first positional argument of RANDOM_SEED is SIZE (intent out),
   ! so this call returns the seed length in "seed"; it does not set the seed.
   call random_seed (seed)
   call random_number (a)
   call random_number (b)
   Call elapse_Time ( time1 )
   ! Outer product c(i,j) = a(i)*b(j) as an elementwise product of two m-by-n arrays
   c = spread ( a,dim=2,ncopies=n ) * spread ( b,dim=1,ncopies=m )
   Call elapse_Time ( time2 )
   write (*,11) n,time2-time1
11 format ( I4,f12.6)
End subroutine Test_OuterProduct

Subroutine elapse_time ( sec )
   ! Returns the SYSTEM_CLOCK reading in seconds; prints a header on the first call
   real*8 sec
   integer*8 tick, rate
   integer*4 :: kk = 0
   call system_clock ( tick, rate )
   if (kk == 0 ) then
      write (*,'(A,T15,I12,//,A,/)') 'System_clock rate =',rate,' n time (s)'
      kk = rate
   end if
   sec = dble(tick)/dble(rate)
end subroutine elapse_time |
Here are the outputs from FTN95 8.30 with /opt /64:
Code: | System_clock rate = 10000
n time (s)
10 0.004300
50 0.091000
100 1.417600
200 25.589800 |
and Gfortran 7.2, 64 bit:
Code: | System_clock r 3579545
n time (s)
10 0.006306
50 0.000151
100 0.000203
200 0.001016 |
The results were from running the program on a desktop PC with an AMD Athlon 64-4200+, running Windows 10-64.
For n = 200, the run time of the EXE generated by FTN95 is ~25,000 times that of the EXE generated by Gfortran.
Last edited by mecej4 on Fri Jun 22, 2018 3:17 pm; edited 1 time in total |
|
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
|
Posted: Fri Jun 22, 2018 12:42 pm Post subject: |
|
|
Paul,
Your suggestion that FTN95 may be treating SPREAD as "elemental" is certainly consistent with the timing performance. I hope it can be fixed.
Imagine if Dot_Product had the same problem; imagine the improvement we could get.
John |
|
|
|
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 7938 Location: Salford, UK
|
Posted: Fri Jun 22, 2018 7:05 pm Post subject: |
|
|
I may be wrong, but as far as I know the operator '*' is not overloaded to mean matrix multiplication in this context. If matrix multiplication is intended then FTN95 requires a call to MATMUL. This then results in two calls to SPREAD. Otherwise FTN95 generates a vast number of calls to SPREAD, presumably doing the whole calculation repeatedly for each element in turn.
There is no implied problem with SPREAD but FTN95 is not doing what the programmer intended. |
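For readers following the '*' versus MATMUL point: in standard Fortran, '*' applied to two conformable arrays is an elementwise product, so the test expression multiplies two m-by-n arrays element by element, which yields the outer product c(i,j) = a(i)*b(j). The sketch below compares that form with a true matrix product of an m-by-1 column and a 1-by-n row. The program and variable names (outer_via_matmul, c1, c2) are illustrative assumptions, not taken from the thread, and this is only one way to write the comparison.

```fortran
program outer_via_matmul
   ! Sketch only: the outer product c(i,j) = a(i)*b(j) written two ways.
   ! All names here are hypothetical and chosen for illustration.
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: m = 3, n = 4
   real(dp) :: a(m), b(n), c1(m,n), c2(m,n)

   call random_number(a)
   call random_number(b)

   ! Elementwise product of two m-by-n arrays built by SPREAD.
   c1 = spread(a, dim=2, ncopies=n) * spread(b, dim=1, ncopies=m)

   ! Matrix product of an m-by-1 column and a 1-by-n row.
   c2 = matmul(reshape(a, (/ m, 1 /)), reshape(b, (/ 1, n /)))

   ! The two results should agree to within rounding.
   write (*,'(A,ES10.2)') 'max |c1 - c2| = ', maxval(abs(c1 - c2))
end program outer_via_matmul
```

Either form needs SPREAD (or RESHAPE) to be evaluated only once per operand; nothing in the expression calls for re-evaluating it per element.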
|
|
|
mecej4
Joined: 31 Oct 2006 Posts: 1895
|
Posted: Sat Jun 23, 2018 1:28 am Post subject: |
|
|
The Fortran 95 standard documents the properties of the arguments and result value of intrinsic procedures in section 13.14. For each such procedure, the second attribute listed is "Class". For functions such as ABS, SQRT, etc., the listed class is "Elemental Function". For SPREAD, MATMUL, etc., the listed class is "Transformational Function".
From this I conclude that SPREAD is not an elemental function and I think that the implementation of SPREAD in FTN95 should not be forced to be elemental, given the severe performance penalty that has been documented in this thread.
Unless use cases can be presented for which the "Elemental" attribute is beneficial, that attribute should, perhaps, not be added by a particular compiler vendor. |
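To make the elemental versus transformational distinction concrete, here is a small sketch. An elemental function such as SQRT maps each element and keeps the argument's shape, whereas SPREAD takes the whole array and returns a result of rank one higher. The program name and the array names (spread_shapes, cols, rows) are assumptions made for this illustration.

```fortran
program spread_shapes
   ! Sketch only: contrasts a transformational intrinsic (SPREAD)
   ! with an elemental one (SQRT). Names are hypothetical.
   implicit none
   real :: v(3), cols(3,2), rows(2,3)
   integer :: i

   v = (/ 1.0, 2.0, 3.0 /)

   cols = spread(v, dim=2, ncopies=2)   ! each column of cols is a copy of v
   rows = spread(v, dim=1, ncopies=2)   ! each row of rows is a copy of v

   write (*,'(A,2I3)') 'shape(cols) =', shape(cols)
   write (*,'(A,2I3)') 'shape(rows) =', shape(rows)

   ! Print cols row by row to show the replicated columns.
   do i = 1, 3
      write (*,'(2F5.1)') cols(i,:)
   end do

   ! An elemental function returns a result with the argument's shape.
   write (*,'(A,I3)') 'size(sqrt(v)) =', size(sqrt(v))
end program spread_shapes
```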
|
|
|
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 7938 Location: Salford, UK
|
Posted: Sat Jun 23, 2018 7:19 am Post subject: |
|
|
I understand that SPREAD is not elemental and that FTN95 is not doing what the programmer intended. At the moment I am not clear whether FTN95 can do anything to correct this. It may not even be able to provide an error or warning.
I will aim to look at it when I get a moment. |
|
|
|
mecej4
Joined: 31 Oct 2006 Posts: 1895
|
Posted: Sun Jun 24, 2018 12:16 pm Post subject: Re: |
|
|
PaulLaidler wrote: | I understand that SPREAD is not elemental and that FTN95 is not doing what the programmer intended. At the moment I am not clear whether FTN95 can do anything to correct this. It may not even be able to provide an error or warning. |
Paul, I hope that this problem can be addressed because, otherwise, I regret to say, SPREAD is unusable in FTN95. To see this, please consider the following test program, which contains nothing more than the most basic use of SPREAD -- to construct a square matrix each of whose columns contains a copy of a given vector.
Code: |
program v2m
! construct matrix A(n,n) with elements A(i,j) = i for i = 1:n, j=1:n
   implicit none
   integer :: i,k,n
   real, allocatable :: v(:),A(:,:)
   real :: t1,t2
!
   n=50
   do k=1,3
      allocate(v(n),A(n,n))
      v = (/ (i, i=1,n) /)
      call cpu_time(t1)
      A = spread(v, DIM=2, NCOPIES = n)
      call cpu_time(t2)
      deallocate(v,A)
      write(*,'(I4,2x,F6.2,1x,A)')n,t2-t1,'s'
      n=n*2
   end do
end program |
For n = 200, this code took 63 seconds with 32-bit FTN95 and 17 seconds with 64-bit FTN95, both compiled with /opt specified. |
|
|
|
LitusSaxonicum
Joined: 23 Aug 2005 Posts: 2391 Location: Yateley, Hants, UK
|
Posted: Sun Jun 24, 2018 3:19 pm Post subject: |
|
|
I never even heard of the SPREAD function before reading this thread, so I have learnt something. As it suffers the same(-ish) fault in the 32-bit compiler version, I wonder just how widely it is used.
While I am an enthusiast for things working as they should (and in this context, standard-conforming must also include working efficiently as well as properly), in practical terms it is a matter of supreme indifference to me, as someone who is rather old-fashioned in programming, whether it works or not. I suspect that I am not entirely alone in this view.
I wondered what people did before such functions were part of Fortran.
I therefore had a go at writing a routine to do pretty much what mecej4 described with his code example. Here it is:
Code: |
      PROGRAM V2M_GERIATRIC
      DIMENSION A(500,500)
      DO 30 K = 50, 500, 50
         CALL CPU_TIME (T1)
         DO 20 J = 1,K
            DO 10 I = 1,K
               A(I,J) = I
   10       CONTINUE
   20    CONTINUE
         CALL CPU_TIME (T2)
         WRITE(*,'(I4,F15.8,1X,A)') K, T2-T1, 's'
   30 CONTINUE
      END |
Rather interestingly, the whole thing takes, as closely as I can judge, 6 seconds, which is the length of the nag screen introduced into the PE when one uses /LGO in version 8.30! (32 bit only).
I am reminded of the opening line from L. P. Hartley's 'The Go Between': "The past is a foreign country, they do things differently there."
Emigrating to the present is not always an advance.
The book is a good read too ...
Eddie |
|
|
|
John-Silver
Joined: 30 Jul 2013 Posts: 1520 Location: Aerospace Valley
|
Posted: Mon Jun 25, 2018 6:26 am Post subject: |
|
|
Can I cast some doubt on the timings for gfortran as given at the bottom of mecej4's 22 Jun comment at the top of this, p. 2 of the post .....
I note that for N=10, ftn95 is 75% of the time of gfortran. Faster !!!!
Now, between N=10 and N=50, gfortran achieves the impossible ..... it REDUCES the time by a factor of 40 from 0.006306 to 0.000151
How can this be so ???
_________________
''Computers (HAL and MARVIN excepted) are incredibly rigid. They question nothing. Especially input data. Human beings are incredibly trusting of computers and don't check input data. Together cocking up even the simplest calculation ... " |
|
|
|
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 7938 Location: Salford, UK
|
Posted: Mon Jun 25, 2018 6:53 am Post subject: |
|
|
Yes, there is certainly a problem in FTN95 when calling SPREAD. It works correctly when used within MATMUL, but mecej4's program demonstrates a bug that has only just come to light after more than 20 years.
I have noted that this needs fixing. |
|
|
|
mecej4
Joined: 31 Oct 2006 Posts: 1895
|
Posted: Mon Jun 25, 2018 9:09 am Post subject: Re: |
|
|
John-Silver wrote: | Can I cast some doubt on the timings for gfortran as given at the bottom of mecej4's 22 Jun comment at the top of this, p. 2 of the post .....
I note that for N=10, ftn95 is 75% of the time of gfortran. Faster !!!!
Now, between N=10 and N=50, gfortran achieves the impossible ..... it REDUCES the time by a factor of 40 from 0.006306 to 0.000151
How can this be so ??? |
I did notice the inconsistent timings, but investigating them struck me as something that would be definitely off-topic here. Were there a forum for Gfortran, other than Usenet:comp.lang.fortran, this sub-problem could go there. |
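One contributor to the odd n = 10 figure is visible in the first test program itself: on its first call, elapse_time reads the clock and only then writes the header line, so the first timed interval also covers that console write. Other first-call overheads may play a part as well. The sketch below is a hedged variant of the timing loop that prints the header before any clock reading; the names timing_warmup, seconds and sizes are assumptions made for this illustration and do not come from the thread.

```fortran
program timing_warmup
   ! Sketch only: same SPREAD-based outer product as the first test program,
   ! but the header is written before the clock is first read.
   ! Names (timing_warmup, seconds, sizes) are hypothetical.
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: i8 = selected_int_kind(18)
   integer :: sizes(4), k, n
   real(dp), allocatable :: a(:), b(:), c(:,:)
   real(dp) :: t1, t2

   sizes = (/ 10, 50, 100, 200 /)
   write (*,'(A)') '   n    time (s)    c(n,n)'

   do k = 1, 4
      n = sizes(k)
      allocate (a(n), b(n), c(n,n))
      call random_number(a)
      call random_number(b)

      t1 = seconds()
      c = spread(a, dim=2, ncopies=n) * spread(b, dim=1, ncopies=n)
      t2 = seconds()

      ! Printing one element of c keeps the computed result in use.
      write (*,'(I4,F12.6,ES12.4)') n, t2 - t1, c(n,n)
      deallocate (a, b, c)
   end do

contains

   function seconds() result(sec)
      ! Wall-clock reading in seconds from SYSTEM_CLOCK; performs no output.
      real(dp) :: sec
      integer(i8) :: tick, rate
      call system_clock(tick, rate)
      sec = real(tick, dp) / real(rate, dp)
   end function seconds

end program timing_warmup
```

With the output moved out of the timed region, the n = 10 interval measures only the array expression, which makes the small-n figures easier to compare across compilers.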
|
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
|
Posted: Mon Jun 25, 2018 10:32 am Post subject: |
|
|
I think the fact that it has taken 20 years to identify the problem with SPREAD indicates how little this function is used. I would prefer to write the code using a DO loop approach, rather than check the documentation for SPREAD, probably each time I reviewed the code.
It is good to identify the problem in case it has a more general effect (for example, in the compiler's assumptions about which functions are elemental).
I am not suggesting it is a high priority fix, although we all have different priorities. |
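For comparison, a minimal sketch of the DO loop approach to the same outer product, checked against the SPREAD form. The loop order (j outer, i inner) follows Fortran's column-major storage. The program name outer_loop and the small sizes are assumptions chosen for illustration.

```fortran
program outer_loop
   ! Sketch only: outer product c(i,j) = a(i)*b(j) with explicit DO loops,
   ! then compared with the SPREAD-based expression. Names are hypothetical.
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: m = 5, n = 7
   real(dp) :: a(m), b(n), c(m,n)
   integer :: i, j

   call random_number(a)
   call random_number(b)

   ! Inner loop runs over the first index, which is contiguous in memory.
   do j = 1, n
      do i = 1, m
         c(i,j) = a(i) * b(j)
      end do
   end do

   ! Both forms compute the same products, so the difference should be negligible.
   write (*,'(A,ES10.2)') 'max difference from SPREAD form = ', &
      maxval(abs(c - spread(a, dim=2, ncopies=n) * spread(b, dim=1, ncopies=m)))
end program outer_loop
```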
|
|
|
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 7938 Location: Salford, UK
|
Posted: Mon Jun 25, 2018 12:22 pm Post subject: |
|
|
This bug has now been fixed for the next release of FTN95. It turns out that FTN95 already works correctly for fixed size arrays but not (as here) with ALLOCATE.
With this fix elapsed times are no longer significant. |
|
|
|
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 7938 Location: Salford, UK
|
|
|
|
mecej4
Joined: 31 Oct 2006 Posts: 1895
|
Posted: Sun Jul 01, 2018 8:20 am Post subject: |
|
|
The updated FTN95 8.3 EXE and DLLs in the Dropbox download fix the SPREAD problem that was exhibited with my test program v2m (see above).
Thanks to Silverfrost and Paul for the fast response and effective remedy.
I hope that Klaus L. and John Campbell will be able to confirm that the same fixes also work for their test programs as well as Klaus's real application in which matrix outer products are computed.
Last edited by mecej4 on Tue Jul 03, 2018 1:11 pm; edited 1 time in total |
|
|
|
KL
Joined: 16 Nov 2009 Posts: 144
|
Posted: Tue Jul 03, 2018 9:19 am Post subject: |
|
|
I have copied the files of the beta version into their corresponding directories, but I see no improvement, neither with Visual Studio 2015 nor with Plato. I will look further for a potential error on my side, but the updated files are in their correct places. Could someone who stated that the problem has been fixed just run my little program? That would help me a lot.
Many thanks, Klaus |
|