forums.silverfrost.com
Welcome to the Silverfrost forums

 Calculating the outer vector product
mecej4

Joined: 31 Oct 2006
Posts: 1072

Posted: Fri Jun 22, 2018 12:23 pm    Post subject: Poor performance of SPREAD function

Here is a simplified version of John Campbell's test program, with only the SPREAD based version of the outer product computation being included. I hope that this version succeeds in convincing readers that the FTN95 implementation of SPREAD needs improvement.
 Code:
      MODULE DataTypes
        INTEGER  , PARAMETER :: I4B    = SELECTED_INT_KIND (9)
        INTEGER  , PARAMETER :: I2B    = SELECTED_INT_KIND (4)
        INTEGER  , PARAMETER :: I1B    = SELECTED_INT_KIND (2)
        INTEGER  , PARAMETER :: SP     = KIND (1.0)
        INTEGER  , PARAMETER :: dp     = KIND (1.0D0)
        INTEGER  , PARAMETER :: LGT    = KIND (.true.)
      END MODULE DataTypes

      program test
        call Test_OuterProduct (  10, 10)
        call Test_OuterProduct (  50, 50)
        call Test_OuterProduct ( 100,100)
        call Test_OuterProduct ( 200,200)
      end program

      subroutine Test_OuterProduct (m,n)
        Use DataTypes
        Implicit None
        Integer :: m,n
        Real (dp) , allocatable, Dimension (:)   :: a (:), b(:), c(:,:)
        Integer :: seed = 123
        Real (dp) :: time1, time2
        allocate ( a(m), b(n), c(m,n) )
!       call random_seed (seed)
        call random_number (a)
        call random_number (b)
        Call elapse_Time ( time1 )
        c = spread ( a,dim=2,ncopies=n ) * spread ( b,dim=1,ncopies=m )
        Call elapse_Time ( time2 )
        write (*,11) n,time2-time1
     11 format ( I4,f12.6)
      End subroutine Test_OuterProduct

      Subroutine elapse_time ( sec )
        real*8 sec
        integer*8 tick, rate
        integer*4 :: kk = 0
        call system_clock ( tick, rate )
        if (kk == 0 ) then
          write (*,'(A,T15,I12,//,A,/)') 'System_clock rate =',rate,'   n    time (s)'
          kk = rate
        end if
        sec = dble(tick)/dble(rate)
      end subroutine elapse_time

Here are the outputs from FTN95 8.30 with /opt /64:
 Code:
System_clock rate =  10000

   n    time (s)

  10    0.004300
  50    0.091000
 100    1.417600
 200   25.589800

and Gfortran 7.2, 64 bit:
 Code:
System_clock r     3579545

   n    time (s)

  10    0.006306
  50    0.000151
 100    0.000203
 200    0.001016

The results were from running the program on a desktop PC with an AMD Athlon 64-4200+, running Windows 10-64.

For n = 200, the run time of the EXE generated by FTN95 is ~25,000 times that of the EXE generated by Gfortran.

Last edited by mecej4 on Fri Jun 22, 2018 3:17 pm; edited 1 time in total
JohnCampbell

Joined: 16 Feb 2006
Posts: 2053
Location: Sydney

Posted: Fri Jun 22, 2018 12:42 pm    Post subject:

Paul,

Your suggestion that FTN95 may be treating SPREAD as "elemental" is certainly consistent with the timing performance. I hope it can be fixed.

Imagine if Dot_Product had the same problem; imagine the improvement we could get.

John
PaulLaidler

Joined: 21 Feb 2005
Posts: 5759
Location: Salford, UK

Posted: Fri Jun 22, 2018 7:05 pm    Post subject:

I may be wrong, but as far as I know the operator '*' is not overloaded to mean matrix multiplication in this context. If matrix multiplication is intended, then FTN95 requires a call to MATMUL. This then results in two calls to SPREAD.

Otherwise FTN95 generates a vast number of calls to SPREAD, presumably doing the whole calculation repeatedly for each element in turn. There is no implied problem with SPREAD, but FTN95 is not doing what the programmer intended.
mecej4

Joined: 31 Oct 2006
Posts: 1072

Posted: Sat Jun 23, 2018 1:28 am    Post subject:

The Fortran 95 standard documents the properties of the arguments and result value of intrinsic procedures in section 13.14. For each such procedure, the second attribute listed is "Class". For functions such as ABS, SQRT, etc., the listed class is "Elemental Function". For SPREAD, MATMUL, etc., the listed class is "Transformational Function".

From this I conclude that SPREAD is not an elemental function, and I think that the implementation of SPREAD in FTN95 should not be forced to be elemental, given the severe performance penalty that has been documented in this thread. Unless use cases can be presented for which the "Elemental" attribute is beneficial, that attribute should, perhaps, not be added by a particular compiler vendor.
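
To illustrate the distinction, here is a minimal sketch. It is not one of the thread's test programs; the program name and the small sizes are assumptions chosen only for illustration. It shows that SPREAD takes a whole rank-1 array and returns a rank-2 array, so in principle each SPREAD reference in the outer-product expression needs to be evaluated only once, after which the elemental operator '*' combines the two results element by element.

```fortran
program spread_shapes
  ! Illustrative sketch: names and sizes are assumptions, not from the thread.
  implicit none
  integer, parameter :: m = 3, n = 2
  real :: a(m), b(n), c(m,n)
  integer :: i, j

  a = (/ 1.0, 2.0, 3.0 /)
  b = (/ 10.0, 20.0 /)

  ! SPREAD is transformational: each reference takes a whole vector
  ! and returns a whole m-by-n matrix.
  print *, 'shape of spread(a,dim=2,ncopies=n):', shape(spread(a, dim=2, ncopies=n))
  print *, 'shape of spread(b,dim=1,ncopies=m):', shape(spread(b, dim=1, ncopies=m))

  ! The elemental operator * multiplies the two matrices element by
  ! element, which gives c(i,j) = a(i)*b(j).
  c = spread(a, dim=2, ncopies=n) * spread(b, dim=1, ncopies=m)

  ! Print the result row by row.
  do i = 1, m
     write (*,'(2F8.1)') (c(i,j), j = 1, n)
  end do
end program spread_shapes
```

Both SPREAD references have shape (m,n), which is what makes the element-by-element product conformable with c.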
PaulLaidler

Joined: 21 Feb 2005
Posts: 5759
Location: Salford, UK

Posted: Sat Jun 23, 2018 7:19 am    Post subject:

I understand that SPREAD is not elemental and that FTN95 is not doing what the programmer intended. At the moment I am not clear whether FTN95 can do anything to correct this. It may not even be able to provide an error or warning. I will aim to look at it when I get a moment.
mecej4

Joined: 31 Oct 2006
Posts: 1072

Posted: Sun Jun 24, 2018 12:16 pm    Post subject: Re:

 PaulLaidler wrote: I understand that SPREAD is not elemental and that FTN95 is not doing what the programmer intended. At the moment I am not clear whether FTN95 can do anything to correct this. It may not even be able to provide an error or warning.

Paul, I hope that this problem can be addressed because, otherwise, I regret to say, SPREAD is unusable in FTN95. To see this, please consider the following test program, which contains nothing more than the most basic use of SPREAD -- to construct a square matrix each of whose columns contains a copy of a given vector.
 Code:
program v2m
! construct matrix A(n,n) with elements A(i,j) = i for i = 1:n, j=1:n
implicit none
integer :: i,k,n
real, allocatable :: v(:),A(:,:)
real :: t1,t2
!
n=50
do k=1,3
   allocate(v(n),A(n,n))
   v = (/ (i, i=1,n) /)
   call cpu_time(t1)
   A = spread(v, DIM=2, NCOPIES = n)
   call cpu_time(t2)
   deallocate(v,A)
   write(*,'(I4,2x,F6.2,1x,A)')n,t2-t1,'s'
   n=n*2
end do
end program

For n = 200, this code took 63 seconds with 32-bit FTN95 and 17 seconds with 64-bit FTN95, both compiled with /opt specified.
LitusSaxonicum

Joined: 23 Aug 2005
Posts: 1945
Location: Yateley, Hants, UK

Posted: Sun Jun 24, 2018 3:19 pm    Post subject:

I never even heard of the SPREAD function before reading this thread, so I have learnt something. As it suffers the same(-ish) fault in the 32-bit compiler version, I wonder just how widely it is used.

While I am an enthusiast for things working as they should (and in this context, standard-conforming must also include working efficiently as well as properly), in practical terms it is a matter of supreme indifference to me, as someone who is rather old-fashioned in programming, whether it works or not. I suspect that I am not entirely alone in this view.

I wondered what people did before such functions were part of Fortran.

I therefore had a go at writing a routine to do pretty much what mecej4 described with his code example. Here it is:

 Code:
      PROGRAM V2M_GERIATRIC
      DIMENSION A(500,500)
      DO 30 K = 50, 500, 50
      CALL CPU_TIME (T1)
      DO 20 J = 1,K
      DO 10 I = 1,K
      A(I,J)  = I
  10  CONTINUE
  20  CONTINUE
      CALL CPU_TIME (T2)
      WRITE(*,'(I4,F15.8,1X,A)') K, T2-T1, 's'
  30  CONTINUE
      END

Rather interestingly, the whole thing takes, as closely as I can judge, 6 seconds, which is the length of the nag screen introduced into the PE when one uses /LGO in version 8.30! (32 bit only).

I am reminded of the opening line from L. P. Hartley's 'The Go Between': "The past is a foreign country, they do things differently there."

Emigrating to the present is not always an advance.

The book is a good read too ...

Eddie
John-Silver

Joined: 30 Jul 2013
Posts: 1035
Location: Aerospace Valley

Posted: Mon Jun 25, 2018 6:26 am    Post subject:

Can I cast some doubt on the timings for gfortran given at the bottom of mecej4's 22 Jun comment at the top of this page (p. 2 of the thread)?

I note that for N=10, FTN95 takes about 70% of the time of gfortran. Faster!

Now, between N=10 and N=50, gfortran achieves the impossible: it REDUCES the time by a factor of about 40, from 0.006306 to 0.000151.

How can this be so?
_________________
"Computers are incredibly rigid. They question nothing. Especially input data. Human beings are incredibly trusting of computers and don't check input data. Together they are capable of cocking up even the simplest calculation ..."
PaulLaidler

Joined: 21 Feb 2005
Posts: 5759
Location: Salford, UK

Posted: Mon Jun 25, 2018 6:53 am    Post subject:

Yes, there is certainly a problem in FTN95 when calling SPREAD. It works correctly when used within MATMUL, but mecej4's program demonstrates a bug that has only just come to light after more than 20 years. I have noted that this needs fixing.
mecej4

Joined: 31 Oct 2006
Posts: 1072

Posted: Mon Jun 25, 2018 9:09 am    Post subject: Re:

 John-Silver wrote: Can I cast some doubt on the timings for gfortran given at the bottom of mecej4's 22 Jun comment at the top of this page (p. 2 of the thread)? I note that for N=10, FTN95 takes about 70% of the time of gfortran. Faster! Now, between N=10 and N=50, gfortran achieves the impossible: it REDUCES the time by a factor of about 40, from 0.006306 to 0.000151. How can this be so?

I did notice the inconsistent timings, but investigating them struck me as something that would be definitely off-topic here. Were there a forum for Gfortran, other than Usenet:comp.lang.fortran, this sub-problem could go there.
JohnCampbell

Joined: 16 Feb 2006
Posts: 2053
Location: Sydney

Posted: Mon Jun 25, 2018 10:32 am    Post subject:

I think the fact that it has taken 20 years to identify the problem with SPREAD indicates how little this function is used.

I would prefer to write the code using a DO loop approach, rather than have to check the documentation for SPREAD, probably each time I reviewed the code.

It is good to identify the problem in case it has a more general effect (an assumption about which functions are elemental?). I am not suggesting it is a high-priority fix, although we all have different priorities.
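
For reference, the DO loop approach to the outer product might look like the following minimal sketch. The subroutine name outer_product_loops and the small driver program are hypothetical and are not part of the test programs posted in this thread.

```fortran
program outer_loops_demo
  ! Hypothetical driver, not from the thread: small sizes for illustration.
  implicit none
  integer, parameter :: m = 3, n = 2
  double precision :: a(m), b(n), c(m,n)
  integer :: i, j

  a = (/ 1.0d0, 2.0d0, 3.0d0 /)
  b = (/ 10.0d0, 20.0d0 /)

  call outer_product_loops (a, b, c, m, n)

  ! Print the result row by row.
  do i = 1, m
     write (*,'(2F8.1)') (c(i,j), j = 1, n)
  end do
end program outer_loops_demo

subroutine outer_product_loops (a, b, c, m, n)
  ! Hypothetical helper: computes c(i,j) = a(i)*b(j) with explicit loops.
  implicit none
  integer, intent(in) :: m, n
  double precision, intent(in)  :: a(m), b(n)
  double precision, intent(out) :: c(m,n)
  integer :: i, j

  ! Fortran arrays are stored column-major, so the inner loop runs
  ! over the first index to access memory contiguously.
  do j = 1, n
     do i = 1, m
        c(i,j) = a(i) * b(j)
     end do
  end do
end subroutine outer_product_loops
```

The explicit-shape dummy arguments keep the sketch usable without a module or an interface block.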
PaulLaidler

Joined: 21 Feb 2005
Posts: 5759
Location: Salford, UK

Posted: Mon Jun 25, 2018 12:22 pm    Post subject:

This bug has now been fixed for the next release of FTN95. It turns out that FTN95 already works correctly for fixed size arrays but not (as here) with ALLOCATE. With this fix elapsed times are no longer significant.
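
Readers who want to compare the two cases could try a fixed-size counterpart of mecej4's v2m program, along the lines of the sketch below. The program name v2m_fixed is an assumption, and no timings were measured for this variant in the thread.

```fortran
program v2m_fixed
  ! Sketch only: a fixed-size counterpart of v2m (name is hypothetical).
  ! No timings for this variant were reported in the thread.
  implicit none
  integer, parameter :: n = 200
  integer :: i
  real :: v(n), A(n,n)
  real :: t1, t2

  ! v(i) = i, as in the original v2m program.
  v = (/ (i, i=1,n) /)

  call cpu_time(t1)
  ! Same SPREAD reference as in v2m, but the arrays are not ALLOCATABLE.
  A = spread(v, DIM=2, NCOPIES=n)
  call cpu_time(t2)

  write(*,'(I4,2x,F6.2,1x,A)') n, t2-t1, 's'
  ! Use the result so that an optimiser cannot discard the assignment.
  write(*,*) 'A(n,n) =', A(n,n)
end program v2m_fixed
```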
PaulLaidler

Joined: 21 Feb 2005
Posts: 5759
Location: Salford, UK

Posted: Sat Jun 30, 2018 2:32 pm    Post subject:

Please see http://forums.silverfrost.com/viewtopic.php?t=3827
mecej4

Joined: 31 Oct 2006
Posts: 1072

Posted: Sun Jul 01, 2018 8:20 am    Post subject:

The updated FTN95 8.3 EXE and DLLs in the Dropbox download fix the SPREAD problem that was exhibited with my test program v2m (see above). Thanks to Silverfrost and Paul for the fast response and effective remedy.

I hope that Klaus L. and John Campbell will be able to confirm that the same fixes also work for their test programs as well as Klaus's real application in which matrix outer products are computed.

Last edited by mecej4 on Tue Jul 03, 2018 1:11 pm; edited 1 time in total
KL

Joined: 16 Nov 2009
Posts: 141

Posted: Tue Jul 03, 2018 9:19 am    Post subject:

I have copied the files of the beta version into their corresponding directories, but I see no improvement, neither with Visual Studio 2015 nor with Plato. I will look further for a potential error on my side, but the updated files are in their correct places.

Could someone who stated that the problem has been fixed just run my little program? That would help me a lot.

Many thanks,
Klaus
 All times are GMT + 1 Hour
 Page 2 of 3