forums.silverfrost.com Forum Index -> Support

Calculating the outer vector product
mecej4



Joined: 31 Oct 2006
Posts: 976

Posted: Fri Jun 22, 2018 12:23 pm    Post subject: Poor performance of SPREAD function

Here is a simplified version of John Campbell's test program, with only the SPREAD based version of the outer product computation being included. I hope that this version succeeds in convincing readers that the FTN95 implementation of SPREAD needs improvement.
Code:

      MODULE DataTypes
         INTEGER  , PARAMETER :: I4B    = SELECTED_INT_KIND (9)
         INTEGER  , PARAMETER :: I2B    = SELECTED_INT_KIND (4)
         INTEGER  , PARAMETER :: I1B    = SELECTED_INT_KIND (2)
         INTEGER  , PARAMETER :: SP     = KIND (1.0)
         INTEGER  , PARAMETER :: dp     = KIND (1.0D0)
         INTEGER  , PARAMETER :: LGT    = KIND (.true.)
       END MODULE DataTypes

  program test

       call Test_OuterProduct (  10, 10)
       call Test_OuterProduct (  50, 50)
       call Test_OuterProduct ( 100,100)
       call Test_OuterProduct ( 200,200)

  end program

     subroutine Test_OuterProduct (m,n)

       Use DataTypes
       Implicit None

       Integer :: m,n

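       Real (dp), allocatable :: a(:), b(:), c(:,:)
       Integer :: nseed
       Integer, allocatable :: seed(:)
       Real (dp) :: time1, time2

       allocate ( a(m), b(n), c(m,n) )
!
!      The first positional argument of RANDOM_SEED is SIZE (an output),
!      so query the seed length and supply the seed values through PUT.
       call random_seed (size=nseed)
       allocate ( seed(nseed) )
       seed = 123
       call random_seed (put=seed)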
       call random_number (a)
       call random_number (b)

       Call elapse_Time ( time1 )

       c = spread ( a,dim=2,ncopies=n ) * spread ( b,dim=1,ncopies=m )

       Call elapse_Time ( time2 )

       write (*,11) n,time2-time1
    11 format ( I4,f12.6)
     End subroutine Test_OuterProduct

     Subroutine elapse_time ( sec )
       real*8 sec
       integer*8 tick, rate
       integer*4 :: kk = 0

       call system_clock ( tick, rate )
       if (kk == 0 ) then
         write (*,'(A,T15,I12,//,A,/)') 'System_clock rate =',rate,'   n    time (s)'
         kk = rate
       end if
       sec = dble(tick)/dble(rate)
     end subroutine elapse_time

Here are the outputs from FTN95 8.30 with /opt /64:
Code:
System_clock rate =  10000

   n    time (s)

  10    0.004300
  50    0.091000
 100    1.417600
 200   25.589800

and Gfortran 7.2, 64 bit:
Code:
System_clock r     3579545

   n    time (s)

  10    0.006306
  50    0.000151
 100    0.000203
 200    0.001016

The results were from running the program on a desktop PC with an AMD Athlon 64-4200+, running Windows 10-64.

For n = 200, the run time of the EXE generated by FTN95 is ~25,000 times that of the EXE generated by Gfortran.
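
For readers who have not used SPREAD, here is a minimal sketch of what the timed statement computes. It checks the SPREAD form against explicit DO loops for c(i,j) = a(i)*b(j). The program name, the array sizes and the tolerance are illustrative choices, not part of the test program above.

```fortran
program outer_check
   ! Illustrative sketch: SPREAD-based outer product versus explicit loops.
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: m = 4, n = 3        ! small sizes, chosen for illustration
   real(dp) :: a(m), b(n), c_spread(m,n), c_loop(m,n)
   integer :: i, j

   call random_number(a)
   call random_number(b)

   ! SPREAD form: a is copied across n columns, b across m rows,
   ! and the two (m,n) arrays are multiplied element by element.
   c_spread = spread(a, dim=2, ncopies=n) * spread(b, dim=1, ncopies=m)

   ! Reference form: c(i,j) = a(i)*b(j), inner loop over the first index.
   do j = 1, n
      do i = 1, m
         c_loop(i,j) = a(i) * b(j)
      end do
   end do

   if (maxval(abs(c_spread - c_loop)) <= epsilon(1.0_dp)) then
      print *, 'outer products agree'
   else
      print *, 'outer products differ'
   end if
end program outer_check
```

The loop form touches each of the m*n elements once, which is the cost one would expect the SPREAD form to have as well.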


Last edited by mecej4 on Fri Jun 22, 2018 3:17 pm; edited 1 time in total
JohnCampbell



Joined: 16 Feb 2006
Posts: 1979
Location: Sydney

Posted: Fri Jun 22, 2018 12:42 pm

Paul,

Your suggestion that FTN95 may be treating SPREAD as "elemental" is certainly consistent with the timing performance. I hope it can be fixed.

Imagine if Dot_Product had the same problem; imagine the improvement we could get.

John
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 5496
Location: Salford, UK

Posted: Fri Jun 22, 2018 7:05 pm

I may be wrong but, as far as I know, the operator '*' is not overloaded to mean matrix multiplication in this context. If matrix multiplication is intended then FTN95 requires a call to MATMUL, which then results in two calls to SPREAD. Otherwise FTN95 generates a vast number of calls to SPREAD, presumably repeating the whole calculation for each element in turn.

There is no implied problem with SPREAD but FTN95 is not doing what the programmer intended.
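
To make the distinction between '*' and MATMUL concrete, here is a minimal sketch that forms the same matrix both ways. The program name, sizes and values are chosen only for illustration. Under the standard's array semantics the elementwise product of the two SPREAD results has elements a(i)*b(j), so both columns of output should match.

```fortran
program star_vs_matmul
   ! Illustrative sketch: elementwise '*' on SPREAD results versus MATMUL.
   implicit none
   integer, parameter :: m = 3, n = 2
   real :: a(m), b(n), c_star(m,n), c_mat(m,n)
   integer :: i

   a = (/ 1.0, 2.0, 3.0 /)
   b = (/ 10.0, 20.0 /)

   ! '*' multiplies two conformable (m,n) arrays element by element.
   c_star = spread(a, dim=2, ncopies=n) * spread(b, dim=1, ncopies=m)

   ! MATMUL of an (m,1) column with a (1,n) row is the matrix form
   ! of the same outer product.
   c_mat = matmul(reshape(a, (/ m, 1 /)), reshape(b, (/ 1, n /)))

   ! Print the two results side by side, one matrix row per line.
   do i = 1, m
      print '(2F8.1,4X,2F8.1)', c_star(i,:), c_mat(i,:)
   end do
   print *, 'identical: ', all(c_star == c_mat)
end program star_vs_matmul
```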
mecej4



Joined: 31 Oct 2006
Posts: 976

Posted: Sat Jun 23, 2018 1:28 am

The Fortran 95 standard documents the properties of the arguments and result value of intrinsic procedures in section 13.14. For each such procedure, the second attribute listed is "Class". For functions such as ABS, SQRT, etc., the listed class is "Elemental Function". For SPREAD, MATMUL, etc., the listed class is "Transformational Function".

From this I conclude that SPREAD is not an elemental function and I think that the implementation of SPREAD in FTN95 should not be forced to be elemental, given the severe performance penalty that has been documented in this thread.

Unless use cases can be presented for which the "Elemental" attribute is beneficial, that attribute should, perhaps, not be added by a particular compiler vendor.
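
The difference between the two classes can be seen in a few lines. This is a small sketch with an illustrative program name and values: an elemental function returns a result with the shape of its argument, whereas SPREAD consumes the whole array and returns a result of higher rank.

```fortran
program class_demo
   ! Illustrative sketch: elemental (SQRT) versus transformational (SPREAD).
   implicit none
   real :: v(3)

   v = (/ 1.0, 4.0, 9.0 /)

   ! Elemental: SQRT acts on each element independently,
   ! so the result has the same shape as v (rank 1, extent 3).
   print *, 'shape(sqrt(v))             =', shape(sqrt(v))
   print *, 'sqrt(v)                    =', sqrt(v)

   ! Transformational: SPREAD takes the array as a whole and adds a
   ! dimension; DIM selects where the new dimension is placed.
   print *, 'shape(spread(v,dim=2,n=4)) =', shape(spread(v, dim=2, ncopies=4))
   print *, 'shape(spread(v,dim=1,n=4)) =', shape(spread(v, dim=1, ncopies=4))
end program class_demo
```

The two SPREAD calls give shapes (3,4) and (4,3) respectively, so a single call already produces the complete result; there is nothing to gain from invoking it once per element.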
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 5496
Location: Salford, UK

Posted: Sat Jun 23, 2018 7:19 am

I understand that SPREAD is not elemental and that FTN95 is not doing what the programmer intended. At the moment I am not clear whether FTN95 can do anything to correct this. It may not even be able to provide an error or warning.

I will aim to look at it when I get a moment.
mecej4



Joined: 31 Oct 2006
Posts: 976

Posted: Sun Jun 24, 2018 12:16 pm    Post subject: Re:

PaulLaidler wrote:
I understand that SPREAD is not elemental and that FTN95 is not doing what the programmer intended. At the moment I am not clear whether FTN95 can do anything to correct this. It may not even be able to provide an error or warning.

Paul, I hope that this problem can be addressed because, otherwise, I regret to say, SPREAD is unusable in FTN95. To see this, please consider the following test program, which contains nothing more than the most basic use of SPREAD -- to construct a square matrix each of whose columns contains a copy of a given vector.
Code:
program v2m
! construct matrix A(n,n) with elements A(i,j) = i for i = 1:n, j=1:n
implicit none
integer :: i,k,n
real, allocatable :: v(:),A(:,:)
real :: t1,t2
!
n=50
do k=1,3
   allocate(v(n),A(n,n))
   v = (/ (i, i=1,n) /)
   call cpu_time(t1)
   A = spread(v, DIM=2, NCOPIES = n)
   call cpu_time(t2)
   deallocate(v,A)
   write(*,'(I4,2x,F6.2,1x,A)')n,t2-t1,'s'
   n=n*2
end do
end program

For n = 200, this code took 63 seconds with 32-bit FTN95 and 17 seconds with 64-bit FTN95, both compiled with /opt specified.
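
Until a fix is available, one possible workaround is to fill the columns with array-section assignments. The sketch below is only an illustration (the program name, the second matrix B and the size n = 50 are my choices); it builds the matrix both ways and checks that the results are identical.

```fortran
program v2m_loop
   ! Illustrative sketch: SPREAD versus column-by-column assignment.
   implicit none
   integer :: i, j, n
   real, allocatable :: v(:), A(:,:), B(:,:)

   n = 50
   allocate(v(n), A(n,n), B(n,n))
   v = (/ (real(i), i=1,n) /)

   ! SPREAD form: every column of A is a copy of v.
   A = spread(v, dim=2, ncopies=n)

   ! Equivalent loop form: assign v to each column in turn.
   do j = 1, n
      B(:,j) = v
   end do

   print *, 'identical: ', all(A == B)
   deallocate(v, A, B)
end program v2m_loop
```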
LitusSaxonicum



Joined: 23 Aug 2005
Posts: 1844
Location: Yateley, Hants, UK

Posted: Sun Jun 24, 2018 3:19 pm

I never even heard of the SPREAD function before reading this thread, so I have learnt something. As it suffers the same(-ish) fault in the 32-bit compiler version, I wonder just how widely it is used.

While I am an enthusiast for things working as they should (and in this context, standard-conforming must also include working efficiently as well as properly), in practical terms it is a matter of supreme indifference to me, as someone who is rather old-fashioned in programming, whether it works or not. I suspect that I am not entirely alone in this view.

I wondered what people did before such functions were part of Fortran.

I therefore had a go at writing a routine to do pretty much what mecej4 described with his code example. Here it is:

Code:
      PROGRAM V2M_GERIATRIC
      DIMENSION A(500,500)
      DO 30 K = 50, 500, 50
      CALL CPU_TIME (T1)
      DO 20 J = 1,K
      DO 10 I = 1,K
      A(I,J)  = I
  10  CONTINUE
  20  CONTINUE
      CALL CPU_TIME (T2)
      WRITE(*,'(I4,F15.8,1X,A)') K, T2-T1, 's'
  30  CONTINUE
      END


Rather interestingly, the whole thing takes, as closely as I can judge, 6 seconds, which is the length of the nag screen introduced into the PE when one uses /LGO in version 8.30! (32 bit only).

I am reminded of the opening line from L. P. Hartley's 'The Go-Between': "The past is a foreign country: they do things differently there."

Emigrating to the present is not always an advance.

The book is a good read too ...

Eddie
John-Silver



Joined: 30 Jul 2013
Posts: 913
Location: Aerospace Valley

Posted: Mon Jun 25, 2018 6:26 am

Can I cast some doubt on the timings for gfortran as given at the bottom of mecej4's 22 Jun comment at the top of this page (p. 2 of the thread)?

I note that for N=10, ftn95 takes about 68% of the time of gfortran. Faster !!!!

Now, between N=10 and N=50, gfortran achieves the impossible ..... it REDUCES the time by a factor of 40, from 0.006306 to 0.000151

How can this be so ???
_________________
"Computers are incredibly rigid. They question nothing. Especially input data. Human beings are incredibly trusting of computers and don't check input data. Together they are capable of cocking up even the simplest calculation ..."
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 5496
Location: Salford, UK

Posted: Mon Jun 25, 2018 6:53 am

Yes, there is certainly a problem in FTN95 when calling SPREAD. It works correctly when used within MATMUL, but mecej4's program demonstrates a bug that has only just come to light after more than 20 years.

I have noted that this needs fixing.
mecej4



Joined: 31 Oct 2006
Posts: 976

Posted: Mon Jun 25, 2018 9:09 am    Post subject: Re:

John-Silver wrote:
Can I cast some doubt on the timings for gfortran as given at the bottom of mecej4's 22 Jun comment at the top of this page (p. 2 of the thread)?

I note that for N=10, ftn95 takes about 68% of the time of gfortran. Faster !!!!

Now, between N=10 and N=50, gfortran achieves the impossible ..... it REDUCES the time by a factor of 40, from 0.006306 to 0.000151

How can this be so ???

I did notice the inconsistent timings, but investigating them struck me as something that would be definitely off-topic here. Were there a forum for Gfortran, other than Usenet:comp.lang.fortran, this sub-problem could go there.
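
One contributor is visible in the test program itself: the first call to elapse_time samples the clock and only then writes the header lines, so the first timed interval (n = 10) includes the cost of that WRITE. Whether this accounts for the whole anomaly is not established here. The sketch below shows one way to keep such one-off output outside the timed interval; the program name and the function elapsed_seconds are hypothetical and not part of the original test program.

```fortran
program timer_warmup
   ! Illustrative sketch: keep one-off output outside the timed interval.
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: n = 10
   real(dp) :: a(n), b(n), c(n,n), t1, t2

   call random_number(a)
   call random_number(b)

   ! Write the header before the first clock reading is taken.
   write (*,'(A)') '   n    time (s)'

   t1 = elapsed_seconds()
   c = spread(a, dim=2, ncopies=n) * spread(b, dim=1, ncopies=n)
   t2 = elapsed_seconds()

   write (*,'(I4,F12.6)') n, t2 - t1

   ! Reference c so that the timed assignment is not optimised away.
   if (c(1,1) < 0.0_dp) write (*,*) 'unexpected negative product'

contains

   ! Hypothetical helper: wall-clock time in seconds from SYSTEM_CLOCK.
   function elapsed_seconds() result(sec)
      real(dp) :: sec
      integer(kind=selected_int_kind(18)) :: tick, rate
      call system_clock(tick, rate)
      sec = real(tick, dp) / real(rate, dp)
   end function elapsed_seconds

end program timer_warmup
```

Repeating the timed statement several times and reporting the smallest interval would further reduce the influence of start-up effects on the smallest problem size.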
JohnCampbell



Joined: 16 Feb 2006
Posts: 1979
Location: Sydney

Posted: Mon Jun 25, 2018 10:32 am

I think the fact that it has taken 20 years to identify the problem with SPREAD indicates how little this function is used. I would prefer to write the code using a DO loop approach, rather than check the documentation for SPREAD, probably each time I reviewed the code.
It is good to identify the problem in case it has a more general effect (for example, on the assumptions about which functions are elemental).
I am not suggesting it is a high priority fix, although we all have different priorities.
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 5496
Location: Salford, UK

Posted: Mon Jun 25, 2018 12:22 pm

This bug has now been fixed for the next release of FTN95. It turns out that FTN95 already works correctly for fixed size arrays but not (as here) with ALLOCATE.

With this fix elapsed times are no longer significant.
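
A minimal sketch for checking this distinction on a given compiler build: it times the same SPREAD call on fixed-size arrays and on ALLOCATE'd arrays. The program and variable names and the size n = 200 are illustrative choices.

```fortran
program fixed_vs_alloc
   ! Illustrative sketch: SPREAD on fixed-size versus ALLOCATE'd arrays.
   implicit none
   integer, parameter :: n = 200
   integer :: i
   real :: v_fixed(n), a_fixed(n,n)
   real, allocatable :: v_alloc(:), a_alloc(:,:)
   real :: t1, t2, t3

   v_fixed = (/ (real(i), i=1,n) /)
   allocate(v_alloc(n), a_alloc(n,n))
   v_alloc = v_fixed

   call cpu_time(t1)
   a_fixed = spread(v_fixed, dim=2, ncopies=n)   ! fixed-size arrays
   call cpu_time(t2)
   a_alloc = spread(v_alloc, dim=2, ncopies=n)   ! ALLOCATE'd arrays
   call cpu_time(t3)

   write (*,'(A,F8.3,A)') 'fixed size : ', t2 - t1, ' s'
   write (*,'(A,F8.3,A)') 'allocatable: ', t3 - t2, ' s'
   write (*,*) 'same result: ', all(a_fixed == a_alloc)

   deallocate(v_alloc, a_alloc)
end program fixed_vs_alloc
```

With a compiler that handles both cases correctly, the two times should be of the same order and the final line should report that the results match.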
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 5496
Location: Salford, UK

Posted: Sat Jun 30, 2018 2:32 pm

Please see...

http://forums.silverfrost.com/viewtopic.php?t=3827
mecej4



Joined: 31 Oct 2006
Posts: 976

Posted: Sun Jul 01, 2018 8:20 am

The updated FTN95 8.3 EXE and DLLs in the Dropbox download fix the SPREAD problem that was exhibited with my test program v2m (see above).

Thanks to Silverfrost and Paul for the fast response and effective remedy.

I hope that Klaus L. and John Campbell will be able to confirm that the same fixes also work for their test programs as well as Klaus's real application in which matrix outer products are computed.


Last edited by mecej4 on Tue Jul 03, 2018 1:11 pm; edited 1 time in total
KL



Joined: 16 Nov 2009
Posts: 132

Posted: Tue Jul 03, 2018 9:19 am

I have copied the files of the beta version into their corresponding directories, but I see no improvement, neither with Visual Studio 2015 nor with Plato. I will look further for a potential error on my side, but the updated files are in their correct place. Could someone who stated that the problem has been fixed just run my little program? That would help me a lot.

Many thanks, Klaus
All times are GMT + 1 Hour
Page 2 of 3