forums.silverfrost.com Forum Index forums.silverfrost.com
Welcome to the Silverfrost forums
 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

Calculating the outer vector product
Goto page 1, 2, 3  Next
 
Post new topic   Reply to topic    forums.silverfrost.com Forum Index -> Support
View previous topic :: View next topic  
Author Message
KL



Joined: 16 Nov 2009
Posts: 144

PostPosted: Wed Jun 20, 2018 7:38 am    Post subject: Calculating the outer vector product Reply with quote

For the outer vector product an intrinsic procedure is not available. It can be calculated by two simple do loops, the spread function or may be by other techniques. The "spread solution" is recommended in several references (for instance in Numerical Recipes). In the following test program both methods are tested:
Code:

     MODULE DataTypes
        INTEGER  , PARAMETER :: I4B    = SELECTED_INT_KIND (9)
        INTEGER  , PARAMETER :: I2B    = SELECTED_INT_KIND (4)
        INTEGER  , PARAMETER :: I1B    = SELECTED_INT_KIND (2)
        INTEGER  , PARAMETER :: SP     = KIND (1.0)
        INTEGER  , PARAMETER :: dp     = KIND (1.0D0)
        INTEGER  , PARAMETER :: LGT    = KIND (.true.)
      END MODULE DataTypes



Code:


    Program Test_OuterProduct

      Use DataTypes
      Implicit None

      Integer , Parameter :: m = 100, n = 100
      Integer             :: i, j
      Integer             :: seed = 123

      Real (dp) , Dimension (m)      :: a
      Real (dp) , Dimension (n)      :: b
      Real (dp) , Dimension (m,n)    :: c

      Real (dp)                      :: harvest
      Real (dp)                      :: time1, time2

!     ---------------------------------------------------------------------------------------------
!                                         Defining a and b     
     
      call random_seed (seed)

      Do i=1,m
          call random_number (harvest)
          a (i) = harvest
      End Do

      Do i=1,n
          call random_number (harvest)
          b (i) = harvest
      End Do

!     ---------------------------------------------------------------------------------------------
!                      Straightforward method for outer product by 2 do loops

      Call Cpu_Time ( time1 )

      Do i = 1, m
        Do j = 1, n
         c(i,j) = a(i) * b(j)
        End Do
      End Do

      Call Cpu_Time ( time2 )

      write (*,*) c(1,1), c(m,n)

      write (*,*)
      write (*,*) 'Processing time for straightforward method was', time2-time1, ' seconds'
      write (*,*)
!     ---------------------------------------------------------------------------------------------
!                          "Recommended" method via the spread procedure
     
      Call Cpu_Time ( time1 )

      c (1:m,1:n) = spread ( a(1:m),dim=2,ncopies=size(b))*spread(b(1:n),dim=1,ncopies=size(a) )

      Call Cpu_Time ( time2 )

      write (*,*) c(1,1), c(m,n)

      write (*,*)
      write (*,*) 'Processing time employing spread           was', time2-time1, ' seconds'
      write (*,*)

      read  (*,*)

    End Program Test_OuterProduct




With vectors a(m) and b(n) of size 1000 (both), the outer vector product can be calculated by the simple do loops in very short time (app. 15 ms). Using the spread intrinsic procedure of the FTN95 compiler results in extensively increasing computer times. For vector sizes much below 1000, FTN95 needs the following running times:

For m=n=100 approximately 5 s, for 150 app. 24 s and for 200 already 76(!) s.

The results of both methods are identical.

I would like to add that this is not the case when running this test case with Code:Blocks and the GNU Fortran compiler. With this compiler, both methods calculate the vector product of size 1000 in app. 15 ms[/b:969912
Back to top
View user's profile Send private message
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 7916
Location: Salford, UK

PostPosted: Wed Jun 20, 2018 9:09 am    Post subject: Reply with quote

The key is that for FTN95 you need an explicit call to MATMUL. Presumably gFortran is clever enough to work out that a matrix product is intended.

My guess is that FTN95 makes a literal interpretation and ends up doing the SPREAD for each element in turn.

You will need to check the details but for FTN95 you will need something like

Code:
c = matmul(spread(a,dim=2,ncopies=n),spread(b,dim=1,ncopies=m))/n


(Do we divide by n or m?)
Back to top
View user's profile Send private message AIM Address
KL



Joined: 16 Nov 2009
Posts: 144

PostPosted: Wed Jun 20, 2018 9:44 am    Post subject: Reply with quote

Thank you very much, Paul.

Unfortunately, my last sentence of my mail was cut-off:

"Of course, one could question the method using the spread procedure as being too complicated for the offered advantages (parallel computation possible). Anyhow, I report this for eventual checking."

I will have a closer look to your proposal and give feedback. But very likely I will not use the "spread solution". My problem was that this was included in another solution and it took some time to find out the source of the problem.


Klaus
Back to top
View user's profile Send private message
KL



Joined: 16 Nov 2009
Posts: 144

PostPosted: Wed Jun 20, 2018 9:45 am    Post subject: Reply with quote

Thank you very much, Paul.

Unfortunately, my last sentence of my mail was cut-off:

"Of course, one could question the method using the spread procedure as being too complicated for the offered advantages (parallel computation possible). Anyhow, I report this for eventual checking."

I will have a closer look to your proposal and give feedback. But very likely I will not use the "spread solution". My problem was that this was included in another solution and it took some time to find out the source of the problem.


Klaus
Back to top
View user's profile Send private message
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

PostPosted: Thu Jun 21, 2018 6:34 am    Post subject: Reply with quote

Klaus,

I would replace CPU_Time with Elapse_Time, which calls System_Clock or RDTSC_VAL@(). You won't get a meaningful result with CPU_Time for your first test.

I am trying m=n=1000, which certainly shows FTN95 implementation of SPREAD does not work well. ( Still hasn't finished ! I was expecting a stack overflow error )

Alternatives could be:
forall (i=1:m, j=1:n) c(i,j) = a(i)*b(j)
or
Do j = 1, n
c(:,j) = a(:) * b(j)
End Do

You want "I" to be the inner loop, which my second option guarantees. With FORALL, the order is uncertain.

Given the uncertainty of the stack requirement for SPREAD, how could this be a recommended approach ?

John
Back to top
View user's profile Send private message
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 7916
Location: Salford, UK

PostPosted: Thu Jun 21, 2018 7:03 am    Post subject: Reply with quote

Please note that there is no implied problem with SPREAD. In this context, FTN95 does require an explicit call to MATMUL and this seems right to me.
Back to top
View user's profile Send private message AIM Address
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

PostPosted: Thu Jun 21, 2018 7:57 am    Post subject: Re: Reply with quote

PaulLaidler wrote:
FTN95 does require an explicit call to MATMUL and this seems right to me.
Paul, wouldn't this be a different calculation ?

oops ! The following adaptation still has a delay:
Code:
       Call elapse_Time ( time1 )

!       c (1:m,1:n) = spread ( a(1:m),dim=2,ncopies=size(b) )   &
!                   * spread ( b(1:n),dim=1,ncopies=size(a) )
       allocate ( aa(m,n) )
       allocate ( bb(m,n) )
       aa = spread ( a(1:m),dim=2,ncopies=size(b) )
       bb = spread ( b(1:n),dim=1,ncopies=size(a) )
       c = aa * bb

       Call elapse_Time ( time2 )

       write (*,*) c(1,1), c(m,n)

       write (*,*)
       write (*,11) time2-time1, ' seconds : Processing time employing spread method : ',n
       write (*,*)


See the linked test program that shows SPREAD is very slow.

https://www.dropbox.com/s/4r5kzn5onefvkka/kl2.f90?dl=0
Back to top
View user's profile Send private message
KL



Joined: 16 Nov 2009
Posts: 144

PostPosted: Thu Jun 21, 2018 10:23 am    Post subject: Reply with quote

Dear Paul, dear John,

thank you for your remarks. I am aware of the deficiency of my time evaluation. I just wanted to demonstrate the difference of computational costs between both methods: 2 (or even more) orders of magnitude!

The problem arises when using standard programs from Numerical Recipes. A quick look reveals that 13 procedures make use of the utility program outerprod, which uses the “spread solution”. Amongst these programs are important solvers such as gaussj, ludcmp and svdcmp. Not modifying the procedure outerprod ends in a disaster for larger systems when using ftn95. Whether one uses do loops, forall loops in various variations instead does not really matter. How would your proposal, Paul, look in detail taking also different m and n into account? Or is your position that the “spread solution” is an unnecessary, excessive use of Fortran rules?

I would like to add that any discussion what formal requirements intrinsic Fortran procedures must fulfill by far exceeds my knowledge.
Klaus
Back to top
View user's profile Send private message
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 7916
Location: Salford, UK

PostPosted: Thu Jun 21, 2018 10:46 am    Post subject: Reply with quote

John

My point is that the program supplied by Klaus does not do what he intends in the calls to SPREAD.

The form

c = SPREAD*SPREAD

does not do a MATMUL on the right and then an element-by-element copy to the left.

A MATMUL is required to get what is intended. I am only guessing but this seems to me to be the mostly likely explanation.
Back to top
View user's profile Send private message AIM Address
mecej4



Joined: 31 Oct 2006
Posts: 1885

PostPosted: Thu Jun 21, 2018 11:48 am    Post subject: Reply with quote

The computation of the outer product of two m X n matrices involves m.n multiplications plus 3 m.n load/store operations.

There is no need to do matrix multiplication in this. Firstly, the two input matrices A and B cannot be multiplied unless m = n. Secondly, the number of arithmetic operations needed for doing matrix multiplication is an order of magnitude higher than for forming the outer product. For these reasons, I do not see why MATMUL made its way into the discussion.

I find that with Gfortran the calculation with the "straightforward method" can be made about 10 times faster if the nested DO loops are replaced by
Code:
      Do j = 1, n
         c(1:m,j) = a(1:m) * b(j)
      End Do

Perhaps this modification (i.e., using vector operations) could be tried with FTN95 as well.
Back to top
View user's profile Send private message
KL



Joined: 16 Nov 2009
Posts: 144

PostPosted: Thu Jun 21, 2018 12:01 pm    Post subject: Reply with quote

The "spread solution" is not my idea. It has been taken from Numerical Recipes in Fortran 90, Second edition, where it is discussed in detail (Chapter 22 and 23). Klaus
Back to top
View user's profile Send private message
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

PostPosted: Thu Jun 21, 2018 12:05 pm    Post subject: Reply with quote

The link I provided demonstrates a number of loop alternatives, including array syntax instead of the inner loop, which was the fastest option with FTN95.
As Klaus reported, it also shows that FTN95 implementation of SPREAD is extremely slow. This needs to be reviewed. I used both win32 and /64 in tests.
Back to top
View user's profile Send private message
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 7916
Location: Salford, UK

PostPosted: Thu Jun 21, 2018 12:44 pm    Post subject: Reply with quote

John

As I understand it, the initial program on this thread does not demonstrate that the FTN95 SPREAD function is slow. If you can generate a direct and simple use of SPREAD that shows its slowness then please let me know.

The program as shown calls SPREAD 20000 times whilst the alternative with an explicit call to MATMUL calls SPREAD twice.
Back to top
View user's profile Send private message AIM Address
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

PostPosted: Fri Jun 22, 2018 2:09 am    Post subject: Reply with quote

Paul,

As I posted yesterday: See the linked test program that shows SPREAD is very slow.

https://www.dropbox.com/s/4r5kzn5onefvkka/kl2.f90?dl=0

Download it and run this test. I put write statements after each use of SPREAD.
Can't you see very clearly how slow SPREAD is for Klaus's modified example ?

Where is the 20,000 times ? It takes ages to do 1 iteration of the following loop. ( 2.6 seconds for a single SPREAD ! )
Code:
       allocate ( aa(m,n) )
       allocate ( bb(m,n) )
       Call elapse_Time ( time1 )

!       c (1:m,1:n) = spread ( a(1:m),dim=2,ncopies=size(b) )   &
!                   * spread ( b(1:n),dim=1,ncopies=size(a) )
           x = del_sec ()
       do k = 1,pass
         aa = spread ( a(1:m),dim=2,ncopies=size(b) )
           x = del_sec ()
           write (*,*) x, ' spread aa'
         bb = spread ( b(1:n),dim=1,ncopies=size(a) )
           x = del_sec ()
           write (*,*) x, ' spread bb'
         c = aa * bb
           x = del_sec ()
           write (*,*) x, ' multiply c'
       end do

       Call elapse_Time ( time2 )

I can run this example in Plato, selecting either FTN95 or gFortran with "Release x64" (try gFortran first )

Let me know if it can not be reproduced.
Back to top
View user's profile Send private message
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 7916
Location: Salford, UK

PostPosted: Fri Jun 22, 2018 7:11 am    Post subject: Reply with quote

John

The way that I tested this was to put a print statement within the code for the SPREAD function. I have done the same for your test program with similar results and this was after changing line 148 to reduce the number of passes in this DO loop to one.

There may be a problem within FTN95 in that it may be treating SPREAD as "elemental" when it is not. Either way, I have no evidence at the moment that there is a problem with the code for SPREAD.
Back to top
View user's profile Send private message AIM Address
Display posts from previous:   
Post new topic   Reply to topic    forums.silverfrost.com Forum Index -> Support All times are GMT + 1 Hour
Goto page 1, 2, 3  Next
Page 1 of 3

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group