forums.silverfrost.com
Welcome to the Silverfrost forums

 Calculating the outer vector product Goto page 1, 2, 3  Next
Author Message
KL

Joined: 16 Nov 2009
Posts: 142

Posted: Wed Jun 20, 2018 7:38 am    Post subject: Calculating the outer vector product

For the outer vector product an intrinsic procedure is not available. It can be calculated by two simple do loops, the spread function or may be by other techniques. The "spread solution" is recommended in several references (for instance in Numerical Recipes). In the following test program both methods are tested:
 Code: MODULE DataTypes         INTEGER  , PARAMETER :: I4B    = SELECTED_INT_KIND (9)         INTEGER  , PARAMETER :: I2B    = SELECTED_INT_KIND (4)         INTEGER  , PARAMETER :: I1B    = SELECTED_INT_KIND (2)         INTEGER  , PARAMETER :: SP     = KIND (1.0)         INTEGER  , PARAMETER :: dp     = KIND (1.0D0)         INTEGER  , PARAMETER :: LGT    = KIND (.true.)       END MODULE DataTypes

 Code: Program Test_OuterProduct       Use DataTypes       Implicit None       Integer , Parameter :: m = 100, n = 100       Integer             :: i, j       Integer             :: seed = 123       Real (dp) , Dimension (m)      :: a       Real (dp) , Dimension (n)      :: b       Real (dp) , Dimension (m,n)    :: c       Real (dp)                      :: harvest       Real (dp)                      :: time1, time2 !     --------------------------------------------------------------------------------------------- !                                         Defining a and b                  call random_seed (seed)       Do i=1,m           call random_number (harvest)           a (i) = harvest       End Do       Do i=1,n           call random_number (harvest)           b (i) = harvest       End Do !     --------------------------------------------------------------------------------------------- !                      Straightforward method for outer product by 2 do loops       Call Cpu_Time ( time1 )       Do i = 1, m         Do j = 1, n          c(i,j) = a(i) * b(j)         End Do       End Do       Call Cpu_Time ( time2 )       write (*,*) c(1,1), c(m,n)       write (*,*)       write (*,*) 'Processing time for straightforward method was', time2-time1, ' seconds'       write (*,*) !     --------------------------------------------------------------------------------------------- !                          "Recommended" method via the spread procedure             Call Cpu_Time ( time1 )       c (1:m,1:n) = spread ( a(1:m),dim=2,ncopies=size(b))*spread(b(1:n),dim=1,ncopies=size(a) )       Call Cpu_Time ( time2 )       write (*,*) c(1,1), c(m,n)       write (*,*)       write (*,*) 'Processing time employing spread           was', time2-time1, ' seconds'       write (*,*)       read  (*,*)     End Program Test_OuterProduct

With vectors a(m) and b(n) of size 1000 (both), the outer vector product can be calculated by the simple do loops in very short time (app. 15 ms). Using the spread intrinsic procedure of the FTN95 compiler results in extensively increasing computer times. For vector sizes much below 1000, FTN95 needs the following running times:

For m=n=100 approximately 5 s, for 150 app. 24 s and for 200 already 76(!) s.

The results of both methods are identical.

I would like to add that this is not the case when running this test case with Code:Blocks and the GNU Fortran compiler. With this compiler, both methods calculate the vector product of size 1000 in app. 15 ms[/b:969912
PaulLaidler

Joined: 21 Feb 2005
Posts: 6325
Location: Salford, UK

Posted: Wed Jun 20, 2018 9:09 am    Post subject:

The key is that for FTN95 you need an explicit call to MATMUL. Presumably gFortran is clever enough to work out that a matrix product is intended.

My guess is that FTN95 makes a literal interpretation and ends up doing the SPREAD for each element in turn.

You will need to check the details but for FTN95 you will need something like

(Do we divide by n or m?)
KL

Joined: 16 Nov 2009
Posts: 142

 Posted: Wed Jun 20, 2018 9:44 am    Post subject: Thank you very much, Paul. Unfortunately, my last sentence of my mail was cut-off: "Of course, one could question the method using the spread procedure as being too complicated for the offered advantages (parallel computation possible). Anyhow, I report this for eventual checking." I will have a closer look to your proposal and give feedback. But very likely I will not use the "spread solution". My problem was that this was included in another solution and it took some time to find out the source of the problem. Klaus
KL

Joined: 16 Nov 2009
Posts: 142

 Posted: Wed Jun 20, 2018 9:45 am    Post subject: Thank you very much, Paul. Unfortunately, my last sentence of my mail was cut-off: "Of course, one could question the method using the spread procedure as being too complicated for the offered advantages (parallel computation possible). Anyhow, I report this for eventual checking." I will have a closer look to your proposal and give feedback. But very likely I will not use the "spread solution". My problem was that this was included in another solution and it took some time to find out the source of the problem. Klaus
JohnCampbell

Joined: 16 Feb 2006
Posts: 2172
Location: Sydney

 Posted: Thu Jun 21, 2018 6:34 am    Post subject: Klaus, I would replace CPU_Time with Elapse_Time, which calls System_Clock or RDTSC_VAL@(). You won't get a meaningful result with CPU_Time for your first test. I am trying m=n=1000, which certainly shows FTN95 implementation of SPREAD does not work well. ( Still hasn't finished ! I was expecting a stack overflow error ) Alternatives could be: forall (i=1:m, j=1:n) c(i,j) = a(i)*b(j) or Do j = 1, n c(:,j) = a(:) * b(j) End Do You want "I" to be the inner loop, which my second option guarantees. With FORALL, the order is uncertain. Given the uncertainty of the stack requirement for SPREAD, how could this be a recommended approach ? John
PaulLaidler

Joined: 21 Feb 2005
Posts: 6325
Location: Salford, UK

 Posted: Thu Jun 21, 2018 7:03 am    Post subject: Please note that there is no implied problem with SPREAD. In this context, FTN95 does require an explicit call to MATMUL and this seems right to me.
JohnCampbell

Joined: 16 Feb 2006
Posts: 2172
Location: Sydney

Posted: Thu Jun 21, 2018 7:57 am    Post subject: Re:

 PaulLaidler wrote: FTN95 does require an explicit call to MATMUL and this seems right to me.
Paul, wouldn't this be a different calculation ?

oops ! The following adaptation still has a delay:
 Code: Call elapse_Time ( time1 ) !       c (1:m,1:n) = spread ( a(1:m),dim=2,ncopies=size(b) )   & !                   * spread ( b(1:n),dim=1,ncopies=size(a) )        allocate ( aa(m,n) )        allocate ( bb(m,n) )        aa = spread ( a(1:m),dim=2,ncopies=size(b) )        bb = spread ( b(1:n),dim=1,ncopies=size(a) )        c = aa * bb        Call elapse_Time ( time2 )        write (*,*) c(1,1), c(m,n)        write (*,*)        write (*,11) time2-time1, ' seconds : Processing time employing spread method : ',n        write (*,*)

https://www.dropbox.com/s/4r5kzn5onefvkka/kl2.f90?dl=0
KL

Joined: 16 Nov 2009
Posts: 142

 Posted: Thu Jun 21, 2018 10:23 am    Post subject: Dear Paul, dear John, thank you for your remarks. I am aware of the deficiency of my time evaluation. I just wanted to demonstrate the difference of computational costs between both methods: 2 (or even more) orders of magnitude! The problem arises when using standard programs from Numerical Recipes. A quick look reveals that 13 procedures make use of the utility program outerprod, which uses the “spread solution”. Amongst these programs are important solvers such as gaussj, ludcmp and svdcmp. Not modifying the procedure outerprod ends in a disaster for larger systems when using ftn95. Whether one uses do loops, forall loops in various variations instead does not really matter. How would your proposal, Paul, look in detail taking also different m and n into account? Or is your position that the “spread solution” is an unnecessary, excessive use of Fortran rules? I would like to add that any discussion what formal requirements intrinsic Fortran procedures must fulfill by far exceeds my knowledge. Klaus
PaulLaidler

Joined: 21 Feb 2005
Posts: 6325
Location: Salford, UK

 Posted: Thu Jun 21, 2018 10:46 am    Post subject: John My point is that the program supplied by Klaus does not do what he intends in the calls to SPREAD. The form c = SPREAD*SPREAD does not do a MATMUL on the right and then an element-by-element copy to the left. A MATMUL is required to get what is intended. I am only guessing but this seems to me to be the mostly likely explanation.
mecej4

Joined: 31 Oct 2006
Posts: 1274

Posted: Thu Jun 21, 2018 11:48 am    Post subject:

The computation of the outer product of two m X n matrices involves m.n multiplications plus 3 m.n load/store operations.

There is no need to do matrix multiplication in this. Firstly, the two input matrices A and B cannot be multiplied unless m = n. Secondly, the number of arithmetic operations needed for doing matrix multiplication is an order of magnitude higher than for forming the outer product. For these reasons, I do not see why MATMUL made its way into the discussion.

I find that with Gfortran the calculation with the "straightforward method" can be made about 10 times faster if the nested DO loops are replaced by
 Code: Do j = 1, n          c(1:m,j) = a(1:m) * b(j)       End Do

Perhaps this modification (i.e., using vector operations) could be tried with FTN95 as well.
KL

Joined: 16 Nov 2009
Posts: 142

 Posted: Thu Jun 21, 2018 12:01 pm    Post subject: The "spread solution" is not my idea. It has been taken from Numerical Recipes in Fortran 90, Second edition, where it is discussed in detail (Chapter 22 and 23). Klaus
JohnCampbell

Joined: 16 Feb 2006
Posts: 2172
Location: Sydney

 Posted: Thu Jun 21, 2018 12:05 pm    Post subject: The link I provided demonstrates a number of loop alternatives, including array syntax instead of the inner loop, which was the fastest option with FTN95. As Klaus reported, it also shows that FTN95 implementation of SPREAD is extremely slow. This needs to be reviewed. I used both win32 and /64 in tests.
PaulLaidler

Joined: 21 Feb 2005
Posts: 6325
Location: Salford, UK

 Posted: Thu Jun 21, 2018 12:44 pm    Post subject: John As I understand it, the initial program on this thread does not demonstrate that the FTN95 SPREAD function is slow. If you can generate a direct and simple use of SPREAD that shows its slowness then please let me know. The program as shown calls SPREAD 20000 times whilst the alternative with an explicit call to MATMUL calls SPREAD twice.
JohnCampbell

Joined: 16 Feb 2006
Posts: 2172
Location: Sydney

Posted: Fri Jun 22, 2018 2:09 am    Post subject:

Paul,

As I posted yesterday: See the linked test program that shows SPREAD is very slow.

https://www.dropbox.com/s/4r5kzn5onefvkka/kl2.f90?dl=0

Can't you see very clearly how slow SPREAD is for Klaus's modified example ?

Where is the 20,000 times ? It takes ages to do 1 iteration of the following loop. ( 2.6 seconds for a single SPREAD ! )
 Code: allocate ( aa(m,n) )        allocate ( bb(m,n) )        Call elapse_Time ( time1 ) !       c (1:m,1:n) = spread ( a(1:m),dim=2,ncopies=size(b) )   & !                   * spread ( b(1:n),dim=1,ncopies=size(a) )            x = del_sec ()        do k = 1,pass          aa = spread ( a(1:m),dim=2,ncopies=size(b) )            x = del_sec ()            write (*,*) x, ' spread aa'          bb = spread ( b(1:n),dim=1,ncopies=size(a) )            x = del_sec ()            write (*,*) x, ' spread bb'          c = aa * bb            x = del_sec ()            write (*,*) x, ' multiply c'        end do        Call elapse_Time ( time2 )

I can run this example in Plato, selecting either FTN95 or gFortran with "Release x64" (try gFortran first )

Let me know if it can not be reproduced.
PaulLaidler

Joined: 21 Feb 2005
Posts: 6325
Location: Salford, UK

 Posted: Fri Jun 22, 2018 7:11 am    Post subject: John The way that I tested this was to put a print statement within the code for the SPREAD function. I have done the same for your test program with similar results and this was after changing line 148 to reduce the number of passes in this DO loop to one. There may be a problem within FTN95 in that it may be treating SPREAD as "elemental" when it is not. Either way, I have no evidence at the moment that there is a problem with the code for SPREAD.
 Display posts from previous: All Posts1 Day7 Days2 Weeks1 Month3 Months6 Months1 Year Oldest FirstNewest First
 All times are GMT + 1 HourGoto page 1, 2, 3  Next Page 1 of 3