forums.silverfrost.com

KL · Joined: 16 Nov 2009 Posts: 155

For the outer vector product an intrinsic procedure is not available. It can be calculated by two simple do loops, the spread function or may be by other techniques. The "spread solution" is recommended in several references (for instance in Numerical Recipes). In the following test program both methods are tested:

PaulLaidler · Posted: Wed Jun 20, 2018 9:09 am Post subject:

The key is that for FTN95 you need an explicit call to MATMUL. Presumably gFortran is clever enough to work out that a matrix product is intended.

My guess is that FTN95 makes a literal interpretation and ends up doing the SPREAD for each element in turn.

You will need to check the details but for FTN95 you will need something like

KL · Joined: 16 Nov 2009 Posts: 155

Thank you very much, Paul.

Unfortunately, my last sentence of my mail was cut-off:

"Of course, one could question the method using the spread procedure as being too complicated for the offered advantages (parallel computation possible). Anyhow, I report this for eventual checking."

I will have a closer look to your proposal and give feedback. But very likely I will not use the "spread solution". My problem was that this was included in another solution and it took some time to find out the source of the problem.

Klaus

KL · Joined: 16 Nov 2009 Posts: 155

Thank you very much, Paul.

Unfortunately, my last sentence of my mail was cut-off:

"Of course, one could question the method using the spread procedure as being too complicated for the offered advantages (parallel computation possible). Anyhow, I report this for eventual checking."

I will have a closer look to your proposal and give feedback. But very likely I will not use the "spread solution". My problem was that this was included in another solution and it took some time to find out the source of the problem.

Klaus

JohnCampbell · Joined: 16 Feb 2006 Posts: 2627 Location: Sydney

Klaus,

I would replace CPU_Time with Elapse_Time, which calls System_Clock or RDTSC_VAL@(). You won't get a meaningful result with CPU_Time for your first test.

I am trying m=n=1000, which certainly shows FTN95 implementation of SPREAD does not work well. ( Still hasn't finished ! I was expecting a stack overflow error )

Alternatives could be:
forall (i=1:m, j=1:n) c(i,j) = a(i)*b(j)
or
Do j = 1, n
c(:,j) = a(:) * b(j)
End Do

You want "I" to be the inner loop, which my second option guarantees. With FORALL, the order is uncertain.

Given the uncertainty of the stack requirement for SPREAD, how could this be a recommended approach ?

John

PaulLaidler · Posted: Thu Jun 21, 2018 7:03 am Post subject:

Please note that there is no implied problem with SPREAD. In this context, FTN95 does require an explicit call to MATMUL and this seems right to me.

JohnCampbell · Joined: 16 Feb 2006 Posts: 2627 Location: Sydney

KL · Joined: 16 Nov 2009 Posts: 155

Dear Paul, dear John,

thank you for your remarks. I am aware of the deficiency of my time evaluation. I just wanted to demonstrate the difference of computational costs between both methods: 2 (or even more) orders of magnitude!

The problem arises when using standard programs from Numerical Recipes. A quick look reveals that 13 procedures make use of the utility program outerprod, which uses the “spread solution”. Amongst these programs are important solvers such as gaussj, ludcmp and svdcmp. Not modifying the procedure outerprod ends in a disaster for larger systems when using ftn95. Whether one uses do loops, forall loops in various variations instead does not really matter. How would your proposal, Paul, look in detail taking also different m and n into account? Or is your position that the “spread solution” is an unnecessary, excessive use of Fortran rules?

I would like to add that any discussion what formal requirements intrinsic Fortran procedures must fulfill by far exceeds my knowledge.
Klaus

PaulLaidler · Posted: Thu Jun 21, 2018 10:46 am Post subject:

John

My point is that the program supplied by Klaus does not do what he intends in the calls to SPREAD.

The form

c = SPREAD*SPREAD

does not do a MATMUL on the right and then an element-by-element copy to the left.

A MATMUL is required to get what is intended. I am only guessing but this seems to me to be the mostly likely explanation.

mecej4 · Joined: 31 Oct 2006 Posts: 1933

The computation of the outer product of two m X n matrices involves m.n multiplications plus 3 m.n load/store operations.

There is no need to do matrix multiplication in this. Firstly, the two input matrices A and B cannot be multiplied unless m = n. Secondly, the number of arithmetic operations needed for doing matrix multiplication is an order of magnitude higher than for forming the outer product. For these reasons, I do not see why MATMUL made its way into the discussion.

I find that with Gfortran the calculation with the "straightforward method" can be made about 10 times faster if the nested DO loops are replaced by

KL · Joined: 16 Nov 2009 Posts: 155

The "spread solution" is not my idea. It has been taken from Numerical Recipes in Fortran 90, Second edition, where it is discussed in detail (Chapter 22 and 23). Klaus

JohnCampbell · Joined: 16 Feb 2006 Posts: 2627 Location: Sydney

The link I provided demonstrates a number of loop alternatives, including array syntax instead of the inner loop, which was the fastest option with FTN95.
As Klaus reported, it also shows that FTN95 implementation of SPREAD is extremely slow. This needs to be reviewed. I used both win32 and /64 in tests.

PaulLaidler · Posted: Thu Jun 21, 2018 12:44 pm Post subject:

John

As I understand it, the initial program on this thread does not demonstrate that the FTN95 SPREAD function is slow. If you can generate a direct and simple use of SPREAD that shows its slowness then please let me know.

The program as shown calls SPREAD 20000 times whilst the alternative with an explicit call to MATMUL calls SPREAD twice.

JohnCampbell · Joined: 16 Feb 2006 Posts: 2627 Location: Sydney

Paul,

As I posted yesterday: See the linked test program that shows SPREAD is very slow.

https://www.dropbox.com/s/4r5kzn5onefvkka/kl2.f90?dl=0

Download it and run this test. I put write statements after each use of SPREAD.
Can't you see very clearly how slow SPREAD is for Klaus's modified example ?

Where is the 20,000 times ? It takes ages to do 1 iteration of the following loop. ( 2.6 seconds for a single SPREAD ! )

PaulLaidler · Posted: Fri Jun 22, 2018 7:11 am Post subject:

John

The way that I tested this was to put a print statement within the code for the SPREAD function. I have done the same for your test program with similar results and this was after changing line 148 to reduce the number of passes in this DO loop to one.

There may be a problem within FTN95 in that it may be treating SPREAD as "elemental" when it is not. Either way, I have no evidence at the moment that there is a problem with the code for SPREAD.