Dan,
You prompted me to look at the way I call the inner loop of the equation reduction : Dot_Product.
calling REDCOL is effectively the outer loop, called 159,139 for each equation and each block
Loop : DO J = JB,JT is the next loop, passing 271,679,207 times while
the loop inside Dot_Product / VecSum is the inner loop
I compared two forms of the inner loop call, using dot_product or my interface routine and produced a 3 x run time.
287 seconds went to 953 seconds, by changing
A(A_ptr_0+J) = A(A_ptr_0+J) - VecSum (A(A_ptr_0+JEQ_bot), B(B_ptr_0+JEQ_bot), JBAND)
to
A(A_ptr_0+J) = A(A_ptr_0+J) - Dot_Product ( A(A_ptr_b:A_ptr_t), B(B_ptr_b:B_ptr_t) )
Given it takes 287 seconds to do all the real*8 multiplications, what did it do for the other 666 seconds ?
both were compiled with /opt
My F77 form of the call is:
SUBROUTINE REDCOL_77 (A, B, NB, JB, JT, IEQ_bot)
!
! Reduces vector 'A' by block 'B'
! Allows for skyline matrix to be stored in multiple blocks
! Vector A might or might not be in block B
!
REAL*8 A(*), & ! column to be reduced provided as A(IEQ_bot:)
B(*) ! skyline block of columns
INTEGER*4 NB(*), & ! block structure of B
JB, & ! first equation to reduce (is in block B)
JT, & ! last equation to reduce (is in block B and < equation A)
IEQ_bot ! first active equation in column A
!
INTEGER*4 A_ptr_0, B_ptr_0, JEQ_bot, J, JBOT, JPOINT, JBAND
REAL*8 VecSum
EXTERNAL VecSum ! Dot_Product using F77 syntax
!
IF (JB > JT) RETURN
JBOT = NB(1) ! first equation column in Block B
A_ptr_0 = 1-IEQ_bot ! virtual pointer to A(0)
!
DO J = JB,JT ! loop over range of equations
JPOINT = J - JBOT + 3 ! pointer to index for column J in block B
B_ptr_0 = NB(JPOINT) - J ! virtual pointer to B(0,j)
!
JEQ_bot = NB(JPOINT-1) - B_ptr_0 + 1 ! first active equation of column J = J - jband + 1 where jband = NB(JPOINT)-NB(JPOINT-1)
IF (IEQ_bot > JEQ_bot) JEQ_bot = IEQ_bot
JBAND = J - JEQ_bot ! active bandwidth for equation J and equation A
IF (JBAND < 1) CYCLE
!
A(A_ptr_0+J) = A(A_ptr_0+J) - VecSum (A(A_ptr_0+JEQ_bot), B(B_ptr_0+JEQ_bot), JBAND) ! 99% of all the work is done here
!
END DO
!
RETURN
END
REAL*8 FUNCTION VECSUM (A, B, N)
!
! Performs a vector dot product VECSUM = [A] . [B]
! account is NOT taken of the zero terms in the vectors
!
integer*4, intent (in) :: n
real*8, dimension(n), intent (in) :: a
real*8, dimension(n), intent (in) :: b
!
vecsum = dot_product ( a, b )
RETURN
!
END
CPU TIMES FOR EQUATION REDUCTION
Description Num Cpu I/O Avg
1 READ BLOCK A 4 1.00 0.02 0.256
3 MOVE BLOCK B 3 0.42 0.02 0.146
4 REDUCE BLOCK A 4 287.26 1.68 72.235
5 APPLY BLOCK B 3 5.38 0.11 1.829
6 WRITE BLOCK A 4 1.17 0.06 0.308
7 DIAGONAL ANALYSIS 2 0.02 -0.01 0.003
99 TOTAL REDUCTION 4 295.25 1.88 74.282
[/code]