Paul,
I am trying to understand how real*6 could be done and how real*10 is done.
My question about real*10 is: is it implemented in hardware, with all calculations done in the 80-bit math co-processor, or is that an obsolete technology?
To test this I wrote a program that repeats a vector dot product on 1000-element arrays as real*8 or real*10, using either the dot_product intrinsic or a simple function containing a loop:
      REAL*10 FUNCTION VECSUM_10 (A, B, N)
!
!     Performs a vector dot product  VECSUM = [A] . [B]
!     Leading zero terms of A are skipped before accumulating
!
      integer*4,               intent (in) :: n
      real*10,  dimension(n),  intent (in) :: a
      real*10,  dimension(n),  intent (in) :: b
!
      real*10   c
      integer*4 i
!
      c = 0
!     find the first non-zero term of A (i = n+1 if A is all zero)
      do i = 1,n
         if (a(i) /= 0) exit
      end do
!     accumulate the remaining terms
      do i = i,n
         c = c + a(i)*b(i)
      end do
!
      vecsum_10 = c
      return
!
      end
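For anyone wanting to repeat the comparison, a timing driver could look something like the sketch below. It is only an illustration: the repeat count nrep, the array contents and the use of cpu_time are assumptions, not the actual test program, and real*10 is a compiler extension that not every compiler accepts.

```fortran
program time_dot
   implicit none
   integer*4, parameter :: n    = 1000     ! array length, as in the test described
   integer*4, parameter :: nrep = 100000   ! ASSUMED repeat count (hypothetical)
   real*10   :: a(n), b(n), s1, s2
   real*8    :: t0, t1, t2
   integer*4 :: i, k

   ! ASSUMED test data: any non-zero values would do
   do i = 1, n
      a(i) = i
      b(i) = 1
   end do

   ! time the explicit-loop function
   call cpu_time (t0)
   s1 = 0
   do k = 1, nrep
      s1 = s1 + vecsum_10 (a, b, n)
   end do
   call cpu_time (t1)

   ! time the dot_product intrinsic on the same data
   s2 = 0
   do k = 1, nrep
      s2 = s2 + dot_product (a, b)
   end do
   call cpu_time (t2)

   ! printing the sums keeps both results in use
   write (*,*) 'vecsum_10   ', t1-t0, ' seconds, sum =', s1
   write (*,*) 'dot_product ', t2-t1, ' seconds, sum =', s2

contains

   real*10 function vecsum_10 (a, b, n)
      ! same logic as the function above: skip leading zeros of a, then accumulate
      integer*4,              intent (in) :: n
      real*10, dimension(n),  intent (in) :: a
      real*10, dimension(n),  intent (in) :: b
      real*10   c
      integer*4 i
      c = 0
      do i = 1,n
         if (a(i) /= 0) exit
      end do
      do i = i,n
         c = c + a(i)*b(i)
      end do
      vecsum_10 = c
   end function vecsum_10

end program time_dot
```

The real*8 and real*4 variants would follow the same pattern with the declarations changed.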
Compiling without /opt, the results are:
Test Type      Routine       Seconds   Ratio
real*8 test    vecsum_8        4.28     1.00
real*8 test    dot_product     4.276    1.00
real*10 test   vecsum_10       5.515    1.29
real*10 test   dot_product     7.432    1.74
real*4 test    vecsum_4        2.923    0.68
Real*10 takes about 30% longer than real*8 with the explicit loop, but 74% longer using the dot_product intrinsic. Real*4 takes only 68% of the real*8 computation time.
This indicates to me that real*10 is not simply taking the 80-bit result from the math co-processor while real*8 and real*4 truncate the output. Either that, or the instructions to move 4, 8 or 10 bytes take a lot of time.
Any advice?
John