|
forums.silverfrost.com Welcome to the Silverfrost forums
|
View previous topic :: View next topic |
Author |
Message |
JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
|
Posted: Thu May 10, 2012 2:11 pm Post subject: |
|
|
Dan,
The answer must be to multi-thread the J loop, as this loop group is initiated 160,000 times.
The K loop is initiated 270,000,000 times, which is what I previously tried to multi-thread. Too many threads, which is why I did not get a good result.
I introduced the K loop so there would be no call inside the J loop.
I'll look at your thread example.
Back in 70's, operations counts looked at the number of multiplies and divides, as these were considered the time consuming parts of the run. That is no longer the case !
For this example, I presented 3 alternative codes, 1 that takes 287 seconds and 2 that takes 950 seconds. We can assume the multiplies take only 287 seconds and something else takes 660 seconds. Thats 11 minutes of doing nothing, worse than when windows starts up.
Considering :
A(A_ptr_0+J) = A(A_ptr_0+J) - Dot_Product ( A(A_ptr_b:A_ptr_t), B(B_ptr_b:B_ptr_t) )
I thought this might be slow due to the array sections, but the K loop :
do k = JEQ_bot,J-1
c = c + A(A_ptr_0+k) * B(B_ptr_0+k)
end do
This has similar poor performance. ( I thought the K index, common to both A and B would have been efficient )
There must be another reason why it processes so slowly, and it is difficult to understand what it is.
Eddie,
I use EXTERNAL like /implicit_none, to document the functions being used.
Using functions as an argument has never appealed. I hope you note that the F77 syntax is far better than F90 in the above example !
John |
|
Back to top |
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2828 Location: South Pole, Antarctica
|
Posted: Thu May 10, 2012 7:56 pm Post subject: |
|
|
Well, that does not sound good. Processor must spend at least 0.01 second (from my very rough estimation. Could be 10 times more) in the thread for overhead of creating the thread to be small. Think if you can change the way the code works.
But did i understood correctly that the run takes around 10 min to complete on laptop? I'd say that's fast already to think about modifications! |
|
Back to top |
|
|
LitusSaxonicum
Joined: 23 Aug 2005 Posts: 2390 Location: Yateley, Hants, UK
|
Posted: Thu May 10, 2012 8:17 pm Post subject: |
|
|
John,
Surely the best way to indicate that something is a function is to include FUNCTION (or _FN) in its name. You don't have to for a subroutine, as it is CALLed.
For things that have to be EXTERNAL like Clearwin callbacks, I found that FTN95 is happy to be EXTERNAL without explicitly giving the INTEGER attribute, although I have begun to adopt the Fortran 90-ish version
INTEGER, EXTERNAL :: etc.
The habit of declaring routines as EXTERNAL when they aren't passed as parameters to a subprogram seems to have originated with named BLOCKDATA routines, as otherwise one could link without them and nobody would be any the wiser ...
Your timing problem looks to me like it is caused by differing numbers of cache misses between the various approaches.
Eddie |
|
Back to top |
|
|
davidb
Joined: 17 Jul 2009 Posts: 560 Location: UK
|
Posted: Fri May 11, 2012 7:13 am Post subject: Re: |
|
|
JohnCampbell wrote: |
By placing the inner loop in a parallel construct, there are 150,000 x 1,500 = 225 million initiated threads involving a dot product of size 1 to 1,500. This loop calculates the modification to a single element of column (:,j) : too many threads ! My attempt (1 year ago) was on the inner loop and did not work well since I did not have enough processors and there were too many threads.
John |
Forgive me, but this doesn't make sense. The number of threads and the number of iterations are different. You can have a loop doing 16 million iterations, shared across 8 threads with each thread doing 2 million iterations. The work should be shared out amongst the threads.
If you want to parallelise your inner loop (the dot product) you would do something like the following in OpenMP. Note, I haven't coded your offsets to the arrays.
Also I have vectorised contributions from successive 4 array elements. You might change this to 2 depending on the hardware and whether you are working in single or double precision. Keep it at 4 for double precision if your processor supports AVX, set it to 2 if it supports SSE2. (Obviously your compiler needs to support OpenMP and vectorization too!)
A mix of parallelising using threads and parallelising using vectorization within each thread should give good performance.
I have tried this and I get just less than a 4 X speed up with 4 cores when N is 1000000.
Code: |
! N is number of iterations
NN = MOD(N, 4)
C = SUM(A(1:NN)*B(1:NN))
S = 0.0D0
!$OMP PARALLEL DO PRIVATE(J) REDUCTION(+:S)
DO J=NN+1,N,4
S = S + SUM(A(J:J+3)*B(J:J+3))
END DO
C = C + S
|
The !$OMP directive creates the parallel region, and shares out the work of the DO loop amongst the threads. You can limit the number of threads by adding more clauses to the directive, but I have just assumed you want to use the maximum number (the number of cores usually). _________________ Programmer in: Fortran 77/95/2003/2008, C, C++ (& OpenMP), java, Python, Perl
Last edited by davidb on Fri May 11, 2012 7:56 am; edited 2 times in total |
|
Back to top |
|
|
Wilfried Linder
Joined: 14 Nov 2007 Posts: 314 Location: D�sseldorf, Germany
|
Posted: Fri May 11, 2012 7:40 am Post subject: |
|
|
I would like to use parallel processing, and after reading this discussion, I try to find information about OpenMP. On the official page http://openmp.org/wp/openmp-compilers/ is a list of compilers supporting OpenMP - FTN95 is missing. Is it nevertheless possible to use OpenMP together with FTN95?
Regards - Wilfried |
|
Back to top |
|
|
davidb
Joined: 17 Jul 2009 Posts: 560 Location: UK
|
Posted: Fri May 11, 2012 8:01 am Post subject: |
|
|
Hi Wilfried,
Silverfrost's FTN95 doesn't support OpenMP. But it will compile OpenMP programs since it just treats the directives as comments. I use FTN95 to debug my programs and another compiler (Ifort, Gfortran etc) to create executables to run which enables the OpenMP features.
David. _________________ Programmer in: Fortran 77/95/2003/2008, C, C++ (& OpenMP), java, Python, Perl |
|
Back to top |
|
|
Wilfried Linder
Joined: 14 Nov 2007 Posts: 314 Location: D�sseldorf, Germany
|
Posted: Fri May 11, 2012 8:52 am Post subject: |
|
|
Thank you, David. What a pity... FTN95: No development for 64 bits, no support of parallel processing, no support of OpenMP... It's like a Ferrari with an 80-hp-engine. I really hope that ClearWin will be prepared to work together with other compilers like Intel, then I will start using that compiler for calculations and the very good ClearWin library only for the GUI.
Fortran is the language of choice for number-crunching software, and here we need access to really all of the machines's computing power. For other purposes, I wouldn't use Fortran.
Regards - Wilfried |
|
Back to top |
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
|
Posted: Fri May 11, 2012 9:56 am Post subject: |
|
|
David,
Thanks for your comments. They have been very helpful.
To explain the calculation I am performing:
There are 154,220 equations in a blocked skyline, using 4 blocks.
Giving 159,139 column reductions (repeated for columns spanning blocks)
These results in 271,679,207 Dot_Product calls, roughly the number of numbers in the matrix profile.
The number of iterations in each Dot Product range from 1 to 9,861
0.5% < 10
2% < 34
10% < 100
66% in the range 400-1800
10% > 1,800
I hope this better describes the profile of my problem.
I was under the impression there would be new threads generated for each Dot_Product call if the process was parallelised using OpenMP.
I thought the use of vector instructions or AVX is a different approach which could suit yours and my example.
I certainly need to understand the difference between these approaches or if they are the same process.
My impression is that vector instructions or AVX involve vector instructions in a single processor and is the solution I should be chasing.
Back to the FTN95 solution
I have now produced 4 approaches to the "Dot_Product" call.
1) The old F77 code, using a DO loop:
c = 0
do k = 0,jband-1
c = c + A(A_ptr_b+k) * B(B_ptr_b+k)
end do
A(A_ptr_0+J) = A(A_ptr_0+J) - c
2) Using Dot_Product intrinsic with array sections:
A(A_ptr_0+J) = A(A_ptr_0+J) - Dot_Product (A(A_ptr_b:A_ptr_b+jband-1), B(B_ptr_b:B_ptr_b+jband-1) )
3) A DO loop inside a F77 wrapper
A(A_ptr_0+J) = A(A_ptr_0+J) - Vec_Sum (A(A_ptr_b), B(B_ptr_b), JBAND)
where
REAL*8 FUNCTION VEC_SUM (A, B, N)
!
integer*4, intent (in) :: n
real*8, dimension(n), intent (in) :: a
real*8, dimension(n), intent (in) :: b
!
c = 0
do k = 1,N
c = c + A(k) * B(k)
end do
vec_sum = c
end
4) A Dot_Product inside a F77 wrapper
where
REAL*8 FUNCTION VEC_SUM (A, B, N)
!
integer*4, intent (in) :: n
real*8, dimension(n), intent (in) :: a
real*8, dimension(n), intent (in) :: b
!
vec_sum = dot_product ( a, b )
end
Both options 1 and 2 take about 950 seconds while both wrapper options 3 and 4 take 300 seconds on my core i5 notebook using FTN95 Ver. 6.10 with /opt
I'm not sure if 4) becomes 3 or 3) becomes 4 when compiled ?
I don't think FTN95 has an optimised Dot_Product library function.
I like Eddie's suggestion that it is due to "cache misses" but I have no idea of what that could really mean or how to address it.
Again, 11 minutes of doing nothing due to a one line syntax change is also difficult to understand.
John |
|
Back to top |
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2828 Location: South Pole, Antarctica
|
Posted: Fri May 11, 2012 6:22 pm Post subject: |
|
|
Wilfried,
Absence of 64bit is really worst miss, i agree. But i hope for the movement here.
The OpenMP probably too but you have some choices - you can do some parallelization with FTN95 and do that NATIVELY what no other compiler can do (non-standard way, of course, same as non-standard is with the OpenMP). NVIDIA CUDA or AMD's own parallel compiler libraries support would be probably even better because our graphics cards are actually massively parallel vector supercomputers
John,
It's tricky for outsiders to suggest something they do not know in all details. I can only suggest something very general which works if you satisfy some requirements.
Does your method allow division of the workload into Np independent threads - one per processor core - which will do their 1/Np portion of job or not ? In other words, can you fit your code into this parallel program demo where i do non-threaded do loop run and then divide it into two threads with 1/2 job and measure time for all these tasks? You should see speedup almost 100% with two cores (or 4x, 6x .... if you add more threads same way with multicore CPU)
Code: | ! Multithreading example mult04.for
! Dan R Right 2012
!
! Watch in Task Manager how two threads run1 and run2
! grab two processor cores working in parallel
!
! Compilation: ftn95 mult04.FOR /clr /link /free /multi_threaded
!
include <clearwin.ins>
EXTERNAL run1, run2
common /abc_/kEnded1, kended2
character*1 cha
!..... straight non-threaded run
print*,' Run w/o threads started'
call clock@ (time_start)
d=2.0
do k=1,20
do i=1,10000000
d=alog(exp(d))
enddo
enddo
call clock@ (time_finish)
print*, 'Elapsed time without threads=', time_finish-time_start
!...multithreaded
call clock@ (time_start)
CALL CREATE_THREAD@(run1,22)
CALL CREATE_THREAD@(run2,23)
!...wait till both threads end
do while (kEnded1==0.or.kEnded2==0)
call sleep1@(0.1)
enddo
call clock@ (time_finish)
print*, 'Elapsed time with threads=', time_finish-time_start
print*, 'Enter any key to exit'
read(*,*) cha
END
!----------------
subroutine run1()
include <clearwin.ins>
common /abc_/kEnded1, kended2
lock; print*,'Thr.1 started'; end lock
kEnded1=0
d=2.0
do k=1,10
do i=1,10000000
d=alog(exp(d))
enddo
enddo
kEnded1=1
lock; print*,'Thr.1 ended' ; end lock
end
!----------------
subroutine run2()
include <clearwin.ins>
common /abc_/kEnded1, kended2
lock; print*,'Thr.2 started' ;end lock
kEnded2=0
d=2.0
do k=1,10
do i=1,10000000
d=alog(exp(d))
enddo
enddo
kEnded2=1
lock; print*,'Thr.2 ended' ;end lock
end |
Last edited by DanRRight on Sat May 12, 2012 8:54 am; edited 3 times in total |
|
Back to top |
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
|
Posted: Sat May 12, 2012 1:05 am Post subject: |
|
|
Dan,
I've looked at this problem a lot and one issue that worries me is why the poor FTN95 performance in approach 3) compared to 1)
Basically changing from:
Code: | !3) - good
do k = 1,n
c = c + A(k) * B(k)
end do
to:
! 1) - bad
do k = 0,n-1
c = c + A(A_ptr_b+k) * B(B_ptr_b+k)
end do
|
This results in a 3-fold increase in run time.
I've tested this on both a Core i5 and Xeon processor and got similar results.
I'm not sure if 3) has fluked a processor optimisation or 1) has a basic inefficiency, but the results are dramatic. It would be good to have an optimised/vectorised Dot_Product to apply to case 4) and see any possible change or if there is already some processor optimisation.
I think the performance, as in case 1) is similar to that seen in the Polyhedron benchmark tests.
I'm not sure if this example can be a generalised guide to DO loops with FTN95 or an isolated case.
I am not as enthusiastic about the OpenMP approach, as I think the new vector instructions have much to offer for this problem.
Thanks for your comments, as I will consider multiple threading further.
John |
|
Back to top |
|
|
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 7934 Location: Salford, UK
|
Posted: Sat May 12, 2012 8:04 am Post subject: |
|
|
John
I have noted this for investigation. What is the purpose of A_ptr_b and B_ptr_b? How do they vary?
An initial investigate would take a look at the /explist for each case and also possibly...
Code: | do k = 0,n-1
ia = A_ptr+k
ib = B_ptr+k
c = c + A(ia) * B(ib)
end do |
|
|
Back to top |
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2828 Location: South Pole, Antarctica
|
Posted: Sat May 12, 2012 9:02 pm Post subject: |
|
|
I'd like to remind all that small demo code would simplify understanding of the problem and speed up the solutions by the factor of .... 1000.
Optimization is trickiest problem in any language and the main achievement and pride of Fortran. Besides its inherent simplicity and legacy of developed libraries, it's the primary reason Fortran still survives. My suggestion is to ask the hard questions in comp.lang.fortran where you may have a luck finding the ultimate fortran pro in your subject. 2-3 fortran compiler developers and fortran standard committee people were often there and hidden tricks of optimization were always the favorite topic there. |
|
Back to top |
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
|
Posted: Sun May 13, 2012 11:13 am Post subject: |
|
|
Dan,
I've generated a simplified code, based on the direct reduction of a profile solver. I've included 6 inner loop options, the 4 I presented plus 2 variations of Paul's post.
It should be compiled with /lgo or /opt /lgo.
To change the size of the problem, I'd recommend changing "max_size". It can run up to 2gb on Win_7-64 or probably 1.3gb on XP_32.
The results show that the wrapper improves by a factor of 3 with FTN95 /opt.
I need to test it with other compilers.
253 lines of code, so see how much can be copied per post.
John
Code: | module big_arrays
integer*4, parameter :: mb_8 = 1024*1024/8 ! real*8 variables in 1 mb
!
integer*4, parameter :: max_eqn = 200000 ! maximum equations possible
integer*4, parameter :: max_band = 2800 ! maximum profile size
integer*4, parameter :: max_size = 100*mb_8 ! maximum size profile matrix (CHANGE THIS)
!
integer*4, allocatable, dimension(:) :: nb ! profile storage
real*8, allocatable, dimension(:) :: B ! matrix storage
!
integer*4 neq
integer*4 mstor
integer*4 avg_band
!
end module big_arrays
use big_arrays
!
integer*4 ieq, iband, ipos, method, k
real*8 c_start(6), c_end(6), x, dt, ops_per_sec
integer*8 num_ops, num_call
!
character method_name(6)*30
data method_name / 'Original DO Loop', 'F90 syntax', 'F77 wrapper for DO Loop', &
'F77 wrapper for Dot_Product', 'Paul option', 'alternate Paul option' /
!
! generate profile in nb
!
allocate ( nb(max_eqn) )
!
iband = 1
nb(1) = 1
nb(2) = 0
do ieq = 1, max_eqn
call random_number (x)
iband = (iband + x * max_band) / 2
nb(ieq+2) = nb(ieq+1) + min (iband,ieq)
if (nb(ieq+2) > max_size) exit
end do
!
! generate diagonal pointer
neq = ieq-1
do ieq = 0,neq
nb(ieq+2) = nb(ieq+2) + neq+2
end do
!
mstor = nb(neq+2)
avg_band = mstor / neq
x = real(mstor) / real(mb_8)
!
write (*,2001) neq, ' equations'
write (*,2001) avg_band, ' average profile'
write (*,2001) mstor, ' coefficients'
write (*,2002) x , ' storage (mb)'
write (*,1001) ' '
2001 format (i10,a)
2002 format (f10.2,a)
!
! initialise the matrix
!
allocate ( B(mstor) )
do ieq = 1,neq
ipos = ieq+2
iband = nb(ipos) - nb(ipos-1)
do k = nb(ipos-1)+1, nb(ipos)
b(k) = 1
end do
b(nb(ipos)) = iband
end do
!
do method = 1,6
!
call cpu_time (c_start(method))
!
call redblk (B, NB, B, NB, method, num_ops, num_call)
!
call cpu_time (c_end(method))
dt = c_end(method)-c_start(method)
ops_per_sec = dble (num_ops) / dt
write (*,1001) ' Method', method, ' CPU_time =',dt,' ops/sec =',ops_per_sec, method_name(method)
end do
1001 format (a,i2,a,f9.3,a,en12.2,2x,a)
!
write (*,*) num_call,' Number of dot_product calls'
write (*,*) num_ops, ' Number of itterations'
!
end
SUBROUTINE REDBLK (A, NA, B, NB, method, num_ops, num_call)
!
! reduces block 'A' by block 'B'
!
REAL*8 A(*), B(*)
INTEGER*4 NA(*), NB(*), method
integer*8 num_ops, num_call
INTRINSIC MIN, MAX
!
INTEGER*4 I, IBOT, ITOP, IPOINT, A_ptr_0, A_ptr_I, IL, &
J, JBOT, JTOP, JB, JT, JI
|
|
|
Back to top |
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
|
Posted: Sun May 13, 2012 11:15 am Post subject: |
|
|
continued Code: | !
IBOT = NA(1) ! FIRST EQUATION
ITOP = IBOT + NA(2) - 3 ! LAST EQUATION
!
JBOT = NB(1)
JTOP = JBOT + NB(2) - 3
num_ops = 0
num_call = 0
!
DO I = IBOT,ITOP
IPOINT = I - IBOT + 3
A_ptr_0 = NA(IPOINT) - I
!
IL = I - (NA(IPOINT) - NA(IPOINT-1)) + 1
JB = MAX (IL+1,JBOT)
JT = MIN (I-1 ,JTOP)
!
! patch to test the column reduction, returning stats
!
call redcol (A(A_ptr_0+IL), B, NB, JB, JT, IL, method, num_ops, num_call)
IF (IBOT /= JBOT) CYCLE
!
! NOW REDUCE PIVOT COLUMN ON ITSELF (now resert values)
!
A_ptr_I = A_ptr_0 + I
DO J = IL,JT
JI = A_ptr_0+J
A(JI) = 1
END DO
A(A_ptr_I) = NA(IPOINT) - NA(IPOINT-1)
!
END DO
!
RETURN
END
SUBROUTINE REDCOL (A, B, NB, JB, JT, IEQ_bot, method, num_ops, num_call)
!
! Reduces vector 'A' by block 'B'
! Allows for skyline matrix to be stored in multiple blocks
! Vector A might or might not be in block B
!
REAL*8 A(*), & ! column to be reduced provided as A(IEQ_bot:)
B(*) ! skyline block of columns
INTEGER*4 NB(*), & ! block structure of B
JB, & ! first equation to reduce (is in block B)
JT, & ! last equation to reduce (is in block B and < equation A)
IEQ_bot, & ! first active equation in column A
method
integer*8 num_ops, num_call
!
INTEGER*4 J, JEQ_bot, JBOT, JPOINT, JBAND, k, ia, ib, &
A_ptr_0, B_ptr_0, A_ptr_b, B_ptr_b, A_ptr_t, B_ptr_t
REAL*8 Vec_Sum, Vec_Sum_d, c
EXTERNAL Vec_Sum, Vec_Sum_d ! Dot_Product using F77 syntax
!
IF (JB > JT) RETURN
JBOT = NB(1) ! first equation column in Block B
A_ptr_0 = 1-IEQ_bot ! virtual pointer to A(0)
!
DO J = JB,JT ! loop over range of equations
JPOINT = J - JBOT + 3 ! pointer to index for column J in block B
B_ptr_0 = NB(JPOINT) - J ! virtual pointer to B(0,j)
!
JEQ_bot = NB(JPOINT-1) - B_ptr_0 + 1 ! first active equation of column J = J - jband + 1 where jband = NB(JPOINT)-NB(JPOINT-1)
IF (IEQ_bot > JEQ_bot) JEQ_bot = IEQ_bot
!
JBAND = J - JEQ_bot ! active bandwidth for equation J and equation A
IF (JBAND < 1) CYCLE
!
A_ptr_b = A_ptr_0 + JEQ_bot
B_ptr_b = B_ptr_0 + JEQ_bot
num_ops = num_ops + JBAND
num_call = num_call + 1
!
select case (method)
!
case (1) ! if (method == 1) then ! Original DO Loop
c = 0
do k = JEQ_bot,J-1
c = c + A(A_ptr_0+k) * B(B_ptr_0+k)
end do
A(A_ptr_0+J) = A(A_ptr_0+J) - c
!
case (2) ! else if (method == 2) then ! F90 syntax
!
A_ptr_t = A_ptr_b + JBAND - 1
B_ptr_t = B_ptr_b + JBAND - 1
A(A_ptr_0+J) = A(A_ptr_0+J) - Dot_Product (A(A_ptr_b:A_ptr_t), B(B_ptr_b:B_ptr_t) )
!
|
|
|
Back to top |
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2560 Location: Sydney
|
Posted: Sun May 13, 2012 11:18 am Post subject: |
|
|
hopefully the end Code: | case (3) ! else if (method == 3) then ! F77 wrapper for DO loop
!
A(A_ptr_0+J) = A(A_ptr_0+J) - Vec_Sum (A(A_ptr_b), B(B_ptr_b), JBAND)
!
case (4) ! else if (method == 4) then ! F77 wrapper for Dot_Product
!
A(A_ptr_0+J) = A(A_ptr_0+J) - Vec_Sum_d (A(A_ptr_b), B(B_ptr_b), JBAND)
!
case (5) ! else if (method == 5) then ! Paul option, simplified subscripts
c = 0
do k = JEQ_bot,J-1
ia = A_ptr_0+k
ib = B_ptr_0+k
c = c + A(ia) * B(ib)
end do
A(A_ptr_0+J) = A(A_ptr_0+J) - c
!
case (6) ! else if (method == 6) then ! alternate Paul option (auto increment ?)
c = 0
do k = JEQ_bot,J-1
c = c + A(A_ptr_b) * B(B_ptr_b)
A_ptr_b = A_ptr_b+1
B_ptr_b = B_ptr_b+1
end do
A(A_ptr_b) = A(A_ptr_b) - c
end select
!
END DO
!
RETURN
END
REAL*8 FUNCTION VEC_SUM (A, B, N)
!
! Performs a vector dot product VECSUM = [A] . [B]
! account is NOT taken of the zero terms in the vectors
!
real*8 a(*), b(*), c
integer*4 n, k
!
c = 0
do k = 1,N
c = c + A(k) * B(k)
end do
vec_sum = c
!
RETURN
!
END
REAL*8 FUNCTION VEC_SUM_d (A, B, N)
!
! Performs a vector dot product VECSUM = [A] . [B]
!
integer*4, intent (in) :: n
real*8, dimension(n), intent (in) :: a
real*8, dimension(n), intent (in) :: b
!
vec_sum_d = dot_product ( a, b )
!
RETURN
!
END
|
|
|
Back to top |
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB © 2001, 2005 phpBB Group
|