|
forums.silverfrost.com Welcome to the Silverfrost forums
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2556 Location: Sydney
|
Posted: Tue Mar 14, 2017 3:38 am Post subject: |
|
|
Dan,
I don't use Laipe. I have been quoting Laipe performance reported on the equation.com web site. I find their quoted single thread performance to be incredibly slow. If you start from such a slow base, the improvements with multiple threads are not that significant, and ignore the real multi-thread problems that occur when combining AVX and !$OMP.
To get single-thread results as slow as those they quote, they must have turned off vector instructions, not used -ffast-math, and possibly introduced other delays. Why?
I can use AVX calculations and !$OMP on an i7-4790K and get better performance than the best equation.com quoted with these many core / many thread processors they have used, so why use Laipe ?
With FTN95, the latest results I posted show that axpy4@ (or axpy8@) give good vector performance. I would certainly recommend this approach where possible. I also demonstrated that a cache smart approach is important for good AVX performance.
I have not used multi-thread libraries with FTN95 or FTN95 /64; not sure how robust this would be.
John |
|
DanRRight
Joined: 10 Mar 2008 Posts: 2826 Location: South Pole, Antarctica
|
Posted: Tue Mar 14, 2017 9:55 am Post subject: Re: |
|
|
JohnCampbell wrote: | Dan,
I don't use Laipe. I have been quoting Laipe performance reported on the equation.com web site. |
then there are just even more words |
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2556 Location: Sydney
|
Posted: Wed Mar 15, 2017 2:34 am Post subject: |
|
|
Paul and Robert,
I think what Dan may be asking is whether a third-party .dll could be linked into an FTN95 executable, either 32-bit or 64-bit.
The .dll being proposed is a multi-thread computation, generated either from gFortran or ifort, that has !$OMP capabilities.
Could the following code (or a subset) be compiled in gFortran with options of -O3 -mavx -ffast-math -fopenmp then linked into a FTN95 calling program ?
Basically, we have seen the opposite with clearwin64.
John
Code: |
subroutine laipe_matmul_cache (a,b,c,nra,nca,ncb)
  use precision
  ! matrix multiplication : multi thread and cacheing strategy
  integer*4 nra,nca,ncb, j,k, k1,k2
  real(dp) :: a(nra,nca), b(nca,ncb)
  real(dp) :: c(nra,ncb)
  !
  integer*4 num_cache_columns, nk
  external num_cache_columns
  !
  ! determine columns of A per pass
  nk = num_cache_columns (nra,nca)
  !
  do k1 = 1,nca,nk
    k2 = min ( k1+nk-1, nca)
    !
!$OMP PARALLEL DO shared (a,b,c,nra,nca,ncb,k1,k2) private (j,k)
    do j = 1,ncb
      if (k1==1) c(:,j) = 0
      do k = k1,k2
        !! c(1:nra,j) = c(1:nra,j) + a(1:nra,k) * b(k,j)
        call vec_add_dp ( c(1,j), a(1,k), b(k,j), nra )
      end do
    end do
!$OMP END PARALLEL DO
    !
  end do ! cache size passes of A
  !
end subroutine laipe_matmul_cache
subroutine vec_add_dp ( y, x, a, n )
  ! DAXPY interface routine
  use precision
  integer*4 :: n
  real(dp) :: y(n), x(n), a
  !
  INTEGER*8 :: n8
  n8 = n
  call AXPY4@(y,x,n8,a) ! FTN95 /64 routine
  !
  ! y = y + x * a ! array syntax alternative
  !
  ! do i = 1,n ! do loop alternative
  !   y(i) = y(i) + x(i) * a
  ! end do
end subroutine vec_add_dp
integer*4 function num_cache_columns (nra,nca)
  !
  ! matrix multiplication : multi thread and cacheing strategy
  ! find the number of columns of A to store in each pass of multiplication
  ! number is based on
  !   size of cache and
  !   number of cores (threads) in use
  !
  use precision ! byte_size
  use laipe_test ! cache_size, use_cores, nk, ncp
  integer*4 nra, & ! number of rows in A
            nca    ! number of columns of A
  !
  ! Check that A is cached to 5mb
  !   nk  = number of columns per cache pass
  !   ncp = number of passes
  !
  ! Estimate number of columns for cache limit
  nk = (cache_size/byte_size) / nra - use_cores ! allow 1 column for C for each thread
  !
  if ( nk > nca ) then ! too many : no cache strategy required
    nk = nca
    ncp = 1
    !
  else if ( nk <= use_cores ) then ! too few : no smaller than 1 column per thread
    nk = use_cores
    ncp = (nca+nk-1)/nk ! number of passes
    !
  else
    ncp = (nca+nk-1)/nk ! number of passes
    nk = (nca+ncp-1)/ncp ! even up columns per pass
    if ( use_cores > 1 ) & ! make sure multiple of use_threads
      nk = ( (nk+use_cores-1)/use_cores ) * use_cores ! round up to columns as multiple of cores
    !
  end if
  !
  write (*,*) ' A is cached to',ncp,' passes of',nk,' for',nca,' columns'
  !
  num_cache_columns = nk
  !
end function num_cache_columns |
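Before trying the routine above from a gFortran-built DLL, it may help to check the blocking logic by itself. The following is a minimal sketch, not John's code: it uses the array-syntax alternative in place of AXPY4@ (which is FTN95-specific), a fixed pass width `nk = 16` chosen only for illustration, and small made-up matrix sizes. The program name and the tolerance are likewise assumptions.

```fortran
! Sketch only: nk, the matrix sizes and the tolerance are illustrative assumptions.
program blocked_matmul_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nra = 60, nca = 50, ncb = 40
  integer, parameter :: nk = 16          ! assumed columns of A per cache pass
  real(dp) :: a(nra,nca), b(nca,ncb), c(nra,ncb)
  real(dp) :: err
  integer  :: j, k, k1, k2

  call random_number (a)
  call random_number (b)

  ! same loop structure as laipe_matmul_cache: passes over blocks of columns of A
  do k1 = 1, nca, nk
    k2 = min (k1+nk-1, nca)
!$OMP PARALLEL DO shared (a,b,c,k1,k2) private (j,k)
    do j = 1, ncb
      if (k1 == 1) c(:,j) = 0
      do k = k1, k2
        c(:,j) = c(:,j) + a(:,k) * b(k,j)   ! array-syntax DAXPY
      end do
    end do
!$OMP END PARALLEL DO
  end do

  ! compare against the MATMUL intrinsic
  err = maxval (abs (c - matmul (a,b)))
  write (*,*) 'max abs difference from MATMUL =', err
  if (err > 1.0e-10_dp) stop 'blocked product disagrees with MATMUL'
end program blocked_matmul_check
```

Without -fopenmp the directives are plain comments, so the same source can be used to compare single-thread and multi-thread runs. For scale, assuming a 5 MiB cache, 8-byte reals, nra = 1000 and use_cores = 4, the estimate in num_cache_columns is nk = 655360/1000 - 4 = 651; with nca = 2000 that becomes 4 passes, evened up to 500 columns each.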
|
|
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 7933 Location: Salford, UK
|
Posted: Wed Mar 15, 2017 8:09 am Post subject: |
|
|
I haven't tried this kind of connection. I guess that it depends on whether the routines are exported as "extern "C"". It would be worth a try. |
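One way to get a plain, undecorated entry point out of a gFortran-built library is the standard `bind(C)` attribute. The following is a rough sketch under assumptions: the names `omp_export` and `matmul_omp_c` are hypothetical, the arguments are passed by reference (the Fortran default), and whether FTN95 can actually link and call the resulting DLL is exactly the open question in this thread. The small driver at the end is only there so the sketch runs on its own; it would be dropped when building a DLL.

```fortran
! Sketch only: module and routine names are hypothetical, not from any library.
module omp_export
  use iso_c_binding, only : c_double, c_int
  implicit none
contains
  ! bind(C) fixes the external symbol name to exactly "matmul_omp_c"
  subroutine matmul_omp_c (a, b, c, nra, nca, ncb) bind(C, name="matmul_omp_c")
    integer(c_int), intent(in)  :: nra, nca, ncb
    real(c_double), intent(in)  :: a(nra,nca), b(nca,ncb)
    real(c_double), intent(out) :: c(nra,ncb)
    integer :: j, k
!$OMP PARALLEL DO private (j,k)
    do j = 1, ncb
      c(:,j) = 0
      do k = 1, nca
        c(:,j) = c(:,j) + a(:,k) * b(k,j)
      end do
    end do
!$OMP END PARALLEL DO
  end subroutine matmul_omp_c
end module omp_export

program try_export
  use omp_export
  implicit none
  real(c_double) :: a(3,2), b(2,2), c(3,2)
  a = 1
  b = 2
  call matmul_omp_c (a, b, c, 3, 2, 2)
  write (*,*) c      ! every element is the sum of two products 1*2
end program try_export
```

The threading then lives entirely inside the gFortran-compiled code; the caller only sees an ordinary subroutine with by-reference arguments.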
|
DanRRight
Joined: 10 Mar 2008 Posts: 2826 Location: South Pole, Antarctica
|
Posted: Wed Mar 15, 2017 9:53 am Post subject: |
|
|
I know that the IVF-built LAIPE.LIB mostly works with 32-bit FTN95; FTN95 understands its syntax. It did not work with some subroutines, and the FTN95-generated EXE complained at run time about some missing system functions. Intel's Steve Lionel wrote to me that the LIB file has to be replaced by a DLL, which gathers all the system functions it uses into the DLL file.
As for gFortran, anyone who uses it can check whether a DLL made with gFortran is compatible with FTN95. How the DLL does its parallelization internally is gFortran's or IVF's business. |
|
mecej4
Joined: 31 Oct 2006 Posts: 1891
|
Posted: Wed Mar 15, 2017 12:19 pm Post subject: Re: |
|
|
PaulLaidler wrote: | I haven't tried this kind of connection. I guess that it depends on whether the routines are exported as "extern "C"". It would be worth a try. |
I have used FTN95-64 with a couple of 64-bit DLLs intended for use with Intel Fortran or with Intel/MS C.
One of them is the Pardiso library (V4.12 and V5.00). The Pardiso DLLs depend on the Intel OpenMP DLL, but I have that. The FTN95-produced 64-bit EXE ran fine on several large symmetric matrices from the NIST Matrix Market.
On the other hand, the MKL library uses somewhat complicated modules to map simplified interface names of library entry points to highly decorated actual entry point names. I could see that making this work would take considerable work and there is a good chance that it would fail.
In short, if making FTN95-64 work with third party libraries is important for you, it is worth trying out. If the third party library is supplied only as a static library, as DanRRight said, first build a DLL from that library, and make the DLL export all the symbols that you wish to use from your FTN95-compiled program. |
|
DanRRight
Joined: 10 Mar 2008 Posts: 2826 Location: South Pole, Antarctica
|
Posted: Thu Mar 16, 2017 8:17 am Post subject: |
|
|
Thanks, mecej4, for the info about the Pardiso library; it could be useful. The large NIST Matrix Market collection is also very interesting.
It would be great if you could now check whether gFortran, which you also use, produces DLLs compatible with FTN95. FTN95 does not need to worry about what parallelization the DLL does inside, or how. If gFortran DLLs are compatible with FTN95, I'd encourage you to try LAIPE (it is really simple) and compare it with other parallel algebra packages.
By the way, the Pardiso manual says that Intel Fortran and MS Dev Studio have to be installed for it to work. The LAIPE parallel algebra library, whether as a LIB or a DLL, needs nothing else: you just call its subroutines as usual in Fortran and link it with the other OBJ, LIB or DLL files using SLINK. |
|
mecej4
Joined: 31 Oct 2006 Posts: 1891
|
Posted: Thu Mar 16, 2017 12:05 pm Post subject: |
|
|
I don't know what to make of Laipe. I have Gfortran/Gcc 6.2 from Equation.com, and it includes the Laipe libraries. I built the example at the end of Chapter II of the manual ( ftp://ftp.equation.com/laipe/document/laipe_eqsolver.pdf ). The program runs, but the "decomposed" matrix is the same as the original matrix, and the "solution" is the same as the input R.H.S. vector. I suspect that the library checks for a license key or file and does a short return when it finds none. The vendor has the right to require licensing, but giving a false impression of doing something fast is not good. I have no intention of buying a Laipe license. |
|
DanRRight
Joined: 10 Mar 2008 Posts: 2826 Location: South Pole, Antarctica
|
Posted: Thu Mar 16, 2017 12:56 pm Post subject: |
|
|
Mecej4, was this Laipe or Laipe2? I did not try Laipe2; it may also need the neuLoop DLL to be linked. The site says that it is free, unless I am missing something. |
|
mecej4
Joined: 31 Oct 2006 Posts: 1891
|
Posted: Thu Mar 16, 2017 1:04 pm Post subject: |
|
|
I believe that it is Laipe2+Neuloop4. I found it in the GCC/Gfortran 6.2 distribution from equation.com. Only the static libraries are provided. Here is the example code. The build command is in the first line as a comment.
Code: |
! gfortran -fdollar-ok -g laibnd.f -llaipe2 -lneuloop4
      Program XLAIBND
      implicit none
! *** Example program ***
! define variables where the length of A is determined by equation (2.2)
!
      integer*4, parameter :: N = 7
      integer*4, parameter :: LowerBandwidth=2
      real*4 :: A((N-1)*LowerBandwidth+N), X(N)
      integer*4 :: NoGood
      DATA X/21.0,141.0,2.0,9.0,333.0,1.0,3.0/
!
! input the lower triangular part of [A]
!
      CALL Input(A,LowerBandwidth)
!
! decompose in parallel
!
      CALL laipe$decompose_CSP_4(A,N,LowerBandwidth, NoGood)
!
! stop if NoGood=1
!
      IF(NoGood.eq.1) STOP 'Cannot be decomposed'
!
! perform substitutions in parallel
!
      CALL laipe$substitute_CSP_4(A,N,LowerBandwidth,X)
!
! output decomposed matrix
!
      CALL Output(A,N,LowerBandwidth)
!
! output the solution
!
      Write(*,'('' Solution is as:'')')
      Write(*,*) X
!
! end of the program
!
      CALL laipe$done
      STOP
      END
      SUBROUTINE Input(A,LowerBandwidth)
!
!
! routine to demonstrate application of data storage scheme
! (A)FORTRAN CALL: CALL Input(A,LowerBandwidth)
! 1.A: <R4> profile of matrix [A], dimension(*)
! 2.LowerBandwidth: <I4> lower bandwidth
!
!
! dummy arguments
!
      INTEGER*4 :: LowerBandwidth
      REAL*4 :: A(LowerBandwidth,1)
!
! input
!
      A(1,1) = 1.0
      A(2,1) = 4.0
      A(3,1) = 2.0
      A(2,2) = 25.0
      A(3,2) = 29.0
      A(4,2) = 9.0
      A(3,3) = 88.0
      A(4,3) = 34.0
      A(5,3) = 3.0
      A(4,4) = 89.0
      A(5,4) = 23.0
      A(6,4) = 11.0
      A(5,5) = 45.0
      A(6,5) = 7.0
      A(7,5) = 3.0
      A(6,6) = 22.0
      A(7,6) = 2.0
      A(7,7) = 9.0
!
      RETURN
      END
      SUBROUTINE Output(A,N,LowerBandwidth)
!
!
! routine to output the decomposed matrix by data storage scheme
! (A)FORTRAN CALL: CALL Output(A,N,LowerBandwidth)
! 1.A: <R4> profile of matrix [A], dimension(*)
! 2.N: <I4> order of square matrix [A]
! 3.LowerBandwidth: <I4> lower bandwidth
!
!
! dummy arguments
!
      INTEGER*4 :: N,LowerBandwidth
      REAL*4 :: A(LowerBandwidth,1)
!
! local variables
!
      INTEGER*4 :: Column,Row
!
! output the coefficients of decomposed matrix
!
      WRITE(*,'('' Row Column Coefficient'')')
      DO Column = 1,N
        DO Row = Column, MIN0(Column+LowerBandwidth,N)
          WRITE(*,'(I4,I6,F9.3)') Row,Column, A(Row,Column)
        END DO
      END DO
!
      RETURN
      END |
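The suspicion that the solver returns without doing any work can be tested independently of the library by computing a residual. The sketch below is an illustration under assumptions, not part of the LAIPE example: it assumes [A] is symmetric (only the lower triangular part is input) and that element (i,j) of the lower band sits at position (j-1)*LowerBandwidth + i of the profile, which is what the `A(LowerBandwidth,1)` dummy declaration implies. The program and variable names are made up. It expands the profile to a dense matrix and reports max |A*x - b| for a candidate x.

```fortran
! Sketch only: assumes [A] is symmetric and that element (i,j) of the lower
! band sits at prof((j-1)*lbw + i), as implied by the A(LowerBandwidth,1) dummy.
program check_band_solution
  implicit none
  integer, parameter :: n = 7, lbw = 2
  real :: prof((n-1)*lbw+n), dense(n,n), rhs(n), x(n)
  integer :: i, j

  ! same coefficients as the Input routine, in storage order (slot 18 is unused)
  prof = (/ 1.0, 4.0, 2.0,  25.0, 29.0, 9.0,  88.0, 34.0, 3.0,  &
            89.0, 23.0, 11.0,  45.0, 7.0, 3.0,  22.0, 2.0, 0.0,  9.0 /)
  rhs  = (/ 21.0, 141.0, 2.0, 9.0, 333.0, 1.0, 3.0 /)

  ! expand the lower band into a full symmetric matrix
  dense = 0.0
  do j = 1, n
    do i = j, min (j+lbw, n)
      dense(i,j) = prof((j-1)*lbw + i)
      dense(j,i) = dense(i,j)
    end do
  end do

  x = rhs      ! stand-in for the vector returned by the solver under test
  write (*,*) 'max |A*x - b| =', maxval (abs (matmul (dense, x) - rhs))
end program check_band_solution
```

With x set to the untouched right-hand side, as here, the residual is far from zero; a genuine solution would give a residual that is small relative to single precision. Note that the check needs a copy of the original coefficients, since the decomposition overwrites A.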
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2826 Location: South Pole, Antarctica
|
Posted: Thu Mar 16, 2017 5:24 pm Post subject: |
|
|
Yes, there is something I do not like in this test... For example, there is no setting of the number of threads or cores. The matrix uses old array syntax, which may need the /oldarray option of FTN95. Better to start with a dense matrix case and, once that works, return to this case. |
|
kaliuzhkin
Joined: 17 Sep 2012 Posts: 33
|
Posted: Mon Jul 31, 2017 9:32 pm Post subject: identify version |
|
|
How do I identify the version of my current FTN95 package?
Dan |
|
kaliuzhkin
Joined: 17 Sep 2012 Posts: 33
|
Posted: Mon Jul 31, 2017 9:48 pm Post subject: wrong version |
|
|
Hm. Ftn95 /ver gives version 7.20 on the newly installed personal edition. |
|
kaliuzhkin
Joined: 17 Sep 2012 Posts: 33
|
Posted: Mon Jul 31, 2017 10:15 pm Post subject: sorry |
|
|
Sorry, I get 8.10 now. Please ignore these messages. |
|