forums.silverfrost.com Forum Index forums.silverfrost.com
Welcome to the Silverfrost forums
 

FTN 95 8.10 Personal Edition
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Tue Mar 14, 2017 3:38 am

Dan,

I don't use Laipe. I have been quoting the Laipe performance reported on the equation.com web site. I find their quoted single-thread performance to be incredibly slow. If you start from such a slow base, the improvements with multiple threads are not that significant, and they ignore the real multi-thread problems that occur when combining AVX and !$OMP.
To achieve the single-thread results they quote, they must have turned off vector instructions, not used -ffast-math and possibly introduced other delays. Why?

I can use AVX calculations and !$OMP on an i7-4790K and get better performance than the best equation.com quoted with these many core / many thread processors they have used, so why use Laipe ?

With FTN95, the latest results I posted show that axpy4@ (or axpy8@) give good vector performance. I would certainly recommend this approach where possible. I also demonstrated that a cache smart approach is important for good AVX performance.
I have not used multi-thread libraries with FTN95 or FTN95 /64; not sure how robust this would be.

John
DanRRight



Joined: 10 Mar 2008
Posts: 2813
Location: South Pole, Antarctica

Posted: Tue Mar 14, 2017 9:55 am    Post subject: Re:

JohnCampbell wrote:
Dan,
I don't use Laipe. I have been quoting Laipe performance reported on the equation.com web site.


then there are just even more words
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Wed Mar 15, 2017 2:34 am

Paul and Robert,

I think what Dan may be asking is: could a third-party .dll be linked into an FTN95 executable, either 32-bit or 64-bit?

The .dll being proposed is a multi-thread computation, generated either from gFortran or ifort, that has !$OMP capabilities.
Could the following code (or a subset) be compiled in gFortran with options of -O3 -mavx -ffast-math -fopenmp then linked into a FTN95 calling program ?
Basically, we have seen the opposite with clearwin64.

John
Code:
   subroutine laipe_matmul_cache (a,b,c,nra,nca,ncb)
     use precision
!    matrix multiplication : multi thread and cacheing strategy

       integer*4 nra,nca,ncb,  j,k, k1,k2
       real(dp) :: a(nra,nca), b(nca,ncb)
       real(dp) :: c(nra,ncb)
!
       integer*4 num_cache_columns, nk
       external  num_cache_columns
!
!   determine columns of A per pass
      nk = num_cache_columns (nra,nca)
!
      do k1 = 1,nca,nk
        k2 = min ( k1+nk-1, nca)
!
!$OMP PARALLEL DO shared (a,b,c,nra,nca,ncb,k1,k2) private (j,k)
        do j = 1,ncb
          if (k1==1) c(:,j) = 0
          do k = k1,k2
!!            c(1:nra,j) = c(1:nra,j) + a(1:nra,k) * b(k,j)
            call vec_add_dp ( c(1,j), a(1,k), b(k,j), nra )
          end do
        end do
!$OMP END PARALLEL DO
!
      end do   ! cache size passes of A
!
   end subroutine laipe_matmul_cache

   subroutine vec_add_dp ( y, x, a, n )
!  DAXPY interface routine
     use precision
     integer*4 :: n
     real(dp)  :: y(n), x(n), a
!
     INTEGER*8 :: n8
     n8 = n
     call AXPY4@(y,x,n8,a)   ! FTN95 /64 routine
!
!       y = y + x * a    !  array syntax alternative
!
!       do i = 1,n         ! do loop alternative
!        y(i) = y(i) + x(i) * a
!       end do
   end subroutine vec_add_dp

   integer*4 function num_cache_columns (nra,nca)
!
!    matrix multiplication : multi thread and cacheing strategy
!    find the number of columns of A to store in each pass of multiplication
!    number is based on
!       size of cache and
!       number of cores (threads) in use
!
     use precision    !  byte_size
     use laipe_test   !  cache_size, use_cores,  nk, ncp

       integer*4 nra,     &    ! number of rows in A
                 nca           ! number of columns of A
!
!  Check that A is cached to 5mb
!     nk  = number of columns per cache pass
!     ncp = number of passes
!
!   Estimate number of columns for cache limit
      nk = (cache_size/byte_size) / nra - use_cores  !  allow 1 column for C for each thread
!
      if ( nk > nca ) then                           ! too many : no cache strategy required
        nk  = nca
        ncp = 1           
!
      else if ( nk <= use_cores ) then               ! too few : no smaller than 1 column per thread
        nk  = use_cores
        ncp = (nca+nk-1)/nk                          ! number of passes
!
      else
        ncp = (nca+nk-1)/nk                          ! number of passes
        nk  = (nca+ncp-1)/ncp                        ! even up columns per pass
        if ( use_cores > 1 )   &                     ! make sure multiple of use_threads
        nk  = ( (nk+use_cores-1)/use_cores ) * use_cores   ! round up to columns as multiple of cores
!
      end if
!
      write (*,*) ' A is cached to',ncp,' passes of',nk,' for',nca,' columns'
!
      num_cache_columns = nk
!
   end function num_cache_columns
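
The column-count arithmetic in num_cache_columns can be tried on its own, without the precision and laipe_test modules, which are not shown in the post. The following is a minimal sketch under that assumption: cache size, element size and thread count are passed as arguments, and the names cache_columns_sketch and columns_per_pass are made up for illustration.

```fortran
module cache_columns_sketch
  implicit none
  private
  public :: columns_per_pass
contains
  ! Hypothetical stand-alone version of the column-count logic above.
  ! nk  = number of columns of A per cache pass
  ! ncp = number of passes needed to cover all nca columns
  subroutine columns_per_pass (nra, nca, cache_bytes, elem_bytes, use_cores, nk, ncp)
    integer, intent(in)  :: nra, nca       ! rows and columns of A
    integer, intent(in)  :: cache_bytes    ! assumed cache size in bytes
    integer, intent(in)  :: elem_bytes     ! bytes per matrix element
    integer, intent(in)  :: use_cores      ! threads in use
    integer, intent(out) :: nk, ncp

    ! columns of A that fit in cache, allowing 1 column of C per thread
    nk = (cache_bytes/elem_bytes) / nra - use_cores

    if ( nk > nca ) then                   ! all of A fits : single pass
      nk  = nca
      ncp = 1
    else if ( nk <= use_cores ) then       ! too few : at least 1 column per thread
      nk  = use_cores
      ncp = (nca+nk-1)/nk
    else
      ncp = (nca+nk-1)/nk                  ! number of passes
      nk  = (nca+ncp-1)/ncp                ! even up columns per pass
      if ( use_cores > 1 )   &
      nk  = ( (nk+use_cores-1)/use_cores ) * use_cores   ! round up to multiple of threads
    end if
  end subroutine columns_per_pass
end module cache_columns_sketch
```

As a worked case, assuming a 5 MB cache (5*1024*1024 bytes), 8-byte reals and 4 threads: for A of 1000 rows by 2000 columns, 655 columns fit in cache, less 4 gives 651, which needs 4 passes, and evening up gives 500 columns per pass (already a multiple of 4).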
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 7916
Location: Salford, UK

Posted: Wed Mar 15, 2017 8:09 am

I haven't tried this kind of connection. I guess that it depends on whether the routines are exported as "extern "C"". It would be worth a try.
DanRRight



Joined: 10 Mar 2008
Posts: 2813
Location: South Pole, Antarctica

Posted: Wed Mar 15, 2017 9:53 am

I know that the IVF-built LAIPE.LIB mostly works with 32-bit FTN95; FTN95 understands its syntax. It did not work with some subroutines, and the FTN95-generated EXE complained at run time about some missing system functions. But Intel's Steve Lionel wrote to me that the LIB file has to be replaced by a DLL, which gathers all the system functions used into the DLL file.

As for gFortran, anyone who uses it can check whether a DLL made with gFortran is compatible with FTN95. How the DLL does its parallelization internally is gFortran's or IVF's business.
mecej4



Joined: 31 Oct 2006
Posts: 1885

Posted: Wed Mar 15, 2017 12:19 pm    Post subject: Re:

PaulLaidler wrote:
I haven't tried this kind of connection. I guess that it depends on whether the routines are exported as "extern "C"". It would be worth a try.

I have used FTN95-64 with a couple of 64-bit DLLs intended for use with Intel Fortran or with Intel/MS C.

One of them is the Pardiso library (V4.12 and V5.00). The Pardiso DLLs depend on the Intel OpenMP DLL, but I have that. The FTN95-produced 64-bit EXE ran fine on several large symmetric matrices from the NIST Matrix Market.

On the other hand, the MKL library uses somewhat complicated modules to map simplified interface names of library entry points to highly decorated actual entry point names. I could see that making this work would take considerable work and there is a good chance that it would fail.

In short, if making FTN95-64 work with third-party libraries is important for you, it is worth trying out. If the third-party library is supplied only as a static library, as DanRRight said, first build a DLL from that library, and make the DLL export all the symbols that you wish to use from your FTN95-compiled program.
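
On the gFortran side, one way to get a plain, undecorated entry point is the standard BIND(C) attribute. The following is only a sketch of the DLL side: the routine name omp_axpy, the module name and the file name in the build comment are made up for illustration, and the FTN95 declaration needed to call it is not shown because it has not been tested here.

```fortran
! Possible build line, using the options quoted earlier in the thread plus -shared
! (file name is an assumption):
!   gfortran -O3 -mavx -ffast-math -fopenmp -shared -o omp_axpy.dll omp_axpy.f90
module omp_axpy_export
  use iso_c_binding, only: c_int, c_double
  implicit none
contains
  ! y = y + a*x, visible to the linker under the C name "omp_axpy".
  ! All arguments are passed by reference, as a Fortran caller normally does.
  subroutine omp_axpy (n, a, x, y) bind(C, name="omp_axpy")
    integer(c_int), intent(in)    :: n
    real(c_double), intent(in)    :: a
    real(c_double), intent(in)    :: x(n)
    real(c_double), intent(inout) :: y(n)
    integer(c_int) :: i

    ! The directive is treated as a comment unless -fopenmp is given
    !$OMP PARALLEL DO shared(n,a,x,y) private(i)
    do i = 1, n
      y(i) = y(i) + a*x(i)
    end do
    !$OMP END PARALLEL DO
  end subroutine omp_axpy
end module omp_axpy_export
```

Whether the resulting DLL also needs the gFortran and OpenMP run-time DLLs alongside it, and whether every symbol is exported by default, would have to be checked on the actual toolchain.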
DanRRight



Joined: 10 Mar 2008
Posts: 2813
Location: South Pole, Antarctica

Posted: Thu Mar 16, 2017 8:17 am

Thanks mecej4 for the info about the Pardiso library, it can be useful. Thanks also for the pointer to the large NIST Matrix Market collection, it is very interesting.

Now it would be great if you'd check whether gFortran, which you also use, produces DLLs compatible with FTN95. It is not for FTN95 to worry about what parallelization the DLL does inside, or how. If it is compatible with FTN95 then I'd encourage you to try LAIPE (it is really simple) and compare it to other parallel algebra packages.

By the way, the manual for the Pardiso library says that Intel Fortran and MS Dev Studio have to be installed for it to work. The LAIPE parallel algebra library, whether LIB or DLL, does not need anything else: you just call its subroutines as usual in Fortran and link it with other OBJ, LIB or DLL files with SLINK.
mecej4



Joined: 31 Oct 2006
Posts: 1885

Posted: Thu Mar 16, 2017 12:05 pm

I don't know what to make of Laipe. I have Gfortran/Gcc 6.2 from Equation.com, and it includes the Laipe libraries. I built the example at the end of Chapter II of the manual ( ftp://ftp.equation.com/laipe/document/laipe_eqsolver.pdf ). The program runs, but the "decomposed" matrix is the same as the original matrix, and the "solution" is the same as the input R.H.S. vector. I suspect that the library checks for a license key or file and does a short return when it finds none. The vendor has the right to require licensing, but giving a false impression of doing something fast is not good. I have no intention of buying a Laipe license.
DanRRight



Joined: 10 Mar 2008
Posts: 2813
Location: South Pole, Antarctica

Posted: Thu Mar 16, 2017 12:56 pm

Mecej4, was this Laipe or Laipe2? I did not try Laipe2; it may also need the neuLoop DLL to be linked. The site says that it is free, unless I am missing something.
mecej4



Joined: 31 Oct 2006
Posts: 1885

Posted: Thu Mar 16, 2017 1:04 pm

I believe that it is Laipe2+Neuloop4. I found it in the GCC/Gfortran 6.2 distribution from equation.com. Only the static libraries are provided. Here is the example code. The build command is in the first line as a comment.
Code:
!     gfortran -fdollar-ok -g laibnd.f -llaipe2 -lneuloop4
      Program XLAIBND
      implicit none
! *** Example program ***
! define variables where the length of A is determined by equation (2.2)
!

      integer*4, parameter :: N = 7
      integer*4, parameter :: LowerBandwidth=2
      real*4 :: A((N-1)*LowerBandwidth+N), X(N)
      integer*4 :: NoGood
      DATA X/21.0,141.0,2.0,9.0,333.0,1.0,3.0/
!
! input the lower triangular part of [A]
!

      CALL Input(A,LowerBandwidth)

!
! decompose in parallel
!

      CALL laipe$decompose_CSP_4(A,N,LowerBandwidth, NoGood)

!
! stop if NoGood=1
!

      IF(NoGood.eq.1) STOP 'Cannot be decomposed'

!
! perform substitutions in parallel
!

      CALL laipe$substitute_CSP_4(A,N,LowerBandwidth,X)

!
! output decomposed matrix
!

      CALL Output(A,N,LowerBandwidth)

!
! output the solution
!

      Write(*,'('' Solution is as:'')')
      Write(*,*) X

!
! end of the program
!

      CALL laipe$done
      STOP
      END

      SUBROUTINE Input(A,LowerBandwidth)
!
!
! routine to demonstrate application of data storage scheme
! (A)FORTRAN CALL: CALL Input(A,LowerBandwidth)
! 1.A: <R4> profile of matrix [A], dimension(*)
! 2.LowerBandwidth: <I4> lower bandwidth
!

!
! dummy arguments
!
      INTEGER*4 :: LowerBandwidth
      REAL*4 :: A(LowerBandwidth,1)

!
! input
!

      A(1,1) = 1.0
      A(2,1) = 4.0
      A(3,1) = 2.0
      A(2,2) = 25.0
      A(3,2) = 29.0
      A(4,2) = 9.0
      A(3,3) = 88.0
      A(4,3) = 34.0
      A(5,3) = 3.0
      A(4,4) = 89.0
      A(5,4) = 23.0
      A(6,4) = 11.0
      A(5,5) = 45.0
      A(6,5) = 7.0
      A(7,5) = 3.0
      A(6,6) = 22.0
      A(7,6) = 2.0
      A(7,7) = 9.0

!
      RETURN
      END

      SUBROUTINE Output(A,N,LowerBandwidth)
!
!
! routine to output the decomposed matrix by data storage scheme
! (A)FORTRAN CALL: CALL Output(A,N,LowerBandwidth)
! 1.A: <R4> profile of matrix [A], dimension(*)
! 2.N: <I4> order of square matrix [A]
! 3.LowerBandwidth: <I4> lower bandwidth
!

!
! dummy arguments
!

      INTEGER*4 :: N,LowerBandwidth
      REAL*4 :: A(LowerBandwidth,1)

!
! local variables
!

      INTEGER*4 :: Column,Row
!
! output the coefficients of decomposed matrix
!

      WRITE(*,'('' Row Column Coefficient'')')
      DO Column = 1,N

          DO Row = Column, MIN0(Column+LowerBandwidth,N)
               WRITE(*,'(I4,I6,F9.3)') Row,Column, A(Row,Column)

         END DO
      END DO
!
      RETURN
      END
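
To tell whether a library call really solved the system or returned early, the result can be compared against an independent dense solve of the same equations. The following is a minimal sketch, not part of the Laipe example: it assumes the matrix is symmetric with the lower band given in Input, and the module and routine names are made up for illustration.

```fortran
module banded_reference_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 7, lbw = 2      ! order and lower bandwidth of the example
  ! right-hand side from the DATA statement in the example
  real(dp), parameter :: rhs(n) = [ 21.0_dp, 141.0_dp, 2.0_dp, 9.0_dp, 333.0_dp, 1.0_dp, 3.0_dp ]
contains
  ! Expand the lower-band values of the example into a full symmetric matrix
  ! (symmetry is an assumption based on "lower triangular part of [A]").
  subroutine build_full (a)
    real(dp), intent(out) :: a(n,n)
    ! values in the order assigned in Input: column by column, diagonal first
    real(dp), parameter :: band(18) = [                          &
       1.0_dp,  4.0_dp,  2.0_dp,   25.0_dp, 29.0_dp,  9.0_dp,    &
      88.0_dp, 34.0_dp,  3.0_dp,   89.0_dp, 23.0_dp, 11.0_dp,    &
      45.0_dp,  7.0_dp,  3.0_dp,   22.0_dp,  2.0_dp,  9.0_dp ]
    integer :: i, j, k
    a = 0.0_dp
    k = 0
    do j = 1, n
      do i = j, min(j+lbw, n)
        k = k + 1
        a(i,j) = band(k)
        a(j,i) = band(k)
      end do
    end do
  end subroutine build_full

  ! Plain Gaussian elimination with partial pivoting on the dense matrix.
  subroutine dense_solve (a, b, x)
    real(dp), intent(in)  :: a(n,n), b(n)
    real(dp), intent(out) :: x(n)
    real(dp) :: w(n,n+1), row(n+1), f
    integer  :: i, k, p
    w(:,1:n) = a
    w(:,n+1) = b
    do k = 1, n
      p = k - 1 + maxloc(abs(w(k:n,k)), dim=1)      ! pivot row
      row = w(k,:);  w(k,:) = w(p,:);  w(p,:) = row
      do i = k+1, n
        f = w(i,k) / w(k,k)
        w(i,k:) = w(i,k:) - f * w(k,k:)
      end do
    end do
    do i = n, 1, -1                                 ! back substitution
      x(i) = ( w(i,n+1) - dot_product(w(i,i+1:n), x(i+1:n)) ) / w(i,i)
    end do
  end subroutine dense_solve
end module banded_reference_check
```

A caller would run build_full and dense_solve with rhs, then compare the reference x with the X returned by the library to single-precision tolerance. If X still equals the input R.H.S. while the reference x does not, the library call did no work.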
DanRRight



Joined: 10 Mar 2008
Posts: 2813
Location: South Pole, Antarctica

Posted: Thu Mar 16, 2017 5:24 pm

Yes, there is something I do not like in this test... For example, there is no setting of the number of threads or cores. The matrix uses old array syntax, which may need the /oldarray option of FTN95. Better to start with a dense matrix case and, once that works, return to this one.
kaliuzhkin



Joined: 17 Sep 2012
Posts: 33

Posted: Mon Jul 31, 2017 9:32 pm    Post subject: identify version

How do I identify the version of my current FTN95 package?

Dan
kaliuzhkin



Joined: 17 Sep 2012
Posts: 33

Posted: Mon Jul 31, 2017 9:48 pm    Post subject: wrong version

Hm. Ftn95 /ver gives version 7.20 on the newly installed personal edition.
kaliuzhkin



Joined: 17 Sep 2012
Posts: 33

Posted: Mon Jul 31, 2017 10:15 pm    Post subject: sorry

Sorry, I get 8.10 now. Please ignore these messages.
All times are GMT + 1 Hour