
forums.silverfrost.com Welcome to the Silverfrost forums

View previous topic :: View next topic 
Author 
Message 
DanRRight
Joined: 10 Mar 2008 Posts: 1875 Location: South Pole, Antarctica

Posted: Sun Mar 26, 2017 5:39 am Post subject: 


Blocks inside the matrix are squares which go along the major diagonal, so the geometrically the shape of matrix is symmetric, but matrix elements all are different, so formally it is not symmetric (not equal to transposed as a definition of symmetric).
Dense 20000x20000 real*8 matrix contains 3GB of data. My block sparse one 1/2 of that 

Back to top 


DanRRight
Joined: 10 Mar 2008 Posts: 1875 Location: South Pole, Antarctica

Posted: Sun Mar 26, 2017 5:48 pm Post subject: 


Here is the test with single precision using LAIPE block sparse solver. I substituted VAG_8 by VAG_4 and all arrays with real*4. It shows twice the speed of VAG_8 above
Code:  VAG_4 VAG_8 MKL_8 MKL_4
500 0.005 0.006 0.191 0.117
1000 0.020 0.027 0.062 0.061
1500 0.051 0.083 0.157 0.140
2000 0.113 0.189 0.360 0.267
2500 0.222 0.398 0.424 0.487
3000 0.373 0.673 0.680 0.591
3500 0.571 1.158 0.951 0.797
4000 0.846 1.745 1.176 1.257
4500 1.250 2.476 1.654 1.514
5000 1.704 3.537 2.034 1.775
5500 2.277 4.652 2.482 2.413
6000 2.950 6.048 3.024 2.615
10000 15.43 31.10 9.268 9.326
20000 151.5 326.7 54.21 40.99 
Freaking "fun" is that I can not create the same test with older LAIPE circa 1997 made by the Microsoft Fortran which I used all this time with no problems. It crashes and crashes from the start. Can I say again and again: "programming is full of @#%% devilry"? Just one step aside and you are burned for hours where you expected less. Attention to tiny details, good memory and good mood/health are literally a must for programmers or you will do and redo things 1000 times...Am I the only who are experiencing this type of hiccups with some ups but often downs all the way? I never heard any complaints from colleagues of all ages to the same extreme extent which bothers me, and I'd like to believe that am not the most stupid person, or completely not suitable for this kind of work which I do all life and like it...
Last edited by DanRRight on Tue Mar 28, 2017 7:47 am; edited 7 times in total 

Back to top 


mecej4
Joined: 31 Oct 2006 Posts: 1002

Posted: Mon Mar 27, 2017 2:49 am Post subject: 


If by "older LAIPE circa 1997 made by the Microsoft Fortran" you mean a library made for use with Fortran Powerstation (FPS) 4, to call routines in the library you need to use the STDCALL convention. In the caller, you may need to add declarations for this purpose, or use compiler options, depending on whether your code is compiled with IFort or FTN95.
Using the FPS4 compiler, I ran the code that I posted in my post of March 22, 2017 after changing the subroutine names to LAIPE1 names and providing interface blocks for the three LAIPE subroutines that are called. The run times were roughly the same as with Intel Fortran, because most of the run time is spent in the LAIPE1 library. 

Back to top 


JohnCampbell
Joined: 16 Feb 2006 Posts: 1993 Location: Sydney

Posted: Mon Mar 27, 2017 3:22 am Post subject: 


Dan,
Using Laipe’s laipe$decompose_VAG_x, I would recommend two utility routines to check your solution:
1) A routine that, given LAST and LABEL, checks the profile definition to confirm compliance with infill below the diagonal (say laipe$check_profile_VA_x) and,
2) A routine that, given the calculated solution {X}, calculates the error vector {E} = {B} – [A]{X} and reports on the maximum error (say laipe$check_error_VA_8)
If nothing else, these checks would help with understanding the data structure and accuracy of the solution.
The use of REAL*8 :: A(1,1) and A(i,LABEL(j)) is an interesting approach. I am sure a lot of Fortran inspectors would complain.
Code:  Subroutine laipe$Check_Profile_VAG_x (N_i, Label_i, Last_i, NoGood_o)
!
! routine to check profile structure for VA matrix
! Beginning(I) must be in range (1:I)
! Last(I) must be in the range (I:N) and (Last(I1):N)
!
INTEGER*4 :: N_i ! number of equations
INTEGER*4 :: Label_i(N_i) ! column storage offset in A(1,1)
INTEGER*4 :: Last_i(N_i) ! last active equation in column
INTEGER*4 :: NoGood_o ! count of errors ( =0 for OK )
!
integer*4 :: Ieq, Begin_i
!
NoGood_o = 0
!
!! Beginning(I) = Label(I1) + Last(I1)  Label(I) + 1 (9.2)
Begin_i = 1
DO Ieq = 1, N_i
if ( Ieq > 1 ) then
Begin_i = Last_i(Ieq1)  ( Label_i(Ieq)  Label_i(Ieq1) ) + 1
if ( Last_i(Ieq) < Last_i(Ieq1) ) NoGood_o = NoGood_o + 1
end if
if ( Begin_i < 1 ) NoGood_o = NoGood_o + 1
if ( Begin_i > Ieq ) NoGood_o = NoGood_o + 1
if ( Last_i(Ieq) < Ieq ) NoGood_o = NoGood_o + 1
if ( Last_i(Ieq) > N_i ) NoGood_o = NoGood_o + 1
END DO
!
if ( NoGood_o > 0 ) Write (*,*) NoGood_o,' problems with VAG_8 profile'
!
End Subroutine laipe$Check_Profile_VAG_x



Back to top 


JohnCampbell
Joined: 16 Feb 2006 Posts: 1993 Location: Sydney

Posted: Mon Mar 27, 2017 3:24 am Post subject: 


.ctd Code:  REAL*8 Function laipe$Max_Error_VAG_8 (A_i, B_i, X_i, N_i, Label_i, Last_i)
!
! routine to calculate both Max error and RMS error for the solution of [A].{X} = {B}
! [A] and {B} are original inputs
! {X} is the solution obtained
!
INTEGER*4 :: N_i ! number of equations
REAL*8 :: A_i(*) ! original matrix [A], before being decomposed
REAL*8 :: B_i(N_i) ! {B} : original rhs
REAL*8 :: X_i(N_i) ! {X} : available solution to [A].{X} = {B}
INTEGER*4 :: Label_i(N_i) ! column storage offset in A(1,1)
INTEGER*4 :: Last_i(N_i) ! last active equation in column
!
integer*4 :: Ieq, Begin_i, End_i, N_eq, Start_i
real*8 :: err_i, max_err, rms_err
REAL*8, allocatable :: B_new(:)
!
ALLOCATE ( B_new(N_i) )
B_new = 0
!
! Begin_i is first active equation in column i
! Start_i is storage location of this first coefficient in A(*) storage
! last_i is last active equation of column i
!
! Begin_i = Last_i(Ieq1)  ( Label_i(Ieq)  Label_i(Ieq1) ) + 1
! Start_i = Begin_i + (Label_i(ieq)1)*1
!
Begin_i = 1
DO Ieq = 1, N_i
if ( Ieq > 1 ) then
Begin_i = Last_i(Ieq1)  ( Label_i(Ieq)  Label_i(Ieq1) ) + 1
end if
End_i = Last_i(Ieq)
N_eq = End_i  Begin_i + 1
!
!! B_new(Begin_i:End_i) = B_new(Begin_i:End_i) + A_i(Begin_i:End_i,Label_i(Ieq)) * X_i(Ieq)
!
Start_i = Begin_i + Label_i(ieq)1
call daxpy ( B_new(Begin_i), A_i(Start_i), X_i(Ieq), N_eq )
!
END DO
!
max_err = 0
rms_err = 0
DO Ieq = 1, N_i
err_i = abs ( B_new(Ieq)  B_i(Ieq) )
if ( err_i > max_err ) max_err = err_i
rms_err = rms_err + err_i**2
END DO
rms_err = sqrt ( rms_err / N_i )
!
write (*,*) 'RMS error = ', rms_err
write (*,*) 'Max error = ', max_err
!
laipe$Max_Error_VAG_8 = max_err
!
End Function laipe$Max_Error_VAG_8

I have not tested them, so hopefully the idea is there, even if there are a few errors !! 

Back to top 


DanRRight
Joined: 10 Mar 2008 Posts: 1875 Location: South Pole, Antarctica

Posted: Mon Mar 27, 2017 4:06 am Post subject: 


Mecej4, There exist one more library besides the ones you have. It does not need anything to be called from FTN95. And it worked for 20 years with no hiccup. Until today...I tried few things but just have no time to find the reason.
Does this pardiso subroutine above have single precision version?
John, I may play with checking of solution but what gives to the user this extra checking? Have you ever seen big problems here? 

Back to top 


mecej4
Joined: 31 Oct 2006 Posts: 1002

Posted: Mon Mar 27, 2017 11:37 am Post subject: Re: 


DanRRight wrote:  Does this pardiso subroutine above have single precision version? 
Yes, and you really have to read the documentation before attempting to use that and other options. Unlike most of the Lapack routines in MKL, for calling Pardiso you specify singleprecision by setting one of the elements of the iparm() array, and not by calling a different subroutine name. Similarly, Pardiso does different things based on control variables that you pass to it. 

Back to top 


JohnCampbell
Joined: 16 Feb 2006 Posts: 1993 Location: Sydney

Posted: Mon Mar 27, 2017 12:30 pm Post subject: 


Dan wrote:  John, I may play with checking of solution but what gives to the user this extra checking? Have you ever seen big problems here? 
In my analyses I always do a calculation of the form Laipe$Max_Error_VAG_8.
Often [A].{x} can be written as SUM ( [Ai].{x} ) where [Ai] are much smaller matrices. This also confirms the assembly of [A].
The calculation of errors can help to confirm the correctness of the solution and also identifies possible roundoff where the matrix/problem is poorly defined. I carry out this check wherever possible in FE analysis, especially in static analysis or eigenvector extraction.
Remember that pivoting has not been selected, which can be a problem for a skyline storage solution. 

Back to top 


mecej4
Joined: 31 Oct 2006 Posts: 1002

Posted: Tue Mar 28, 2017 1:58 am Post subject: 


John, please note that an unsymmetric sparse matrix need not satisfy the tests
Code: 
if ( Begin_i > Ieq ) NoGood_o = NoGood_o + 1
if ( Last_i(Ieq) < Ieq ) NoGood_o = NoGood_o + 1

that you have in your laipe$Check_Profile_VAG_x() subroutine. For example, the matrix IMPCOL_B at the Matrix Market has five columns that begin below the diagonal, yet there is no problem obtaining a solution to A.x = b with that matrix using the SuperLU solver (the Laipe solver fails to factorize the matrix, as least with my driver code for obtaining the solution using the VAG subroutines). 

Back to top 


JohnCampbell
Joined: 16 Feb 2006 Posts: 1993 Location: Sydney

Posted: Tue Mar 28, 2017 3:15 am Post subject: 


Mecej4,
I was trying to identify some warnings for a Profile_VAG approach.
If it fails those tests, then there is no use using a profile solver that doesn't use any pivoting. However there are many practical equation systems where pivoting is not required.
I find it difficult to understand how you could use partial pivoting with a skyline/column storage model.
How do MKL and Paradiso work with the sparse matrix definition and manage the storage required during reduction ? I would expect that the Profile_VAG approach requires minimal additional storage, (until pivoting is attempted). 

Back to top 


mecej4
Joined: 31 Oct 2006 Posts: 1002

Posted: Tue Mar 28, 2017 3:33 am Post subject: 


Pardiso does not provide for any special treatment of skyline profile or even banded matrices, and I suspect that for skyline profile matrices there may be other packages that do better.
Only nonzero offdiagonal elements need to be supplied. In that aspect, I dislike the design decision taken by Laipe as to unsymmetric matrices  for the matrix Sherman2 from the Matrix Market, N = 1080, nnz = 23,094 but, after filling in all the zeros that Laipe expects, the matrix has 401,023 elements, i.e, 18 times larger.
Pardiso is a modern routine, and allocates any storage that it needs without getting the user involved. A terminal call is needed after the solution has been obtained to tell Pardiso to release internally allocated storage.
Pivoting is handled internally, but you can choose between MMD and Metis reordering.
Last edited by mecej4 on Wed Mar 29, 2017 7:54 pm; edited 1 time in total 

Back to top 


DanRRight
Joined: 10 Mar 2008 Posts: 1875 Location: South Pole, Antarctica

Posted: Tue Mar 28, 2017 7:54 am Post subject: Re: 


mecej4 wrote:  If by "older LAIPE circa 1997 made by the Microsoft Fortran" you mean a library made for use with Fortran Powerstation (FPS) 4, to call routines in the library you need to use the STDCALL convention. In the caller, you may need to add declarations for this purpose, or use compiler options, depending on whether your code is compiled with IFort or FTN95.
Using the FPS4 compiler, I ran the code that I posted in my post of March 22, 2017 after changing the subroutine names to LAIPE1 names and providing interface blocks for the three LAIPE subroutines that are called. The run times were roughly the same as with Intel Fortran, because most of the run time is spent in the LAIPE1 library. 
There exist one more older LAIPE library besides LAIPE1 circa 2008, let's call it LAIPE0 or LAIPE1997, the LIB file of which is compatible with FTN9532 because was done by MS Fortran. No need to add anything. But it does not work with this small test by some reason despite of working with my programs for 20 years 

Back to top 


DanRRight
Joined: 10 Mar 2008 Posts: 1875 Location: South Pole, Antarctica

Posted: Tue Mar 28, 2017 10:34 pm Post subject: 


"The road to hell is paved with good intentions".
I finally succeeded to reproduce the same test with the 1997 LAIPE or LAIPE0.
Well, that was classical devilry pawed with lures and good intensions.
We have
 LAIPE0 circa 1997 made with MS Fortran
 LAIPE1 circa 2008 made with Intel Fortan
 LAIPE2 of recent years compiled with the gFortran from which Mecej4 made DLL to be compiled with FTN95
Dense matrices we started with are used exclusively for their simplicity, they have zero practical applicability in my case.
Dense LAIPE1 showed factor of 1.2 speedup from Dense LAIPE0. And Dense LAIPE2 showed 2x speedup over Dense LAIPE1. Lure is clear: let's try LAIPE2 for Sparse and then go drink Champaign!
We tried Sparse LAIPE2 versus Sparse LAIPE0 and got ... only 20% speed increase versus 20 years older libraries. Here is LAIPE0...
Code: 
nEqu = 500 0.005 s
nEqu = 1000 0.023 s
nEqu = 1500 0.061 s
nEqu = 2000 0.137 s
nEqu = 2500 0.269 s
nEqu = 3000 0.459 s
nEqu = 3500 0.724 s
nEqu = 4000 1.079 s
nEqu = 4500 1.526 s
nEqu = 5000 2.079 s
nEqu = 5500 2.855 s
nEqu = 6000 3.646 s
nEqu = 10000 17.13 s
nEqu = 20000 fail

That is for Real*4 data which go 2x faster then Real*8 with LAIPE why I mostly use it. I summarized most of test results in older post few days back.
Mecej4 tried dense MKL solver and it showed so great promis of more then an order of magnitude speedup over LAIPE that I thought sparse solver will also deliver such speedup. Well, sparse MKLPardiso only did up to 3x speedup and only for superlarge matrices 20000 size which good but very rarely used. During the simulation matrix could be very small too and changing methods is possible but potentially troublesome. On typically smallish sizes <6000 it is not faster then LAIPE and on small ones ~1000 it's up to 3 times slower then LAIPE Real*4. Oh, yea, forgot to mention that, and at the end here is kick below the waist line: single precision MKLPardiso gets no visible speedup over real*8 MKLPardiso, see the columns called MKL_8 and MKL_4 in the post of few days back
Double devil... Which is how called my place in some foreign language where I moved recently by the way. Beautiful place of course like it should be
ADDITION: After being disappointed with current sparse matrix testing results for some time I think there is one more chance: to ask author of LAIPLE to compile it with latest Intel Fortran using SSE and AWX.
Last edited by DanRRight on Thu Mar 30, 2017 1:23 am; edited 2 times in total 

Back to top 


DanRRight
Joined: 10 Mar 2008 Posts: 1875 Location: South Pole, Antarctica

Posted: Wed Mar 29, 2017 7:02 am Post subject: 


Addition 2
Wow! Finally! Look what I've got with real*8 block sparse matrix: WAY faster then the fastest before
Code: 
1000 0.007 s
3000 0.115
6000 0.742
10000 3.408
20000 27.21 
And fasten your seat belts now. Same with Real*4
Code: 
1000 0.003 s
2000 0.020
3000 0.063
4000 0.135
5000 0.273
6000 0.421
10000 1.743
20000 14.12

Killing speed!
To Intel Pardiso: "You are fired" (c) 

Back to top 




You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum

Powered by phpBB © 2001, 2005 phpBB Group
