forums.silverfrost.com

DanRRight · Posted: Sun Mar 26, 2017 5:39 am Post subject:

Blocks inside the matrix are squares that lie along the main diagonal, so geometrically the shape of the matrix is symmetric, but the matrix elements are all different, so formally it is not symmetric (not equal to its transpose, which is the definition of symmetric).

A dense 20000x20000 real*8 matrix contains about 3GB of data. My block sparse one is 1/2 of that.
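For anyone who wants to check the storage figure, here is a minimal sketch of the arithmetic (written in Python only for brevity; the variable names are illustrative, not taken from the test program):

```python
# Storage needed for a dense n x n matrix of 8-byte reals (real*8).
n = 20000
bytes_per_real8 = 8

dense_bytes = n * n * bytes_per_real8
print(dense_bytes)                      # 3200000000
print(dense_bytes / 1e9)                # 3.2  (decimal GB)
print(round(dense_bytes / 2**30, 2))    # 2.98 (binary GiB)

# A block sparse layout holding half as many elements needs half of that.
half_bytes = dense_bytes // 2
print(half_bytes)                       # 1600000000
```

So "3GB" is right in either unit to within rounding, and the half-size block sparse matrix comes to about 1.6 GB.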

DanRRight · Posted: Sun Mar 26, 2017 5:48 pm Post subject:

Here is the test with single precision using the LAIPE block sparse solver. I replaced VAG_8 with VAG_4 and made all arrays real*4. It shows twice the speed of VAG_8 above.

mecej4 · Joined: 31 Oct 2006 Posts: 1886

If by "older LAIPE circa 1997 made by the Microsoft Fortran" you mean a library made for use with Fortran Powerstation (FPS) 4, to call routines in the library you need to use the STDCALL convention. In the caller, you may need to add declarations for this purpose, or use compiler options, depending on whether your code is compiled with IFort or FTN95.

Using the FPS-4 compiler, I ran the code that I posted in my post of March 22, 2017 after changing the subroutine names to LAIPE-1 names and providing interface blocks for the three LAIPE subroutines that are called. The run times were roughly the same as with Intel Fortran, because most of the run time is spent in the LAIPE-1 library.

JohnCampbell · Joined: 16 Feb 2006 Posts: 2554 Location: Sydney

Dan,
Using Laipe’s laipe$decompose_VAG_x, I would recommend two utility routines to check your solution:
1) A routine that, given LAST and LABEL, checks the profile definition to confirm compliance with in-fill below the diagonal (say laipe$check_profile_VA_x) and,
2) A routine that, given the calculated solution {X}, calculates the error vector {E} = {B} – [A]{X} and reports on the maximum error (say laipe$check_error_VA_8)
If nothing else, these checks would help with understanding the data structure and accuracy of the solution.
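The second check can be sketched as follows. This is only an illustration in Python using a small dense matrix as a stand-in; the function name max_residual is hypothetical, and a real routine would have to walk Laipe's profile storage rather than full rows.

```python
def max_residual(a, x, b):
    """Return max_i |b_i - sum_j a_ij * x_j| for a dense matrix given as a list of rows."""
    worst = 0.0
    for row, b_i in zip(a, b):
        ax_i = sum(a_ij * x_j for a_ij, x_j in zip(row, x))
        worst = max(worst, abs(b_i - ax_i))
    return worst

# Small example: [A]{X} = {B} with exact solution X = (1, 2).
a = [[4.0, 1.0],
     [2.0, 3.0]]
b = [6.0, 8.0]
x = [1.0, 2.0]

print(max_residual(a, x, b))            # 0.0 for the exact solution
print(max_residual(a, [1.0, 1.0], b))   # 3.0 for a deliberately wrong one
```

In practice the maximum error is usually compared against the size of {B} rather than against zero, since round-off makes an exact zero unlikely.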

The use of REAL*8 :: A(1,1) and A(i,LABEL(j)) is an interesting approach. I am sure a lot of Fortran inspectors would complain.

DanRRight · Posted: Mon Mar 27, 2017 4:06 am Post subject:

Mecej4, there exists one more library besides the ones you have. It does not need anything to be called from FTN95. And it worked for 20 years with no hiccup. Until today... I tried a few things but just have no time to find the reason.

Does the Pardiso subroutine above have a single precision version?

John, I may play with checking the solution, but what does this extra checking give the user? Have you ever seen big problems here?

mecej4 · Joined: 31 Oct 2006 Posts: 1886

John, please note that an unsymmetric sparse matrix need not satisfy the tests

JohnCampbell · Joined: 16 Feb 2006 Posts: 2554 Location: Sydney

Mecej4,

I was trying to identify some warnings for a Profile_VAG approach.
If it fails those tests, then there is no point in using a profile solver that doesn't use any pivoting. However, there are many practical equation systems where pivoting is not required.
I find it difficult to understand how you could use partial pivoting with a skyline/column storage model.
How do MKL and Pardiso work with the sparse matrix definition and manage the storage required during reduction? I would expect that the Profile_VAG approach requires minimal additional storage (until pivoting is attempted).
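As an illustration of a system where pivoting is not required: strict diagonal dominance by rows is one well-known sufficient (not necessary) condition under which elimination without pivoting is safe. The sketch below is in Python on a dense stand-in matrix; the function name is hypothetical, and this is not necessarily one of the tests referred to above.

```python
def is_strictly_diagonally_dominant(a):
    """True if |a_ii| > sum of |a_ij| over j != i, for every row i."""
    for i, row in enumerate(a):
        off_diagonal = sum(abs(v) for j, v in enumerate(row) if j != i)
        if abs(row[i]) <= off_diagonal:
            return False
    return True

print(is_strictly_diagonally_dominant([[4.0, 1.0], [2.0, 3.0]]))   # True
print(is_strictly_diagonally_dominant([[1.0, 2.0], [3.0, 4.0]]))   # False
```

A matrix that fails this check may still factorise without trouble; the check only identifies one class of matrices for which no pivoting is known to be needed.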

mecej4 · Joined: 31 Oct 2006 Posts: 1886

Pardiso does not provide for any special treatment of skyline profile or even banded matrices, and I suspect that for skyline profile matrices there may be other packages that do better.

Only non-zero off-diagonal elements need to be supplied. In that respect, I dislike the design decision taken by Laipe as to unsymmetric matrices -- for the matrix Sherman2 from the Matrix Market, N = 1080, nnz = 23,094 but, after filling in all the zeros that Laipe expects, the matrix has 401,023 elements, i.e., about 17 times larger.
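To show where the extra elements come from, here is a minimal sketch that counts the slots in a generic skyline (variable-bandwidth) envelope and compares it with the number of non-zeros. It is written in Python for brevity, the function name profile_size is hypothetical, and the layout is an assumption: Laipe's actual storage scheme may differ in detail.

```python
def profile_size(n, entries):
    """Count stored slots in a generic skyline envelope of an n x n matrix.

    entries: iterable of 0-based (row, col) positions of the non-zeros.
    Lower part: row i is stored from its first non-zero column up to the diagonal.
    Upper part: column j is stored from its first non-zero row down to just above the diagonal.
    """
    first_col = list(range(n))   # leftmost stored column in each row
    first_row = list(range(n))   # topmost stored row in each column
    for i, j in entries:
        if j <= i:
            first_col[i] = min(first_col[i], j)
        else:
            first_row[j] = min(first_row[j], i)
    lower = sum(i - first_col[i] + 1 for i in range(n))   # includes the diagonal
    upper = sum(j - first_row[j] for j in range(n))       # strictly above the diagonal
    return lower + upper

# 4 x 4 example: the diagonal plus one non-zero in each far corner.
entries = [(0, 0), (1, 1), (2, 2), (3, 3), (3, 0), (0, 3)]
print(len(entries))                 # 6 non-zeros
print(profile_size(4, entries))     # 10 stored slots

# Ratio for the Sherman2 counts quoted in this post.
print(round(401023 / 23094, 1))     # 17.4
```

Two corner entries are enough to force a whole row and a whole column into storage, which is why a matrix with scattered far-off-diagonal entries fills in so badly under a profile scheme.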

Pardiso is a modern routine, and allocates any storage that it needs without getting the user involved. A terminal call is needed after the solution has been obtained to tell Pardiso to release internally allocated storage.

Pivoting is handled internally, but you can choose between MMD and Metis reordering.

DanRRight · Posted: Tue Mar 28, 2017 7:54 am Post subject: Re:

DanRRight · Posted: Tue Mar 28, 2017 10:34 pm Post subject:

"The road to hell is paved with good intentions".

I finally succeeded in reproducing the same test with the 1997 LAIPE, or LAIPE0.

Well, that was classical devilry paved with lures and good intentions.

We have
- LAIPE0 circa 1997 made with MS Fortran
- LAIPE1 circa 2008 made with Intel Fortran
- LAIPE2 of recent years compiled with gFortran, from which Mecej4 made a DLL to be used with FTN95

The dense matrices we started with were used exclusively for their simplicity; they have zero practical applicability in my case.

Dense LAIPE1 showed a factor of 1.2 speedup over Dense LAIPE0. And Dense LAIPE2 showed a 2x speedup over Dense LAIPE1. The lure is clear: let's try LAIPE2 for Sparse and then go drink champagne!

We tried Sparse LAIPE2 versus Sparse LAIPE0 and got ... only a 20% speed increase over a library 20 years older. Here is LAIPE0...

DanRRight · Posted: Wed Mar 29, 2017 7:02 am Post subject:

Addition 2
Wow! Finally! Look what I've got with the real*8 block sparse matrix: WAY faster than the fastest before