forums.silverfrost.com Forum Index forums.silverfrost.com
Welcome to the Silverfrost forums
 

Skyline solver accelerated 42 times on 48 processors
mecej4



Joined: 31 Oct 2006
Posts: 1885

Posted: Wed Mar 22, 2017 7:06 pm    Post subject: Re:

DanRRight wrote:

2) Also, how about your previous assessment that LAIPE2 is always slower than LAIPE1, while the test shows the opposite?

That assessment remains more useful. The results in the previous two posts, while falling into the same category as your "Gaussmark", pertain to dense random matrices. My previous statement was based on sparse symmetric matrix runs.
Quote:

3) How about using 64bit Laipe with 64bit FTN95 ?

Works fine, if you build a DLL first with Gfortran/GCC.
Quote:
4) Do you know if MKL has a block matrix solver for a symmetric matrix like the one below? The arrow shows the current width of the block. In Laipe this is Decompose_VAG_8.

MKL/Pardiso covers that type. They classify based on symmetric/unsymmetric and +def/indef. Whether a matrix is block-sparse or ragged-sparse they don't care. Laipe will probably be at a disadvantage with matrices such as https://www.cise.ufl.edu/research/sparse/matrices/Cannizzo/sts4098.html , where there are large empty regions between the diagonal and the opposite corners (N.E. and S.W.). Laipe input is almost the same as for a full, dense matrix, whereas with other sparse packages you do not have to fill in the zeros.
DanRRight



Joined: 10 Mar 2008
Posts: 2813
Location: South Pole, Antarctica

Posted: Wed Mar 22, 2017 7:54 pm

I do not know about very sparse cases, but for a block matrix like the one above Laipe stores data only for the blocks, omitting the empty places. It packs the data contiguously into a 1D array. How about the licensing conditions of MKL/Pardiso? I suppose this sparse part of MKL is also parallel.

Also, I noticed a strange slowness at matrix size 500 for MKL and Laipe2. First-time DLL loading overhead?
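
The contiguous 1D packing mentioned in this post can be illustrated with a generic skyline (profile) layout. This is only a sketch under assumptions, not Laipe's actual storage format: the function names `pack_skyline` and `get` are made up for illustration, and each column of the upper triangle is simply stored from its first nonzero row down to the diagonal.

```python
def pack_skyline(a):
    """Pack the upper triangle of a symmetric matrix column by column.

    Each column j is stored from its first nonzero row down to the diagonal,
    so leading zeros above the profile take no space.
    (Illustrative layout only; not the format used by any specific library.)
    """
    n = len(a)
    values = []       # packed entries, contiguous
    col_start = [0]   # col_start[j] = offset of column j inside values
    first_row = []    # first stored row of each column
    for j in range(n):
        # first nonzero row in column j; fall back to the diagonal itself
        i0 = next((i for i in range(j + 1) if a[i][j] != 0.0), j)
        first_row.append(i0)
        values.extend(a[i][j] for i in range(i0, j + 1))
        col_start.append(len(values))
    return values, col_start, first_row


def get(values, col_start, first_row, i, j):
    """Read entry (i, j) back from the packed form, using symmetry."""
    if i > j:
        i, j = j, i
    if i < first_row[j]:
        return 0.0    # above the skyline: never stored
    return values[col_start[j] + i - first_row[j]]
```

For a 4x4 tridiagonal matrix this stores 7 numbers instead of 16; entries inside the profile that happen to be zero are still stored, which is where a general sparse format can do better.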
mecej4



Joined: 31 Oct 2006
Posts: 1885

Posted: Wed Mar 22, 2017 8:25 pm    Post subject: Re:

DanRRight wrote:
How about the licensing conditions of MKL/Pardiso? I suppose this sparse part of MKL is also parallel.

https://software.intel.com/en-us/performance-libraries

Quote:

Also, I noticed a strange slowness at matrix size 500 for MKL and Laipe2. First-time DLL loading overhead?

Possible. You could circumvent the timing error caused by DLL load by making a dummy call to some DLL routine before starting the timer.
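
The dummy-call idea can be sketched in a few lines. Everything here is an assumption for illustration: `timed` is a made-up helper, and `solve_stub` is a hypothetical stand-in whose first call sleeps briefly to imitate a one-time DLL load; it does not call MKL or Laipe.

```python
import time

_state = {"loaded": False}


def solve_stub(n):
    """Hypothetical stand-in for a DLL routine with a one-time load cost."""
    if not _state["loaded"]:
        time.sleep(0.05)          # simulated first-call load overhead
        _state["loaded"] = True
    return sum(range(n))


def timed(fn, *args, warmup=True):
    """Return (result, elapsed seconds) for fn(*args).

    With warmup=True one untimed dummy call is made first, so one-time
    costs such as library loading are kept out of the measurement.
    """
    if warmup:
        fn(*args)
    t0 = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t0
```

Calling `timed(solve_stub, 1000, warmup=False)` on a fresh start includes the simulated load time, while `timed(solve_stub, 1000)` does not. In a real benchmark the dummy call could use a tiny problem so that the warm-up itself stays cheap.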

There are plenty of strange things with Laipe. I have noticed cases where run time increases as the number of threads is increased.
DanRRight



Joined: 10 Mar 2008
Posts: 2813
Location: South Pole, Antarctica

Posted: Wed Mar 22, 2017 10:18 pm

I tried your 64-bit Laipe DLL with the dense matrix code and it works without a single hiccup.

I also installed MKL and want to try it first with your Lapack/MKL program above. How does it have to be compiled with FTN95?

The work you have done on adopting external libraries has great benefits for all FTN95 users. Everyone can now use the parallel MKL libraries, current and future updates of LAIPE, and basically a lot of other software via DLLs. The prize I promised for that goes to you, with many thanks.
mecej4



Joined: 31 Oct 2006
Posts: 1885

Posted: Thu Mar 23, 2017 1:07 am    Post subject: Re:

DanRRight wrote:
I also installed MKL and want to try it first with your Lapack/MKL program above. How does it have to be compiled with FTN95?

As follows:
Code:
ftn95 tlapack.f90 /64
slink64 tlapack.obj <path to MKL directory>\compilers_and_libraries_2017.2.187\windows\redist\intel64\mkl\mkl_rt.dll /file:tlapack
path %path%;<path to directory containing 64-bit MKL_RT.DLL>
tlapack
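
If the program builds but a DLL fails to load at run time, one quick check is whether the file is actually reachable through PATH. The sketch below is an illustration only: `find_on_path` is a made-up helper, and it inspects PATH alone, whereas Windows also looks in other places (such as the directory of the EXE and the system directories) when loading a DLL.

```python
import os


def find_on_path(filename, path=None):
    """Return every PATH directory entry that contains `filename`.

    `path` defaults to the current PATH environment variable; passing a
    string makes the function easy to test without touching the environment.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        directory = directory.strip('"')   # PATH entries are sometimes quoted
        if not directory:
            continue
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits
```

For example, `find_on_path("mkl_rt.dll")` returning an empty list would suggest that the `path` command above did not take effect in the current command window. The same check can be repeated for any other DLL named in a load error.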
DanRRight



Joined: 10 Mar 2008
Posts: 2813
Location: South Pole, Antarctica

Posted: Thu Mar 23, 2017 2:07 am

Can you please send the exact BAT file? I have a syntax error somewhere.

I tried simply copying all the MKL DLLs into the same directory and linking them with SLINK64, but at run time it still fails to load MKL_Intel_thread.dll or something else.
mecej4



Joined: 31 Oct 2006
Posts: 1885

Posted: Thu Mar 23, 2017 2:40 am

I typed the commands in a command window -- no batch file. I do not install MKL in the default location, since I keep multiple versions for use as needed.

If you have Intel Parallel Studio installed, open a compiler command window for x64, and you will find all the MKL DLLs in %root%\redist\intel64\mkl.

I have never installed MKL by itself, so I don't know how the installer sets up the MKL environment in that case. Nevertheless, there is probably a setup file called mklvars.bat or something similar, which should work for you.

If all that fails, post the error message here or in the MKL forum.
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Thu Mar 23, 2017 4:00 am

mecej4,

Thanks for demonstrating the possibilities. This gives me some useful pointers as to how to link into FTN95 /64.

My review of the published Laipe2 tests suggests that they do not use SSE or AVX instructions. I don't know why this would be the case, although my suspicion is that the added speed of AVX instructions would expose the memory speed bottleneck, which must be a significant issue when many threads are used.
mecej4



Joined: 31 Oct 2006
Posts: 1885

Posted: Thu Mar 23, 2017 11:25 am

The 64-bit Laipe2 library that is included with Equation.com's recent GFortran distributions (6.2, 6.3) contains SSE2 instructions. Which version did you test?
DanRRight



Joined: 10 Mar 2008
Posts: 2813
Location: South Pole, Antarctica

Posted: Thu Mar 23, 2017 2:01 pm

Mecej4, from the Intel link you mentioned above, out of 4-5 different software packages I installed only Intel MKL 2017 update 2. I was really surprised that Intel offers all of them for free. I may try a different package version; I saw that this specific update had linking problems for some people.
I do not see the MKL directory \Program Files (x86)\IntelSWTools in System/Environment Variables/path. Maybe it needs a reboot, but I cannot reboot now...
mecej4



Joined: 31 Oct 2006
Posts: 1885

Posted: Thu Mar 23, 2017 2:10 pm

That MKL package is fine. Intel's releasing MKL in a "community" edition is a recent development.

If you do not have some version of Visual Studio 2015 installed, you will not have the support libraries and DLLs needed to make MKL work. My usual suggestion is to

(i) install VS 2015 community edition, if needed

(ii) test that you can build some C programs using VC and only then

(iii) install MKL or Parallel Studio.
DanRRight



Joined: 10 Mar 2008
Posts: 2813
Location: South Pole, Antarctica

Posted: Fri Mar 24, 2017 12:15 am

I tried reinstalling MKL, installing DevStudio, uninstalling, and taking a different version of MKL that was not free and required a license key, etc., until I realized that none of that was needed and I just set the path manually... hell knows where my damn error was... Devilry. Anyway, I returned to your initial bat file.

Also, this Intel software is a huge mess: different versions do things differently with environment variables, and their own tests do not work, complaining of a missing LIB here, a missing DLL there...

Code:

ftn95 tlapack.f90 /64 /debug /check /free /err /set_error_level error 298  /no_truncate /zeroise >a_FTN95___

slink64  tlapack.obj "c:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.1.143\windows\redist\intel64\mkl\mkl_rt.dll" /file:tlapack.exe  >a_link___



The same directory was added to the path in System/Environment Variables.


Last edited by DanRRight on Fri Mar 24, 2017 2:19 am; edited 6 times in total
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Fri Mar 24, 2017 12:15 am

mecej4 wrote:
Which version did you test?

I am comparing equation.com's reported single-thread performance on the Intel Xeon and AMD with what I can achieve on my Intel i5 and i7 processors using basic DO loop code or the MATMUL intrinsic. The only way I could get that performance would be to exclude vector instructions. Laipe2 appears to be too slow.
mecej4



Joined: 31 Oct 2006
Posts: 1885

Posted: Fri Mar 24, 2017 11:57 am    Post subject: Re:

JohnCampbell wrote:

I am comparing equation.com's reported single-thread performance on the Intel Xeon and AMD with what I can achieve on my Intel i5 and i7 processors using basic DO loop code or the MATMUL intrinsic. The only way I could get that performance would be to exclude vector instructions.

Would it be correct to conclude that you did not actually run programs using Laipe1 or Laipe2, but are estimating what times Laipe might yield on your computer(s)?

Laipe2-64 bit definitely uses SSE2 for floating point operations. Here is proof: The EXE produced by Gfortran 6.2-64-bit for Dan's dense random square matrix problem (Laipe2 static library linked) yields this:
Code:

s:\sparse\LAIPE>objdump -d a.exe | findstr /i fadd
  40fadd:       48 8b 84 24 a8 01 00    mov    0x1a8(%rsp),%rax
  42f5b0:       d8 05 a2 dd 01 00       fadds  0x1dda2(%rip)        # 0x44d358
  4305c0:       d8 05 92 cd 01 00       fadds  0x1cd92(%rip)        # 0x44d358

Surely, one cannot implement Gaussian elimination with just two FADD instructions? Furthermore, the benchmark is for double precision matrices, and what you see here are single precision FADDS instructions, probably from some RTL routine.
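
The same check can be made less dependent on `findstr`, which also matches the address `40fadd` in the first line above. The sketch below is an illustration under assumptions: the helper names `mnemonic` and `classify` are made up, the two mnemonic lists are a small hand-picked sample rather than a complete inventory of x87 or SSE2 instructions, and the input is assumed to be the usual `objdump -d` text layout.

```python
import re

HEX_BYTE = re.compile(r"^[0-9a-f]{2}$")
ADDRESS = re.compile(r"\s*[0-9a-f]+")

# Small illustrative samples, not complete instruction lists.
X87_ARITH = ("fadd", "fsub", "fmul", "fdiv")
SSE2_DOUBLE = ("addsd", "subsd", "mulsd", "divsd",
               "addpd", "subpd", "mulpd", "divpd")


def mnemonic(line):
    """Return the instruction mnemonic of one `objdump -d` line, or None."""
    addr, sep, rest = line.partition(":")
    if not sep or not ADDRESS.fullmatch(addr):
        return None                      # header, label, or blank line
    for token in rest.split():
        if not HEX_BYTE.match(token):    # skip the raw instruction bytes
            return token
    return None


def classify(lines):
    """Count x87 arithmetic and SSE2 double-precision arithmetic instructions."""
    counts = {"x87": 0, "sse2": 0}
    for line in lines:
        m = mnemonic(line)
        if m is None:
            continue
        if m.startswith(X87_ARITH):      # also catches fadds, faddp, fmull, ...
            counts["x87"] += 1
        elif m in SSE2_DOUBLE:
            counts["sse2"] += 1
    return counts
```

Fed the three lines shown above, `classify` counts two x87 instructions and ignores the `mov` whose address merely contains the text "fadd". Running it over the full disassembly would give both counts side by side.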

Here are more timing results, with Dan's timings added for comparison:
Code:

       ------------ two-core i5-4200U ------------     i7-4770K
       64-bit     32-bit       32-bit       64-bit     32-bit
  N     MKL        MKL         Laipe1       Laipe2     Laipe1 (DanR)
----   -----      -----        ------        -----     ----
 500   0.110      0.000         0.062        0.031
 750   0.000      0.016         0.141        0.110
1000   0.031      0.047         0.328        0.219     0.09
1250   0.047      0.078         0.610        0.407
1500   0.078      0.093         1.000        0.656
1750   0.078      0.125         1.594        1.015
2000   0.110      0.203         2.285        1.469     0.75
2250   0.172      0.250         3.401        2.125
2500   0.234      0.375         4.509        2.875
2750   0.344      0.484         6.547        3.735
3000   0.375      0.641         7.625        4.796     2.44
3250   0.468      0.828        10.283        6.328
3500   0.578      1.054        12.580        7.734
3750   0.703      1.223        15.933        9.609
4000   0.892      1.422        19.148       11.422     5.90
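
A few numbers derived from the table make it easier to read. The sketch below copies four rows (N, 64-bit MKL, 64-bit Laipe2) from the table as printed; the helper names `ratio` and `scaling_exponent` are made up for illustration, and the exponent is only a two-point estimate.

```python
import math

# (N, 64-bit MKL time, 64-bit Laipe2 time), copied from the table above
ROWS = [
    (1000, 0.031, 0.219),
    (2000, 0.110, 1.469),
    (3000, 0.375, 4.796),
    (4000, 0.892, 11.422),
]


def ratio(row):
    """How many times longer Laipe2 took than MKL for one table row."""
    _, mkl, laipe2 = row
    return laipe2 / mkl


def scaling_exponent(n1, t1, n2, t2):
    """Estimate p in t ~ N**p from two (N, time) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


for row in ROWS:
    print(row[0], round(ratio(row), 1))

# Growth of run time between N = 2000 and N = 4000 for each library
print("Laipe2 exponent:", scaling_exponent(2000, 1.469, 4000, 11.422))
print("MKL exponent:   ", scaling_exponent(2000, 0.110, 4000, 0.892))
```

At N = 4000 the ratio comes out at roughly 12.8, and both exponents are close to 3, which is what one would expect from dense Gaussian elimination, whose operation count grows as N^3. The smallest sizes are too close to the timer resolution for such estimates to mean much.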



Last edited by mecej4 on Sun Aug 11, 2019 9:04 am; edited 2 times in total
mecej4



Joined: 31 Oct 2006
Posts: 1885

Posted: Fri Mar 24, 2017 12:07 pm    Post subject: Re:

DanRRight wrote:

Also, this Intel software is a huge mess: different versions do things differently with environment variables, and their own tests do not work, complaining of a missing LIB here, a missing DLL there...

Code:

ftn95 tlapack.f90 /64 /debug /check /free /err /set_error_level error 298  /no_truncate /zeroise >a_FTN95___

slink64  tlapack.obj "c:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2017.1.143\windows\redist\intel64\mkl\mkl_rt.dll" /file:tlapack.exe  >a_link___


The same directory was added to the path in System/Environment Variables.

Dan, did you finally get the program built and did you run it? You redirected the error messages to files, and forgot to post the contents of those files!

For the purposes of this test, you do not need any of the compiler options that you used, especially /check and /zeroise.
All times are GMT + 1 Hour