Because we are comparing LAIPE and your methods, not different computers, John. So take the same tests at the same precision and run both on the same computer, and then... when you come, we will collect at your almost-South Pole. Is beer OK there? Or 5-10-year-old cognacs?
Take a real linear algebra AX=B test, not one of its tiny building blocks: auxiliary subroutines, general-purpose utilities, system-level utilities, supplemental material, etc. What you are comparing against now is not even in the LAIPE library (!). Not only are you not comparing apples to apples, you are comparing apples to vacuum 😄
It makes little sense to use matmul for comparison. Typically, if the matrices fit in the cache, matmul runs very fast and the parallelization overhead makes parallelization useless. If the matrices are too large, then we are dealing with the memory-subsystem bottleneck and with tricks for better cache utilization, not with the speed of the compiler, the processor, or the solution method.
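To put rough numbers on the "fits in the cache" argument, here is a back-of-the-envelope sketch in Python (purely illustrative, not from LAIPE or FTN95). A matmul C = A*B on n×n double-precision matrices touches about 3·n²·8 bytes, so you can estimate where the in-cache regime ends. The 8 MiB cache size is an assumption; plug in your own CPU's last-level cache.

```python
import math

BYTES_PER_DOUBLE = 8  # REAL*8 / double precision element


def matmul_footprint_bytes(n: int) -> int:
    """Bytes needed to hold A, B and C = A*B for n x n double-precision matrices."""
    return 3 * n * n * BYTES_PER_DOUBLE


def largest_n_fitting(cache_bytes: int) -> int:
    """Largest n for which all three n x n double matrices fit in cache_bytes."""
    # 3 * n^2 * 8 <= cache  <=>  n^2 <= floor(cache / 24), since n^2 is an integer
    return math.isqrt(cache_bytes // (3 * BYTES_PER_DOUBLE))


if __name__ == "__main__":
    cache = 8 * 1024 * 1024  # ASSUMED 8 MiB last-level cache; check your own CPU
    print(largest_n_fitting(cache))        # 591
    print(matmul_footprint_bytes(1000))    # 24000000, i.e. ~24 MB, well outside 8 MiB
```

This ignores cache associativity, multiple cache levels and the tile reuse that blocked matmul implementations rely on, so treat the threshold as an order of magnitude only.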
That test on the Equation website was a demonstration of how well LAIPE2 compiled with GFortran (which also means it is much slower than the original LAIPE compiled with iFort or Lahey) handles multithreading on some very slow server, not of how fast it is.
I did the same in the past to show that FTN95 is the best in the world with respect to scaling with the number of threads DOING FP CALCULATIONS: we have only 4 FP cores, but the FTN95 code runs as if there were 8. There I used a = log(exp(a)) as some arbitrary simulation workload:
https://forums.silverfrost.com/Forum/Topic/2239&postdays=0&postorder=asc&highlight=net+paralell+parallel&start=0
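For anyone who wants to try the same kind of FP-bound scaling test outside FTN95, here is a rough sketch of the idea in Python (not the original FTN95 code from the linked thread). Each worker repeatedly evaluates a = log(exp(a)), which is mathematically an identity, and you time the run for 1, 2, 4 and 8 workers. The iteration count, starting values and the names `fp_kernel`/`run_parallel` are arbitrary choices for illustration.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def fp_kernel(a: float, iterations: int) -> float:
    """Pure floating-point busy work: log(exp(a)) == a mathematically,
    so the value stays (almost) unchanged while the FPU is kept busy."""
    for _ in range(iterations):
        a = math.log(math.exp(a))
    return a


def run_parallel(workers: int, iterations: int) -> list:
    """Run one independent copy of the kernel per worker process.
    Processes (not threads) are used because CPython threads serialise on the GIL."""
    starts = [0.1 * (k + 1) for k in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fp_kernel, starts, [iterations] * workers))


if __name__ == "__main__":
    import time
    for workers in (1, 2, 4, 8):
        t0 = time.perf_counter()
        run_parallel(workers, 2_000_000)
        print(workers, "workers:", round(time.perf_counter() - t0, 2), "s")
```

With perfect scaling the wall time stays roughly flat while the number of workers does not exceed the available FP units, then grows. In Python the interpreter overhead dominates each iteration, so this shows the method, not a meaningful benchmark of the hardware.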
Here is the LAIPE content. You can play with matmul (though I do not see it in the documentation), but then take good old LAIPE compiled with Intel Fortran and choose something serious from it to run:
'- Constant-Bandwidth, Symmetric, and Positive Definite Systems.
- Variable-Bandwidth, Symmetric, and Positive Definite Systems.
- Dense, Symmetric, and Positive Definite Systems.
- Constant-Bandwidth and Symmetric Systems.
- Variable-Bandwidth and Symmetric Systems.
- Dense and Symmetric Systems.
- Constant-Bandwidth and Asymmetric Systems.
- Variable-Bandwidth and Asymmetric Systems.
- Dense and Asymmetric Systems.
- Constant-Bandwidth and Asymmetric Solvers with Partial Pivoting.
- Constant-Bandwidth, Symmetric, and Positive Definite Solvers with Partial Pivoting.
- Constant-Bandwidth and Symmetric Solvers with Partial Pivoting.
- Dense Solvers with Partial Pivoting.
- Dense Solvers with Full Pivoting.'
'This manual covers parallel direct solvers, i.e., Cholesky decomposition, skyline solver, Crout decomposition, multiple entry solvers, and other popular and useful techniques. Solvers for dense and sparse systems are included. More than 90% of scientific and engineering problems are formulated into a system of equations. Solution of system equations is required in many scientific and engineering computing. LAIPE has the most useful and highly efficient solvers for scientific and engineering computing'
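As an illustration of what a "real AX=B test" actually computes, here is a minimal textbook sketch of a dense solver with partial pivoting in pure Python. It is not LAIPE's algorithm or API (the names `solve_partial_pivoting` and `max_residual` are hypothetical); it is only meant as a small reference for checking the answers of whatever solver you benchmark, so that a residual can be reported next to each timing.

```python
def solve_partial_pivoting(A, b):
    """Solve A x = b by Gaussian elimination with partial (row) pivoting.
    A is a list of n rows (lists of n floats), b a list of n floats; inputs are not modified."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for k in range(n):
        # Partial pivoting: move the row with the largest |entry| in column k to row k.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0.0:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        # Eliminate column k below the pivot.
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def max_residual(A, x, b):
    """max_i |(A x - b)_i|: a cheap correctness check to report alongside timings."""
    return max(abs(sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i)
               for row, b_i in zip(A, b))


if __name__ == "__main__":
    A = [[0.0, 2.0, 1.0], [1.0, 1.0, 0.0], [2.0, 1.0, 3.0]]  # A[0][0] = 0 forces a row swap
    b = [3.0, 2.0, 6.0]
    x = solve_partial_pivoting(A, b)
    print(x, max_residual(A, x, b))  # [1.0, 1.0, 1.0] 0.0
```

The zero in the top-left corner would make elimination without pivoting fail immediately, which is why the pivoting variants in the list above matter. This is O(n³) interpreted Python, so use it only for small n.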
Of all that, I am interested only in the solvers for block matrices on the main diagonal, VAG_S and VAG_D. If anyone could extract them from the LIB, put them into a DLL and make that work with FTN95, you would collect at my North Pole immediately 😃