Silverfrost Forums


Sparse Matrix Solution tools for matrix inversion

3 Apr 2016 4:39 #17379

John, is the skyline solver your primary consumer of CPU time? Try the LAIPE parallel skyline solver and compare it with your parallel method; the results should be interesting. One more thing: on AMD processors versus Intel ones, the LAIPE skyline solver did seemingly unexplainable miracles: even the cheapest 4-core AMD laptops were beating heavily overclocked Intel desktop monsters. Don't believe me? Go to the equation dot com website and download the skyline test program, which was compiled with IVF Fortran, supposedly to best suit Intel hardware. Run it on AMD and Intel machines and see for yourself.

4 Apr 2016 12:56 #17380

Hi Dan, the skyline solver is the one I use. I am aware of LAIPE, but prefer to develop my own solver and learn from the process. I have made good progress and now have a solver that compares well with the other solvers I have reviewed. The main outcomes of this study were:

  • use of OpenMP to enable multiple threads
  • use of vector instructions to speed up each thread
  • partitioning of the solution process to balance thread load
  • and, importantly, adopting a cache-usage strategy to mitigate the memory transfer bottleneck.

When I first started learning about OpenMP, I did not appreciate how much memory access speed, transferring information to and from main memory, limits performance. It is important to keep information in the cache and modify it there, minimising transfers between cache and memory. With 8 threads all accessing memory, this becomes the performance bottleneck. Even single-thread AVX vector instructions become constrained and only work effectively if the vectors are already in the cache. You can see this when running on processors with different clock rates: the performance ratios are dominated more by memory access speed and cache size than by clock rate. (Early on, I wrote tests that could not show any benefit of AVX over SSE, because of the memory speed problem.)

There are lots of calculations that are too complex to multi-thread (too much work!), or where the calculation packet is too small to overcome the multi-threading overheads (entering a !$OMP region can take about 20,000 processor cycles). I have found these can be improved more easily by running multiple single-thread processes that target vector instructions. I hope vector instructions will be included in FTN95, which would keep it useful as both a development and a production compiler.

John
