|
forums.silverfrost.com Welcome to the Silverfrost forums
JohnCampbell
Joined: 16 Feb 2006 Posts: 2593 Location: Sydney
|
Posted: Mon Mar 24, 2014 1:40 am Post subject: SSE Instructions |
|
|
I have recently tested SSE/AVX instructions and OpenMP on other compilers across a range of (cheaper) hardware.
The results show that for large vector calculations memory access speed is the bottleneck, and that FTN95 with David's SSE routines compares very favourably with those alternatives.
These routines for real*8 dot product and vector addition would be a valuable addition to FTN95. Provided as a library for real*8 vector calculations, they would improve FTN95's performance.
These could possibly be expanded to a few other basic routines of similar structure.
Perhaps a /SSE switch could incorporate these.
For general use, these SSE instructions can be applied at the inner DO loop.
The complexity in their use lies in managing the alignment of variables, which could be eased by reviewing how array alignment is managed.
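To illustrate what "SSE at the inner DO loop" means for the two routines mentioned, here is a minimal sketch in C using SSE2 intrinsics. It is not David's code, and the names dot_sse2 and vec_add_sse2 are made up for this example. It uses unaligned loads, so it sidesteps the alignment question, and it falls back to a plain loop when SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Dot product of two double-precision (real*8) vectors.
   Hypothetical helper: unaligned loads, so x and y need no special alignment. */
double dot_sse2(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    size_t i = 0;
#if defined(__SSE2__)
    __m128d acc = _mm_setzero_pd();            /* two running partial sums */
    for (; i + 2 <= n; i += 2) {
        __m128d a = _mm_loadu_pd(x + i);       /* load x[i], x[i+1] */
        __m128d b = _mm_loadu_pd(y + i);       /* load y[i], y[i+1] */
        acc = _mm_add_pd(acc, _mm_mul_pd(a, b));
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    sum = lanes[0] + lanes[1];                 /* combine the two lanes */
#endif
    for (; i < n; ++i)                         /* remainder, or the whole loop without SSE2 */
        sum += x[i] * y[i];
    return sum;
}

/* Vector addition z = x + y, two elements per SSE2 operation. */
void vec_add_sse2(double *z, const double *x, const double *y, size_t n)
{
    assert(z != NULL && x != NULL && y != NULL);
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 2 <= n; i += 2)
        _mm_storeu_pd(z + i, _mm_add_pd(_mm_loadu_pd(x + i), _mm_loadu_pd(y + i)));
#endif
    for (; i < n; ++i)
        z[i] = x[i] + y[i];
}
```

Note that the SSE2 version sums in a different order from a plain loop, so results can differ in the last bits for general data.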
John |
|
|
|
LitusSaxonicum
Joined: 23 Aug 2005 Posts: 2393 Location: Yateley, Hants, UK
|
Posted: Mon Mar 24, 2014 3:22 pm Post subject: |
|
|
Don't I remember that the old DBOS version of FTN77 used to need specific memory alignments for some purposes? My guess is that without CHARACTER, and with all other data types of the same length as in Fortran 66, memory alignment was more or less guaranteed then.
Using SSE3/AVX etc. automatically is an extension to the /P6 option. I doubt that anyone uses that Pentium CPU any more, and it is hard to believe that any hardware regularly in use doesn't support most SSE/AVX options, so perhaps the defaults need changing.
Eddie |
|
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2593 Location: Sydney
|
Posted: Tue Mar 25, 2014 12:02 am Post subject: |
|
|
Eddie,
Through experimentation, I have found some interesting results with AVX and SSE instructions.
I certainly don't understand why the alignment issue is there. They (Intel) should have designed an instruction set that could cope with 8 bytes spanning memory "segments".
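For anyone wondering where the alignment issue comes from: the aligned SSE load/store forms expect an address that is a multiple of 16 bytes, whereas a real*8 array is typically only guaranteed to start on an 8-byte boundary. A rough sketch in C of the usual checks follows. The helper names are made up for this example, and leading_scalar_count assumes the array is at least 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True if p is a multiple of `alignment` bytes (alignment must be a power of two). */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Round p up to the next address that is a multiple of `alignment`. */
void *align_up(void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uintptr_t a = ((uintptr_t)p + alignment - 1) & ~(uintptr_t)(alignment - 1);
    return (void *)a;
}

/* Number of leading elements to handle with scalar code before x reaches a
   16-byte boundary. Assumes x is 8-byte aligned, so the answer is 0 or 1. */
size_t leading_scalar_count(const double *x)
{
    return is_aligned(x, 16) ? 0 : 1;
}
```

A loop can process leading_scalar_count(x) elements with ordinary code and then use aligned SSE operations on x for the rest. When two arrays start at different offsets from a 16-byte boundary, only one of them can be brought into alignment this way, and the other still needs unaligned loads.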
What I have found is that I can't get AVX instructions to be very effective in comparison to SSE for large memory problems. They only work well if the information is in the cache and you would think that memory alignment could have been dealt with better in the cache.
Trying to push performance via SSE, AVX or OpenMP requires that the variables you are working on are cached, so for large calculations the key processor measure is memory access speed, rather than CPU clock rate. (Large calculations are those where the inner loop accesses more memory than can be stored in the cache, which is about 10 to 20 MB.)
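As a rough way to judge whether a calculation is "large" in this sense, the arithmetic can be sketched in C as below. The function names are made up for this example, and the cache size is whatever applies to the processor in question (the 10 to 20 MB above is only the range quoted in this post).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes touched by one pass of an inner loop that streams through
   `arrays` real*8 vectors of n elements each. */
size_t working_set_bytes(size_t n, size_t arrays)
{
    /* guard against overflow of the product */
    assert(arrays == 0 || n <= SIZE_MAX / (arrays * sizeof(double)));
    return n * arrays * sizeof(double);
}

/* Nonzero if that working set fits in a cache of cache_bytes. */
int fits_in_cache(size_t n, size_t arrays, size_t cache_bytes)
{
    return working_set_bytes(n, arrays) <= cache_bytes;
}
```

For a dot product (two vectors) and a 10 MB cache, the working set is 16 bytes per element, so it stops fitting at about 655,000 elements per vector. Each element pair also needs only one multiply and one add for those 16 bytes, which is why memory access speed dominates once the data no longer fits.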
With Davidb's SSE routines and FTN95, the SSE instructions show significant utilisation of the cache, while FTN95 does not appear to utilise the cache for the tests I performed.
It is also interesting that relying on experimentation to claim you know how a computer works is an uncertain approach, as it does not take long until you are proven wrong. When you don't have a wide enough range of hardware to test your hypothesis, it is easy to reach the wrong conclusion.
Anyway, memory access speed appears to be the performance limiter at the moment, at least for compilers that support SSE, AVX or OpenMP.
John |
|
|
|