I have recently tested SSE/AVX instructions and OpenMP with other compilers on a range of (cheaper) hardware. The results show that for large vector calculations memory access speed is the bottleneck, and that FTN95 with David's SSE routines compares very favourably with those alternatives. If provided as a library for real*8 vector calculations, these routines for dot product and vector addition would be a valuable addition to FTN95 and its performance. The library could possibly be expanded to a few other basic routines of similar structure, and perhaps a /SSE switch could incorporate them. For general use, these SSE instructions can be applied at the inner DO loop. The complexity in their use lies in managing the alignment of variables, which could be helped by reviewing how array alignment is managed.
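
To illustrate the kind of inner DO loop meant here, the following is a minimal plain-Fortran sketch of a real*8 dot product and vector addition. It contains no SSE code and is not David's implementation; the names vec_sketch, vec_dot and vec_add are hypothetical, chosen only for this example. These are the loops that an SSE library routine would replace.

```fortran
! Plain Fortran reference versions of the two inner loops.
! All names here are hypothetical and used only for illustration.
module vec_sketch
  implicit none
  ! Double precision kind; this corresponds to real*8 on typical compilers.
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Dot product: s = sum of a(i)*b(i) for i = 1..n
  function vec_dot (a, b, n) result (s)
    integer,  intent(in) :: n
    real(dp), intent(in) :: a(n), b(n)
    real(dp) :: s
    integer  :: i
    s = 0.0_dp
    do i = 1, n               ! inner loop: two loads per multiply-add
      s = s + a(i)*b(i)
    end do
  end function vec_dot

  ! Vector addition: c(i) = a(i) + b(i) for i = 1..n
  subroutine vec_add (c, a, b, n)
    integer,  intent(in)  :: n
    real(dp), intent(in)  :: a(n), b(n)
    real(dp), intent(out) :: c(n)
    integer :: i
    do i = 1, n               ! inner loop: two loads and one store per add
      c(i) = a(i) + b(i)
    end do
  end subroutine vec_add

end module vec_sketch

program vec_demo
  use vec_sketch
  implicit none
  integer, parameter :: n = 4
  real(dp) :: x(n), y(n), z(n)

  x = (/ 1.0_dp, 2.0_dp, 3.0_dp, 4.0_dp /)
  y = 0.5_dp

  call vec_add (z, x, y, n)
  print *, 'dot =', vec_dot (x, y, n)   ! 0.5*(1+2+3+4) = 5.0
  print *, 'sum =', z                   ! 1.5, 2.5, 3.5, 4.5
end program vec_demo
```

Each iteration does very little arithmetic per value loaded from memory, which is consistent with memory access speed becoming the limit once the vectors no longer fit in cache. An SSE version would handle two real*8 values per instruction, and its aligned load and store forms require 16-byte aligned addresses, which is where the alignment of the arrays comes in.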
John