For many years I have investigated the run-time performance of the Fortran code I write, especially for the solution of large sets of linear equations, as found in finite element calculations. In this calculation there are two main vector operations:
dot product = Vector_A . Vector_B
vector subtraction: Vector_A = Vector_A - constant x Vector_B

In recent years with FTN95, the vector subtraction has shown very poor performance, with run times of up to 4 times those of other compilers. I have a sample problem where the total time for these calculations is 50 seconds with LF95 and 206 seconds with FTN95. It puzzles me what is happening in the other 156 seconds, as clearly it is not the floating point arithmetic, which can be done in 50 seconds. Published benchmark results on polyhedron.com have also consistently shown Salford FTN95 lagging behind most other compilers.
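For anyone who wants to try the comparison on their own machine, the two kernels can be sketched roughly as follows. This is only an illustration, not the actual finite element solver: the names vec_dot and vec_sub, the vector length, the repeat count and the use of system_clock for timing are all assumptions.

```fortran
! Illustrative timing sketch of the two vector kernels.
! All names and sizes here are assumptions, not the original solver code.
program vec_timing
   implicit none
   integer, parameter :: dp = selected_real_kind(15)
   integer, parameter :: n = 1000000       ! assumed vector length
   integer, parameter :: nrep = 200        ! assumed number of repeats
   real(dp), allocatable :: a(:), b(:)
   real(dp) :: c, s
   integer :: k, t0, t1, t2, rate

   allocate (a(n), b(n))
   a = 1.0_dp
   b = 2.0_dp
   c = 1.0e-6_dp
   s = 0.0_dp

   call system_clock (t0, rate)
   do k = 1, nrep
      s = s + vec_dot (a, b, n)            ! dot product = Vector_A . Vector_B
   end do
   call system_clock (t1)
   do k = 1, nrep
      call vec_sub (a, b, c, n)            ! Vector_A = Vector_A - constant x Vector_B
   end do
   call system_clock (t2)

   ! Results are printed so the loops cannot be discarded as unused work.
   write (*,*) 'dot_product seconds:', real(t1-t0, dp) / real(rate, dp), ' sum  =', s
   write (*,*) 'vec_sub     seconds:', real(t2-t1, dp) / real(rate, dp), ' a(1) =', a(1)

contains

   function vec_dot (x, y, m) result (d)
      integer, intent(in) :: m
      real(dp), intent(in) :: x(m), y(m)
      real(dp) :: d
      integer :: i
      d = 0.0_dp
      do i = 1, m
         d = d + x(i) * y(i)               ! two loads, one multiply, one add
      end do
   end function vec_dot

   subroutine vec_sub (x, y, const, m)
      integer, intent(in) :: m
      real(dp), intent(inout) :: x(m)
      real(dp), intent(in) :: y(m)
      real(dp), intent(in) :: const
      integer :: i
      do i = 1, m
         x(i) = x(i) - const * y(i)        ! two loads, one multiply, one subtract, one store
      end do
   end subroutine vec_sub

end program vec_timing
```

Both loops do one multiply and one add or subtract per element; the only structural difference is that the subtraction also stores a result back to memory on every pass. In a real solver the vectors change between calls, so timings from a fixed-data loop like this should be treated as indicative only.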
All this has now changed!
I have recently obtained a new desktop (HP-Z400) which has dual Xeon processors. While the vector subtraction time has changed from 50 to 47 seconds for LF95, it has changed from 206 to 33 seconds for FTN95. By comparison, dot_product has changed from 51/97 seconds (LF95/FTN95) to 31/31 seconds on the Xeon. The old times were measured on my Centrino notebook, and my old desktop was a Core 2.

From an operations count analysis, Dot_Product and Vec_subtraction run times should be similar. I explain the differences by how the processor optimises (or hampers) the calculation. My estimate of what has happened is that there has been a significant shift in the optimisation approach within the Xeon processor in comparison to the other processors I have tested. The small change in vector subtraction time for LF95 (50 to 47 seconds) suggests that the change in Xeon optimisation does not suit it. It would appear that what I take to be the “forward calculation optimisation” in the Intel processors has changed in the Xeon, to the benefit of FTN95.

If others are aware of similar changes, or can offer a more accurate explanation of what I have observed, I would appreciate your comments, as I don’t think I fully understand this change. It would appear that FTN95's poor run-time performance may have a reprieve. It would be good if we knew why!
John