Topic: Array expressions mishandled when /opt is used in 64-bit

mecej4

Posts: 1911

Back to Top

12 Oct 2018 2:39 #22657

Here is another instance of an optimization bug. This one occurs when array sections are copied and /opt /64 has been requested. The bug is rather fragile, and minor changes to the code make it disappear. To make the bug easy to notice, I made two copies of a single subroutine, which modifies an input 2-D array. The two copies are identical except for one line. The first version has, on line 64,

         v (:mp1, j) = v (:mp1, jcol)               ! array section copy

The second version has, instead, a DO loop, on lines 91-93:

         Do k = 1, mp1;  v (k, j) = v (k, jcol);  End Do

The main program compares the returned arrays v from the two versions. In the absence of bugs, the two should be identical. With 32-bit compilations, or with /64 but not /opt, they are identical. With /64 /opt, the bug surfaces.

The source code:

Program tst
      Implicit None
      Integer :: m = 306, nvar = 4, n = 5, ns = 2
      Double Precision :: eq (4, 5)
      Double Precision, Dimension (307, 6) :: v, v1, v2
      Integer :: i, j
      Data eq/0.d0,1.d0,3*0.d0,1.d0,3*0.d0,1.d0,2*0.d0,1.d0,3*0.d0, &
             1.d0,3.d0,2*0.d0/
!
      do i = 1, 307
         do j = 1,6
            v(i, j) = i*(3.d0-j)
         end do
      end do
!
      v1 = v
      Call scrch1 (m, nvar, eq, v1)! Uses vector assignment, v (:mp1, j) = v (:mp1, jcol)
!
      v2 = v
      Call scrch2 (m, nvar, eq, v2)! Uses DO loop, DO k=1, mp1; v(k, j) = v(k, jcol)
!
! Check that the two results match, as they should
!
      If (any(Abs(v1-v2) > 1d-6)) Then
         Write (*, 10) m, nvar, n, ns
         Write (*,*) 'SCRCH, diffs found'
         Do i = 1, 307
            Do j = 1, 6
               If (Abs(v1(i, j)-v2(i, j)) > 1d-5) &
              &    write (*, 20) i, j,v1 (i, j), v2 (i, j)
            End Do
         End Do
      Else
         Write (*,*) 'SCRCH, no diffs'
      End If
      Stop
!
10    Format ('m, nvar, meqa, n, ns = ', 5 I5)
20    Format (2 I4, 2 x, 2 ES15.7)
!
End Program
!
Subroutine scrch1 (m, nvar, eq, v)
      Implicit None
      Integer, Intent (In) :: m, nvar
      Integer :: nscol (2) = (/ 2, 3 /), iresl (4) = (/ 1, 4, 2, 1 /)
      Double Precision, Intent (In) :: eq (4, 5)
      Double Precision, Dimension (307, 6), Intent (Inout) :: v
      Integer :: l, j, lrow, lcol, jcol, n, np1, mp1, ns
      Double Precision :: fact
      n = nvar + 1; np1 = n + 1;  mp1 = m + 1; ns = nvar - 2
!
      Do l = 1, 2
         lrow = iresl (2*l-1); lcol = iresl (2*l)
         Do j = 1, ns
            jcol = nscol (j); fact = eq (lrow, jcol)
            v (:mp1, jcol) = v (:mp1, jcol) - fact * v (:mp1, lcol)
         End Do
         fact = eq (lrow, n)
         v (:mp1, np1) = v (:mp1, np1) - fact * v (:mp1, lcol)
      End Do
      Do j = 1, ns
         jcol = nscol (j); If (jcol == j) Cycle
         v (:mp1, j) = v (:mp1, jcol)               ! array section copy
      End Do
      v (:mp1, n-2) = v (:mp1, n); v (:mp1, n-1) = v (:mp1, np1)
      Return
End Subroutine scrch1

NOTE: The second version of the subroutine is posted in a follow-on posting below, because of the forum line limits.

mecej4

Posts: 1911

Back to Top

12 Oct 2018 2:41 #22658

Here is the second version of the subroutine. Put the two pieces together into a single source file, compile and run with /64 /opt, then with /64 alone.

Subroutine scrch2 (m, nvar, eq, v)
      Implicit None
      Integer, Intent (In) :: m, nvar
      Integer :: nscol (2) = (/ 2, 3 /), iresl (4) = (/ 1, 4, 2, 1 /)
      Double Precision, Intent (In) :: eq (4, 5)
      Double Precision, Dimension (307, 6), Intent (Inout) :: v
      Integer :: l, j, k, lrow, lcol, jcol, n, np1, mp1, ns
      Double Precision :: fact
      n = nvar + 1; np1 = n + 1; mp1 = m + 1; ns = nvar - 2
!
      Do l = 1, 2
         lrow = iresl (2*l-1); lcol = iresl (2*l)
         Do j = 1, ns
            jcol = nscol (j); fact = eq (lrow, jcol)
            v (:mp1, jcol) = v (:mp1, jcol) - fact * v (:mp1, lcol)
         End Do
         fact = eq (lrow, n)
         v (:mp1, np1) = v (:mp1, np1) - fact * v (:mp1, lcol)
      End Do
      Do j = 1, ns
         jcol = nscol (j); If (jcol == j) Cycle
         Do k = 1, mp1                                    ! Do Loop copy
            v (k, j) = v (k, jcol)
         End Do
      End Do
      v (:mp1, n-2) = v (:mp1, n); v (:mp1, n-1) = v (:mp1, np1)
      Return
End Subroutine scrch2

PaulLaidler

Posts: 7971 Salford, UK

Back to Top

13 Oct 2018 7:05 #22663

Thank you for this report. I have logged it for investigation.

PaulLaidler

Posts: 7971 Salford, UK

Back to Top

8 Nov 2018 1:39 #22760

An initial investigation indicates that the fault lies with item 30 of the optimiser. So a temporary 'fix' is to add /inhibit_opt 30.

The issue remains outstanding.

mecej4

Posts: 1911

Back to Top

8 Nov 2018 2:14 #22762

Paul, where can we see a list of these 'optimisation item's? How many of these are there, and are they grouped into categories?

LitusSaxonicum

Posts: 2284 Yateley, Hants, UK

Back to Top

8 Nov 2018 2:21 #22763

I agree with Mecej4 - it would be interesting to see what optimisations are performed, and which might simply be skipped.

It's a long time since I even attempted to use /OPT, as my codes run adequately fast without it, especially on my new computer that is considerably faster than the old one.

Eddie

mecej4

Posts: 1911

Back to Top

8 Nov 2018 2:51 (Edited: 11 Nov 2018 11:05) #22765

Incidentally, Eddie, I was also brought up with 'subexpression optimisation' in my blood. More recently, I have been weaning myself from that habit, because (i) it makes the code less readable, as it always did, and (ii) these days, it usually makes the code slower as well.

The variables that are created for the purpose of holding many of these subexpressions are very short-lived and, if they are declared in the subprogram declarations section, the compiler ends up generating code that will be doing lots of copying to and from main memory. Left to itself, the compiler optimiser can do a better job of recognising and confining these variables to registers.

PaulLaidler

Posts: 7971 Salford, UK

Back to Top

8 Nov 2018 3:42 #22768

I will make enquiries.

LitusSaxonicum

Posts: 2284 Yateley, Hants, UK

Back to Top

8 Nov 2018 4:51 #22771

Mecej4,

I think it may well be the case that common subexpression removal by an efficient and working optimization is probably better than doing it yourself, but surely, for readability it depends how you write it, doesn’t it? And that, in part, depends on the naming of the variables that hold the pre-calculated common subexpressions. 'm1' doesn't cut it for me.

One programmer whose work I admired always used ‘GASH’ (British military slang (specifically from the Royal Navy and Royal Marines) for rubbish (garbage), or for something that is considered useless, broken or otherwise of little value, rather than any other definition) for such a variable, and when more than one was required, supplemented them with ‘GESH’, ‘GISH’, ‘GOSH’ and ‘GUSH’ if he had to, which wasn’t often.

As for me, I tended to prefer COEF with various other suffixes for the same job. (COEF_A, COEF_B etc)

In both cases, the readability seems to me to be enhanced rather than degraded.

Eddie

JohnCampbell

Posts: 2526 Sydney

Back to Top

11 Nov 2018 12:11 #22785

I looked at array sections in scrch1 and found:

     v (:mp1, j) = v (:mp1, jcol)               ! opt fails
     v (1:mp1, j) = v (1:mp1, jcol)               ! opt fails
     v (:, j) = v (:, jcol)               ! opt is OK

The first 2 may take a section copy of the array, while the 3rd might not.

for ' v (:, np1) = v (:, np1) - fact * v (:, lcol) ', I would usually replace it by the following: fact = -eq (lrow, n) call daxpy (mp1, fact, v(1, lcol), v(1, np1) )

As with mecej4's observations of compiler optimisation performance, it is now better left to the compiler to clean this up. ( most of my F77 wrapper approaches are no longer effective)

mecej4

Posts: 1911

Back to Top

11 Nov 2018 1:34 #22789

Quoted from JohnCampbell I looked at array sections in scrch1 and found:
     v (:mp1, j) = v (:mp1, jcol)               ! opt fails
     v (1:mp1, j) = v (1:mp1, jcol)               ! opt fails
     v (:, j) = v (:, jcol)               ! opt is OK
The first 2 may take a section copy of the array, while the 3rd might not.

Since the upper bound of the first index of v is > mp1, we should not write

     v (:, j) = v (:, jcol)               ! opt is OK

as that would clobber v(mp1+1:,j). Consequently, in chasing after efficiency, we may have to redo the whole program, changing the dimensions of v to exactly fit the problem being treated, rather than something 'big enough'.

JohnCampbell

Posts: 2526 Sydney

Back to Top

12 Nov 2018 4:30 #22796

I was more trying to identify which types of array syntax are failing, hopefully to assist with the bug identification I also tried a 3rd option that used F77 wrappers, but it is not as clearly presented as the array syntax.

https://www.dropbox.com/s/tso7bfyhwxikh5l/tst_opt.f90?dl=0

LitusSaxonicum

Posts: 2284 Yateley, Hants, UK

Back to Top

13 Nov 2018 11:08 #22809

The reason why it might make the code slower is that when you put a subexpression result in a named variable, that variable has to be assigned its value, then retrieved each time, rather than the compiler holding the result in a register for re-use.

Why the compiler can't recognise the single variable as a subexpression to hold in a register I don't know, which costs you only the code to assign it to a named variable, and then of course, it will probably be still retained in the cpu cache anyway, but there you are.

Of course, the common subexpression recognition in the compiler and how effective it is depends in part on (a) being able to recognise it in the first place, and (b) how far ahead is the lookahead, (c) how many subexpressions are dealt with, and of course, whether the compiler does it correctly.

Personally, I gave up on /opt when I got funny results, believing that they were the result of re-ordering, which is a well-known issue with optimising compilers (I am usually in the fortunate position of being able to detect a stupid result). If it is the result of a bug, then perhaps I did the right thing.

I still always do simple, straightforward hand optimization, and believe the idea that it slows things down is an urban myth, but that may be my own myth.

Eddie

PaulLaidler

Posts: 7971 Salford, UK

Back to Top

4 Feb 2019 2:22 #23205

The original bug on this thread has now been fixed for the next release of FTN95.