Topic: Incorrect results with /opt in Support

mecej4

Posts: 1912

10 Apr 2017 7:08 #19377

The Fortran package ENLSIP solves nonlinear least squares problems with constraints ( http://plato.asu.edu/ftp/other_software/ENLSIP.tar.gz ). It has about 8000 lines of code, and contained (until it was corrected yesterday) two calls to a subroutine, one of which involved aliasing. The subroutine heading is

SUBROUTINE ADDIT(OA,OI,T,Q,K)

where OA and OI are integer arrays, T, Q and K are scalar integers. The subroutine modifies OA, OI, T and Q, but not K. The call in question read

CALL ADDIT(OA,OI,T,Q,Q)

One way of removing the aliasing is to replace this line by

CALL ADDIT(OA,OI,T,Q,(Q))

Placing the last argument within parentheses creates an anonymous variable whose value is set to the value of the expression within the parentheses, and passes that anonymous variable as the actual fifth argument to the subroutine. With FTN95 (32 or 64-bit), this artifice does not work when /opt is used.

Here is a test program:

      PROGRAM ALIASBUG
      INTEGER :: T = 2, Q = 5
      INTEGER :: OA(3) = (/ 1,2,0 /)
      INTEGER :: OI(6) = (/ 3,4,5,7,1,0 /)
      !
      CALL ADDIT(OA,OI,T,Q,(Q))  ! <<== WAS CALL ADDIT(OA,OI,T,Q,Q)
      PRINT *,'AFTER ADDIT, T,Q = ',T,Q

      CONTAINS
      SUBROUTINE ADDIT(OA,OI,T,Q,K)
      INTEGER T,Q,K,OA(T+1),OI(Q+1)
C
C     ADD A CONSTRAINT TO THE ACTIVE SET
C
      INTEGER I
      T = T + 1
      OA(T) = OI(K)
      DO I = K, Q
         OI(I) = OI(I+1)
      END DO
      Q = Q - 1
      RETURN
      END SUBROUTINE ADDIT
      END PROGRAM

FTN95 with /opt gives the output

 AFTER ADDIT, T,Q =            3           5

rather than the correct result

 AFTER ADDIT, T,Q =            3           4
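
Until the parenthesized form is handled correctly, one alternative is to make the copy explicit in the source. The following is only a sketch in free-form Fortran: the variable name KTMP is hypothetical (it does not appear in ENLSIP), and this variant has not been tested with FTN95 /opt in this thread.

```fortran
program alias_workaround
   implicit none
   integer :: t = 2, q = 5
   integer :: oa(3) = (/ 1, 2, 0 /)
   integer :: oi(6) = (/ 3, 4, 5, 7, 1, 0 /)
   integer :: ktmp              ! hypothetical temporary, not part of ENLSIP

   ktmp = q                     ! explicit copy, so K no longer aliases Q
   call addit(oa, oi, t, q, ktmp)
   print *, 'AFTER ADDIT, T,Q = ', t, q

contains
   subroutine addit(oa, oi, t, q, k)
      integer :: t, q, k
      integer :: oa(t+1), oi(q+1)
      integer :: i
      ! Add a constraint to the active set
      t = t + 1
      oa(t) = oi(k)
      do i = k, q
         oi(i) = oi(i+1)
      end do
      q = q - 1
   end subroutine addit
end program alias_workaround
```

Because KTMP is a distinct variable, the fifth argument cannot share storage with Q, and a conforming compiler should report T = 3 and Q = 4.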

PaulLaidler

Posts: 7975 Salford, UK

11 Apr 2017 5:59 #19380

Thank you for the feedback.

Is the package provided as a DLL or have you compiled it using FTN95? I am guessing that the /opt only relates to the program that you have posted. Presumably it is 32 bits.

I have made a note of this issue.

mecej4

Posts: 1912

11 Apr 2017 8:29 (Edited: 11 Apr 2017 9:23) #19384

The ENLSIP package is provided in source form, and includes a main program (driver) for a test problem (Hock-Schittkowski #65). I compiled it into an EXE using FTN95. The program gave correct output without /opt, but aborted with an access violation when I used /opt. After investigating and locating the problem, I pared the code down to obtain the reproducer that I posted above.

The /opt bug occurs with the ENLSIP test program as well as with the reproducer, using FTN95 8.1 32-bit on Windows 10. The bug disappears if I specify INTENT(IN) for the fifth subroutine argument, K.
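
For reference, the INTENT(IN) variant of the reproducer might look like the sketch below (free-form source; only the declaration of K differs from the program posted above). Why this avoids the problem is not established here; it is presumably because the compiler then has no reason to write K back on return.

```fortran
program aliasbug_intent
   implicit none
   integer :: t = 2, q = 5
   integer :: oa(3) = (/ 1, 2, 0 /)
   integer :: oi(6) = (/ 3, 4, 5, 7, 1, 0 /)

   call addit(oa, oi, t, q, (q))   ! same call as in the reproducer
   print *, 'AFTER ADDIT, T,Q = ', t, q

contains
   subroutine addit(oa, oi, t, q, k)
      integer :: t, q
      integer, intent(in) :: k     ! the only change: K is declared read-only
      integer :: oa(t+1), oi(q+1)
      integer :: i
      t = t + 1
      oa(t) = oi(k)
      do i = k, q
         oi(i) = oi(i+1)
      end do
      q = q - 1
   end subroutine addit
end program aliasbug_intent
```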

PaulLaidler

Posts: 7975 Salford, UK

11 Apr 2017 9:00 #19386

Many thanks.

Paul

mecej4

Posts: 1912

11 Apr 2017 10:30 #19388

Paul, this is a rather tricky bug, and I appreciate your patience.

It turns out that the current compiler generates incorrect code, with or without /opt, with or without /64, for actual arguments that are expressions consisting of a variable name surrounded by parentheses. By coincidence, there are harmful effects only with /opt in 32-bit mode because of the way in which the compiler makes local copies of subroutine arguments, works with the local copies in the body of the subroutine, and updates the arguments just before returning from the subroutine, to reflect the changes to the local copies. These are the reasons why this compiler bug is elusive. The bug has probably gone unnoticed because it is rare that we use the source code pattern (<variable>) as an actual argument.
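
To make the copy-in/copy-out effect concrete, here is a source-level emulation of what the paragraph above describes. It is a sketch, not the compiler's actual output: the names T_LOC, Q_LOC and K_LOC are hypothetical, and the write-back order (K after Q) is an assumption chosen because it is consistent with the observed result.

```fortran
program copy_back_sketch
   implicit none
   integer :: t = 2, q = 5
   integer :: oa(3) = (/ 1, 2, 0 /)
   integer :: oi(6) = (/ 3, 4, 5, 7, 1, 0 /)
   integer :: t_loc, q_loc, k_loc, i   ! hypothetical local copies

   ! Copy-in: the address of Q was passed for both Q and K,
   ! so both local copies are loaded from the caller's Q.
   t_loc = t
   q_loc = q
   k_loc = q

   ! Body of ADDIT, working only on the local copies
   t_loc = t_loc + 1
   oa(t_loc) = oi(k_loc)
   do i = k_loc, q_loc
      oi(i) = oi(i+1)
   end do
   q_loc = q_loc - 1

   ! Copy-out (order assumed): the unchanged copy of K is stored last,
   ! through the same address as Q, and undoes the decrement.
   t = t_loc
   q = q_loc        ! Q is now 4
   q = k_loc        ! Q is overwritten with the stale value 5

   print *, 'AFTER ADDIT, T,Q = ', t, q
end program copy_back_sketch
```

Under these assumptions the emulation ends with T = 3 and Q = 5, the same values that the /opt build reports. Without local copies, K is only read, so the shared address does no damage in this particular example.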

Consider the code generated by the compiler for the line

CALL ADDIT(OA,OI,T,Q,(Q))

In 32-bit mode, the assembly code is

   0006         CALL ADDIT(OA,OI,T,Q,(Q))                  AT 20
      00000020(16/3/95)          push      ebx ; For eight-byte alignment
      00000021(17/4/94)          lea       eax,Q
      00000027(18/3/95)          push      eax
      00000028(19/4/93)          lea       ecx,Q
      0000002e(20/3/95)          push      ecx
      0000002f(21/4/92)          lea       edi,T
      00000035(22/3/95)          push      edi
...etc...

Please note that the address of Q is pushed twice. In effect, the compiler is disregarding the parentheses around the second Q in the CALL.

Similarly, in 64-bit mode,

00000021(#104,6):      LEA       R15,Q
00000028(#103,7):      LEA       R9,Q
0000002f(#102,8):      LEA       R8,T
00000036(#101,9):      LEA       RDX,OI
0000003d(#100,10):      LEA       RCX,OA
00000044(#105,11):      MOV_Q     [RSP+32],R15
00000049(#105,12):      CALL      ALIASBUG~ADDIT

Note for readers unfamiliar with the Windows 64-bit ABI: The first four integer arguments are passed in registers RCX, RDX, R8, R9. Subsequent arguments are passed on the stack. In the present case, the fifth argument, (Q), i.e., a copy of Q, should have been passed on the stack. The compiler is passing Q itself, instead.

PaulLaidler

Posts: 7975 Salford, UK

11 Apr 2017 12:09 #19389

mecej4

Your analysis is very helpful and much appreciated.

PaulLaidler

Posts: 7975 Salford, UK

26 Apr 2017 8:13 #19467

The incorrect behaviour for an argument of the form (Q) has now been fixed for the next release except for the case where Q is a complex value and /64 is used. This case remains outstanding.

mecej4

Posts: 1912

29 Apr 2017 12:26 #19494

Paul, thanks for the expeditious resolution of the problem.

Although this thread is about a bug in the compiler, it is worthwhile to note the important role played by a compiler with excellent error-hunting capabilities such as FTN95 in isolating and fixing errors in widely used software.

More significantly, FTN95 already includes many facilities that may be used to diagnose bugs in the compiler itself.

Please consider, as a long-term goal, adding a compiler option to check for and report instances of argument aliasing in user code. As far as I am aware, no current and widely used compiler provides such a facility.

PaulLaidler

Posts: 7975 Salford, UK

29 Apr 2017 6:19 #19495

Thanks. Can you explain what 'argument aliasing' is, maybe with an example?

mecej4

Posts: 1912

29 Apr 2017 12:35 #19497

Great! I have started a new thread with a 'request for feature':

 http://forums.silverfrost.com/viewtopic.php?p=21819#21819

Thanks.

PaulLaidler

Posts: 7975 Salford, UK

13 Dec 2018 5:08 #23001

mecej4

I have revisited the issue that was reported as 'outstanding' above and have not been able to find a fault. I am assuming that this particular issue is now completely fixed.

mecej4

Posts: 1912

13 Dec 2018 6:34 #23002

Paul,

Thanks for asking. The example code that I presented involved only integer variables, and one of those variables is used as the upper limit of a DO index.

When I read your response of 26 April 2017 that the only case outstanding was the one where an actual argument was a complex variable surrounded by parentheses, I was slightly puzzled. There are several such example codes that could be constructed, and I did not know which one you had used.

I put together a new example, and tried 8.30.279 on it. The 32-bit EXEs from it give the correct result, but the 64-bit EXEs do not.

program aliasbug 
   complex :: p = (1.0,2.0) 
! 
   call addit(p,(p))  ! <<== (p) used to prevent aliasing 
   print *,'after addit, p = ',p 

contains 
   subroutine addit(p,q) 
      complex :: p,q 
      p=p+q
      return 
   end subroutine addit 
end program

PaulLaidler

Posts: 7975 Salford, UK

13 Dec 2018 11:28 #23003

mecej4

Many thanks for your help with this.

The complex example that you have posted above did not work with v8.40 but has now been fixed for the next release.