forums.silverfrost.com Forum Index forums.silverfrost.com
Welcome to the Silverfrost forums
 

Code generation bug with /64

 
mecej4



Joined: 31 Oct 2006
Posts: 1884

Posted: Mon Aug 26, 2019 5:27 am    Post subject: Code generation bug with /64

A mysterious bug appeared in a large production Fortran code (12K lines). With /64, the program ran fine when compiled and run with /check or /debug, and it also ran fine when compiled with just /64, with or without /debug. When compiled with /opt /64, however, the program crashed with an access violation.

Fortunately, the pop-up shown after the crash gave the address at which it occurred. Rerunning the FTN95-compiled EXE under the Visual Studio debugger revealed that improper CMP instructions had been generated.

Generally, the bug occurs in these circumstances: an expression involving 4-byte integers is (1) evaluated, (2) compared to zero, and (3) the result of the comparison decides whether another expression is evaluated and its result assigned to a variable. However, I am not yet sure that the bug occurs only when /opt is used. Settling that question would require examining how the compiler treats other large codes.

For example, for the source line

Code:
if (m1 - incj2 .gt. 0 .and. m1-incj2 .le. nxyz) then


the generated instructions were

Code:
  000000000040182F: 4D 8B E9               mov         r13,r9    ; R9 contains M1 already
  0000000000401832: 44 2B AD CC 50 00 00   sub         r13d,dword ptr [rbp+00000000000050CCh]     ; subtract INCJ2
  0000000000401839: 49 81 FD 00 00 00 00   cmp         r13,0     ; is M1 - INCJ2 > 0 ? (should have used r13d)


At this point, R13 contained the value 0000 0000 FFFF FFFC, which is a positive 64-bit signed integer: on x64, writing a 32-bit result into R13D (here, by the SUB) clears the upper 32 bits of R13. However, R13D, which should have been used in the CMP instruction (or sign-extended into R13 with MOVSXD before a 64-bit CMP), contains FFFF FFFC, which is a negative 32-bit signed integer (-4). Because the 64-bit CMP sees a positive value, the test passes, and the negative integer is then used as an index into an array. That is likely to cause an access violation and abort or, worse, incorrect results that may not be obviously wrong.
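
To make the sign problem concrete, here is a small sketch in standard Fortran (it does not model FTN95 internals) of the two ways a 4-byte value can be widened to 8 bytes. ZERO_EXTEND mimics what R13 holds after the SUB has written R13D; SIGN_EXTEND mimics what MOVSXD would produce. The module and function names are made up for illustration, and the sketch assumes the INT32/INT64 kinds from ISO_FORTRAN_ENV.

```fortran
! Sketch (standard Fortran, not FTN95 internals): two ways to widen a
! 4-byte integer to 8 bytes. Module and function names are illustrative.
module widen_demo
  use iso_fortran_env, only: int32, int64
  implicit none
contains
  ! Zero extension: the 32-bit pattern with the upper half cleared, as in
  ! R13 after "sub r13d, ..." has written R13D.
  pure function zero_extend(d) result(q)
    integer(int32), intent(in) :: d
    integer(int64) :: q
    q = iand(int(d, int64), 4294967295_int64)   ! mask 0000 0000 FFFF FFFF
  end function zero_extend

  ! Sign extension: the value is preserved, as "movsxd r13, r13d" would do.
  pure function sign_extend(d) result(q)
    integer(int32), intent(in) :: d
    integer(int64) :: q
    q = int(d, int64)
  end function sign_extend
end module widen_demo
```

With the value from the crash, zero_extend(-4_int32) is 4294967292 (FFFF FFFC with a zero upper half) and tests as greater than zero, while sign_extend(-4_int32) is -4 and does not. Only the latter agrees with the 4-byte comparison that the source code asks for.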

I was able to create a reproducer (see below). The results from compiling and running with /64 /opt:

Code:
           i1  m1-incj2     xxn(i1)

  1         2        -1  4.0000E+00
  2         3         4  6.0000E+00
  3         2         2  4.0000E+00
  4         3         7 -4.3000E+01


The "-1" in the first line of results is wrong. In fact, that whole line should be absent, as running with /64 (i.e., without /opt) shows:

Code:
           i1  m1-incj2     xxn(i1)

  1         3         4  6.0000E+00
  2         2         2  4.0000E+00
  3         3         7 -4.3000E+01


The bug does not occur if, instead of

Code:
if (m1 - incj2 .gt. 0 .and. m1-incj2 .le. nxyz) then


we have

Code:
if (m1 .gt. incj2 .and. m1-incj2 .le. nxyz) then
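
The two forms of the test can be written side by side as a sketch; the function names below are hypothetical. For default 4-byte integers, m1 - incj2 .gt. 0 and m1 .gt. incj2 are equivalent as long as the subtraction does not overflow, which is why the rewrite is a safe workaround here. Presumably it avoids the bug because no computed difference is compared with zero in the first test; it does not fix the compiler.

```fortran
! Sketch: the original condition and the workaround, as two pure functions
! (hypothetical names). They agree whenever m1 - incj2 does not overflow.
module range_forms
  implicit none
contains
  ! Original form: the difference is formed and then compared with zero,
  ! the pattern that was miscompiled with /64 /opt.
  pure logical function in_range_diff(m1, incj2, nxyz)
    integer, intent(in) :: m1, incj2, nxyz
    in_range_diff = m1 - incj2 .gt. 0 .and. m1 - incj2 .le. nxyz
  end function in_range_diff

  ! Workaround form: m1 is compared with incj2 directly.
  pure logical function in_range_cmp(m1, incj2, nxyz)
    integer, intent(in) :: m1, incj2, nxyz
    in_range_cmp = m1 .gt. incj2 .and. m1 - incj2 .le. nxyz
  end function in_range_cmp
end module range_forms
```

With incj2 = 20 and nxyz = 7, the values m1 = 19, 24, 27 and 41 all occur in the reproducer's loops; m1 = 19 (m1 - incj2 = -1) is the case that wrongly passed the test in the /64 /opt run.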



Because of the page size limit of this forum, I have posted the source code of the program in the next post in this thread.


Last edited by mecej4 on Mon Aug 26, 2019 12:33 pm; edited 8 times in total
mecej4



Joined: 31 Oct 2006
Posts: 1884

Posted: Mon Aug 26, 2019 5:29 am

Here is the source code of the reproducer:

Code:
      program tst
      implicit none
      real xx(7),xxn(7),va(7,7)
      xx = 71.
      xxn = 7.
      va  = 49.
      call sor2l(xx,xxn,va)
      end program

      subroutine sor2l(xx, xxn, va)
      implicit none
      integer :: l2x, i1, i2, ix2m = 7, m1, m2
      integer :: nx1 = 3, nx2 = 4, incj1 = 5, incj2 = 20
      integer :: ii1 = 2, ii3 = 7, nx3 = 8, i3
      real :: xxn(7), va(7, 7), xx(7)
      integer :: nx = 11, nxy = 3, nxyz = 7
      integer :: ilog(5),i1log(5)
      real :: xlog(5)

      xx(1:nxyz-1) = 0.
      xx(nxyz) = 1.
      l2x = 0
      print *
      print *,'          i1  m1-incj2     xxn(i1)'
      print *
      do i3 = ii3, nx3
         do i2 = 1, nx2 - 1, 2
            m1 = ((i3 - 1)*nxy + (i2 - 1)*nx + 1) - incj1
            m2 = m1 + incj2
            do i1 = ii1, nx1
               m1      = m1 + incj1
               m2      = m2 + incj1
               xxn(i1) = 2.0*i1
!
! The next IF (condition) THEN statement causes an incorrect CMP instruction
! to be generated with /64 /opt. All the conditional expressions are
! 4-byte integers, so only the lower half of a 64-bit x64 register
! should be used in CMP instructions.
!
! Inserting PRINT statements to report the values will not work, since they
! inhibit the optimiser. Store into memory instead of printing.
! Print logged values after all loops are done
!
               if (m1 - incj2 .gt. 0 .and. m1-incj2 .le. nxyz) then
                  xxn(i1) = xxn(i1) - va(ix2m, m1-incj2)*xx(m1-incj2)
                  l2x=l2x+1
                  i1log(l2x) = i1
                  ilog(l2x) = m1-incj2
                  xlog(l2x) = xxn(i1)
               end if
            end do
         end do
      end do
      print '(1x,i2,2i10,ES12.4)', (i1,i1log(i1),ilog(i1),xlog(i1), i1=1,l2x)
      end subroutine sor2l
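
If the reproducer is to be kept as a regression test, one option is to have the subroutine return its log and compare it with the three lines printed by the /64 build, instead of inspecting the output by eye. The sketch below does that; SOR2L_LOG and SOR2L_OK are hypothetical names. It has not been checked whether this modified copy still triggers the miscompile under /64 /opt, since small changes around the loop can change what the optimiser does.

```fortran
! Hypothetical self-checking variant of the reproducer (not from the post).
! SOR2L_LOG keeps the loop nest and IF test of SOR2L but returns the log
! instead of printing it; SOR2L_OK compares the log with the /64 results.
module sor2l_check
  implicit none
contains
  subroutine sor2l_log(xx, xxn, va, n, i1log, ilog, xlog)
    real, intent(inout) :: xx(7), xxn(7)
    real, intent(in) :: va(7, 7)
    ! The loop nest makes at most 8 passes, so 8 log slots cannot overflow
    integer, intent(out) :: n, i1log(8), ilog(8)
    real, intent(out) :: xlog(8)
    integer :: i1, i2, i3, m1, m2
    integer :: ix2m = 7, nx1 = 3, nx2 = 4, incj1 = 5, incj2 = 20
    integer :: ii1 = 2, ii3 = 7, nx3 = 8, nx = 11, nxy = 3, nxyz = 7

    n = 0
    xx(1:nxyz-1) = 0.
    xx(nxyz) = 1.
    do i3 = ii3, nx3
       do i2 = 1, nx2 - 1, 2
          m1 = ((i3 - 1)*nxy + (i2 - 1)*nx + 1) - incj1
          m2 = m1 + incj2
          do i1 = ii1, nx1
             m1 = m1 + incj1
             m2 = m2 + incj1
             xxn(i1) = 2.0*i1
             ! Same condition that was miscompiled with /64 /opt
             if (m1 - incj2 .gt. 0 .and. m1-incj2 .le. nxyz) then
                xxn(i1) = xxn(i1) - va(ix2m, m1-incj2)*xx(m1-incj2)
                n = n + 1
                i1log(n) = i1
                ilog(n) = m1-incj2
                xlog(n) = xxn(i1)
             end if
          end do
       end do
    end do
  end subroutine sor2l_log

  ! .true. if the log matches the three lines printed by the /64 build
  logical function sor2l_ok()
    real :: xx(7), xxn(7), va(7, 7), xlog(8)
    integer :: n, i1log(8), ilog(8)
    xx = 71.
    xxn = 7.
    va = 49.
    call sor2l_log(xx, xxn, va, n, i1log, ilog, xlog)
    sor2l_ok = (n == 3)
    if (sor2l_ok) sor2l_ok = all(i1log(1:3) == [3, 2, 3]) .and. &
         all(ilog(1:3) == [4, 2, 7]) .and. all(xlog(1:3) == [6., 4., -43.])
  end function sor2l_ok
end module sor2l_check
```

A driver program would then only need to call SOR2L_OK and report failure (for example with ERROR STOP) when it returns .false.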


Last edited by mecej4 on Mon Aug 26, 2019 9:33 am; edited 1 time in total
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 7912
Location: Salford, UK

Posted: Mon Aug 26, 2019 8:00 am

mecej4

Many thanks for the feedback and extensive analysis.

The current developers' version runs this code successfully, but I will make a note that this needs to be checked out.
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 7912
Location: Salford, UK

Posted: Mon Nov 04, 2019 4:47 pm

This bug exists in the current release and has now been fixed for the next release of FTN95.
John-Silver



Joined: 30 Jul 2013
Posts: 1520
Location: Aerospace Valley

Posted: Wed Nov 06, 2019 6:13 am

another bug bites the dust ..... another one bites the dust
and another bug bites ... another one bites
another bug bites the dust !

well done Paul ! keep ticking them off

when is the next dll release due ? it must be mature & 'stable' (whatever that means) by now I'd have thought (unless the horse has bolted ;-) )
(I'm interested to see the TEX fixes from a while back,
ref. http://forums.silverfrost.com/viewtopic.php?t=4066)

... will the personal version be released at the same time this time or not ?
_________________
"Computers (HAL and MARVIN excepted) are incredibly rigid. They question nothing. Especially input data. Human beings are incredibly trusting of computers and don't check input data. Together cocking up even the simplest calculation ... :-)"
All times are GMT + 1 Hour