forums.silverfrost.com Forum Index forums.silverfrost.com
Welcome to the Silverfrost forums

This is why it is important to have a lot of users

DanRRight



Joined: 10 Mar 2008
Posts: 1521
Location: South Pole, Antarctica

Posted: Sat Sep 09, 2017 12:10 am    Post subject: This is why it is important to have a lot of users

Software has bugs, my own specifically. Some bugs are so hidden that I call them "devilry". Look what devil's nail I caught.

In a hurry years ago, I made a mistake, writing something like
Code:
if(A.lt.A) then

instead of
Code:
if(A.lt.B) then

and sometimes got strange "access violation" crashes (crashes have been getting on my nerves lately). The bug would disappear if I changed something or used /DEBUG, and then reappear. Finally, after 10 years, tired of hiding or retired, this devil gave up and I got its demo. The fact that it is 100% reproducible is un-be-lie-vable, isn't it?

Code:
   A =1.234567
   B =2.345678

   if(A.lt.A) then
     B = 0
   endif

  end


Do I live in the Matrix or need some extreme form of exorcism for atheists? ))))
mecej4



Joined: 31 Oct 2006
Posts: 715

Posted: Sat Sep 09, 2017 3:50 am

The error happens only when generating 32-bit code.

Analysing the bug involves looking at the instruction codes generated.

The code listing (/EXP) shows
Code:
      0000002e(18/4/9)           fld       A
      00000031(19/4/9)           fcomp     fr0,fr0
      00000033(20/4/9)           ffree     bh
      00000035(21/4/9)           wait
      00000036(22/4/9)           fstswax
      00000038(23/4/9)           sahf
      00000039(24/4/9)           jae       __N3

The third instruction is nonsense -- you can only free the X87 registers. Perhaps, FFREE FR0 was intended. By using an FCOMPP instead of FCOMP in the second instruction, the FFREE would have been made unnecessary.

Disassembly of the OBJ file shows
Code:
  0000002E: D9 45 EC           fld         dword ptr [ebp-14h]
  00000031: D8 D8              fcomp       st(0)
  00000033: DD BF 9B DF E0 9E  fnstsw      word ptr [edi-611F2065h]
  00000039: 0F 83 07 00 00 00  jae         00000046

The access violation is apparently caused by the third instruction, which uses a bad offset. Byte 34 should have been C0, which would have given the following instructions:
Code:
ffree st(0)
wait
fnstsw ax
sahf
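
The byte arithmetic behind that diagnosis can be checked with a short program. This is only a sketch: the program name and variables are illustrative, and the byte values are copied from the disassembly above. It splits the second opcode byte into its ModRM fields and reassembles the four bytes that follow into a little-endian 32-bit displacement.

```fortran
program modrm_demo
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)
  integer :: bad, good
  integer(i8) :: disp

  bad  = 191   ! BFh, the byte actually emitted as byte 34
  good = 192   ! C0h, the byte that should have been emitted

  ! ModRM layout: bits 7-6 = mod, bits 5-3 = reg, bits 2-0 = r/m
  print '(a,3(1x,i0))', 'BFh mod/reg/rm:', ibits(bad,6,2),  ibits(bad,3,3),  ibits(bad,0,3)
  print '(a,3(1x,i0))', 'C0h mod/reg/rm:', ibits(good,6,2), ibits(good,3,3), ibits(good,0,3)

  ! With mod = 2 the next four bytes (9B DF E0 9E) are read as a
  ! little-endian 32-bit displacement
  disp = 155_i8 + 223_i8*256_i8 + 224_i8*65536_i8 + 158_i8*16777216_i8
  if (disp >= 2147483648_i8) disp = disp - 4294967296_i8   ! sign-extend from 32 bits
  print '(a,z8.8,a)', 'disp32 = -', -disp, 'h'
end program modrm_demo
```

BFh decodes to mod/reg/rm = 2 7 7, so after opcode DD the processor sees FNSTSW with a memory operand at [edi+disp32] and swallows the WAIT, FNSTSW AX and SAHF bytes as the displacement, -611F2065h. C0h decodes to 3 0 0, a register form with no memory operand, which is FFREE ST(0).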

Errors of a slightly different nature occur when /p6 is used.
Code:
      0000002e(18/4/9)           fld       A
      00000031(19/4/9)           fcomip    fr0,fr1
      00000033(20/4/9)           fcomp     fr0,fr0           ; Clean up the coprocessor stack
      00000035(21/4/9)           jbe       __N3

The first comparison instruction refers to FR1, but that register has not been loaded. The FCOMP instruction not only pops the stack, but also affects the flags.


Last edited by mecej4 on Sat Sep 09, 2017 1:59 pm; edited 1 time in total
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 4875
Location: Salford, UK

Posted: Sat Sep 09, 2017 8:27 am

Thanks for the feedback Dan and for the explanation mecej4. I have logged this as needing fixing.
LitusSaxonicum



Joined: 23 Aug 2005
Posts: 1644
Location: Yateley, Hants, UK

Posted: Sat Sep 09, 2017 11:28 am

Does it happen for IF (A .gt. A) ?

Probably no-one reported it, because no-one ever did it before.

Makes you wonder where you go to with an ELSE, or

GOTO (A-A) 10, 20, 30

One could waste a whole day, or worse longer, on this.

Seriously, is the correct answer to generate code that works, assuming the programmer's intent is explicit, or should the compiler generate an error message or warning at compilation? By analogy with the warning when you compare something to zero, I suspect the latter is better - or both the warning and the correct executable.

Eddie
mecej4



Joined: 31 Oct 2006
Posts: 715

Posted: Sat Sep 09, 2017 12:51 pm

Quote:
Does it happen for IF (A .gt. A) ?


Yes, it does.

The situation is complicated in more recent versions of Fortran when A, B or both can be NaNs, or if one compares a REAL to an INTEGER.
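
To make the NaN case concrete, here is a minimal sketch in standard Fortran 2003. It assumes a compiler that provides the IEEE_ARITHMETIC intrinsic module and supports NaNs for default REAL; the program name is illustrative only.

```fortran
program nan_compare
  use, intrinsic :: ieee_arithmetic
  implicit none
  real :: a

  a = 1.234567
  print *, 'finite: a.lt.a =', a .lt. a, '  a.eq.a =', a .eq. a

  ! Assumption: the processor supports IEEE NaNs for default REAL
  if (ieee_support_nan(a)) then
    a = ieee_value(a, ieee_quiet_nan)
    print *, 'NaN:    a.lt.a =', a .lt. a, '  a.eq.a =', a .eq. a, '  a.ne.a =', a .ne. a
  end if
end program nan_compare
```

Under IEEE rules a NaN compares unordered with everything, including itself, so one would expect a.lt.a and a.eq.a to be false and a.ne.a to be true in the second line. A compiler therefore cannot simply fold A.EQ.A to .TRUE. for REAL operands.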

I sometimes wish that we had a logical type with three possible values, TRUE, FALSE and "UNKNOWABLE" to handle such situations.

By the way, the venerable FTN77 4.03 compiler generates correct code for Dan's test program.


Last edited by mecej4 on Sat Sep 09, 2017 1:43 pm; edited 1 time in total
LitusSaxonicum



Joined: 23 Aug 2005
Posts: 1644
Location: Yateley, Hants, UK

Posted: Sat Sep 09, 2017 1:16 pm

Wow,

That's amazing. I think the problem of comparing a variable with itself ought to be recognisable as a programming error, but I would not expect a compiler to recognise this as the same case:

Code:
      B = A
      IF (A .LT. B) THEN


As far as LOGICALs go, since one bit pattern is .TRUE. and another is .FALSE., then even LOGICAL*1 has plenty of spare bit patterns to represent shades of the truth. Perhaps there is a need for a new type: POLITICAL perhaps, where every shade between .FALSE. and .TRUE. can be represented?

Eddie
mecej4



Joined: 31 Oct 2006
Posts: 715

Posted: Sat Sep 09, 2017 1:17 pm

Paul, I recognise that SDBG is probably not meant for detailed machine code level work, but it does offer a disassembly window (accessed with F11), which is sometimes quite useful.

While using this feature with Dan's program, I found the following deficiencies:

1. The NOP instruction, opcode 90h, is disassembled as INT 3, which has the opcode CCh.

2. The disassembler appears to be unaware of P6 instructions, such as FCOMIP, which are generated by the FTN95 compiler when the /P6 option is used. The disassembler simply shows DB (data byte) codes for each byte in this two-byte instruction:
Code:
401034 DB DF
401035 DB F1

instead of
Code:
401034 FCOMIP      FR0, FR1

It would be nice to be able to display the code bytes (in hex) along with the mnemonics in the F11 window.


Last edited by mecej4 on Sun Sep 10, 2017 2:04 pm; edited 1 time in total
Robert



Joined: 29 Nov 2006
Posts: 218
Location: Manchester

Posted: Sun Sep 10, 2017 11:46 am

It is fantastic that people use the F11 assembler window in sdbg. I will amend the disassembler to include the P6 instructions and include the instruction bytes (like sdbg64 does).
DanRRight



Joined: 10 Mar 2008
Posts: 1521
Location: South Pole, Antarctica

Posted: Tue Sep 12, 2017 2:49 am

If I understood correctly, after this bug is fixed there will be no access violation. But doesn't this mean that we will never find similar bugs in the future??? I support Eddie's comment that this has to be considered a user's error.

With respect to exorcism, mecej4 implicitly hinted that I have to learn assembler and computer codes. No, not only will I not do that, but I will stress again and again that the debugger should never switch to assembler by itself; it should do that only if the user asks explicitly. In such cases there is usually no debugging information because the code was compiled with /nocheck. So as not to scare novice programmers, I suggest babysitting them as much as possible. The debugger in this case has to be very verbose: it should warn that this happened due to /nocheck in this particular subroutine and tell the user that some further debugging can still be done.


Last edited by DanRRight on Tue Sep 12, 2017 3:41 am; edited 3 times in total
mecej4



Joined: 31 Oct 2006
Posts: 715

Posted: Tue Sep 12, 2017 3:32 am

Dan, there is no strict relation between compiler option /check and the access violation. In fact, had Byte 34 been C0 instead of the erroneous BF, there would not even be a memory access, let alone an illegal memory access. Nor can you expect the debugger to catch lapses in the code generator. On the other hand, without the code generation bug, there would have been no access violation and your A.LT.A bug would have gone undetected for a longer time.

Consider this modified version of your code:

EQUIVALENCE (A,B)
...
IF (A .LT. B) THEN
...

and suppose that "..." stands for hundreds of lines, and that there is no code generation bug. Running this in the debugger, you would probably not notice that A and B have the same address unless you look at the assembly code.
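
A compact, complete version of that scenario might look like the sketch below. The values are borrowed from Dan's test program; the program name and messages are illustrative only.

```fortran
program equiv_demo
  implicit none
  real :: a, b
  equivalence (a, b)     ! a and b now name the same storage

  a = 1.234567
  b = 2.345678           ! this also overwrites a

  print *, 'a =', a, ' b =', b
  if (a .lt. b) then
    print *, 'a .lt. b is true'
  else
    print *, 'a .lt. b is false: both names refer to one value'
  end if
end program equiv_demo
```

Both names print the same value and the comparison is always false, yet nothing in the IF statement itself hints that A and B are aliases.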

I am not suggesting that everyone should learn assembler. I am simply pointing out that for detecting certain types of bugs you will have to rely on someone who does; and, if that someone is not you, the bug will probably survive for a longer time than otherwise.
JohnCampbell



Joined: 16 Feb 2006
Posts: 1756
Location: Sydney

Posted: Tue Sep 12, 2017 6:07 am

Dan,

We all develop code with "devilry".
As we have done this for so many years, we have also developed ways of identifying these code "features" and ways of eliminating them.
Only recently I introduced something like "if(A.lt.A)", by typing too fast and not working in a soundproof room!
We have to test the code, with hopefully sufficient test options to eliminate these features.
I also remember Eddie's recent post of "ftn95 prog /c" where /c did nothing bad for years.

If I could suggest an approach I find helpful (no, not /implicit_none), it is to use a file compare utility and do an audit of all changes against a past version that appeared to work.

The other is to never give up on searching for any possible "devilry": while you have a test case that might expose the bug, it is the best time to find it.

What scares me lately is the increase in typing errors that are creeping into my keyboard usage.

John
Robert



Joined: 29 Nov 2006
Posts: 218
Location: Manchester

Posted: Tue Sep 12, 2017 8:44 am

Quote:
1. The NOP instruction, opcode 90h, is disassembled as INT 3, which has the opcode CCh.


It is correct. When your code is stepped, sdbg places INT 3s over the NOPs - that is what the NOPs are for: they are places where INT 3s can be safely placed.
LitusSaxonicum



Joined: 23 Aug 2005
Posts: 1644
Location: Yateley, Hants, UK

Posted: Tue Sep 12, 2017 11:21 am

John, Your increasing keyboard errors are symptomatic of other things than ageing or carelessness. I have a very small 'travel' laptop that recently stopped recording the letter 'n' - it transpired that it was a tiny bit of fluff (possibly cat hair) under the key, and it probably got there when I cleaned the keyboard, and the problem was resolved by a second clean. Of course keyboards do go bad, but less commonly than in the past. A change of keyboard can work wonders even if it is a user problem as the feel is different.

As for errors like IF (A .LT. A) (and indeed, as well as .GT. perhaps .LE. .GE. and .NE. need checking) and whether you get them via EQUIVALENCE, I can't see the point of equivalencing a single named variable to another. Perhaps someone can enlighten me.

Where EQUIVALENCE comes in useful is for example selecting single integer 'grey codes' from a big array to make them simpler to handle locally, and then the corresponding silly IF would be something like:

IF (i_all_grey_codes(76) .NE. i_grey_code_button_76) ...

(where the two had previously been EQUIVALENCEd.)

I respectfully suggest that coding something like that would be prima facie evidence that one needs to take a vacation.

Eddie
All times are GMT + 1 Hour