forums.silverfrost.com

mecej4 · Joined: 31 Oct 2006 Posts: 1943 Location: USA

For the following program, FTN95 V 8.51 generates bad X86 machine code when the options /opt /p6 are used.

PaulLaidler · Posted: Sat Aug 10, 2019 6:14 am Post subject:

Thank you for the bug report. Does the program fail or give incorrect results? It runs OK for me.

PaulLaidler · Posted: Sat Aug 10, 2019 6:26 am Post subject:

This is a bit strange. My explist is different but at the same time I don't recall that any changes have been made in this respect.

Perhaps we should wait till the next release to see if it is still a problem at your end.

mecej4 · Joined: 31 Oct 2006 Posts: 1943 Location: USA

It crashed with an access violation, since it tried to read memory at absolute address 00000003. The crash is at ERFDIF + 00000027.

I compiled with /opt /p6. Without /opt, the bug does not occur.

I posted only the portion of the assembly code listing encompassing the subroutine ERFDIF. The leading part of the /EXP listing is given below. I am curious to see the listing that you generated.

PaulLaidler · Posted: Sat Aug 10, 2019 1:49 pm Post subject:

This is what I get with the current developers' version...

mecej4 · Joined: 31 Oct 2006 Posts: 1943 Location: USA

Thanks; that listing does not exhibit the bug with the unsaved register being used across the function call.

I'll wait for the next version of the compiler.

LitusSaxonicum · Posted: Sat Aug 10, 2019 4:54 pm Post subject:

In the meantime, if you are desperate for the results of your program (even though you said that you'd wait), pre-calculating the two erf function results and then doing the subtraction in another statement does work (as, I suspect, you already knew).
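A minimal sketch of that rearrangement, for anyone following along. The original reproducer is not shown here, so the function names `erf_diff_direct` and `erf_diff_split` and the argument values are made up for illustration; only the pattern (two assignments, then a subtraction) is the point.

```fortran
program erf_workaround
  implicit none
  double precision :: lo, hi

  lo = 0.5d0
  hi = 1.5d0
  print *, 'single expression:', erf_diff_direct(lo, hi)
  print *, 'two-step version :', erf_diff_split(lo, hi)

contains

  ! Both ERF references in one expression (the form said to trigger the bug).
  double precision function erf_diff_direct(a, b)
    double precision, intent(in) :: a, b
    erf_diff_direct = erf(b) - erf(a)
  end function erf_diff_direct

  ! Each ERF result is stored first; the subtraction is a separate statement.
  double precision function erf_diff_split(a, b)
    double precision, intent(in) :: a, b
    double precision :: ea, eb
    ea = erf(a)
    eb = erf(b)
    erf_diff_split = eb - ea
  end function erf_diff_split

end program erf_workaround
```

With a correct compiler both lines should print the same value. Whether the two-step form sidesteps the optimizer problem in any particular program is not something this sketch can promise.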

It did make me think about manipulating the error function, but a quick read of the Wikipedia page reminded me that I had better things to do with my time, like mowing the lawn!

Eddie

mecej4 · Joined: 31 Oct 2006 Posts: 1943 Location: USA

Eddie, I am not desperate at all; I am ready for Brexit or no Brexit. I have other compilers to use for such situations.

In the actual code where I noticed the problem, the Polyhedron AerMod benchmark, the error function is evaluated thousands of times, with arguments that are known only at run time, and covering the range 0 to very large values.

Had the code simply run and produced incorrect results, I would not have noticed anything. However, the code (over 50,000 lines) actually crashed, and investigation led me to the tiny reproducer that I reported.

Erf, Erfc and Erfc_scaled are standard intrinsic functions in F2008.
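For reference, a short sketch of how the three F2008 intrinsics relate to each other. It assumes a compiler that provides all three; the argument values are arbitrary. ERFC_SCALED(x) is exp(x**2)*erfc(x), which matters for the large arguments mentioned above.

```fortran
program erf_intrinsics
  implicit none
  double precision :: x

  x = 2.0d0
  print *, 'erf(x)            =', erf(x)
  print *, 'erfc(x)           =', erfc(x)
  print *, '1 - erf(x)        =', 1.0d0 - erf(x)        ! same as erfc(x), up to rounding
  print *, 'erfc_scaled(x)    =', erfc_scaled(x)
  print *, 'exp(x**2)*erfc(x) =', exp(x**2) * erfc(x)   ! same as erfc_scaled(x), up to rounding

  ! For large x, erfc(x) underflows to zero in double precision,
  ! while the scaled form remains representable.
  x = 30.0d0
  print *, 'erfc(30)          =', erfc(x)
  print *, 'erfc_scaled(30)   =', erfc_scaled(x)
end program erf_intrinsics
```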

LitusSaxonicum · Posted: Sat Aug 10, 2019 6:02 pm Post subject:

Well, if they are intrinsic in F2008, you are jolly lucky to find them in FTN95, then! (And anyway, is it a bit hopeful to expect the error function not to yield errors?)

Incidentally, what did people who wanted the erf do originally? Would it be the same if you used a user-written erf, or an erf function from a third-party library? Is it the same with two intrinsic functions of any sort, or just erf?

As an answer to my own question, AERMOD seems to use one of the series functions that one finds on the Wikipedia page, and not only that, the tactic used looks like precalculation of the results.

More seriously, do you genuinely get much benefit from /opt or /p6 anyway? (genuine enquiry there). I got put off /opt when it caused crashes, but that was years ago.

And as for other compilers, they may be brilliant at all sorts of things, but only FTN95 has Clearwin+ ...

Eddie

PS. There's much more chance of Brexit happening than of there being no bugs in any software.

mecej4 · Joined: 31 Oct 2006 Posts: 1943 Location: USA

The standard reference for transcendental functions is Abramowitz and Stegun, see http://people.math.sfu.ca/~cbm/aands/abramowitz_and_stegun.pdf ; see Chap. 7 for ERF. Netlib is the source for Fortran code (often decades old, though) for such functions.
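As an illustration of the kind of formula found in that chapter, here is a sketch of the rational approximation 7.1.26 from Abramowitz and Stegun, compared against the intrinsic ERF. The function name `erf_as` is made up for this post, and this is not claimed to be what AERMOD's own routine does. The stated absolute error bound for 7.1.26 is about 1.5e-7, so it is roughly single-precision quality.

```fortran
program erf_as_demo
  implicit none
  double precision :: x
  integer :: i

  ! Tabulate the approximation next to the intrinsic for x = 0.0, 0.5, ..., 3.0
  do i = 0, 6
    x = 0.5d0 * i
    print '(f5.2, 2f14.8)', x, erf_as(x), erf(x)
  end do

contains

  ! Abramowitz and Stegun 7.1.26:
  !   erf(x) ~ 1 - (a1*t + a2*t**2 + ... + a5*t**5) * exp(-x**2),  t = 1/(1 + p*x)
  ! valid for x >= 0; negative arguments use the odd symmetry erf(-x) = -erf(x).
  double precision function erf_as(arg)
    double precision, intent(in) :: arg
    double precision, parameter :: p  =  0.3275911d0
    double precision, parameter :: a1 =  0.254829592d0, a2 = -0.284496736d0, &
                                   a3 =  1.421413741d0, a4 = -1.453152027d0, &
                                   a5 =  1.061405429d0
    double precision :: ax, t, poly

    ax   = abs(arg)
    t    = 1.0d0 / (1.0d0 + p*ax)
    poly = t*(a1 + t*(a2 + t*(a3 + t*(a4 + t*a5))))   ! Horner form of the polynomial in t
    erf_as = sign(1.0d0 - poly*exp(-ax*ax), arg)
  end function erf_as

end program erf_as_demo
```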

FTN95's /opt gives some improvement in speed, but not as much as with Gfortran or Intel. In the assembler listings given above you can see many redundant loads and stores. However, as long as an option is provided and is likely to be used by a number of users, use of the option ought not to produce errors.

LitusSaxonicum · Posted: Sun Aug 11, 2019 9:42 am Post subject:

mecej4 · Joined: 31 Oct 2006 Posts: 1943 Location: USA

Sorry, the trick (assigning values of sub-expressions to new variables and then summing the variables) that you suggested is a risky solution. It works sometimes, fooling you into thinking that it is a reliable solution.

1. You try it out on a toy program, and it succeeds.

2. You try it on a slightly different toy program, and it fails.

3. You try it in a big program, where it changes the results. If you had skipped step 2, and you were not able to judge whether the results were correct, you would be tempted to accept erroneous results as correct.

Here is a counterexample.

LitusSaxonicum · Posted: Mon Aug 12, 2019 11:16 am Post subject:

Mecej4,

As you have shown that you like precise language, I refer you back to my previous post. To have a workaround, you need to know (a) that one is required, and (b) what does actually work. That, I’m afraid, is the job of documentation in the absence of a bug fix.

Regarding ‘toy programs’ you should note the very large number of complainants that report that their problems occur in ‘large’ programs, yet Paul inevitably responds with a request for a manageably small reproducer. I think that your example should send shivers down Paul’s spine when he realises the significance of your new example, that is that fixing the problem in such a small reproducer may not be the whole answer, as there is some deeper malaise. Therein, for me at least, is a most valuable point of your post.

The central point of my post was not that such a procedure was a solution, but if you knew etc.

As for whether or not you accept the results of a program as correct, then perhaps one should always be sceptical. With appropriate experience, one can detect nonsensical results, even if one cannot determine how and why they have been produced, or how large the error is. Unfortunately, scepticism is equated to 'denial of science' in some fields nowadays.

Just out of interest, I tried your new code using the user-supplied ERFX function from AERMOD, and found it worked. I then encapsulated the ERF function inside a user-supplied function, as follows:

mecej4 · Joined: 31 Oct 2006 Posts: 1943 Location: USA

Eddie, you make some excellent points in this post.

Optimizer bugs are elusive, hard to preserve while cutting away chunks of source code (in order to prepare a reproducer that is small enough to avoid the bug report being put into a "to do on a day when there is nothing else to do" list) and -- worst of all -- fixing the compiler to make it work properly on the reproducer does not guarantee that the fix will also work on the original application code.

You may find the event described in the following report interesting:

< http://www.envisage-project.eu/proving-android-java-and-python-sorting-algorithm-is-broken-and-how-to-fix-it/ >

The authors used a software formal verification tool to discover a flaw in the standard sort routine in the Java runtime, and proposed a fix. The Java developers acted upon the report, but implemented their own ad-hoc fix.

Currently sold compilers get updated at least once a year. Workarounds in users' source code, on the other hand, may stay in place far longer than the duration in which they had a purpose to serve. In fact, the Polyhedron AerMod source code -- in exactly the ERFDIF function that we have been discussing -- contains comments portraying some lines of code as workarounds for the "flakey Lahey compiler". There were many versions of the Lahey compiler that came after that workaround was added, and those versions did not need the workaround. Yet, the code changes have existed for three decades.

mecej4 · Joined: 31 Oct 2006 Posts: 1943 Location: USA

Lack of robustness is not just the compiler's fault. It can be caused by, for example:

1. Assuming that local variables are saved in some subroutines

2. Aliased actual arguments

3. Calling a subroutine inside a DO loop with the DO index variable as an argument that gets changed in the subroutine

4. Improper usage of mixed precision expressions
...
n. Any combination of all the preceding causes.
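To make items 1 and 2 of the list above concrete, here is a small sketch of the safe forms. All names (`count_calls`, `add_twice`) are hypothetical. The unsafe variants appear only in comments, because their behaviour is not defined by the standard and differs between compilers and optimization levels.

```fortran
program robustness_demo
  implicit none
  real :: x
  integer :: i

  ! Item 1: state that must survive between calls needs SAVE.
  do i = 1, 3
    call count_calls()
  end do

  ! Item 2: "call add_twice(x, x)" would associate both dummies with x
  ! while one of them is modified, which the standard does not allow.
  ! Writing (x) passes the value of an expression instead, so b keeps
  ! the original value throughout the call.
  x = 1.0
  call add_twice(x, (x))
  print *, 'x =', x            ! 1 + 1 + 1, i.e. 3

contains

  subroutine count_calls()
    ! Explicit SAVE: ncalls keeps its value from one call to the next.
    ! A plain "integer :: ncalls" assigned on the first call and read on
    ! later calls would be undefined, even if it happens to work unoptimized.
    integer, save :: ncalls = 0
    ncalls = ncalls + 1
    print *, 'call number', ncalls
  end subroutine count_calls

  subroutine add_twice(a, b)
    real, intent(inout) :: a
    real, intent(in)    :: b
    a = a + b
    a = a + b                  ! with an aliased b, this step could see the updated a
  end subroutine add_twice

end program robustness_demo
```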