forums.silverfrost.com Welcome to the Silverfrost forums
mecej4
Joined: 31 Oct 2006 Posts: 1886
Posted: Thu Sep 19, 2019 2:29 am Post subject: Optimisation bug, 32-bit FTN95 8.51
The following reproducer contains only one line where the local variable jww is assigned a value. The program is error-free, which can be checked by compiling with /checkmate or with another Fortran compiler.
When a 32-bit EXE is built using FTN95 8.51 with /opt and run, the output is
Code:
 IWEL JWW KWW
 1 25 11
 jww at line-40 = 0
 **** STOP: Bug encountered
instead of the correct output
Code:
 IWEL JWW KWW
 1 25 11
 2 19 6
 3 20 6
 4 14 11
 5 15 11
 6 16 11
 dz = 12.5000
How did the variable jww get set to zero?
The bug does not occur with FTN95 7.20 (note: that version requires that initialisation expressions be written with '(/ ... /)' instead of '[ ... ]'). Nor does the bug occur when 64-bit EXEs are built, with or without /opt.
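For anyone repeating the test with 7.20, the change amounts to swapping the square-bracket array constructors for the older Fortran 90 form. A minimal sketch follows; the program name and the final PRINT statements are only for illustration and are not part of the reproducer.

```fortran
program ctor_sketch
   implicit none
   integer :: i
   integer, dimension(6) :: jw
   real, dimension(41) :: x
   ! Fortran 90 constructor syntax, equivalent to jw = [25, 19, 20, 14, 15, 16]
   jw = (/ 25, 19, 20, 14, 15, 16 /)
   ! Implied-DO constructors are rewritten in the same way
   x(6:41) = (/ (-525.0+i*25.0, i=6,41) /)
   x(1:5)  = (/ -750., -625., -525., -450., -410. /)
   print *, jw
   print *, x(1), x(41)
end program ctor_sketch
```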
The source code:
Code:
module wells
   implicit none
   integer, parameter :: NWM = 6, NXM = 41, NZ = 18
   integer :: nw, nx
   integer, dimension(NWM) :: jw, kw, lcbotw, lctopw
   real, dimension(NXM) :: x
end module wells
subroutine initwh(dz)
   use wells
   implicit none
   integer :: iwel, k, l, is, jww, kww, jww0
   real :: dz, wisec(4)
   print *,' IWEL JWW KWW'
   do iwel = 1 , nw
      kww = kw(iwel)
      jww = jw(iwel)               ! only place where jww is set
      print '(3I5)',iwel,jww,kww
      jww0 = jww                   ! save jww for checking later
      do k = lcbotw(iwel) , lctopw(iwel)
         do is = 1 , 4             ! This loop has no purpose other than
            wisec(is) = 0.         ! to instigate the bug, in this abridged
         enddo                     ! test program. It is needed in the full program.
         do l = 1 , 2
            if ( k==1 ) then
               if ( l==1 ) cycle
               dz = 0.5*(x(2)-x(1))
            elseif ( k==nx ) then
               if ( l==2 ) cycle
               dz = 0.5*(x(k)-x(k-1))
            elseif ( l==1 ) then
               dz = 0.5*(x(k)-x(k-1))
            else
               dz = 0.5*(x(k+1)-x(k))
            endif
            if (jww /= jww0) then  ! should never be .true.
               print *,'jww at line-40 = ',jww
               stop 'Bug encountered'
            endif
         enddo
      enddo
   enddo
   return
end subroutine initwh
program sim
   use wells
   implicit none
   integer i
   real dz
   !
   nw = NWM
   nx = NXM
   jw = [25, 19, 20, 14, 15, 16]
   kw = [11, 6, 6, 11, 11, 11]
   lcbotw = [10, 15, 17, 19, 20, 23]
   lctopw = [22, 17, 27, 20, 23, 32]
   x(6:41) = [(-525.0+i*25.0, i=6,41)]
   x(1:5) = [-750., -625., -525., -450., -410.]
   call initwh(dz)
   print *,'dz = ',dz
end program
JohnCampbell
Joined: 16 Feb 2006 Posts: 2554 Location: Sydney
Posted: Thu Sep 19, 2019 5:40 am
I can confirm that it fails with my install of Ver 8.51.
I made some additions to the code, which do not appear to change the error.
I am reporting the addresses of the local stack variables to see their relative locations, along with the values of wisec and jww on either side of the "do is" loop.
jww is changed during execution of the optimised "do is" loop.
Code:
module wells
   implicit none
   integer, parameter :: NWM = 6, NXM = 41, NZ = 18
   integer :: nw, nx
   integer, dimension(NWM) :: jw, kw, lcbotw, lctopw
   real, dimension(NXM) :: x
end module wells
subroutine initwh(dz)
   use wells
   implicit none
   integer :: iwel, k, l, is, jww, kww, jww0
   real :: dz, wisec(4)
   !
   write (*,*) 'is ',loc(is)
   write (*,*) 'jww ',loc(jww)
   write (*,*) 'kww ',loc(kww)
   write (*,*) 'jww0 ',loc(jww0)
   write (*,*) 'wisec',loc(wisec)
   write (*,*) 'dz ',loc(dz)
   !
   is = 0
   wisec = 1
   print *,' IWEL JWW KWW'
   do iwel = 1 , nw
      kww = kw(iwel)
      jww = jw(iwel)               ! only place where jww is set
      print '(3I5)',iwel,jww,kww
      jww0 = jww                   ! save jww for checking later
      do k = lcbotw(iwel) , lctopw(iwel)
         write (*,*) k,is,jww,kww, wisec
         do is = 1 , 4             ! This loop has no purpose other than
!zz         write (*,*) k,is,jww,kww, wisec ! this print changes the bug
            wisec(is) = 0.         ! to instigate the bug, in this abridged
         enddo                     ! test program. It is needed in the full program.
         write (*,*) k,is,jww,kww, wisec
         do l = 1 , 2
            if ( k==1 ) then
               if ( l==1 ) cycle
               dz = 0.5*(x(2)-x(1))
            elseif ( k==nx ) then
               if ( l==2 ) cycle
               dz = 0.5*(x(k)-x(k-1))
            elseif ( l==1 ) then
               dz = 0.5*(x(k)-x(k-1))
            else
               dz = 0.5*(x(k+1)-x(k))
            endif
            write (*,*) k,l,jww
            if (jww /= jww0) then  ! should never be .true.
               print *,'jww at line-40 = ',jww
               stop 'Bug encountered'
            endif
         enddo
      enddo
   enddo
   return
end subroutine initwh
program sim
   use wells
   implicit none
   integer i
   real dz
   !
   nw = NWM
   nx = NXM
   jw = [25, 19, 20, 14, 15, 16]
   kw = [11, 6, 6, 11, 11, 11]
   lcbotw = [10, 15, 17, 19, 20, 23]
   lctopw = [22, 17, 27, 20, 23, 32]
   x(6:41) = [(-525.0+i*25.0, i=6,41)]
   x(1:5) = [-750., -625., -525., -450., -410.]
   call initwh(dz)
   print *,'dz = ',dz
end program
JohnCampbell
Joined: 16 Feb 2006 Posts: 2554 Location: Sydney
Posted: Thu Sep 19, 2019 6:17 am
I can confirm, for the revised test code I have posted:
FTN95 Ver 8.51 fails
FTN95 Ver 8.50 fails
FTN95 Ver 8.40 works ok
FTN95 Ver 8.30 works ok
Replacing the "do is" loop with wisec = 0 also removes the appearance of the bug.
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 7927 Location: Salford, UK
Posted: Thu Sep 19, 2019 7:19 am
Many thanks for the bug report and comments. I have made a note that this needs fixing.
mecej4
Joined: 31 Oct 2006 Posts: 1886
Posted: Thu Sep 19, 2019 10:15 am Post subject: Re:
JohnCampbell wrote: ... replacing the "do is" loop with wisec = 0 also removes the appearance of the bug.
That kind of response to minor changes is typical of optimiser bugs. Simply adding a PRINT statement to display the values of suspected variables can make the bug disappear, and this has caused some programmers to call such bugs "Heisenbugs".
This property makes it troublesome to prepare a reproducer. We may try paring away a few lines of seemingly unrelated source code, hoping that the bug will not disappear. If it does disappear, as it often does, we have to revert to the previous version of the source code and look for something else to cut out. Progress is typically very slow until the code has been cut by roughly 50 percent. Near the end, progress is often faster than one expects.
The fast compilation of FTN95 helps to make all this less of a burden, but it is an obstacle that we are unable to use SDBG with code compiled with /opt. Other related problems are the choice of a proprietary format for 64-bit OBJ files and the unavailability of tools (such as Microsoft's DUMPBIN /disasm) to list the instructions in the OBJ files. The /EXP listings are useful in other contexts, but the display of local variables by name (rather than by RBP or RSP offsets), or even pseudovariable names such as "extracted_expression_73" is often not enough when an addressing error is being investigated.
John, thanks for testing with other versions of the compiler and confirming the bug. Your comments regarding the DO IS loop are also helpful. I had observed that while preparing the reproducer, but decided not to describe it in order to fit the important matters into a single posting.
P.S. My shorter reproducer, posted hours after I wrote this posting, leads to an explanation of why replacing the DO IS loop by an array assignment removes the bug. The rules of Fortran require that after the DO IS=1,4 loop completes four iterations, the DO index IS should be 4+1, i.e., 5. Thus, the DO loop has the same effect as the array assignment, with one addition: setting the index variable at loop exit. The bug happens with setting the index variable. More generally, I think that the bug occurs whenever the compiler unrolls a DO loop and sets the value of the DO index variable to its expected value from the normal (unoptimised) execution of a DO loop.
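The point about the index value can be checked with a few lines of standard Fortran. This is only a sketch, with made-up program and variable names: a conforming compiler should leave the index at 5 after the loop, while the array assignment has the same effect on the data but involves no index variable.

```fortran
program do_index_sketch
   implicit none
   integer :: is
   real :: a(4), b(4)
   a = 1.0
   b = 1.0
   is = 0
   ! Explicit loop: zeroes the array and also updates the DO variable
   do is = 1, 4
      a(is) = 0.
   end do
   ! After normal completion the DO variable is one step past the limit
   print *, 'is after loop = ', is            ! expected 5
   ! Array assignment: same effect on the data, no index to set afterwards
   b = 0.
   print *, 'arrays equal  = ', all(a == b)
end program do_index_sketch
```

If the observation above is right, it is only the first form that gives the optimiser an index variable to update at loop exit, which would explain why the array assignment hides the bug.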
Last edited by mecej4 on Sat Sep 21, 2019 12:35 pm; edited 3 times in total |
mecej4
Joined: 31 Oct 2006 Posts: 1886
Posted: Thu Sep 19, 2019 6:04 pm
Here is a shorter reproducer for the issue.
Code:
program bwel
   integer iw(4), jw
   call wel(iw)
   print *,iw
end program
subroutine wel(iw)
   integer iw(4)
   integer i, jww
   jww = 5
   do i=1,4
      iw(i) = i
   end do
   print *,'jww = ',jww
   return
end
Compile with /opt /p6, link and run. With V 8.51, the printed value of JWW is 0, whereas with V 7.2 it is 5.
The /exp listing of this code with V 8.51 illustrates how the nature of the listing makes the problem a bit obscure. Here is an extract:
Code:
0010       jww = 5
0011       do i=1,4
0012          iw(i) = i
0000000d(52/7/53)      mov      eax,address of IW
00000010(51/3/19)      mov      JWW,=5            ; <<=== [ebp-10h]
00000017(53/7/53)      mov      [eax],=1
0000001d(54/6/58)      mov      [eax+4],=2
00000024(55/5/64)      mov      [eax+8],=3
0000002b(56/4/70)      mov      [eax+12],=4
00000032(57/4/76)      mov      I[4],=0           ; <<=== [ebp-10h]
00000039(58/4/76)      mov      I,=5
There is nothing in the listing to indicate that JWW and I[4] occupy the same address. Running dumpbin /disasm on the OBJ file shows that both of these occupy dword ptr [ebp-10h].
The last two instructions in the extract are setting the local variable I to the value that it should have after the DO loop terminates, but the compiler seems to think that I is an 8-byte integer. For some reason, the upper 4 bytes overlap the local variable JWW.
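The effect can be imitated in standard Fortran without involving the optimiser at all. The sketch below is only a model of the overlap, not of what the compiler does internally. It assumes a little-endian machine such as x86, and it assumes that I sits in the 4 bytes just below JWW; the program and array names are made up, and the kind names come from the Fortran 2008 module iso_fortran_env.

```fortran
program overlap_sketch
   use iso_fortran_env, only: int32, int64
   implicit none
   ! slot(1) stands in for I, slot(2) for JWW at the next higher address.
   ! This layout is an assumption made for the illustration.
   integer(int32) :: slot(2)
   slot(1) = 0_int32
   slot(2) = 5_int32                  ! jww = 5
   ! Store the loop-exit value 5 as if I were an 8-byte integer:
   ! the low 4 bytes land in slot(1), the high 4 bytes in slot(2).
   slot = transfer(5_int64, slot)
   print *, 'I   = ', slot(1)
   print *, 'JWW = ', slot(2)
end program overlap_sketch
```

On a little-endian machine I would expect this to leave 5 in the slot standing for I and 0 in the slot standing for JWW, which matches the value of JWW printed by the reproducer.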
Last edited by mecej4 on Fri Sep 20, 2019 4:27 am; edited 3 times in total |
DanRRight
Joined: 10 Mar 2008 Posts: 2818 Location: South Pole, Antarctica
Posted: Fri Sep 20, 2019 1:38 am
It is good that mecej4 is, not for the first time, addressing an optimization issue in the compiler. This is an age-old problem. It is a pity that no one at SF likes to dig into the optimization, which might bring the largest advancement of the compiler, alongside F90 and 64-bit support, since its first days as FTN77.
I hope that some day somebody will also demand compatibility with MPI/CUDA parallel instructions as an added option, and that an FTNXX for supercomputers (Linux and Windows) will finally be made. The era of personal supercomputing is coming soon. Even with a mere 128 cores in personal use, which is currently just two chips, our PIC code runs as fast as on a 1000-core supercomputer with its time-sharing and highly congested usage (you get 24 hours, then you wait in the queue for a few days to continue, and then your usage limit runs out). Prices should drop like a rock with the AMD competition. You can currently buy previous-generation 16-core Intel Xeon E5 chips for $100-200 on eBay. And for supercomputers it is often not the processor speed but the RAM speed and the interconnect that limit performance. Rewriting codes from C to Fortran may also give a factor of 3 speedup in some cases, probably.
LitusSaxonicum
Joined: 23 Aug 2005 Posts: 2388 Location: Yateley, Hants, UK
Posted: Fri Sep 20, 2019 12:41 pm
Dan has a point. I always used the optimisation options automatically with every compiler until I moved over to FTN95, and then (probably wrongly) I put some of the problems down to asynchronicity within Windows. As a result, I've avoided /opt, and never encountered such bugs afterwards.
I'm in the lucky position that my stuff runs adequately quickly on any computer I care to run it on even without /opt, but each time I see the Polyhedron benchmarks (which I try to avoid) I have a pang of envy that other, lesser, compilers have faster runtimes. It's like the benchmarks for Intel v. AMD. When I look at the scores, Intel wins. But when in the real world I've had to use Intel machines, I simply don't experience that difference.
And as for PC v. supercomputer, I can remember when my 16Mb RAM machine could run problems that many mainframes couldn't tackle, and the runtime was only a minor component of the time between job submission and receiving the results, as the immediacy of the PC beat a timesharing system hands down.
So please, Paul, do take the optimisation issues seriously.
Eddie
DanRRight
Joined: 10 Mar 2008 Posts: 2818 Location: South Pole, Antarctica
Posted: Sun Sep 22, 2019 9:10 am
Yes, Eddie, the Polyhedron run results may be irrelevant, or may be an example of bad code, but for a general public unfamiliar with the subject they are like a hit below the belt that undermines the whole idea of using FTN95, and even Fortran, as a fast language. It is painful, and there is no excuse for seeing that for 20 years!
I personally have not suffered from that, though, because the key for me is fast development, deep error checking and GUI, while I get super-fast runs from parallel libraries which use multi-core processors and are built with a variety of compilers, so you can choose the fastest.
We also started using more supercomputers recently. There, FTN95 also offers the fastest read speed, clearly exceeding HDF5 while doing so with ultimate simplicity (we have not yet been able to use HDF5 directly with FTN95; maybe the Silverfrost developers will help and compile the sources in Fortran or C, or make a DLL).
Still, this Polyhedron blow is totally unacceptable and has to be resolved somehow. Don't you agree at SF?
John-Silver
Joined: 30 Jul 2013 Posts: 1520 Location: Aerospace Valley
Posted: Thu Sep 26, 2019 7:20 am
The published Polyhedron results have been getting quite an airing again in several recent posts.
.... so after starting to write some comments on here I decided to create a new thread dedicated to it.
You can find it HERE
_________________
''Computers (HAL and MARVIN excepted) are incredibly rigid. They question nothing. Especially input data. Human beings are incredibly trusting of computers and don't check input data. Together cocking up even the simplest calculation ... "
Last edited by John-Silver on Thu Sep 26, 2019 9:31 am; edited 2 times in total |
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 7927 Location: Salford, UK
Posted: Fri Oct 11, 2019 12:03 pm
This bug has now been fixed for the next release of FTN95.