forums.silverfrost.com Welcome to the Silverfrost forums
mecej4
Joined: 31 Oct 2006 Posts: 1887
|
Posted: Wed Jan 10, 2018 4:48 pm Post subject: Performance penalty from using 64-bit integers |
|
|
When working with a test code related to a compiler bug (for details, see this recent thread: http://forums.silverfrost.com/viewtopic.php?p=23660), I thought that I had found another compiler bug: a small change in the program, namely, changing two variables from 32-bit integers to 64-bit integers, seemed to make the resulting EXE hang. It turns out that the program had become about twenty times slower (800 times slower than with Gfortran).
Here is the test program:
Code: | program fbug
   implicit none
   integer, parameter :: N8 = selected_int_kind(15)
   integer, parameter :: HundredMill = 100000000
   integer(N8) :: i, j ! could be plain integers, instead
   integer(N8) :: s
   s = 0_N8
   do i = 1, 30
      do j = 1, HundredMill
         s = s + j
      end do
      write(*,*) i, s
   end do
   write (*,*) 's =', s
end |
Here are some timing results from this program:
Code: | gftn -m32 -O2 0.047 s
gftn -m64 -O2 0.047 s
ftn95 /opt 40.89 s
ftn95 /opt /64 2.199 s
|
The GCC versions were 4.8 (32-bit) and 6.2 (64-bit), and I used FTN95 8.10, all on a laptop with an i5-4200U CPU and running Windows 10 64-bit.
I think that one has to be careful about using 64-bit integers with 32-bit FTN95. The use of X87 instructions for performing 8-byte integer arithmetic is probably the root cause of the slow-down.
Changing the DO loop index variables to 32-bit integers (by removing "(N8)" on Line 5) improves the timing for 32-bit FTN95, but there is still much room for improvement.
Code: | gftn -m32 -O2 3.118 s
gftn -m64 -O2 1.256 s
ftn95 /opt /64 2.221 s
ftn95 /opt 16.594 s
|
It is curious that the same change (removing "(N8)") that helped speed up the EXE compiled with FTN95 caused the EXE compiled with GFortran to slow down significantly. Please note that with "(N8)" in place, the GFortran compiler is smart enough to optimize away the inner loop, which explains the apparent high speed of the EXE that it produces. |
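The inner loop can be eliminated because its result has a closed form: the sum 1 + 2 + ... + N equals N(N+1)/2. The sketch below only illustrates that arithmetic and makes no claim about what GFortran actually emits. The program name and the kind name K8 are made up for this example, and range 18 is requested (rather than 15 as in the test program) on the assumption that the final total, about 1.5e17, should lie within the guaranteed range.

```fortran
program closed_form
   implicit none
   ! Hypothetical names; range 18 is assumed because the total exceeds 10**15
   integer, parameter :: K8 = selected_int_kind(18)
   integer(K8), parameter :: n = 100000000_K8
   integer(K8) :: inner, total
   inner = n*(n + 1_K8)/2_K8   ! sum of 1..n without a loop
   total = 30_K8*inner         ! thirty passes of the inner loop
   write (*,*) 'inner =', inner
   write (*,*) 'total =', total
end program closed_form
```

With N = 100000000 the inner sum is 5000000050000000 and thirty passes give 150000001500000000, which should match the final value of s printed by the test program.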
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 7930 Location: Salford, UK
|
Posted: Wed Jan 10, 2018 5:38 pm Post subject: |
|
|
I have run this code on my machine using the developers' FTN95 and I can confirm the slowness for 32 bits.
For 64 bits I get:
gftn (not optimised) 8.9 secs.
ftn95 (not optimised) 9.1 secs.
ftn95 (optimised) 1.8 secs.
I used SYSTEM_CLOCK for timing and noted that gftn and ftn95 use different count rates. |
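Because the count rate differs between compilers, raw SYSTEM_CLOCK counts are only comparable after dividing by the rate that the same compiler reports. A minimal sketch of that, not the code used for the figures above: the program and variable names are made up, default integers are used for the clock arguments on the assumption that this is the most portable choice, and wrap-around of the counter is ignored.

```fortran
program time_loop
   implicit none
   ! Hypothetical names; a sketch of rate-normalised timing only
   integer, parameter :: K8 = selected_int_kind(18)
   integer :: t0, t1, rate       ! default-kind clock values
   integer :: j
   integer(K8) :: s
   call system_clock(count_rate=rate)   ! ticks per second for this compiler
   call system_clock(t0)
   s = 0_K8
   do j = 1, 100000000
      s = s + j
   end do
   call system_clock(t1)
   write (*,*) 's =', s
   ! Dividing by the rate gives seconds regardless of the tick size
   write (*,*) 'elapsed (s) =', real(t1 - t0)/real(rate)
end program time_loop
```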
JohnCampbell
Joined: 16 Feb 2006 Posts: 2555 Location: Sydney
|
Posted: Fri Jan 12, 2018 1:23 am Post subject: |
|
|
Paul,
I am surprised by the improvement you report for your test with FTN95 /64 /opt. I have not been able to achieve similar results.
Could FTN95's poor 32-bit performance be due to 8-byte integer instructions either not being used or not being available in 32-bit mode?
In general I have been impressed by the performance of 8-byte integers, although I mainly generate 64-bit .exe.
Mecej4, I too have an i5-4200U (with 3 MB cache) running Windows 10 64-bit. Its performance is very disappointing in comparison to the other PCs and laptops that I have. A purchase I regret. I am now considering an i7-8700K desktop, but so often the improvements are minimal. |
John-Silver
Joined: 30 Jul 2013 Posts: 1520 Location: Aerospace Valley
|
Posted: Fri Jan 12, 2018 8:11 am Post subject: |
|
|
Following recent events, I wonder if PCs have had a 'built-in mechanism' to make the machine run slower over time, thus prompting users to upgrade their machines. After all, 'everyone' now has a PC, and for the vast majority the power of the machine is far in excess of what they need, so how do manufacturers maintain their sales?
Couple that with the latest mysterious 'hack-open' loophole, apparently in ALL chips, and the amazingly quick software patch that has been produced - maybe they are trying to cover up some 'VW-esque' software rip-off that's been in the marketplace for some time? Stranger things have happened!
There seem to be some differences of opinion over whether the new 'fix' will affect performance, for example:
https://betanews.com/2018/01/03/intel-security-flaw/
As a lay-programmer I suspiciously eye this as a devious Intel/M$ tactic to misinform, spread confusion and subsequently get people to take the easy (if not cheap) route out and just buy a new machine.
The 'fix' could be anything, and what's for sure is that they won't tell anyone the real reason or how it's being tackled.
Think 1984. Power to the pigs!
Of course, such performance hits will subsequently wreak havoc on performance comparisons such as the one that is the subject of this thread! |
John-Silver
Joined: 30 Jul 2013 Posts: 1520 Location: Aerospace Valley
|
Posted: Fri Jan 12, 2018 8:26 am Post subject: |
|
|
Having just written all that, I came upon this gem of an article:
https://betanews.com/2018/01/11/pc-market-up/
It's even headed with a photo of Paul ... celebrating his latest success in his 'great %pl bug-fix hunt'.
Then there's this:
https://betanews.com/2018/01/03/meltdown-spectre-apocalypse/
If ever there was a scaremongering, Y2K-esque (remember that panic-fest?) article, this is it ... the minions will be queuing up already to purchase the latest over-powered, over-priced 'security-flaw' hardware!!!
In the US the lawyers must be queuing up and drooling at the mouth, ready for action! |
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 7930 Location: Salford, UK
|
Posted: Fri Jan 12, 2018 9:18 am Post subject: |
|
|
John
I don't know why 32 bit mode INTEGER*8 arithmetic is so slow. |
mecej4
Joined: 31 Oct 2006 Posts: 1887
|
Posted: Fri Jan 12, 2018 3:33 pm Post subject: Re: |
|
|
PaulLaidler wrote: | John
I don't know why 32 bit mode INTEGER*8 arithmetic is so slow. |
The slowdown highlighted in this thread is probably of little significance to real life applications. Here we have created a loop which does little but gets executed billions of times. Real applications do not do such things.
In 64-bit mode, the code that FTN95 produces for the inner loop is just six instructions long. Two of those instructions could be removed as stated in the comments following '#'.
Code: | N_6:
ADD_Q RDI,RSI
MOV_Q R15,RSI # REMOVE
INC_Q RSI
MOV_Q R15,RSI # REMOVE
CMP_Q R15,100000000 # REPLACE R15 by RSI
JLE N_6
#Storing information in registers at exit of loop
MOV_Q S,RDI
MOV_Q J,RSI
|
More importantly, the instructions make no memory references.
The corresponding 32-bit code, however, makes lots of memory references:
Code: | Label __N6
mov ecx,S
mov eax,S[4]
add ecx,J
adc eax,J[4]
mov Temp@1,ecx
mov Temp@1[4],eax
mov eax,Temp@1
mov edi,Temp@1[4]
mov S[4],edi
mov S,eax
mov edi,J
mov ecx,J[4]
add edi,1_4
adc ecx,1_4[4]
mov Temp@2,edi
mov Temp@2[4],ecx
mov ecx,Temp@2
mov eax,Temp@2[4]
mov J[4],eax
mov J,ecx
qfild 100000000_4
qfild J
fcomip fr0,fr1
ffree fr0
jbe __N6
|
There is quite a bit of copying and fetching of temporary results to/from memory. The use of X87 instructions just to test if the DO loop is done is also rather odd.
In real life, where something substantial is done inside the loop, these inefficiencies probably have a negligible effect. We just need to be careful not to use 8-byte integers for DO loop index variables unless they are necessary. |
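As a sketch of that advice, the test program can keep a 64-bit accumulator while using default-kind loop indices. The program name and the kind name K8 are made up here, range 18 is requested on the assumption that the final total of about 1.5e17 should lie within the guaranteed range, and default integers are assumed to be 32-bit, as they normally are with the compilers discussed above.

```fortran
program idx32
   implicit none
   ! Hypothetical names; a variant of the test program, not a tuned benchmark
   integer, parameter :: K8 = selected_int_kind(18)
   integer, parameter :: HundredMill = 100000000
   integer :: i, j        ! default-kind loop indices (assumed 32-bit)
   integer(K8) :: s       ! 64-bit accumulator, needed to hold the total
   s = 0_K8
   do i = 1, 30
      do j = 1, HundredMill
         s = s + int(j, K8)   ! widen the index explicitly before adding
      end do
   end do
   write (*,*) 's =', s
end program idx32
```

The loop bound 100000000 fits comfortably in a 32-bit integer, so only the running sum needs the wider kind.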