forums.silverfrost.com Forum Index
Welcome to the Silverfrost forums

Performance penalty from using 64-bit integers

 
forums.silverfrost.com Forum Index -> Support
mecej4



Joined: 31 Oct 2006
Posts: 1024

Posted: Wed Jan 10, 2018 4:48 pm    Post subject: Performance penalty from using 64-bit integers

When working with a test program related to a compiler bug (for details, see this recent thread: http://forums.silverfrost.com/viewtopic.php?p=23660), I thought that I had found another compiler bug: a small change to the program, namely changing two variables from 32-bit integers to 64-bit integers, seemed to make the resulting EXE hang. It turns out that the program had merely become about twenty times slower (800 times slower than with GFortran).

Here is the test program:
Code:
  program fbug
   implicit none
   integer, parameter :: N8 = selected_int_kind(15)
   integer, parameter :: HundredMill = 100000000
   integer(N8) :: i, j             ! could be plain integers, instead
   integer(N8) ::  s

   s = 0_N8
   do i = 1, 30
      do j=1,HundredMill
         s = s + j
      end do
      write(*,*)i,s
   end do

   write (*,*) 's =', s
   end

Here are some timing results from this program:
Code:
gftn -m32 -O2          0.047 s
gftn -m64 -O2          0.047 s
ftn95 /opt            40.89  s
ftn95 /opt /64         2.199 s

The GCC versions were 4.8 (32-bit) and 6.2 (64-bit), and I used FTN95 8.10, all on a laptop with an i5-4200U CPU running Windows 10 64-bit.

I think that one has to be careful when using 64-bit integers with 32-bit FTN95. The use of X87 instructions for performing 8-byte integer arithmetic is probably the root cause of the slowdown.

Changing the DO loop index variables to 32-bit integers (by removing "(N8)" on Line 5) improves the timings, but there is still much room for improvement.
Code:
gftn -m32 -O2          3.118 s
gftn -m64 -O2          1.256 s 
ftn95 /opt /64         2.221 s
ftn95 /opt            16.594 s

It is curious that the same change (removing "(N8)") that helped speed up the EXE compiled with FTN95 caused the EXE compiled with GFortran to slow down significantly. Please note that with "(N8)" in place, the GFortran compiler is smart enough to optimize away the inner loop, which explains the apparent high speed of the EXE that it produces.
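The inner loop can be optimized away because its result has a closed form: the sum of j for j = 1..N is N(N+1)/2, so every pass of the outer loop adds the same constant to s. A minimal sketch of that equivalent computation follows; the program name `closed_form` and the variable `inner` are illustrative, and whether a particular optimizer performs exactly this transformation is an assumption. Note that N(N+1) has to be formed in 64-bit arithmetic, since 10^8 × (10^8 + 1) overflows a default 32-bit integer.

```fortran
program closed_form
   implicit none
   integer, parameter :: N8 = selected_int_kind(15)
   integer, parameter :: HundredMill = 100000000
   integer     :: i
   integer(N8) :: inner, s

   ! Sum of j for j = 1..HundredMill, i.e. N*(N+1)/2.
   ! Widen to 64 bits first: N*(N+1) = 1.00000001e16 overflows 32 bits.
   inner = int(HundredMill, N8) * (int(HundredMill, N8) + 1_N8) / 2_N8

   ! Each pass of the outer loop adds the same constant to s.
   s = 0_N8
   do i = 1, 30
      s = s + inner
   end do

   write (*,*) 'inner sum =', inner
   write (*,*) 's =', s
end program closed_form
```

This gives s = 150000001500000000, the same final value that the loop version of the test program prints.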
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 5584
Location: Salford, UK

Posted: Wed Jan 10, 2018 5:38 pm    Post subject:

I have run this code on my machine using the developers' version of FTN95, and I can confirm the slowness for 32 bits.

For 64 bits I get:

gftn (not optimised) 8.9 secs.
ftn95 (not optimised) 9.1 secs.
ftn95 (optimised) 1.8 secs.

I used SYSTEM_CLOCK for timing and noted that gftn and ftn95 use different count rates.
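Because the count rate differs between compilers (and, with some compilers such as GFortran, may also depend on the kind of the integer arguments passed), elapsed time should always be computed by dividing the tick difference by the reported COUNT_RATE. A minimal sketch, assuming 64-bit arguments to SYSTEM_CLOCK; the program name `time_loop` is illustrative.

```fortran
program time_loop
   implicit none
   integer, parameter :: N8 = selected_int_kind(15)
   integer, parameter :: HundredMill = 100000000
   integer(N8) :: t0, t1, rate, s
   integer     :: j
   real(kind(1d0)) :: elapsed

   ! 64-bit arguments; the tick rate returned is processor dependent.
   call system_clock(t0, rate)
   if (rate == 0_N8) error stop 'no processor clock available'

   s = 0_N8
   do j = 1, HundredMill
      s = s + j
   end do

   call system_clock(t1)
   ! Convert ticks to seconds using the reported rate, never a fixed constant.
   elapsed = real(t1 - t0, kind(1d0)) / real(rate, kind(1d0))

   write (*,*) 's =', s
   write (*,*) 'count rate =', rate, ' ticks/s'
   write (*,'(a,f10.4,a)') ' elapsed =', elapsed, ' s'
end program time_loop
```

Printing `rate` alongside the timing makes it easy to see when two compilers are counting in different units.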
JohnCampbell



Joined: 16 Feb 2006
Posts: 1997
Location: Sydney

Posted: Fri Jan 12, 2018 1:23 am    Post subject:

Paul,

I am surprised by the improvement you report for your test with FTN95 /64 /opt. I have not been able to achieve similar results.

Could FTN95's 32-bit performance be due to 8-byte integer instructions that are either not being used or not available in 32-bit mode?
In general I have been impressed by the performance of 8-byte integers, although I mainly generate 64-bit .exe.

mecej4, I too have an i5-4200U (with 3 MB cache) running Windows 10 64-bit. Its performance is very disappointing compared with the other PCs and laptops that I have; it is a purchase I regret. I am now considering an i7-8700K desktop, but so often the improvements are minimal.
John-Silver



Joined: 30 Jul 2013
Posts: 959
Location: Aerospace Valley

Posted: Fri Jan 12, 2018 8:11 am    Post subject:

Following recent events, I wonder whether PCs have had a 'built-in mechanism' to make the machine run slower over time, thus prompting users to upgrade their machines. After all, 'everyone' now has a PC, and for the vast majority the power of the machine far exceeds what they need, so how do the manufacturers maintain their sales?
Couple that with the latest mysterious 'hack-open' loophole, apparently in ALL chips, and the amazingly quick software patch that has been produced: maybe they are trying to cover up some 'VW-esque' software rip-off that has been in the marketplace for some time? Stranger things have happened!

There seem to be some differences of opinion over whether the new 'fix' will affect performance; for example:
https://betanews.com/2018/01/03/intel-security-flaw/

As a lay programmer, I eye this suspiciously as a devious Intel/M$ tactic to misinform, spread confusion and subsequently get people to take the easy (if not cheap) way out and just buy a new machine.

The 'fix' could be anything, and what's certain is that they won't tell anyone the real reason or how it's being tackled.
Think 1984. Power to the pigs!

Of course, such performance hits would then wreak havoc on performance comparisons such as the one that is the subject of this post!
John-Silver



Joined: 30 Jul 2013
Posts: 959
Location: Aerospace Valley

Posted: Fri Jan 12, 2018 8:26 am    Post subject:

Having just written all that, I stumbled upon this gem of an article:
https://betanews.com/2018/01/11/pc-market-up/

It's even headed with a photo of Paul... celebrating his latest success in his 'great %pl bug-fix hunt' ;)


Then there's this:
https://betanews.com/2018/01/03/meltdown-spectre-apocalypse/

If ever there was a scaremongering, Y2K-esque (remember that panic-fest?) article, this is it... the minions will already be queuing up to purchase the latest over-powered, over-priced 'security-flaw' hardware!!!
In the US the lawyers must be queuing up and drooling at the mouth, ready for action!
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 5584
Location: Salford, UK

Posted: Fri Jan 12, 2018 9:18 am    Post subject:

John

I don't know why 32 bit mode INTEGER*8 arithmetic is so slow.
mecej4



Joined: 31 Oct 2006
Posts: 1024

Posted: Fri Jan 12, 2018 3:33 pm    Post subject: Re:

PaulLaidler wrote:
John

I don't know why 32 bit mode INTEGER*8 arithmetic is so slow.


The slowdown highlighted in this thread is probably of little significance to real-life applications. Here we have created a loop that does very little but is executed billions of times. Real applications do not do such things.

In 64-bit mode, the code that FTN95 produces for the inner loop is just six instructions long. Two of those instructions could be removed (and the comparison adjusted accordingly), as indicated in the comments following '#'.
Code:
N_6:
ADD_Q     RDI,RSI
MOV_Q     R15,RSI         # REMOVE
INC_Q     RSI
MOV_Q     R15,RSI          # REMOVE
CMP_Q     R15,100000000   # REPLACE R15 by RSI
JLE       N_6
#Storing information in registers at exit of loop
MOV_Q     S,RDI
MOV_Q     J,RSI

More importantly, the instructions make no memory references.

The corresponding 32-bit code, however, makes lots of memory references:
Code:
Label     __N6     
mov       ecx,S         
mov       eax,S[4]     
add       ecx,J         
adc       eax,J[4]     
mov       Temp@1,ecx   
mov       Temp@1[4],eax
mov       eax,Temp@1   
mov       edi,Temp@1[4]
mov       S[4],edi     
mov       S,eax         
mov       edi,J         
mov       ecx,J[4]     
add       edi,1_4       
adc       ecx,1_4[4]   
mov       Temp@2,edi   
mov       Temp@2[4],ecx
mov       ecx,Temp@2   
mov       eax,Temp@2[4]
mov       J[4],eax     
mov       J,ecx         
qfild     100000000_4   
qfild     J             
fcomip    fr0,fr1       
ffree     fr0           
jbe       __N6         


There is quite a bit of copying and fetching of temporary results to/from memory. The use of X87 instructions just to test if the DO loop is done is also rather odd.
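The paired `add`/`adc` instructions in the 32-bit listing are how a 32-bit CPU adds two 64-bit integers: the low 32-bit words are added first, and the carry out of that addition is fed into the sum of the high words. The sketch below emulates this in Fortran purely for illustration, holding each 32-bit word in a 64-bit integer. The names `add64_split` and `MASK32` are hypothetical, and `shiftl`/`shiftr` are Fortran 2008 intrinsics that may not be available in every compiler version. It does not model the extra stores to and loads from `Temp@1`/`Temp@2`, which are the real source of the overhead discussed above.

```fortran
program add_with_carry
   implicit none
   integer, parameter :: N8 = selected_int_kind(15)

   ! Adding 1 to a value whose low word is all ones forces a carry.
   write (*,*) add64_split(4294967295_N8, 1_N8)        ! expect 4294967296
   write (*,*) add64_split(5000000050000000_N8, 12345_N8)

contains

   ! Add a and b the way 32-bit code does: low words with "add",
   ! high words with "adc" (add with carry). Result is modulo 2**64.
   function add64_split(a, b) result(r)
      integer(N8), intent(in) :: a, b
      integer(N8) :: r
      integer(N8), parameter :: MASK32 = 4294967295_N8   ! 2**32 - 1
      integer(N8) :: lo, hi, carry

      lo    = iand(a, MASK32) + iand(b, MASK32)   ! "add": at most 33 bits
      carry = shiftr(lo, 32)                      ! bit 32 is the carry (0 or 1)
      lo    = iand(lo, MASK32)
      hi    = iand(shiftr(a, 32) + shiftr(b, 32) + carry, MASK32)   ! "adc"
      r     = ior(shiftl(hi, 32), lo)             ! reassemble the two words
   end function add64_split

end program add_with_carry
```

Each 64-bit addition therefore costs at least two ALU instructions in 32-bit code, compared with the single `ADD_Q` in the 64-bit listing.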

In real life, where something substantial is done inside the loop, these inefficiencies probably have a negligible effect. We just need to be careful not to use 8-byte integers for DO loop index variables unless they are necessary.
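A sketch of that advice applied to the test program: keep the DO index a default (32-bit) integer and make only the accumulator 64-bit. A 64-bit index is needed only when the loop bound can exceed `huge(0)` (2147483647 for a 32-bit default integer). This is essentially the variant timed earlier in the thread with "(N8)" removed; the explicit `int(j, N8)` is a stylistic choice, since mixed-kind addition would widen `j` anyway. The program name `index_kinds` is illustrative.

```fortran
program index_kinds
   implicit none
   integer, parameter :: N8 = selected_int_kind(15)
   integer, parameter :: HundredMill = 100000000
   integer     :: j          ! default-kind DO index
   integer(N8) :: s          ! 64-bit only where values can exceed huge(0)

   ! A 64-bit index would only be needed if the loop bound exceeded huge(j).
   write (*,*) 'huge(j) =', huge(j), ' bound =', HundredMill

   s = 0_N8
   do j = 1, HundredMill
      s = s + int(j, N8)     ! widen each term; the sum itself needs 64 bits
   end do
   write (*,*) 's =', s      ! 5000000050000000
end program index_kinds
```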
All times are GMT + 1 Hour