Some issues with roundoff and a whole lot else.

 
LitusSaxonicum
Posted: Sun Mar 05, 2017 6:20 pm    Post subject: Some issues with roundoff and a whole lot else.

I thought long and hard about the 64-bit version of FTN95. It is abundantly clear that Microsoft intends the 64-bit version of Windows to become the default, since it is installed on the majority of laptops, for example, even those with 2GB of soldered RAM that is not expandable.

The fundamental difference between the 32-bit and 64-bit versions of the compiler lies predominantly in the size of the data structures (singly and together) that can be addressed. Even if no individual dimension exceeds the capacity of INTEGER*4, an array of two or more dimensions can still exceed 4GB in total size. Although 4GB is notionally the limit, in 32-bit mode the practical limit is a lot smaller, around 2GB, once the operating system and its addressing limits are taken into account.
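To put numbers on that, here is a minimal sketch (the dimensions are illustrative choices of mine, not from any real program) of a two-dimensional array whose total size passes 4GB even though each subscript fits comfortably in INTEGER*4:

Code:
      PROGRAM big2d
      REAL*8, ALLOCATABLE :: a(:,:)
      INTEGER*4 :: n, ierr
      n = 25000                       ! each dimension is well inside INTEGER*4
      ALLOCATE (a(n,n), STAT=ierr)    ! 25000 x 25000 x 8 bytes = about 4.7GB
      IF (ierr /= 0) STOP 'Allocation failed - needs a 64-bit address space'
      a = 0.0D0
      DEALLOCATE (a)
      END PROGRAM big2d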

It is some time since I last looked at the pricing of a super-capacity PC. When I did so recently, I discovered that a mainboard and CPU could cost GBP300 and GBP600 respectively, and that such a mainboard would support 128GB of RAM at a cost of about GBP1350. By the time one considers the other components, one is looking at spending GBP3k on a system. It is definitely a function of need, because a tablet with SPARCstation capabilities can be had for under GBP50!

128GB of RAM could support 64 data structures of 2GB, or 32 of 4GB, on this super specification, and as it is likely that one or a few of those arrays will take up most of that RAM, a 64-bit compiler need only assume that a few large data structures will be used in any program. Even with likely future developments, it is rather improbable that the average Windows user will boast that level of RAM for some years to come. Indeed, my observation is that many machines are far less well equipped, because size, weight and battery life are where technical innovation is being made fastest.

That brings me to FTN95 64-bit, which as far as I can tell is a completely new product rather than an evolution of the 32-bit version. I can see why users who need the larger data structures would sacrifice everything for speed, as cranking through those huge data structures is inevitably time-consuming, and I can see how using the SSEx instructions helps versus the x87 instructions. However, the whole point of the x87 extended-to-80-bit arithmetic is to reduce round-off error. The switch to SSEx will ensure that some numerical procedures actually yield different answers. This takes me back some 35 years to Microsoft Fortran, which had different libraries for coprocessor arithmetic and coprocessor emulation, and the two versions decidedly gave different answers: huge differences in cases where single precision arithmetic was done.
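By way of a toy illustration of that divergence (my own example, nothing to do with the Microsoft libraries themselves): accumulating the same series at two intermediate precisions gives visibly different sums, which is the same mechanism by which 80-bit x87 intermediates and 64-bit SSE intermediates can part company:

Code:
      PROGRAM precision_drift
      REAL*4 :: s4
      REAL*8 :: s8
      INTEGER :: i
      s4 = 0.0
      s8 = 0.0D0
      DO i = 1, 10000000
        s4 = s4 + 0.1             ! single precision accumulation drifts badly
        s8 = s8 + 0.1D0           ! double precision retains far more of the sum
      END DO
      PRINT *, 'REAL*4 sum:', s4, '  REAL*8 sum:', s8, '  exact: 1.0E6'
      END PROGRAM precision_drift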

I recently came across a situation where an application (not in FTN95, but in some Pascal variant) produced quite different results in its single-precision DOS version from those of its double-precision Windows version. That became a very contentious issue when reviewing analysis and design in the course of litigation.

It struck me that if the FTN95 32-bit and 64-bit versions are completely or largely separate, and therefore have the potential to produce different results, there could be some quite unhappy users. This is compounded by the fact that many coders, while professing an understanding of round-off, do not genuinely understand every issue. I include myself in this, and I'm not even sure that all computer scientists fully understand it. For example, FTN95 issues a warning for comparing floating-point numbers even when it is obvious that a comparison with 0 is not attempting to trap a floating-point result that is exactly 0, but is instead picking up errors in data input that could lead to a numerical overflow. It is better to trap these with an IF statement and exit gracefully than to put up with an FTN95 runtime error. I appreciate that to Silverfrost the error message is a graceful exit, but it certainly isn't to an end-user!
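The kind of defensive check I have in mind is sketched below (the routine name and message are invented for illustration). The comparison with zero is deliberate and exact: a zero here means the value was never read in, and it is trapped before it can cause an overflow:

Code:
      SUBROUTINE check_span(span, ok)
      REAL*8, INTENT(IN)   :: span
      LOGICAL, INTENT(OUT) :: ok
      ok = .TRUE.
      IF (span .EQ. 0.0D0) THEN     ! exact test: zero means the input was never set
        PRINT *, 'Span is zero - check the input file'
        ok = .FALSE.                ! caller can now exit gracefully
      END IF
      END SUBROUTINE check_span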

The actual arithmetic need not be any different between a 32-bit and a 64-bit version of the compiler.

JohnCampbell
Posted: Mon Mar 06, 2017 12:21 am

Eddie,
I am interested to read your discussion of round-off and 64-bit. There is a potential problem with historical codes where INTEGER*4 subscripts limit the 64-bit addressing capacity, although 2^31 REAL*8 elements still spans 16GB.
Most PCs and laptops now come with 8GB, which is not enough for large 64-bit calculations. My main PC has 32GB of memory, and my main 64-bit program has a single large array with INTEGER*8 addressing. The PC cost AU$1,700, which is a reasonable cost. (Interestingly, any extra INTEGER*8 performance overhead is not identifiable.)
The recent round-off problem being discussed is with REAL*4, which has long been considered unsuitable for most calculations.
Your discussion of x87 vs SSE is a comparison of round-off with REAL*8 vs REAL*10. The introduction of SSE instructions (before 64-bit addressing) did change the accuracy of some calculations, although there is not much of a practical alternative! The developers of the F90 standard would suggest you change the precision via SELECTED_REAL_KIND, although this ignores the reality of the limited real-number formats available on most PCs and other hardware. FTN95 /64 does not provide any alternative, but most calculations achieve adequate accuracy with REAL*8.
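For reference, the standard way of requesting a precision looks like the sketch below (a minimal example of my own; on typical PC hardware both requests still land on the same two IEEE formats, which is exactly the limitation I mean):

Code:
      PROGRAM kinds
      ! Ask for at least 6 / 15 significant digits and the stated exponent range.
      INTEGER, PARAMETER :: sp = SELECTED_REAL_KIND(6, 37)     ! maps to REAL*4 on PCs
      INTEGER, PARAMETER :: dp = SELECTED_REAL_KIND(15, 307)   ! maps to REAL*8 on PCs
      REAL(KIND=dp) :: x
      x = 1.0_dp / 3.0_dp
      PRINT *, 'dp kind =', dp, '  x =', x
      END PROGRAM kinds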
In my field of structural engineering, I have had few examples with problems from increased round-off error. The one notable problem I did investigate (I actually wrote a REAL*10 solver to test it) showed that the cause was not REAL*8 vs REAL*10 but a poorly defined model. Fortunately, I always do a re-calculation of the error estimate to check for this.
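The re-calculation I mean is a residual check after the solve, something like this sketch (the array names are mine):

Code:
      ! After solving A*x = b, substitute x back and measure the residual.
      SUBROUTINE residual_check(a, x, b, n)
      INTEGER, INTENT(IN) :: n
      REAL*8, INTENT(IN)  :: a(n,n), x(n), b(n)
      REAL*8 :: r(n), rel_err
      r = b - MATMUL(a, x)
      rel_err = MAXVAL(ABS(r)) / MAXVAL(ABS(b))
      PRINT *, 'Relative residual:', rel_err   ! a large value flags a poorly defined model
      END SUBROUTINE residual_check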
You are correct that moving to 64-bit can change the round-off issue, but your example appears to be REAL*4, which is a programmer's error.
Errors in computation are an important issue, although the litigation that sometimes follows can be a very lucrative area for the lucky expert witness.

LitusSaxonicum
Posted: Mon Mar 06, 2017 11:57 pm

John,

It is useful to point out that even in 32-bit mode it is possible to define a single-dimension array bigger than the installed memory.

Yes, lots of laptops do offer 8GB, and desktops sometimes more. Most mainboards have a RAM limit of 32GB, and my point was that you move into a very costly area if you want more: firstly to buy the RAM, and secondly to have a mainboard that supports it.

In the case I cited, the firm of engineers did not appreciate why there was a difference in the results, and my guess is that the software company didn't really understand it properly either. The differences weren't in significant figures, as the algorithm works adequately at slide-rule accuracy if you have unlimited time, but they were enough to create an issue that exercised the minds of highly paid lawyers, and lots of them.

Incidentally, x87 arithmetic is only 80-bit provided you don't put intermediate results back into RAM in REAL*8 variables: it's a hybrid. The point is that there is the potential for an FTN95 program to produce different results in 32-bit x87 mode and 64-bit SSE mode, when really the only difference should be how big a problem you can run.
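A sketch of what I mean, although I hedge it heavily: whether the intermediate actually stays in an 80-bit register depends on the compiler and the optimisation level, so treat this as illustrative only:

Code:
      REAL*8 :: a, b, c, t, y1, y2
      a = 1.0D0 + 1.0D-9
      b = 1.0D0 - 1.0D-9
      c = -1.0D0
      y1 = a*b + c      ! x87 may carry a*b at 80 bits before adding c
      t  = a*b          ! storing t forces rounding to 64-bit REAL*8
      y2 = t + c        ! so y2 can differ from y1 in the last bits
      PRINT *, y1, y2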

I have a mental model of when round-off becomes important that isn't mathematically pure but helps: whether or not the intermediate calculations have a physical meaning. Partway through inverting a matrix you can't say what the numbers are in physical units, so there round-off might be critical; but if you can positively assert that such-and-such a calculation yields a force or a moment, round-off probably isn't so significant.

REAL*4 still has a use, and in your structural engineering field it would be a mighty structure that didn't permit you to input a millimetre-accurate coordinate for a bolt hole or a member end.
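That claim is easy to check with the standard SPACING intrinsic (the 1 km figure is my own example): the gap between adjacent REAL*4 values near 1000.0 is about 6E-5, i.e. well under a millimetre for coordinates held in metres:

Code:
      PROGRAM real4_resolution
      REAL*4 :: coord
      coord = 1000.0                 ! a coordinate of 1 km, held in metres
      PRINT *, 'Resolution near 1 km:', SPACING(coord), 'm'
      ! prints roughly 6.1E-05, i.e. about 0.06 mm
      END PROGRAM real4_resolution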

It's even better with REAL*6, which is essentially what I started out on, with a real formed from two 24-bit words; but there are probably benefits associated with modern architectures. Perhaps.

Eddie

PS (7th March)
The 8GB laptop illustrates one of my points even better than I realised: it is not possible to have more than three arrays bigger than 2GB in that RAM. I suspect that your typical analysis has only one - the stiffness matrix. The point being that the facility to address everything in 64-bit mode seems to me rather pointless.
E

JohnCampbell
Posted: Thu Mar 09, 2017 3:02 am

Eddie,

Without /64, the largest single array you can define is less than 2GB. With /64, it is best that the total memory usage stays below what is installed; 8GB of installed memory provides for a 7GB array and works very well.
I have done a lot of /64 development on an 8GB PC.
I sometimes forget which PC I am using and run a problem requiring more than 8GB of allocated arrays, which is not easy to recover from: when Windows goes to virtual memory it is not a graceful exit; it just hangs for a long time.
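One partial defence is a STAT= check on the allocation (a sketch with sizes of my own choosing; STAT= catches an outright failure, though it cannot stop Windows paging once an allocation technically succeeds):

Code:
      REAL*8, ALLOCATABLE :: big(:)
      INTEGER   :: ierr
      INTEGER*8 :: n
      n = 1000000000                  ! 1E9 REAL*8 elements, about 8GB
      ALLOCATE (big(n), STAT=ierr)
      IF (ierr /= 0) THEN
        PRINT *, 'Not enough memory - stopping cleanly'
        STOP
      END IF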

John

LitusSaxonicum
Posted: Thu Mar 09, 2017 9:50 am

John,

I'm not knocking 8GB. I've done very useful work on a whole lot less, and I suspect that you have too. The ultimate point is that one rarely needs a lot of arrays bigger than 1GB, 2GB or 4GB (take your pick), and therefore a compiler that allows everything to be addressed via 64-bit is somewhat nonsensical. Moreover, to accommodate even more than 16 such large (2GB) structures, one moves into a different arena of cost, and even then the number only enlarges to 64 (32 at 4GB, 16 at 8GB, and so on). If one such structure is, say, 16GB or more on its own, the number of possible other structures drops dramatically.

What Silverfrost do is up to them, taking into account their commercial position, but it strikes me that writing a whole new compiler is more work than is strictly necessary, compared with implementing 64-bit addressing for a small number of data structures (again, defining "small" as 1, 2, 16, 128 or whatever you choose). We can already see that there are incompatibilities between the FTN95 64-bit and 32-bit versions, new bugs and so on. On top of that there is the issue of different round-off, possibly leading to differences in results, caused by implementing a different maths approach.

Implementing SSE in one compiler and x87 in the other seems to me an unnecessary (and rather undesirable) additional difference, although if Silverfrost want to do it, then it is their business.

It's a while since I last programmed and ran FE analyses, and getting more elements and nodes into 16MB was a revelation. I found that there was a point where increasing mesh refinement did not improve the results, as one simply got closer and closer to singular points (I was doing Laplacian work) and produced more nonsense than value.

E

JohnCampbell
Posted: Thu Mar 09, 2017 10:49 am

Eddie wrote:
increasing mesh refinement did not improve results, as one simply got closer and closer to singular points


This is a very interesting point, especially regarding FE analysis. What you find for some problems is that as you refine the mesh near a stress concentration you get ever higher localised stress (e.g. at a fillet weld, which solid-modelling packages are good at resolving). However, the rules in the design standards do not cope with this very localised stress, so designing to the standard is not a straightforward matter.

Eddie, I still use FTN95 /32 for a quick 100-line program, but anything larger is /64. (For quick small programs, it probably depends on the last setting I had in Plato.)

mecej4
Posted: Thu Mar 09, 2017 1:34 pm

There is one important feature of x64 that makes code generation more complicated than simply widening addresses from 32 to 64 bits: the dominance of IP-relative addressing, for code as well as data. A more rudimentary version of this was present as the "short" jumps in x86 code. The result of IP-relative addressing is that the addresses stored in the code may be 4 bytes or fewer. On Linux, it is a requirement that code intended to be put into a shared library be PIC (position-independent code).

The transition of FTN95 to x64 has to take note of, and capitalise on, this feature.

See http://www.patents.com/us-6732258.html and http://www.nynaeve.net/?p=192 .