Silverfrost Forums

64-bit FTN95

27 Oct 2014 8:30 #14948

In order to allow ClearWin+ users to run their code in 64-bit mode, we developed a 64-bit ClearWin+ DLL that can be used with other Fortran compilers. This is included in FTN95 today, and some users have made use of it to run their code in the CPU's native 64-bit mode.

The 64-bit ClearWin+ DLL is a stopgap solution until we complete the work on our own 64-bit compiler. We plan to complete this work in 2015 and deliver a full 64-bit Fortran compiler.


-- Admin Silverfrost Limited
27 Oct 2014 8:52 #14949

Wow, that is great, great, great news! Please do not lose your pace on fixing possible bugs in the current compiler! I will be glad to pay in advance and beta test.

27 Oct 2014 11:01 #14951

This is very good news indeed!

27 Oct 2014 4:34 #14953

Hopefully it is not too late and you have not lost too many customers to Intel in the meantime!

Back in March 2012, other customers and I asked you to hurry up the development of a 64-bit compiler (see the topic 'Again: 64 bit Compiler' in this forum).

I hope this version will be available in early 2015. It is needed urgently!

Regards, Detlef Pannhorst

28 Oct 2014 12:21 #14954

This is good news and I look forward to the updates. Should you need any testing, I would be pleased to help. I have a few large-memory test programs, mainly related to numerical calculations.

64-bit is basically 32-bit plus ALLOCATE.

One of the important changes that has accompanied 64-bit has been vector instructions. I use the SSE routines with FTN95, as these have allowed me to maintain performance in comparison with ifort and gFortran.

The convenience of maintaining existing ClearWin+ code in a 64-bit environment will be a great improvement.

John

28 Oct 2014 12:39 (Edited: 28 Oct 2014 4:41) #14955

Quoted from JohnCampbell

64-bit is basically 32-bit plus ALLOCATE.

This is exactly the rudiment of the Intel ifort design that I want the future 64-bit compiler to avoid instead of slavishly copying. Both static and dynamic allocations must have no hard limit. In a few years, maybe a decade, the 2GB static limit will be as archaic as 640K. And please, no damn stack limitation either. If a user has 8GB of RAM and writes

real*8 AAA(1000,1000,1000)
AAA(:,:,:) = random()
print*, AAA(1000,1000,1000)
end

the code should work. And if he has 8TB then

real*8 AAA(1000,1000,1000,1000)

should work without any adjustments too!

It would also be good to be compatible with Intel compiler LIB files and with OpenMP, if that will not delay the design of the new compiler.

I do not know for certain, but it probably makes sense to revive lazy allocation with the Virtual Common option, since we are just at the beginning of a potential 4-billion-fold increase in memory, which will take 50-60 years to swallow.

28 Oct 2014 10:15 #14956

Dan,

With 64-bit code, the 2GB code restriction is due to the .exe format. It is not a limit that will impact us. Using ALLOCATE in MODULEs is an easy change to get used to. It encourages more flexibility in problem definition. I typically have just a few large arrays, and these cause all the problems.

An area where FTN95 will need to pay more attention is KIND: pointers and address references will need to change to INTEGER*8. LOC will change, and I am not sure how the CORE routines will change. It may be worth reviewing some of the F2003 and F2008 changes. Adopting ISO_FORTRAN_ENV (F2008) and ISO_C_BINDING could be a useful way to future-proof the changes required for FTN95 /64. I am not suggesting developing lots of the new features of F2008, but some of the changes do show a way of supporting both 32-bit and 64-bit in a more systematic way. I am sure we all have a wish list of new Fortran features, although getting FTN95 /64 available should be the first priority. (There are a few intrinsic functions I'd like available!)
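
To illustrate the KIND point, here is a sketch I put together (it assumes the widely supported LOC extension and the standard ISO_C_BINDING module; it is not FTN95's actual /64 interface): an address held in a fixed INTEGER*4 truncates on a 64-bit build, whereas INTEGER(C_INTPTR_T) sizes itself to the platform pointer width, so the same source works for both.

program address_kind
   ! C_INTPTR_T is 4 bytes in a 32-bit build and 8 bytes in a 64-bit
   ! build, so this declaration needs no change under /64.
   use iso_c_binding, only : c_intptr_t
   implicit none
   real*8, target :: a(10)
   integer(c_intptr_t) :: addr
   addr = loc(a)       ! an INTEGER*4 target here would truncate in 64-bit
   print *, 'address of a =', addr
end program address_kind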

John

29 Oct 2014 8:00 #14957

Thanks John. Perhaps you could remind us of some of these issues after the initial trials. Others hopefully will have already been addressed.

29 Oct 2014 3:18 #14958

John, I currently use exactly that approach and it generally works OK. But that is because we live at a time when 2GB is considered a lot. It will be nothing as soon as next year, once you start driving the 64-bit compiler. By the way, I still do not like messing with allocation/deallocation. And when your code keeps growing and growing, at some point the entire static allocation area will be consumed by the code itself, not just the arrays.

So who hardcoded the 2GB limit: is it ifort or Microsoft? If it is hardcoded in the compiler, we will end up with the same '640K is good for everyone'. Imagine how ridiculous that looks today. If it is Microsoft, then I hope they will remove it in the future. And if it is due to Intel ifort restrictions that everyone now follows, then my suggestion is that the new Silverfrost 64-bit compiler should definitely avoid it: do not hardcode the 2GB limit and, of course, remove all the stack limits of the former compiler (200MB or so) which caused so much headache in the past.

29 Oct 2014 10:43 #14960

Dan,

My impression is that the 2GB limit is a Microsoft limit, due to the nature of the .exe format. There are two areas where the 2GB limit applies.

  1. The 2GB limit on static variable allocation can be overcome by using ALLOCATE. This means that COMMON must be migrated to MODULE with ALLOCATE for larger arrays (see the sketch after this list). I find this approach better and don't use COMMON in new code. The big change here is that EQUIVALENCE is no longer available, which has meant some changes for old non-standard code. The TRANSFER intrinsic can overcome some of these problems. I still get lots of mixed-mode and mixed-KIND error reports in subroutine calls as I change the types of variables. I am gradually moving away from my old Fortran 66 record structures, but they all still work well.

  2. The 2GB limit on code is a limit I don't expect to have any trouble with, as my code is nowhere near that size. I doubt you will have any trouble with it either.
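
Here is a minimal sketch of the migration in point 1 (illustrative names only, not production code): the fixed-size COMMON array becomes an allocatable module array sized at run time, and TRANSFER stands in for an old EQUIVALENCE-style bit reinterpretation.

! Old style, which counts against the 2GB static limit:
!      common /work/ a(250000000)
module work_mod
   real*8, allocatable :: a(:)
end module work_mod

subroutine init_work (n)
   use work_mod
   integer*8 n
   allocate ( a(n) )     ! heap allocation: the 2GB static limit no longer applies
end subroutine init_work

subroutine bits_demo
   ! TRANSFER replacing EQUIVALENCE (ib,rb): copies the bit pattern
   integer*4 ib
   real*4    rb
   rb = 1.0
   ib = transfer (rb, ib)
end subroutine bits_demo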

One of the interesting changes with 64-bit comes when you use more virtual memory than there is installed memory: you go back to using paged memory. Everything stops! It's horrendous! This is how we ran programs all the time in the '70s and '80s, and we thought it was working well. It shows how quickly we forget and only remember the good times.

64-bit is not an excuse to be lazy, as poor memory and cache usage carry a huge performance penalty, as we saw with the equation solver examples last year. The virtual-memory programming strategies are still very important.

John

30 Oct 2014 8:21 (Edited: 30 Oct 2014 8:33) #14961

John,

ALLOCATE is a somewhat different concept, and it is obvious that static allocation, though less capable, will never die unless the 2GB restriction kills it. It is obvious that array sizes will grow and grow, by 2^32 or 4 billion times, until the 64-bit range is exhausted. Even a tiny fly has a memory size way larger than that 😃 So why kill static allocation with a stupid 2GB restriction? It is convenient and in many cases completely sufficient for the code. Plus there are a lot of legacy programs which use it, and such programs will be used for a century at least. One of the arguments against common blocks was that they are potentially susceptible to bugs, but exactly this compiler, with its superb diagnostics, made that a complete non-issue.

Static allocation has one potentially killer application, which this compiler's developers built for FTN77 and which becomes extremely relevant again with 64 bits: virtual common. If your matrix is sparse, you declare arrays of whatever size you want, up to the gazillions of exabytes the 64-bit limit allows, and the compiler allocates only those memory cells which are actually used. That allows very convenient, transparent, simple solution methods, where you write easily maintainable and modifiable code with fewer bugs. You cannot do that easily with ALLOCATE; that requires completely different, much trickier methods, and sometimes there is no workaround and you end up buying tons of memory to run the code.
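
Just to illustrate the idea, here is a rough sketch of how a similar lazy-commit effect can be had at OS level on 64-bit Windows (my own sketch using the Win32 VirtualAlloc call through ISO_C_BINDING; it is not FTN77's actual /VC mechanism, which committed pages from a page-fault handler). Windows attaches physical pages to a committed range only when a cell is first touched, although the full commit charge must still be backed by RAM plus pagefile.

program lazy_pages
   ! Sketch: a 16GB 'virtual' array of which only the touched pages
   ! ever consume physical memory.
   use iso_c_binding
   implicit none
   interface
      function VirtualAlloc (addr, nbytes, alloc_type, protect) &
               bind(C, name='VirtualAlloc')
         use iso_c_binding
         type(c_ptr),       value :: addr
         integer(c_size_t), value :: nbytes
         integer(c_int),    value :: alloc_type, protect
         type(c_ptr)              :: VirtualAlloc
      end function VirtualAlloc
   end interface
   integer(c_int),    parameter :: MEM_COMMIT = 4096, MEM_RESERVE = 8192
   integer(c_int),    parameter :: PAGE_READWRITE = 4
   integer(c_size_t), parameter :: N = 2000000000_c_size_t   ! 2e9 real*8 = 16GB
   real*8, pointer :: big(:)
   type(c_ptr)     :: p
   p = VirtualAlloc (c_null_ptr, 8_c_size_t*N, &
                     ior(MEM_RESERVE, MEM_COMMIT), PAGE_READWRITE)
   if ( .not. c_associated(p) ) stop 'commit failed'
   call c_f_pointer (p, big, [ N ])
   big(1) = 1d0                    ! first page touched: one page of RAM
   big(N) = 2d0                    ! last page touched: one more page
   print *, big(1), big(N)         ! everything in between stays untouched
end program lazy_pages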

Is virtual common on the supported feature list for the 64-bit compiler?

BTW, if Microsoft made this 2GB restriction, then it will be lifted automatically once there are no more 32-bit OSes, and we have nothing to argue about here. I just do not like compiler developers imposing these restrictions themselves.

30 Oct 2014 8:29 #14962

Dan. This is a good question and one that I don't have an immediate answer for.

May I suggest that these questions be raised after the initial beta release so that we can focus on getting that done as soon as possible.

30 Oct 2014 8:51 #14963

Paul, I do not know which path you guys are pursuing in the 64-bit compiler development: is it the same FTN95 with only a workaround from 32 to 64 bits, or a complete redesign from scratch? In the latter case it is important to plan new features very early in the development. I am afraid that if the 64-bit compiler misses such trendy things as parallelization with OpenMP or CUDA, or ifort compatibility, it will be much less used by mainstream users.

30 Oct 2014 9:18 (Edited: 30 Oct 2014 10:42) #14964

Dan,

'declare arrays of whatever size you want, up to the gazillions of exabytes the 64-bit limit allows'

What you are describing is a very large array which would take too long to analyse. I have recently solved a surveying problem for a 75km long x 500 metre wide navigation channel which covers an area of 60km x 50km, with results reduced to 2-metre centres. There are about 60 bytes of information for each point: 750 million virtual points, but only 9 million active points. By mapping the active points to a virtual 2D page system, I reduce the storage to about 1% and, importantly, the calculation time by a similar factor. Sparse matrix approaches still have an important role in large problems, where often the performance savings become even greater. The luxury you are describing is not what you need!!
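
As a sketch of that page-mapping idea (my illustrative code, not the actual surveying program): the grid is split into fixed-size pages, and a page is only ALLOCATEd when one of its points is first written, so storage grows with the 9 million active points rather than the 750 million virtual ones.

module paged_grid
   implicit none
   integer, parameter :: PAGE = 256            ! page edge, in grid points
   type page_t
      real*8, allocatable :: cell(:,:)         ! allocated on first write
   end type page_t
contains
   subroutine set_point (pages, i, j, val)
      type(page_t) pages(:,:)                  ! the (small) page table
      integer i, j, pi, pj
      real*8 val
      pi = (i-1)/PAGE + 1                      ! which page holds (i,j)
      pj = (j-1)/PAGE + 1
      if ( .not. allocated(pages(pi,pj)%cell) ) then
         allocate ( pages(pi,pj)%cell(PAGE,PAGE) )
         pages(pi,pj)%cell = 0
      end if
      pages(pi,pj)%cell(mod(i-1,PAGE)+1, mod(j-1,PAGE)+1) = val
   end subroutine set_point
end module paged_grid

The page table itself is tiny (one derived-type element per page), so only the roughly 1% of pages that contain active points are ever allocated.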

John

30 Oct 2014 10:40 #14965

What is too long to analyse? It is the computer that analyses 😃 The matrices are almost empty, by the way. Parallelization exists too. In the end, the computer has to tell us just one single number, or the answer Yes or No.

I have to use sparsity or not even try the tasks I run. The /VC compiler switch of FTN77 allowed me, in 1990, to run a 2GB problem suitable only for supercomputers on an Intel 386 PC with 1-4 MB of RAM, which was a lot at that time. Having 1000x more RAM in a PC was unimaginable then. Having a billion times more memory is unimaginable right now too. But with a 64-bit compiler with virtual common we can repeat a similar jump into the very distant future TODAY.

So using sparsity is important. My life would be easier if Fortran offered an option to declare arrays Arr(i,j,k), say Arr(10000,10000,100), which nominally contain a huge 10 billion elements, as variable-bandwidth arrays, with all the savings associated with that. That is, to accommodate the actual number of elements of the real task.

For example, the slab at k=1 is Arr(10000,10000,1), the slab at k=2 is (10000,10000,2), the slab at k=3 is (10000,10000,3), the slab at k=4 is only (10,10,4), the slab at k=5 is only (1,1,5), the slab at k=6 is larger, (1000,1000,6), the slab at k=7 is also (1000,1000,7), and so on. The actual arrays I use have 4 dimensions, and ideally, for convenience, would have 5 if 32 bits allowed me such luxury. The 32-bit limit has not allowed me to run comfortably literally any task I have tried in the last decade or two.

Instead I have to declare an array of full size Arr(10000,10000,1:3) and another Arr2(1000,1000,4:100) to fit all the numbers, which breaks the homogeneity of the code, makes it far more complex, and produces tons of errors. And all these arrays are still probably 99% empty.

30 Oct 2014 11:26 #14966

Dan,

You could try something like the following, although I got a stack overflow when copies of test(k)%arr were placed on the stack for the calls to report_size. This might be a problem for more general use of these arrays.

module test_mod
   type dan_array
     real*4, allocatable, dimension(:,:,:) :: arr
   end type dan_array
!
   type (dan_array) test(7)
!
   contains 

   subroutine report_size ( k, arr)
    integer*4 k
    real*4, dimension(:,:,:) :: arr
    write (*,10) ' Element ',k,' size=', size (arr), ' b1=',ubound(arr,1), ' b2=',ubound(arr,2), ' b3=',ubound(arr,3)
10  format (a,i0,a,b'z,zzz,zzz,zz#',3(a,b'zzz,zz#') )
   end subroutine report_size

end module test_mod

use test_mod
!
   integer k, sum_size
!
   allocate ( test(1)%arr(1000,1000,1) )
   allocate ( test(2)%arr(1000,1000,2) )
   allocate ( test(3)%arr(100,100,3) )
   allocate ( test(4)%arr(10,10,4) )
   allocate ( test(5)%arr(1,1,5) )
   allocate ( test(6)%arr(10000,1000,6) )
   allocate ( test(7)%arr(1000,1000,7) )
!
   sum_size = 0
   do k = 1,7
      if ( k/=6)  &
      call report_size ( k, test(k)%arr )   ! call fails as test(k)%arr is copied to stack
      write (*,*) 'Element',k,' size=', size (test(k)%arr), ubound(test(k)%arr,1), ubound(test(k)%arr,2)
      sum_size = sum_size + size (test(k)%arr)
   end do
   write (*,11) sum_size
11 format ('Total size = ',b'z,zzz,zzz,zz#')
!
   end

30 Oct 2014 8:12 #14967

John, thanks for the demo. Some time back, like 10 years ago, such Fortran 90 tricks did not work well, so I am very careful about adopting them. What do you mean by placed on the stack causing overflow? The same code but without ALLOCATE? Can you please also check whether it overflows with ifort/gFortran, since you have tried many different Fortran compilers recently? I may use this code with the future FTNpro; it may save me on RAM 😃

30 Oct 2014 9:46 #14968

The following change reports the memory address of test(k)%arr, indicating that a duplicate is being provided. gFortran reports the same address, indicating that this stack copy is not required.

module test_mod
   type dan_array
     real*4, allocatable, dimension(:,:,:) :: arr
   end type dan_array
!
   type (dan_array) test(7)
!
   contains

   subroutine report_size ( k, arr)
    integer*4 k
    real*4, dimension(:,:,:) :: arr
    write (*,10) ' Element ',k,' size=', size (arr), ' b1=',ubound(arr,1), ' b2=',ubound(arr,2), ' b3=',ubound(arr,3)
    write (*,12) ' Start address= ',loc(arr(1,1,1)), loc(arr)
10  format (a,i0,a,b'z,zzz,zzz,zz#',3(a,b'zzz,zz#') )
12  format (a,2(b'z,zzz,zzz,zz#'))
   end subroutine report_size

end module test_mod

use test_mod
!
   integer k, sum_size
!
   allocate ( test(1)%arr(1000,1000,1) )
   allocate ( test(2)%arr(1000,1000,2) )
   allocate ( test(3)%arr(100,100,3) )
   allocate ( test(4)%arr(10,10,4) )
   allocate ( test(5)%arr(1,1,5) )
   allocate ( test(6)%arr(10000,1000,6) )
   allocate ( test(7)%arr(1000,1000,7) )
!
   sum_size = 0
   do k = 1,7
      if ( k/=6)  &
      call report_size ( k, test(k)%arr )   ! call fails as test(k)%arr is copied to stack
      write (*,12) ' Start address= ',loc(test(k)%arr(1,1,1))
      write (*,10) ' Element ',k,' size=', size (test(k)%arr), ' b1=',ubound(test(k)%arr,1), ' b2=',ubound(test(k)%arr,2)
      sum_size = sum_size + size (test(k)%arr)
   end do
   write (*,11) sum_size
10 format (a,i0,a,b'z,zzz,zzz,zz#',3(a,b'zzz,zz#') )
11 format ('Total size = ',b'z,zzz,zzz,zz#')
12  format (a,2(b'z,zzz,zzz,zz#'))
!
   end

After all these years of using FTN95, I still don't know how to change the stack size. Can someone give an example of changing the stack? I compiled and linked with: ftn95 dan_array /link

31 Oct 2014 5:37 (Edited: 1 Nov 2014 6:49) #14969

Your intuition was right. That was easy: just remove the workaround condition for k=6 and link. (The minimum of /stack:240000000, which is exactly the size of the largest, k=6, array, is enough with the /nocheck option. With /debug /undef it has to be larger.)

ftn95 arr.f95
slink arr.obj /stack:1000000000 /3gb

Why does this large array go onto the stack with /3gb, by the way? I hope that in the future 64-bit compiler there will be no damn stack at all 😃

1 Nov 2014 3:15 #14970

Dan,

I presume the stack value you gave is decimal. I could not find documentation that confirms this, as the example I found in Win32 platform > Using the linker > Reference > Interactive mode only gives the examples in hex. Do you know where the hex, octal, and decimal syntax is described? I think I once suggested that 240m or 240000k should be supported instead of 240000000, as I get lost in all the zeros.

Anyway, this does give an example of how to manage variable sizes, and moving to 64-bit will not mean that we can ignore memory usage. As I have described previously, when you run out of physical memory, everything appears to stop. When it first happens, you think it is a blue-screen system crash. I shut down the PC and then had to check all the disks!

I've found that moving from 2GB to 4GB, then 8GB, then 16GB does not dramatically change the types of problems you can solve, although it does make things a bit easier and a bit faster.

John
