Topic: Allocate/Deallocate in General

davidb

Posts: 555 UK

Back to Top

9 Apr 2013 3:57 #12006

Quoted from JohnCampbell

The conclusion from this example is when you deallocate a large array, the memory it used is released from this process and can't be later used.

A Fortran Compiler doesn't have to carry out 'Garbage Collection' (google it) that would be needed for DEALLOCATE to release memory back to the heap (for future ALLOCATE or STACK use). Instead, its allowed to leave the memory unaccessible, and let the Operating System tidy up when the program run finishes.

The only Fortran compiler I know that does 'Garbage Collection' is NAG, which provides this as an option, though it increases run times when its enabled.

JohnCampbell

Posts: 2526 Sydney

Back to Top

9 Apr 2013 9:39 #12011

There appears to be two types of allocate.

small arrays (that fit) might be allocated on the heap/stack. (They certainly get addresses inside the existing memory footprint.)
Larger arrays are allocated elsewhere in available memory. For large arrays that do not fit on the 'heap', ALLOCATE still finds memory for them.

Certainly for the arrays that are allocated above address 2gb, these extend the memory footprint of the program. When they are released, the memory footprint changes. This can be observed when running task manager.

It appears to be that when these large arrays are released, DEALLOCATE initiates a change in the memory allocation, which can be observed in task manager. There must be something manageing this 'Garbage Collection'.

This is my present understanding of why I am getting the access violation after releasing large size arrays.

There is a similar concept with MODULE, where large arrays in a module are given their own address in the .map report. I'm not sure if they can be located from the main part of the module, or just identified in the ,map report.

Certainly FTN95's implementation of ALLOCATE and the ability to use up to 4gb of memory is a useful addition, (now with /debug).

It would be good if LOC and the address that CORE4 etc use could be integer8 or unsigned integer4, so that the addresses above 2gb could be more easily managed. I'm not sure how this could be improved. I shall expand on my use of q_new in the last post to see how this is being managed at present. (Could we have a BYTE type, say BYTE4. BYTE1 is also a useful option for colours. They could support == < > for comparison, + and - for abs value or mod of result)

John

JohnCampbell

Posts: 2526 Sydney

Back to Top

10 Apr 2013 3:43 #12014

The following example demonstrates the progressive allocation and use of large allocated arrays, then progressive release. I am using Win 7_64 OS.

You can track the memory usage in task manager. I have selected a number of memory usage measures, including: Working Set (memory) Peak Working Set (memory) Memory (Private Working Set) Commit Size PF Delta

This shows that memory is being allocated and then released. Resize also works, if the temporary array can be allocated.

      MODULE ARRAY_STORAGE 
      integer*4, parameter :: million = 1000000 
      integer*4 :: num_a = 500*million    ! 1.9gb 
      integer*4 :: num_i = 12000   !  sqrt(150*million)    ! 0.6gb 
!
!     REAL*8,    ALLOCATABLE :: ARRAY(:) 
      integer*4, ALLOCATABLE :: ARRAY(:) 
      real*4,    ALLOCATABLE :: a(:,:), b(:,:), c(:,:), d(:,:)
      integer*4 :: test_count = 0 
!
      CONTAINS

       subroutine sub_report (name, aa)
       character name*(*)
       integer*4 aa(:), i
       integer*8 j
!
       write (*,*) ' ARRAY ',name,' at ',loc(aa),' size ',size(aa)
       do i = 1,10
         write (*,'(i5,i8)') i, aa(i)
       end do
       i = loc(aa)  ; write (*,*) 'loc(aa) ', i
       i = core4(i) ; write (*,*) 'core4(i)', i
       j = loc(aa)  ; write (*,*) 'loc(aa) ', j
       j = core4(j) ; write (*,*) 'core4(j)', j
        if (j < 0) j = j + 2_4**32
       write (*,*) ' '
       end subroutine sub_report
!
       subroutine sub_report2 (name, aa)
       character name*(*)
       real*4 aa(:,:)
       integer*4 i
       integer*8 jloc, j
       external  jloc
!
       write (*,*) ' ARRAY ',name,' at ',loc(aa),' size ',size(aa)
       do i = 1,10
         write (*,'(i5,f12.2)') i, aa(i,1)
       end do
       i = loc(aa)  ; write (*,*) 'loc(aa) ', i
       i = core4(i) ; write (*,*) 'core4(i)', i
       j = jloc(aa) ; write (*,*) 'jloc(aa)', j
       j = core4(j) ; write (*,*) 'core4(j)', j
        if (j < 0) j = j + 2_4**32
       write (*,*) ' '
       end subroutine sub_report2
!
      END MODULE ARRAY_STORAGE 
!-------------------------------------------------------------------- 
      PROGRAM MAIN 
      USE ARRAY_STORAGE 
! 
         call test_allocate 
      call sub_allocate (0)
         call test_allocate 
      call sub_set 
         call test_allocate 
      call sub_print 
         call test_allocate 
!
      call sub_resize (num_a/2)
!      call sub_resize_big (num_a/2)
         call test_allocate 
!
      call sub_print 
      call sub_deallocate 
         call test_allocate 
! 
      END MAIN 
!-------------------------------------------------------------------- 

      subroutine sub_allocate (pass)
      USE ARRAY_STORAGE 
      integer*4 pass, stat 
      integer*8 jloc
      external  jloc
!
!  test of 4 smaller arrays
    if (pass == 0) then
!
      allocate (a(num_i,num_i),stat=stat)
         write (*,11) ' a allocated as size ', size(a)*4./(2.**20),' mb at ',jloc(a), stat 
       call sub_use (a, num_i)
       call sub_report2 ('a', a)
!
      allocate (b(num_i,num_i),stat=stat)
         write (*,11) ' b allocated as size ', size(b)*4./(2.**20),' mb at ',jloc(b), stat 
       call sub_use (b, num_i)
       call sub_report2 ('b', b)
!

JohnCampbell

Posts: 2526 Sydney

Back to Top

10 Apr 2013 3:45 #12015

!
      allocate (c(num_i,num_i),stat=stat)
         write (*,11) ' c allocated as size ', size(c)*4./(2.**20),' mb at ',jloc(c), stat 
       call sub_use (c, num_i)
       call sub_report2 ('c', c)
!
      allocate (d(num_i,num_i),stat=stat)
         write (*,11) ' d allocated as size ', size(d)*4./(2.**20),' mb at ',jloc(d), stat 
       call sub_use (d, num_i)
       call sub_report2 ('d', d)
!
      deallocate (d)
       call sub_use (c, num_i)
      deallocate (c)
       call sub_use (b, num_i)
      deallocate (b)
       call sub_use (a, num_i)
      deallocate (a)
!
    end if
!
!  Now main array
      ALLOCATE (ARRAY(num_a),stat=stat) 
      if (stat/=0) then 
         write (*,*) 'problem allocating array : stat=',stat 
         stop
      else 
         write (*,11) ' array allocated as size ', size(array)*4./(2.**20),' mb at ',loc(array) 
      end if 
11    format (a,f0.3,a,b'--,---,---,--#',i5) 
      end subroutine sub_allocate 

      subroutine sub_resize (num_new)
      USE ARRAY_STORAGE 
      integer*4 num_new
!
      integer*4 n
      integer*4 stat 
      integer*4, ALLOCATABLE :: temp_array(:) 
! 
      write (*,*) 'Resizing array' 
      num_a = size (array)
      n     = min (num_a, num_new)
      allocate (temp_array(n), stat=stat)
      if (stat/=0) then 
         write (*,*) 'problem allocating temp_array : stat=',stat
         stop
      else 
         write (*,11) ' temp_array allocated as size ', size(temp_array)*4./(2.**20),' mb at ', loc(temp_array) 
      end if
11    format (a,f0.3,a,b'--,---,---,--#') 
!
      temp_array(1:n) = array(1:n)
      deallocate (array)
!
      num_a = num_new  
      call sub_allocate (1)
!
      array(1:n)       = temp_array
      array(n+1:num_a) = 0                
      deallocate (temp_array)
!
      end subroutine sub_resize

      SUBROUTINE SUB_set 
      USE ARRAY_STORAGE 
      INTEGER*4 I 
! 
      write (*,*) 'Initialising array' 
      DO I=1,num_a 
        ARRAY(I) = REAL(I,KIND=2) 
      END DO 
      RETURN 
      END SUBROUTINE SUB_set 

      subroutine sub_print 
      USE ARRAY_STORAGE 
      integer*4 i 
! 
      write (*,*) 'First 10 values of array' 
      DO I=1,10 
        WRITE (*,*) ARRAY(I) 
      END DO 
      end subroutine sub_print 

      subroutine sub_deallocate 
      USE ARRAY_STORAGE 
! 
      write (*,*) 'Releasing array' 
      DEALLOCATE (ARRAY) 
      end subroutine sub_deallocate 

      subroutine test_allocate 
      USE ARRAY_STORAGE 
! 
      test_count = test_count+1 
      if (ALLOCATED (ARRAY) ) then 
         write (*,11) 'Test ',test_count,' ARRAY is allocated as ', size(array)*4./(2.**30),' gb at ',loc(array)
      else 
         write (*,11) 'Test ',test_count,' ARRAY is NOT allocated' 
      end if 
11    format (20x,a,i0,a,f0.3,a,b'--,---,---,--#') 
      end subroutine test_allocate 

      subroutine sub_resize_big (num_new)
      USE ARRAY_STORAGE 
      integer*4 num_new
!
      integer*4 i, p_old, p_new, ii
      integer*8 q_old, q_new, num_bits
! 
      write (*,*) 'Resizing array' 
!
      q_old = LOC (array) ; if (q_old < 0) Q_old = q_old + 2_4**32
      p_old = q_old
      write (*,*) 'core4(q_old)', core4(q_old)
!

JohnCampbell

Posts: 2526 Sydney

Back to Top

10 Apr 2013 3:49 #12016

      deallocate (array)
      write (*,*) 'core4(q_old)', core4(q_old)
!
      num_a = num_new  
      call sub_allocate (1)
!
      q_new = LOC (array) ; if (q_new < 0) Q_new = q_new + 2_4**32
      p_new = q_new
      write (*,*) Q_old, Q_new
      write (*,11) 'Moving array from ',Q_old,' to ',Q_new
11    format (a,b'--,---,---,--#',a,b'--,---,---,--#') 
!
      if (q_new == q_old) return
      if (Q_new < Q_old) then
         num_bits = 4
         do i = 1,num_a
            write (*,*) i, q_old, p_old, q_new, p_new
            ii = CORE4(P_old)
            write (*,*) 'ii = CORE4(P_old)', ii
            CORE4(p_new) = ii
            write (*,*) 'CORE4(p_new) = ii'
            CORE4(p_new) = CORE4(P_old)
            write (*,*) 'CORE4(p_new) = CORE4(P_old)'
            q_new = q_new + num_bits
            q_old = q_old + num_bits
            p_new = q_new
            p_old = q_old
         end do
      else
         num_bits = 4
         q_old = q_old + num_a*num_bits
         q_new = q_new + num_a*num_bits
         do i = 1,num_a
            q_new = q_new - num_bits
            q_old = q_old - num_bits
            p_new = q_new
            p_old = q_old
            CORE4(p_new) = CORE4(P_old)
!            DCORE8(p_new) = DCORE8(P_old)
         end do
      end if         
!
      end subroutine sub_resize_big

      subroutine sub_use (a, num)
      integer*4 num
      real*4    a(num,num)
!
      real*4, allocatable, dimension(:) :: b, c
      integer*4 i,j
!
      allocate ( b(num), c(num) )
      write (*,*) 'processing array', loc(b), loc(c)
!
      a = 0
        do i = 1,num
          b(i) = i*3-2
          call random_number (c(i))
        end do
!
      do j = 1,num/500
        do i = 1,num
          a(i,j) = dot_product (b, c) * c(i) / num
        end do
      end do
!
      end

      integer*8 function jloc (a)
      integer*8 j
      integer a
!
      j = loc (a)
      if (j < 0) J = J + 2_4**32
!
      jloc = j
      end

( It would be good if we could attach code examples as text files )

John

christyleomin

Posts: 155

Back to Top

7 Jun 2013 3:15 #12341

Hi John,

I tried compileing this code which you advised using ALLOCATABLE for large arrays.

However, it does not compile and gives em an error stating that:

Warning 1 In a call to JLOC from another procedure, the first argument was of type REAL(KIND=1), it is now INTEGER(KIND=3) C:\Users\localadmin\Desktop\FortranApplication1\FortranApplication1\FreeFormat1.F95 280 Error 2 FUNCTION 'JLOC' called with argument no 1 as a REAL(KIND=1) when a INTEGER(KIND=3) was expected (from SUBROUTINE 'ARRAY_STORAGE!SUB_REPORT2') LINK

I'm very curious to understand your approach...

christyleomin

Posts: 155

Back to Top

7 Jun 2013 5:09 #12344

John,

I solved the problem by declaring 'a' as real4 instead of integer4.

      integer*8 function jloc (a) 
      integer*8 j 
[b]      real*4 a [/b]! 
      j = loc (a) 
      if (j < 0) J = J + 2_4**32 
! 
      jloc = j 
      end

Is it possible to briefly explain what is the basic difference between allocating small and large arrays?Did your code above work without any problems or did you face any 'access violation errors'?

Christy

christyleomin

Posts: 155

Back to Top

8 Jun 2013 11:53 #12346

John, please can you explain what exactly you are trying to do in the routine sub_resize_big (num_new) ?

Gratefully appreciated...

Christy

JohnCampbell

Posts: 2526 Sydney

Back to Top

10 Jun 2013 11:53 #12353

Christy,

In sub_resize_big, I was trying to demonstrate resizing a large array in Win_32, but I think it was not a good solution. I was trying to retain the address of the old large array, to obtain the information from it. However, the problem is that when releasing the large array, the OS releases the memory from this process and so it is no longer available.

If you compile with no options you should get 3gb memory on Win7_64. /debug with a recent version of FTN95 also gives 3gb.

It is a good extension of memory capacity.

John

christyleomin

Posts: 155

Back to Top

10 Jun 2013 1:29 #12358

Thanks John. Ok, but with the help of the temporary array, you ahve achieved the resizing in sub_resize (num_new)

Right?

This does not seem a burden except an (just 'one' through the whole process) additioanl array-Isn't it?

You said:

In sub_resize_big, I was trying to demonstrate resizing a large array in Win_32, but I think it was not a good solution. I was trying to retain the address of the old large array, to obtain the information from it. However, the problem is that when releasing the large array, the OS releases the memory from this process and so it is no longer available.

Ok, that is whay you say it is not a good solution

If you compile with no options you should get 3gb memory on Win7_64. /debug with a recent version of FTN95 also gives 3gb.

I did not get thsi sentence- is this something different from what you said above?

Christy

JohnCampbell

Posts: 2526 Sydney

Back to Top

11 Jun 2013 12:26 #12362

Christy,

Sub_Resize does work, but you need to be able to store both the new and old array at the same time.

Sub_Resize_Big was an attempt to resize an array where the combination of old and new array was too big for the available memory. This approach did not work, for the reasons I described in the previous post. It relates to the memory pool being allocated to this process is removed when you DEALLOCATE the big array, so that memory can not be referenced.

You could always write out the old array to disk, resize and read in to the new smaller array, although this would probably defeat the purpose of the resize.

Being able to use up to 3.7gb of memory with FTN95 on 64-bit operating systems (Win 7_64) is a very good feature of FTN95, if you need more than 2gb of memory, although the maximum single array size is limited to 2gb. This option is not available if you use /check. Prior to FTN95 Ver 6.30, this option was not available when using /debug, but is now available. /debug is a very useful option that I use all the time, as it gives a trace-back if you have an error. ( For compute intensive loops, I try to locate them in a utility file or library, which I compile with /opt.) I would expect that 3.6gb would not be available in mixed language links. If you try the example I provided, compiling with /check, /debug or no compilation options, then you should see different memory available.

John

christyleomin

Posts: 155

Back to Top

11 Jun 2013 12:57 #12368

John,

Thanks for all this.

I'm just putting the conclusions of this whole discussion and it would be useful if you could correct;

If I use versions of FORTRAN older than FTN 95, maximum array size is 2GB (that could be allocated) and the memory availavle is also 2GB.

Hence, if I have an array of size 2GB and I want to resize- at one point of time I would have to store the 2GB array and the temporary array, this is not possible because the memory available is only 2GB (in versions of FORTRAN older than FTN 95).

2)If I use FTN 95, thoough I can allocate the maximum size of array as 2GB, I can use 3.6GB of memory-hence resizing may be possible if the combined size of the tempoarry array and old arary (i.e array to be resized) is less than 3.6GB.That is, in call arrays should not be usign memory grater than 3.6GB.

Am I right?

One question,

In your sub_resize we have an array named 'ARRAY' which is = 1.90 GB, in addition we ahve a temporary array which is 250 million size i.e each element of the array being 4 bytes that is = 0.95 GB.

Together, 0.95 +2 = 2.95 GB > 2GB. I tried this on W7 64 bit Fortarn 90 (not FTN 95) but it does it successfully. Why do you say that in Fortarn 90 we cqannot have an arary usign emmory graeter than 2GB?

Another point:

In your code num_a was = 500*million where million= 1000000

I got access to a Z800 HP computer in friends company where we tried to make num_a = 2000*million and weer successful in allocating as well as setting the array using FORTRAN 90.

Hence,I'm coming to believe that the largest array is a function of the amchine rather than FORTRAN evrsion.Please advise.

JohnCampbell

Posts: 2526 Sydney

Back to Top

12 Jun 2013 12:15 #12369

Christy,

If your other Fortran compiler is 32-bit, then to get in excess of 2gb, it must be implementing the /3gb operating system feature. Win 7_64 allows extending available memory above 2gb and with 64-bit OS there is more free memory between address 2gb and 4gb. Hence you get more than 3gb.

If you are using a 32-bit fortran compiler, I would check that you are not requesting an array larger than 2gb, as some (eg Lahey Ver 5.55) allocates have an integer overflow on the size and do not return an error in STAT= if selecting larger than 2gb. To confirm, you should check STAT= and then SIZE(array) to confirm you have obtained the array size you wanted.

3.6 gb in 2 arrays might not be possible, as that would require 2.0gb + 1.6gb arrays. As free memory is not contiguous, you will find that the second array might not be as big as 1.6 gb. You need to experiment with what you can get. The best strategy is to allocate the big arrays first, so that the smaller arrays don't take the big areas. I would doubt that you can reduce a 2gb array to 1.6gb on any 32-bit fortran compiler, by using a temporary array. Resizing very large arrays is probably not a good idea. Another issue about resizing, and also repeated ALLOCATE/DEALLOCATE is that it can lead to defragmentation of the available free memory pool, which can be another problem that is difficult to overcome.

If your other fortran compiler was a 64-bit compiler, then you can allocate arrays much larger than 2gb. BUT, you can only get arrays larger than 2gb via ALLOCATE and make sure that you do not get integer overflow calculating the size. Use INTEGER*8 to be sure. There is a new intrinsic function SIZEOF which returns the size in bytes. ( It would be good if this was available in FTN95, especially as KIND /= byte_size)

The key points are: FTN95 offers up to about 3.6 gb of memory for allocate arrays, although 2.0gb is the largest size you can get. This depends on what compiler options you use, but since Ver 6.30, it is available with /DEBUG. This can be a useful extension to memory capacity, before requiring a 64-bit compiler.

John

christyleomin

Posts: 155

Back to Top

12 Jun 2013 4:40 #12371

John,

Thanks a lot for all this again.

But FTN 95 is only available in 32 bit versions-isn't it?

I have also been experiencing that with 64 bit Fortran compiler, I have been able to allocate arrays much lalrger than 2GB.Yes-I have been using integer*8 in my experiements/investigations.

I'm investigating this as well- but do you have any idea about what can be the total size of all arrays (say array A, array B, array C,......) that can be allocated using ALLOCATABLE in 64 bit compiler versions? Is it related to the memory available or is it a function of compiler?

Christy

JohnCampbell

Posts: 2526 Sydney

Back to Top

12 Jun 2013 6:02 #12373

Christy,

The total size of arrays in 64-bit is not a straight forward question. I have run programs successfully with up to 24 gb, but the limit would be at least 128 gb, if that is even the limit. What you also need to consider is: All arrays larger than 2gb can only be defined via ALLOCATE. How much physical memory is installed. ( I have 12gb ) What is the size of the paging.sys file. ( mine is 24gb, using a SSD ) What is the limit for my pagefile.sys. Do I want my program to run in only physical memory. ( mostly yes ) Do I know how to program to optimise performance, if I use paging (virtual memory). ( You need to localise the memory usage so that page faults do not explode. ) My rule of thumb is to try and limit my memory size to about 80% of physical memory, allowing something for other proceses.

For Example: The following code is a bad approach for page faults: DO I=1,n DO j=1,n A(i,j) = ... END DO END DO

While the following sequential access is much better: DO j=1,n DO I=1,n A(i,j) = ... END DO END DO

However if you have: DO j=1,n DO I=1,n A(i,j) = B(i,j) + C(j,i) END DO END DO

You need to consider if B or C span more memory. An alternative might be (but not always): DO j=1,n row_c = C(j,:) DO I=1,n A(i,j) = B(i,j) + row_C(i) END DO END DO

These approaches also apply to FTN95_32 applications, where sequential use of memory is always preferred, as they improve the cache efficiency.

However, if it is a big data-set with few runs, it is often easier to code in-memory and try as best as possible to minimise the page faults.

There is a long history of sparse matrix techniques and 3.6gb can be a lot of memory to use.

John

dpannhorst

Posts: 164 Berlin, Germany

Back to Top

12 Jun 2013 1:07 #12377

John,

I think, the last summation could simply be written as (without any loop):

A = B + C

Also the initializations above could be written as:

A = 0. (or any other value)

Hopefully the compiler will use optimized code internally.

Regards,

Detlef

christyleomin

Posts: 155

Back to Top

12 Jun 2013 5:28 #12380

Can I ask something simple

Which is more advantageous in FOrtran

I store 10 arrays of diemnsions A(6,100000) OR
I store a 3D array (10,6,100000)

Thanks a lot

JohnCampbell

Posts: 2526 Sydney

Back to Top

13 Jun 2013 12:43 #12382

Detlef,

The point of my example was that, although A = B + C', having the inner loop where A & B are processed sequentially, but C is not, poses a dilemma as to how best to process this. If the computation was as simple as A = B + C', then providing Row_C might be only a marginal improvement ( each time you generate Row_C this demands a full memory scan of C), but if the computation was more complex within the inner loop, then the temporary row approach could be more significant. This can be applied to matrix multiplication. It all depends on the range of the I, J and K loops. The idea is to provide information for the calculation, so that memory is processed sequentially, or at least locally. { Eg C(1000000,10) could demand a lot of page faults to be processed, while if C’(10,1000000) was available, this would process much faster }

To take Christy, example of array_3d(10,6,100000), if this was processed as : do.i. do.j. do k = 1,100000 zz = fn (array_3D(i,j,k) this would be stepping through memory in steps of 60. Note 'zz = fn (...' is a general statement of processing this information and not just A = B + C

However, if it was: array_3d(100000,10,6), and if this was processed as : do.. do.. do k = 1,6 zz = fn (array_3D(i,j,k) this would be stepping through memory in steps of 1000000 and result in significantly more page faults as it was processed.

Christy, the more advantageous approach could be to process: array_3d(10,6,100000) do k = 1,100000 ! process information for each element k do j= .. do i = .. zz = fn (array_3D(i,j,k)

In this way memory is processed more sequentially, or at least locally.

A factor could be how often this group of loops is processed, as each pass needs the full amount of array_3d memory to be processed. When designing a data structure, the order of the subscripts can affect (paging) performance. However, often we process the data a number of different ways during it’s generation and use. Typically, you generate it once but use it many times, so you have to determine what order is effective most often. When going to virtual memory, what was previously a cacheing inefficiency can become a page fault nightmare. This is an issue for both 32-bit and 64-bit computation.

John