forums.silverfrost.com Welcome to the Silverfrost forums
JohnCampbell
Joined: 16 Feb 2006 Posts: 2621 Location: Sydney
Posted: Tue Jun 11, 2013 1:26 am
Christy,
Sub_Resize does work, but you need to be able to store both the new and old array at the same time.
Sub_Resize_Big was an attempt to resize an array where the combination of the old and new arrays was too big for the available memory. This approach did not work, for the reasons I described in the previous post: the memory pool allocated to this process is removed when you DEALLOCATE the big array, so that memory can no longer be referenced.
You could always write the old array out to disk, resize, and read it back into the new smaller array, although this would probably defeat the purpose of the resize.
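For illustration, a minimal sketch of that disk round-trip (file unit, names and sizes are hypothetical, scaled down so it runs quickly):

program resize_via_disk
   implicit none
   integer, parameter :: old_size = 500000, new_size = 200000
   real, allocatable :: array(:)
   integer :: stat
   allocate (array(old_size), stat=stat)
   if (stat /= 0) stop 'initial ALLOCATE failed'
   array = 1.0
!  save the part to keep, release the big block, then re-allocate smaller
   open  (unit=11, form='unformatted', status='scratch')
   write (11) array(1:new_size)
   deallocate (array)
   allocate (array(new_size), stat=stat)
   if (stat /= 0) stop 're-ALLOCATE failed'
   rewind (11)
   read  (11) array
   close (11)
   print *, 'resized to', size(array), 'elements'
end program resize_via_disk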
Being able to use up to 3.7gb of memory with FTN95 on a 64-bit operating system (Win 7_64) is a very good feature of FTN95 if you need more than 2gb of memory, although the maximum size of a single array is limited to 2gb.
This option is not available if you use /check.
Prior to FTN95 Ver 6.30 this option was not available when using /debug, but it is now.
/debug is a very useful option that I use all the time, as it gives a trace-back if you have an error. (For compute-intensive loops, I try to locate them in a utility file or library, which I compile with /opt.)
I would expect that 3.6gb would not be available in mixed-language links.
If you try the example I provided, compiling with /check, /debug or no compilation options, you should see different amounts of memory available.
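For example, something like these command lines (a sketch; test.f95 is a placeholder file name):

ftn95 test.f95 /link            (no checking options: the larger memory pool is available)
ftn95 test.f95 /debug /link     (available with /debug since Ver 6.30)
ftn95 test.f95 /check /link     (with /check the large-memory option is unavailable)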
John
christyleomin
Joined: 08 Apr 2011 Posts: 155
Posted: Tue Jun 11, 2013 1:57 pm
John,
Thanks for all this.
I'm just putting down the conclusions of this whole discussion; it would be useful if you could correct them:
1) If I use Fortran compilers older than FTN95, the maximum array size (that could be allocated) is 2GB, and the memory available is also 2GB.
Hence, if I have an array of size 2GB and I want to resize it, at one point in time I would have to store both the 2GB array and the temporary array. This is not possible, because the memory available is only 2GB (in compilers older than FTN95).
2) If I use FTN95, though the maximum array size I can allocate is 2GB, I can use 3.6GB of memory. Hence resizing may be possible if the combined size of the temporary array and the old array (i.e. the array to be resized) is less than 3.6GB. That is, in all, the arrays should not be using more than 3.6GB of memory.
Am I right?
One question:
3) In your Sub_Resize we have an array named "ARRAY" which is 1.90 GB; in addition we have a temporary array of 250 million elements, each element being 4 bytes, i.e. 0.95 GB.
Together, 1.90 + 0.95 = 2.85 GB > 2 GB. I tried this on W7 64-bit with Fortran 90 (not FTN95) and it succeeds. Why do you say that in Fortran 90 we cannot have arrays using more than 2GB of memory?
Another point:
In your code num_a was 500*million, where million = 1000000.
I got access to an HP Z800 computer at a friend's company, where we tried num_a = 2000*million and were successful in allocating as well as setting the array using Fortran 90.
Hence I'm coming to believe that the largest array is a function of the machine rather than the Fortran version. Please advise.
JohnCampbell
Joined: 16 Feb 2006 Posts: 2621 Location: Sydney
Posted: Wed Jun 12, 2013 1:15 am
Christy,
If your other Fortran compiler is 32-bit, then to get in excess of 2gb it must be using the /3gb operating system feature. Win 7_64 allows available memory to be extended above 2gb, and with a 64-bit OS there is more free memory between addresses 2gb and 4gb; hence you get more than 3gb.
If you are using a 32-bit Fortran compiler, I would check that you are not requesting an array larger than 2gb, as some compilers (eg Lahey Ver 5.55) have an integer overflow on the allocation size and do not return an error in STAT= when you request more than 2gb.
To confirm, you should check STAT= and then SIZE(array) to verify that you obtained the array size you wanted.
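Something along these lines (a sketch; the size is illustrative):

program check_alloc
   implicit none
   integer, parameter :: million = 1000000
   real, allocatable :: a(:)
   integer :: stat
   allocate (a(400*million), stat=stat)              ! about 1.5gb of REAL*4
   if (stat /= 0) stop 'ALLOCATE failed'
   if (size(a) /= 400*million) stop 'did not get the size requested'
   print *, 'allocated', size(a), 'elements'
end program check_alloc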
3.6 gb in 2 arrays might not be possible, as that would require a 2.0gb plus a 1.6gb array. As free memory is not contiguous, you may find that the second array cannot be as big as 1.6gb. You need to experiment with what you can get. The best strategy is to allocate the big arrays first, so that the smaller arrays don't take the big free areas.
I doubt that you could reduce a 2gb array to 1.6gb via a temporary array on any 32-bit Fortran compiler. Resizing very large arrays is probably not a good idea.
Another issue with resizing, and also with repeated ALLOCATE/DEALLOCATE, is that it can lead to fragmentation of the available free memory pool, which can be another problem that is difficult to overcome.
If your other Fortran compiler is a 64-bit compiler, then you can allocate arrays much larger than 2gb.
BUT you can only get arrays larger than 2gb via ALLOCATE, and you must make sure that you do not get integer overflow when calculating the size: use INTEGER*8 to be sure. There is a new intrinsic function SIZEOF which returns the size in bytes. (It would be good if this was available in FTN95, especially as KIND /= byte_size.)
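A sketch of the 64-bit case, doing the size arithmetic in INTEGER*8 (the byte count 4*n is about 8e9, which would overflow a default INTEGER*4):

program big_alloc
   implicit none
   integer, parameter :: i8 = selected_int_kind(18)
   integer, parameter :: million = 1000000
   real, allocatable :: a(:)
   integer :: stat
   integer(i8) :: n
   n = 2000_i8 * million                 ! 2e9 elements, about 8gb of REAL*4
   allocate (a(n), stat=stat)
   if (stat /= 0) stop 'ALLOCATE failed'
   print *, 'bytes allocated =', 4_i8 * n   ! what SIZEOF(a) would report
end program big_alloc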
The key points are:
FTN95 offers up to about 3.6 gb of memory for allocatable arrays, although 2.0gb is the largest single array you can get.
This depends on what compiler options you use; since Ver 6.30 it is also available with /DEBUG.
This can be a useful extension to memory capacity, before requiring a 64-bit compiler.
John
christyleomin
Joined: 08 Apr 2011 Posts: 155
Posted: Wed Jun 12, 2013 5:40 am
John,
Thanks a lot for all this again.
But FTN95 is only available in a 32-bit version, isn't it?
I have also found that with a 64-bit Fortran compiler I have been able to allocate arrays much larger than 2GB. Yes, I have been using INTEGER*8 in my experiments/investigations.
I'm investigating this as well, but do you have any idea what the total size of all arrays (say array A, array B, array C, ...) allocated via ALLOCATABLE can be with a 64-bit compiler? Is it related to the memory available, or is it a function of the compiler?
Christy
JohnCampbell
Joined: 16 Feb 2006 Posts: 2621 Location: Sydney
Posted: Wed Jun 12, 2013 7:02 am
Christy,
The total size of arrays in 64-bit is not a straightforward question.
I have run programs successfully with up to 24 gb, but the limit would be at least 128 gb, if that is even the limit.
What you also need to consider is:
All arrays larger than 2gb can only be defined via ALLOCATE.
How much physical memory is installed. ( I have 12gb )
What is the size of the pagefile.sys file. ( mine is 24gb, on an SSD )
What is the limit for my pagefile.sys.
Do I want my program to run in only physical memory. ( mostly yes )
Do I know how to program to optimise performance if I use paging (virtual memory)? ( You need to localise the memory usage so that page faults do not explode. )
My rule of thumb is to try to limit my memory size to about 80% of physical memory, allowing something for other processes.
For example:
The following code is a bad approach for page faults:
do i = 1, n
   do j = 1, n
      A(i,j) = ...        ! inner loop strides through memory by n elements
   end do
end do
While the following sequential access is much better:
do j = 1, n
   do i = 1, n
      A(i,j) = ...        ! inner loop is stride-1 (Fortran is column-major)
   end do
end do
However if you have:
do j = 1, n
   do i = 1, n
      A(i,j) = B(i,j) + C(j,i)    ! A and B sequential, C strided
   end do
end do
you need to consider whether B or C spans more memory.
An alternative might be (but not always):
do j = 1, n
   row_C = C(j,:)                 ! gather row j of C into a contiguous vector
   do i = 1, n
      A(i,j) = B(i,j) + row_C(i)
   end do
end do
These approaches also apply to FTN95_32 applications, where sequential use of memory is always preferred, as it improves cache efficiency.
However, if it is a big data-set with few runs, it is often easier to code in-memory and do your best to minimise the page faults.
There is a long history of sparse matrix techniques and 3.6gb can be a lot of memory to use.
John
dpannhorst
Joined: 29 Aug 2005 Posts: 165 Location: Berlin, Germany
Posted: Wed Jun 12, 2013 2:07 pm
John,
I think the last summation could simply be written (without any loop) as:
A = B + C
The initializations above could also be written as:
A = 0. (or any other value)
Hopefully the compiler will generate optimized code internally.
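A complete example of the whole-array forms (note the TRANSPOSE, since the loop version used C(j,i)):

program array_syntax
   implicit none
   integer, parameter :: n = 1000
   real :: a(n,n), b(n,n), c(n,n)
   b = 1.0                        ! initialization without a loop
   c = 2.0
   a = b + transpose(c)           ! equivalent to a(i,j) = b(i,j) + c(j,i)
   print *, a(1,1)
end program array_syntax

For the very large arrays discussed above, bear in mind that TRANSPOSE(C) may create a full-size temporary copy.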
Regards,
Detlef
christyleomin
Joined: 08 Apr 2011 Posts: 155
Posted: Wed Jun 12, 2013 6:28 pm
Can I ask something simple?
Which is more advantageous in Fortran:
1) I store 10 arrays of dimensions A(6,100000)
OR
2) I store a 3D array (10,6,100000)
Thanks a lot
JohnCampbell
Joined: 16 Feb 2006 Posts: 2621 Location: Sydney
Posted: Thu Jun 13, 2013 1:43 am
Detlef,
The point of my example was that, although the computation is simply A = B + C', having an inner loop where A and B are processed sequentially but C is not poses a dilemma as to how best to process it.
If the computation were as simple as A = B + C', then providing row_C might be only a marginal improvement (each time you generate row_C demands a scan across the full memory extent of C), but if the computation within the inner loop were more complex, then the temporary-row approach could be more significant. This can be applied to matrix multiplication. It all depends on the range of the i, j and k loops.
The idea is to arrange the calculation so that memory is processed sequentially, or at least locally. { e.g. C(1000000,10) could demand a lot of page faults to process, while if C'(10,1000000) were available, it would process much faster. }
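For matrix multiplication, for example, a loop order with the first subscript innermost keeps the accesses stride-1 (a minimal sketch):

program matmul_order
   implicit none
   integer, parameter :: n = 500
   real :: a(n,n), b(n,n), c(n,n)
   integer :: i, j, k
   call random_number(a)
   call random_number(b)
   c = 0.0
   do j = 1, n
      do k = 1, n
         do i = 1, n                    ! stride-1 through c(:,j) and a(:,k)
            c(i,j) = c(i,j) + a(i,k) * b(k,j)
         end do
      end do
   end do
   print *, c(1,1)
end program matmul_order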
To take Christy's example of array_3d(10,6,100000): if this was processed as
do i = 1, 10
   do j = 1, 6
      do k = 1, 100000
         zz = fn( array_3D(i,j,k) )
      end do
   end do
end do
this would be stepping through memory in steps of 60 elements.
Note "zz = fn (..." is a general statement of processing this information and not just A = B + C
However, if it was array_3d(100000,10,6) and this was processed as
do i = 1, 100000
   do j = 1, 10
      do k = 1, 6
         zz = fn( array_3D(i,j,k) )
      end do
   end do
end do
this would be stepping through memory in steps of 1,000,000 elements and would result in significantly more page faults as it was processed.
Christy, the more advantageous approach could be to declare array_3d(10,6,100000) and process it as
do k = 1, 100000
   ! process the information for each element k
   do j = 1, 6
      do i = 1, 10
         zz = fn( array_3D(i,j,k) )
      end do
   end do
end do
In this way memory is processed more sequentially, or at least locally.
A factor could be how often this group of loops is processed, as each pass needs the full extent of array_3d memory to be traversed.
When designing a data structure, the order of the subscripts can affect (paging) performance. However, we often process the data in a number of different ways during its generation and use. Typically you generate it once but use it many times, so you have to determine which order is effective most often.
When going to virtual memory, what was previously a caching inefficiency can become a page-fault nightmare. This is an issue for both 32-bit and 64-bit computation.
John