|
forums.silverfrost.com Welcome to the Silverfrost forums
|
DanRRight
Joined: 10 Mar 2008 Posts: 2852 Location: South Pole, Antarctica
|
Posted: Fri Mar 08, 2024 11:39 am Post subject: Insufficient virtual stack with 64bits |
|
|
Got a run-time error (with 0.5TB RAM):
Insufficient virtual stack (FTN95 /VSTACK <MB-value>)
Does 64-bit also need manual stack control? How do I use it? |
|
|
|
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 8003 Location: Salford, UK
|
Posted: Fri Mar 08, 2024 12:08 pm Post subject: |
|
|
Dan
Can you send me the code for a program that compiles and runs except for this runtime error. |
|
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2852 Location: South Pole, Antarctica
|
Posted: Fri Mar 08, 2024 12:46 pm Post subject: |
|
|
Sending it could be problematic, and making a demo could be almost impossible. The error appeared when I tried to load a really large file of 100GB or so, which might demand even more RAM. (I expected that with the 64-bit compiler all limits were off and memory could grow automatically as the code requires. It is not likely that the code would demand more than 1TB.) Over the next few days I will investigate what caused this. Previously I was able to load ~300GB even with 5x less real+virtual memory.
|
|
|
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 8003 Location: Salford, UK
|
Posted: Fri Mar 08, 2024 3:04 pm Post subject: |
|
|
FTN95 currently has a maximum "virtual" stack size of 8GB. This can be reduced but not increased by using /VSTACK <MB-value> on the FTN95 command line. The maximum value is automatically reduced when there is limited physical memory available.
The FTN95 "virtual" stack is used for so-called automatic arrays and for temporary arrays created by the compiler to handle array sections that are not contiguous or not known to be whole arrays.
With 0.5TB of RAM there is effectively no physical limit for this virtual stack so the limit is currently 8GB.
I have made a note that this needs to be reviewed. |
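To illustrate the distinction described above, here is a minimal sketch in standard Fortran. The routine names `sum_with_automatic` and `sum_with_allocatable` are hypothetical and the size is deliberately tiny. The first routine declares an automatic array, which by the description above would be placed on the FTN95 "virtual" stack; the second obtains the same workspace with ALLOCATE, which is assumed here to come from the heap instead. Where a given compiler actually places each array is an implementation detail, so treat this as an illustration rather than a statement about FTN95 internals.

```fortran
program stack_vs_heap
  implicit none
  integer, parameter :: n = 1000
  real :: total_auto, total_heap

  total_auto = sum_with_automatic(n)
  total_heap = sum_with_allocatable(n)
  print *, 'automatic array result   =', total_auto
  print *, 'allocatable array result =', total_heap

contains

  function sum_with_automatic(m) result(s)
    integer, intent(in) :: m
    real :: s
    real :: work(m)        ! automatic array: its size is taken from m at entry
    work = 1.0
    s = sum(work)
  end function sum_with_automatic

  function sum_with_allocatable(m) result(s)
    integer, intent(in) :: m
    real :: s
    real, allocatable :: work(:)   ! allocatable array: storage requested explicitly
    integer :: ierr
    allocate (work(m), stat=ierr)
    if (ierr /= 0) then
      s = -1.0             ! signal failure instead of crashing
      return
    end if
    work = 1.0
    s = sum(work)
    deallocate (work)
  end function sum_with_allocatable

end program stack_vs_heap
```

Both functions return 1000.0 here. The practical difference only shows up at large sizes, where the ALLOCATE version can report failure through `stat=` while an oversized automatic array has no such check.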
|
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2578 Location: Sydney
|
Posted: Sat Mar 09, 2024 2:39 am Post subject: |
|
|
Dan,
What is Vstack and how does it differ from stack or heap ??
I know about Stack and Heap, but what is Vstack and why is it limited to 8 GBytes, when Heap gets all virtual memory ?
I have recently come across this "VSTACK" limit when trying to write/read 8.5 GByte records to a binary file (this is an ongoing issue for solving "Fails to save arrays > 4GB").
I used " read (lu) vector(1:nn)"
The problem is that this array section initiates a temporary copy of vector(1:nn)
Paul indicates this when he stated "The FTN95 "virtual" stack is used for so-called automatic arrays and for temporary arrays created by the compiler to handle array sections that are not contiguous or not known to be whole arrays."
So the solution is to avoid these temporary copies of very large array sections. Use ALLOCATE wherever possible !!
In the above case my solution was to replace the read statement with a "F77" wrapper :
Code: |
iostat = read_stream_vector ( lu, vector, nn )
if (iostat /= 0 ) exit
...
integer function read_stream_vector ( lu, vector, nn )
  ! reads fortran unformatted sequential access records using stream access
  use timer_info
  integer*4 :: lu
  integer*8 :: nn
  integer*4 :: vector(nn)
  integer*8 :: four = 4, num_bytes
  integer :: iostat
  real*4 :: gbytes
  num_bytes = nn*four
  gbytes = real(num_bytes) / 2.**30
  seconds = delta_seconds ()
  !z read (lu,iostat=iostat) vector(1:nn) ! this fails above 8 GBytes
  !z read (lu,iostat=iostat) (vector(k),k=1,nn) ! this is very slow
  read (lu,iostat=iostat) vector ! this worked OK
  gb_sec = GB_per_sec ( gbytes )
  write (*,12) ' reading record nn = ',nn,' : iostat = ',iostat, gb_sec,' GB/sec'
  12 format (5x,a,i0,a,i0, 2x,f0.3,a )
  read_stream_vector = iostat
end function read_stream_vector
|
The alternative " read (lu) vector " solved the problem, as the compiler now knows this is a contiguous vector in memory ( which it previously did not identify from the array section )
The other two alternatives I have commented out crashed or were far too slow.
I hope to post more about these large records soon, but I am achieving over 7 GBytes per second read rates on a PCIe SSD ( although the file is probably in the memory disk buffers )
The following is a trace of write, then read testing a 16 GByte vector, although the write rates are only about 1 GBy/sec. ( 4 GByte vector write is over 2 GBy/sec )
Code: | TEST 5 : array size 4294967358 : record size 17179869432 bytes : 16.000 GBytes
Unformatted Sequential WRITE
Array nn = 4294967358
generating vector of 16.000 GBytes : stat = 0
writing record 1 : iostat = 0 1.280 GB/sec
writing record 2 : iostat = 0 0.846 GB/sec
Unformatted Sequential READ
Array nn = 4294967358
reading record 1 : iostat = 0 3.800 GB/sec
reading record 2 : iostat = 0 3.899 GB/sec
Stream Access Header READ
Array nn = 4294967358
Header type -2 L = 17179869432 V = 1 Iostat = 0 header OK
Stream Access Sequential READ
Array nn = 4294967358
Record Header type -2 : Size = 17179869432 bytes
reading record nn = 4294967358 : iostat = 0 7.555 GB/sec
reading record nn = 4294967358 : iostat = 0 8.026 GB/sec
|
This works with unformatted sequential read/write, as it uses a new header type -2 : a 9-byte header/trailer.
I will post more soon when Paul confirms this is supported in the released FTN95 Ver 9.0x compiler |
|
|
|
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 8003 Location: Salford, UK
|
Posted: Sat Mar 09, 2024 9:13 am Post subject: |
|
|
John
I think that all of this should work with version 9.02 that can be downloaded from the Support "Sticky Post".
This "virtual" stack is the one that is created and used by FTN95 for 64 bit automatic arrays and compiler generated temporary arrays. It currently has an upper limit size of 8GB but this will be reviewed.
The stack is generated by a system call to VirtualAlloc and this call is built into the startup code for the user's executable by SLINK64. As a result this call is not currently visible in an /EXPLIST listing. |
|
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2852 Location: South Pole, Antarctica
|
Posted: Sat Mar 09, 2024 12:30 pm Post subject: |
|
|
I use 9.02
The place where i got this error is
Code: | dVolumeCell = XCellsize * dyCellsize * dZCellsize +1.d-30
DensityE3D(:,:,:) = DensityE3D(:,:,:) / dVolumeCell |
Dimensions 1280 x 1280 x 2000 = 3,276,800,000, or just 25GB.
It is almost like "640K which is good for everyone"
Paul, please remove this limit! There is no cellphones with 8GB already. And supercomputers use Petabytes of RAM |
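For reference, the size arithmetic in this post can be checked with a short sketch. It assumes 8-byte (REAL*8) elements, which is inferred from the "25GB" figure and the `1.d-30` constant rather than stated in the post, and it takes the 8GB limit to mean 8 x 2^30 bytes, which is also an assumption. All names are hypothetical. The counts are held in 64-bit integers because 3,276,800,000 does not fit in a default 32-bit integer.

```fortran
program array_size_check
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)     ! 64-bit integer kind
  integer(i8), parameter :: bytes_per_elem = 8         ! assumption: REAL*8 elements
  integer(i8), parameter :: vstack_limit = 8_i8 * 1024_i8**3   ! assumption: 8 GiB
  integer(i8) :: nx, ny, nz, n_elem, n_bytes

  nx = 1280
  ny = 1280
  nz = 2000
  n_elem  = nx*ny*nz                  ! product evaluated in 64-bit arithmetic
  n_bytes = n_elem*bytes_per_elem

  print '(a,i0)',   'elements          = ', n_elem
  print '(a,i0)',   'bytes             = ', n_bytes
  print '(a,f0.2)', 'GiB               = ', real(n_bytes, kind(1.0d0)) / 2.0d0**30
  print '(a,l1)',   'exceeds the limit = ', n_bytes > vstack_limit
end program array_size_check
```

Under these assumptions a single temporary copy of the array is roughly three times the current virtual stack limit, which is consistent with the error reported above.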
|
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2578 Location: Sydney
|
Posted: Sat Mar 09, 2024 1:52 pm Post subject: |
|
|
Dan,
could you try:
dVolumeCell = XCellsize * dyCellsize * dZCellsize +1.d-30
DensityE3D = DensityE3D / dVolumeCell
ie remove the "array section"
and let me know what happens ?
Also you could try
dVolumeCell = 1.0 / ( XCellsize * dyCellsize * dZCellsize +1.d-30 )
DensityE3D = DensityE3D * dVolumeCell
Do you enable avx instructions for this time consuming calculation ?
Increasing Vstack size allocation might be a problem for others without 0.5TB RAM. Increasing the Vstack size reduces the virtual memory pool address size, but perhaps not the virtual memory allocation ?
It would not be an issue for physical memory usage, but might reduce the available virtual memory size ?
I could check, as Gfortran puts bigger memory address strides for stack and heap, compared to FTN95. I have not identified the Vstack address in memory maps. |
|
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2852 Location: South Pole, Antarctica
|
Posted: Sun Mar 10, 2024 8:38 am Post subject: |
|
|
John,
1) What "problems for others" are you warning about if I increase the stack just for myself (if this limit is ever needed)? The debugger will tell you where your problem is. If there is a problem, it will be a problem in your code, not in the compiler as it is now.
2) I have also such places: how to eliminate array section here for example?
Code: | do k=2, nActualAtomicSpeciesPresent
DensitySpecies(:,:,:,1) = DensitySpecies(:,:,:,1) + DensitySpecies(:,:,:,k)
enddo |
3) Were AVX vector instructions included into FTN95? |
|
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2578 Location: Sydney
|
Posted: Sun Mar 10, 2024 1:46 pm Post subject: |
|
|
Dan,
1) The available virtual memory (not address space), i.e. physical memory + paging space, can sometimes be a limit on x64. This may not be a problem with a larger Vstack, as virtual memory is only allocated once the memory addresses (memory pages) are assigned a value.
Something like "DensitySpecies(:,:,:,1) = 0" could allocate a lot of memory pages.
2) "F77 wrappers" are a great way to avoid temporary arrays.
The following could avoid the problem
Code: | Real :: DensitySpecies(ni,nj,nk,nz)
integer*8 :: num
num = ni             ! widen to integer*8 first so the product cannot overflow 32 bits
num = num*nj*nk
do k=2, nActualAtomicSpeciesPresent
  call add_species_k ( DensitySpecies(1,1,1,k), DensitySpecies(1,1,1,1), num )
end do
...
subroutine add_species_k ( from, to, num )
  ! adds the contiguous block "from" into "to" without any array sections
  real :: from(*), to(*)
  integer*8 :: num, j
  do j = 1,num
    to(j) = to(j) + from(j)
  end do
end subroutine add_species_k |
3) for AVX instructions see noteson64bitftn95.txt for more info.
You could replace
call add_species_k ( DensitySpecies(1,1,1,k), DensitySpecies(1,1,1,1), num )
with
Code: |
num = ni             ! widen to integer*8 first so the product cannot overflow 32 bits
num = num*nj*nk
do k=2, nActualAtomicSpeciesPresent
call axpy4@ ( DensitySpecies(1,1,1,1), DensitySpecies(1,1,1,k), num, 1.0 )
end do
|
Note :
1) num must be integer*8
2) If the vectors are large, you will still struggle with memory access speeds/bandwidth for AVX instruction speed. It may not scale up *32 for avx256, but should be considerably faster.
Let me know how it goes. |
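For comparison, the same accumulation can also be written with explicit nested loops in standard Fortran, which involves no array sections in the arithmetic and so gives a compiler no reason to build a temporary copy. This is only a sketch under stated assumptions: the array name `dens`, the tiny dimensions and the fill values are hypothetical, chosen so the result is easy to check, and whether this is faster or slower than the wrapper above is not something this thread establishes.

```fortran
program accumulate_species
  implicit none
  integer, parameter :: ni = 4, nj = 3, nk = 2, ns = 3   ! tiny demo sizes
  real, allocatable :: dens(:,:,:,:)
  integer :: i, j, k, s, ierr

  allocate (dens(ni,nj,nk,ns), stat=ierr)
  if (ierr /= 0) stop 'allocation failed'

  ! fill species s with the value s, so the expected sum is 1 + 2 + 3 = 6
  do s = 1, ns
    dens(:,:,:,s) = real(s)
  end do

  ! add species 2..ns into species 1, element by element;
  ! the innermost loop runs over the first index, which is contiguous in memory
  do s = 2, ns
    do k = 1, nk
      do j = 1, nj
        do i = 1, ni
          dens(i,j,k,1) = dens(i,j,k,1) + dens(i,j,k,s)
        end do
      end do
    end do
  end do

  if (any(dens(:,:,:,1) /= 6.0)) stop 'unexpected sum'
  print *, 'species 1 now holds the sum over all species:', dens(1,1,1,1)
  deallocate (dens)
end program accumulate_species
```

With production-sized arrays the loop counters would need to stay within the range of a default integer per dimension, which holds for the 1280 x 1280 x 2000 case discussed earlier.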
|
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2852 Location: South Pole, Antarctica
|
Posted: Mon Mar 11, 2024 4:50 am Post subject: |
|
|
This is a club of workarounders. Besides mecej4, no one reports any problems or bugs, and you will not hear suggestions from anyone. If the company had not moved on from FTN77 by itself, everyone would still be actively making workarounds in F77. Even AVX is a workaround. Do you know what needs to change in the code to use AVX with gFortran or Intel? Nothing, just add a compilation switch.
Do gFortran and Intel also have 8GB limit? |
|
|
|
PaulLaidler Site Admin
Joined: 21 Feb 2005 Posts: 8003 Location: Salford, UK
|
Posted: Mon Mar 11, 2024 8:58 am Post subject: |
|
|
FTN95 uses AVX in some contexts and no switch is required.
I have already said that this particular limit will be reviewed so I expect that it will at least be increased and possibly made configurable. Your request for no limit is uninformed.
Personally I find your comments disrespectful and hence counter productive because they could discourage others from using FTN95. |
|
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2852 Location: South Pole, Antarctica
|
Posted: Mon Mar 11, 2024 12:01 pm Post subject: |
|
|
Sorry, by my words you can feel that the rhetoric between two camps of penguins is heating up in my Antarctica so the ice is melting . By the way, it is considered pro- not counter productive, and usually encourage and not discourage.
Anyway, from my side i apologize if my words sound offending, because, to be 100% objective, the workarounds also could be useful sometimes.
gFortran:
Up to the full RAM memory + swap 933 GB no any limits were found |
|
|
|
JohnCampbell
Joined: 16 Feb 2006 Posts: 2578 Location: Sydney
|
Posted: Thu Mar 14, 2024 12:07 am Post subject: Re: |
|
|
DanRRight wrote: | gFortran:
Up to the full RAM memory + swap 933 GB no any limits were found |
What stack size are you selecting in Gfortran? Have you been able to exceed the 512 MByte limit I have assumed ?
I expect you have succeeded as Gfortran is not using temporary arrays.
Have you tested any of the strategies I suggested for avoiding temporary arrays with FTN95 ?
These are incredible memory sizes you have available ! It was not long ago that 933 GB disk files were unachievable !
When I went to 64 GBytes of installed memory I changed my disk files into allocatable derived type memory arrays.
Now performance is dictated by memory to cache transfer delays ! Unfortunately we can not allocate cache usage. |
|
|
|
DanRRight
Joined: 10 Mar 2008 Posts: 2852 Location: South Pole, Antarctica
|
Posted: Thu Mar 14, 2024 8:28 am Post subject: |
|
|
I am sure you've heard that no one already optimizes codes by hand anymore, compilers do that better than average programmer. Programmer has just to write clear code, not a spaghetti nightmare. And in Soviet Russia already the codes optimize programmers.
gFortran with the -O3 -march=native switches optimizes this code to AVX speeds without any workarounds. It even optimizes your Fortran77 workaround above to the same speed; the non-standard FTN95 AVX@ call it of course could not swallow. And yes, it runs without crashes: after exhausting the 0.5 TB of RAM it goes to swap. No stack settings at all. FTN95 stops at 8 GB.
Code: | integer, parameter :: i=1000, j=1000, m=1000, n=1
Real, allocatable :: DensitySpecies(:,:,:,:)
integer*8 :: idim, nnn
k=1
do nn=0,7
  nnn = 2**nn
  idim = nnn * i * j * m * n
  print*,'=====================', nn, nnn
  write(*,'(A, 5i7)') 'Size GB, Size i,j,m,n=', 4*idim/1000000000, nnn*i,j,m,n
  call cpu_time(t1)
  allocate(DensitySpecies(nnn*i,j,m,n), stat=ierr )
  if(ierr.ne.0) print*, '====ierr=', ierr
  call cpu_time(t2)
  print*,'Allocation time= ', t2-t1
  DensitySpecies = 123
  call cpu_time(t1)
  DensitySpecies(:,:,:,1) = DensitySpecies(:,:,:,1) + DensitySpecies(:,:,:,k)
  call cpu_time(t2)
  print*,'END section :::, time= ', t2-t1
  deallocate(DensitySpecies)
enddo
END |
There exists such song "This is California, Baby"...Here on the same block are AMD, Intel, Apple, Google, Western Digital etcetcetc, here server chips can be found almost on a city dump and beaches. I found few Genoas and made a supercomputer. Soon 3nm Turins will be on the dumps, will increase to 400 cores |
|
|
|
|
|
|