
Insufficient virtual stack with 64bits
forums.silverfrost.com Forum Index -> 64-bit
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

Posted: Fri Mar 08, 2024 11:39 am    Post subject: Insufficient virtual stack with 64bits

Got this run-time error, with 0.5 TB of RAM :)

Insufficient virtual stack (FTN95 /VSTACK <MB-value>)

Does 64-bit also need manual stack control? How do I use it?
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 7924
Location: Salford, UK

Posted: Fri Mar 08, 2024 12:08 pm

Dan

Can you send me the code for a program that compiles and runs apart from this runtime error?
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

Posted: Fri Mar 08, 2024 12:46 pm

Sending it could be problematic, and making a demo could be almost impossible. The error appeared when I tried to load a really large file of 100 GB or so, which might also demand even more RAM. (I expected that with the 64-bit compiler all limits were now off, and that the code would be allowed to grow automatically and without limit, deciding for itself how much it needs. It is unlikely that the code would demand more than 1 TB.) Over the next few days I will investigate what caused this. Previously I was able to load ~300 GB even with 5x less real+virtual memory.
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 7924
Location: Salford, UK

Posted: Fri Mar 08, 2024 3:04 pm

FTN95 currently has a maximum "virtual" stack size of 8GB. This can be reduced but not increased by using /VSTACK <MB-value> on the FTN95 command line. The maximum value is automatically reduced when there is limited physical memory available.

The FTN95 "virtual" stack is used for so-called automatic arrays and for temporary arrays created by the compiler to handle array sections that are not contiguous or not known to be whole arrays.

With 0.5TB of RAM there is effectively no physical limit for this virtual stack so the limit is currently 8GB.

I have made a note that this needs to be reviewed.
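To make the distinction concrete, here is a deliberately trivial sketch (hypothetical module and routine names, not from FTN95 documentation) of the two kinds of scratch storage. scale_auto uses an automatic array whose size comes from a dummy argument; going by the description above, that is the kind of storage FTN95 64-bit places on its "virtual" stack. scale_heap does the same job with an ALLOCATABLE array, which is assumed to come from the heap instead. The kinds are chosen with SELECTED_INT_KIND/SELECTED_REAL_KIND because KIND numbers differ between compilers.

```fortran
! Sketch only: hypothetical names, contrasting two kinds of scratch storage.
module scratch_demo
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)       ! 64-bit integer kind
  integer, parameter :: dp = selected_real_kind(15, 307) ! double precision kind
contains

  ! Scratch space as an AUTOMATIC array: its size comes from the dummy
  ! argument n. Per the thread, FTN95 64-bit puts such arrays on its
  ! "virtual" stack.
  subroutine scale_auto(x, n, factor)
    integer(i8), intent(in)    :: n
    real(dp),    intent(inout) :: x(n)
    real(dp),    intent(in)    :: factor
    real(dp) :: work(n)                    ! automatic array
    work = x * factor
    x = work
  end subroutine scale_auto

  ! Same computation, with the scratch space ALLOCATEd from the heap.
  subroutine scale_heap(x, n, factor)
    integer(i8), intent(in)    :: n
    real(dp),    intent(inout) :: x(n)
    real(dp),    intent(in)    :: factor
    real(dp), allocatable :: work(:)
    integer :: ierr
    allocate(work(n), stat=ierr)
    if (ierr /= 0) stop 'scale_heap: allocation failed'
    work = x * factor
    x = work
  end subroutine scale_heap                ! work is deallocated on return

end module scratch_demo
```

Checking STAT= in scale_heap lets the program report a failed allocation itself rather than stopping with a run-time error about stack space.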
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Sat Mar 09, 2024 2:39 am

Dan,

What is Vstack, and how does it differ from the stack or the heap?

I know about the stack and the heap, but what is Vstack, and why is it limited to 8 GBytes when the heap gets all of virtual memory?

I recently came across this "VSTACK" limit when trying to write/read 8.5 GByte records to a binary file (this is part of the ongoing issue "Fails to save arrays > 4GB").

I used " read (lu) vector(1:nn)"

The problem is that this array section initiates a temporary copy of vector(1:nn)

Paul indicates this when he stated "The FTN95 "virtual" stack is used for so-called automatic arrays and for temporary arrays created by the compiler to handle array sections that are not contiguous or not known to be whole arrays."

So the solution is to avoid these temporary copies of very large array sections. Use ALLOCATE wherever possible!

In the above case, my solution was to replace the read statement with an "F77" wrapper:
Code:

       iostat = read_stream_vector ( lu, vector, nn )
         if (iostat /= 0 ) exit
...
    integer function read_stream_vector ( lu, vector, nn )

   !  reads fortran unformatted sequential access records using stream access

     use timer_info
     integer*4 :: lu
     integer*8 :: nn
     integer*4 :: vector(nn)

     integer*8 :: four = 4, num_bytes
     integer   :: iostat
     real*4    :: gbytes

     num_bytes = nn*four
     gbytes    = real(num_bytes) / 2.**30

       seconds = delta_seconds ()
!z       read (lu,iostat=iostat) vector(1:nn)          ! this fails above 8 GBytes
!z       read (lu,iostat=iostat) (vector(k),k=1,nn)    ! this is very slow
     read (lu,iostat=iostat) vector                    ! this worked OK

     gb_sec  = GB_per_sec ( gbytes )
     write (*,12) '  reading record  nn = ',nn,' : iostat = ',iostat, gb_sec,' GB/sec'
  12 format (5x,a,i0,a,i0, 2x,f0.3,a  )   

     read_stream_vector = iostat

    end function read_stream_vector


The alternative "read (lu) vector" solved the problem, as the compiler now knows this is a contiguous vector in memory (which it previously did not identify from the array section).
The other two alternatives, which I have commented out, crashed or were far too slow.

I hope to post more about these large records soon, but I am achieving read rates of over 7 GBytes per second on a PCIe SSD (although the file is probably in the memory disk buffers).

The following is a trace of write, then read, testing a 16 GByte vector, although the write rates are only about 1 GB/sec. (A 4 GByte vector write is over 2 GB/sec.)

Code:
 TEST 5 : array size 4294967358 : record size 17179869432 bytes : 16.000 GBytes

Unformatted Sequential WRITE
   Array nn = 4294967358
      generating vector of 16.000 GBytes : stat = 0
      writing record 1 : iostat = 0  1.280 GB/sec
      writing record 2 : iostat = 0  0.846 GB/sec

Unformatted Sequential READ
   Array nn = 4294967358
      reading record 1 : iostat = 0  3.800 GB/sec
      reading record 2 : iostat = 0  3.899 GB/sec

Stream Access Header READ
   Array nn = 4294967358
      Header type -2 L = 17179869432 V = 1 Iostat = 0  header OK

Stream Access Sequential READ
   Array nn = 4294967358
       Record Header type -2 : Size = 17179869432 bytes
       reading record  nn = 4294967358 : iostat = 0  7.555 GB/sec
       reading record  nn = 4294967358 : iostat = 0  8.026 GB/sec

This works with unformatted sequential read/write, as it uses a new header type -2: a 9-byte header/trailer.
I will post more soon, when Paul confirms this is supported in the released FTN95 Ver 9.0x compiler.
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 7924
Location: Salford, UK

Posted: Sat Mar 09, 2024 9:13 am

John

I think that all of this should work with version 9.02, which can be downloaded from the Support "Sticky Post".

This "virtual" stack is the one that is created and used by FTN95 for 64 bit automatic arrays and compiler generated temporary arrays. It currently has an upper limit size of 8GB but this will be reviewed.

The stack is generated by a system call to VirtualAlloc, and this call is built into the startup code for the user's executable by SLINK64. As a result, this call is not currently visible in an /EXPLIST listing.
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

Posted: Sat Mar 09, 2024 12:30 pm

I use 9.02.
The place where I got this error is:

Code:
    dVolumeCell = XCellsize * dyCellsize  * dZCellsize +1.d-30
    DensityE3D(:,:,:) = DensityE3D(:,:,:) / dVolumeCell


The dimensions are 1280 x 1280 x 2000 = 3,276,800,000 elements, or just 25 GB.
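A quick size check, as a sketch under two assumptions the post does not state: that DensityE3D is REAL*8, and that the 8 GB limit means 8 GiB. The hypothetical helper array_bytes promotes to a 64-bit integer before multiplying, because 1280*1280*2000 = 3,276,800,000 already exceeds the range of a default 32-bit integer.

```fortran
! Sketch only: hypothetical helper for checking array sizes in bytes.
module array_size_check
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)   ! 64-bit integer kind
  ! Assumption: the 8 GB virtual-stack limit quoted in this thread, taken as 8 GiB.
  integer(i8), parameter :: vstack_limit_bytes = 8_i8 * 1024_i8**3
contains
  ! Bytes needed by an n1 x n2 x n3 array with elements of the given size.
  pure function array_bytes(n1, n2, n3, bytes_per_element) result(nbytes)
    integer, intent(in) :: n1, n2, n3, bytes_per_element
    integer(i8) :: nbytes
    ! Promote to 64-bit BEFORE multiplying: 1280*1280*2000 overflows a default integer.
    nbytes = int(n1, i8) * n2 * n3 * bytes_per_element
  end function array_bytes
end module array_size_check
```

With 8-byte elements this gives 26,214,400,000 bytes (about 24.4 GiB), and even 4-byte elements would need 13,107,200,000 bytes. So if FTN95 makes a temporary copy of the whole array for this statement, as the error suggests, it cannot fit in an 8 GB virtual stack either way.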

It is almost like "640K is good enough for everyone".

Paul, please remove this limit! Even cellphones already have 8 GB, and supercomputers use petabytes of RAM.
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Sat Mar 09, 2024 1:52 pm

Dan,

could you try:
Code:
    dVolumeCell = XCellsize * dyCellsize * dZCellsize +1.d-30
    DensityE3D = DensityE3D / dVolumeCell

i.e. remove the "array section", and let me know what happens?

Also you could try:
Code:
    dVolumeCell = 1.0 / ( XCellsize * dyCellsize * dZCellsize +1.d-30 )
    DensityE3D = DensityE3D * dVolumeCell

Do you enable AVX instructions for this time-consuming calculation?
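Another way to rule out any compiler temporary is to write the loops out explicitly. This is a minimal sketch, assuming DensityE3D is a double precision array with known extents; the module and routine names are hypothetical. Every assignment inside the loops is a scalar, so there is no array expression for the compiler to copy, and it also applies the multiply-by-reciprocal suggestion above.

```fortran
! Sketch only: hypothetical names; assumes a double precision density array.
module density_scaling
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)   ! double precision kind
contains
  ! Divide every cell of density by cell_volume using explicit loops.
  ! Each assignment is a scalar, so there is no array expression
  ! for the compiler to copy into a temporary.
  subroutine scale_density(density, n1, n2, n3, cell_volume)
    integer,  intent(in)    :: n1, n2, n3
    real(dp), intent(inout) :: density(n1, n2, n3)
    real(dp), intent(in)    :: cell_volume
    real(dp) :: rvol
    integer  :: i, j, k
    rvol = 1.0_dp / cell_volume          ! one division, then multiplications
    do k = 1, n3
      do j = 1, n2
        do i = 1, n1                     ! first index innermost: contiguous in memory
          density(i, j, k) = density(i, j, k) * rvol
        end do
      end do
    end do
  end subroutine scale_density
end module density_scaling
```

Multiplying by the reciprocal can differ from a true division in the last bit, which may or may not matter for a given application.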

Increasing the Vstack size allocation might be a problem for others who do not have 0.5 TB of RAM. Increasing the Vstack size reduces the size of the virtual memory address pool, but perhaps not the virtual memory allocation?

It would not be an issue for physical memory usage, but it might reduce the available virtual memory size?

I could check, as Gfortran uses bigger memory address strides for stack and heap than FTN95 does. I have not yet identified the Vstack address in memory maps.
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

Posted: Sun Mar 10, 2024 8:38 am

John,
1) What "problems for others" are you warning about if I increase the stack just for myself (if this limit is ever needed at all)? The debugger will show you where your problem is. If there is a problem, it will be a problem in your code, not a compiler problem like now.

2) I also have places like this. How would I eliminate the array section here, for example?

Code:
      do k=2, nActualAtomicSpeciesPresent
        DensitySpecies(:,:,:,1)    = DensitySpecies(:,:,:,1) + DensitySpecies(:,:,:,k)
      enddo


3) Have AVX vector instructions been included in FTN95?
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Sun Mar 10, 2024 1:46 pm

Dan,

1) The available virtual memory (not address space), i.e. physical memory + paging space, can sometimes be a limit on x64. This may not be a problem with a larger Vstack, as virtual memory is only allocated once the memory addresses (memory pages) are given a value.
Something like "DensitySpecies(:,:,:,1) = 0" could allocate a lot of memory pages.

2) "F77 wrappers" are a great way to avoid temporary arrays.
The following could avoid the problem
Code:
   Real :: DensitySpecies(ni,nj,nk,nz)
   integer*8 :: num

  num = int(ni,kind(num))*nj*nk   ! promote before multiplying to avoid 32-bit overflow
 do k=2, nActualAtomicSpeciesPresent
  call add_species_k ( DensitySpecies(1,1,1,k), DensitySpecies(1,1,1,1), num )
 end do
...
  subroutine add_species_k ( from, to, num )
   real :: from(*), to(*)
   integer*8 :: num, j
    do j = 1,num
      to(j) = to(j) + from(j)
    end do
  end subroutine add_species_k


3) For AVX instructions, see noteson64bitftn95.txt for more info.

You could replace
call add_species_k ( DensitySpecies(1,1,1,k), DensitySpecies(1,1,1,1), num )
with
Code:

  num = int(ni,kind(num))*nj*nk   ! promote before multiplying to avoid 32-bit overflow
 do k=2, nActualAtomicSpeciesPresent
   call axpy4@ ( DensitySpecies(1,1,1,1), DensitySpecies(1,1,1,k), num, 1.0 )
 end do

Note:
1) num must be integer*8.
2) If the vectors are large, you will still be limited by memory access speed/bandwidth rather than by AVX instruction speed. It may not scale up by the full AVX-256 vector width (8 single-precision reals per instruction), but it should be considerably faster.

Let me know how it goes.
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

Posted: Mon Mar 11, 2024 4:50 am

This is a club of workarounders. Besides mecej4, no one reports problems and bugs, and you will not hear suggestions from absolutely anyone. If the company had not moved on from FTN77 by itself, everyone would still be actively making workarounds in F77. Even AVX is a workaround :) :) :) Do you know what needs to change in the code to use AVX with gFortran or Intel? Nothing; you just add a compilation switch.

Do gFortran and Intel also have an 8 GB limit?
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 7924
Location: Salford, UK

Posted: Mon Mar 11, 2024 8:58 am

FTN95 uses AVX in some contexts and no switch is required.

I have already said that this particular limit will be reviewed so I expect that it will at least be increased and possibly made configurable. Your request for no limit is uninformed.

Personally I find your comments disrespectful and hence counterproductive, because they could discourage others from using FTN95.
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

Posted: Mon Mar 11, 2024 12:01 pm

Sorry; from my words you can sense that the rhetoric between the two camps of penguins is heating up in my Antarctica, so the ice is melting :). By the way, criticism like this is usually considered productive, not counterproductive, and it usually encourages rather than discourages.
Anyway, I apologize if my words sounded offensive, because, to be 100% objective, workarounds can also be useful sometimes.

gFortran:
Up to the full RAM + swap (933 GB), no limits at all were found.
JohnCampbell



Joined: 16 Feb 2006
Posts: 2554
Location: Sydney

Posted: Thu Mar 14, 2024 12:07 am    Post subject: Re:

DanRRight wrote:
gFortran:
Up to the full RAM + swap (933 GB), no limits at all were found.


What stack size are you selecting in Gfortran? Have you been able to exceed the 512 MByte limit I have assumed?

I expect you have succeeded as Gfortran is not using temporary arrays.

Have you tested any of the strategies I suggested for avoiding temporary arrays with FTN95 ?

These are incredible memory sizes you have available ! It was not long ago that 933 GB disk files were unachievable !
When I went to 64 GBytes of installed memory I changed my disk files into allocatable derived type memory arrays.
Now performance is dictated by memory-to-cache transfer delays! Unfortunately we cannot allocate cache usage.
DanRRight



Joined: 10 Mar 2008
Posts: 2816
Location: South Pole, Antarctica

Posted: Thu Mar 14, 2024 8:28 am

I am sure you've heard that no one optimizes code by hand anymore; compilers do it better than the average programmer. The programmer just has to write clear code, not a spaghetti nightmare. And in Soviet Russia, code already optimizes programmers :)

gFortran with the -O3 -march=native switches optimizes this code to AVX speeds without any workarounds. It also optimizes your Fortran 77 workaround above to the same speeds; the non-standard FTN95 axpy4@, of course, it could not swallow. And yes, it runs without crashes: after exhausting the 0.5 TB of RAM it goes into swap. No stack limits of any kind. FTN95 stops at 8 GB.
Code:
implicit none
integer, parameter :: i=1000, j=1000, m=1000, n=1
Real, allocatable :: DensitySpecies(:,:,:,:)
integer*8 :: idim, nnn
integer   :: k, nn, ierr
real      :: t1, t2

k=1
do nn=0,7
  nnn = 2**nn
  idim = nnn * i * j * m * n          ! element count, computed in integer*8

  print*,'=====================', nn, nnn
  write(*,'(A, 5i7)') 'Size GB, Size i,j,m,n=', 4*idim/1000000000, nnn*i,j,m,n

  call cpu_time(t1)
  allocate(DensitySpecies(nnn*i,j,m,n), stat=ierr )
  if(ierr.ne.0) then
    print*, '====ierr=', ierr
    exit                              ! do not use an array that was not allocated
  endif
  call cpu_time(t2)
  print*,'Allocation time= ', t2-t1

  DensitySpecies = 123

  call cpu_time(t1)
  DensitySpecies(:,:,:,1)    = DensitySpecies(:,:,:,1) + DensitySpecies(:,:,:,k)
  call cpu_time(t2)
  print*,'END section :::, time= ', t2-t1

  deallocate(DensitySpecies)
enddo

END




There is a song, "This is California, Baby"... Here, on the same block, are AMD, Intel, Apple, Google, Western Digital, etc., and server chips can be found almost at the city dump and on the beaches. I found a few Genoas and made a supercomputer. Soon 3 nm Turins will be on the dumps, and I will increase it to 400 cores.
All times are GMT + 1 Hour
Page 1 of 2

 