forums.silverfrost.com Forum Index -> General

Bug in SCC 3.88  (page 4 of 4)

JohnCampbell
Posted: Sun May 16, 2021 6:20 am

Dan,

I know it has been a long time since you asked:
"the reading speed drops twice from 4 GB/s to 2 GB/s. Any ideas why?"

I think it is because you are using CPU time rather than elapsed time: CPU_TIME only counts processor time used by your process, so time spent waiting for the disk is not included, while SYSTEM_CLOCK measures wall-clock time.

Anyway, I modified your code to:
  also report elapsed time,
  use ALLOCATE to remove the stack problem, and
  report times for initialising and testing the arrays.

I would be interested to know whether you still get similar performance with the new NVMe PCIe 4 hard drives.
Code:
! compilation for 32bit case :
!
!  Revisions:
!   use ALLOCATE arrays to use HEAP instead of STACK
!   include Elapse_Time > SYSTEM_CLOCK to report actual run time
!   scan C == A to show a minimal processing time
!
! ftn95 unfRead.f95 /opt /set_error_level error 298 /no_truncate /nocheck /silent >equiv_FTN95
! slink unfRead.obj /stack:1200000000
!
  integer, parameter :: ix = 18, iy=800000, iz=20
!!  integer, parameter :: BSIZ = 4 * ix * iy * iz

!zz real*4 A(ix,iy,iz), C(ix,iy,iz)
  real*4, allocatable, dimension(:,:,:) ::  A, C

!  integer*2          :: hndl, ecode
!  integer*8          :: nbytes = BSIZ, nread
  integer            :: iostat, iix,iiy,iiz, ne
  real*4             :: t1,t2, e1,e2, Mb

! adding or removing this line changes nothing
! equivalence (A,Buf)     ! converts initial real*4 data to character*1
! **** Removing this useless line decreases reading speed 2x ****
!z  equivalence (C,bufRead) ! converts loaded  character*1 data real*4

  Mb = 4.*real(ix)*real(iy)*real(iz) / (1024.**2)  ! MBytes

  call elapse_time (e1)
  allocate ( A(ix,iy,iz), stat=iostat )
   write (*,10) 'A',iostat,size(A), Mb
  allocate ( C(ix,iy,iz), stat=iostat )
   write (*,10) 'C',iostat,size(C), Mb
 10 format (a,' allocated : STAT= ',i5,' Size =',i10,' elements', f8.2,' MBytes')
!............. setting up data ...............

!$OMP PARALLEL DO PRIVATE (iix,iiy,iiz) SHARED (A)
  do iiz=1,iz
   do iiy=1,iy
    do iix=1,ix
     A(iix,iiy,iiz) = iix
    enddo
   enddo
  enddo
!$OMP END PARALLEL DO

  call elapse_time (e2)
  Write (*,11) 'A initialised : A=',A(1,1,1), A(2,1,1), (e2-e1)
  write (*,13) Mb/max(1.e-6,(e2-e1)),' MB/sec'
 11 format (/a,2f6.0, f8.3,' sec')

!............writing to disk..................

  OPEN (UNIT=275,FILE='Y.bin',STATUS='unknown',FORM='UNFORMATTED',err=990)

  call cpu_time(t1)
  call elapse_time (e1)

  write (275) A            !   WRITE A to file

  call cpu_time(t2)
  call elapse_time (e2)
  close (275)

  write (*,12) 'Time for writing file of size MB: ',Mb, t2-t1, e2-e1,' sec'
  write (*,13) Mb/max(1.e-6,(t2-t1)),' MB/sec'
  write (*,13) Mb/max(1.e-6,(e2-e1)),' MB/sec'
 12 format (/A,2x,f7.0, 2F7.3,A)
 13 format ('Estimated throughput = ',6x,F6.0,A)

! ...........reading from disk................

  OPEN (UNIT=257,FILE='Y.bin',STATUS='old',FORM='UNFORMATTED',err=995)

  call cpu_time(t1)
  call elapse_time (e1)

  read (257) C             !   READ C from file

  call cpu_time(t2)
  call elapse_time (e2)
  close (257)

  write (*,12) 'Time for reading file of size MB: ',Mb, t2-t1, e2-e1,' sec'
  write (*,13) Mb/max(1.e-6,(t2-t1)),' MB/sec'
  write (*,13) Mb/max(1.e-6,(e2-e1)),' MB/sec'

! ...........testing for errors................


JohnCampbell
Posted: Sun May 16, 2021 6:21 am

Code:
! ...........testing for errors................

  call elapse_time (e1)
  ne = 0
!$OMP PARALLEL DO PRIVATE (iix,iiy,iiz) SHARED (A,C) REDUCTION(+ : ne)
  do iiz=1,iz
   do iiy=1,iy
    do iix=1,ix
     if ( A(iix,iiy,iiz) /= C(iix,iiy,iiz) ) ne = ne+1
    enddo
   enddo
  enddo
!$OMP END PARALLEL DO
  call elapse_time (e2)
  write (*,11) 'Checking if C == A', C(1,1,1), C(2,1,1), e2-e1
  write (*,*) ne,' errors'
  write (*,13) Mb/max(1.e-6,(e2-e1)),' MB/sec'
  goto 10000

!................. errors ......................
990 Print*, 'Error opening file Y.BIN for writing'
goto 10000
995 Print*, 'Error opening file Y.BIN for read'
goto 10000
 
10000 continue
end

 subroutine elapse_time (sec)
!  returns wall-clock (elapsed) time in seconds, via SYSTEM_CLOCK
   real*4    :: sec
   integer*8 :: clock, rate
   call system_clock (clock, rate)   ! tick count and ticks per second
   sec = dble(clock) / dble(rate)
 end subroutine elapse_time
 


Multi-threading is identified as an option to improve processing performance outside the I/O, but there are still limits on how much can be gained from the high transfer rates of the SSD.

DanRRight
Posted: Sun May 16, 2021 8:28 am

1) Your program

RAMDRIVE
------------
A allocated : STAT= 0 Size = 288000000 elements 1098.63 MBytes
C allocated : STAT= 0 Size = 288000000 elements 1098.63 MBytes

A initialised : A= 1. 2. 0.797 sec
Estimated throughput = 1379. MB/sec

Time for writing file of size MB: 1099. 0.375 0.391 sec
Estimated throughput = 2930. MB/sec
Estimated throughput = 2813. MB/sec

Time for reading file of size MB: 1099. 0.328 0.344 sec
Estimated throughput = 3348. MB/sec
Estimated throughput = 3196. MB/sec

Checking if C == A 1. 2. 0.672 sec
0 errors
Estimated throughput = 1635. MB/sec



NVMe
-----------
A allocated : STAT= 0 Size = 288000000 elements 1098.63 MBytes
C allocated : STAT= 0 Size = 288000000 elements 1098.63 MBytes

A initialised : A= 1. 2. 0.813 sec
Estimated throughput = 1352. MB/sec

Time for writing file of size MB: 1099. 0.609 0.609 sec
Estimated throughput = 1803. MB/sec
Estimated throughput = 1803. MB/sec

Time for reading file of size MB: 1099. 0.344 0.344 sec
Estimated throughput = 3196. MB/sec
Estimated throughput = 3196. MB/sec

Checking if C == A 1. 2. 0.688 sec
0 errors
Estimated throughput = 1598. MB/sec

2) The previous program (with the FTN95-specific READF@/WRITEF@ instead of standard READ/WRITE) shows 2x higher read speeds. Both timers, mine and yours, show the same time for the RAMDRIVE:

A= 1.00000 2.00000
Time for writing file of size MB : 320 0.109 s
Estimated throughput = 2926. MB/s
Time for writing file of size MB2: 320 0.109 s
Estimated throughput = 2926. MB/s

Time for reading file of size MB: 320 0.047 s
Estimated throughput = 6827. MB/s
Time for reading file of size MB: 320 0.047 s
Estimated throughput = 6827. MB/s
Checking if C=A 1.00000 2.00000

but a very different write speed for the NVMe:

A= 1.00000 2.00000
Time for writing file of size MB : 320 0.078 s
Estimated throughput = 4096. MB/s
Time for writing file of size MB2: 320 0.188 s
Estimated throughput = 1707. MB/s

Time for reading file of size MB: 320 0.047 s
Estimated throughput = 6827. MB/s
Time for reading file of size MB: 320 0.047 s
Estimated throughput = 6827. MB/s
Checking if C=A 1.00000 2.00000

The RAMdrive speed can be overclocked about 1.5x, though.


JohnCampbell
Posted: Sun May 16, 2021 1:04 pm

Dan,

You are achieving much higher throughput values than I am getting, even for SSD drives. The O/S file buffers and SSD buffers may be helping (which is good, but confusing).

Did you change the other tests to elapse_second?

I think the performance rate for checking is important, as that is the data-handling rate for the most trivial test.

Most large files I read are text files where the data is comma- or space-separated survey data. The first processing step is a data-validation phase, where I check whether each data value is outside an acceptable range; invalid values are omitted from the data set. (64-bit certainly helps with being able to store all the data in memory for validation and later use.)
I actually go through a process of multiple data assessments, as I come to better understand what is rejected data versus what are significant outliers, and I refine the way I adjust and then use the data.
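
A minimal sketch of such a validation pass, assuming one comma- or space-separated triple of reals per line; the file name, column count and limits vmin/vmax are all made up:
Code:
  real, parameter :: vmin = -999., vmax = 999.   ! assumed acceptable range
  real    :: x, y, z
  integer :: ios, nok, nbad
  nok = 0 ;  nbad = 0
  open (20, file='survey.txt', status='old', action='read')
  do
    read (20, *, iostat=ios) x, y, z    ! list-directed read accepts commas or spaces
    if (ios < 0) exit                   ! end of file
    if (ios > 0) then                   ! unreadable line : reject it and continue
      nbad = nbad + 1 ;  cycle
    end if
    if (min(x,y,z) < vmin .or. max(x,y,z) > vmax) then
      nbad = nbad + 1                   ! out of range : omit from the data set
    else
      nok = nok + 1                     ! valid : store into the in-memory arrays here
    end if
  end do
  close (20)
  write (*,*) nok, ' values accepted,', nbad, ' rejected'
  end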

If your main use is HDF5 files, which I guess are structured binary files, then you are dealing with much larger data sets, which is a problem I have not addressed. It certainly appears that your NVMe drive is good for your projects.

John

DanRRight
Posted: Sun May 16, 2021 9:19 pm

Yes, I added a second timer, as you have done. Both mostly show the same, except in one case (see the 1707 number).

I cannot find anyone who has succeeded in reading HDF5 files in Fortran. There was one person who had, but he is now retired. HDF5 libraries exist for a couple of other Fortran compilers. I'd also be happy if someone with any other compiler for Windows (preferably Fortran, for faster writing) created a small tool which reads an HDF5 file in its language and transforms it into an ordinary binary file. The HDF Group makes a handy tool, HDFdump, which does that, but it does it at 1/3000 of the speed because it is written in C++. Our people made such a tool using Python, which shows the same super-slow speeds. I am stuck so badly that in some cases just the read takes the entire day. If FTN95 could read HDF5 files directly, this would take 60 seconds.

I contacted the HDF Group and suggested they use Fortran for faster writing in this transformation tool, or, better, make an HDF5 library for FTN95 specifically so the tool would not be needed, but they are clueless and useless in these matters.

The good thing about HDF5 is that output from Fortran or any other language can be read by any other language. Plus, of course, the read/write speed is crazy fast, especially writes, probably in the 20-30 GB/s range, as if you were writing from RAM directly back into the same RAM instead of to any hard drives, RAM drives or NVMes.
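
For reference, this is roughly what reading a dataset looks like with the official HDF5 Fortran bindings, under a compiler that ships them (gfortran or Intel Fortran, for example; not FTN95). The file name, dataset name and size below are invented:
Code:
program h5_to_bin
  use hdf5                                   ! official HDF5 Fortran module
  implicit none
  integer(hid_t)    :: file_id, dset_id
  integer(hsize_t)  :: dims(1) = 1000000     ! assumed dataset length
  integer           :: hdferr
  real, allocatable :: buf(:)

  allocate ( buf(dims(1)) )
  call h5open_f  (hdferr)                                        ! start the library
  call h5fopen_f ('run.h5', H5F_ACC_RDONLY_F, file_id, hdferr)   ! open the file
  call h5dopen_f (file_id, '/fields/density', dset_id, hdferr)   ! open the dataset
  call h5dread_f (dset_id, H5T_NATIVE_REAL, buf, dims, hdferr)   ! read it into buf
  call h5dclose_f(dset_id, hdferr)
  call h5fclose_f(file_id, hdferr)
  call h5close_f (hdferr)

  open  (20, file='density.bin', form='unformatted', access='stream', status='replace')
  write (20) buf                             ! dump as a plain stream binary file
  close (20)
end program h5_to_bin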

JohnCampbell
Posted: Mon May 17, 2021 2:30 am

DanRRight wrote:
I am stuck so badly that in some cases just the read takes the entire day. If FTN95 could read HDF5 files directly, this would take 60 seconds.


Dan, you are known for exaggeration!

How big is the dataset that you are reading? Does it span multiple multi-terabyte drives? Please answer!

Perhaps you need server hardware with better-tuned disk I/O ports. PCIe 4.0 x16, i.e. a Gen 4 expansion card or slot with a 16-lane configuration, plus 64 or 128 GBytes of memory, may do better, but this is hardware tuning. (What hardware/motherboard supports multiple PCIe 4.0 drives? My latest PC's SSD is only PCIe 3.0.)

My file I/O now takes place in seconds, or rarely minutes, so I am not familiar with your problem.

Most (all?) Fortran compilers rely on a C-style I/O interface, so I don't think Fortran and C would be much different.

Using Fortran stream/transparent I/O would appear to be a likely requirement for HDF5. (The software buffers would be tuned to the hardware technology and the motherboard memory installed.)
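
As a minimal sketch of what that looks like, assuming a compiler with Fortran 2003 stream I/O (FTN95's older ACCESS='TRANSPARENT' behaves similarly); the file name and record layout are invented:
Code:
  real :: header(4), block(1000)
  open (30, file='run.dat', form='unformatted', access='stream', status='old')
  read (30) header           ! bytes are read exactly as stored : no record markers
  read (30, pos=17) block    ! random access : jump to byte 17, just past the 16-byte header
  close (30)
  end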

If reading the "dataset" is seriously taking that long, then the likely cause is significant random I/O. The most likely improvement would be drives with larger memory buffers, or more memory for the Windows O/S file buffers. You could reconstruct the file indexes in memory, but that is a significant change to the software.

DanRRight
Posted: Mon May 17, 2021 3:35 am

Exaggeration?

10-20 TB of output in a single run, sometimes going for a week on a supercomputer. I reduce it by cutting down to 0.5-1 TB.

The speed of data extraction from HDF5 into binary is around 10-20 MB/s (into ASCII, around 3 MB/s). The speed of reading the extracted data is 10x higher, though not yet as high as the NVMe allows, because some minor calculations, which are not often needed, run at the same time.

Do the math: up to a day for extraction and reading. You go to sleep, and by morning it may or may not have finished.

If I could read the HDF5 file directly, without the extraction, I'd read it at ~7 GB/s: 1-2 minutes.

JohnCampbell
Posted: Tue May 18, 2021 11:26 am

Dan,

You are quoting a significant range of performance.

For 20 TB of information, if we use an NVMe drive and achieve 3 GB/s, that equates to about 1.85 hours to read the data.

However, if you are only getting 20 MB/s with HDF5, that would take 11.6 days to read.
What does this HDF5 offer, apart from a long lunch?

I would expect that stream/transparent writing from the supercomputer would be the likely data source; reading that back on the NVMe would then be much simpler.
You would need some error recovery and possibly multiple files, but my very limited experience of large datasets has been with very simple data structures.

Multiple files could also allow a multi-process split of the reading, as a simple form of multi-threading; see the sketch below.
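
A minimal sketch of that idea, splitting one array across several stream files so that separate reader processes can each take one part; the file names, part count and sizes are all made up:
Code:
  integer, parameter :: n_part = 4, n = 4000000
  real, allocatable  :: a(:)
  character(16)      :: fname
  integer            :: ip, lo, hi
  allocate ( a(n) )
  call random_number (a)                    ! stand-in for the real results
  do ip = 1, n_part
    write (fname,'(a,i0,a)') 'part', ip, '.bin'
    lo = (ip-1)*n/n_part + 1
    hi = ip*n/n_part
    open  (30, file=fname, form='unformatted', access='stream', status='replace')
    write (30) a(lo:hi)                     ! each part is an independent file
    close (30)
  end do
  end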

If "data extraction from HDF5 into binary is around 10-20MB/sec", HDF5 does not look like a good option !!
( Is this due to the HDF5 data management overhead ? )
"is doing that with 1/3000 speed because is written in C++" does not look like a valid reason, as C++ and Fortran should perform similarly.

The closest experience I have had to anything like HDF5 must be KDF9 Algol in 1973, the slowest computer I ever used. At least, that is how I remember it.

John

DanRRight
Posted: Tue May 18, 2021 2:27 pm

JohnCampbell wrote:

If "data extraction from HDF5 into binary is around 10-20 MB/s", HDF5 does not look like a good option!
(Is this due to the HDF5 data-management overhead?)
"it does it at 1/3000 of the speed because it is written in C++" does not look like a valid reason, as C++ and Fortran should perform similarly.


It is an additional tool supplied with HDF5 which has slow speed and is written in C++, as I was told by a customer-service rep. I am forced to use it to transform HDF5 files into binary because I cannot read them directly: we have no FTN95-compatible library. Some features from the Fortran 2008 standard need to be there. Maybe they are already implemented in FTN95, but this needs time to investigate.
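
For what it is worth, the HDF5 Fortran wrapper mainly needs the C-interoperability features (ISO_C_BINDING, which is Fortran 2003). Below is a hypothetical sketch of calling the HDF5 C library directly instead; the two function names and constants are the real C API, the integer kinds assume HDF5 1.10 or later (where hid_t is a 64-bit integer), and whether FTN95's C-interoperability support is complete enough is exactly what would need investigating:
Code:
module h5_c_api
  use iso_c_binding
  implicit none
  integer(c_int),     parameter :: H5F_ACC_RDONLY = 0   ! C API constants
  integer(c_int64_t), parameter :: H5P_DEFAULT    = 0
  interface
    function H5Fopen (name, flags, fapl_id) bind(C, name='H5Fopen')
      import :: c_char, c_int, c_int64_t
      character(kind=c_char), intent(in) :: name(*)     ! null-terminated C string
      integer(c_int),     value :: flags
      integer(c_int64_t), value :: fapl_id
      integer(c_int64_t)        :: H5Fopen              ! hid_t (64-bit in HDF5 1.10+)
    end function H5Fopen
    function H5Fclose (file_id) bind(C, name='H5Fclose')
      import :: c_int, c_int64_t
      integer(c_int64_t), value :: file_id
      integer(c_int)            :: H5Fclose
    end function H5Fclose
  end interface
end module h5_c_api

program try_open
  use h5_c_api
  implicit none
  integer(c_int64_t) :: fid
  fid = H5Fopen ('run.h5'//c_null_char, H5F_ACC_RDONLY, H5P_DEFAULT)
  if (fid < 0) stop 'H5Fopen failed'
  if (H5Fclose(fid) < 0) stop 'H5Fclose failed'
end program try_open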