Silverfrost Forums


Fails to save arrays > 4GB

20 Jul 2023 12:45 #30461

Dan,

So the answer is the latest FTN95 does not crash with your latest example.

There are 2 problems with this test example which you should also consider.

  1. CPU_TIME is not appropriate for estimating disk I/O performance. This can be clearly seen from your Method 2, where the processor is waiting for the I/O operations to complete. It shows up as a difference between CPU time and wall clock time, which I included in my modified example (see the sketch after this list).

  2. The order of the disk tests is also an issue, as repeating a test changes the disk and memory environment. For the Method 2 read, the read re-uses the identical (large) buffer that the previous write just filled, so the information is already in the I/O buffer; apparently the Windows I/O system recognises this and does not do any physical read. The resulting GByte/sec figure is not a realistic read rate estimate.
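To illustrate point 1, the following is a minimal sketch (not the posted test; the file name and array size are made up) that times one large unformatted write with both CPU_TIME and SYSTEM_CLOCK. While the processor waits for the OS or disk, the CPU clock stops but the wall clock does not, so a rate based on CPU_TIME alone is inflated:

program cpu_vs_wall
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: n  = 100000000       ! ~0.8 GByte of real(dp) data (made-up size)
   real(dp), allocatable :: a(:)
   real(dp) :: cpu0, cpu1, wall, cpu, gbytes
   integer  :: c0, c1, rate, lu, iostat

   allocate (a(n)) ; a = 1.0_dp
   gbytes = 8.0_dp*n/1.0d9
   lu = 11
   open (unit=lu, file='timer_test.dat', form='unformatted', &
         access='stream', status='replace')

   call cpu_time (cpu0)
   call system_clock (c0, rate)
   write (lu, iostat=iostat) a                 ! one large write
   call system_clock (c1)
   call cpu_time (cpu1)

   wall = dble(c1-c0)/dble(rate)               ! elapsed (wall clock) seconds
   cpu  = max (cpu1-cpu0, 1.0d-6)              ! processor seconds actually used
   write (*,'(a,2f9.3)') ' elapsed, cpu (s) :', wall, cpu
   write (*,'(a,2f9.3)') ' GB/s (wall, cpu) :', gbytes/wall, gbytes/cpu
   close (lu, status='delete')
end program cpu_vs_wall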

Stream I/O satisfies the portability requirement between FTN95 and Gfortran.

Perhaps the next horizon is endianness !!

20 Jul 2023 11:41 #30463

Paul, based on your timings I was scratching my head wondering what kind of computer you ran this test on. I do not know if such a computer even exists in nature yet 😃. Aliens may have one which runs Test1 with such a huge boost. Was this because of caching? But I have not seen such a tremendous effect of caching on Test1; all my tests ran well below 1 GB/s. Were some secret compiler keys used?

Trying to allocate GB of RAM : 8.80000000000
Allocation success
Trying to save the data Method 1
Write OK. Speed of write Method 1 = 1.36699
=====================
================ N O W R E A D ====================
READ OK. Speed of read Method 1 = 3.04432
=============================
Trying to save the data Method 2
Write OK. Speed of write Method 2= 6.54884
=====================
================ N O W R E A D ==================
READ OK. Speed of read Method 2 = 1.23239
======================
**** PAUSE: File LargeFile.dat created OK

If this is a timer problem, maybe it would be good to introduce some new, more realistic timer.

21 Jul 2023 5:51 #30464

Dell Inspiron 27 7710 All-in-One

Processor: 12th Gen Intel(R) Core(TM) i7-1255U 1.70 GHz
Installed RAM: 16.0 GB (15.7 GB usable)
System type: 64-bit operating system, x64-based processor
Pen and touch: No pen or touch input is available for this display

Windows 11 Home

512 GB Solid State Drive (M.2 SSD) + 1 TB Serial ATA (SATA)
Intel Iris Xe Graphics

21 Jul 2023 7:35 (Edited: 21 Jul 2023 8:26) #30465

I have no explanation for your numbers vs my AMD 5950X processor, 128 GB RAM and WD850X NVMe storage, besides the fact that you have DDR5 vs my DDR4:

https://nanoreview.net/en/cpu-compare/intel-core-i7-1255u-vs-amd-ryzen-9-5950x

But I have never seen even super-duper fast memory give more than a 10% difference.

Have you modified my test? I see some signs of edits.

Does anybody here have a 12th- or 13th-generation Intel Core processor and see such speeds with Method 1? Or an AMD 5950X/7950X with DDR5 and PCIe 5.0? Use my test from page 4 above, nothing else.

21 Jul 2023 10:42 #30466

Quoted from DanRRight If this is a timer problem, maybe it would be good to introduce some new, more realistic timer.

There is already a timer that solves the timer problem : SYSTEM_CLOCK

I have provided a modified version of your test that shows the ratio of CPU time to wall clock time and also tests a variety of block sizes.

https://www.dropbox.com/s/kxk7e0z1fbiuyq4/read_write2.f90?dl=0

https://www.dropbox.com/s/exqqpwzsb0efiwg/stream_tests.log?dl=0

You were incorrectly estimating much higher transfer rates because CPU_TIME excludes the time the CPU spends waiting on the I/O buffer delays.

With your old method 1, you were writing 200 million records of 44 bytes, which is 200 million writes. That is why it was so much slower. It is not hard to do fewer writes of larger records, as I have demonstrated in this example.
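As a sketch of the blocking idea only (not the linked read_write2.f90; the sizes and file name are made up), the same data can be written as many tiny records or as a few large blocks simply by changing one parameter, and the number of write statements changes accordingly:

program block_writes
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: nwords = 100000000      ! ~0.8 GByte total (made-up size)
   integer, parameter :: nblock = 1000000        ! words per write: try 6 (~48 bytes per
                                                 ! write) versus 1000000 to see the effect
   real(dp), allocatable :: a(:)
   integer  :: lu, i, iostat, c0, c1, rate
   real(dp) :: sec

   allocate (a(nwords)) ; a = 1.0_dp
   lu = 11
   open (unit=lu, file='block_test.dat', form='unformatted', &
         access='stream', status='replace')

   call system_clock (c0, rate)
   do i = 1, nwords, nblock                      ! one write statement per block
      write (lu, iostat=iostat) a(i:min(i+nblock-1, nwords))
   end do
   call system_clock (c1)

   sec = dble(c1-c0)/dble(rate)
   write (*,'(a,i0,a,f8.3,a,f8.3,a)') ' block size ', nblock, ' words: ', &
         sec, ' s,  ', 8.0_dp*nwords/1.0d9/sec, ' GB/s'
   close (lu, status='delete')
end program block_writes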

The Method 2 read performance is very interesting: there is actually no read from the file, because the file buffers still contain the last record from the write. So it is not a true read test.

It is not a timer problem but a test problem.

25 Jul 2023 5:53 #30476

Paul, your test results raised a few questions because they are very unusual for Test1. What memory type does your PC have: DDR4 or DDR5? Is the bus PCIe 4 or PCIe 5? What storage type: NVMe or hard drive? Did you edit my test on Page 4? Can you repeat my test one more time, also measuring the time by hand watch (approximately is also OK), or can you use SYSTEM_CLOCK (not adding it to the test but substituting it for mine), since JohnC claims this is a problem with the timer? (That claim looks doubtful for the Test1 results, because the test lasts a long time, so the differences between timers should be small.)

25 Jul 2023 6:51 #30477

My machine is 'out of the box' as described above. I don't recall needing to change your code.

With my limited knowledge in this area I think that the main issue is that it has '512 GB Solid State Drive (M.2 SSD)' and 16 GB of RAM.

Also '12th Gen Intel(R) Core(TM) i7-1255U 1.70 GHz'.

25 Jul 2023 12:13 #30478

Dan,

Just publish a test that uses SYSTEM_CLOCK rather than CPU_TIME: if the processor is idle during I/O interrupts, you just get a wrong result. It is BS to persist with a CPU_TIME test.

A 12th gen motherboard may benefit from PCIe 4.0 NVMe connection.

25 Jul 2023 9:26 #30479

cpu_time - bad timer?

26 Jul 2023 7:03 #30480

Quoted from DanRRight cpu_time - bad timer?

Why ask the question again? You have been quoting disk I/O transfer rates in GBytes per second, not bytes per used processor cycle.

Clearly, the time when the processor is waiting for the disk I/O to become available should not be excluded from the estimate of the disk I/O transfer rate.

The other problem with these tests is that we are reporting the rate for transferring information to or from the operating system file buffers. We don't know how well this equates to the actual disk reading or writing speeds. I am not aware of how to verify that a read or write has actually waited for the buffers to be emptied to disk.

For the Method 2 read, we are basically retrieving the same block of information that was just written. The OS should recognise that this is still available in the buffer, so no disk access is required, unless the disk buffers are not big enough. The two PCs I use have 32 GB or 64 GB of memory, so my tests never exhaust the disk buffers. Paul's PC with 16 GB of memory might exhaust the buffers and so report a lower read rate.

Yet another problem is that there are also buffers in the SSD drives, so we can get very different times if we overflow the SSD's (faster memory) buffers.

All things considered, it is very difficult to know what is being reported, which I think the SSD manufacturers rely on when quoting their rates.

I also have an HP notebook with an SSD but only 8 GB of memory. This reports much lower transfer rates, which is not surprising.

I think we can conclude that writing 2.e8 buffers of 44 bytes is much slower than one large block, but there is possibly/probably a middle-ground block size that better suits the Fortran unformatted read/write library.

Write-then-read tests always benefit from pre-charged disk buffering. A different test could be reading a terabyte stream file, but the logistics of doing that test can be difficult. And what would it show ?

Remaining puzzled by the results !!

26 Jul 2023 8:15 #30483

Quoted from JohnCampbell

Write OK. Speed of write Method 1 = 0.360 0.359 GB/sec
READ OK. Speed of read Method 1 = 0.462 0.455 GB/sec

Huge improvement of your new method vs the old method, 0.360 vs 0.359, right ? Which numbers was I interested to see when asking Paul to repeat the test? The Method 1 numbers, because they show a 2x improvement versus mine.

Why did I ask this? Because there are a couple of major differences between his PC and mine: the processor type and DDR4 vs DDR5 memory. I have never seen 2x differences with same-generation processors, and no matter how fast the memory I used, I never got more than 10-20% differences before. If some specific processor/memory combination makes such a huge difference, that could potentially improve my numerical simulation speed 2x if I switch. So with Method 1 I was actually not interested in the speed of the hard drive (I will use Method 2 for that) but in the speed of some other computer subsystem this test abruptly revealed.

Why was I not interested in your 'improvements'? Because my Method 1 runs for a long time, so there will be no difference between any timers. Method 2 is much faster, but it will be limited by the speed of the PCIe bus / the drive itself anyway, and I also need it only to read/write huge files, where the test runs for a long time.

If you are interested in a general timer improvement, try to convince Silverfrost to create a new timer. Though FTN95 needs other, much more important things improved to be a compiler of the 21st century. But when something here needs improvement, you quietly vote with your feet for other compilers where all those things are already implemented.

27 Jul 2023 3:19 #30484

Dan,

I don't think it is too difficult to understand the difference between elapsed time and CPU allocated time. Clearly the difference is more significant with Method 2, where there are more disk wait delays.

In my previous results, linked in my post of Fri Jul 21, 2023 8:42 pm, I tried to better identify this problem by also reporting % CPU usage as a proportion of elapsed time.

============ Trying to save the data Method 2 ============
Method 2 write  6.7683 el  2.9687 cp    43.86 %
Write OK.  Speed of write  Method 2=   2.964   1.300 GB/sec

============ Now read Method 2 ============
Method 2 read   2.0316 el  1.6719 cp    82.29 %
READ OK. Speed of read   Method 2 =   5.264   4.332 GB/sec

In my most recent post I tried to explain the many reasons why there are problems with testing IO performance and also explain how the use of CPU_TIME significantly overestimated the Method 2 performance.

Because there is a lot more CPU usage for Method 1, the alternative rate estimates are not so inconsistent, which I expected you would have understood.

The only improvement for FTN95 would be to provide an INTEGER*8 function rdtsc_ticks() (and an INTEGER*8 function rdtsc_tick_rate()), although this is not required for these tests. This should be provided by all compilers !
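As an aside: on compilers that accept 64-bit arguments to SYSTEM_CLOCK, a similar pair of tick functions can already be wrapped around it. This is only a sketch of that idea, not an FTN95 feature; the resolution depends on the compiler (gfortran returns a nanosecond-resolution count for 64-bit arguments, other compilers may differ):

module tick_timer
   ! Sketch: tick counter / tick rate wrappers built on SYSTEM_CLOCK with
   ! 64-bit integer arguments.  Not a true RDTSC reading; resolution and
   ! support for 64-bit arguments vary by compiler.
   implicit none
   integer, parameter :: i8 = selected_int_kind(18)
contains
   function clock_ticks () result (ticks)
      integer(i8) :: ticks
      call system_clock (count=ticks)
   end function clock_ticks

   function clock_tick_rate () result (rate)
      integer(i8) :: rate, dummy
      call system_clock (count=dummy, count_rate=rate)
   end function clock_tick_rate
end module tick_timer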

10 Feb 2024 8:29 #31087

I have tidied up a program that uses stream I/O to replicate FTN95 unformatted sequential access files for larger records.

This approach takes the FTN95 scheme of a 1-byte header (for records up to 240 bytes long) or a -1 flag followed by a 4-byte length, and extends it with a -1, -1 flag followed by an 8-byte length as the header/footer for records larger than 2^31 - 9 bytes.

This test creates a test file of 12 GBytes (max record size about 3 GBytes). I ran it on a 500 GByte NVMe SSD drive, so it may be slower on a slower drive.

Hopefully it demonstrates the 1-4-8 record header approach that could be a possible extension for FTN95.

Alternatively, this example might be adapted to the Gfortran / Ifort approach of 2^31 - 9 byte sub-records.

Basically Stream I/O is a very flexible platform for creating file record structures.

I thought this example may be of interest to those wanting to manage larger file records.

Let me know if you like or don't like this approach.

https://www.dropbox.com/scl/fi/n3d61rw15occbvs9vl1ik/read_large_record_v3.f90?rlkey=5h9d5kip1oaj6wzt7364xhg5w&dl=0

https://www.dropbox.com/scl/fi/r25m5a99k90950847rrxe/read_large.log?rlkey=l5kuymezc1lfbezwsln9zl0go&dl=0
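To make the header idea described above concrete, here is a rough sketch of a record writer using that 1 / -1+4 / -1-1+8 convention over stream I/O. It is not the linked read_large_record_v3.f90 and it is not a definition of the actual FTN95 file format; the byte layout is only illustrative:

subroutine write_record (lu, bytes, nbytes)
   ! Sketch only: append one record with a length header to a file that
   ! was opened with access='stream', form='unformatted'.
   implicit none
   integer, parameter :: i1 = selected_int_kind(2)    ! 1-byte integers
   integer, parameter :: i4 = selected_int_kind(9)    ! 4-byte integers
   integer, parameter :: i8 = selected_int_kind(18)   ! 8-byte integers
   integer,     intent(in) :: lu                      ! stream I/O unit
   integer(i1), intent(in) :: bytes(*)                ! record payload as bytes
   integer(i8), intent(in) :: nbytes                  ! record length in bytes
   integer(i1) :: flag
   integer(i4) :: len4
   integer(i8) :: len8

   if (nbytes <= 240) then                        ! short record: 1-byte header
      flag = int (nbytes, i1)
      write (lu) flag
   else if (nbytes <= 2147483639_i8) then         ! medium: -1 flag + 4-byte length
      flag = -1
      len4 = int (nbytes, i4)
      write (lu) flag, len4
   else                                           ! large: -1, -1 flags + 8-byte length
      flag = -1
      len8 = nbytes
      write (lu) flag, flag, len8
   end if
   write (lu) bytes(1:nbytes)                     ! payload (a real version would
                                                  ! probably write this in chunks)
   ! a matching footer would normally follow so the file can also be read backwards
end subroutine write_record

A matching reader would inspect the first byte and, on finding the -1 (or -1, -1) flag, read a further 4 (or 8) bytes to get the record length.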

9 Mar 2024 4:57 #31239

I have made progress on testing write/read of large records, now up to 25 GBytes. These now work for unformatted sequential read/write and also for stream read/write.

They should work on the next release of FTN95 (Ver 9.0?). For records larger than 2 GBytes, the unformatted sequential header/trailer is now 9 bytes, where byte 1 = -2 indicates an 8-byte size value.

I have also been looking at the I/O speeds for PCIe SSDs (I think mine is a Ver 3). The rates they quote are a bit misleading.

On my PC, which has 64 GBytes of physical memory, if I do a write-then-read test for a 32 GByte file, I can get write speeds up to 2.8 GBytes/sec, unformatted reads up to 4.0 GBytes/sec and stream reads over 7.5 GBytes/sec. These high read speeds are basically because the file is still stored in the in-memory disk buffers.

If I split this test program into 3 separate programs or increase the file size to more than 64 GBytes, the performance declines:

For writes, the speed reduces as the file size increases, from 3.0 GBy/sec for the first record to 0.8 GBy/sec for most others. This is due to overflowing the available memory buffers and also the SSD's internal buffers.

For sequential reads, the speed starts at 0.28 GBy/sec for the first record, then improves to 0.9 GBy/sec. This is due to the limited available memory buffers.

For stream read, speed starts at 2.15 GBy/sec for the first record, then declines gradually to 1.5 GBy/sec.

These read speeds are much lower than in the case where reads followed writes and the file size was much less than physical memory. This is due to the limited available memory buffers.

On the plus side: if the file is buffered in memory by the OS, a stream re-read can achieve great rates in moving the data from OS memory to program memory (over 8 GBy/sec in some cases). On the negative side: if the SSD buffers are full or the OS memory buffers are not pre-loaded, the transfer rates are far lower than the disk technology quotes. Buffering is great when it works!

In these tests I used records from 1 GByte to 25 GBytes, so large records do now work. However, the I/O lists I am using are very basic: 'write (lu, iostat=iostat) vector' or 'read (lu, iostat=iostat) vector', so more complex I/O lists may behave differently. Even 'read (lu, iostat=iostat) vector(1:nn)' crashes with a Vstack error, while 'read (lu, iostat=iostat) (vector(k), k=1,nn)' is much slower for a well-buffered case.

We can now save arrays much larger than 4 GBytes. I will post the program that successfully tests up to 25 GByte records and 66 GByte files. If using a stream read, it is also relatively easy to read FTN95, Gfortran or Ifort unformatted sequential records, provided the I/O list is not too complex. Files created with stream I/O should be much more portable.
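As an illustration of that portability point, here is a sketch of reading one foreign unformatted sequential record through a stream connection. It assumes the common gfortran/ifort layout of a 4-byte length marker before and after each record, no sub-records and matching endianness; the file name is made up:

program read_foreign_record
   implicit none
   integer, parameter :: i1 = selected_int_kind(2)    ! 1-byte integers
   integer, parameter :: i4 = selected_int_kind(9)    ! 4-byte integers
   integer(i4) :: len_head, len_tail
   integer(i1), allocatable :: payload(:)
   integer :: lu, iostat

   lu = 11
   open (unit=lu, file='gfortran_file.dat', form='unformatted', &
         access='stream', status='old', action='read', iostat=iostat)
   if (iostat /= 0) stop 'cannot open file'

   read (lu, iostat=iostat) len_head                  ! leading 4-byte record marker
   if (iostat /= 0) stop 'read error on record header'
   allocate (payload(len_head))
   read (lu, iostat=iostat) payload                   ! record payload, len_head bytes
   read (lu, iostat=iostat) len_tail                  ! trailing marker, should match
   if (len_tail /= len_head) stop 'record markers do not match'

   write (*,*) 'read a record of', len_head, ' bytes'
   close (lu)
end program read_foreign_record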

The Vstack error at 8 GBytes was a surprise, so there could be more surprises at greater sizes ?
