Silverfrost Forums

Using threads

13 Dec 2017 9:39 #20985

Thanks to Ken for providing an example of using multi-threading.

I have taken this and produced a program that tests multi-threading and demonstrates:

how to vary the number of threads, launching them from a DO loop so that the thread count can be set at run time

how to use the same thread-safe routine for multiple threads, keeping private variables local to the routine and shared variables in a module

how to transfer a unique argument to each thread call

how to share work between threads to improve performance (a minimal sketch of the general pattern follows below)
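
Since the attached program sits behind the download link, the following is a minimal sketch of the general pattern only, not the attached code. FTN95 itself does not process !$OMP directives, so this sketch assumes an OpenMP-capable compiler such as gfortran, and the names (shared_data, do_task, n_tasks) are purely illustrative:

    module shared_data
      implicit none
      integer, parameter :: n_tasks = 16
      double precision :: results(n_tasks)      ! shared between all threads
    end module shared_data

    program thread_demo
      use shared_data
      use omp_lib, only : omp_set_num_threads
      implicit none
      integer :: i, n_threads

      print *, 'Enter number of threads'
      read (5,*) n_threads                      ! thread count chosen at run time
      call omp_set_num_threads (n_threads)

      !$OMP PARALLEL DO SHARED (results) PRIVATE (i)
      do i = 1, n_tasks                         ! iterations are shared between threads
        call do_task (i)                        ! a unique argument for each call
      end do
      !$OMP END PARALLEL DO

      print *, 'sum of results =', sum(results)
    end program thread_demo

    subroutine do_task (task)
      use shared_data
      implicit none
      integer, intent(in) :: task
      double precision :: x                     ! locals are private to each thread
      x = dble(task)
      results(task) = x*x                       ! each call writes only its own element
    end subroutine do_task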

The attached example works for both 32-bit and 64-bit applications.

I hope it helps others to create useful solutions.

John

https://www.dropbox.com/s/5u4ojctkshhq87z/ken_thread_test4.zip?dl=0

13 Dec 2017 1:57 #20986

The failure of 64 bit CLOCK@ (and DCLOCK@) has now been fixed for the next release of clearwin64.dll.

5 Apr 2019 1:27 #23434

Ken,

Have you made any progress with the threading?

It would be good to be able to use a DO loop to manage the number of available threads: we could have a DO loop over the available threads, or over the tasks to perform, and then associate the thread number with each task. I am having a bit of a problem managing the private DO index for each thread (see the sketch below).
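
One common way around the private-index problem, sketched below with illustrative names (worker and process_task are not FTN95 routines; id is assumed to run from 0 to np-1), is to make the index a local variable of the routine each thread executes and stride the loop by the thread count, so the iterations interleave:

    subroutine worker (id, np, n_tasks)
      implicit none
      integer, intent(in) :: id, np, n_tasks
      integer :: i                     ! local, hence private to this thread
      do i = 1 + id, n_tasks, np       ! thread id handles iterations 1+id, 1+id+np, ...
        call process_task (i)
      end do
    end subroutine worker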

Interested to hear how you proceeded.

John

5 Apr 2019 9:22 #23439

John,

Afraid this got put on the back burner last year when I was involved in some work for an arbitration - which consumed all my time. Thereafter I decided it was time for a change in the direction of my career, so I gave up work at the end of 2018. Now have my own one man business up and running and I am presently focusing on new clients - with some success 😄 , but not yet found the time to come back to this, although I do have a long list of 'what happens if' scenarios I need to test. I will get back to this - once the business clears the Director's Loan and pays me a dividend (July hopefully).

Ken

30 Aug 2020 10:40 #26296

Paul, Ken and others,

I have posted a multi-thread example using !$OMP in http://forums.silverfrost.com/viewtopic.php?t=4297&start=15

I am wondering how well this may be reproduced in the FTN95 parallel processing approach. The basic !$OMP PARALLEL DO loop is a minimal approach to parallel processing. The code example is:

      call omp_set_num_threads (4)
!
!$OMP PARALLEL DO                                   &
!$OMP&   SHARED ( block_array_records, max_blocks ) &
!$OMP&   PRIVATE ( i )                              &
!$OMP&   SCHEDULE (DYNAMIC)
      do i = 1, max_blocks
        if ( block_array_records(i)%block_size <= 0 ) cycle
!
        call process_block ( i, block_array_records(i)%block_size, block_array_records(i)%block )
!
      end do  ! i
!$OMP END PARALLEL DO

In this approach, a DO loop is processed using multiple threads. The DO loop index "i" is a special private variable, unique to each thread/process, so the index has a different memory address in each thread/process. I have struggled with this in my FTN95 testing.

"max_blocks" defines the loop count, so a variable defines the number of events to be processed, while "call omp_set_num_threads (4)" defines the number of threads that process those events. Each event is packaged as a task and handled through "call process_block".

An important distinction in OpenMP is between SHARED and PRIVATE variables. This could be managed in FTN95, as in OpenMP, by making all shared variables/arrays arguments to process_block, while all private variables/arrays are declared as locals in the called routine (apart from the private index "i"). Can FTN95 allow a general routine like process_block, with flexibility in the arguments?

Returned values can go either into the shared arrays or into a shared accumulator. "!$OMP& REDUCTION(+ : n_dot)" accumulation could be emulated via an argument array n_dot(max_blocks), returning the per-iteration values and summing them after the end of the loop. The argument "block_array_records(i)%block_size" provides a unique address to the routine via "i".

While processing, we need to know both the loop counter "i" and the thread id, "id = omp_get_thread_num ()". The allocation of threads to loop iterations is controlled by "!$OMP& SCHEDULE (DYNAMIC)": each successive iteration "i" is given to the next thread that becomes available. "SCHEDULE (STATIC)" is the alternative, where each loop iteration "i" is assigned to a pre-defined thread "id". These two thread-allocation options would be needed for load management between threads.
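
A minimal sketch of that reduction emulation, with the per-block work replaced by a stand-in calculation (compiled without OpenMP, the directives are treated as comments and the loop simply runs serially):

    program reduction_emulation
      implicit none
      integer, parameter :: max_blocks = 1000
      double precision :: n_dot(max_blocks), total
      integer :: i

      !$OMP PARALLEL DO SHARED (n_dot) PRIVATE (i)
      do i = 1, max_blocks
        n_dot(i) = dble(i)*dble(i)     ! stand-in for the work done on each block
      end do
      !$OMP END PARALLEL DO

      total = 0d0
      do i = 1, max_blocks             ! serial accumulation after the parallel loop
        total = total + n_dot(i)
      end do
      print *, 'total =', total
    end program reduction_emulation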

Are you able to comment on which of these approaches are available, or may become available, for multi-core processing with 64-bit FTN95?

30 Aug 2020 1:08 #26297

John

Sorry but my knowledge of this subject is very limited.

31 Aug 2020 8:57 #26299

John, this is a variation on one of the examples for the parallel processing approach. Unlike Gfortran, with FTN95 you cannot simply define a section of code to be executed in parallel. So all serial code prior to the parallel section must be within the IF( .not. IsSlaveProcess@()) THEN ...... END IF block.

It took me ages to get this example to work this way, and then I went off to do something else and never came back to it.

    program main
    implicit none
    INCLUDE <windows.ins>
    DOUBLE PRECISION start_time,end_time,sum
    double precision duration, sum1  
    DOUBLE PRECISION,allocatable::partial_answer(:)
    INTEGER(kind=4) ID
    INTEGER(kind=4) k
    integer(kind=4) :: np=4, i   ! default np; slave processes skip set_parameters and keep this value
    
!>> TEST TO FIND MAIN PROCESS.  Note if IF/ENDIF is commented out, the subroutine is called NP times
     IF( .not. IsSlaveProcess@()) THEN
        call set_parameters(np)
     ENDIF

!>>   Start np-1 additional tasks. ID will be returned thus:
!>>   Master task ID=0 
!>>   Slave tasks ID=1,...,np-1 in the different processes

      ID=GetParallelTaskID@(np-1)    !##
      IF(ID .eq. 0) print*, 'Number of processors', np

!>>   Allocate a shared array. The SHARENAME string couples the ALLOCATE with the parallel task mechanism so that all processes map the same memory
      ALLOCATE(partial_answer(np),SHARENAME='shared_stuff')
      CALL TaskSynchronise@()

!>>   Time the task using wall clock elapsed time    
      CALL dclock@(start_time)
      sum=0d0

!>>   All np processes compute the sum in an interleaved fashion   
      k = 10000000000_4 - ID      ! FTN95 KIND=4 is a 64-bit integer, so this literal fits
      DO WHILE (k > 0)
        sum = sum + k
        k = k - np
      END DO

!>>   Copy the partial sum into the array shared between the processes    
      partial_answer(ID+1)=sum
      CALL TaskSynchronise@()
      CALL dclock@(end_time)
      IF(ID==0)THEN
!>>     We are the master task, so print out the results and the timing    
        sum1 = 0.d0
        do i = 1, np
          sum1 = sum1 + partial_answer(i)
        end do
        PRINT *,'Sum=',sum1
        duration=end_time-start_time
        PRINT *,'Parallel computation time = ',duration
      ENDIF
      CALL TaskSynchronise@()

!>>   Kill off the slave process    
      IF(ID .ne. 0) STOP

      DEALLOCATE(partial_answer)

  END PROGRAM

  subroutine set_parameters(np)
  implicit none
  integer(kind=4), intent(out) :: np
10  write(6,*)
    write(6,*) 'Enter number of processors to use'
    read(5,*) np
    if (np .lt. 1) goto 10
  end subroutine set_parameters

19 Apr 2022 10:18 #28922

I was wondering if there has been any progress on this? Has anyone used parallelisation successfully in FTN95?

15 Jan 2024 9:23 #30955

Was this fixed for /64?

15 Jan 2024 10:45 #30956

Dan

As far as I know there are no outstanding issues on this subject (no bugs that need fixing).
