To run programs in parallel, this compiler has one neat trick related to .NET which I'd like to use. Here are two small programs which illustrate the approach. The first is straightforward: it uses a separate subroutine for each thread, works fine, and gives a speedup almost proportional to the number of processors. The second is more general, simpler and smaller, and lets you divide the workload among an arbitrary number of threads, so it is much more practical, but for some unknown reason it is unstable.
What is in the programs: just DO loops doing some fake simulation 200M times. The first DO loop runs on a single processor and gives an estimate of the usual single-threaded CPU time. Then, in the first program, which works fine, I explicitly start 4 threads, and each thread takes 1/4 of the DO loop iterations. (My PC has a 4-core processor, and although it can run 8 independent threads, the speedup is of course not 8 times because there are only 4 floating point units per CPU.) In the second program, which is unstable, I use another DO loop which creates and starts 4 threads using one single subroutine, where the workload is automatically divided among the required number of threads (I took 4 for the reason mentioned above). The first program withstands any torture, but it is not how parallel programs should be written for 10, 100 or more processors. With the second I cannot even get the notification that the first thread has started...
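To make "automatically divided" concrete, here is a minimal sketch in plain standard Fortran with no threading at all. It is neither of the two programs discussed here; it only shows the arithmetic a single worker subroutine could use to find its own share of the loop from its worker number. The names chunk_bounds, first, last and covered are hypothetical and are not FTN95 library routines.

```fortran
! Sketch only: split n_total loop iterations across n_threads workers.
! chunk_bounds and all variable names are illustrative assumptions.
program split_demo
  implicit none
  integer, parameter :: n_total = 200000000
  integer :: n_threads, k, first, last, covered

  n_threads = 4
  covered = 0
  do k = 1, n_threads
    call chunk_bounds(n_total, n_threads, k, first, last)
    print *, 'worker', k, ' iterations', first, ' to', last
    covered = covered + (last - first + 1)
  end do
  ! The ranges must not overlap and must add up to n_total
  print *, 'iterations covered =', covered

contains

  ! Return the first and last iteration assigned to worker k of nthr
  subroutine chunk_bounds(n, nthr, k, first, last)
    integer, intent(in)  :: n, nthr, k
    integer, intent(out) :: first, last
    integer :: base, extra
    base  = n / nthr        ! iterations every worker gets
    extra = mod(n, nthr)    ! leftovers, one each to the first 'extra' workers
    first = (k - 1)*base + min(k - 1, extra) + 1
    last  = first + base - 1
    if (k <= extra) last = last + 1
  end subroutine chunk_bounds

end program split_demo
```

Handing out the remainder one iteration at a time keeps the shares within one iteration of each other even when the thread count does not divide the loop length evenly, which a plain 200000000/nThreads does not guarantee.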
Please run both programs and tell me what you get. Any hints on making the second example stable are appreciated.
/* The code is almost 100% standard Fortran, except for just one new command, LOCK, which avoids conflicts between threads and needs no explanation, and calls to a couple of FTN95 library functions. Also, after playing a bit with the code you can make the first loop 10 times shorter so it will not annoy you. By the way, an example with just two threads is here: https://forums.silverfrost.com/Forum/Topic/1894&postdays=0&postorder=asc&highlight=f2003&start=30
! Multithreading example parallel4a.f95
! Dan R Right 2012
!
! Watch in Task Manager how four threads run1,... run4
! grab four processor cores working in parallel
!
! Compilation: ftn95 parallel4a.f95 /clr /link /multi_threaded
!
include <clearwin.ins>
EXTERNAL run1, run2, run3, run4
common /abc_/kEnded1, kEnded2, kEnded3, kEnded4
common /abc2_/itotal
common /threads_/nThreads
character*1 cha
!..... straight non-threaded run (single-threaded reference timing)
print*,' Run w/o threads started'
nThreads=1
call clock@ (time_start)
itotal = 0
d=2.0
do i=1,200000000/nThreads
  d=alog(exp(d))
  itotal = itotal +1
enddo
call clock@ (time_finish)
print*, 'Elapsed time without threads=', time_finish-time_start, itotal
!...multithreaded run
nThreads = 4
call clock@ (time_start)
itotal = 0
!...clear the completion flags before any thread is created,
!   so the wait loop below never reads uninitialised values
kEnded1=0; kEnded2=0; kEnded3=0; kEnded4=0
CALL CREATE_THREAD@(run1,21)
CALL CREATE_THREAD@(run2,22)
CALL CREATE_THREAD@(run3,23)
CALL CREATE_THREAD@(run4,24)
!...wait till all threads end, polling the flags
do while (kEnded1==0.or.kEnded2==0.or.kEnded3==0.or.kEnded4==0)
  call sleep1@(0.1)
enddo
call clock@ (time_finish)
print*, 'Elapsed time with threads=', time_finish-time_start, itotal
print*, 'Enter any key+Enter to exit'
read(*,*) cha
END
!=============================================================
subroutine run1()
include <clearwin.ins>
common /abc_/kEnded1, kEnded2, kEnded3, kEnded4
common /abc2_/itotal
common /threads_/nThreads
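lock; print*,'Thr.1 started'; end lock
d =2.0
itot =0                     ! local counter for this thread's subroutine
kEnded1=0
do i=1,200000000/nThreads   ! this thread's share of the iterations
  d=alog(exp(d))
  itot = itot+1
enddo
!...add the local count to the shared total under the lock
lock; itotal = itotal +itot ;end lock
kEnded1=1                   ! tell the main program this thread is done
lock; print*,'Thr.1 ended' ; end lock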
end