replica nfl jerseysreplica nfl jerseyssoccer jerseyreplica nfl jerseys forums.silverfrost.com :: View topic - Two multithreading programs
forums.silverfrost.com Forum Index forums.silverfrost.com
Welcome to the Silverfrost forums
 
 FAQFAQ   SearchSearch   MemberlistMemberlist   UsergroupsUsergroups   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

Two multithreading programs
Goto page 1, 2  Next
 
Post new topic   Reply to topic    forums.silverfrost.com Forum Index -> General
View previous topic :: View next topic  
Author Message
DanRRight



Joined: 10 Mar 2008
Posts: 2923
Location: South Pole, Antarctica

PostPosted: Thu Apr 25, 2013 5:59 pm    Post subject: Two multithreading programs Reply with quote

To run the programs in parallel this compiler has one neat trick related to NET which i'd like to use. Here are two small programs which illustrate the approach. One is straightforward, it uses separate subroutines for each thread and works fine, it gives you speedup almost proportional to amount of processors. And another, of more general form, simpler and smaller and where you can divide the workload on arbitrary amount of threads, and hence much more practical, by unknown reason is unstable.

What is in the program - just the DO loops doing some fake simulation 200M times. First do loop works on single processor and gives you estimation of usual single-threaded CPU time. Then in the first program, which works fine, i explicitly start 4 threads (my PC has 4-core processor, and though it can run 8 independent threads the speedup is not 8 times of course because of only 4 floating point units per CPU) each thread takes 1/4 of the DO loop cycles. In the second program, which is unstable, I use another DO loop which creates and starts 4 threads using one single subroutine where the workload is automatically divided on needed amount of threads (i took 4 by mentioned above reason).
While first program sustains any torture, but is not how parallel programs have to be written with 10, 100 or more processors, with the second I can not even get notification that first thread is started...

Please run both programs and tell me what you get. Any hints of making the second example stable are appreciated.

/* Code is almost 100% standard Fortran, except just one new command LOCK to avoid conflicts of threads which does not need explanation and calls to the couple FTN95 library functions. Also, after playing a bit with the code you can decrease first cycle 10 times so it will not annoy you.
By the way the example for just two threads is here http://forums.silverfrost.com/viewtopic.php?t=2176&postdays=0&postorder=asc&highlight=f2003&start=30

Code:

! Multithreading example parallel4a.f95
! Dan R Right 2012
!
! Watch in Task Manager how four threads run1,... run4
! grab four  processor cores working in parallel
!
! Compilation: ftn95 parallel4a.f95 /clr /link /multi_threaded
!
  include <clearwin.ins>
  EXTERNAL run1, run2, run3, run4
  common /abc_/kEnded1, kended2, kended3, kended4 
  common /abc2_/itotal
  common /threads_/nThreads
  character*1 cha

!..... straight non-threaded run

  print*,' Run w/o threads started'
  nThreads=1
  call clock@ (time_start)
     itotal = 0
     d=2.0

      do i=1,200000000/nThreads
       d=alog(exp(d))
       itotal = itotal +1   
      enddo

  call clock@ (time_finish)
  print*, 'Elapsed time without threads=', time_finish-time_start, itotal

!...multithreaded
  nThreads = 4
  call clock@ (time_start)
 
    itotal = 0
    CALL CREATE_THREAD@(run1,21)
    CALL CREATE_THREAD@(run2,22)
    CALL CREATE_THREAD@(run3,23)
    CALL CREATE_THREAD@(run4,24)

 !...wait till all threads end
  do while (kEnded1==0.or.kEnded2==0.or.kEnded3==0.or.kEnded4==0)
    call sleep1@(0.1)
  enddo



  call clock@ (time_finish)

  print*, 'Elapsed time with threads=', time_finish-time_start, itotal
  print*, 'Enter any key+Enter to exit'
  read(*,*) cha
  END

!=============================================================

  subroutine run1()
  include <clearwin.ins>
  common /abc_/kEnded1, kended2, kended3, kended4 
  common /abc2_/itotal
  common /threads_/nThreads

     lock;   print*,'Thr.1 started'; end lock
     d      =2.0
     itot   =0
     kEnded1=0

      do i=1,200000000/nThreads
       d=alog(exp(d))
       itot = itot+1
      enddo

     lock; itotal = itotal +itot  ;end lock
     kEnded1=1
     lock;   print*,'Thr.1 ended' ; end lock
  end



Last edited by DanRRight on Thu Apr 25, 2013 8:42 pm; edited 17 times in total
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 2923
Location: South Pole, Antarctica

PostPosted: Thu Apr 25, 2013 6:01 pm    Post subject: Reply with quote

...continue...END OF THE FIRST PROGRAM

Code:


!-----------------------------------------
  subroutine run2()
  include <clearwin.ins>
  common /abc_/kEnded1, kended2, kended3, kended4 
  common /abc2_/itotal
  common /threads_/nThreads

     lock;   print*,'Thr.2 started' ;end lock
     d      =2.0
     itot   =0
     kEnded2=0

      do i=1,200000000/nThreads
       d=alog(exp(d))
       itot = itot+1
      enddo

     lock; itotal = itotal +itot  ;end lock
     kEnded2=1
     lock;   print*,'Thr.2 ended' ;end lock
  end

!---------------------------------------------
  subroutine run3()
  include <clearwin.ins>
  common /abc_/kEnded1, kended2, kended3, kended4 
  common /abc2_/itotal
  common /threads_/nThreads

     lock;   print*,'Thr.3 started' ;end lock
     d      =2.0
     itot   =0
     kEnded3=0

      do i=1,200000000/nThreads
       d=alog(exp(d))
       itot = itot+1
      enddo

     lock; itotal = itotal +itot  ;end lock
     kEnded3=1
     lock;   print*,'Thr.3 ended' ;end lock
  end

!---------------------------------------------
  subroutine run4()
  include <clearwin.ins>
  common /abc_/kEnded1, kended2, kended3, kended4 
  common /abc2_/itotal
  common /threads_/nThreads

     lock;   print*,'Thr.4 started' ;end lock
     d      =2.0
     itot   =0
     kEnded4=0

      do i=1,200000000/nThreads
       d=alog(exp(d))
       itot = itot+1
      enddo

     lock; itotal = itotal +itot  ;end lock
     kEnded4=1
     lock;   print*,'Thr.4 ended' ;end lock
  end


Last edited by DanRRight on Thu Apr 25, 2013 6:31 pm; edited 1 time in total
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 2923
Location: South Pole, Antarctica

PostPosted: Thu Apr 25, 2013 6:03 pm    Post subject: Reply with quote

HERE IS SECOND PROGRAM

Code:

! Multithreading example parallelN.f95
! Dan R Right 2013
!
! Watch in Task Manager how threads
! grab processor cores working in parallel
!
! Compilation: ftn95 parallelN.f95 /clr /link /multi_threaded
 
  include <clearwin.ins>
  EXTERNAL runN
  parameter (nThrMax=8)
  common /threads_/nEmployedThreads, iCurrThread, kEnded(nThrMax)
  common /abc1_/itotal1, itotal2, itot(nThrMax)
  character*1 cha

 !................ straight non-threaded run

  kEnded(:) = 1
  itot(:)   = 0

  print*,' Run w/o threads started'
  nEmployedThreads=1
  iCurrThread = 1

  call clock@ (time_start)
     itotal1 = 0
     itot(iCurrThread)=0
     d=2.22

      do i=1,200000000/nEmployedThreads
       d=alog(exp(d))
       itotal1 = itotal1 + 1   
       itot(iCurrThread) = itot(iCurrThread) + 1
      enddo

  call clock@ (time_finish)
  print*, 'Elapsed time without threads=', time_finish-time_start
  print*, 'Checksum: totals using two methods = ', itotal1, itot(iCurrThread)

!...................multithreaded .............

  nEmployedThreads = 4
  call clock@ (time_start)
 
    itotal1 = 0
    itotal2 = 0
    itot(:) = 0

   do iCurrThread = 1, nEmployedThreads
    CALL CREATE_THREAD@(runN,20+iCurrThread)
   enddo

 !...wait till all threads end
  do while (minval(kEnded)==0)
    call sleep1@(0.1)
  enddo

  call clock@ (time_finish)

  print*, 'Elapsed time with threads=', time_finish-time_start
  print*, 'Checksum: totals using three methods = ', itotal1, itotal2, sum(itot)
  print*, 'Enter any key+Enter to exit'
  read(*,*) cha
  END

! =====================================================================
  subroutine runN()
  include <clearwin.ins>
  parameter (nThrMax=8)
  common /threads_/nEmployedThreads, iCurrThread, kEnded(nThrMax)
  common /abc1_/itotal1, itotal2, itot(nThrMax)

     ithr = iCurrThread
     lock;   print*,'Thr. started = ', ithr ; end lock

     d      =2.22
     itot  (ithr)=0
     itot2       =0
     kEnded(ithr)=0

      do i=1,200000000/nEmployedThreads
       d=alog(exp(d))
       itot(ithr) = itot(ithr)+1
       itot2      = itot2 + 1
      enddo

     lock; itotal1 = itotal1 + itot(ithr)  ;end lock
     lock; itotal2 = itotal2 + itot2       ;end lock
     lock; print*, 'Thr.  ended = ', ithr  ;end lock

     kEnded(ithr)=1
  end
Back to top
View user's profile Send private message
davidb



Joined: 17 Jul 2009
Posts: 560
Location: UK

PostPosted: Thu Apr 25, 2013 7:32 pm    Post subject: Reply with quote

In your second program, you should pass the thread number as an argument to the subroutine runN? This is probably what is wrong.

If I can, I will post a working example of multi-threading with FTN95 in a few days; just a little busy at the moment.
_________________
Programmer in: Fortran 77/95/2003/2008, C, C++ (& OpenMP), java, Python, Perl
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 2923
Location: South Pole, Antarctica

PostPosted: Thu Apr 25, 2013 8:13 pm    Post subject: Reply with quote

Thanks for the critical point. Yes, adding integer argument to RunN (m) can reveal the thread handle (the one which is = 20+iCurrThread), which could be used for thread counting.

When I added argument m to RunN and changed one line
ithr = iCurrThread

to
ithr = m-20

all started working OK. So any guess why iCurrThread passed via common was wrong? Is there a chance that anything passed via common could be wrong ? When we clear that and formulate some simple rules what's to do and what's not i think anyone will be able safely write parallel programs using FTN95
Back to top
View user's profile Send private message
davidb



Joined: 17 Jul 2009
Posts: 560
Location: UK

PostPosted: Thu Apr 25, 2013 10:35 pm    Post subject: Reply with quote

Data in common will probably be Shared amongst the threads, so it isn't possible for each thread to have different sets of common variables.

On the other hand, arguments passed as arguments, and local variables in the subroutine, will be stored on the stack. There should be a different stack for each thread.

However, this isn't the point. The subroutine runN requires an integer argument is passed to it. If you leave this out you will be potentially corrupting the call stack frame at each call and the behaviour you wil get will be unpredictable.

You may find it helpful to read up on the difference between shared and private variables in OpenMP as the distinction used there is what you need to be aiming for with your own codes using FTN95. For example, you will learn it is OK for each thread to read data in shared variables, but to write requires the use of locks.
_________________
Programmer in: Fortran 77/95/2003/2008, C, C++ (& OpenMP), java, Python, Perl
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 2923
Location: South Pole, Antarctica

PostPosted: Fri Apr 26, 2013 1:37 am    Post subject: Re: Reply with quote

davidb wrote:
Data in common will probably be Shared amongst the threads, so it isn't possible for each thread to have different sets of common variables.


That transfer of data via common block used that exactly property. Again, as it is seen from the code above, the DO loop sets the variable iCurrThread which is in the common block, then launches the thread with RunN which also has this common block inside, the thread picks up this value, assignes it to local variable ithr to keep it unchanged for this specific thread for further use and what we have got was unbelievable: the variable was immediately different from one it was sent!
So the reason most probably as you have mentioned was corruption due to absent argument in RunM though the first code was working OK without this argument no matter how many threads used. Or the reason was different because the iCurrThread variable is still passed via common with wrong value? But (I am still curious) may be someone got the problem even with the first program example above?

Update: Looks like threads are launched by the DO loop and the code inside the threads start running absolutely independently, no one waits for another and we can not tell in which order. So i suspect this is the reason of wrong iCurrThread: the DO loop succeeds to launch two threads before the first thread picks up the variable.
Back to top
View user's profile Send private message
davidb



Joined: 17 Jul 2009
Posts: 560
Location: UK

PostPosted: Fri Apr 26, 2013 7:16 pm    Post subject: Reply with quote

You cannot pass the thread number in COMMON. When you run subroutines using different threads, they run "concurrently" in PARALLEL, or at least overlap with each other in time. This is the whole point of using multiple threads on different processors or cores - to get some parts of your code running in parallel.

In your second code, each iteration of the loop sets up one thread and sets up iCurrThread (the thread number) in COMMON. The first iteration of your loop calls runN which enters an expensive DO Loop (i=1, 200000000). At the start of this loop, the value of iCurrThread in memory is correct. However, before this loop can complete, a second call to runN is made with a different thread. This second call changes the value of iCurrThread whilst the first thread is still running runN which leads to incorrect results. The problem is made worse when the third and fourth threads start up.

The only way to fix this is to pass the thread number as an argument to the subroutine you want to call. Each thread will then pass a different value on the stack and each call of runN can safely run in parallel.

Your code is a little bit unusual in the way you have partitioned the work between threads. I have made another Post showing how to do this in a more general way, which people may find useful.

I have run your modified code 2 and it does give me an improvement, but I only have 2 cores here. I would be interested to see what performance you are getting with 1 thread, 4 threads and 8 threads, where the last is using hyperthreading.
_________________
Programmer in: Fortran 77/95/2003/2008, C, C++ (& OpenMP), java, Python, Perl
Back to top
View user's profile Send private message
jalih



Joined: 30 Jul 2012
Posts: 196

PostPosted: Sat Apr 27, 2013 4:26 pm    Post subject: Re: Reply with quote

davidb wrote:

Your code is a little bit unusual in the way you have partitioned the work between threads.

It's a little bit hard to follow also... Some multithreaded matrix multiplication test would be interesting.

I wrote a simple multithreaded 1000x1000 sized square matrix multiplication test using PL/I-language, available here with source and binary.
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 2923
Location: South Pole, Antarctica

PostPosted: Sat Apr 27, 2013 5:54 pm    Post subject: Reply with quote

Well, here is the same code #2 after simplification, beautification and syntactic sugar together with the runtime results. Total CPU time is elapsed time * amount of threads. Computer - i7 laptop, 2.2GHz, 4 cores, 8 threads set in BIOS, Windows7.

It can not be simpler and more self-explanatory than that. Code creates independent threads running subroutine runN (iThrHandle) using CALL CREATE_THREAD@(runN,20+i) where iThrHandle = 20+i is thread handle and follows the moment they finish when all flags kThreadEnded are =1. Exactly the same has been generally done by any other parallelization methods like OpenMP, CUDA etc when the threads could be independent. Add into subroutine RunN any specific task you want, Monte-Carlo, matrix operations, parallelizing DO loops etc partitioning them the way you want. Set when start the thread kThreadEnded(ithr) = 0 and make it =1 when finished. That's all. I set parameter nThrMax=8, use your own CPU cores/threads count. You may also consider adding or removing tiny delay when launching the threads (see the line call sleep1@(0.02) ) if "threads jam" (?) conflict takes place when amount of threads is very large (why? this is the question to FTN95 developers. This is second time i find necessity for tiny delay with NET, first time it was for Clearwin+ GUI Run-Pause-Stop code, it was posted here 3-4 years ago).

And the results as a function of # of threads are great, the speedup on 4 cores is like on almost 8-core CPU. Isn't this amazing?

1 Elapsed time, total CPU time= 15.6210 15.6210
2 Elapsed time, total CPU time= 7.94200 15.8840
4 Elapsed time, total CPU time= 4.07800 16.3120
5 Elapsed time, total CPU time= 3.39700 16.9850
6 Elapsed time, total CPU time= 2.98200 17.8920
8 Elapsed time, total CPU time= 2.49800 19.9840

Code:

! Compilation: ftn95 parallelN4.f95 /clr /link /multi_threaded
! Code is yours. Use it at your own risk
!
 
  include <clearwin.ins>
  EXTERNAL runN
  parameter (nThrMax=8)
  common /threads_/nEmployedThreads, kThreadEnded(nThrMax)
  character*80 getenv@

  write(*,*) 'Processor ', getenv@('PROCESSOR_IDENTIFIER')
  READ(getenv@('NUMBER_OF_PROCESSORS'),*) n_processorsTotal
  write(*,*) ' Max number of threads=', n_processorsTotal

1 print*,' Enter number of parallel threads <= 8'
  read(*,*)   nEmployedThreads
  if(nEmployedThreads.lt.1.or.nEmployedThreads.gt.8) nEmployedThreads=4

  call clock@ (time_start)

!...set a flag of thread finished
  kThreadEnded(:)=1

  do i = 1, nEmployedThreads
     CALL CREATE_THREAD@(runN,20+i)
     call sleep1@(0.02)
  enddo

!...wait till all threads finish
  do while (minval(kThreadEnded)==0)
    call sleep1@(0.1)
  enddo

  call clock@ (time_finish)

  time = time_finish-time_start
  time2= time * nEmployedThreads
  print*, 'Elapsed time, total CPU time=', time, time2

  goto 1

  END

! =======================================================
  subroutine runN (iThrHandle)
  include <clearwin.ins>
  parameter (nThrMax=8)
  common /threads_/nEmployedThreads, kThreadEnded(nThrMax)

  ithr = iThrHandle-20
  kThreadEnded(ithr) = 0

  lock;   print*,'Started thread # ', ithr ; end lock

  d      =2.22

  do i=1,200000000/nEmployedThreads
    d=alog(exp(d))
  enddo

  lock;     print*, 'Ended   thread # ', ithr  ;     end lock
  kThreadEnded(ithr)=1
  end
Back to top
View user's profile Send private message
DanRRight



Joined: 10 Mar 2008
Posts: 2923
Location: South Pole, Antarctica

PostPosted: Sun Jun 23, 2013 9:42 pm    Post subject: Reply with quote

Wondering why this functionality is available only in NET...Would be great if Silverfrost made it for regular x86 mode.
Back to top
View user's profile Send private message
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 8210
Location: Salford, UK

PostPosted: Mon Jun 24, 2013 8:30 am    Post subject: Reply with quote

I have this as a high priority on my list of things to do.
Back to top
View user's profile Send private message AIM Address
DanRRight



Joined: 10 Mar 2008
Posts: 2923
Location: South Pole, Antarctica

PostPosted: Tue Jun 25, 2013 8:22 am    Post subject: Reply with quote

THAT deserves BIG THANKS!
Back to top
View user's profile Send private message
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 8210
Location: Salford, UK

PostPosted: Wed Jul 03, 2013 3:52 pm    Post subject: Reply with quote

I have uploaded a new beta version of salflibc.dll to http://www.silverfrost.com/beta/salflibc.exe

This contains new Win32 thread functions illustrated in the following sample.
The new routines are described in the file cwplus.enh that is included in the download.

Code:
module threadMod
 c_external start_thread@    '__start_thread' (REF,REF):integer*4
 c_external wait_for_thread@ '__wait_for_thread' (VAL)
 c_external lock@            '__lock' (VAL)
 c_external unlock@          '__unlock' (VAL)
 integer,parameter::IO_LOCK = 42    !Any value. Your choice
contains
 subroutine threadFunc(count)
  integer count,start
  if(count < 0) return  !Illustrates an abort
  start = count
  do while(count > 0)
    call sleep1@(1.0)
    call lock@(IO_LOCK)
    print*, "threadFunc ", start, count
    call unlock@(IO_LOCK)
    count = count - 1
  end do
 end subroutine threadFunc
end threadMod

program Threads
use threadMod
integer hThread(3)
hThread(1) = start_thread@(threadFunc, 6) !Run for 6 seconds
hThread(2) = start_thread@(threadFunc, 3) !Run for 3 seconds
hThread(3) = start_thread@(threadFunc, 7) !Run for 7 seconds
call wait_for_thread@(hThread(2))
call wait_for_thread@(0)
end
Back to top
View user's profile Send private message AIM Address
PaulLaidler
Site Admin


Joined: 21 Feb 2005
Posts: 8210
Location: Salford, UK

PostPosted: Thu Jul 04, 2013 6:28 am    Post subject: Reply with quote

I have edited the above sample program and removed a call to ExitThread that was not needed because the wrapper function in the library makes this call.
Back to top
View user's profile Send private message AIM Address
Display posts from previous:   
Post new topic   Reply to topic    forums.silverfrost.com Forum Index -> General All times are GMT + 1 Hour
Goto page 1, 2  Next
Page 1 of 2

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group