Silverfrost Forums


Thread Pool API

15 Jun 2014 3:05 #14204

I wrote a simple DLL with FTN95 callable wrapper functions for using the Thread Pool API.

This should simplify application multithreading code. I will post the DLL with source code as soon as I have written some example code.

15 Aug 2014 4:47 #14447

A test project for the thread pool wrapper is available here.

The tp.mba file contains the minibasic source code for the wrapper DLL and might be useful when figuring out the function parameters.

16 Aug 2014 7:18 #14450

Jalih, it looks like you have upgraded to an 8-core/8-thread processor PC :-) Please compare how this scales with the number of threads from 1 to 8. I'm away from my PC; although I can control it and run everything from the phone, it is still not a convenient job (I would probably need VR glasses and a virtual keyboard + mouse for that).

18 Aug 2014 9:33 #14452

Quoted from DanRRight Jalih, it looks like you have upgraded to an 8-core/8-thread processor PC :-)

I wish that were the case, but no, I am still using my six-year-old PC.

Please compare how this scales with the number of threads from 1 to 8.

I have not done any timings. Accurate timing of multithreaded code has its difficulties. I only tried with the Clock@() function, and that produced an erroneous result and hung the program.

With correct use, I expect the thread pool to perform quite well. It can reduce overhead a lot and makes managing multithreaded code easy. You can use the application's default thread pool for simple work, so you only have to create and submit work items, wait for the work callbacks to complete, and close the work items. A process can also make use of multiple thread pools.
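For reference, the create/submit/wait/close cycle against the process-default pool looks roughly like this in C (a minimal, Windows-only sketch of the Thread Pool API pattern described above; the callback body is just a placeholder, and error handling is mostly omitted):

```c
#include <windows.h>
#include <stdio.h>

/* Work callback: runs on a thread-pool thread each time the
   work item is submitted. */
VOID CALLBACK work_callback(PTP_CALLBACK_INSTANCE instance,
                            PVOID context, PTP_WORK work)
{
    volatile LONG *counter = (volatile LONG *)context;
    InterlockedIncrement(counter);   /* placeholder unit of work */
}

int main(void)
{
    volatile LONG counter = 0;

    /* Create a work item bound to the process-default thread pool
       (callback environment NULL). */
    PTP_WORK work = CreateThreadpoolWork(work_callback,
                                         (PVOID)&counter, NULL);
    if (!work) return 1;

    /* Each submission results in one callback invocation. */
    for (int i = 0; i < 8; i++)
        SubmitThreadpoolWork(work);

    /* Block until all outstanding callbacks have completed,
       then release the work object. */
    WaitForThreadpoolWorkCallbacks(work, FALSE);
    CloseThreadpoolWork(work);

    printf("counter = %ld\n", counter);
    return 0;
}
```

The same four calls (CreateThreadpoolWork, SubmitThreadpoolWork, WaitForThreadpoolWorkCallbacks, CloseThreadpoolWork) are presumably what the FTN95 wrapper exposes.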

18 Aug 2014 2:44 #14453

Another thread pool sample.

The sample above is a ClearWin+ application that uses the application's default thread pool to run the update and drawing code inside a separate thread. This could be used as a simple game template.

19 Aug 2014 5:09 #14456

Jalih,

SYSTEM_CLOCK is based on QueryPerformanceCounter, which is a good real-time (elapsed) timer. While there can be errors in the timers between different CPUs, the error is typically less than the accuracy of the timer call.

Lately, I have been testing gFortran for OMP programming before venturing into Clearwin_64. It manages all the stack issues and the replication of private variables. One of the disadvantages of OMP is the overhead of initialising multiple threads: suitable OMP code must have sufficient computation in each thread to make the threads effective. I have also found that memory access speed and cache utilisation have a significant influence on performance. The general rule for multi-loop code structures is to use OMP on the outer loop and vector instructions on the inner loop. A single simple DO loop, such as the Dot_Product below, is not suitable for OMP coding because of the thread overheads.

!  Simple Dot_Product using !$OMP
!   (not recommended, due to OMP overheads)
!
      s = 0
!$OMP  PARALLEL DO PRIVATE (i), SHARED (a,b,n),   &
!$OMP& REDUCTION(+ : s)
      do i = 1,n
         s = s + a(i) * b(i)
      end do
!$OMP END PARALLEL DO

What is the advantage of the multi-threading approach you are describing? Does its simpler thread structure have reduced thread overhead to overcome some of these OMP issues?

John

20 Aug 2014 3:23 (Edited: 20 Aug 2014 9:01) #14461

We have

Which design is most efficient when scaling to multiple independent threads? The example for NET above uses long-running independent threads, so the overhead does not matter there. For some unknown reason it produced the best parallel scaling, far better than we could expect. Specifically, on a 4-core 4770K with 8 threads, it gives acceleration closer to 8 than to 4, while other methods give 3-4.

20 Aug 2014 8:53 #14463

Quoted from JohnCampbell What is the advantage of the multi-threading approach you are describing? Does its simpler thread structure have reduced thread overhead to overcome some of these OMP issues?

Probably most OpenMP implementations use thread pools. I think the idea is to minimize overhead by not creating and destroying threads for each parallel region. A pool of workers is created at the first parallel region, and these threads exist for the duration of program execution. The threads are not destroyed until the last parallel region has been executed.

Basically, the threads in the pool wait in a queue for work to become available. After a thread has processed a work item, it returns to the queue to get more work.
