Silverfrost Forums

DANGLING FORTRAN POINTER

18 Nov 2017 5:02 #20805

I get an error message, dangling fortran pointer, with a section of code using mixed I*1 and character data. All arrays initialised to zero. They are allocated in the calling routine.

    Case (1)

    do i=0,2
     do j=0,windows(id)%npix-1
      do k=0,windows(id)%nl-1
    ! *** FAILS HERE
       images1(i,j,k)=char(windows(id)%lut(ichar(images1(i,j,k)),i))
      enddo
      call temporary_yield@
     enddo
    enddo

I do not use any pointers.

I have a version of the program, compiled in 2015, that works.

Anyone have an idea of what's wrong?

19 Nov 2017 1:18 #20808

Paul,

There is probably a bug in the compiler related to char or ichar, but to make it easier to track down I would certainly try to clean the statement up. I would also change the DO loop order, so that memory is processed sequentially. For large images1 arrays this would be significant.

I assume you are transforming the colour palette, based on lut(0:255,0:2). Although the following is more verbose, I am sure it would not have a performance penalty.

  type win_dim
     integer npix
     integer nl
     integer lut(0:255,0:2)  ! colour palette transformation ?
  end type win_dim
  type (win_dim) :: windows(2)

  integer, parameter :: mx = 500
  integer, parameter :: my = 300
  character images1(0:2,0:mx,0:my), ch, cc(0:2)

  integer id, i,j,k, ic,jc
!  
  id = 1
!  
  do k=0,windows(id)%nl-1 
    do j=0,windows(id)%npix-1 
! *** FAILS HERE images1(i,j,k)=char(windows(id)%lut(ichar(images1(i,j,k)),i)) 
      do i=0,2 
        ch = images1(i,j,k)           
        ic = ichar(ch)
        jc = windows(id)%lut(ic,i)
        images1(i,j,k) = char(jc)
      end do 
    end do 
    call temporary_yield@
  end do 

 end

I wonder whether it would help to change ch to cc(0:2) and lut to lut(0:2,0:255), and to replace the inner loop with array syntax. As I am assuming LUT is a transformation lookup array, that may not be easy; perhaps use: call LUT_transform ( images1(:,j,k), id )
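A minimal sketch of that last idea, for illustration only. It assumes the lookup table is re-declared as lut(0:2,0:255) (band index first) and passes the table itself rather than id; the subroutine name lut_transform, its argument list and the inverting palette are all hypothetical and do not come from Paul's program.

```fortran
program lut_demo
  implicit none
  integer   :: lut(0:2,0:255)      ! assumed layout: band first, then input level
  character :: pixel(0:2)          ! one pixel = three 1-byte bands
  integer   :: band, level

  ! Hypothetical palette: invert every band
  do level = 0, 255
    do band = 0, 2
      lut(band,level) = 255 - level
    end do
  end do

  pixel = (/ char(0), char(100), char(255) /)
  call lut_transform (pixel, lut)
  write (*,*) ichar(pixel)         ! levels 0, 100, 255 map to 255, 155, 0

contains

  ! Replace each band of one pixel by its looked-up value
  subroutine lut_transform (cc, lut)
    character, intent(inout) :: cc(0:2)
    integer,   intent(in)    :: lut(0:2,0:255)
    integer :: i
    do i = 0, 2
      cc(i) = char ( lut(i, ichar(cc(i))) )
    end do
  end subroutine lut_transform

end program lut_demo
```

In the image loop the call would become something like call lut_transform ( images1(:,j,k), windows(id)%lut ), so each call touches three adjacent bytes. The explicit inner loop is kept because the lookup uses a different second subscript for each band, which does not map onto simple array syntax.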

20 Nov 2017 9:01 #20813

Paul

Please supply details of the user-defined type in this code so that the fragment can be compiled and run on its own.

20 Nov 2017 2:18 #20817

Dear John, Paul,

Thanks for answering my query. Yes, LUT is a colour table with 3 bands. I am trying to resurrect some Fortran code that I had running in 2015; now it won't work. I had suspected a compiler fault, but I am puzzled by the fact that the section of code that I quoted is now working and an image is written to the screen. Now it fails a few lines later with the dangling pointer error.

I will report back if your suggestions make any difference.

I was under the impression that the order in which arrays are accessed was important (if not fundamental) when data were being transferred from disc to physical memory in the days of virtual memory. Now the physical memory on my PC is 16Gb and there is presumably no penalty in accessing the contents in any order.

Many thanks,

Paul

20 Nov 2017 2:34 #20818

Paul wrote: "Now the physical memory on my PC is 16Gb and there is presumably no penalty in accessing the contents in any order."

There is now a penalty in transferring between memory and cache. These cache transfers are managed in cache lines (typically 64 bytes on current x86 processors), so if you address 1 byte, the whole line is transferred. If you step all over memory, each access pulls in a new line and most of what was transferred goes unused.

    do I = 1,l
      do j = 1,m
        do k = 1,n
          byte = g_array(k,j,i)  ! this is good sequential memory access for Fortran
          byte = b_array(I,j,k)  ! this is skipping all over memory
        end do
      end do
    end do
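To make the stride argument concrete, here is a small self-checking sketch (the program name and the array sizes are made up for illustration). It relies only on the standard Fortran rule that the first subscript varies fastest in storage, and prints how far apart in memory neighbouring elements are for each subscript.

```fortran
program stride_demo
  implicit none
  integer, parameter :: n1 = 4, n2 = 3, n3 = 2
  integer :: a(n1,n2,n3)
  integer :: i, j, k, offset

  ! Fill the array in storage order, so each element holds its own
  ! position in memory (counting from 0)
  a = reshape ( (/ (i, i = 0, n1*n2*n3-1) /), (/ n1, n2, n3 /) )

  ! Check the column-major offset formula against the stored positions
  do k = 1, n3
    do j = 1, n2
      do i = 1, n1
        offset = (i-1) + n1*(j-1) + n1*n2*(k-1)
        if (a(i,j,k) /= offset) stop 'offset formula is wrong'
      end do
    end do
  end do

  write (*,*) 'step in first  subscript moves', a(2,1,1) - a(1,1,1), ' elements'
  write (*,*) 'step in second subscript moves', a(1,2,1) - a(1,1,1), ' elements'
  write (*,*) 'step in third  subscript moves', a(1,1,2) - a(1,1,1), ' elements'
end program stride_demo
```

With these sizes the three steps are 1, 4 and 12 elements. For a real array the last subscript can easily step by millions of elements, so an inner loop over it lands on a different cache line on every iteration.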

You should try testing this for arrays of, say, 100 MB and time the two different access sequences.

    real*4, allocatable :: b_array(:,:,:)
    real*4, allocatable :: g_array(:,:,:)
    !
    integer*4 :: l = 15
    integer*4 :: m = 3000
    integer*4 :: n = 3000
    integer*4 :: i,j,k
    real*4 :: byte
    real*4 :: elapse_sec, t1
    allocate ( g_array(n,m,l) )
    t1 = elapse_sec ()
    do I = 1,l 
      do j = 1,m 
        do k = 1,n 
          g_array(k,j,i) = i+j+k ! this is good sequential memory access for Fortran 
        end do 
      end do 
    end do
    byte = 0
    do I = 1,l 
      do j = 1,m 
        do k = 1,n 
          byte = byte + g_array(k,j,i)  ! this is good sequential memory access for Fortran 
        end do 
      end do 
    end do
    t1 = elapse_sec () - t1
    write (*,*) t1, ' good_array', byte
    deallocate ( g_array )
!
    allocate ( b_array(l,m,n) )
    t1 = elapse_sec ()
    do I = 1,l 
      do j = 1,m 
        do k = 1,n 
          b_array(i,j,k) = i+j+k ! this is skipping all over memory  
        end do 
      end do 
    end do
    byte = 0
    do I = 1,l 
      do j = 1,m 
        do k = 1,n 
          byte = byte + b_array(i,j,k)  ! this is skipping all over memory  
        end do 
      end do 
    end do
    t1 = elapse_sec () - t1
    write (*,*) t1, ' bad_array', byte
    deallocate ( b_array )
!
  end

  real*4 function elapse_sec ()
    integer*4 tick, rate
    call system_clock ( tick, rate )
    elapse_sec = real(tick) / real(rate)
  end function elapse_sec

20 Nov 2017 3:11 #20819

Paul

If you would like us to investigate and fix a potential bug then we would need a sample program from you that demonstrates the failure at runtime. As far as I can see John has not provided such a program - or am I missing something?

20 Nov 2017 10:46 #20821

Paul L, Yes, my first example does not address the compiler problem and the code is not complete, as the arrays are not defined.

Paul M, You may want to consider the example below, where I have tested more options with character arrays, although the results are the same. I was looking at whether the order of the DO loop sizes matters, but it has little effect.

    character*1, allocatable :: b_array(:,:,:)
    character*1, allocatable :: g_array(:,:,:)
    !
    integer*4 :: i,j,k, l,m,n, test, big
    real*4 :: byte
    real*4 :: elapse_sec, t1

    big = 12000
    do test = 1,3

      select case (test)
        case (1)
          l = big ; m = big ; n = big ; l = 3
        case (2)
          l = big ; m = big ; n = big ; m = 3
        case (3)
          l = big ; m = big ; n = big ; n = 3
      end select

      allocate ( g_array(n,m,l) )
      t1 = elapse_sec ()
      do I = 1,l ; do j = 1,m ; do k = 1,n 
         g_array(k,j,i) = char(i+j+k)        ! this is good sequential memory access for Fortran 
      end do     ; end do     ; end do
  
      byte = 0
      do I = 1,l ; do j = 1,m ; do k = 1,n 
         byte = byte + ichar(g_array(k,j,i))  ! this is good sequential memory access for Fortran 
      end do     ; end do     ; end do
      t1 = elapse_sec () - t1
      write (*,*) t1, ' good_array', byte
      deallocate ( g_array )

      allocate ( b_array(l,m,n) )
      t1 = elapse_sec ()
      do I = 1,l ; do j = 1,m ; do k = 1,n 
         b_array(i,j,k) = char(i+j+k)        ! this is skipping all over memory  
      end do     ; end do     ; end do
  
      byte = 0
      do I = 1,l ; do j = 1,m ; do k = 1,n 
         byte = byte + ichar(b_array(i,j,k))  ! this is skipping all over memory  
      end do     ; end do     ; end do
      t1 = elapse_sec () - t1
      write (*,*) t1, ' bad_array ', byte
      deallocate ( b_array )

    end do
!
  end

  real*4 function elapse_sec ()
    integer*4 tick, rate
      call system_clock ( tick, rate )
      elapse_sec = real(tick) / real(rate)
  end function elapse_sec

21 Nov 2017 8:59 #20823

Quoted from John-Silver I'd read about the need to reverse the intuitive looping order when I first read up about changes between F77 and F90/95.

I am not sure what you mean by 'intuitive'. The idea is to process memory sequentially, so the following is good:

    do I = 1,l ; do j = 1,m ; do k = 1,n
       byte = byte + ichar(g_array(k,j,i))  ! this is good sequential memory access for Fortran
    end do     ; end do     ; end do

while the following is bad:

    do I = 1,l ; do j = 1,m ; do k = 1,n
       byte = byte + ichar(b_array(i,j,k))  ! this is skipping all over memory
    end do     ; end do     ; end do

I suspect you are saying the following is intuitive:

    do I; do j; do k ; array(I,j,k)

but it is not; the same principles of sequential memory access applied for virtual memory and apply now for cached memory. No change of approach from F77 on a mini to F90 on a cached multiprocessor.

Other problems come when using multiple arrays which have different subscript orders. This can often occur when arrays are used for multiple phases of an analysis.

I have just been reading chapter 5 of 'Using OpenMP' by Chapman, Jost & van der Pas, which discusses some of the issues of using a processor with cache. Unfortunately with typically 3 levels of cache their explanation is still a bit simplified.

It is good your testing showed that the speed savings (5x) are comparable to other approaches like multi-threading, where cache to memory transfers become even more of a bottleneck for performance.

John

22 Nov 2017 2:08 #20842

Many thanks for all that interesting stuff. Looks as if I will need to change a large number of multiple do-loops! The differences in time are remarkable.

Re the previous correspondence on the three level do loop that used char and ichar to define the loop parameters: I'll leave that on hold for a while. I now get errors occurring elsewhere in the program to do with window handles. This is all very strange as I wrote the win-handle stuff years ago and lots of students have used it without problems. There must be something that is apparently acting in a random way to generate errors. I will continue digging and will report back when or if I find something.

Paul

23 Nov 2017 12:28 #20847

Quoted from John-Silver These types of tips would have come in useful running to the limit on 32Mb of memory on an IBM mainframe in the early 90's !!!!

John, if only you were a bit older. Running a 3-loop your 'intuitive' way on a disk based virtual memory mini could have been the difference between a few seconds and tens of minutes. The delay gave you time to RTFM and re-write the code. I do recall in 1977 when I started an analysis, saw it was going slow, read a text book (no internet!), rewrote the analysis, compiled, ran and got the correct answer (well at least the same); all before the first program had finished. That was on a Pr1me mini. I would not bother looking for that F77 > F90 advice site, as it was probably written by a Cxx expert who did not bother to learn Fortran array index order. The C array convention is to store in the reverse index order.

What is 'intuitive' ?

John

23 Nov 2017 1:38 #20848

I hope future compilers will be able to virtualize this problem, so that the order of the indices no longer matters.

23 Nov 2017 2:41 #20849

Dan,

You may hope for loop order optimisation, but I have seen recent benchmark tests with FORALL, for multiple compilers, that show this is not being achieved. When you are using multiple arrays with different subscript order, the best order can be a bit confusing to predict, but with simple loops, like above, it could be achieved.

A good example (I have posted previously) is matrix multiplication, where a change of approach from dot_product to daxpy can reduce memory access delays. Future compilers may be able to provide these solutions. gfortran has certainly improved MATMUL at Ver 7.2.0, but I have not seen documentation of what was done.
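One way to picture the two approaches is the sketch below. It is not the code from the earlier post: the program name, the matrix size and the random test data are assumptions made for illustration. The dot_product form reads a row a(i,:), whose elements are n apart in memory, while the daxpy form only ever touches whole columns.

```fortran
program matmul_demo
  implicit none
  integer, parameter :: n = 200
  real*8  :: a(n,n), b(n,n), c_dot(n,n), c_axpy(n,n)
  integer :: i, j, k

  call random_number (a)
  call random_number (b)

  ! dot_product form: a(i,:) is a row, so it is read with stride n
  do j = 1, n
    do i = 1, n
      c_dot(i,j) = dot_product ( a(i,:), b(:,j) )
    end do
  end do

  ! daxpy form: column j of c accumulates b(k,j) times column k of a,
  ! so every array is accessed sequentially
  c_axpy = 0
  do j = 1, n
    do k = 1, n
      c_axpy(:,j) = c_axpy(:,j) + a(:,k) * b(k,j)
    end do
  end do

  ! The two results should agree to within rounding error
  write (*,*) 'max difference =', maxval ( abs ( c_dot - c_axpy ) )
end program matmul_demo
```

Any timing difference only becomes visible once n is large enough that the matrices no longer fit in cache, so n would need to be increased (and the arrays made allocatable) for a meaningful comparison.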

John
