Silverfrost Forums

Welcome to our forums

Matrix sparsity pattern in eps format

30 Jul 2012 7:41 #10525

A while ago I was looking for a matrix sparsity viewer outside Matlab. Having the result in eps format is also a help.

The following example code writes the sparsity pattern to an eps file - some useful tips on eps/ps can be found at: PS-FORMAT

program drspy
    implicit none
    integer,dimension(50,50) :: a
    integer :: i,j,m,n

    m = size(a,1)
    n = size(a,2)
    a = 0
    do j=1,n
        do i=1,j
            a(i,j) = 1;
        enddo
    enddo
    call spy(a,m,n)
end program drspy

subroutine spy(a,m,n)
    ! write the sparsity pattern of a(m,n) to a PostScript file
    implicit none
    integer,intent(in) :: m,n
    integer,dimension(m,n),intent(in) :: a

    real,parameter :: cell_size=10.0
    real,parameter :: font_size=0.8*cell_size
    real,parameter :: r= 0.35*font_size
    integer,parameter :: io=33

    real :: x_lo,x_hi,y_lo,y_hi
    real :: scale,x,y
    integer :: i,j

    x_lo = 0;
    y_lo = 0;
    x_hi = n*cell_size;
    y_hi = m*cell_size;

    open(io,file='spy.ps');
    write(io,'(A)') '%!PS-Adobe-1.0'
    if (cell_size*real(n) > 595.0) then
        scale = 595.0/(cell_size*real(n));
        write(io,'(A,4I5)') '%%BoundingBox:',0,0,595,595
        write(io,'(2F13.6,2x,A)') 0.0,0.0,'translate'
        write(io,'(2F13.6,2x,A)') scale,scale,'scale'
    else
        write(io,'(A,4I6)') '%%BoundingBox:',nint(x_lo),nint(y_lo),nint(x_hi),nint(y_hi)
    endif
    write(io,'(2F13.6,2x,A)') x_lo,y_lo,'newpath moveto'
    write(io,'(2F13.6,2x,A)') x_hi,y_lo,'lineto'
    write(io,'(2F13.6,2x,A)') x_hi,y_hi,'lineto'
    write(io,'(2F13.6,2x,A)') x_lo,y_hi,'lineto'
    write(io,'(2F13.6,2x,A)') x_lo,y_lo,'lineto'
    write(io,'(A)') 'stroke'
    do j=1,n
        do i=1,m
            if (a(i,j) > 0) then
                x = x_lo + (real(j) - 0.5)*cell_size;
                y = y_hi - (real(i) - 0.5)*cell_size;
                write(io,'(3F13.6,2I5,2x,A)') x,y,r,0,360,'arc closepath'
                write(io,'(A)') 'gsave fill grestore'
                write(io,'(A)') 'stroke'
            endif
        enddo
    enddo
    write(io,'(A)') 'showpage'
    write(io,'(A)') '%%EOF'
    close(io);    
    return
end subroutine spy

[URL=http://imageshack.us/photo/my-images/38/spyfx.jpg/]http://img38.imageshack.us/img38/4467/spyfx.jpg[/URL]

13 Aug 2012 11:17 #10604

Bandwidth and profile reduction algorithms are indeed a very interesting topic. Memory management is often discussed as well. If I compare different algorithms, how can I check the memory each one requires?

Question: are there some facilities that FTN offers which make it possible to 'look' inside the memory? To be honest, I have no idea where to start.

Some useful websites which simplify the study:

1.) Test vectors by Everstine; 2.) published results in his paper; and 3.) bandwidth and profile reduction algorithm

14 Aug 2012 7:24 #10607

If you are interested in profile reduction algorithms, then I would suggest that for tree-based approaches you should look at the method proposed by Scott Sloan. It was proposed in the mid-80's, while RCM is a simpler algorithm from the 70's. If you are looking for a simple approach, then RCM is very easy to code, once you have a tree definition of the connectivity. The Sloan algorithm is a two-stage approach, which first searches for extreme nodes and then develops a node order to minimise the profile.

The key to efficiency is a suitable data structure for defining the connectivity. Stick to simple indexed tables of joints (nodes) and elements (links) and avoid complex tree TYPE definitions. The joint table and element table are two complementary structures.

I also use a very simple non-tree based alternative approach which works well for space structures: simply sort the nodes in the X, Y or Z direction and test that node order ( X is X + .01 Y + .01 Z; Y is Y + .01 X + .01 Z...). It is a good (robust) backup that covers a lot of the failings of tree-based approaches, especially when highly connected nodes or large variations in element size (local mesh refinement) are involved.

The test of optimisation is based on either:

  1. minimising the storage size of the matrix, or
  2. minimising the operations count for reducing the matrix. Both are similar.

The hard part is generating a possible node order, then it is very easy to apply the size test. That is why the sort approach is so good.
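
For illustration, a minimal sketch of such a size test (not my production code; the xadj/adjncy adjacency arrays and the one-equation-per-node simplification are just for the example):

! Size test for a candidate node order: given the symmetric non-zero
! pattern as a node adjacency list, sum the row bandwidths.
! xadj/adjncy and one equation per node are simplifying assumptions.
subroutine profile_size(n, xadj, adjncy, perm, profile)
    implicit none
    integer, intent(in)    :: n            ! number of nodes
    integer, intent(in)    :: xadj(n+1)    ! adjncy(xadj(i):xadj(i+1)-1) lists
    integer, intent(in)    :: adjncy(*)    ! the nodes connected to node i
    integer, intent(in)    :: perm(n)      ! perm(i) = new number of old node i
    integer*8, intent(out) :: profile      ! sum of row bandwidths (excl. diagonal)
    integer :: i, k, inew, jmin

    profile = 0
    do i = 1, n
        inew = perm(i)
        jmin = inew                        ! first non-zero in row inew
        do k = xadj(i), xadj(i+1)-1
            jmin = min(jmin, perm(adjncy(k)))
        end do
        profile = profile + (inew - jmin)
    end do
end subroutine profile_size

Compare the totals from two candidate orders and keep the smaller one.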

John

14 Aug 2012 7:37 #10608

John, thanks for the advice. I also have a version by Sloan - he actually uses the same matrices (see Table II) as mentioned in my previous post. This makes it very effective to compare the different approaches.

I think comparing the run time will be quite easy. However, how do I compare the actual memory required by the processor?

Do you have a way to visualise the matrices? I use the PostScript method mentioned above - it is very fast and the output is ready to insert in my documents. Matlab offers the spy function, but this requires an extra script (though an easy one) and Matlab takes much longer to display the graphics than Ghostview, for example.

14 Aug 2012 8:07 #10609

You could draw the matrix in a %gr window, drawing a vertical line for each equation to show the profile. This would scale more easily. While the matrix can initially be sparse between the profile and the diagonal, it will usually fill up during the reduction.

My field is structural / geotechnical engineering for 3D beam, shell and solid element combinations, with typically about 100k to 200k equations. Other field models might be different. Drawing will show the profile variation. You could draw the profile and the initial non-zeros to see what it looks like.

I hope this answers your question, although I am confused by your need to look at memory or the 'actual memory required by the processor'. Isn't memory required = the count of numbers inside the profile x bytes per number? If you are looking at problems that exceed the memory available, then the out-of-core profile Crout solver is fairly robust. See 'Large Capacity Equation Solver for Structural Analysis' by Mondkar and Powell, Computers and Structures, 1974, or more recent versions by R. L. Taylor. This is now all dated but very robust. As you can see, I did most of this in the late 70's and 80's.

John

14 Aug 2012 11:34 #10611

Isn't memory required = the count of numbers inside the profile x bytes per number ?

John, I think I missed the 'x bytes per number'. In the papers they refer to using less memory if one uses integer*2 (certainly compiler dependent). If I understand the comments in their paper, then for the matrices of that time one can indeed save memory:

  8 bits  → -128 .. 127
  16 bits → -32768 .. 32767
  32 bits → -2147483648 .. 2147483647

But today memory means something different than it did in those days 😃

The RCM looks like quite a few lines of code.

14 Aug 2012 12:00 #10612

I'm not sure that I agree with where you are going with this reduced-bits approach.

For the profile calculation: if you have more than 32768 nodes or elements, then you need integer*4 for your tables to be able to record the tree structure. If you have a matrix with more than 32768 elements, then you need integer*4 to be able to calculate the addresses. Further, if you have a matrix with more than 2^31 elements, then you need integer*8 to be able to calculate the addresses. (You need to be able to calculate a bad profile, which can be worse than 2^31 elements.) I use integer*4 arrays for the bandwidth minimiser and integer*8 for the matrix size calculation.

For the matrix storage: the size of each number in the matrix depends on the precision required. It would have to be a very well behaved set of equations to be able to use real*4. I use real*8 for all calculations. (I tried real*10 once but did not get significantly better results, as I was not able to use real*10 for all element matrices.) I'd be surprised if you could use less than integer*4 for the matrix profile definition or real*8 for the matrix calculation.

The only difference in what memory means now is that the limits are a bit bigger. There is still a ceiling with 32-bit if you don't use out of core, as 2^28 is the maximum size real*8 array you can define with FTN95. I currently have a limit at 8gb, as I still use integer*4 for the matrix addressing.

John

14 Aug 2012 3:15 #10614

We only use 2D field problems (electrical). Since we have rotating objects, and because of our air gap element, we get quite a large initial bandwidth and profile. In a previous thread I have uploaded a typical example: profiles

At present I am trying to break the program up into logical parts, i.e. I am 'isolating' the profile reduction and packing it into a module.

I am looking forward to finalising the comparison between Gibbs et al. and Sloan.

14 Aug 2012 11:52 #10615

You can always find problem styles that are not suited to a particular ordering approach, but in general the Sloan algorithm is more reliable than either RCM or Gibbs, as it includes a weighting approach for adding nodes to the “front”.

I would recommend that you also try the sorting approach, as it is very simple to implement. It provides an ordering that is usually reasonable, for very little effort. Simply calculate the weighting for each node, which is X + .01*Y or Y + .01*X, and use an indexed quick sort, such as DSORT@. Then apply your profile test and use this as a reference to be improved.

How does the air gap element affect the connectivity? If it is a general element with broad connectivity, you might need to delay the equations associated with this interface to the end, creating a V-shaped profile. With profile solvers, it is the average and not the maximum profile which is important. It all depends on how many equations define the air gap linkage between the 2 parts of the model. This is like a substructure approach for the two sections of the model. Some ideas that might work?
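
A minimal sketch of that sorting approach (2D case; a plain insertion sort stands in here for an indexed library sort such as DSORT@, and the array names are made up):

! Order the nodes by the weighting x + 0.01*y.  An insertion sort is
! used here for the sketch; an indexed library sort does the same job.
subroutine sort_order(n, x, y, order)
    implicit none
    integer, intent(in)  :: n
    real,    intent(in)  :: x(n), y(n)   ! nodal coordinates
    integer, intent(out) :: order(n)     ! order(k) = old node number placed kth
    real    :: w(n), wk
    integer :: i, k, ik

    w = x + 0.01*y                       ! node weightings (try y + 0.01*x as well)
    order = (/ (i, i=1,n) /)
    do k = 2, n
        ik = order(k);  wk = w(ik);  i = k - 1
        do while (i >= 1)
            if (w(order(i)) <= wk) exit
            order(i+1) = order(i)
            i = i - 1
        end do
        order(i+1) = ik
    end do
end subroutine sort_order

Feed the resulting order straight into the profile size test and keep it if it beats the tree-based result.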

John

16 Aug 2012 10:03 #10625

Hi John,

thanks for your advice. You have addressed some aspects which I will consider - especially using the sorting approach as a reference.

I am doing this as a sort of side track. For the moment everything is working fine. For me to make improvements I first have to understand the process. For this reason I have identified some parts - the renumbering of the nodes is one topic. We have little documentation of the code.

Many online examples are based on structural engineering problems - little is available for electrical machines. Even though the solving process is the same, I find it difficult to work through the examples.

We use, for example, the direct solver from FEAPpv.

17 Aug 2012 8:22 #10644

FEAPpv is based on a well-written book on basic FEM.

Working through the papers of Everstine, Gibbs and Sloan, one needs to take care with the (exact) definitions of bandwidth, profile and wavefront. In particular, the definition of wavefront is still not clear to me. All my references define the same thing in different ways (this is also a comment in the paper by Everstine).

One can define a lower and upper bandwidth for the lower and upper triangle respectively. The wavefront seems to me like the 'row bandwidth' of the upper triangle.
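
One consistent set of definitions can be computed directly from the first non-zero in each row. A small sketch (my own array names; conventions differ, e.g. on whether the diagonal is counted in the profile):

! Assumes a symmetric pattern and that first(i) holds the column of the
! first non-zero in row i (first(i) <= i).
subroutine band_profile_front(n, first, bandwidth, profile, maxfront)
    implicit none
    integer, intent(in)    :: n, first(n)
    integer, intent(out)   :: bandwidth, maxfront
    integer*8, intent(out) :: profile
    integer :: i, k, front

    bandwidth = 0
    profile   = 0
    do i = 1, n
        bandwidth = max(bandwidth, i - first(i))       ! maximum row bandwidth
        profile   = profile + (i - first(i) + 1)       ! skyline entries incl. diagonal
    end do
    maxfront = 0
    do i = 1, n             ! wavefront at step i: rows k >= i whose first
        front = 0           ! non-zero lies at or before column i
        do k = i, n
            if (first(k) <= i) front = front + 1
        end do
        maxfront = max(maxfront, front)
    end do                  ! (O(n**2) for clarity; it can be done in one pass)
end subroutine band_profile_front

At least with one fixed set of definitions the numbers from the different papers can be compared consistently.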

19 Aug 2012 9:55 #10652

The little eps-format 'viewer' works fine and saves a lot of time. Using LaTeX I can write the necessary files to include the graphics (including the eps files) without any effort - no additional scripts or programs.

The figure below is automatically generated. This makes it a little bit easier to visualize the results 😄

[URL=http://imageshack.us/photo/my-images/59/reduce.png/]http://img59.imageshack.us/img59/3496/reduce.png[/URL]

20 Aug 2012 1:37 #10653

Thanks for the book pdf. It is one of the main texts for FE. I have the 3rd edition.

The terms “bandwidth, profile and wavefront” refer to the three main forms of equation storage for direct equation solution in the 70's and 80's. My preference is for profile, which is always better than bandwidth (fixed band) and simpler than wavefront. It is well suited to vector instructions but not easy to program for a multi-processor approach. RCM is best suited to a fixed bandwidth, while Sloan is not, as that approach can produce a larger maximum bandwidth but a smaller average profile.

I noticed that you also posted 'Invalid dimension with assumed size array', which uses a 2D array and so implies fixed bandwidth. You would be much better off storing the upper half of the matrix as a 1D vector with an index that points to the diagonal, e.g.

    integer*4 matrix_pointer(0:neq)
    real*8    matrix_values(*)

where matrix_pointer(ieq) is the location of the diagonal of equation ieq in matrix_values, and matrix_pointer(ieq)-matrix_pointer(ieq-1) is the profile size for equation ieq. You can define these as allocatable arrays in a module, defining matrix_pointer when you know the number of equations and matrix_values at the end of the re-ordering. (This assumes that the matrix is symmetric, although it could be modified for a non-symmetric matrix that doesn't require re-ordering (partial pivoting) to improve the solution stability.)
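
A minimal sketch of how that can look with allocatable arrays in a module (just a sketch, not my production code):

module profile_storage
    implicit none
    integer*4, allocatable :: matrix_pointer(:)   ! (0:neq) locations of the diagonals
    real*8,    allocatable :: matrix_values(:)    ! packed columns of the upper half
contains
    subroutine allocate_matrix(neq, column_height)
        integer, intent(in) :: neq, column_height(neq)   ! heights include the diagonal
        integer :: ieq
        allocate (matrix_pointer(0:neq))
        matrix_pointer(0) = 0
        do ieq = 1, neq
            matrix_pointer(ieq) = matrix_pointer(ieq-1) + column_height(ieq)
        end do
        allocate (matrix_values(matrix_pointer(neq)))
        matrix_values = 0.0d0
    end subroutine allocate_matrix

    ! location of a(i,j) for i <= j, provided row i lies within the profile of column j
    integer function location(i, j)
        integer, intent(in) :: i, j
        location = matrix_pointer(j) - (j - i)
    end function location
end module profile_storage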

Your use of the LaTeX graphics implies to me that you have a small problem to solve (20 nodes?). If it is that small, I'd use a sort and not worry about all the code for Sloan. Your previous mesh plots were much bigger problems?? The before-and-after graphics for each approach do help in understanding their relative strengths and weaknesses.

John

20 Aug 2012 6:35 #10655

Hi John,

thanks for the 1D storage advice. Sloan uses the same storage method in his publication. I was actually considering implementing it - I need it anyway to test Sloan's algorithm. However, Gibbs et al. requires a two-dimensional array (NDSTK), as explained in their code (a rough conversion sketch follows the quoted comments):

C  EXPLANATION OF INPUT VARIABLES--
C    NDSTK-     CONNECTION TABLE REPRESENTING GRAPH.
C               NDSTK(I,J)=NODE NUMBER OF JTH CONNECTION TO NODE
C               NUMBER I.  A CONNECTION OF A NODE TO ITSELF IS NOT
C               LISTED.  EXTRA POSITIONS MUST HAVE ZERO FILL.
C    NR-        ROW DIMENSION ASSIGNED NDSTK IN CALLING PROGRAM.
C    IOLD(I)-   NUMBERING OF ITH NODE UPON INPUT.
C               IF NO NUMBERING EXISTS THEN IOLD(I)=I.
C    N-         NUMBER OF NODES IN GRAPH (EQUAL TO ORDER OF MATRIX).
C    IDEG-      MAXIMUM DEGREE OF ANY NODE IN THE GRAPH.

The code I use is our complete solver. I have isolated the part where the profile is reduced. Using smaller problems means one can count the entries in the figures (as shown above). But you are correct - the two 'visual' examples I have use 19 and 24 nodes. As a reference I use the same 30 tables as presented in the papers by Sloan and Everstine. These 'test vectors' are really nice to have! I can reproduce the same results.

The assumed-size arrays are a bit unsatisfactory - the actual code works. In one example the debugger works and in another it does not. We recently bought two 'extra' FTN compilers.

20 Aug 2012 8:18 #10656

Last year (10-11), I experimented with transferring most of my large arrays to allocatable arrays in a module and found this to work very well. Now, with FTN95 Ver 6.3 and Win 7_64, I can compile with /debug and get 3.7gb of available memory, which works very well. I had to map out where the arrays could be allocated and then convert to either using the arrays via USE module or transferring them via subroutine calls. I could send some examples if you require, as I'd recommend the change. It all depends on how you can control the restructuring of the program.

John

22 Aug 2012 9:02 #10666

Hi John,

thanks for your offer! I will come back to that. For the moment my roadmap is only to clean up and document what we have. After that I would really like to improve the code - in fact I already have some ideas where it can be improved.

Our solver uses ACTCOL from Zienkiewicz. I actually bought a German translation of the third edition to get hold of the Fortran IV code in chapter 24.

In the fifth edition the explanation and example in the direct solver part are still the same (p. 610). However, I think that the profile storage has changed a little bit. In the new version the diagonal elements are stored first and then the rest of the elements in each column. In our version each column is stored as it is, i.e. the diagonal terms are spread through the array.
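
For my own reference, the addressing in the two layouts differs only in where the diagonal sits; a small sketch (my own naming, not the book's code):

! (1) "column as it is": jdiag(j) points to the diagonal of column j,
!     so the diagonals are spread through the single array.
integer function loc_spread(i, j, jdiag)
    implicit none
    integer, intent(in) :: i, j, jdiag(*)
    loc_spread = jdiag(j) - (j - i)          ! equals jdiag(j) on the diagonal
end function loc_spread

! (2) "diagonals first": the diagonals live in their own block ad(1:neq)
!     and only the off-diagonal part of column j is packed into au(),
!     with jp(j) pointing to the last (row j-1) entry of that column.
integer function loc_packed(i, j, jp)
    implicit none
    integer, intent(in) :: i, j, jp(*)
    loc_packed = jp(j) - (j - 1 - i)         ! valid for i < j only
end function loc_packed

In both cases the difference of consecutive pointers gives the column height, which is what the profile solver really needs.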

I have also updated my eps output to plot the elements in the lower triangle in blue and the ones in the upper triangle in green 😄

22 Aug 2012 6:54 #10669

Hi John,

one further remark: initially I thought I could easily just replace GPS with Sloan. The adjacency list is formed in row-major order - as required by GPS. As usual, everything is a catastrophe 😃

Anyway, here is an example with the colors for the lower and upper triangle. For the profile solver one actually needs to calculate the column heights. In our program it is in practice the row widths that are determined - which works because of symmetry. However, it is sometimes confusing if you look for something specific and find the opposite.

[URL=http://imageshack.us/photo/my-images/214/reduce.png/]http://img214.imageshack.us/img214/3496/reduce.png[/URL]

23 Aug 2012 12:57 #10671

The data structures you choose for the re-ordering algorithm are very important for the efficiency of the algorithm. The secret to efficiency is to have a number of node and element status arrays, so you don't have to calculate an updated status in inner loops.

My algorithm starts with 5 basic vectors. Based on the element definitions you can readily generate:

    IELEM(num_elements+1)  ! pointer to the first node in a list vector of all nodes connected to each element
    LNODE()                ! the list of nodes connected to each element

Based on these two vectors there is a complementary list structure for the nodes:

    INODE(num_nodes+1)     ! pointer to the first element in a list of all elements connected to each node
    LELEM()                ! the list of elements connected to each node
    NFREE(num_nodes)       ! the number of freedoms (equations) associated with each node

The sizes of LNODE and LELEM are the same. You can filter these lists to remove:

  • all nodes with no freedoms
  • all elements which are not connected to multiple free nodes.

This reduces the size of the problem, although you need to keep 2 vectors that point to the original node and element numbers if you compress the numbering to only the active nodes and elements (that’s 7 vectors). You also need further vectors for:

  • the node weightings
  • the element weightings
  • the best node order
  • the next possible node order
  • the next possible element order
  • status vectors for each node and element as to how they have progressed through the front.

That’s 14 vectors, and I use a few extra as I go along. The good thing is these are all 4-byte vectors and the storage required is insignificant compared to the storage required for the final equation matrix. The use of ALLOCATE works very well with this, as all vectors can be sized before you start. (The size of LNODE can be scanned from the element connectivity info before you start, using IELEM to first store the count of nodes as you proceed, noting active nodes and elements. It can be all Fortran standard conforming, removing all the memory management tricks of F77.) A sketch of how the first five vectors can be built is given below.
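
A minimal sketch of that two-pass construction, assuming a fixed number of nodes per element held in an element table enode(:,:) (that table and its name are a simplification for the example; the list vectors follow the description above):

module connectivity_lists
    implicit none
    integer, allocatable :: ielem(:), lnode(:), inode(:), lelem(:), nfree(:)
contains
    subroutine build_lists(num_elements, num_nodes, npe, enode)
        integer, intent(in) :: num_elements, num_nodes, npe
        integer, intent(in) :: enode(npe, num_elements)   ! nodes of each element
        integer :: ie, k, n, in

        ! element -> node lists (here simply npe nodes per element)
        allocate (ielem(num_elements+1), lnode(npe*num_elements))
        ielem(1) = 1
        do ie = 1, num_elements
            ielem(ie+1) = ielem(ie) + npe
            lnode(ielem(ie):ielem(ie+1)-1) = enode(:,ie)
        end do

        ! complementary node -> element lists: count, prefix-sum, then fill
        allocate (inode(num_nodes+1), lelem(npe*num_elements))
        inode = 0
        do ie = 1, num_elements
            do k = 1, npe
                inode(enode(k,ie)+1) = inode(enode(k,ie)+1) + 1
            end do
        end do
        inode(1) = 1
        do n = 1, num_nodes
            inode(n+1) = inode(n) + inode(n+1)
        end do
        do ie = 1, num_elements               ! second pass: fill the list,
            do k = 1, npe                     ! advancing each node's pointer
                in = enode(k,ie)
                lelem(inode(in)) = ie
                inode(in) = inode(in) + 1
            end do
        end do
        do n = num_nodes, 1, -1               ! restore the pointers moved by the fill
            inode(n+1) = inode(n)
        end do
        inode(1) = 1

        allocate (nfree(num_nodes))           ! freedoms per node, filled elsewhere
        nfree = 0
    end subroutine build_lists
end module connectivity_lists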

After 30 years, I'm still changing the code from time to time, most recently to be standard-conforming F95. Your graphics have got me thinking that I should place it in a %gr window and draw the order for each iteration as I go through the list of possible starting nodes. It might identify different ways of defining the node and element weights. Generating the starting node list is the other half of this very interesting auto-ordering problem!

John

27 Aug 2012 7:37 #10696

Hi John,

thanks for your ideas. This is very exciting! It motivates me to finalize my 'clean up and document the code' project. Once everything is working 'again' I am going to try all this out. With the working copy I can test the differences between changes.

I liked your comment on code changes 😉 Our air gap code I change all the time - one always finds new possibilities in the programming language.

What is your experience with not programming code in column-major order? For example: Fortran is a column-major language, but the GPS code is programmed in a rather row-major fashion. For small programs one will not notice the difference. However, for large meshes this could perhaps be an issue. I do not know whether this is the case with current computers - and reprogramming the code just to check is too much effort. Do you have any experience with this?

28 Aug 2012 12:53 #10698

You asked:

What is your experience with not programming code in column-major order? For example: Fortran is a column-major language, but the GPS code is programmed in a rather row-major fashion. For small programs one will not notice the difference. However, for large meshes this could perhaps be an issue. I do not know whether this is the case with current computers - and reprogramming the code just to check is too much effort. Do you have any experience with this?

I have nearly 40 years' experience investigating the efficiency of numerical calculations with the compilers available at the time. Unfortunately, as compilers change, the experience becomes obsolete, but thinking in terms of trying to address memory sequentially (as columns in Fortran) is a good idea, although it should not be taken to extremes. Your question could relate to a number of issues about the order of addressing a 2D array or the sequential use of virtual memory.

Using an addressing sequence such as

    do i = 1,n
        do j = 1,m
            array(j,i) = ...
        end do
    end do

offers a number of possible benefits, but is definitely not essential. Possible advantages include: you can replace the inner loop, such as by using DOT_PRODUCT or some other array syntax, and the sequential addressing of memory improves performance. This might become significant, such as for caching in the processor or for large arrays in virtual memory (the description 'large' relates to the amount of physical memory installed on the PC). Consider the multiplication of two 2D matrices: [C] = [A] x [B]. This can be written more 'efficiently' if [A] is stored as [A]' (transpose).
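
For example (a sketch with made-up names): with [A] held as its transpose, each entry of [C] becomes a dot product over two contiguous columns.

! c = a x b with a supplied as its transpose at(k,i) = a(i,k), so both
! operands of the inner product run down contiguous columns.
subroutine matmul_at(n, m, l, at, b, c)
    implicit none
    integer, intent(in)  :: n, m, l
    real*8,  intent(in)  :: at(m,n), b(m,l)
    real*8,  intent(out) :: c(n,l)
    integer :: i, j

    do j = 1, l
        do i = 1, n
            c(i,j) = dot_product(at(:,i), b(:,j))
        end do
    end do
end subroutine matmul_at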

The other main issue to consider is documentation and readability. If the algorithm on which the calculation is based is row based, it might make sense to write the code so that it looks like the algorithm (or change the algorithm so that it looks column based). Documenting and using more readable names for variables reduces bugs and makes the code easier to support. Coming back to code that is confusing wastes a lot of time.

My approach is to break the calculation up into smaller procedures, using the procedure structure to document the code flow and try to optimise the inner routines. Key inner procedures may consist of only 3 or 4 lines of calculation.

You are correct to identify that column-based coding is more suitable for Fortran. The best place to address this is when you are defining the data structure you are using. Over the last 10 years, I have experimented with TYPE data structures. I use these extensively for data tables, but tend to retain (column-centric) arrays for matrix calculations.
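
A small illustration of that split (again just a sketch): a TYPE for the per-node data table, and a plain array for the coordinates used inside the heavy loops, so that each direction runs down a contiguous column.

module node_data
    implicit none
    type :: node_record                 ! "table" style: one record per node
        integer :: label
        integer :: nfree                ! freedoms at this node
        real*8  :: weight               ! e.g. x + 0.01*y for the sort ordering
    end type node_record
    type(node_record), allocatable :: node_table(:)
    real*8,            allocatable :: coord(:,:)   ! coord(node,1:3): column-centric
end module node_data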

If your arrays are small, say up to A(100,100), and you go to all the effort of changing the index order, you will probably find no run-time improvement. The benefit that will come from reviewing the data structures is to improve the way you can address the solution you are trying to achieve, and probably to solve other related problems.

I hope this mix of ideas helps !

John

Please login to reply.