Dual Processor Optimisation
JohnCampbell (Joined: 16 Feb 2006, Posts: 2554, Location: Sydney)
Posted: Tue Feb 26, 2008 4:04 am

Eddie,

I have a recent problem with a 2.1 GB stiffness matrix, which I run on my dual-processor PC. If I monitor it in the Windows Task Manager, most of the time it is doing disk I/O, shifting blocks of the stiffness matrix in and out, but when they are in memory the Task Manager shows 50% CPU usage.
If I could access more than 2.1 GB as a single array, most of the I/O delays would go away, and if I could use the second processor it would run a lot faster while the problem is in memory. It takes a significant amount of time to transfer 2.1 GB between memory and disk!
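(As a rough illustration only, assuming a typical single-disk streaming rate of around 60 MB/s, one pass of 2.1 GB is about 35 seconds of pure transfer time, before any seeking.)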
Sitting there watching the solver run gives you ample time to consider some pretty obvious what-ifs.
I did have an assembler dot product years ago, but it soon became obsolete. I would not know where to start with the problem of driving multiple processors. My existing PCs are dual-core, and quad cores are next. The 3GB switch is only a small step towards 64-bit.
My algorithm is a direct solver that I wrote in the 70's. It was a good algorithm then, but it now appears obsolete compared with more modern iterative solvers.
There are lots of areas to improve, but lots of other problems to solve.

regards John
LitusSaxonicum (Joined: 23 Aug 2005, Posts: 2388, Location: Yateley, Hants, UK)
Posted: Wed Feb 27, 2008 1:16 am

John,

Have you considered substructuring? If each substructure runs in core (to use an expression that shows my age!), then the overall run time could be better than doing the whole problem in one go, notwithstanding the overhead of setting up the individual problems, simply by not swapping out to disk at all. If you can run each substructure as a separate program, you have more chance of exploiting multiple CPUs.

If you absolutely have to go to hard disk, then do you have a RAID array? You can set up RAID 0 for a disk used for swap files without any qualms about the lack of redundancy. The other alternative is a solid-state "hard disk". However, RAID arrays are easy and cheap to set up. In the UK, you can get SATA300 hard disks for about GBP30 each, so a 2- or 4-disk RAID costs very little (you aren't looking for maximum capacity). It depends on how many free SATA channels you have. Most motherboards these days have SATA RAID support, so the cost is only the hard disks. If you don't have free SATA channels, you can get add-on cards, but they are expensive.

If the RAID array isn't the "boot drive", then you can set it up and load drivers without reinstalling Windows.

Eddie
JohnCampbell (Joined: 16 Feb 2006, Posts: 2554, Location: Sydney)
Posted: Thu Feb 28, 2008 12:31 am

Eddie,

Thanks for your thoughts.

There is not much difference between substructuring and partitioning the stiffness matrix in blocks. In the end, both require large amounts of I/O.

I'm not sure about RAID arrays. Do these improve the I/O transfer rates?
Certainly the idea of a solid-state hard disk has promise. It could be like having 64-bit addressing on a 32-bit system. Unfortunately I have not seen any, nor do I know what I/O transfer rates they can achieve. I wonder when memory sticks will grow into solid-state hard disks; not much appears to move in that small volume in my Sony Micro Vault Tiny.

I'll pass your ideas on to our IT support and see if I can get somewhere. It's certainly something to consider.

My emulation of random-access files with variable-length records uses integer*4 addressing of 4-byte words, so my next file-size limit is not far off. I will probably go to an integer*8 address, but that's a while off.
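Just as a sketch of how the addressing could be widened (the names and record length here are hypothetical, not my actual scheme): keep the word address in integer*8 and split it into a direct-access record number plus an offset within that record, so the integer*4 limit only applies to the record count.

Code:
   module word_addressing
      implicit none
      integer, parameter :: words_per_rec = 4096      ! 16 KB records of 4-byte words
   contains
      subroutine locate (word_addr, irec, ioff)
         integer*8, intent(in)  :: word_addr          ! 1-based word address in the file
         integer,   intent(out) :: irec               ! direct-access record number
         integer,   intent(out) :: ioff               ! word offset within that record
         integer*8 :: base
         base = (word_addr - 1) / words_per_rec       ! whole records before this word
         irec = int(base) + 1
         ioff = int(word_addr - 1 - base*words_per_rec) + 1
      end subroutine locate
   end module word_addressing

A word read then becomes a direct-access READ of record irec followed by picking element ioff from the buffer (reading the next record as well when a request spans a record boundary).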

Thanks again for your ideas.

regards John
IanLambley (Joined: 17 Dec 2006, Posts: 490, Location: Sunderland)
Posted: Thu Feb 28, 2008 5:35 pm

A couple of questions:

1. Are you talking about Choleski decomposition and forward/backward substitution?

2. Does your solver implement a profile method to minimise storage?

3. What is the bandwidth of the stiffness matrix?

Regards

Ian
LitusSaxonicum (Joined: 23 Aug 2005, Posts: 2388, Location: Yateley, Hants, UK)
Posted: Thu Feb 28, 2008 8:14 pm

A reply to John,

Basically a RAID 0 array with two disks puts half of each record on disk 1 and half, simultaneously, on disk 2. You'd think that was twice as fast, but it isn't, quite. It works better with SATA disks than PATA, because the former (the newer standard) has one controller per disk and no shared data path, whereas PATA (the older standard) has two disks per controller and a shared data path. I always wondered how parallel (8 bits at a time) could be slower than serial (one bit at a time), but parallel speeds are held low to minimise crosstalk, which isn't a problem with serial.

Even allowing for the overhead of splitting up the records and recombining them, the two-disk RAID 0 array still comes out much faster than a single hard disk in RAM-to-disk (or disk-to-RAM) transfer rate. You can have more hard disks doing this, and they do get faster still, although there are diminishing returns, plus problems of cost, physical space, heat generation and cooling, power demand, etc. The sweet spot appears to lie between one and four disks in the array.

Don't confuse this with other RAID arrangements, which may simply write the same data to two disks at the same time, or allow disks to be taken out of or added into the array, which requires data redundancy.

I've never benchmarked it, but I have tried virtually identical computers, one with 2 x 250 GB Seagates in a RAID 0 array and one with a single 500 GB drive; the RAID machine was significantly faster, for example loading thumbnails of 200 digital photos to all intents and purposes instantly. For what it is worth, the slower machine had a GBP1500 Quadro video card, so that wasn't what was slowing it down!
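If anyone wants to put a number on it, here is a rough sketch of a sequential-write timing test in standard Fortran. Nothing FTN95-specific is assumed; the file name and size are arbitrary, and the OS write cache can flatter the result, so make the file comfortably larger than the installed RAM before trusting the figure.

Code:
   program disk_bench
      implicit none
      integer, parameter   :: nblocks = 64            ! 64 x 8 MB = 512 MB written
      real*8,  allocatable :: buffer(:)
      integer :: i, c0, c1, crate
      allocate (buffer(1024*1024))                    ! one 8 MB unformatted record
      buffer = 1.0d0
      open (unit=10, file='bench.tmp', form='unformatted')
      call system_clock (c0, crate)
      do i = 1, nblocks
         write (10) buffer
      end do
      call system_clock (c1)
      write (*,*) 'write rate (MB/s) =', 8.0*nblocks / (real(c1 - c0) / real(crate))
      close (10, status='delete')
   end program disk_bench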

Eddie
JohnCampbell (Joined: 16 Feb 2006, Posts: 2554, Location: Sydney)
Posted: Fri Feb 29, 2008 12:42 am

Ian,

I'm showing my age with these answers!
The solver I am using is based on a skyline solver, which I think I first got from a paper by Graeme Powell of UCB in about 1976.
The bandwidth optimiser is based on the similar methods of Hoit and Sloan from about 1982. Neither method works very well for all problems, and I am finding that my "Campbell" algorithm, which also sorts the nodes in the x, y or z direction with a quick sort, returns the smallest profile in most cases.
The problems I solve are probably best described as mid-sized finite element problems, as I still generate the models with my own primitive techniques, using a Fortran model generator. The latest problem has 150,000 equations, an average profile of 1,800 and a peak profile of 9,900 equations. Back when I developed most of my FE code these problem sizes would have been considered huge; however, modern commercial FE packages now produce much larger problems.
I still find a select few problems where my approaches and understanding of the methods still apply.

So the answers to your questions are:

1. Are you talking about Choleski decomposition and forward/backward substitution? YES.

2. Does your solver implement a profile method to minimise storage? YES, for both storage and re-ordering.

3. What is the bandwidth of the stiffness matrix? Typically as above; the largest problem I've solved is about twice as large, and most are now 1 GB plus. All use 64-bit reals. I tried 80-bit, but that gave no significant improvement for a model where round-off looked a problem.
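For anyone not familiar with the layout, here is a minimal sketch of the profile (skyline) storage being described; the names are hypothetical, not taken from my code. Each column of the upper triangle is packed from its first non-zero row down to the diagonal, and IDIAG(j) records where the diagonal of column j sits in the packed vector, so entry (i,j) inside the profile lives at IDIAG(j) - (j - i).

Code:
   module skyline_store
      implicit none
      integer              :: neq          ! number of equations
      integer, allocatable :: idiag(:)     ! position in A of each diagonal term
      real*8,  allocatable :: a(:)         ! packed profile (skyline) of the matrix
   contains
      subroutine set_profile (iheight)     ! iheight(j) = stored terms in column j
         integer, intent(in) :: iheight(:)
         integer :: j
         neq = size(iheight)
         allocate (idiag(neq))
         idiag(1) = iheight(1)
         do j = 2, neq
            idiag(j) = idiag(j-1) + iheight(j)
         end do
         allocate (a(idiag(neq)))
         a = 0.0d0
      end subroutine set_profile

      integer function kaddr (i, j)        ! position of K(i,j) in A, for i <= j
         integer, intent(in) :: i, j       ! row i must lie inside the profile of column j
         kaddr = idiag(j) - (j - i)
      end function kaddr
   end module skyline_store

The reduction then works column by column on segments of A, which is why the whole thing boils down to dot products over column segments.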

Regards

John
LitusSaxonicum (Joined: 23 Aug 2005, Posts: 2388, Location: Yateley, Hants, UK)
Posted: Fri Feb 29, 2008 4:01 pm

John,

Now I understand what the problem is. It isn't an odd disk access from time to time, but lots.

If the mainboard in your PC has spare IDE connectors and RAID support, you can run a RAID 0 array with two old hard disks. You won't get much of a performance boost, but there will be some, and the cost is nearly nil. If you have spare SATA connectors, again with mainboard RAID support, you will get a performance boost by running a RAID 0 array. This applies with both SATA I and SATA II hard disks. Cost about GBP100; double or treble that if you need a RAID controller card.

Whatever SATA support you have, a fast SATA drive on its own will help. Most modern drives are 7,200 rpm. A 10,000 rpm WD Raptor hard drive on its own would give you some improvement; two in a RAID 0 array, a lot. They are correspondingly more expensive than 7,200 rpm drives: GBP70 for 36 GB, GBP90 for 74 GB (cf. GBP30 for a standard 160 GB drive). Two or more WD Raptor drives in a RAID array will be faster still. My guess here is about half your present run time, assuming your PC currently has only a standard PATA or SATA drive.

If you don't have SATA and RAID support, then you can buy a RAID controller card.

To go to 15,000 rpm, you need SCSI. Then you need a SCSI RAID card as well as the SCSI drives. This could get expensive; it's already out of my league. Drives could be GBP500+ each!

I checked prices today with UK suppliers of mail-order hardware: www.scan.co.uk and www.aria.co.uk. I imagine real prices are globally equivalent.

Regards

Eddie
IanLambley (Joined: 17 Dec 2006, Posts: 490, Location: Sunderland)
Posted: Sat Mar 01, 2008 5:16 pm

John,

Have you tried a wavefront/frontal solver?

Regards

Ian
JohnCampbell (Joined: 16 Feb 2006, Posts: 2554, Location: Sydney)
Posted: Sun Mar 02, 2008 1:00 am

Ian,

I did most of this development work a long time ago. I never liked frontal solvers, as I saw them as more complex and offering little computational benefit. There was certainly more benefit to be gained from a good bandwidth or profile optimiser.
For iterative solutions, especially shifted subspace eigensolvers, where there are many load cases to solve repeatedly, I don't think the frontal solver is as suitable, whereas a reduced-profile stiffness matrix is.
The other big drawback of frontal solvers was that once the "triangle" of active equations could not be stored in memory, it all got a bit slow, whereas with the profile solver, having 2 blocks or 10 blocks to store the active triangle makes little difference; you just cycle through the earlier blocks.
Back in the 70's and 80's, when overlays were used and there was only space for limited amounts of code in memory, it was good to have a compact reduction code for a PC.
I think some packages persist with frontal solvers, but to me any of their benefits were very limited. It was more of an advertising gimmick to say you used a frontal solver.
There are probably people out there who may disagree, but for the range of beam/shell problems I solved, frontal solvers appeared to provide limited benefit.
I must admit that seeing what modern packages can do with iterative solvers and large numbers of equations makes my direct skyline solver show its age.

Regards John
JohnHorspool (Joined: 26 Sep 2005, Posts: 270, Location: Gloucestershire UK)
Posted: Sun Mar 02, 2008 1:24 pm

I thought that these days the major FE systems use a multi-frontal solver when confronted with large problems, as here with Lusas;

see:-

http://www.lusas.org/products/options/fast_solvers.html

It was my understanding that both Abaqus and MSC/Nastran also use this solver technology (though I may be wrong); the output from these packages makes no mention of iterations when solving large tet meshes linearly.
LitusSaxonicum (Joined: 23 Aug 2005, Posts: 2388, Location: Yateley, Hants, UK)
Posted: Sun Mar 02, 2008 11:26 pm

Hi John H,

There's no doubt that the right algorithm helps enormously. Depending on how much of a software developer you are, and what resources you can call on, it may be worth reprogramming to make old code work better. Me, I'm an academic, and I am on a 2-year mission to turn my programs (which work fine in DOS!) into a Windows application. Speed isn't an issue for me: I was getting 3-hour run times on a mainframe in 1973, and now I can't even time it with a stopwatch!

Sometimes one is stuck with the Fortran code one has (and the compiler one uses, and, I suppose, the machine architecture). Then the problem is what can be done that takes little time and little money and produces the most benefit for the least of both. I'm not much of a fan of getting a faster CPU. Since I build my own computers, it almost always means a new mainboard and RAM, sometimes a video card and other odds and ends. If there is a faster CPU for the rig one has, it is usually a bit disappointing how little improvement you get.

Dual- or multi-core CPUs only speed up multithreaded applications.

The cheapest options seemed to me to be in the arena of speeding up a simple routine which is called zillions of times, or getting improved hard disk performance, given what John C told us about his problem. On reflection, the RAID array of fast disks seems the best and cheapest option here... so much so that I think I'll do it for myself!

Regards

Eddie
JohnHorspool (Joined: 26 Sep 2005, Posts: 270, Location: Gloucestershire UK)
Posted: Mon Mar 03, 2008 12:15 am

Hi Eddie and John C,

I know that this thread is really about solvers, but it may be of interest that a post-processor which I originally wrote in the early 80's on a VAX with Tektronix displays is still used today by myself and work colleagues, compiled with FTN95 and running on 32-bit Windows machines. We have access to a 64-bit Linux machine with eight processors and 32 GB of RAM for solving, so the limiting factor is the size of model that the PCs can handle. We have two very well known FE solvers on the Linux machine, both of which come with their own pre- and post-processors running on PCs. We don't use their pre-processors to generate the models, and similarly we don't use their post-processors either! The simple fact is that, for the larger models, the commercial PC-based programs both crash with memory-exhaustion problems, yet my post-processor compiled with FTN95 runs just fine!

This to me more than justifies writing your own code, and confirms that FTN95 produces very capable programs.
JohnCampbell (Joined: 16 Feb 2006, Posts: 2554, Location: Sydney)
Posted: Thu Mar 06, 2008 2:09 am

John H,

Like you, I've always pursued writing my own code for FE and logistics simulation. There are always niches where this approach works better.

With regard to multi-frontal solvers, I would expect this relates more to a multi-front reordering, which suits the spoked-wheel style of problem.
I once worked with a boundary element package which had four iterative solvers, and for that, and I expect most 3D solid problems, an iterative solver appears better suited.
I did most of my development work on solvers in the 70's and 80's, so there are a lot of new developments I don't know about, although I'm not aware of any significant new direction in solving large sets of linear equations. I've never forgotten the post-grad who tried to sell everyone iterative solvers and blew the department's mainframe budget on a problem that never stopped, with the time limit turned off.

We were recently using ANSYS for the analysis of a rail track support / acoustic isolator made of steel, rubber and HDPE. The vastly different stiffness moduli do not appear to suit an iterative solver.

I see that CSI-SAP2000 now has a "sapfire" solver. With the increasing commercialization of these solvers, it's difficult to know what works best.

Given the large range of projects we have to do, it is hard to find the time to pursue any one of them to the forefront of today's technology.

Certainly the direct-solution Choleski profile solver relies on two main vector operations: the dot product, s = A . B, and the vector update, A = A - factor * B, both of which are well suited to parallel or multi-processor optimization. I can't see an easy way of achieving that with Fortran. DOT_PRODUCT, if limited to vectors of the same kind, would be a clearly defined procedure that could use the new multiple cores. There should be an API for this!
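Just to show what those two kernels look like, here is a minimal sketch in plain Fortran. The OpenMP directives are purely illustrative, on the assumption of a compiler that supports them (FTN95 does not), so treat the parallel part as a what-if rather than a working recipe; a compiler without OpenMP simply treats the !$omp lines as comments and the code runs single-threaded.

Code:
   module solver_kernels
      implicit none
   contains
      real*8 function dot (a, b, n)        ! s = A . B
         integer, intent(in) :: n
         real*8,  intent(in) :: a(n), b(n)
         integer :: i
         real*8  :: s
         s = 0.0d0
   !$omp parallel do reduction(+:s)
         do i = 1, n
            s = s + a(i)*b(i)
         end do
         dot = s
      end function dot

      subroutine update (a, b, factor, n)  ! A = A - factor * B
         integer, intent(in)    :: n
         real*8,  intent(in)    :: b(n), factor
         real*8,  intent(inout) :: a(n)
         integer :: i
   !$omp parallel do
         do i = 1, n
            a(i) = a(i) - factor*b(i)
         end do
      end subroutine update
   end module solver_kernels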

regards John C