|
Tim Davis,
Professor
E301 CSE Building
P.O. Box 116120
University of Florida
Gainesville, FL 32611-6120
phone (352) 392-1481, fax (352) 392-1220
email: my last name AT cise.ufl.edu
|
Experimental setup
This page documents a simple test in MATLAB with sparse symmetric
positive definite matrices from the
UF Sparse Matrix Collection.
These matrices arise in real applications. They are not randomly generated.
There are a total 198 matrices in the collection that are known to be
symmetric positive definite, and two more that might be, as of August 14, 2007.
The results were obtained on a 3.2GHz Pentium 4 with 4GB of memory,
running SUSE Linux.
I compared four different methods for solving Ax=b in MATLAB 7.3 (R2006b):
- x = A\b, the built-in backslash operator,
which uses either
CHOLMOD with just
AMD, or specialized
diagonal or banded solvers.
- x = cholmod2 (A,b), which is CHOLMOD with the best of AMD or
METIS.
- x = pcg (A, b, [ ],[ ], diag(diag(A))), which is the MATLAB PCG
function (preconditioned conjugate gradients), with
b = rand(n,1), default tolerance (1e-6), default max iteration
itmax = min(n,20), and a diagonal preconditioner.
The default starting guess is x = 0.
- x = pcg (A, b, [ ], 200, diag(diag(A))), which is the same except
for itmax=200.
Note that no incomplete factorizations, sparse approximate inverses,
algebraic multigrid, or other powerful preconditioners were used.
This comparison is using PCG as "black box" solver, which is admittedly
unfair to PCG. I used a random right-hand side, which is also unfair to PCG.
The primary point of this comparison is to show that
PCG doesn't make a good black box solver. You really need to use
problem-specific knowledge for iterative methods to work well.
The matrices were tested in increasing size, as determined by the
number of nonzeros in the Cholesky factorization using the AMD ordering.
The tests were terminated early, before the last 6 matrices were
tested. Backslash and CHOLMOD would most likely fail for these 6
(CHOLMOD+METIS might succeed). I'll try to continue the tests for
these 6 matrices. The results below are for 194 matrices.
Below are the M-files used and the raw output.
- testit.m: MATLAB test script.
- diary.txt: raw diary output.
- res.m: processing the results.
- Results2.mat: the raw results in a mat-file.
- other.txt: the 6 matrices not included, all of
which come from large 3D discretizations.
Results:
Below is figure 1 from res.m. The x-axis is the log10 (PCG
time / backslash time), or the log of the relative run time for PCG and
backslash. The y-axis is the log10 (PCG residual / backslash residual), or the
log of the relative residual. Each matrix appears as two dots on the figure,
one in blue and the other in red. The blue dot is PCG with itmax=20, the red
dot is PCG with itmax=200. A cyan line connects the two dots for any one
particular matrix. Thus, if the cyan line is flat, going from 20 to 200
iterations does not improve the result for PCG; it just increases the time (by
typically a factor of 10). Note that there is a cluster of matrices lying on
the x-axis. Backslash and PCG computed the same residual for these matrices
because they are diagonal.
Matrices in the top right quadrant are solved faster and more accurately
by backslash. Matrices in the top left are solved faster by PCG (if it
converges at all), but are solved more accurately by backslash.
Not shown in figure 1 are 4 matrices for which x=A\b failed
(thus, figure 1 shows the results for 190 matrices).
| Matrix
| PCG time, resid
| PCG:200 time, resid
| backslash time, resid
| CHOLMOD+METIS time,resid
|
| GHS_psdef/ldoor
| 30.8 sec, 2e-3
| 287.2 sec, 2e-3
| failed
| failed
|
| ND/nd12k
| 7.8 sec, 9e+1
| 71.7 sec, 9e+1
| failed
| 145.4 sec, 2e-7
|
| GHS_psdef/inline_1
| 24.3 sec, 1e-3
| 223.8 sec, 1e-3
| failed
| failed
|
| Koutsovasilis/F1
| 20.0 sec, 1e-3
| 187.3 sec, 1e-3
| failed
| failed
|
Figure 2 from res.m is shown below. It is the same as figure
1, except that CHOLMOD (with AMD and METIS) replaces backslash.
The figure gives the results for 191 matrices.
Figure 3 from res.m, below, compares backslash with
CHOLMOD+(AMD and METIS) on 190 matrices (excluding the 4 in the table above,
and the 6 that were not tested at all). The x-axis is the flop count for the
Cholesky factorization A(p,p) where p=amd(A). This gives a measure of the
problem size. The y-axis is the log of the relative run time. Backslash can
use a simple diagonal solver, which is faster than CHOLMOD; thus, there is a
cluster of matrices in the bottom left of the figure. To the right of the
figure are matrices for which METIS provides a better ordering than AMD, which
improves the solution time. Note that MATLAB does not include METIS, but
CHOLMOD can use it if it's available.
Conclusion
Essentially all matrices in this test set are solved much more accurately by
backslash (or CHOLMOD+(AMD and METIS)). The two exceptions are matrices that
are so ill-conditioned (HB/plat1919
and Andrews/Andrews) that neither
method gives a decent residual. There are 49 matrices for which PCG (with
itmax=20) is at least twice as fast as backslash, but for most of these, PCG
with itmax=200 does not give a more accurate answer. The solution quality
stagnates (the cyan lines are almost all flat in the top left of Figure 1).
For 163 of the 194 matrices, PCG with itmax=20 returned flag=1, which means
that PCG did not converge to the requested tolerance of 1e-6. With itmax=200,
PCG did not converge for 123 matrices.
Backslash failed for a few matrices because it ran out of memory.
Of course, in that case, PCG is superior to backslash.
Note that nearly all the cyan lines that slope downwards in Figures 1 and 2
(meaning that PCG with 200 iterations gives a more accurate result than 20
iterations) are in the top right of the two figures. Only 3 similarly sloped
lines are in the top left. One can conclude from this the following general
rule of thumb: if PCG fails to converge after 20 iterations, then more
iterations won't help: either (a) the solution quality has stagnated and more
iterations will just waste time (you are in the top left of the figure), or (b)
more iterations may improve the quality but in this case you are better off
using a direct solver (you are in the top right quadrant of the figure).