These reports are also available via at the CISE tech repoort site.
Enjoy.
The LDL software package is a set of short, concise routines for factorizing symmetric positive-definite sparse matrices, with some applicability to symmetric indefinite matrices. Its primary purpose is to illustrate much of the basic theory of sparse matrix algorithms in as concise a code as possible, including an elegant new method of sparse symmetric factorization that computes the factorization row-by-row but stores it column-by-column. The entire symbolic and numeric factorization consists of a total of only 53 lines of code. The package is written in C, and includes a MATLAB interface.
Given a sparse, symmetric positive definite matrix C and an associated sparse Cholesky factorization LDL' we develop sparse techniques for updating the factorization after a symmetric modification of a row and column of C. We show how the modification in the Cholesky factorization associated with this rank two modification of C can be computed efficiently using a sparse rank one technique developed in an earlier paper [SIAM J. Matrix Anal. Appl., 20 (1999), pp. 606-627]. We also determine how the solution of a linear system Lx=b changes after changing a row and column of C or after a rank-r change in C.
We present a factorization-based implementation of the LP Dual Active Set Algorithm. This implementation exploits a proximal approximation to the dual function, a multilevel solution strategy, and recently developed algorithms for updating a sparse Cholesky factorization after a small rank change. We compare solution times to those of simplex and barrier algorithms, as implemented in CPLEX, using the Netlib LP Test Problems.
A new method for sparse LU factorization is presented that combines a column pre-ordering strategy with a right-looking unsymmetric-pattern multifrontal numerical factorization. The column ordering is selected to give a good a priori upper bound on fill-in and then refined during numerical factorization (while preserving the bound). Pivot rows are selected to maintain numerical stability and to preserve sparsity. The method analyzes the matrix and automatically selects one of three pre-ordering and pivoting strategies. The number of nonzeros in the LU factors computed by the method is typically less than or equal to those found by a wide range of unsymmetric sparse LU factorization methods, including left-looking methods and prior multifrontal methods.
An ANSI C code for sparse LU factorization is presented that combines a column pre-ordering strategy with a right-looking unsymmetric-pattern multifrontal numerical factorization. The pre-ordering and symbolic analysis phase computes an upper bound on fill-in, work, and memory usage during the subsequent numerical factorization. User-callable routines are provided for ordering and analyzing a sparse matrix, computing the numerical factorization, solving a system with the LU factors, transposing and permuting a sparse matrix, and converting between sparse matrix representations. The simple user interface shields the user from the details of the complex sparse factorization data structures by returning simple handles to opaque objects. Additional user-callable routines are provided for printing and extracting the contents of these opaque objects. An even simpler way to use the package is through its MATLAB interface. UMFPACK is incorporated as a built-in operator in MATLAB 6.5 (backslash) when {\tt A} is sparse and unsymmetric.
UMFPACK is a set of routines for solving unsymmetric sparse linear systems, Ax=b, using the Unsymmetric MultiFrontal method and direct sparse LU factorization. It is written in ANSI/ISO C, with a MATLAB interface. UMFPACK relies on the Level-3 Basic Linear Algebra Subprograms (dense matrix multiply) for its performance. This code works on Windows and many versions of Unix (Sun Solaris, Red Hat Linux, IBM AIX, SGI IRIX, and Compaq Alpha).
UMFPACK is a set of routines for solving unsymmetric sparse linear systems, Ax=b, using the Unsymmetric-pattern MultiFrontal method and direct sparse LU factorization. It is written in ANSI/ISO C, with a MATLAB interface. UMFPACK relies on the Level-3 Basic Linear Algebra Subprograms (dense matrix multiply) for its performance. This code works on Windows and many versions of Unix (Sun Solaris, Red Hat Linux, IBM AIX, SGI IRIX, and Compaq Alpha). This is a ``quick start'' guide for Unix users of the C interface.
AMD is a set of routines for permuting sparse matrices prior to numerical factorization, using the approximate minimum degree ordering algorithm. It is written in both C and Fortran 77. A MATLAB interface is included.
AMD is a set of routines for permuting sparse matrices prior to factorization, using the approximate minimum degree ordering algorithm. It is written in both C and FORTRAN, with a MATLAB interface.
In a seminal paper (An efficient heuristic procedure for partitioning graphs, Bell System Technical Journal, vol 49, pp. 291-307, 1970), Kernighan and Lin propose a pair exchange algorithm for approximating the solution to min-cut graph partitioning problems. In their algorithm, a vertex from one set in the current partition is exchanged with a vertex in the other set to reduce the sum of the weights of the cut edges. The exchanges continue until the total weight of the cut edges is no longer reduced. In this paper, we consider a block exchange algorithm in which a group of vertices from one set is exchanged with a group of vertices from the other set in order to minimize the sum of the weights of cut edges. An optimal choice for the exchanged vertices is the solution to a quadratic programming problem. In Approximation and Complexity in Numerical Optimization: Continuous and Discrete Problems, P. M. Pardalos, Editor, 1999.
A talk presented at the October '99 Workshop on Graph Partitioning, Army High Performance Computing Research Center, Minneapolis, MN
A continuous quadratic programming formulations are given for min-cut graph partitioning problems. An optimal solution is related to an eigenvector (Fiedler vector) corresponding to the second smallest eigenvalue of the graph's Laplacian. Necessary and sufficient conditions characterizing local minima of the quadratic program are given. A generalization of the Kernighan and Lin exchange algorithm to a block exchange algorithm can be used to escape from a local minimum of the continuous quadratic program. Comparisons to the optimal cut obtained by other approaches to graph partitioning are presented.
Related papers:
Given a sparse symmetric positive definite matrix AA' and an associated sparse Cholesky factorization LDL' or LL', we develop sparse techniques for updating the factorization after either adding a collection of columns to A or deleting a collection of columns from A. Our techniques are based on an analysis and manipulation of the underlying graph structure, using the framework developed in an earlier paper on rank one modifications [SIAM J. Matrix Anal. Appl., 20 (1999), pp. 606-627]. Computationally, the multiple rank update has better memory traffic and executes much faster than an equivalent series of rank one updates since the multiple rank update makes one pass through L computing the new entries, while a series of rank one updates requires multiple passes through L. (a revised version appears in the SIAM Journal on Matrix Analysis and Applications).
Cell matrices in Matlab can be used to simulate simple parallel matrix algorithms for distributed memory parallel computers. Each process in the k-by-k grid of processors owns a single entry in a k-by-k cell array. One entry in cell array can be a scalar, a matrix, another cell array, or any other data type supported by Matlab. Messages are also passed using cell arrays. This paper describes Matlab's cell arrays and how they can be used to simulate the distribution of matrices and the sending of messages on a parallel computer. A complete 2-processor parallel matrix multiply algorithm is described. This paper was written in order to introduce parallel computing concepts into the course COT 4501 (Numerical Analysis - A Computational Approach).
This work was supported by the National Science Foundation, grant #9634470.
Block matrices and block matrix operations are essential for high performance computing. Using these techniques, applications can take advantage of architectural features such as cache, virtual memory, parallelism, and vector and other special instructions, that are present in most modern computers, from the latest PC's to the most powerful supercomputers. As a result, block matrix operations and the applications that rely on them (including scientific simulations, graphics, and multimedia applications) can often reach close to the theoretical maximum performance on any computer. This document, written for the course COT 4501 (Numerical Analysis - A Computational Approach) explains how these techniques work.
This work was supported by the National Science Foundation, grant #9634470.
An approximate minimum degree column ordering algorithm (COLAMD) for preordering an unsymmetric sparse matrix A prior to numerical factorization is presented. Using the quotient graph representation of A transpose times A and a tight and fast approximate degree, this algorithm is a good alternative to using symmetric codes directly on the explicit representation of A transpose times A. Additionally, COLAMD is shown to be superior to COLMMD, the column minimum degree ordering code in MATLAB.
(an MS thesis written under Tim Davis' direction).
Given a sparse symmetric positive definite matrix AA' and an associated sparse Cholesky factorization LDL' or LL', we develop sparse techniques for obtaining the new factorization associated with either adding a column to A or deleting a column from A. Our techniques are based on an analysis and manipulation of the underlying graph structure and on ideas of Gill, Golub, Murray, and Saunders for modifying a dense Cholesky factorization. We show that our methods extend to the general case where an arbitrary sparse symmetric positive definite matrix is modified. Our methods are optimal in the sense that they take time proportional to the number of nonzero entries in L and D that change. Revision of TR-97-003. SIAM Journal of Matrix Analysis and Applications, vol. 20, no. 3, pp. 606-627, 1999.
A set of slides for a talk presented at the 1997 SIAM Applied Linear Algebra meeting. The results are in a second postscript file.
We discuss the organization of frontal matrices in multifrontal methods for the solution of large sparse sets of unsymmetric linear equations. In the multifrontal method, work on a frontal matrix can be suspended, the frontal matrix can be stored for later reuse, and a new frontal matrix can be generated. There are thus several frontal matrices stored during the factorization and one or more or these are assembled (summed) when creating a new frontal matrix. Although this means that arbitrary sparsity patterns can be handled efficiently, extra work is required to sum the frontal matrices together and can be costly because indirect addressing is required. The (uni-)frontal method avoids this extra work by factorizing the matrix with a single frontal matrix. Rows and columns are added to the frontal matrix, and pivot rows and columns are removed. Data movement is simpler, but higher fill-in can result if the matrix cannot be permuted into a variable-band form with small profile. We consider a combined unifrontal/multifrontal algorithm to enable general fill-in reduction orderings to be applied without the data movement of previous multifrontal approaches. We discuss this technique in the context of a code designed for the solution of sparse systems with unsymmetric pattern.
This TR also appears as RAL TR-97-046, Rutherford Appleton Laboratory. It is a revision of CISE TR-95-020. This report describes UMFPACK Version 2.2.
ACM Transactions on Mathematical Software, vol. 25, no. 1, pp. 1-19, 1999.
Given a sparse symmetric positive definite matrix AA' and an associated sparse Cholesky factorization LL', we develop sparse techniques for obtaining the new factorization associated with either adding a column to A or deleting a column from A. Our techniques are based on the analysis and manipulation of the underlying graph structure and on ideas of Gill, Golub, Murray, and Saunders for modifying a dense Cholesky factorization. Our algorithm involves a new sparse matrix concept, the multiplicity of an entry in L. The multiplicity is essentially a measure of the number of times an entry is modified during symbolic factorization. We also indicate how our methods extend to the general case, for a sparse symmetric positive definite matrix A and its Cholesky factorization. Our methods are optimal in the sense that they take time proportional to the number of nonzero entries in L that change.
A short 75-word abstract for the SIAM Linear Algebra Conference (1997) and extended abstract are available, as are copies of the slides of the talk.
A revised version (TR-97-025) appears in SIAM J. on Matrix Analysis and Applications, vol. 20 (1999), pp. 606-627.
A Master's thesis. Includes MATLAB implementations of the Revised Simplex method, the Bartels-Golub method, the sparse Bartels-Golub method, the Forrest-Tomlin method, and Reid's method.
We present the symmetric inverse multifrontal method for computing the sparse inverse subset of symmetric matrices. The symmetric inverse multifrontal approach uses an equation presented by Takahashi, Fagan, and Chin to compute the numerical values of the elements of the inverse, and an inverted form of the symmetric multifrontal method of Duff and Reid to guide the computation. We take advantage of related structures that allow the use of dense matrix kernels (levels 2 & 3 BLAS) in the computation of this subset. We discuss the theoretical basis for this new algorithm and give numerical results for a serial implementation and demonstrate its performance on a Cray-C98.
The sparse inverse subset problem is the computation of the entries of the inverse of a sparse matrix for which the corresponding entry is nonzero in the factors of the matrix. We present a parallel, block-partitioned formulation of the inverse multifrontal algorithm to compute the sparse inverse subset. Numerical results for an implementation of this algorithm on an 8-processor, shared-memory CRAY-C98 architecture are discussed. We show that for large problems we obtain efficiency ratings of over 80% and performance in excess of 1 Gflop.
We present a new class of preconditioners based on an unsymmetric multifrontal incomplete LU factorization algorithm. In this algorithm entire update rows and columns of the frontal matrices are dropped based on a level drop strategy. We discuss some of the successes and difficulties inherent in applying this drop strategy. In general, we found the memory usage of this incomplete multifrontal factorization algorithm to be smaller than its complete multifrontal counterpart. We present numerical results to show the quality of the factors when used as preconditioners with the conjugate gradient squared iterative method.
(PhD thesis in technical report format. Advisor: Timothy A. Davis). In this dissertation we discuss the computation of sparse inverse subsets and the construction of preconditioners based on the incomplete LU factorization of general nonsingular nonsymmetric matrices. The first part of the dissertation focuses on the sparse inverse subsets problem. Here we propose a new method, the inverse multifrontal method, that provides the theoretical foundation for a suite of algorithms that can efficiently find inverse subsets of a general sparse matrix. This new method uses the Takahashi equations and many notions from the multifrontal LU factorization method. Numerical results from runs on the CRAY-C98 will show its effectiveness in both the sequential and parallel environmemts. The research on incomplete LU factorization forms the second part of the dissertation. Here we discuss a new class of preconditioners based on an unsymmetric multifrontal incomplete LU (ILU) factorization algorithm. We outline the algorithm and analyse the quality of the ILU factors when used as preconditioners in the conjugate gradient squared iterative method.
We discuss the organization of frontal matrices in multifrontal methods for the solution of large sparse sets of linear equations. In the multifrontal method, several frontal matrices are used. Each is used for one or more pivot steps, and the resulting Schur complement is summed with other Schur complements to generate another frontal matrix. Although this means that arbitrary sparsity patterns can be handled efficiently, extra work is required to add the Schur complements together and can be costly because indirect addressing is required. The (uni-)frontal method avoids this extra work by factorizing the matrix with a single frontal matrix. Rows and columns are added to the frontal matrix, and pivot rows and columns are removed. Data movement is simpler, but higher fill-in can result if the matrix cannot be permuted into a variable-band form with small profile. We consider a combined unifrontal/multifrontal algorithm to enable general fill-in reduction orderings to be applied without the data movement of previous multifrontal approaches.
NOTE: This paper discusses UMFPACK Version 2.0. You may also obtain a copy by sending email to netlib@ornl.gov with the message "send index from linalg". It is available for non-commerical use ONLY. For commercial use, the UMFPACK Version 2.0 algorithm appears as MA38 in the Release 12 of the Harwell Subroutine Library (intended release date of January 1996).
See TR-97-016 for a revision, which appears in ACM Transactions on Mathematical Software, vol. 25, no. 1, pp. 1-19, 1999.
Multifrontal matrix factorization methods used for solving large, sparse systems of linear equations decompose sparse matrices into overlapping dense submatrices which can be represented by vertices with relationships between submatrices shown via various types of edges. This paper describes the use of graph theory in a new parallel, distributed memory multifrontal method for the LU factorization of sequences of matrices with an identical, unsymmetric pattern. The directed acyclic graphs formed by these vertices and the various edge sets are used to structure the computations, schedule the parallel factorization, and provide a robust capability to dynamically change the pivot ordering to maintain numerical stability. Pivot reordering determines necessary permutations based on a path analysis of two component edge sets. The path properties represented by these edge sets define the impacts of these permutations on the structures of the submatrices and the number of nonzeros in the matrix factors. Transitive reductions of these edge sets provide the communications paths needed for parallel implementation.
The Unsymmetric-pattern MultiFrontal Package (UMFPACK, version 1.1) solves Ax=b using LU factorization, where A is a general unsymmetric sparse matrix. The method relies on dense matrix kernels (the BLAS) to factorize rectangular frontal matrices, which are dense submatrices of the sparse matrix being factorized. Includes both double and single precision ANSI Fortran 77 versions. Requires the BLAS, and two subroutines from the Harwell MA28 code (available from Netlib). Some licensing restrictions apply.
NOTE: This version is no longer available. Use UMFPACK Version 2.2 instead. You may also obtain a copy by sending email to netlib@ornl.gov with the message "send index from linalg". FOR NON-COMMERCIAL USE ONLY.
An Approximate Minimum Degree ordering algorithm (AMD) for preordering a symmetric sparse matrix prior to numerical factorization is presented. We use techniques based on the quotient graph for matrix factorization that allow us to obtain computationally cheap bounds for the minimum degree. We show that these bounds are often equal to the actual degree. The resulting algorithm is typically much faster than previous minimum degree ordering algorithms, and produces results that are comparable in quality with the best orderings from other minimum degree algorithms.
AMD appears as MC47 in Release 12 of the Harwell Subroutine Library. With some licensing restrictions, 6 variants of AMD (not AMD itself) appear in Netlib.
Appears in SIAM J. Matrix Analysis and Applications, vol. 17, no. 4, pp. 886-905, 1996.
For the first usage of the approximate minimum degree (in an unsymmetric context) see TR-92-014, below.
Sparse matrix factorization algorithms for general problems are typically characterized by irregular memory access patterns that limit their performance on parallel-vector supercomputers. For symmetric problems, methods such as the multifrontal method avoid indirect addressing in the innermost loops by using dense matrix kernels. However, no efficient LU factorization algorithm based primarily on dense matrix kernels exists for matrices whose pattern is very unsymmetric. We address this deficiency and present a new unsymmetric-pattern multifrontal method based on dense matrix kernels. As in the classical multifrontal method, advantage is taken of repetitive structure in the matrix by factorizing more than one pivot in each frontal matrix thus enabling the use of Level 2 and Level 3 BLAS. The performance is compared with the classical multifrontal method and other unsymmetric solvers on a CRAY YMP. (Revised version of TR-93-018).
Appears in SIAM J. Matrix Analysis and Applications, vol. 18, no. 1, pp. 140-158, 1996.
In many large scale scientific and engineering computations, the solution to a sparse linear system is required. We present a partial unsymmetric nested dissection method that can be used to parallelize any general unsymmetric sparse matrix algorithm whose pivot search can be restricted to a subset of rows and columns in the active submatrix. The submatrix is determined by the partial unsymmetric dissection. We present results of this approach, applied to the unsymmetric-pattern multifrontal method. (Presented at the 7th SIAM Conf. on Parallel Processing for Scientific Computing, Feb. 1995, San Francisco, CA).
The unsymmetric-pattern multifrontal method by Davis and Duff relaxes the assumption of a symmetric pattern matrix and has considerable performance advantages for the factorization of sequences of identically structured sparse matrices with unsymmetric patterns. However, when anticipated pivots become no longer numerically acceptable, standard multifrontal recovery techniques are inadequate. This paper develops a robust lost pivot recovery capability for the unsymmetric pattern multifrontal method. The recovery capability is built into a distributed memory, parallel implementation of the method and both its sequential and parallel performance are evaluated.
The sparse LU factorization algorithm by Davis and Duff is the first multifrontal method that relaxes the assumption of a symmetric pattern matrix. While the algorithm offers significant performance advantages for unsymmetric pattern matrices, the underlying computational structure changes from a tree (or forest) to a directed acyclic graph. This paper discusses some key issues in the parallel implementation of the unsymmetric-pattern multifrontal method when the pivot sequence is known prior to factorization. The algorithm was implemented on the nCUBE 2 distributed memory multiprocessor and achieved performance is reported.
The unsymmetric-pattern multifrontal method of Davis and Duff generalizes earlier multifrontal approaches to LU factorization by removing the assumption of a symmetric-pattern of nonzeros in the sparse matrix. As a result, the underlying computational structure becomes a directed acyclic graph (DAG) instead of a tree. This research explores the potential parallelism available in the unsymmetric-pattern multifrontal method using both unbounded and bounded parallelism models based on this DAG. The bounded parallelism model is extended to reflect the performance characteristics of the nCUBE 2 distributed memory multiprocessor to investigate the achievable parallelism. Finally, a factorization-only version of the method is implemented on the nCUBE 2 and its achieved parallelism evaluated.
(PhD thesis in technical report format. Advisor: Timothy A. Davis)
This research effort studies the parallel LU factorization of sequences of identically structured sparse matrices using the unsymmetric-pattern, multifrontal method of Davis and Duff. Computational burden is reduced by using the computational structure (called an assembly directed acyclic graph: DAG) that results from analysis of the first matrix for the numerical factorization of the subsequent matrices. Execution time is reduced via exploitation of parallelism in the assembly DAG. The theoretical parallelism in the assembly DAG is investigated using both analysis and simulation. Achievable parallelism is evaluated using a simulation model based on the nCUBE 2 distributed memory multiprocessor.
A fixed pivot order parallel LU factorization routine is implemented on the nCUBE 2 and evaluated. Key subissues within this implementation are task scheduling, subcube allocation, subcube assignment, and data assignments to reduce communications and overall execution time. The resulting implementation is shown to be competitive with conventional techniques and demonstrates significant parallel performance with excellent scalability.
Of greatest significance is the theoretical development and implementation of a lost pivot recovery capability for the unsymmetric-pattern, multifrontal method. The capability incorporates a lost pivot avoidance strategy with both inter- and intra-frontal matrix pivot recovery mechanisms. The inter-frontal matrix pivot recovery mechanisms migrate lost pivots to subsequent frontal matrices and accommodate corresponding side effects using relationships represented in the assembly DAG. Intra-frontal matrix pivot recoveries are done via a pipelined partial dense factorization kernel that handles unsymmetric permutations across processors. Evaluation of the lost pivot recovery capability using three matrix sequences from chemical engineering problems shows that significant execution time savings are afforded when executing on either a single processor or multiple processors.
Proceedings of the Fifth SIAM Conference on Applied Linear Algebra, Snowbird, Utah, June 15-18, 1994.
Proceedings of the Fifth SIAM Conference on Applied Linear Algebra, Snowbird, Utah, June 15-18, 1994.
In the multifrontal method, each frontal matrix must hold all of its pivot rows and columns at one time. Moving data between frontal matrices is costly, but the method can handle arbitrary fill-reducing orderings. In the unifrontal method, pivot rows and columns are shifted out of the frontal matrix as the factorization proceeds. Data movement is simpler, but higher fill-in can result. We consider a hybrid unifrontal/multifrontal algorithm. Fill-reducing orderings can still be applied, but data movement is reduced by allowing pivot rows and columns to be shifted into and out of each frontal matrix (just as in the unifrontal method). Performance results on a Cray YMP supercomputer are presented.
Proceedings of the Fifth International Symposium on Process Systems Engineering, Kyongju, Korea, May 30 - June 3, 1994.
A critical computational step in large-scale process simulation using rigorous equation-based models is the solution of a sparse linear equation system. Traditional sparse solvers based on indirect addressing are not effective on supercomputers because they do not vectorize well. By relying on vectorized dense matrix kernels, the multifrontal and frontal methods provide much better performance, as demonstrated using several examples. On problems with good initial matrix orderings the frontal method is most effective, while without good initial ordering the multifrontal method is attractive.
(NOTE: See TR-95-004 for a more recent version; this guide is for UMFPACK Version 1.0.)
The Unsymmetric-pattern MultiFrontal Package (UMFPACK) solves Ax=b using LU factorization, where A is a general unsymmetric sparse matrix. The method relies on dense matrix kernels (the BLAS) to factorize rectangular frontal matrices, which are dense submatrices of the sparse matrix being factorized. Includes both double and single precision ANSI Fortran 77 versions. Requires the BLAS, and two subroutines from the Harwell MA28 code (available from Netlib). UMFPACK is available via anonymous ftp to ftp.cis.ufl.edu:pub/umfpack, or by sending email to netlib@ornl.gov with the message "send umfpack.shar from linalg". Some licensing restrictions apply.
(NOTE: TR-94-038 is a revision of this report.) Sparse matrix factorization algorithms are typically characterized by irregular memory access patterns that limit their performance on parallel-vector supercomputers. For symmetric problems, methods such as the multifrontal method replace irregular operations with dense matrix kernels. However, no efficient LU factorization algorithm based primarily on dense matrix kernels exists for matrices whose pattern is very unsymmetric. A new unsymmetric-pattern multifrontal method based on dense matrix kernels is presented. Frontal matrices are rectangular instead of square, and the assembly tree is replaced with a directed acyclic graph. As in the classical multifrontal method, advantage is taken of repetitive structure in the matrix by amalgamating nodes in the directed acyclic graph, giving it high performance on parallel-vector supercomputers. The performance of three sequential versions is compared with the classical multifrontal method and other unsymmetric solvers on a Cray YMP-8/128.
See the revised version, TR-94-038 which also appears in SIAM J. Matrix Analysis and Applications, vol. 18, no. 1, pp. 140-158, 1996.
Task graphs are used for scheduling tasks on parallel processors when the tasks have dependencies. If the execution of the program is known ahead of time, then the tasks can be statically and optimally allocated to the processors. If the tasks and task dependencies aren't known ahead of time (the case in some analysis-factor sparse matrix algorithms), then task scheduling must be performed on the fly. We present simple algorithms for a concurrent dynamic-task graph. A processor that needs to execute a new task can query the task graph for a new task, and new tasks can be added to the task graph on the fly. We present several alternatives for allocating tasks for processors and compare their performance. Presented at the International Conference on Parallel Processing, 1993.
The unsymmetric-pattern multifrontal method for the LU factorization of sparse matrices creates a series of smaller, dense frontal matrices that are partially factored and then combined into a fully factored result. These smaller frontal matrices can overlap in rows and/or columns creating dependencies between themselves which are represented as an elimination directed acyclic graph(DAG). This elimination DAG can serve as a task graph to control the parallel factorization of frontal matrices. The potential for parallelism provided in the elimination DAGs produced by the unsymmetric-pattern multifrontal method is investigated via analysis and trace-driven simulations. Different levels of parallelism are explored using both unbounded and bounded processor models. Various scheduling and task allocation schemes are also investigated. Performance results are measured in terms of speed-ups and processor utilizations.
Sparse matrix factorization algorithms are typically characterized by irregular memory access patterns that limit their performance on parallel-vector supercomputers. For symmetric problems, methods such as the multifrontal method have replaced irregular operations with dense matrix kernels. However, no efficient method based on dense matrix kernels exists for matrices whose pattern is very unsymmetric. A new parallel unsymmetric-pattern multifrontal method based on dense matrix kernels is presented. The performance of two sequential prototypes is compared with the MA28, Mups (classical multifrontal), and D2 sparse matrix algorithms. The new method is faster than these other methods on a class of very large, very unsymmetric matrices arising in chemical engineering problems (on a single processor of a Cray YMP-8/128).
In Proceedings of the 7th IMACS Int. Conf. on Computer Methods for Partial Differential Equations, June 22-24, 1992, Rutgers Univ., New Brunswick, New Jersey, USA
This paper first describes the approximate minimum degree, a symmetric version of which is used in the AMD ordering algorithm.
Shared memory multiprocessor systems need efficient dynamic storage allocators, both for system purposes and to support parallel programs. Memory managers are often based on the buddy system, which provides fast allocation and release. Previous parallel buddy memory managers made no attempt to coordinate the allocation, splitting and release of blocks, and as a result needlessly fragment memory. We a present fast, and simple parallel buddy memory manager that is also as space efficient as a serial buddy memory manager. We test our algorithms using memory allocation/deallocation traces collected from a parallel sparse matrix algorithm. Proceedings of the Third International Conference on Computing and Information, Toronto, Ontario, May 1992, pp. 128-132.
Sparse matrix factorization algorithms are typically characterized by irregular memory access patterns that limit their performance on parallel-vector supercomputers. For symmetric problems, methods such as the multifrontal method replace irregular operations with dense matrix kernels. However, no efficient method based primarily on dense matrix kernels exists for matrices whose pattern is very unsymmetric. A new unsymmetric-pattern multifrontal method based on dense matrix kernels is presented. Frontal matrices are rectangular instead of square, and the elimination tree is replaced with a directed acyclic graph. As in the classical multifrontal method, advantage is taken of repetitive structure in the matrix by amalgamating nodes in the directed acyclic graph, potentially giving it high performance on parallel-vector supercomputers. Performance of a sequential version is compared with the classical multifrontal method and a standard unsymmetric solver on an Alliant FX/80 computer.