Frequency domain analysis of large nonlinear analog circuits, AT&T.

See ATT.ps for a paper describing these matrices, in the Other directory:  

@inproceedings{FeldmannMelvilleLong96,
	author={Feldmann, P. and Melville, R. and Long, D.},
	title={Efficient frequency domain analysis of large nonlinear analog circuits},
	booktitle={Proceedings of the IEEE Custom Integrated Circuits Conference},
	year={1996}
	,address={Santa Clara, CA}
	,annote={online, in matrices/ATandT/Documentation/ATT.ps}
	}


--------------------------------------------------------------------------------

rich.m converted to onetone1.rua, and sparse.m converted to onetone2.rua:

	Date: Wed, 8 May 96 07:30:09 EDT
	From: David Long <long at research.att.com>
	To: davis at cise.ufl.edu
	Subject: matrices

	I put two matrices in incoming/rich.m.gz and incoming/sparse.m.gz.
	Both are for the same medium-sized example.  The first one has a
	relatively rich collection of off-diagonal entries and is typical of
	what we might use for an SSOR-like preconditioner.  The second is
	sparser, but probably more typical of what would be used with direct
	factorization.  Both of these are fairly typical of the matrices that
	arise in "one-tone" problems, in that the off-diagonal entries are
	generally confined in some band around the diagonal.  I'll also try to
	find an appropriate matrix from a multi-tone problem.  Those tend to
	have one set of off-diagonal entries close to the diagonal and another
	set relatively far away.

	David

--------------------------------------------------------------------------------

The tt3.m (multitone) matrix was converted to twotone.rua:

	Date: Thu, 9 May 96 07:12:46 EDT
	From: David Long <long at research.att.com>
	To: davis at cise.ufl.edu
	Subject: Re: matrices

	A matrix from a two-tone problem is in incoming/tt3.m.gz.

--------------------------------------------------------------------------------
Regarding a question about the paper (ATT.ps):
--------------------------------------------------------------------------------
	
	Date: Fri, 10 May 96 14:07:22 EDT
	From: David Long <long at research.att.com>
	To: davis at cise.ufl.edu
	Subject: Re: matrices
	
	Yes, figure 2 is supposed to be a diagram of the Jacobian, but let me
	add a few comments that may help clarify things.  First, you can just
	ignore the Y part.  It's included in the formulation for completeness,
	but is not generally needed.  None of the matrices that I sent have a
	Y part.  Second, the matrices can be naturally structured in one of
	two ways, which we call nN format and Nn format.  In nN format, you
	view the matrix as an n by n block matrix, with each block having size
	N by N.  Similarly, Nn format indicates an N by N block matrix, with
	each block having size n by n.  The two formats are related by the
	obvious permutations.  The picture of the Jacobian in figure 2 is for
	the matrix in nN format.  However, to confuse matters, our program
	actually builds the preconditioner matrix in Nn format, and that's the
	format of the matrices that I sent.  (Also to confuse matters, the
	paper is a bit inconsistent.  For example, the statement before
	equation 15 about the Jacobian being block-diagonal for a linear
	circuit assumes that Jacobian is in Nn format.)  The almost-circulant
	structure in figure 2 is in the dense N by N blocks, i.e., each N by N
	block is basically a circulant matrix.  I say almost-circulant because
	the multiplication of Gamma C Gamma^-1 by Omega breaks the circulant
	structure.  For simplicity, in the paper, the discussion of the
	Jacobian structure is carried out assuming that the problem is
	formulated in the complex domain.  In the complex case though, half of
	the coefficients that define the soluation are just conjugates of the
	other half.  Thus, if you translate everything into the real domain,
	you can save some memory.  The program uses this observation, and
	that's why the matrices that I sent are real.  The Jacobian in the
	real case is
	
	Omega T^-1 Gamma C Gamma^-1 T + T^-1 Gamma G Gamma^-1 T
	
	where T is a matrix that translates from the real domain to the
	complex domain.  It turns out that when you apply T^-1 and T to the
	complex circulant matrix, each pair of conjugate stripes in the
	circulant maps into one of those tilted boxes that we discussed.  You
	can see these clearly in the tt3.m preconditioner matrix.
	
	The paper appeared in the 1996 Custom Integrated Circuits Conference
	(CICC'96).  The conference just ended Wednesday, and I haven't
	actually seen the proceedings yet, so I can't give you page numbers
	and such.
	
	David

--------------------------------------------------------------------------------
My response to the message just above:
--------------------------------------------------------------------------------

To: David Long <long at research.att.com>
cc: davis at cise.ufl.edu, rcm at allegra.att.com
Subject: Re: matrices 
Date: Fri, 10 May 1996 17:22:19 -0400
From: Tim Davis <davis at cise.ufl.edu>


In a previous message, David Long <long at research.att.com> wrote
> Yes, figure 2 is supposed to be a diagram of the Jacobian, but let me
> add a few comments that may help clarify things...

Yes - that clarifies things a lot.  I'll have to think it through.
In the nN form, the N-by-N blocks are dense?

So in the Nn format, the n-by-n blocks have a sparse structure like the
left hand side of Figure 2 (the "MNA" matrix)?  And the n-by-n matrix
has a rather irregular structure, right?

The dense blocks, by the way, are why UMFPACK gets good performance on these
matrices.  You may have observed this ... as you increase N, the mflop
rate should increase nicely.

The dense blocks are still there, in nN or Nn format, at least in the
sense that UMFPACK will find them no matter how you contort the input
matrix.  In one sense this is an ideal matrix for UMFPACK ... unsymmetric
in the n-by-n structure (UMFPACK handles asymmetry and irregular structure
fairly well, but there are better methods when the structure is regular),
with each element being a dense N-by-N matrix (UMFPACK likes dense blocks,
since it gets good performance on them in the dense BLAS kernels).

Here are some results so far, on my UltraSparc (128 MB memory, two level
cache (Level 1: 16K data and 16K instruction, Level 2: 512K), and 2GB swap
space).  Peak performance of the BLAS is about 80 mflops in double precision,
160 mflops in single.  Theoretical peak is 333 mflops.

I've renamed the matrices (rich.m = onetone1, sparse.m = onetone2,
and tt3 = twotone).

Methods used:
	UMFPACK2-defaults: with default parameters
	UMFPACK2-nondef.:  no BTF, prefer symmetric pivots
	UMFPACK2 is the same as MA38 (to appear in Harwell Subr. Library).
	SuperLU: a sparse partial pivoting code, from Berkeley.   Also uses
		the BLAS.
	MA48:	a successor to MA28.  Lots faster than MA28.  Does not make as
		extensive use of the BLAS as does SuperLU and UMFPACK.  Better
		suited to more irregular problems (I've seen it much faster
		than UMFPACK for some matrices).

The following is the factorization time (including ordering,
symbolic factorization, etc.):

			time(cpu)	nz in LU	flops		Mflops
			in seconds

onetone1:

UMFPACK2-defaults	60.8  		4693767		0.2148D+10	35.3
UMFPACK2-nondef.	61.0		4907410		0.2337D+10	38.2
SuperLU		       108.2		4674848		0.2525D+10	23.3
MA48		       331.0		5109483		0.4564D+10	13.8


onetone2:

UMFPACK2-defaults	10.5  		1269790		0.1588D+09	15.1
UMFPACK2-nondef.	 9.6		1439817		0.2200D+09	22.9
SuperLU			 8.6		1319194		0.1306D+09	15.2
MA48			38.3		1260893		0.2229D+09	 5.8

twotone:

UMFPACK2-defaults      217.3  		9746806		0.6988D+10	32.2
UMFPACK2-nondef.       238.9	       15730302		0.9601D+10	40.2
SuperLU			failed**
MA48		       734.1	       10860368         1.4675D+10	20.0

For twotone, SuperLU ran out of memory in a preordering part that I wrote,
not in SuperLU proper (I need to try it again, and give it more memory in
that part).

I haven't included pure factorization time - all of these methods can use
prior orderings, for example (UMD2FA vs. UMD2RF, for example).
Do you use UMD2RF?  It would be used if the values of the
Jacobian changes, but the pattern remains the same (and the pivot order
remains the same, and stays numerical acceptable).  In that case, UMFPACK2,
SuperLU, and MA48 would all be faster in the 2nd factorization than they would
in the first.

Do these numbers compare with the results you have?  Actually, your BLAS on
the RS/6000 should get closer to the theoretical peak than does the
UltraSparc.

Oh - and thanks for the matrices.  Nice to have larger ones in which UMFPACK2
does so nicely.

Thanks,
Tim


--------------------------------------------------------------------------------
Matrix pre2.m, converted to pre2.rua