Berna L. Massingill; "Experiments with Program Parallelization Using Archetypes and Stepwise Refinement"; Proceedings of the Third International Workshop on Formal Methods for Parallel Programming: Theory and Applications (FMPPTA'98 / IPPS'98) , 1998; extended version UF CISE TR 98-012.
Parallel programming continues to be difficult and error-prone, whether starting from specifications or from an existing sequential program. This paper presents (1) a methodology for parallelizing sequential applications and (2) experiments in applying the methodology. The methodology is based on the use of stepwise refinement together with what we call parallel programming archetypes (briefly, abstractions that capture common features of classes of programs), in which most of the work of parallelization is done using familiar sequential tools and techniques, and those parts of the process that cannot be addressed with sequential tools and techniques are addressed with formally-justified transformations. The experiments consist of applying the methodology to sequential application programs, and they provide evidence that the methodology produces correct and reasonably efficient programs at reasonable human-effort cost. Of particular interest is the fact that the aspect of the methodology that is most completely formally justified is the aspect that in practice was the most trouble-free.
Beverly A. Sanders, Berna L. Massingill, and Svetlana Kryukova; "Specification and Proof of an Algorithm for Location Management for Mobile Communication Devices"; Proceedings of the Second International Workshop on Formal Methods for Parallel Programming: Theory and Applications (FMPPTA'97 / IPPS'97), 1997 also UF CISE TR 96-015; revised version to appear in Parallel Processing Letters.
In a network supporting mobile communication devices, a mechanism to find the location of a device, wherever it may be, is needed. In this paper, we present a distributed algorithm for this purpose along with its formal specification and proof sketch. Starting with an algorithm due to Wang, the process of formalization together with careful attention to abstraction leads to a more regular, general, and robust algorithm with a clearer description. An incidental contribution is a useful theorem for proving progress properties in distributed algorithms that use tokens.
Adam Rifkin and Berna L. Massingill; "Performance Analysis for Archetypes"; Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'98), 1998; extended version Caltech CS-TR-96-27.
This document outlines a simple method for benchmarking a parallel communication library and for using the results to model the performance of applications developed with that communication library. We use compositional performance analysis -- decomposing a parallel program into its modular parts and analyzing their respective performances -- to gain perspective on the performance of the whole program. This model is useful for predicting parallel program execution times for different types of program archetypes, (e.g., mesh and mesh-spectral) using communication libraries built with different message-passing schemes (e.g., Fortran M and Fortran with MPI) running on different architectures (e.g., IBM SP2 and a network of Pentium personal computers).
K. Mani Chandy, Rajit Manohar, Berna L. Massingill, and Daniel I. Meiron; "Integrating Task and Data Parallelism with the Collective Communication Archetype"; Proceedings of the 9th Internation Parallel Processing Symposium (IPPS'95), 1995; also Caltech CS-TR-94-08.
A parallel program archetype aids in the development of reliable, efficient parallel applications with common computation/communication structures by providing stepwise refinement methods and code libraries specific to the structure. The methods and libraries help in transforming a sequential program into a parallel program via a sequence of refinement steps that help maintain correctness while refining the program to obtain the appropriate level of granularity for a target machine. The specific archetype discussed here deals with the integration of task and data parallelism using group communication. This archetype has been used to develop several applications.