The Archetype Library Project

Next: Reactive Distributed Systems Up: Concurrent Program Archetypes Previous: Education and Hypermedia

The Archetype Library Project

The archetype library project at Caltech has four main parts: The development of archetypes for (a) scientific applications, (b) Combinatorics and optimization, (c) reactive systems, and the development of technology to support documentation of archetype libraries. All the archetypes are part of an electronic document called etext which can be obtained on World Wide Web under URL http://www.etext.caltech.edu. The NEXTSTEP version, which includes more interactive multimedia material can be obtained by sending a message to kryukova@cco.caltech.edu. (The NEXSTEP version requires the NEXTSTEP operating system, and it also requires purchase of the CraftmanEngine tool which is the electronic document authoring system that we used.) The project is ongoing, and a tremendous amount of work remains to be done.

One way to describe the use of archetypes is to step through a few examples. Next, we discuss two small examples, one from combinatorics and the other from scientific computation.

Archetype Support for Software Design

Before discussing the examples, I want to emphasise that archetype libraries do not mechanically produce code from specifications - archetype libraries support program development methods, and as in all design methods, intelligence and creativity are required.

Finding out whether a problem fits an archetype exactly, or approximately, is not automatic, though the navigation tools of etext will help. If a problem does not appear to fit any of the archetypes in the library, then either the library cannot be used for this problem, or the problem has to be partitioned into subproblems, some of which may match an archetype. The archetype library does not provide support for the process of stepwise refinement unless the specification fits an archetype. In many cases, a significant amount of design work may have to be carried out before an archetype can be used.

A Combinatorics Example

The Problem

You are faced with the following (well known) problem from computational geometry. Given a set of points in a rectangle on the plane. Each point is specified by its x and y coordinates. The set contains at least two points. Develop a program to determine the distance between the nearest points.

You would like to develop a program in C++ and you would like to explore the use of C++ with a message-passing library such as MPI, and you would also like to try out a C++-extension parallel language such as CC++ (which was developed at Caltech by Carl Kesselman and myself). You are particularly interested in implementations on (a) networks of small numbers of workstations connected by common ethernet, and (b) the Intel Paragon.

Does the Archetype fit the Problem?

The first step is to determine if the divide and conquer archetype fits the given problem. Depending on your familiarity with the divide and conquer method of problem solving, you will (a) decide that for yourself without using the documentation, or (b) read the abbreviated description, or (c) study the detailed description. If you are using WWW (World Wide Web), you will get into etext, and then click on the divide and conquer archetype full version (written by Svetlana Kryukova) or the short version. The full version has a detailed description of the divide and conquer strategy and its parallel implementation. I have hopes that high-school students will develop some parallel programs on a variety of machines, and the detailed documentation is at a level that can be followed by a high-school student.

The structure of a divide and conquer program is as follows. Associated with each problem is its size. Assume that size is a natural number. Problems that are small enough that they can be solved by some (usually trivial) method known to the programmer are called base-case problems. If the problem is a base-case problem, then it is solved using a procedure, and let's call this procedure the base-case solution. If the given problem is larger than all base-case problems, the given problem is partitioned into a list of problems, each of size strictly smaller than the given problem. The solutions to each of the smaller problems are merged to obtain a solution to the given problem.

The archetype specifies what the programmer should provide: (a) the base-case test, (b) the base-case solution, (c) the procedure for splitting non-base-case problems into smaller subproblems, and (d) the procedure for merging subproblem solutions. More importantly, the archetype also describes parallel implementations of divide and conquer programs.

Our first step is to determine if we can obtain the procedures required by the archetype for the given nearest-neighbors problem. Having determined that we can do so, we next turn to parallel implementations.

Parallel Structures

Click the parallel implementation section of the text. This section describes different approaches to developing parallel algorithms based on the divide-and-conquer archetype.

The section points out that one approach is to first split non-basecase problems and then do all the subproblems in parallel, and then merge the results. The subproblems are solved concurrently with each other, but the subproblems are not solved concurrently with the split and merge. This approach is simple because the concurrently executing programs do not share data with each other.

The section points out an alternate approach which is to execute concurrently split, the procedures that compute the solutions to the subproblems, and the merge. In this case, split shares data with the subproblem solutions; in particular, split outputs the subproblems that become the input to the subproblem-solution procedures. We must adopt some protocol to share data between concurrently executing processes. Since the data shared between concurrent processes in this structure is written by at most one process, and we know a priori which processes read the data, the section suggests using either message-passing or single-assignment data structures. The value added by archetypes is such detailed design information about parallel process and communication structure which is specific to the archetype.

Performance Issues

The section on performance describes methods for determining process and message granularity, and for mapping processes to processors. If a non-basecase problem is small enough, it is more efficient to solve the problem sequentially rather than in parallel, because the overhead of communication and process spawning exceeds the time saved by parallel execution. For networks of workstations, with ratios of communication speeds to computation speeds that are low, it is more efficient to solve relatively large problems sequentially; by contrast, for parallel machines with fast switches, it is more efficient to solve problems in parallel until the problem size gets to be much smaller. A simple performance model suggests appropriate design choices for different target architectures.

Example Applications

Our next step is to look at example applications. Click on the mergesort example to see how the divide and conquer archetype is used for mergesort. Look at the programs in different languages that have been implemented for mergesort. Then look at other applications.

Inspecting a collection of applications, all of which use the same archetype, is helpful in understanding how to tailor an archetype to a specific problem.

Language Issues

We can postpone deciding on a language and runtime library as much as we wish, because the issues we considered earlier, such as program structure, reasoning about correctness and performance, are largely independent of language and runtime library. Having decided to use C++, we explore implementations using C++ for some of the problems, starting with mergesort.

We first explore a C++ language extension called CC++. Clicking on CC++ within mergesort brings you to a CC++ program. The documentation gives you the C++ skeleton structure, and it tells you exactly what parts you have to fill, and then it provides a detailed example of the parts to be filled in for mergesort. We see that your obligations to tailor this archetype to a specific problem are identical whether the target language is CC++, or C++ and PVM. Furthermore, your obligation is to develop sequential code; the archetype shields you from having to deal with parallelism.

Debugging

The next step is to debug your program. One would expect that programs developed so carefully would not have bugs. But if they did, the section on test suites and debugging suggests common bugs for this archetype. For instance, if the program does not terminate, a common bug is that the split program is incorrect because one of the subproblems is not smaller than the original problem. Again, the value of the archetype is archetype-specific debugging information.

A particularly useful aspect of archetypes are assertions provided with the program structure. When you fill in the program skeleton, you also fill in an assertional skeleton. Thus as you tailor an archetype to obtain a specific program, you also tailor its assertional structure to obtain assertions specific to the program. Assertions (for example in C) are necessary for reasoning and are very helpful in debugging.

Execution

Your next step is to execute the program on some target machines. The performance tuning information, and suggestions for collecting performance data will help you tune your program to get satisfactory efficiency. The archetype library includes performance data of divide and conquer examples executing on your target machines (such as networks of workstations and the Intel Paragon), and this data and the methods of collecting the data described in the archetype will be helpful to you in instrumenting your application and in understanding its performance.

Documentation

The documentation you need to write is limited to the procedures that you add. The basic documentation for divide and conquer is provided in great detail, and so all you need to do is to describe the sequential procedures that you added. Our idea here is not to provide a documentation template, but rather to split the documentation into two parts: documentation of the archetype (and this already exists in the template library) and linked to this is your problem-specific documentation.

Summary

In summary, archetypes are designed to support you every step of the way from concept to implementation on parallel machines. The advantage of archetypes is that it capitalizes experience and insight in developing parallel programs for a class of related problems.

A Scientific Application

Data Parallel Archetypes

Divide and conquer is an example of a task-parallel archetype; tasks are created and deleted dynamically. Many scientific applications are data parallel, and there are important problems which have both data parallel and task parallel components. Next, I discuss a simple data-parallel archetype: mesh computations.

The Problem

You are asked to parallelize a Fortran77 program that has been developed over a period of years. The program is a computational fluid dynamics application. You want to execute your program on parallel machines to get faster turnaround. You would like to explore executing your program on networks of workstations and the IBM SPx.

A difference between this problem and the previous one is that the specification of your problem is an existing sequential program. You want a parallel implementation of your program that produces exactly the same results as the sequential program (assuming that the floating point units are identical).

Steps towards a Parallel Implementation

Your sequential program has a mesh structure. The simulation consists of computing the values of variables at each point of an n-dimensional mesh. Each point of the mesh is computed in terms of values at neighboring points.

You click on the mesh archetype (written by Berna Massingill). The mesh archetype description has an overview of mesh computations, and a detailed description of parallel implementations. The mesh archetype describes the essential data-mappings and associated procedures to obtain parallel implementations. The parallel program structure is as follows. The mesh is partitioned into regions. Neighboring regions share a boundary. The values for each region and its associated boundary are computed, and then neighbors exchange the values of boundaries that they share.

A parallel mesh computation has sequential components and specifically parallel components. The sequential components compute new values for each region given old values for the region (including its boundaries) The operations that deal with parallelism are: IO to read input data into proceses and to output data from regions, and procedures to exchange boundaries. The parallel archetype is structured so that you can focus your attention on the problem-specific sequential components while using archetype-specific parallel procedures from the library.

Next, you could explore a specific parallel application in the library such as Dan Meiron's CFD application that simulates flow in a toroid. The point is not the similarity (or lack of it) between Dan's application and yours; the point is to study how an actual application is developed in a straightforward systematic manner from the archetype. Applications in the library are written to emphasise the similarity between the application and the template. In particular, applications are structured so that procedures that deal with parallelism are obtained from the archetype, and problem-specific procedures are sequential.

Dan's CFD application is written in Fortran M, a parallel extension of Fortran developed by Ian Foster at Argonne and myself. Key features of Fortran M are that it supports completely dynamic task and communication structures, while the compiler can verify that the program is deterministic (i.e., all runs of the program with the same inputs produce the same output.) If you want to develop a Fortran M application and execute it on an IBM SP, you can incorporate sequential Fortran 77 code derived from your sequential application into the communication and IO routines provided by the archetype.

Performance measurements given in the archetype, especially graphs of execution time versus the number of processors, will be particularly helpful to you as you tune your application. A simple performance model is provided to help you port your application to machines for which the archetype does not provide empirical evidence.

You may also choose to develop your application in Fortran 77 and PVM. In this case too, the structure of the archetype hides language-specific and runtime-specific details. So, most of the time you take in developing your parallel application will not be wasted whether you use Fortran M, or Fortran and PVM, or if you switch from one to the other.

Parallelizing Sequential Programs

A component of the mesh archetype is not in the divide and conquer archetype: this component (written by Berna Massingill) describes a detailed systematic sequence of small steps to convert a sequential mesh computation into a parallel mesh computation. The value added by the archetype is that the sequence of steps associated with the mesh archetype is designed specifically to support mesh applications; different sequences of steps may be adopted for other kinds of applications.

A sequence of steps is carried out in the sequential domain; each of these steps produces a sequential program (say in Fortran 77) executing in a sequential environment familiar to the programmer. The intent is to do as much of the parallelizing work in an environment familiar to the programmer. Familiar sequential environments seem particularly important for debugging.

The final transformation converts a (transformed) sequential program into a parallel program. This conversion is mechanical, and does not introduce errors. Thus programmers can develop parallel mesh applications in a semi-mechanical way.

Archetype Hierarchies

One archetype can use another, and our archetype library will have a partial-ordering of archetypes, based on the uses relation. One archetype using another is more complex than one procedure calling another. When one archetype X uses another archetype Y, X must be able to use all aspects of Y including Y's software architecture, methods of reasoning about correctness and performance, methods of parallelization, and even test suites and documentation.

The reason for the existence of archetypes is reuse. Therefore, though archetype composition is complex, it is essential that archetypes in the library use other archetypes effectively. Next, we give a few examples of dependence between archetypes.

Multigrid

Berna Massingill has begun work on a multigrid archetype with help from Dan Meiron. The multigrid archetype uses the mesh archetype. In essence the multigrid archetype consists of the mesh archetype for each grid, and has new computation and communication structures for converting between finer and coarser grids. Though the multigrid archetype is not complete yet, it does appear that the multigrid archetype can truly use all aspects of the mesh archetype.

Neighbor-Exchange

The mesh archetype uses a simpler SPMD mesh computation-communication archetype in which processes are structured in an n-dimensional mesh, and processes communicate in an SPMD fashion to all neighbors in the mesh. Let us call this simpler archetype the mesh neighbor-exchange archetype. The neighbor-exchange archetype deals with IO and communication with neighbors; it does not specify what data is associated with each process. By contrast, the mesh archetype requires that the data structure be a grid, and is explictly designed to deal with ghost boundaries.

The neighbor-exchange archetype is a special case of the collective communication on a process mesh archetype, which in turn is a special case of the collective communication archetype.

In summary, a great deal of thought has to go into structuring a library of archetypes. The complexity of this task is more than that of structuring a library of programs because one archetype uses much more than the code structure of another archetype.

Next: Reactive Distributed Systems Up: Concurrent Program Archetypes Previous: Education and Hypermedia

mani@cs.caltech.edu

The Archetype Library Project

Overview

How to get it

Archetype Support for Software Design

A Combinatorics Example

The Problem

Does the Archetype fit the Problem?

Parallel Structures

Performance Issues

Example Applications

Language Issues

Debugging

Execution

Documentation

Summary

A Scientific Application

Data Parallel Archetypes

The Problem

Steps towards a Parallel Implementation

Parallelizing Sequential Programs

Archetype Hierarchies

Multigrid

Neighbor-Exchange