A General Model of Parallel Computing

Introduction

People understand the world by comparing their observations of the world against a set of internally held models. This is an innate behavior, and for the most part we are not aware that it is taking place. When trying to bring order to a new field of study, however, or when trying to deepen understanding of an old one, it can be useful to make the models explicit.

For sequential programming, almost all programmers use a common model applicable to single-processor computers: the von Neumann model. The von Neumann model views a computer as a single processor with a single stream of instructions operating on a single memory. Processors and memory subsystems vary widely from one computer to another, but these details can be neglected by the programmer (except for final performance tuning), letting a program coded to the von Neumann model run on any single-processor computer.

For parallel programming, unfortunately no such consensus view yet exists. This complicates programmers' jobs by forcing them to choose from a range of models. In addition, the lack of universal models has diluted the efforts of programming tool developers, resulting in relatively immature tools for parallel computing. In other words, the lack of common models serves to make the parallel programmer's difficult job even harder.

In this note, we attempt to correct this shortcoming by proposing common models to support parallel computing. We begin by describing the types of models that are required; we then describe a particular set of models we will use in our pattern language and how they interact.

Types of Models

Our needs cannot be met with a single model; instead we need a hierarchical set of models, where each layer in the hierarchy is a self-consistent model appropriate for a class of activities. Each layer is self-contained, but the layers fit together in well-understood ways, so someone can work at one layer in the hierarchy confident that his or her work can be mapped onto models at lower and higher levels as needed.

Such a hierarchy supports the way programmers normally work: High-level algorithm design would be done using the top layer of the hierarchy, in which (for parallel programs) concurrency and shared data are abstract and easy to express. Implementing a design would mean moving down one layer, to a model that supports thinking about how to safely implement concurrency and shared data. Once the design has been implemented as a program, a model in an even lower layer would be used to carry out optimizations such as overlapping computation and communication. Each step of the application-development process would use a different model, one that focuses on the key issues of the task at hand.

What would the layers in such a hierarchy look like? There have been a number of attempts to define them, but no one approach has been adopted as a consensus view. We propose the following layers, from top to bottom:

- Specification model: the top layer, used to describe abstract parallel algorithms.
- Programming model: a layer for reasoning about how to implement concurrency and shared data in a program.
- Computational model: an abstract view of the parallel machine itself.
- Cost model: the bottom layer, supporting quantitative performance analysis.

It is very important to define the models and their interrelationships carefully, and to ensure that the models work well together. If the programming model does not mesh well with the specification model, for example, the resulting programs will be overly complex, both for the compiler to optimize and for the programmer to understand.

A Hierarchy of Models for our Pattern Language

We are concerned with helping the general-purpose programmer write parallel programs. To this end, we are most interested in the top three layers of the hierarchy of models. Cost models are important, especially when considering the tradeoffs raised by closely related algorithms. All too often, however, emphasizing the cost model clutters the overall hierarchy with low-level details, making the collection of models too complex for the general programmer to tolerate. To avoid this trap, we will ignore the cost model entirely and focus on the top three layers alone.

Computational model: the MIMD NUMA model

A parallel computer consists of multiple processing elements (PEs) connected into a single logical system. Borrowing from the work on candidate type architectures (CTA), we propose the following abstract view of a typical computer system, which we call the MIMD (multiple instruction, multiple data) NUMA (non-uniform memory access) model.

Consider a general MIMD computer consisting of multiple von Neumann computers. As implied by the term MIMD, each of these von Neumann computers has its own instruction stream and data stream; they are then connected by some mechanism that allows them to share information and to synchronize their execution. We can represent the whole system using a single model if we view any sharing of information, whether through a tightly-integrated physical shared memory (e.g., symmetric multiprocessor workstations) or by way of message-passing along an interconnection network (e.g., workstation clusters or the Paragon supercomputer), as a type of shared memory.

The cost of accessing information within an instruction stream varies depending on the source of the information. If it comes from physically local memory, the cost of accessing the information is low. If it comes from a remote memory location, the cost is usually much higher, since it includes both the overhead of accessing the network and any OS overhead. Once again viewing all shared information as shared memory, this means that we can view our general MIMD computer system as a NUMA (non-uniform memory access) computer.
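This cost asymmetry can be sketched in a few lines of code. The constants below are invented for illustration only, not measurements from any real machine; the point is simply that the same number of memory references can have wildly different total cost depending on locality.

```python
# Toy model of the NUMA cost asymmetry. The unit costs are hypothetical,
# chosen only to illustrate the local-vs-remote gap described in the text.

LOCAL_COST = 1      # a physically local memory access (arbitrary units)
REMOTE_COST = 400   # a remote access: network plus OS overhead (arbitrary units)

def access_cost(n_local, n_remote):
    """Total cost of a mix of local and remote memory references."""
    return n_local * LOCAL_COST + n_remote * REMOTE_COST

# The same 1000 references differ enormously in cost depending on locality:
all_local = access_cost(1000, 0)       # 1000 units
mostly_remote = access_cost(100, 900)  # 360100 units
```

Even this crude sketch shows why organizing a computation so that most references are local is the dominant concern on a NUMA machine.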

Our proposal, therefore, is to view every parallel computer as a MIMD NUMA shared-memory computer. This is common practice among software performance engineers, since even for an SMP machine the caches attached to the processors create a memory hierarchy with varying costs. A consequence of adopting the model, then, is that locality of data, which affects performance in NUMA systems, must be explicitly taken into account by the programmer.

Note that we are not advocating that only shared-memory programming environments be used. Shared-memory programming environments are available on workstation clusters (e.g., Myrias' PAM or TreadMarks), and message-passing environments are available on SMP workstations (e.g., MPI from MPI Technologies Inc.). All we are advocating is that in thinking about an algorithm, you can view any practical parallel computer system as a shared-memory MIMD NUMA machine.

This model does not support a quantitative cost model, so it cannot be used, for example, to study the impact of different network topologies. It does, however, address the major issues in constructing a parallel algorithm, namely organizing a computation to minimize synchronization and to make memory references as local as possible.

Programming model: the coordination model

Many parallel programming models can be layered on top of the MIMD NUMA model. For our purposes, the most effective programming model that is general enough to meet our needs is the coordination model. It does not support quantitative cost estimation and therefore cannot help designers make fine distinctions between related algorithms. It does, however, provide a common ground for discussing parallel algorithms.

The model is simple: A parallel computation is viewed as a number of "units of execution" (UEs). A UE (usually a thread or a process) acts very much like a von Neumann machine. Each UE runs independently except at discrete and explicit points where it interacts with another UE. These interactions, the so-called coordination events, coordinate the UEs; that is, they provide communication, synchronization, and management services for the UEs.
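A minimal sketch of this model, using Python threads as the UEs, might look as follows. The names are illustrative, not from the original text: each UE computes independently and interacts with the other only at explicit coordination events, a queue for communication and a join for synchronization.

```python
import threading
import queue

results = queue.Queue()  # the only channel through which the UEs communicate

def producer():
    for i in range(3):
        value = i * i            # independent computation
        results.put(value)       # explicit coordination event: communicate

def consumer(collected):
    for _ in range(3):
        collected.append(results.get())  # explicit coordination event: receive

collected = []
p = threading.Thread(target=producer)
c = threading.Thread(target=consumer, args=(collected,))
p.start(); c.start()
p.join(); c.join()               # coordination event: synchronize on completion
# collected now holds [0, 1, 4]
```

Between the put/get and join calls, neither UE knows or cares what the other is doing; this is exactly the "independent except at discrete and explicit points" behavior the model describes.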

This model clearly fits most coordination languages (Linda, ALWAN, etc.) and most message-passing systems (PVM, MPI, etc.). It does not adequately describe the situation in which several UEs share a single address space (i.e., a multithreaded programming environment); such a system allows interactions to occur anywhere, since memory addresses are accessible to multiple UEs. Note, however, that a program that allowed multiple threads to interact freely would likely be subject to race conditions; in contrast, well-written multithreaded programs make accesses to shared memory distinct and protect them through some type of synchronization. In other words, a good shared-memory program is based on the coordination model.
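As a hedged sketch of the last point, consider a multithreaded program in which the only access to shared memory is confined to one well-marked place and protected by a lock. The structure (though not the trivial workload, which is invented here) is what makes it a coordination-model program rather than a free-for-all of interactions.

```python
import threading

counter = 0                      # the shared state
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:               # the distinct, protected interaction point
            counter += 1         # read-modify-write, atomic w.r.t. other UEs

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter == 40000; without the lock, lost updates could yield a smaller value
```

Removing the lock turns every increment into a potential race; keeping accesses distinct and synchronized is what brings the program back within the coordination model.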

Specification model: abstract parallel algorithms

Having defined the lower layers of our hierarchy of models, we now look at the top layer, which we will use to describe concurrent algorithms. We define the following elements of a parallel algorithm:

- Tasks: the units of work into which the problem is decomposed.
- Coordination: the communication and synchronization among tasks.
- A mapping of tasks onto UEs.
- A mapping of UEs onto processing elements (PEs).

As with the underlying models, the above statements are too general to support formal reasoning or quantitative performance analysis. They do, however, give us a shared vocabulary for describing a parallel algorithm. In particular, they make clear the algorithm designer's need to define tasks, manage their coordination, map them onto UEs, and finally map the UEs onto processing elements.
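This vocabulary can be made concrete with a small sketch. The example below is illustrative, not from the original text: the problem (summing a list) is decomposed into tasks, the tasks are mapped onto UEs (threads in a pool), and the runtime and OS handle the final mapping of UEs onto processing elements.

```python
from concurrent.futures import ThreadPoolExecutor

def task(chunk):
    """One independent task: sum one chunk of the data."""
    return sum(chunk)

data = list(range(100))
# Decomposition: split the problem into four independent tasks.
chunks = [data[i:i + 25] for i in range(0, 100, 25)]

# Mapping tasks onto UEs: a pool of 4 threads; the OS maps these UEs onto PEs.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(task, chunks))

# Coordination: combining the partial results is the only interaction needed.
total = sum(partial_sums)
# total == 4950
```

Even in this trivial case, all four elements of the vocabulary appear, and each could be varied independently (more tasks, fewer UEs, a different mapping) without changing the others.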

How the models interact

Having defined models for the three layers we want to address, we can talk about the relationships among them. These relationships are an important part of our overall model.