Pattern Name: DependencyAnalysis

FindingConcurrency Design Space


Intent:

This pattern addresses the question "After you have decomposed a problem into tasks, how do you analyze how they depend on each other?"

Motivation:

This pattern comes into play once you have decomposed a problem into tasks that can execute concurrently. In a few cases, not only can these tasks execute concurrently, but they are completely independent. For independent tasks, the programmer need only create the tasks and schedule them for efficient execution on the processing elements of the parallel computer.

More frequently, however, a problem's tasks are not independent; they influence each other such that what happens in one task affects the execution of another task. We call these influences between tasks dependencies. Dependencies take one of two basic forms: ordering constraints, where one task (or group of tasks) must complete before another can begin, and dependencies arising from data shared among the tasks.

Finding and managing dependencies is one of the most difficult jobs facing a parallel program designer.

Applicability:

Use this pattern when you have decomposed a problem into tasks that can execute concurrently and you need to understand how those tasks depend on each other.

Implementation:

The goal of a dependency analysis is to understand in detail how the tasks that make up your parallel program depend on each other. There are two kinds of dependencies: ordering constraints among tasks (or groups of tasks), and dependencies arising from data shared among tasks.

The dependencies between a problem's tasks have a major impact on both the efficiency and the complexity of the final program.

Correctly analyzing dependencies among tasks is both difficult and crucial to the overall effectiveness of the design. There is no single way to accomplish this analysis, but we have found the following approach to be effective: first, group the tasks based on how they must interact (the GroupTasks pattern); next, identify any ordering constraints among these groups (the OrderTasks pattern); and finally, analyze how data is shared within and among the groups (the DataSharing pattern).

It is not always possible to see whether decisions made at one step of this process will be effective until after you have worked through later steps. In fact, in many cases you cannot fully understand whether a decomposition will be effective until after analyzing the resulting dependencies. Therefore, you should expect to iterate back and forth between the dependency and decomposition patterns: first carry out a decomposition, then analyze its dependencies, and then reconsider the decomposition. Even experienced parallel programmers may need to iterate through several cycles before getting it right.

Examples:

Molecular dynamics.

We will consider a single example as we look at each one of the dependency analysis design patterns. In the present pattern, we'll just introduce the problem and its decomposition. The dependency analysis itself will be described in the Examples sections of the individual dependency patterns.

Our example problem is to design a parallel molecular dynamics program. The theoretical background of molecular dynamics is interesting, but not really relevant to this discussion; we just need to understand the problem at its simplest level.

Molecular dynamics is used to simulate the motions of a large molecular system. For example, molecular dynamics simulations show how a large protein moves around and how different-shaped drugs might interact with the protein. Not surprisingly, molecular dynamics is extremely important in the pharmaceutical industry. It turns out that molecular dynamics is important in computer science as well. It's a perfect test problem for computer scientists working on parallel computing: it's simple to understand, relevant to science at large, and very tough to effectively parallelize. Many papers have been written by computer scientists about parallel molecular dynamics algorithms (see the references in [Mattson94d] for some of these papers).

So what is the basic molecular dynamics problem? The idea is to treat the molecule as a large collection of balls connected by springs. The balls represent the atoms in the molecule, while the springs represent the chemical bonds between the atoms. The molecular dynamics simulation itself is an explicit time-stepping process. At each time step, you compute the force on each atom, and then use standard classical mechanics to compute how the force moves the atoms. This process is carried out repeatedly to step through time and compute a trajectory for the molecular system.
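The time-stepping structure described above can be sketched in Python. This is a minimal illustration, not a real MD code: the `toy_forces` function is a hypothetical one-dimensional "spring" stood in for the real force field, and the integrator is a simple (semi-implicit) Euler step, whereas production codes typically use velocity Verlet.

```python
def step(pos, vel, mass, compute_forces, dt):
    # One explicit time step: compute the force on each atom, then use
    # classical mechanics (a = F/m) to move the atoms. Semi-implicit
    # Euler here purely for brevity.
    forces = compute_forces(pos)
    vel = [v + dt * f / mass for v, f in zip(vel, forces)]
    pos = [x + dt * v for x, v in zip(pos, vel)]
    return pos, vel

def toy_forces(pos):
    # Hypothetical 1D harmonic "spring" pulling each atom toward the origin.
    return [-x for x in pos]

pos, vel = [1.0], [0.0]
for _ in range(100):  # repeating the step traces out a trajectory in time
    pos, vel = step(pos, vel, 1.0, toy_forces, dt=0.01)
```

The point is the loop structure: every time step computes forces and then updates positions, exactly as in the pseudocode later in this section.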

The forces due to the chemical bonds (the "springs") are relatively simple to compute. They correspond to the vibrations and rotations of the chemical bonds themselves. These are short-range forces that can be computed with knowledge of the handful of atoms that share chemical bonds. What makes the molecular dynamics problem so difficult is the fact that the "balls" carry partial electrical charges. Hence, while atoms interact with a small neighborhood of atoms through the chemical bonds, the electrical charge causes every atom to apply a force to every other atom.

This is the famous N-body problem. On the order of N^2 terms must be computed to get the long-range force. Since N is large (tens or hundreds of thousands) and the number of time steps in a simulation is huge (tens of thousands), the time required to compute these long-range forces dominates the computation. Clever scientists have worked out elegant ways to reduce the effort required to solve the N-body problem. We are only going to discuss the simplest of these tricks: the so-called cutoff method.

The idea is quite simple. Even though each atom exerts a force on every other atom, this force decreases as the distance between the atoms grows. Hence, it should be possible to pick a distance beyond which the force contribution can be ignored. That's the cutoff, and it reduces a problem that scales as O(N^2) (where "O" denotes "on the order of") to one that scales as O(N · n), where n is the number of atoms within the cutoff volume (usually hundreds). The computation is still huge, and it dominates the overall runtime for the simulation, but at least the problem is tractable.
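The effect of the cutoff can be seen in a short sketch. The `pair_force` function below is a hypothetical one-dimensional inverse-square repulsion (the real long-range term is Coulomb's law in 3D); the point is that pairs farther apart than the cutoff are simply skipped.

```python
import math

def pair_force(ri, rj):
    # Toy 1D inverse-square repulsion between two atoms; the sign pushes
    # atom i away from atom j. A stand-in for the real electrostatic term.
    d = ri - rj
    return math.copysign(1.0 / (d * d + 1e-9), d)

def forces_cutoff(pos, cutoff):
    # Only pairs closer than the cutoff contribute, so the cost drops from
    # O(N^2) to O(N * n), where n is the typical number of atoms inside
    # the cutoff volume.
    n = len(pos)
    forces = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j and abs(pos[i] - pos[j]) < cutoff:
                forces[i] += pair_force(pos[i], pos[j])
    return forces

f = forces_cutoff([0.0, 0.1, 5.0], cutoff=1.0)
# The atom at 5.0 lies beyond the cutoff of both others, so it feels no force.
```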

There are a host of details, but the basic simulation can be summarized with the following simplified pseudo-code:


    Int const N   // number of atoms 

    Array of real :: Atoms (3,N) //3D coordinates
    Array of real :: Forces (3,N) //force in each dimension
    Array of lists :: neighbors(N) //atoms in cutoff volume

    loop over time steps
        vibrational_forces (N, Atoms, Forces)
        rotational_forces (N, Atoms, Forces)
        neighbor_list (N, Atoms, neighbors)
        long_range_forces (N, Atoms, neighbors, Forces)
        update_atom_positions(N, Atoms, Forces)
        physical_properties ( ... Lots of stuff  ... )
    end loop

There are a few details to mention, and then we can consider how this can be solved in parallel.

First, the neighbor_list() computation is time-consuming. The gist of the computation is a loop over each atom, inside which every other atom is checked to see if it falls within the indicated cutoff volume. Fortunately, the time steps are very small, and the atoms don't move very much in any given time step. Hence, this time-consuming computation is only carried out every 10 to 100 steps. For our discussion, we are not going to parallelize this computation.
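That neighbor-list scan can be sketched directly: a doubly nested loop over atoms that records, for each atom, every other atom inside the cutoff volume. This is an assumed straightforward O(N^2) formulation; real codes often use cell lists to build it faster.

```python
def neighbor_list(pos, cutoff):
    # O(N^2) scan: for each atom, check every other atom to see if it
    # falls within the cutoff volume. Affordable in practice because the
    # list is rebuilt only every 10 to 100 time steps.
    n = len(pos)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                if d2 < cutoff * cutoff:
                    neighbors[i].append(j)
    return neighbors

nl = neighbor_list([(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (9.0, 0.0, 0.0)],
                   cutoff=1.0)
```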

Second, the physical_properties() function computes velocities, energies, correlation coefficients, and a host of interesting physical properties. These computations, however, are not that involved and do not affect the overall parallel algorithm. Hence, we will ignore them in this discussion.

We are now ready to look at how to decompose this problem for execution on a parallel computer. In any parallel algorithm project, the first step is to decide where the most time-consuming computations are taking place. We have already discussed this and pointed out that the bulk of the time is in long_range_forces(). This means that whatever we do, we must pick a problem decomposition that makes that computation run efficiently in parallel.

While we won't discuss it in detail here (see [Mattson94d] for details), each of the computations inside the time loop has a similar structure. Namely, they include a loop over atoms in which each loop iteration independently updates that atom's corresponding array elements. So a natural task definition for each function is an iteration of this atom-loop: i.e., the update required by each atom.
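Because each iteration of the atom loop is an independent task, the loop maps directly onto a pool of workers. The sketch below uses Python threads purely to illustrate the task structure (the force expression is a toy 1D stand-in, and `neighbors` is assumed to come from a neighbor-list computation like the one above).

```python
from concurrent.futures import ThreadPoolExecutor

def long_range_force_on(i, pos, neighbors):
    # The update for atom i reads its own coordinates plus those of its
    # neighbors, and writes only the force for atom i -- so each atom's
    # update is a natural task.
    return sum(1.0 / abs(pos[i] - pos[j]) for j in neighbors[i])

pos = [0.0, 1.0, 2.0]
neighbors = [[1], [0, 2], [1]]

# Each loop iteration is independent, so the atom loop becomes a map
# over a worker pool.
with ThreadPoolExecutor(max_workers=2) as pool:
    forces = list(pool.map(lambda i: long_range_force_on(i, pos, neighbors),
                           range(len(pos))))
```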

Now let's look at the data decomposition. Each element of the Atoms array (the atomic coordinates) is updated using its own data in all cases and coordinate data from a neighborhood of atoms in most cases. Elements of the Forces array are updated similarly (Newton's third law comes into play: when you compute the force of atom I on atom J, you also get the negative of the force of atom J on atom I). Hence, these arrays need to be managed as shared data.
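The Newton's-third-law optimization is what turns the Forces array into shared data. In the sketch below (a toy 1D pair force, with a hypothetical `neighbors` structure), each pair is computed once but written into two different elements of the array, so per-atom tasks end up writing to each other's entries.

```python
def pairwise_forces(pos, neighbors):
    # Newton's third law: the force of atom i on atom j is the negative
    # of the force of atom j on atom i, so each pair is computed once and
    # the result is written into BOTH forces[i] and forces[j]. With
    # per-atom tasks this makes the Forces array shared, writable data.
    forces = [0.0] * len(pos)
    for i in range(len(pos)):
        for j in neighbors[i]:
            if j > i:                        # handle each pair exactly once
                f = 1.0 / (pos[j] - pos[i])  # toy 1D pair force
                forces[i] += f
                forces[j] -= f               # equal and opposite reaction
    return forces

forces = pairwise_forces([0.0, 1.0, 3.0], [[1], [0, 2], [1]])
```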

Summarizing our problem and its decomposition, we have the following: the tasks are the per-atom updates carried out within each of the functions inside the time loop, and the key data structures are the Atoms and Forces arrays, which must be managed as data shared among those tasks.

Dependencies among these tasks and the associated data will be analyzed in the Examples sections of the dependency patterns (GroupTasks, OrderTasks, and DataSharing).