Pattern Name: DecompositionStrategy

FindingConcurrency Design Space


Intent:

This pattern addresses the question "How do you go about decomposing your problem into parts that can be run concurrently?"

Motivation:

Parallel programs let you solve bigger problems in less time. They do this by simultaneously solving different parts of the problem on different processors. This can only work if your problem contains exploitable concurrency, i.e., multiple activities or tasks that can take place at the same time.

Exposing concurrency requires decomposing the problem along two different dimensions:

- Task decomposition: break the computation itself into a collection of tasks that can execute concurrently.
- Data decomposition: partition the problem's data into units that can be operated on relatively independently.

Balancing the opposing forces of data and task decomposition occurs against the backdrop of two additional factors: efficiency and flexibility. The final program must effectively utilize the resources provided by the parallel computer. At the same time, parallel computers come in a variety of architectures, and you need enough flexibility to handle all the parallel computers you care about.

Note that in some cases the appropriate decomposition will be obvious; in others you will need to dig deeply into the problem to expose the concurrency. Sometimes you may even need to completely recast the problem and restructure how you think about its solution.

Applicability:

Use this pattern when you are beginning the design of a parallel program and need to decide how to decompose your problem into parts that can run concurrently.

Implementation:

The goal is to decompose your problem into relatively independent entities that can execute concurrently. As mentioned earlier, there are two dimensions to be considered:

- The task decomposition, which defines the concurrent tasks.
- The data decomposition, which defines how the problem's data is partitioned among those tasks.

While the decomposition needs to address both the tasks and the data, the nature of the problem usually (but not always) suggests one decomposition or the other as the primary decomposition, and it is easiest to start with that one.

In some cases, you can view the problem either way. For example, we earlier described a data decomposition of a matrix multiplication problem. You can also view this as a task-based decomposition -- for example, by associating the update of each matrix column with a task. In cases where a clear decision cannot be made, the best approach is to try each decomposition and see which one is more effective at exposing concurrency.
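To make the task view concrete, here is a minimal sketch of the column-per-task decomposition of matrix multiplication, written in C with OpenMP. The matrix size, row-major layout, and function name are illustrative assumptions, not part of the pattern.

    #include <omp.h>

    #define N 1024  /* illustrative matrix size */

    /* C = A * B, decomposed by tasks: each iteration of the outer loop
       updates one column of C and is independent of the others, so the
       columns can be computed concurrently. */
    void matmul_by_column(const double A[N][N], const double B[N][N],
                          double C[N][N])
    {
        #pragma omp parallel for
        for (int j = 0; j < N; j++) {        /* one "task" per column j */
            for (int i = 0; i < N; i++) {
                double sum = 0.0;
                for (int k = 0; k < N; k++)
                    sum += A[i][k] * B[k][j];
                C[i][j] = sum;
            }
        }
    }

A data decomposition of the same problem would instead start from a partitioning of the matrices into blocks, with each block assigned to a processing element.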

During the design process, you also need to keep in mind the following competing forces:

- Efficiency: the final program must effectively utilize the resources provided by the parallel computer.
- Flexibility: parallel computers come in a variety of architectures, and the decomposition must be flexible enough to handle all the parallel computers you care about.

Balancing these competing forces as you decompose your problem is difficult, and in all likelihood you will not get it right the first time.

Therefore, use an iterative decomposition strategy: start with the most obvious decomposition (task or data), and then revisit the problem in terms of the other.

Examples:

Medical imaging.

We will define a single problem here, taken from the field of medical imaging, and then decompose it two different ways: in terms of tasks and in terms of data. The decompositions will be presented in the DataDecomposition and TaskDecomposition patterns; the discussion here will serve to define the problem and then describe the way the two solutions interact.

An important diagnostic technique is to give a patient a radioactive substance and then watch how that substance propagates through the body by looking at the distribution of emitted radiation. Unfortunately, the images are of low resolution, due in part to the scattering of the radiation as it passes through the body. It is also difficult to reason from the absolute radiation intensities, since different pathways through the body attenuate the radiation differently.

To solve this problem, medical imaging specialists build models of how radiation propagates through the body and use these models to correct the images. A common approach is to build a Monte Carlo model. Randomly selected points within the body are assumed to emit radiation (usually a gamma ray), and the trajectory of each ray is followed. As a particle (ray) passes through the body, it is attenuated by the different organs it traverses, continuing until the particle leaves the body and hits a camera model, thereby defining a full trajectory. To create a statistically significant simulation, thousands if not millions of trajectories are followed.

The problem can be parallelized in two ways. Since each trajectory is independent, it would be possible to parallelize the application by associating each trajectory with a task. This approach is discussed in the Examples section of the TaskDecomposition pattern. Another approach would be to partition the body into sections and assign different sections to different processing elements. This approach is discussed in the Examples section of the DataDecomposition pattern.
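As a rough illustration of the task-based view, the sketch below (C with OpenMP) treats each trajectory as one loop iteration. The function follow_trajectory() is a hypothetical placeholder for the real ray-following computation, and the trajectory count and the reduction over a single intensity value are simplifying assumptions.

    #include <omp.h>

    #define NUM_TRAJECTORIES 1000000  /* illustrative count */

    /* Hypothetical placeholder: a real version would trace one ray from a
       random emission point through the body model and return its
       contribution at the camera. */
    static double follow_trajectory(long seed)
    {
        return (double)(seed % 7) * 1e-6;  /* dummy value for the sketch */
    }

    double simulate_total_intensity(void)
    {
        double total = 0.0;
        /* Each trajectory is independent, so each iteration is a separate
           task; the reduction combines per-thread partial sums. */
        #pragma omp parallel for reduction(+:total)
        for (long t = 0; t < NUM_TRAJECTORIES; t++)
            total += follow_trajectory(t);   /* seed chosen per trajectory */
        return total;
    }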

As in many ray-tracing codes, there are no dependencies between trajectories, making the task-based decomposition the natural choice. By eliminating the need to manage dependencies, the task-based algorithm also gives the programmer plenty of flexibility later in the design process, when decisions about how to schedule the work on different processing elements become important.

The data decomposition, however, is much more effective at managing memory utilization. This is frequently the case with a data decomposition as compared to a task decomposition. Since memory is decomposed, data-decomposition algorithms also tend to be more scalable. These issues are important and point to the need to at least consider the types of platforms that will be supported by the final program. The need for portability drives one to make decisions about target platforms as late as possible. There are times, however, when delaying consideration of platform-dependent issues can lead one to choose a poor algorithm.

Parallel database.

As another example of a single problem that can be decomposed in multiple ways, consider a parallel database. One approach is to break up the database itself into multiple chunks. Multiple worker processes would handle the actual searching operations, each on the chunk of the database it "owns", and a single manager would receive search requests and forward each to the relevant worker to carry out the search.

A second approach for this parallel database problem would also use a manager and multiple workers but would keep the database intact in one logical location. The workers would be essentially identical and each would be able to work on any piece of the database.
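The chunk-ownership idea in the first approach can be sketched as follows (C again). The record layout, the modulo routing rule, and the function names are all illustrative assumptions; in a real system each worker would run as a separate process or thread receiving requests from the manager.

    #include <stddef.h>

    #define NUM_WORKERS 8
    #define CHUNK_SIZE  10000

    typedef struct { long key; char payload[64]; } Record;

    /* First approach: the database is split into chunks, one per worker. */
    static Record db[NUM_WORKERS][CHUNK_SIZE];

    /* Manager side: decide which worker owns a given key
       (here a simple modulo partition, purely for illustration). */
    static int owner_of(long key) { return (int)(key % NUM_WORKERS); }

    /* Worker side: search only the chunk this worker owns. */
    static const Record *worker_search(int worker, long key)
    {
        for (int i = 0; i < CHUNK_SIZE; i++)
            if (db[worker][i].key == key)
                return &db[worker][i];
        return NULL;  /* not found in this worker's chunk */
    }

    /* The manager receives a request and forwards it to the owning worker. */
    const Record *handle_request(long key)
    {
        return worker_search(owner_of(key), key);
    }

In the second approach, owner_of() disappears: the manager hands each request to any idle worker, all of which are able to search the whole database.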

Observe that the issues raised in this example are similar to those raised by the medical imaging example.

Iterative algorithms.

Many linear-algebra problems can be solved by repeatedly applying some operation to a large matrix or other array. Effective parallelizations of such algorithms are usually based on parallelizing each iteration (rather than, say, attempting to perform the iterations concurrently). For example, consider an algorithm that solves a system of linear equations Ax = b (where A is a matrix and x and b are vectors) by calculating a sequence of approximations x(0), x(1), x(2), and so forth, where for some function f, x(k+1) = f(x(k)). A typical parallelization would be structured as a sequential iteration (computing the x(k)s in sequence), with each iteration (computing x(k+1) = f(x(k)) for some value of k) being computed in a way that exploits potential concurrency. For example, if each iteration requires a matrix multiplication, this operation can be parallelized using either a task-based decomposition (as discussed in the Examples section of the TaskDecomposition pattern) or a data-based decomposition (as discussed in the Examples section of the DataDecomposition pattern).
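A minimal sketch of this structure in C with OpenMP appears below, using a dense matrix-vector product as a stand-in for f. The problem size, the fixed iteration count, and the absence of a convergence test are all simplifying assumptions.

    #include <omp.h>

    #define N 4096  /* illustrative problem size */

    /* One application of f: here x_next = A * x stands in for f(x(k)).
       The rows of the product are independent, so they are computed
       concurrently (a data decomposition by rows). */
    static void apply_f(const double A[N][N], const double x[N],
                        double x_next[N])
    {
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            double sum = 0.0;
            for (int j = 0; j < N; j++)
                sum += A[i][j] * x[j];
            x_next[i] = sum;
        }
    }

    void iterate(const double A[N][N], double x[N], int max_iters)
    {
        static double x_next[N];
        /* The outer iteration stays sequential: x(k+1) depends on x(k). */
        for (int k = 0; k < max_iters; k++) {
            apply_f(A, x, x_next);
            for (int i = 0; i < N; i++)   /* x <- x_next for next iteration */
                x[i] = x_next[i];
        }
    }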