Pattern Name: DecompositionStrategy

FindingConcurrency Design Space


Intent:

This pattern addresses the question "How do you go about decomposing your problem into parts that can be run concurrently?"

Motivation:

Parallel programs let you solve bigger problems in less time. They do this by simultaneously solving different parts of the problem on different processors. This can only work if your problem contains exploitable concurrency, i.e., multiple activities or tasks that can take place at the same time.

Exposing concurrency requires decomposing the problem along two different dimensions:

- Task decomposition: break the computation itself into a collection of tasks that can execute concurrently.
- Data decomposition: partition the problem's data into units that can be operated on relatively independently.

Balancing the opposing forces of data and task decomposition occurs against the backdrop of two additional factors: efficiency and flexibility. The final program must effectively utilize the resources provided by the parallel computer. At the same time, parallel computers come in a variety of architectures, and you need enough flexibility to handle all the parallel computers you care about.

Note that in some cases the appropriate decomposition will be obvious; in others you will need to dig deeply into the problem to expose the concurrency. Sometimes you may even need to completely recast the problem and restructure how you think about its solution.

Applicability:

Use this pattern when you are beginning the design of a parallel program and need to decide how to decompose your problem into parts that can run concurrently.

Implementation:

The goal is to decompose your problem into relatively independent entities that can execute concurrently. As mentioned earlier, there are two dimensions to be considered:

- The task decomposition, which defines the concurrent tasks.
- The data decomposition, which defines how the problem's data is partitioned among those tasks.

While the decomposition needs to address both the tasks and the data, the nature of the problem usually (but not always) suggests one decomposition or the other as the primary decomposition, and it is easiest to start with that one.

In some cases, you can view the problem either way. For example, we earlier described a data decomposition of a matrix multiplication problem. You can also view this as a task-based decomposition -- for example, by associating the update of each matrix column with a task. In cases where a clear decision cannot be made, the best approach is to try each decomposition and see which one is more effective at exposing concurrency.
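To make the task view concrete, here is a minimal sketch of the column-per-task decomposition of matrix multiplication, written in C with OpenMP. The matrix size, row-major layout, and function name are illustrative assumptions, not part of the pattern.

    #include <omp.h>

    #define N 1024  /* illustrative matrix size */

    /* C = A * B, decomposed by tasks: each iteration of the outer loop
       updates one column of C and is independent of the others, so the
       columns can be computed concurrently. */
    void matmul_by_column(const double A[N][N], const double B[N][N],
                          double C[N][N])
    {
        #pragma omp parallel for
        for (int j = 0; j < N; j++) {        /* one "task" per column j */
            for (int i = 0; i < N; i++) {
                double sum = 0.0;
                for (int k = 0; k < N; k++)
                    sum += A[i][k] * B[k][j];
                C[i][j] = sum;
            }
        }
    }

A data decomposition of the same problem would instead start from a partitioning of the matrices into blocks, with each block assigned to a processing element.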

During the design process, you also need to keep in mind the following competing forces:

- Efficiency: the final program must effectively utilize the resources provided by the parallel computer.
- Flexibility: parallel computers come in a variety of architectures, and the decomposition must be flexible enough to handle all the parallel computers you care about.

Balancing these competing forces as you decompose your problem is difficult, and in all likelihood you will not get it right the first time.

Therefore, use an iterative decomposition strategy: start with the most obvious decomposition (task or data), and then revisit the problem in terms of the other.

Examples:

Medical imaging.

We will define a single problem here, taken from the field of medical imaging, and then decompose it two different ways: in terms of tasks and in terms of data. The decompositions will be presented in the DataDecomposition and TaskDecomposition patterns; the discussion here will serve to define the problem and then describe the way the two solutions interact.

An important diagnostic technique is to give a patient a radioactive substance and then watch how that substance propagates through the body by looking at the distribution of emitted radiation. Unfortunately, the images are of low resolution, due in part to the scattering of the radiation as it passes through the body. It is also difficult to reason from the absolute radiation intensities, since different pathways through the body attenuate the radiation differently.

To solve this problem, medical imaging specialists build models of how radiation propagates through the body and use these models to correct the images. A common approach is to build a Monte Carlo model. Randomly selected points within the body are assumed to emit radiation (usually a gamma ray), and the trajectory of each ray is followed. As a particle (ray) passes through the body, it is attenuated by the different organs it traverses, continuing until the particle leaves the body and hits a camera model, thereby defining a full trajectory. To create a statistically significant simulation, thousands if not millions of trajectories are followed.

The problem can be parallelized in two ways. Since each trajectory is independent, it would be possible to parallelize the application by associating each trajectory with a task. This approach is discussed in the Examples section of the TaskDecomposition pattern. Another approach would be to partition the body into sections and assign different sections to different processing elements. This approach is discussed in the Examples section of the DataDecomposition pattern.
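As a rough illustration of the task-based view, the sketch below (C with OpenMP) treats each trajectory as one loop iteration. The function follow_trajectory() is a hypothetical placeholder for the real ray-following computation, and the trajectory count and the reduction over a single intensity value are simplifying assumptions.

    #include <omp.h>

    #define NUM_TRAJECTORIES 1000000  /* illustrative count */

    /* Hypothetical placeholder: a real version would trace one ray from a
       random emission point through the body model and return its
       contribution at the camera. */
    static double follow_trajectory(long seed)
    {
        return (double)(seed % 7) * 1e-6;  /* dummy value for the sketch */
    }

    double simulate_total_intensity(void)
    {
        double total = 0.0;
        /* Each trajectory is independent, so each iteration is a separate
           task; the reduction combines per-thread partial sums. */
        #pragma omp parallel for reduction(+:total)
        for (long t = 0; t < NUM_TRAJECTORIES; t++)
            total += follow_trajectory(t);   /* seed chosen per trajectory */
        return total;
    }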

As in many ray-tracing codes, there are no dependencies between trajectories, making the task-based decomposition the natural choice. By eliminating the need to manage dependencies, the task-based algorithm also gives the programmer plenty of flexibility later in the design process, when decisions about how to schedule the work on different processing elements become important.

The data decomposition, however, is much more effective at managing memory utilization. This is frequently the case with a data decomposition as compared to a task decomposition. Since memory is decomposed, data-decomposition algorithms also tend to be more scalable. These issues are important and point to the need to at least consider the types of platforms that will be supported by the final program. The need for portability drives one to make decisions about target platforms as late as possible. There are times, however, when delaying consideration of platform-dependent issues can lead one to choose a poor algorithm.

Parallel database.

As another example of a single problem that can be decomposed in multiple ways, consider a parallel database. One approach is to break up the database itself into multiple chunks. Multiple worker processes would handle the actual searching operations, each on the chunk of the database it "owns", and a single manager would receive search requests and forward each to the relevant worker to carry out the search.

A second approach for this parallel database problem would also use a manager and multiple workers but would keep the database intact in one logical location. The workers would be essentially identical and each would be able to work on any piece of the database.
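The chunk-ownership idea in the first approach can be sketched as follows (C again). The record layout, the modulo routing rule, and the function names are all illustrative assumptions; in a real system each worker would run as a separate process or thread receiving requests from the manager.

    #include <stddef.h>

    #define NUM_WORKERS 8
    #define CHUNK_SIZE  10000

    typedef struct { long key; char payload[64]; } Record;

    /* First approach: the database is split into chunks, one per worker. */
    static Record db[NUM_WORKERS][CHUNK_SIZE];

    /* Manager side: decide which worker owns a given key
       (here a simple modulo partition, purely for illustration). */
    static int owner_of(long key) { return (int)(key % NUM_WORKERS); }

    /* Worker side: search only the chunk this worker owns. */
    static const Record *worker_search(int worker, long key)
    {
        for (int i = 0; i < CHUNK_SIZE; i++)
            if (db[worker][i].key == key)
                return &db[worker][i];
        return NULL;  /* not found in this worker's chunk */
    }

    /* The manager receives a request and forwards it to the owning worker. */
    const Record *handle_request(long key)
    {
        return worker_search(owner_of(key), key);
    }

In the second approach, owner_of() disappears: the manager hands each request to any idle worker, all of which are able to search the whole database.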

Observe that the issues raised in this example are similar to those raised by the medical imaging example.

Iterative algorithms.

Many linear-algebra problems can be solved by repeatedly applying some operation to a large matrix or other array. Effective parallelizations of such algorithms are usually based on parallelizing each iteration (rather than, say, attempting to perform the iterations concurrently). For example, consider an algorithm that solves a system of linear equations Ax = b (where A is a matrix and x and b are vectors) by calculating a sequence of approximations x(0), x(1), x(2), and so forth, where for some function f, x(k+1) = f(x(k)). A typical parallelization would be structured as a sequential iteration (computing the x(k)s in sequence), with each iteration (computing x(k+1) = f(x(k)) for some value of k) being computed in a way that exploits potential concurrency. For example, if each iteration requires a matrix multiplication, this operation can be parallelized using either a task-based decomposition (as discussed in the Examples section of the TaskDecomposition pattern) or a data-based decomposition (as discussed in the Examples section of the DataDecomposition pattern).
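A minimal sketch of this structure in C with OpenMP appears below, using a dense matrix-vector product as a stand-in for f. The problem size, the fixed iteration count, and the absence of a convergence test are all simplifying assumptions.

    #include <omp.h>

    #define N 4096  /* illustrative problem size */

    /* One application of f: here x_next = A * x stands in for f(x(k)).
       The rows of the product are independent, so they are computed
       concurrently (a data decomposition by rows). */
    static void apply_f(const double A[N][N], const double x[N],
                        double x_next[N])
    {
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            double sum = 0.0;
            for (int j = 0; j < N; j++)
                sum += A[i][j] * x[j];
            x_next[i] = sum;
        }
    }

    void iterate(const double A[N][N], double x[N], int max_iters)
    {
        static double x_next[N];
        /* The outer iteration stays sequential: x(k+1) depends on x(k). */
        for (int k = 0; k < max_iters; k++) {
            apply_f(A, x, x_next);
            for (int i = 0; i < N; i++)   /* x <- x_next for next iteration */
                x[i] = x_next[i];
        }
    }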