TaskDecomposition Design Pattern

Pattern Name: TaskDecomposition

FindingConcurrency Design Space

Intent:

This pattern addresses the question "How do you decompose a problem into tasks that can execute concurrently?"

Also Known As:

Functional Decomposition.
Task Parallelism.

This pattern addresses the issues raised during a primarily task-based decomposition. The key to an effective task decomposition is ensuring that the tasks are sufficiently independent that maintaining dependencies takes only a small fraction of the program's overall execution time. It is also important to ensure that the execution of the tasks can be equally shared among the ensemble of processing elements (the so-called load-balancing problem).

Applicability:

Use this pattern when

You have reviewed the DecompositionStrategy pattern and decided to try a task-based decomposition of your problem.

Implementation:

It is very rare that a task-based decomposition can be carried out automatically. You have to do this by hand based on your knowledge of the problem and the code required to implement it.

In a task-based decomposition, you look at your problem as a collection of distinct tasks, paying particular attention to

The actions that are carried out to solve your problem.
Whether these actions are distinct and relatively independent.

As a first pass, try to identify as many tasks as possible; it is much easier to start with too many tasks and merge them later on than to start with too few tasks and later try to split them. The key is that the individual tasks execute for the most part independently of each other.

You can find tasks in many different places:

In some cases, each task corresponds to a distinct call to a function in your program. These function calls can be associated with the tasks, leading to what is sometimes called a "functional decomposition".
Another place to find tasks is in distinct iterations within an algorithm. If the iterations are independent and there are enough of them, then they map very well onto tasks in a task-based decomposition. This style of task-based decomposition leads to what are sometimes called "loop-splitting" algorithms.
Tasks also play a key role in data-driven decompositions. In this case, a large data structure is decomposed and multiple units of execution concurrently update different chunks of the data structure. In this case, the tasks are those updates on individual chunks.

This is only a partial list of where you can find tasks. Keep in mind the competing forces mentioned in the DecompositionStrategy pattern:

Flexibility. Your design needs to be flexible in the number of tasks generated. This will let the design adapt to a wide range of parallel computers. Additional flexibility is gained if the definition of the tasks is independent of how they are scheduled for execution.
Efficiency. There are two major efficiency issues to consider in your task decomposition. First, each task must include enough work to compensate for the overhead incurred by creating the tasks and managing their dependencies. Second, the number of tasks must be large enough so that all the units of execution are busy with useful work throughout the computation.
Simplicity. Define tasks to make debugging and maintenance simple. When possible, define tasks so they reuse code from existing sequential programs that solve related problems.

Once you have your tasks, you need to look at the data decomposition implied by the tasks. The DataDecomposition pattern may help you with this analysis.

Examples:

Medical imaging.

Consider the medical imaging problem described earlier (in the Examples section of the DecompositionStrategy pattern). In this application, a point inside a model of the body is selected randomly, a radioactive decay is allowed to occur at this point, and the trajectory of the emitted particle is followed. To create a statistically significant simulation, thousands if not millions of trajectories are followed.

It is natural to associate each trajectory with a task. These tasks are particularly simple to manage concurrently since they are completely independent. Another important point is to make sure there are enough of them so we can support a range of computer systems, from those with a small number of processing elements to massively parallel computers.

With the basic tasks in hand, we can now consider the corresponding data decomposition and define what portions of the problem space need to be associated with each task. In this case, each task needs to hold the information defining the trajectory at any point, but that is all. The more challenging issue is the model of the body. You can't deduce this from the description of the problem we've given, but it turns out that the body model can be huge. Since it is a read-only model, there is no problem if there is an effective shared-memory system, since each task can read data as needed. If the target system uses distributed memory, however, it may be necessary to replicate the body model on each processing element. This can be very time-consuming and can waste a great deal of memory. Thus, for such target systems a data-based decomposition, described in the Examples section of the DataDecomposition pattern, may work better.

Matrix multiplication.

Consider the standard multiplication of two matrices (C = A · B). We can produce a task-based decomposition of this problem by considering the calculation of each element of the product matrix as a separate task. Each task needs access to one row of A and one column of B. This decomposition has the advantage that all the tasks are independent, and because all the data that is shared among tasks (A and B) is read-only, it will likely be straightforward to implement in a shared-memory environment. In a distributed-memory environment, however, the requirement that each task have access to a row of A and a column of B may lead to excessive memory use and/or communication, so this decomposition may not be effective. See the Examples section of the DataDecomposition pattern for a discussion of other decompositions more suited to distributed-memory environments. (In fact, a data-based decomposition may be more effective for most current platforms -- the above task-based decomposition works well only if access to memory elements is roughly uniform, which is not the case with cache-based computers.)

Known Uses:

Task-based decompositions are used in many applications. The distance geometry code (DGEOM) described in [Mattson96] uses a task based decomposition.