Pattern Name: DivideAndConquer

AlgorithmStructure Design Space


Intent:

This pattern is used for parallel applications based on the well-known divide-and-conquer strategy; concurrency is obtained by solving concurrently the subproblems into which the strategy splits the problem.

Motivation:

Consider the divide-and-conquer strategy employed in many sequential algorithms. With this strategy, a problem is solved by splitting it into subproblems, solving them independently, and merging their solutions into a solution for the whole problem. The subproblems can be solved directly, or they can in turn be solved using the same divide-and-conquer strategy, leading to an overall recursive program structure. The potential concurrency in this strategy is not hard to see: Since the subproblems are solved independently, their solutions can be computed concurrently, leading to a parallel program that is very similar to its sequential counterpart. The following figure illustrates the strategy and the potential concurrency.

The divide-and-conquer strategy can be more formally described in terms of the following functions (where N, the number of subproblems produced by each split, is a constant):

    boolean baseCase(Problem P);    // can P be solved directly, without splitting?
    Solution baseSolve(Problem P);  // solve a base-case problem directly
    Problem[] split(Problem P);     // split P into N subproblems
    Solution merge(Solution subSolutions[]); // combine N subsolutions into one

The strategy then leads to the following top-level program structure:

    Solution solve(Problem P) {
        if (baseCase(P))
            return baseSolve(P);
        else {
            Problem subProblems[N];
            Solution subSolutions[N];
            subProblems = split(P);
            for (int i = 0; i < N; i++)
                subSolutions[i] = solve(subProblems[i]);
            return merge(subSolutions);
        }
    }

If the subproblems of a given problem can be solved independently, then we can solve them in any order we like, including concurrently. This means that we can produce a parallel application by replacing the for loop in the above program with a parallel-for construct, so that the subproblems will be solved concurrently rather than in sequence. It is worth noting at this point that program correctness is independent of whether the subproblems are solved sequentially or concurrently, so we can even design a hybrid program that sometimes solves them sequentially and sometimes concurrently, based on which approach is likely to be more efficient. This will be discussed further in the Implementation section.
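As a sketch of this replacement, the following uses hypothetical concrete types: Problem is a half-open range [lo, hi) over a shared vector, Solution is the sum of that range, and the sequential for loop over the N (= 2) subproblems becomes a pair of std::async tasks, so the recursive solves run concurrently. The names baseCase, baseSolve, and solve follow the pseudocode above; the summation problem itself is purely illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical instantiation: Problem = a half-open range of a shared
// vector, Solution = the sum of that range.
struct Problem {
    const std::vector<int>* data;
    std::size_t lo, hi;
};

static bool baseCase(const Problem& p) { return p.hi - p.lo <= 4; }

static long baseSolve(const Problem& p) {
    long s = 0;
    for (std::size_t i = p.lo; i < p.hi; ++i)
        s += (*p.data)[i];
    return s;
}

static long solve(const Problem& p) {
    if (baseCase(p))
        return baseSolve(p);
    std::size_t mid = p.lo + (p.hi - p.lo) / 2;                  // split
    Problem subProblems[2] = {{p.data, p.lo, mid}, {p.data, mid, p.hi}};
    // the "parallel for": one concurrent task per subproblem
    std::future<long> f0 = std::async(std::launch::async, solve, subProblems[0]);
    std::future<long> f1 = std::async(std::launch::async, solve, subProblems[1]);
    return f0.get() + f1.get();                                  // merge
}
```

Because the subproblems write no shared state, no synchronization beyond the futures' get() calls is needed, and the result is identical to the sequential version's.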

We can map this strategy onto a design in terms of tasks by defining one task for each invocation of the solve function, as illustrated in the following figure (rectangular boxes correspond to tasks):

Note the recursive nature of the design, with each task in effect generating and then absorbing a subtask for each subproblem.

Note also that either the split or the merge phase can be essentially absent:

No split phase is needed if all the base-case problems can be derived directly from the whole problem (without recursive splitting). In this case, the overall design will look like the bottom half of our figures.

No merge phase is needed if the problem can be considered solved when all of the base-case problems have been identified and solved. In this case, the overall design will look like the top half of our figures.
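The split-absent case can be sketched as follows (a hypothetical example, summing a vector): the base-case problems are N fixed chunks of the input, derived directly by index arithmetic rather than by recursive splitting, so only the base solves and the merge remain.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Sketch of a design with no explicit split phase: the N base-case
// problems (chunks of the index space) are computed directly from the
// whole problem; the merge phase adds up the partial sums.
static long chunkedSum(const std::vector<long>& a) {
    const int N = 4;                                  // number of base-case problems
    std::vector<std::future<long>> parts;
    std::size_t chunk = (a.size() + N - 1) / N;
    for (int i = 0; i < N; ++i) {
        std::size_t lo = std::min(a.size(), i * chunk);
        std::size_t hi = std::min(a.size(), lo + chunk);
        parts.push_back(std::async(std::launch::async, [&a, lo, hi] {
            long s = 0;                               // base solve: sum one chunk
            for (std::size_t j = lo; j < hi; ++j) s += a[j];
            return s;
        }));
    }
    long total = 0;
    for (auto& f : parts) total += f.get();           // merge: add partial sums
    return total;
}
```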

Applicability:

Use the DivideAndConquer pattern when:

The pattern is particularly effective when:

Structure:

Implementations of this pattern include the following key elements:

Usage:

This pattern is typically used to provide high-level structure for an application; that is, the application is typically structured as an instance of this pattern.

Consequences:

Implementation:

Key issues.

Definitions of functions.

It is usually straightforward to produce a program structure that defines the required functions: what is required is almost the same as in the equivalent sequential program, except for the code to schedule tasks, as described in the next section.

Scheduling the tasks.

A parallel divide-and-conquer program differs from its sequential counterpart in that it is also responsible for scheduling the tasks in a way that efficiently exploits the potential concurrency (subproblems can be solved concurrently).

The simplest approach is to simply replace the sequential for loop over subproblems with a parallel-for construct, allowing the corresponding tasks to execute concurrently. (Thus, in the second figure in the Motivation section, the two lower-level splits execute concurrently, the four base-case solves execute concurrently, and the two lower-level merges execute concurrently.) To improve efficiency (as discussed later in "Efficiency considerations"), we can also use a combination of parallel-for constructs and sequential for loops, typically using parallel-for at the top levels of the recursion and sequential for at the more deeply nested levels. In effect, this approach combines parallel divide-and-conquer with sequential divide-and-conquer.
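The hybrid approach can be sketched as follows (a hypothetical example, again summing a range of a vector): recursion levels shallower than a cutoff PAR_DEPTH spawn a concurrent task for one subproblem, while deeper levels recurse sequentially. The names and the cutoff value are illustrative, not part of the pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

static const int PAR_DEPTH = 3;   // go parallel only in the top 3 levels

// Hybrid parallel/sequential divide-and-conquer over a sum-of-range problem.
static long sumRange(const std::vector<long>& a,
                     std::size_t lo, std::size_t hi, int depth) {
    if (hi - lo <= 8) {                               // base case
        long s = 0;
        for (std::size_t i = lo; i < hi; ++i) s += a[i];
        return s;
    }
    std::size_t mid = lo + (hi - lo) / 2;             // split
    if (depth < PAR_DEPTH) {
        // parallel divide-and-conquer: left half runs in its own task
        auto left = std::async(std::launch::async, sumRange,
                               std::cref(a), lo, mid, depth + 1);
        long right = sumRange(a, mid, hi, depth + 1);
        return left.get() + right;                    // merge
    }
    // sequential divide-and-conquer for the deeply nested levels
    return sumRange(a, lo, mid, depth + 1) + sumRange(a, mid, hi, depth + 1);
}
```

With this structure the number of tasks is bounded by the cutoff (at most 2^PAR_DEPTH leaves run concurrently), while the deeper, cheaper levels avoid task-creation overhead entirely.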

Correctness issues.

Efficiency issues.

Examples:

Mergesort.

Mergesort is a well-known sorting algorithm based on the divide-and-conquer strategy, applied as follows to sort an array of N elements:

  • Base case: an array of fewer than two elements is already sorted.

  • Split: divide the array into two halves of roughly N/2 elements each.

  • Solve: recursively mergesort each half.

  • Merge: merge the two sorted halves into a single sorted array.

This algorithm is readily parallelized by performing the two recursive mergesorts in parallel.
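One plausible sketch of such a parallelization (the pattern text itself supplies no code) runs the two recursive mergesorts as concurrent tasks down to a fixed depth, then sequentially, as discussed under "Scheduling the tasks." The two tasks touch disjoint halves of the array, so no synchronization beyond waiting on the future is needed before the merge.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Parallel mergesort sketch: concurrent recursive sorts above the depth
// cutoff, sequential recursion below it, in-place merge at each level.
static void mergeSort(std::vector<int>& a, std::size_t lo, std::size_t hi,
                      int depth) {
    if (hi - lo < 2) return;                          // base case: already sorted
    std::size_t mid = lo + (hi - lo) / 2;             // split into two halves
    if (depth > 0) {
        // solve the two subproblems concurrently
        auto left = std::async(std::launch::async, mergeSort,
                               std::ref(a), lo, mid, depth - 1);
        mergeSort(a, mid, hi, depth - 1);
        left.get();
    } else {
        mergeSort(a, lo, mid, 0);                     // sequential below cutoff
        mergeSort(a, mid, hi, 0);
    }
    std::inplace_merge(a.begin() + lo, a.begin() + mid, a.begin() + hi);  // merge
}
```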

Matrix diagonalization.

[Dongarra87] describes a parallel algorithm for diagonalizing (computing the eigenvectors and eigenvalues of) a symmetric tridiagonal matrix T. The problem is to find a matrix Q such that Q^T · T · Q is diagonal; the divide-and-conquer strategy goes as follows (omitting the mathematical details):

  • where T1 and T2 are symmetric tridiagonal matrices (which can be diagonalized by recursive calls to the same procedure).

Details can be found in [Dongarra87] or in [Golub89].

Known Uses:

Any introductory algorithms text will have many examples of algorithms based on the divide-and-conquer strategy, most of which can be parallelized with this pattern. (As noted in the Consequences section, however, such parallelizations are not always efficient.)

Other uses of this pattern include:

Related Patterns:

It is interesting to note that just because an algorithm is based on a (sequential) divide-and-conquer strategy does not mean that it must be parallelized with this pattern. A hallmark of this pattern is the recursive arrangement of the tasks, leading to a varying amount of concurrency. Since this can be inefficient, it is often better to rethink the problem such that it can be mapped onto some other pattern, such as GeometricDecomposition or SeparableDependencies.