Pattern Name: GettingStarted

FindingConcurrency Design Space


Intent:

This pattern addresses the question "Where do you begin when you want to create a new parallel application?" It is the entry point into the FindingConcurrency design space and will guide you through the first steps of designing a parallel program.

Also Known As:

Motivation:

Creating a parallel application is a daunting task. The standard programming problems faced by every software engineer are still there; for example, you still need to understand the problem specification and the way the program will interact with the world outside the computer. Parallel programmers, however, have the extra problem of managing multiple concurrent tasks in a single program. They must decide how to map these tasks onto a physical collection of processing elements (PEs) and manage the interaction between them. It is easy to be overwhelmed by the magnitude of the effort! Where does one even get started in this complex task?

Fortunately, parallel programming has been around for some decades, so there is a body of knowledge about how to get started with the creation of a parallel program. This pattern exposes that knowledge and tells you what to do as you begin the design of your parallel algorithm.

Applicability:

To see if this pattern will help you, you need to address three issues:

Observe that applying this pattern may require taking into account considerations outside the scope of the pattern language. Our language doesn't deal directly with such issues as GUIs or I/O, but that doesn't mean these issues can be ignored. For example, the design of the parallel algorithm may have implications for how the program will do I/O, and the design process must take that into account.

Observe also that in designing a parallel algorithm to solve a problem it may be useful to consider approaches that might be less efficient in a sequential program but that have more exploitable concurrency. Iterative numerical methods are a source of examples -- Gauss-Seidel iteration converges more quickly than Jacobi iteration and so is preferred in sequential programs, but Jacobi iteration is easier to parallelize and hence may make for a more effective parallel algorithm. Such considerations are outside the scope of our pattern language, but are something to keep in mind, particularly if it seems difficult to find exploitable concurrency in the problem at hand.

Usage:

To arrive at a parallel program design, this pattern works with the other patterns at this level. This pattern helps you gather the information you need to make the decisions addressed by the DecompositionStrategy, DependencyAnalysis, and DesignEvaluation patterns. See the introduction to this design space for more information about how these patterns relate to each other.

Implementation:

We will assume that if you've made it this far into the pattern, you have an application you would like to speed up using parallelism. So where do you go from here?

First, think about your problem and decide which parts are the most compute-intensive. If you are lucky, these will be concentrated in a small portion of the program, so parallelizing just that portion will yield most of the benefit. If this is not the case, parallelization is still possible, but it will require that more of the code be made explicitly parallel.

Once you know which parts of the problem can benefit, or benefit most, from parallelism, you can start designing your parallel algorithm. To do this, you need to understand the tasks that need to be carried out and the data structures that are to be manipulated. At this point, it is important to think about the data structures as abstract objects and not get bogged down in how they will be implemented.

Once this information has been collected and organized, you are ready to move on to the other patterns at this level of the language. These patterns will help you:

It is usually best to follow the above order, though it is common to address dependency issues while working on the basic decomposition. For example, a good choice of decomposition makes it much easier to manage the dependencies between tasks.

Examples:

Medical imaging.

Consider the following problem taken from the field of medical imaging. An important diagnostic technique is to give a patient a radioactive substance and then watch how that substance propagates through the body by observing the distribution of emitted radiation. Unfortunately, the images are of low resolution, due in part to the scattering of the radiation as it passes through the body. It is also difficult to reason from the absolute radiation intensities, since different pathways through the body attenuate the radiation differently.

To solve this problem, medical imaging specialists build models of how radiation propagates through the body and use these models to correct the images. A common approach is to build a Monte Carlo model. Randomly selected points within the body are assumed to emit radiation (usually a gamma ray), and the trajectory of that ray is followed out to the camera. This is an extremely compute-intensive application, requiring several hours of computation for a single model.
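The overall structure of such a Monte Carlo computation can be sketched as follows. This is a deliberately toy version: the attenuation model, the geometry, and the function names (`follow_trajectory`, `monte_carlo`) are invented for illustration and bear no resemblance to a real body model.

```python
# Toy sketch of the Monte Carlo structure described above: sample random
# emission points, follow each ray out of the "body", and average the
# attenuated intensity seen at the camera. The physics here is invented.
import math
import random

def follow_trajectory(rng, attenuation=0.1):
    # Emit from a random depth inside a unit-thickness "body" and
    # attenuate exponentially along the path out to the camera.
    depth = rng.random()          # distance the ray travels in tissue
    return math.exp(-attenuation * depth)

def monte_carlo(n_rays, seed=0):
    rng = random.Random(seed)
    total = sum(follow_trajectory(rng) for _ in range(n_rays))
    return total / n_rays         # mean detected intensity

print(monte_carlo(10_000))
```

Even in this stripped-down form, the essential property is visible: each trajectory is computed independently of all the others, which is what makes the problem a good candidate for parallelization.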

Since the problem can take several hours to solve sequentially, it is reasonable to consider developing a parallel solution. The first question to ask is what parts of the problem are the most compute-intensive. Based on the preceding description, we can deduce that an application to solve this problem will spend most of its time computing trajectories, so we should focus on doing that part of the computation in parallel. Note that while in this case we can deduce this from the problem description, in many cases the only way to determine which parts of the problem are most compute-intensive is to profile sequential programs that solve the problem.
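When the problem description alone does not reveal the hot spots, a profile of the sequential program does. A minimal sketch using Python's cProfile module is shown below; the `setup_model` and `compute` functions are hypothetical stand-ins for the phases of a real application.

```python
# Profile a sequential program to locate its compute-intensive parts.
# setup_model and compute are hypothetical stand-ins for real phases.
import cProfile
import io
import pstats

def setup_model(n):
    # Hypothetical cheap setup phase.
    return list(range(n))

def compute(model):
    # Hypothetical expensive phase: a profile should show most of the
    # program's time being spent here.
    return sum(x * x for x in model)

def main():
    model = setup_model(100_000)
    return compute(model)

profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Report the functions sorted by cumulative time; the entries at the
# top are the candidates for parallelization.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

The same idea applies whatever the implementation language: measure first, then focus the parallelization effort on the parts of the code where the time actually goes.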

The next question to ask is what key data structures are employed in the compute-intensive computation. The model of the body through which the trajectories pass is the largest and most complex data structure in the problem. An analysis of the detailed representation is required to really understand this data structure. In this case, the body model is very large, occupying hundreds if not thousands of megabytes.

Now that we know what the key data is and how large it tends to be, we need to understand how it is used. In this case, the body model is read by the routines that compute the trajectories. It is not modified by the program.

At this point, we are ready to decompose the problem. To do this, we move on to the DecompositionStrategy pattern.