Sequential vs. parallel

Given the parallelization strategy described in the previous section, a parallel program to accomplish a particular mesh computation closely resembles its sequential counterpart, except that the work has been partitioned between a host process and a number of essentially identical grid processes:

Computing new values for grid-based variables.
Sequential program
loops over the whole grid. Points on the boundary may be treated differently from interior points.
Host process
does nothing.
Grid processes
first ensure that the ghost boundaries to be used as input contain current values (via a boundary-exchange operation, sketched below), and then each loops over its local section. Because of the ghost boundaries, no special handling is required for points on "internal" boundaries (points that are on the boundary of a local section but do not correspond to points on the boundary of the whole array). If points on the boundary of the whole array require different treatment, this is handled by the grid processes that contain part of that boundary.
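
The boundary-exchange operation would be supplied by the archetype library; the following is a minimal sketch of what such an operation might look like, assuming an MPI message-passing layer and a one-dimensional block decomposition by rows (the function and variable names are illustrative, not the archetype's):

    #include <mpi.h>

    /* u holds nlocal interior rows plus one ghost row above (row 0) and
     * one below (row nlocal + 1), each of width ncols.  up and down are
     * the neighbor ranks, or MPI_PROC_NULL at the edges of the process
     * grid, which makes those transfers no-ops, so interior and boundary
     * processes can share the same code. */
    void exchange_boundaries(double *u, int nlocal, int ncols,
                             int up, int down, MPI_Comm comm)
    {
        /* Send first interior row up; fill lower ghost row from below. */
        MPI_Sendrecv(&u[1 * ncols], ncols, MPI_DOUBLE, up, 0,
                     &u[(nlocal + 1) * ncols], ncols, MPI_DOUBLE, down, 0,
                     comm, MPI_STATUS_IGNORE);

        /* Send last interior row down; fill upper ghost row from above. */
        MPI_Sendrecv(&u[nlocal * ncols], ncols, MPI_DOUBLE, down, 1,
                     &u[0], ncols, MPI_DOUBLE, up, 1,
                     comm, MPI_STATUS_IGNORE);
    }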

Reading values into a grid-based variable.
Sequential program
reads into a whole array, e.g. from a file.

The parallel program may take several approaches. The most straightforward makes use of the host process:
Host process
reads into its own array and then participates in a redistribution operation that distributes the array values over the process grid.
Grid processes
participate in the redistribution operation.
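
A sketch of the host-based read, assuming an MPI layer with the host at rank 0 (an assumption, not something the archetype dictates) and a one-dimensional block decomposition by rows that divides evenly among the grid processes:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Host reads the whole nrows-by-ncols array and scatters one block
     * of rows to each grid process; the host keeps no section of its
     * own.  On the host, callers pass local = NULL and nlocal = 0. */
    void host_read_and_distribute(double *local, int nlocal, int ncols,
                                  int nrows, const char *fname,
                                  MPI_Comm comm)
    {
        int rank, nprocs;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &nprocs);

        double *whole = NULL;
        int *counts = malloc(nprocs * sizeof(int));
        int *displs = malloc(nprocs * sizeof(int));

        counts[0] = displs[0] = 0;          /* host receives nothing */
        for (int p = 1; p < nprocs; p++) {  /* one block per grid process */
            counts[p] = (nrows / (nprocs - 1)) * ncols;
            displs[p] = (p - 1) * counts[p];
        }

        if (rank == 0) {    /* read as the sequential program would */
            whole = malloc((size_t)nrows * ncols * sizeof(double));
            FILE *f = fopen(fname, "rb");
            fread(whole, sizeof(double), (size_t)nrows * ncols, f);
            fclose(f);
        }

        /* The redistribution operation: the host supplies the data, and
         * every process receives its own (possibly empty) section. */
        MPI_Scatterv(whole, counts, displs, MPI_DOUBLE,
                     local, nlocal * ncols, MPI_DOUBLE, 0, comm);

        free(counts);
        free(displs);
        free(whole);        /* free(NULL) is a no-op on grid processes */
    }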

An alternative approach reads data directly into the grid processes:
Host process
does nothing.
Grid processes
each read from a separate sequential file. Each file contains data for one local section.
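
The host-less alternative is simpler; a sketch, assuming the local sections were written ahead of time to per-process files named grid.<rank> (the naming scheme is an assumption, not part of the archetype):

    #include <mpi.h>
    #include <stdio.h>

    /* Each grid process reads its own local section from its own file. */
    void read_local_section(double *local, int nlocal, int ncols,
                            MPI_Comm comm)
    {
        int rank;
        char fname[64];
        MPI_Comm_rank(comm, &rank);

        snprintf(fname, sizeof fname, "grid.%d", rank);
        FILE *f = fopen(fname, "rb");
        fread(local, sizeof(double), (size_t)nlocal * ncols, f);
        fclose(f);
    }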

Writing values from a grid-based variable.
Sequential program
writes a whole array, e.g. to a file.

The parallel program may take several approaches. The most straightforward makes use of the host process:
Host process
participates in a redistribution operation that collects the array values from the process grid and then writes from its array.
Grid processes
participate in the redistribution operation.
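
This is the mirror image of the host-based read; a sketch under the same assumptions (MPI, host at rank 0, even one-dimensional block decomposition):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Grid processes' sections are collected on the host, which supplies
     * no data of its own and then writes the whole array.  On the host,
     * callers pass local = NULL and nlocal = 0. */
    void collect_and_write(const double *local, int nlocal, int ncols,
                           int nrows, const char *fname, MPI_Comm comm)
    {
        int rank, nprocs;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &nprocs);

        double *whole = NULL;
        int *counts = malloc(nprocs * sizeof(int));
        int *displs = malloc(nprocs * sizeof(int));

        counts[0] = displs[0] = 0;          /* host supplies nothing */
        for (int p = 1; p < nprocs; p++) {
            counts[p] = (nrows / (nprocs - 1)) * ncols;
            displs[p] = (p - 1) * counts[p];
        }
        if (rank == 0)
            whole = malloc((size_t)nrows * ncols * sizeof(double));

        /* The redistribution operation, in the collecting direction. */
        MPI_Gatherv(local, nlocal * ncols, MPI_DOUBLE,
                    whole, counts, displs, MPI_DOUBLE, 0, comm);

        if (rank == 0) {    /* write as the sequential program would */
            FILE *f = fopen(fname, "wb");
            fwrite(whole, sizeof(double), (size_t)nrows * ncols, f);
            fclose(f);
        }
        free(counts);
        free(displs);
        free(whole);
    }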

An alternative approach writes data directly from the grid processes:
Host process
does nothing.
Grid processes
each write to a separate sequential file. Each file contains data for one local section.
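
A sketch of the host-less write, symmetric to the host-less read and under the same assumed grid.<rank> naming scheme:

    #include <mpi.h>
    #include <stdio.h>

    /* Each grid process writes its own local section to its own file. */
    void write_local_section(const double *local, int nlocal, int ncols,
                             MPI_Comm comm)
    {
        int rank;
        char fname[64];
        MPI_Comm_rank(comm, &rank);

        snprintf(fname, sizeof fname, "grid.%d", rank);
        FILE *f = fopen(fname, "wb");
        fwrite(local, sizeof(double), (size_t)nlocal * ncols, f);
        fclose(f);
    }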

Reading values into a duplicated (non-grid) variable.
Sequential program
reads data (e.g., global constants) from a file.
Host process
reads data in the same way the sequential program would and then participates in a broadcast operation to copy the data to the grid processes.
Grid processes
participate in a broadcast operation to obtain data from the host process.
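
A sketch of this pattern, again assuming MPI with the host at rank 0 (the function name and file format are illustrative):

    #include <mpi.h>
    #include <stdio.h>

    /* Host reads a global constant as the sequential program would;
     * MPI_Bcast then copies it to every other process. */
    double read_and_broadcast(const char *fname, MPI_Comm comm)
    {
        int rank;
        double value = 0.0;
        MPI_Comm_rank(comm, &rank);

        if (rank == 0) {
            FILE *f = fopen(fname, "r");
            fscanf(f, "%lf", &value);
            fclose(f);
        }
        /* After this call, every process (host and grid) holds the value. */
        MPI_Bcast(&value, 1, MPI_DOUBLE, 0, comm);
        return value;
    }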

Writing values from a duplicated (non-grid) variable.
Sequential program
writes data (e.g., the results of a reduction operation) to a file.
Host process
writes data exactly as the sequential program does. (Usually, the variable whose value is to be written has the same value in all processes -- either because it is a global constant or because it is the result of a reduction operation, as described below.)
Grid processes
do nothing.

Performing a reduction operation.
Sequential program
performs the reduction, often by looping over the whole array.
Host process
participates in the reduction operation, without supplying data, and receives the result.
Grid processes
participate in the reduction operation, supplying data and receiving the result. (E.g., to compute a global maximum, each grid process computes a local maximum, and then all processes (host and grid) participate in a reduction operation, after which all processes have the resulting global maximum.)
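
A sketch of the global-maximum example, assuming MPI; representing "supplies no data" by contributing the identity element for the reduction is an implementation choice, not something the archetype specifies:

    #include <mpi.h>
    #include <float.h>

    /* Grid processes reduce over their local sections; the host
     * contributes -DBL_MAX, the identity for max.  Afterwards every
     * process (host and grid) holds the global maximum. */
    double global_max(const double *local, int nvals, int is_host,
                      MPI_Comm comm)
    {
        double lmax = -DBL_MAX;
        if (!is_host)                   /* only grid processes hold data */
            for (int i = 0; i < nvals; i++)
                if (local[i] > lmax)
                    lmax = local[i];

        double gmax;
        MPI_Allreduce(&lmax, &gmax, 1, MPI_DOUBLE, MPI_MAX, comm);
        return gmax;
    }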

If the computation does not perform whole-grid reads or writes using the host process, then it is possible to parallelize it without a host process; in that case, the actions performed by the host process in the above descriptions are instead performed by one of the grid processes, which is singled out as the "designated I/O process".
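
In that variant, selecting the designated I/O process can be as simple as the following sketch (choosing rank 0 is an assumption; any grid process would do):

    #include <mpi.h>

    /* The designated I/O process performs the host's actions in addition
     * to its own share of the grid computation. */
    int is_designated_io(MPI_Comm grid_comm)
    {
        int rank;
        MPI_Comm_rank(grid_comm, &rank);
        return rank == 0;
    }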

