Pattern Name: GeometricDecomposition

AlgorithmStructure Design Space

Supporting Document: Examples

 

Mesh Computation Example, OpenMP Implementation:

Generating a parallel version using OpenMP is fairly straightforward. Since OpenMP is a shared-memory environment, there is no need to explicitly partition and distribute the two key arrays (uk and ukp1); the computation of new values is implicitly distributed among UEs by the two parallel loop directives, which also implicitly provide the required synchronization (all threads must finish updating ukp1 before any thread starts updating uk, and vice versa). The SCHEDULE(STATIC) clauses distribute the iterations of each parallel loop in contiguous blocks among the available threads.

        real uk(1:NX), ukp1(1:NX)
        dx = 1.0/NX
        dt = 0.5*dx*dx
C-------initialization of uk, ukp1 omitted
        do k=1,NSTEPS
C$OMP       PARALLEL DO SCHEDULE(STATIC)
            do i=2,NX-1
                ukp1(i)=uk(i)+(dt/(dx*dx))*(uk(i+1)-2*uk(i)+uk(i-1))
            enddo
C$OMP       END PARALLEL DO
C$OMP       PARALLEL DO SCHEDULE(STATIC)
            do i=2,NX-1
                uk(i)=ukp1(i)
            enddo
C$OMP       END PARALLEL DO

            call print_step(k, uk)
        enddo
        end
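
To reduce the overhead of entering and leaving a parallel region twice per timestep, the two loops can share a single parallel region, as in the following sketch (a possible variant, not part of the original example). The implicit barrier at the end of each work-sharing DO construct still guarantees that all threads finish updating ukp1 before any thread copies it back into uk, and vice versa.

        do k=1,NSTEPS
C$OMP       PARALLEL
C$OMP       DO SCHEDULE(STATIC)
            do i=2,NX-1
                ukp1(i)=uk(i)+(dt/(dx*dx))*(uk(i+1)-2*uk(i)+uk(i-1))
            enddo
C$OMP       END DO
C$OMP       DO SCHEDULE(STATIC)
            do i=2,NX-1
                uk(i)=ukp1(i)
            enddo
C$OMP       END DO
C$OMP       END PARALLEL

            call print_step(k, uk)
        enddo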

Mesh Computation Example, MPI Implementation:

An MPI-based program for this example uses the data distribution described in the Motivation section of the main document. The program is an SPMD computation in which each process's code strongly resembles the sequential program, except that the loop that computes new values for ukp1 is preceded by message-passing operations: the values of uk in the leftmost and rightmost cells of the local subarray are sent to the processes owning the left and right neighboring subarrays, and the corresponding boundary values are received from them. The following shows the code to be executed by each process; some details are omitted for the sake of simplicity. NP denotes the number of processes.

        include 'mpif.h'
        real uk(0:(NX/NP)+1), ukp1(0:(NX/NP)+1)
C-------process IDs for this process and its neighbors
        integer my_id, left_nbr, right_nbr, istat(MPI_STATUS_SIZE)
        dx = 1.0/NX
        dt = 0.5*dx*dx
C-------initialization of uk, ukp1 omitted
C-------initialization of my_id, left_nbr, right_nbr omitted
C-------compute loop bounds (must skip ends of original array)
        istart=1
        if (my_id .eq. 0) istart=2
        iend=NX/NP
        if (my_id .eq. NP-1) iend=(NX/NP)-1
C-------main timestep loop
        do k=1,NSTEPS
C-----------exchange boundary information
            if (my_id .ne. 0) then
                call MPI_SEND(uk(1),1,MPI_REAL,left_nbr,0,
     $              MPI_COMM_WORLD,ierr)
            endif
            if (my_id .ne. NP-1) then
                call MPI_SEND(uk(NX/NP),1,MPI_REAL,right_nbr,0,
     $              MPI_COMM_WORLD,ierr)
            endif
            if (my_id .ne. 0) then
                call MPI_RECV(uk(0),1,MPI_REAL,left_nbr,0,
     $              MPI_COMM_WORLD,istat,ierr)
            endif
            if (my_id .ne. NP-1) then
                call MPI_RECV(uk((NX/NP)+1),1,MPI_REAL,right_nbr,0,
     $              MPI_COMM_WORLD,istat,ierr)
            endif
C-----------compute new values for this chunk of ukp1
            do i=istart, iend
                ukp1(i)=uk(i)+(dt/(dx*dx))*(uk(i+1)-2*uk(i)+uk(i-1))
            enddo
C-----------copy to uk for next iteration and print
            do i=istart, iend
                uk(i)=ukp1(i)
            enddo
            call print_step_distrib(k, uk, my_id)
        enddo
        end
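
The exchange above issues both blocking MPI_SEND calls before the matching MPI_RECV calls; this is safe here only because typical MPI implementations buffer such small messages, and in general it can deadlock. The following sketch (a possible variant, not part of the original example) performs each boundary exchange with MPI_SENDRECV, which cannot deadlock; it assumes left_nbr and right_nbr are initialized to MPI_PROC_NULL on the first and last processes, which makes the special-case IF tests unnecessary.

C-----------send right boundary value to right neighbor, receive into uk(0)
            call MPI_SENDRECV(uk(NX/NP),1,MPI_REAL,right_nbr,0,
     $          uk(0),1,MPI_REAL,left_nbr,0,
     $          MPI_COMM_WORLD,istat,ierr)
C-----------send left boundary value to left neighbor, receive into uk((NX/NP)+1)
            call MPI_SENDRECV(uk(1),1,MPI_REAL,left_nbr,0,
     $          uk((NX/NP)+1),1,MPI_REAL,right_nbr,0,
     $          MPI_COMM_WORLD,istat,ierr)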