Parallel Computing: Comparing Different Domain Decomposition Techniques

Amy Garrison
Department of Mathematics
Colorado State University
garrison@lagrange.math.colostate.edu

In parallel computing, there are three ways of utilizing the power of parallel architecture: domain decomposition, functional decomposition, and master-slave model. Domain decomposition is achieved by distributing the computational domain of a problem across nodes. Each node is given some subset of the data on which to work. Functional decomposition is achieved by having nodes execute different portions of the same code simultaneously. Each node performs a specific sub-task in the solution. The final method is the master-slave model which is achieved by having one node, or the system host, assign work to the rest of the nodes used by the application.

I chose to use domain decomposition on two problems and wanted to see the correlation between different decompositions verses run time.

Matrix Multiply

The first problem is a matrix multiply. The algorithm below explains the general idea of the first decomposition.

: Give node the ith horizontal strip of A call it A and the ith vertical strip of B call it B .

: Compute all of the C values corresponding to A and B call it C.

: Node passes its B values to node and receives B values from node.

Repeat steps 1 and 2 until node has computed the C values with the B strip and A. Then node would have computed the ith strip of C.

The advantage of this decomposition is the low amount of storage space. Each node only has three strips of each matrix. This method is also faster than the second decomposition.

The second decomposition is as follows:

: Give node the ith vertical strip of A call it A and the ith horizontal strip of B call it B.

: Perform an ordinary matrix multiply with A and B . This will give you an NxN matrix C. Each element in C is a portion of the total sum of the corresponding element in C.

: Call a parallel function GSSUM to create C from the C's. This function call is why the second decomposition takes so much longer.

Let k be the width of each strip of A,B and M equal the total number of nodes.

The following plot indicates the performance of the two matrix multiply methods.

If you wish to see the code click below:

Domain Decomposition 1

Domain Decomposition 2

Jacobi Iteration

The second problem is the Jacobi point iterative method applied to the grid (1,0)X(1,0) for which the boundary conditions are known. N is the number of interior grid points.

where

In this case the exact solution is known.

I considered two different domain decompositions. The first was giving each node a vertical strip of .

After each iteration the internal ''boundary conditions'' have to be shared with neighboring nodes.

The second decomposition was to divide as follows:

Giving each node a block of . After each iteration the internal ''boundary conditions'' are shared as described above. The plot below shows the performance of the two methods.

If you wish to see the code, click below.

Domain Decomposition 1

Domain Decomposition 2

All of the above work was performed on the Paragon located at San Diego Super Computing facilities.