Parallel Programming In C. Chap. X



Parallel Programming in C for the Transputer
© D. Thiébaut, 1995



Distributed load-balancing schemes

The Manager-Workers paradigm is easy to implement. Virtual channels allow the creation of a star-shaped network on any physical network with very little effort. The Manager sits at the center of the star and enjoys direct access to the workers. For larger networks, however, this simplicity may not be acceptable due to the delays that may dramatically reduce the processors' utilization. In such cases, distributed methods for load-balancing may be more attractive, and because they require a synchronization that involves only near neighbors, lower performance penalty can be expected in general. In addition, the processor originally implementing the Master can now be given a bigger share of the computation.




Full-exchange heuristic

The method we will be describing is a heuristic. Heuristics for load-balancing are methods that cannot predict the optimal solution, but that usually provide solutions close to optimal. They work by finding a local balance between near neighbors in a network, and by involving the neighbors in different neighborhood. We coin the heuristic we now investigate the full-exchange heuristic, indicating that, during the load balancing steps, the processors exchange their entire load with each-other. Let's consider the example of a toroidal rectangular mesh containing eight processors. Assume that we initially assign them loads of size 1, 2, 3, 4, 5, 6, 7, an 8 units, as shown below.

Starting configuration. The first step of the local balance strategy will be to pair up near neighbors and have them exchange their loads (shown here as the number inside the processor boxes).
First exchange: the arrows indicate the directions of the data exchange. The load resulting from this exchange is determined by the following algorithm: Send(MyLoad); Receive(NeighborLoad); Average = (MyLoad+NeighborLoad)/2; if (MyLoad>Average) MyLoad = Average; else if (NeighborLoad>Average) MyLoad += NeighborLoad-Average; The resulting loads are shown in the processors
The next step repeats the same operations, but this time with different neighbors, as shown by the arrows. The loads start to even out.
The next step is to pair up neighbors on the vertical dimension. This time the final reading of the loads shows even distribution of the work among all the processors.

Figure 10-4: Example of a load-balancing heuristic involving near-neighbors.

We see in this simple example that the loads end up equal at all the processors. This is not always the case. Our initial loads were close enough in value that a global balance was achieved. Still, in practice the repetition of these steps as the computation progresses will reduce the disparity among the loads. We shouldn't jump to the conclusion, however, that global balance is always reachable as long as we can apply the balancing steps often enough! We should remember that the processors constantly gnaw at these loads in different ways, even while the load balancing is taking place.

10-2 An application: Balancing the N-Queen problem

We now take a look at an example of load balancing applied to the N-queen problem. The parallel program is set up to search for a solution containing N queens on an N by N chess-board by positioning a queen on successive rows, starting with the top row of the board, and going down one row at a time, as shown in Figures 9-6 and 9-7. At startup, a processor is given one of the many possible starting positions of the first queen on the first row (N total positions exist). Each processor contains an array representing the chess-board, and from the knowledge of that first position it deduces all the allowed positions of the second queen on Row 2. Each one of these positions represents the leaf of a unary tree of height 2, and the processor records all these trees in a heap, keeping the last tree found as the current tree to which it will try to add a new level.

As the processor progresses it may find that the next row of the chess-board is completely covered by the queens already on the board, and the current tree cannot be further extended. The current tree my therefore been thrown out and a new tree must be obtained from the heap. Because the program maintains the heap as a stack, and because the program always increases the height of its current tree, the tree on top of the stack is always the taller one (other trees in the stack may have the same height). This way, when a processor has reached an impasse and gets a new tree from the heap, it gets one that has the fewest leaves to add, hence a partial solution with the highest number of queens already in place.

The load in each transputer is balanced by a manager-workers scheme, where the root transputer hosts the manager. The manager accesses its workers via a virtual star network (network of virtual channels) that can be mapped over any physical network.

The main program

The goal of the main program is to start two parallel tasks, one in charge of the load balancing, the other one in charge of the search. It is shown in Listing 10-.

/* =======================================================================
   queenslb.c

   DESCRIPTION:
   This is the main module.  It starts the load-balancing process and
   the search for N queens.  The user must supply a constant indicating
   the delay that must be observed between balance operations.  The
   number represents an interval of time expressed in microseconds
   (LoadBalanceDelay).

   Must be linked with balance.c search.c and heap.c.
   This program is independent of the network configuration, and requires
   only to know the number of nodes (NoNodes) in the network.

   TO COMPILE:

   make -f queenslb

 ====================================================================== */
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include "conc.h"
#include "balance.h"
#include "heap.h"
#include "search.h"

/* ============================ PROTOTYPES ============================= */
void GetParams(int argc, char *argv[]);
void DisplayHeapSize(void);

/* ============================= GLOBALS =============================== */
int     NoNodes;
int     N;
int     LoadBalanceDelay;
int     SolutionsWanted=1;
Channel **ChanToRootList;
Channel **ChanFromRootList;
FILE    *ResultFile;

/* -------------------------------------------------------------------- */
/*                                 MAIN                                 */
/* -------------------------------------------------------------------- */
main(int argc, char *argv[])
{
     /*--- maximize the heap ---*/
     _heapend = (void *)0x800FFFFC;

     /*--- get user-defined parameters ---*/
     GetParams(argc, argv);

     /*--- Initialize virtual network for load balancing ---*/
     InitLoadChannels();

Two parallel tasks: one balancing the load...
the other looking for queens.

     StartLoadBalancing(LoadBalanceDelay);

     /*--- start the search ---*/
     FindQueens(N);
     exit(0);
}

/* -------------------------------------------------------------------- */
/* GETPARAMS:                                                           */
/* Get user defined parameters from command line.                       */
/* -------------------------------------------------------------------- */
void GetParams(int argc, char *argv[])
{
     int i;

     if (argc<6)
     {
          if (_node_number==1)
               fprintf(stderr, "Syntax: ld-net %s %s %s %s %s\n",argv[0],
                      "NoNodes","LoadBalanceDelay", "N", "ResultFile",
                      "#SolutionsWanted");
          exit(1);
     }
     NoNodes = atoi(argv[1]);
     LoadBalanceDelay = atoi(argv[2]);
     N = atoi(argv[3]);
     SolutionsWanted = atoi(argv[5]);
     if ((ResultFile = fopen(argv[4], "wt"))==NULL)
     {
          fprintf(stderr, "Cannot open %s\n", argv[4]);
          exit(1);
     }

     if (_node_number==1)
     {
          printf("NoNodes   = %d\n", NoNodes);
          printf("LoadDelay = %d\n", LoadBalanceDelay);
          printf("N         = %d\n", N);
          printf("Result    = %s\n", argv[4]);
          printf("Solutions wanted = %d\n", SolutionsWanted);
          fprintf(ResultFile, "NoNodes   = %d\n", NoNodes);
          fprintf(ResultFile, "LoadDelay = %d\n", LoadBalanceDelay);
          fprintf(ResultFile, "N         = %d\n", N);
          fprintf(ResultFile, "Result    = %s\n", argv[4]);
          fprintf(ResultFile, "Solutions wanted = %d\n", SolutionsWanted);
     }
}

The search module

The search module is stored in search.c and its associated header file search.h.

/* =======================================================================
   search.h
 ====================================================================== */
#ifndef SEARCH_H
#define SEARCH_H

#define QUEEN           -1

void FindQueens(int N);
int  DoQueens(int N);
void Update(INDEX **ChessBoard, int N, int *CurrentTree, int CurrentLeaf);
void SetUpChessBoard(INDEX **ChessBoard, int N, int *CurrentTree);
void Display(INDEX **ChessBoard, int N, int *CurrentTree);
#endif



/* =======================================================================
   search.c

   DESCRIPTION:
   Search algorithm for N-Queen problem.  Does a lexicographic search of
   the valid positions of N queens on an N by N chessboard.
 ====================================================================== */
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <time.h>
#include "conc.h"
#include "balance.h"
#include "heap.h"
#include "search.h"

/* ============================= GLOBALS =============================== */
int    *CurrentTree;
int    CurrentLeaf;
INDEX  **ChessBoard;
int    SolutionCount=0;
extern FILE *ResultFile;
extern int LoadBalanceDelay;

int LocalSolutionFound;

/* -------------------------------------------------------------------- */
/* FINDQUEENS:                                                          */
/* Initializes the data structures and starts the search function.      */
/* -------------------------------------------------------------------- */
void FindQueens(int N)
{
     int i, Result;

     /*--- initialize data structures ---*/
     CurrentTree = safecalloc((int) (N+1), (int) sizeof(int));
     ChessBoard  = safecalloc((int) (N+1), (int) sizeof(INDEX*));
     for (i = 1; i<=N; i++)
          ChessBoard[i] = safecalloc((int) (N+1), (int) sizeof(INDEX));

     /*--- do the work! ---*/
     DoQueens(N);

     /*--- we never return here ---*/
}


/* -------------------------------------------------------------------- */
/* DOQUEENS:                                                            */
/* This is the core of the search module.  Gets a tree from the heap,   */
/* maps it to the chessboard, and finds new valid positions for the next*/
/* queen.  Stores all of them as new trees in the heap, but one that is */
/* kept to continue the search.                                         */
/* -------------------------------------------------------------------- */
int DoQueens(int N)
{
     int i, First, LeavesFound, NextLeaf, temp;

     /*--- Start with initial configuration ---*/
     InitializeHeap(N);

     /*--- keep investigating new SubTreeHeight until empty ---*/
     /*--- heap or found required number of solutions       ---*/
     while (1) 
     {

If the heap is empty, just wait...

          /*--- wait until we can get a search tree.  If heap empty ---*/
          /*--- load balancing may refill it.                       ---*/
          while (!GetTopTreeAndLeaf(CurrentTree, &CurrentLeaf))
               /* wait */;

          /*--- set it up in chessboard ---*/
          SetUpChessBoard(ChessBoard, N, CurrentTree);

          /*--- try to expand the tree to N leaves ---*/
          while (1)
          {
               if (CurrentLeaf==N)          /* we have found a solution */
               {
                    LocalSolutionFound++;   /* tell load-balance module */
                    break;                      /* abandon current tree */
               }

               /*--- Any possible new leaves at lower level? ---*/
               First = 1;
               LeavesFound = 0;
               for (i = 1; i<=N; i++)
                    if (ChessBoard[CurrentLeaf+1][i]==0)
                    {
                         LeavesFound++;
                         if (First)
                         {
                              NextLeaf = i;
                              First = 0;
                         }
                         else
                         {
                              /*--- add 1 leaf to tree ---*/
                              CurrentTree[CurrentLeaf+1] = i;
                              temp = CurrentTree[0];
                              CurrentTree[0] = CurrentLeaf+1;
                              /*--- add it to heap ---*/
                              InsertHeap(CurrentTree);
                              /*--- get original tree back ---*/
                              CurrentTree[0] = temp;
                         }
                    }

               /*--- use first leave found and continue ---*/
               if (LeavesFound)
               {
                    CurrentTree[CurrentLeaf+1] = NextLeaf;
                    CurrentLeaf++;
                    CurrentTree[0] = CurrentLeaf;
                    Update(ChessBoard, N, CurrentTree, CurrentLeaf);
               }
               else
                    break;
          }
     }
     return 1;
}


/* -------------------------------------------------------------------- */
/* UPDATE:                                                              */
/* Updates the chessboard so that all the new positions "taken" by the  */
/* new queen are marked as invalid.                                     */
/* -------------------------------------------------------------------- */
void Update(INDEX **ChessBoard, int N, int *CurrentTree, int CurrentLeaf)
{
     int i, j, leaf, row, column;

     row = CurrentLeaf;
     column = CurrentTree[CurrentLeaf];

     /*--- create row and column "taken"---*/
     for (i = 1; i<=N; i++)
     {
          if (ChessBoard[row][i]!=QUEEN) ChessBoard[row][i] = 1;
          if (ChessBoard[i][column]!=QUEEN) ChessBoard[i][column] = 1;
     }

     /*--- create diagonals "taken" downward ---*/
     for (i = row+1, j = column+1; (i<=N) && (j<=N); i++, j++)
          if (ChessBoard[i][j]!=QUEEN) ChessBoard[i][j] += 1;

     for (i = row+1, j = column-1; (i<=N) && (j>=0); i++, j--)
          if (ChessBoard[i][j]!=QUEEN) ChessBoard[i][j] += 1;

     ChessBoard[row][column] = QUEEN;
}

/* -------------------------------------------------------------------- */
/* SETUPCHESSBOARD:                                                     */
/* Creates the chessboard and positions first queen                     */
/* -------------------------------------------------------------------- */
void SetUpChessBoard(INDEX **ChessBoard, int N, int *CurrentTree)
{
     int i, j, leaf, row, column;

     /*--- clear board first ---*/
     for (i = 1; i<=N; i++)
          for (j = 1; j<=N; j++)
          ChessBoard[i][j] = 0;

     for (leaf = 1; leaf<=CurrentTree[0]; leaf++)
     {
          row = leaf;
          column = CurrentTree[leaf];

          /*--- create row and column "taken"---*/
          for (i = 1; i<=N; i++)
          {
               ChessBoard[row][i] = 1;
               ChessBoard[i][column] = 1;
          }

          /*--- create diagonals "taken" ---*/
          for (i = row+1, j = column+1; (i<=N) && (j<=N); i++, j++)
               ChessBoard[i][j] = 1;

          for (i = row+1, j = column-1; (i<=N) && (j>=1); i++, j--)
               ChessBoard[i][j] = 1;
     }

     /*--- put queens back on chess-board ---*/
     for (leaf = 1; leaf<=CurrentTree[0]; leaf++)
     {
          row = leaf;
          column = CurrentTree[leaf];
          ChessBoard[row][column] = QUEEN;
     }
}

The code is straightforward. The only way in which it differs from a version that would not be associated with a load-balancing scheme is in the following statement:

          /*--- wait until we can get a search tree.  If heap empty ---*/
          /*--- load balancing may refill it.                       ---*/
          while (!GetTopTreeAndLeaf(CurrentTree, &CurrentLeaf))
               /* wait */;

GetTopTreeAndLeaf accesses the heap to retrieve a tree containing queens already found in a valid position on the board. If the heap is empty of trees, GetTopTreeAndLeaf returns 0 to indicate to the caller than no tree could be found. In a normal search, this condition should be used to exit the program and would imply that all possible solutions have been found. Here, however, because of the load-balancing process, it is possible for the heap to empty out, but this should not be a condition for the node to exit, because other nodes may have more trees to investigate, in which case the current node might receive more work soon. Therefore the program forces the transputer to wait and do nothing if the heap is empty. Before we move on to the load-balancing module, let's take a look at the heap module.




[Previous] [HOME] [NEXT]