Using a heterogeneous network of workstations to perform computationally intensive real-time graphical animations.

 

David Rashkin

UFID:xxxx-xxxx

COP5615 Termpaper

 

 

Animation places very stringent requirements on a distributed system.  Frames in an animation must be updated 7 – 30 times per second in order to give the illusion of continuous motion.  This means that a new frame must be made available every 33-143 ms. The goal of this paper is to explore the possibility of using a heterogeneous network of workstations to achieve animation of a computationally intensive animation.

 

In some studies of distributed systems, the time taken for message passing is negligible compared to the time taken for computation [6].  Due to the timing constraints inherent in animation, message passing becomes a significant bottleneck, which must be addressed when trying to design a distributed animation system. 

 

Several linear algebra algorithms have been implemented in distributed systems of heterogeneous workstations, with considerable performance gains over conventional workstations [1].  While not quite fast enough for animation, with continuing advances in network and processor speeds, such a system might soon be able to perform highly complex 3D animations.

 

 

Measuring Processing Power

 

In most classical literature, processor capabilities are measured in flops, or as the ratio of individual flops to overall flops for the nodes connected to the system [1,5].  For animation, we need a more precise measurement, measured in milliseconds (or microseconds).  We also need to consider message propagation speeds and bandwidth.  [1] gives a formula for determining the time delay due to communication overhead.:

 

     [EQ1]

Where TC is the time required to transmit nb bytes, TL is the time gap between the processor order to transmit and the beginning of transmission, and TE is the packing time, and LB is the network bandwidth measured in Mbits/s. 

 

So, the processing power of a single node can be calculated by:

 

Flops =        [EQ2]

Where Tr is the time taken for a single node to receive a job, Ts is the time taken to send the results of the job (and the time taken for the coordinating node to receive the results), and flops’ is the node’s power without considering network delays. 

 

Substituting [EQ1] into [EQ2] yields and dividing by the total flops necessary for an animation frame yields the percentage of the workload that an individual node can complete:

 

     [EQ3]

 

If one can safely assume that a worker node only participates in a calculation if it would otherwise be idle, then the flops capabilities of a node can be considered a static value.

 

[3] gives a comparison of two different network message passing techniques.  On a 100Mbit network, 100 bytes can be transmitted (sent and received) in 0.33 ms, while 100 kbytes can be transmitted in 16 ms.

 

 

Distributing the workload

 

There are two distinct types of nodes in this system: consumers (coordinating nodes) of animation frames and producers (worker nodes) of animation frames.  For simplicity, I will assume a single coordinating node and multiple worker nodes.  The coordinating node is responsible for dividing a task into appropriate subtasks, sending those subtasks to the workers, and receiving the responses from the workers. 

Most task assignment algorithms rely on an accurate measurement of node processing power.  For a single coordinating node, measuring the processing power of the individual nodes would require O(n) time.

 

   [EQ4]

Where TSU is time for state update, and N is the number of worker nodes.  If we substitute [EQ1] for T in [EQ3], then we can get a more accurate estimate of how long it will take to send a job out to workers for processing.

 

Node failures introduce more overhead.  Coordinating nodes must be made aware of node failures in a timely manner to avoid sending jobs to dead worker nodes.

 

So, in order to minimize the number of nodes required for the system, the coordinating agent needs to have accurate and up-to-date state information about the worker nodes in the system.

 

 

Probabilistic Distribution

 

In [2], it was determined that probabilistic (random) distribution of jobs resulted in sub-optimal performance for load balancing.  However, probabilistic distribution of jobs could help eliminate some network overhead in that state information need not be considered.  Furthermore, in probabilistic distribution model, worker node failure does not produce any overhead.  The coordinating node can simply broadcast jobs not-yet-completed to all nodes, and remove jobs from the broadcast when worker nodes report the completion of jobs.  In order to reduce the risk of too many worker nodes working on the same job, worker nodes can sleep for a random (and small) amount of time before getting a job from the broadcast (assuming that broadcasts are completely transient).  Since the coordinating node does not keep track of state information, jobs should be split up into equal size parts, and should be small enough so that relatively slow worker nodes can still complete at least one job in the required time.  The coordinating node would continue broadcasting jobs until an animation frame needs to be displayed (33-143 ms), at which point it would display as much of the frame as it has received from worker nodes. 

 

 

Coordinating Node Hierarchy

 

Updating state information can be made less costly by using a hierarchy of coordinating nodes (tree).  The root of the tree is the main coordinating node (the node responsible for beginning a computation, and ultimately for displaying the resulting animation frame).  Tree nodes represent coordinating agents, and leaf nodes represent worker nodes. Each branch of the tree represents message passing overhead.  Each node in the tree can be treated as a worker node by its parent, with the processing power being determined by:

     [EQ5]

Where M is the number of nodes connected to node p, and flopsi is determined by [EQ2].  Note that [EQ2] is dependent on the speed with which a worker node can communicate with its coordinating node (in this case, worker nodes are the children nodes and coordinating nodes are the parent nodes).  Thus, connecting a new node to a parent node in the tree could adversely affect (lower) the flops for that node (a parent might spend a lot of time communicating with a slow child, thus slowing down response times to other children).

 

If network speeds among the coordinating nodes are equal, then [EQ4] can be bounded by O(logn) instead of O(n).

If network speeds among the coordinating nodes are not equal then the nodes with the highest network speeds should appear at the bottom of the tree.  Nodes with slower network speeds should be placed higher up in the tree.  If network speeds among coordinating nodes are not static, then we must deal with updating state information about the nodes in the tree, and re-structuring the tree  based on updated state information.  This would introduce more overhead, which could negate the benefits of using the hierarchy of coordinating nodes.

 

 

 

Conclusions

 

The timing constraints placed on a distributed system by animation currently restricts the usefulness of nodes to high-speed local networks.  Nodes linked by 100Mbit Ethernet networks could make use of a distributed system to perform complex animation sequences (such as fractals), but most ordinary Internet client computers would probably not have sufficient network speeds to contribute to such a distributed system.

 

 

 

References

 

[1] J. Barbosa, J.Tavares and A.J. Padilha, “Linear Algebra Algorithms in a Heterogeneous Cluster of Personal Computers”, Heterogeneous Computing Workshop, 2000. (HCW 2000) Proceedings. 9th , 1 May 2000 Pages:147  - 159

[2] H.D. Karatza, “A Comparison of Load Sharing and Job Scheduling in a Network of Workstations”, I.J. of SIMULATION Vol.4 No 3-4

[3] G. Eisenhauer and L.K. Daley, “Fast Heterogeneous Binary Data Interchange”, Heterogeneous Computing Workshop, 2000. (HCW 2000) Proceedings. 9th , 1 May 2000 Pages:90  - 101

[4] H.D. Karatza and R.C. Hilzer, “Load Sharing in Heterogeneous Distributed Systems”, Proceedings of the 2002 Winter Simulation Conference

[5] L. Xiao, X. Zhang, Y. Qu, “Effective Load Sharing on Heterogeneous Networks of Workstations”, Proceedings of 2000 International Parallel and Distributed Processing Symposium (IPDPS’2000),Cancun,Mexico,May 1-5, 2000

[6]  S. Ali, H.J. Siegel, M. Maheswaran, D. Hensgen, and S. Ali, “Task Execution Time Modeling for Heterogeneous Computing Systems”, Heterogeneous Computing Workshop, 2000. (HCW 2000) Proceedings. 9th , 1 May 2000 Pages:185-199