DOTS
DOTS is a system environment for building and executing distributed parallel C++ applications.
It integrates a wide range of different computing platforms into a homogeneous parallel
environment. Up to now, DOTS has been deployed on (heterogeneous) clusters composed of the
following platforms: Microsoft Windows 98/NT/2000/XP, Linux, SUN Solaris, SGI IRIX, IBM AIX,
FreeBSD, QNX Realtime Platform, and IBM Parallel Sysplex Cluster.
This research particularly addresses task-parallel applications which involve fully dynamic
problem decomposition, and consequently dynamic task mapping. Such requirements typically arise
when dealing with specific classes of irregular problems for which it is impossible to predict
the run-time of individual parallel tasks. For applications exhibiting these characteristics, it is
inevitable to continuously decompose and distribute tasks, in order to permanently keep all
processors busy. Prominent examples of applications belonging to this class are parallel discrete
optimization and parallel constraint satisfaction.
Parallel Programming Models
DOTS provides several parallel programming models (each represented by an own API) forming
a comprehensive set of tools for realizing distributed parallel applications.
- Task API
The Task API represents the basic API layer of DOTS on which all other APIs are based.
It provides support for DOTS task objects, which are instances of application specific classes.
These are derived from the base class DOTS_Task and implement a run() method. The code
provided in the run() method is executed on its own thread when the task object is s
cheduled for execution. The base class also provides methods for explicit,
program controlled migration of the task object.
- Active Message API
The Active Message API provides support for object-oriented message passing. After a message object
is transferred to its destination node it becomes an active object, i.e. a new thread is
created for the message object on the receiver side that executes application specific code.
- Multithreading API
Multithreaded computations are generalizations of asynchronous (remote) procedure calls. The DOTS
Multithreading API provides a compact set of completely orthogonal primitives for realizing a
comprehensive class of multithreaded computations. The DOTS multithreading programming model is
enhanced with object-oriented features and support for highly irregular non-deterministic
computations.
- Autonomous Task API
The Autonomous Tasks API can be used to realize task objects that operate as mobile agents. In contrast
to standard task objects, the execution of an autonomous task is not determined by the load
distribution mechanism of DOTS. Instead, its execution locations can be explicitly
determined by the programmer. For facilitating the control of autonomous tasks, the API
provides higher level migration primitives, e.g. for realizing round trips of mobile agents
within the distributed environment.
Run-Time System
Generally, dynamic problem decomposition requires explicit load balancing, i.e. tasks have to be
assigned to processors at run-time. Especially for problems with high irregularity, the task pool
model should be employed. It decouples problem decomposition and load balancing by an explicit
data structure holding tasks resulting from dynamic decomposition operations. DOTS provides
support for a centralized and a distributed task-pool execution model.
Tool Support and Visualization
We developed an Eclipse based integrated development environment
for DOTS that enables a seamless work-flow for implementation, execution, analysis, and tuning of
parallel programs. At its core lies a visualization-based approach to performance analyzing
and tuning of highly irregular task-parallel applications.
Executing highly irregular task-parallel applications results in complex
and unstructured execution and communication patterns. Moreover, dynamic
task decomposition and dynamic load balancing techniques inject a
considerable degree of non-determinism. As a result, the manifestation
of a performance impeding phenomenon and its root cause(s) are often
spatially and temporally decoupled. The specific challenge of this research
is to extract and relate performance-relevant information from a highly
unstructured set of execution events. In this situation, visualization
can facilitate to reveal dependencies between different influence
factors that are not easily accessible using pure analytical methods.
Our approach essentially relies on displaying execution graphs using
a novel automatic layout algorithm based on Sugiyama's framework.
Thus, the main perspective of the visualization is closely related to the programming model and is
located on a higher level of abstraction than conventional message passing displays.
This considerably reduces the semantic gap between the visual representation
of computations and the application code. Specifically, our visualization provides
the following information:
- Time flow of execution process with dynamic mapping of tasks to processors
(processors are depicted by vertical swim-lanes in the background)
- Activity and idle times of processors
- Parent/child relationships of tasks
- Data dependencies between tasks
- Transfer activities of tasks between processors (for load balancing)
- Documentation of the internal state of the run-time system for each relevant event
Typically, the first step of analyzing a parallel program run is to check the overall load balancing.
The figures on the right show an overview display of the execution graphs of two highly irregular
parallel computations. The load balancing of the two computations exhibits considerable different quality.
(Idle times of processors are indicated by a red colored background.)
Using the navigation capabilities of our tool one can zoom into the affected regions and investigate
on possible causes of processor idling by examining the local structure of the graph and checking
the state of the run-time system on other processors, for example, sizes of task pools. To see the
effect of modifications, cross execution views can be employed for visual comparison.
For enhancing scalability of our visualization our tool generates lists of possible performance problems,
which can be automatically detected from the event trace. Double-clicking on a list item points the
graph view to the affected region of the execution graph and highlights the relevant graph elements.
E.g. the left figure shows a roaming effect where tasks are passed to several nodes by the load balancing
strategy before they are actually executed. The so called ping-pong effect is a special case
of roaming. Here, a thread is passed back and forth between two nodes before it finally gets
executed. Extensive roaming can be caused by inapt threshold values. Roaming can seriously
increase communication overhead of the computation.
Selected Publications:
- W. Blochinger, M. Kaufmann, and M. Siebenhaller.
Visualization aided performance tuning of irregular task-parallel
computations. Information Visualization, 5(2):81-94, 2006.
- W. Blochinger, M. Kaufmann, and M. Siebenhaller.
Visualizing structural properties of irregular parallel computations.
In Proc. of the 2005 ACM Symposium on Software Visualization
(SoftVis 2005), pages 125-134, Saint Louis, Missouri, USA, 2005.
- W. Blochinger, W. Küchlin, C. Ludwig, and A. Weber.
An object-oriented platform for distributed high-performance
Symbolic Computation. Mathematics and Computers in Simulation, 49:161-178, 1999.
|