Journal Articles |
| |
S. Schulz, W. Blochinger, M. Held, and C. Dangelmayr.
COHESION - A microkernel based desktop grid platform for irregular
task-parallel applications.
Future Generation Computer Systems - The International Journal
of Grid Computing: Theory, Methods and Applications, 24(5):354-370, 2008.
[ bib |
doi ]
We present Cohesion, a novel approach to Desktop Grid Computing. A major design goal
of Cohesion is to enable advanced parallel programming models and application specific frameworks.
We focus on methods for irregularly structured task-parallel problems, which require fully dynamic
problem decomposition. Cohesion overcomes limitations of classical Desktop Grid platforms by
employing Peer-To-Peer principles and a flexible system architecture based on a microkernel approach.
Arbitrary modules can be dynamically loaded to replace default functionality, resulting in a platform
that can easily adapt to application-specific requirements. We discuss two representative example
applications and report on the results of performance experiments that especially consider the high
volatility of resources prevailing in a Desktop Grid.
|
| |
B. Thomaszewski, S. Pabst, and W. Blochinger.
Parallel techniques for physically-based simulation on multi-core
processor architectures.
Computers & Graphics, 32(1):25-40, 2008.
[ bib |
doi ]
As multi-core processor systems become more and more widespread, the demand for efficient
parallel algorithms also propagates into the field of computer graphics.
This is especially true for physically-based simulation, which is notorious for expensive numerical methods.
In this work, we explore possibilities for accelerating physically-based simulation algorithms on multi-core
architectures. Two components of physically-based simulation represent a great potential for bottlenecks in
parallelisation:
implicit time integration and collision handling. From the parallelisation point of view these two component
s are substantially different. Implicit time integration can be treated efficiently using static problem dec
omposition. The linear system
arising in this context is solved using a data-parallel preconditioned conjugate gradient algorithm.
The collision handling stage, however, requires a different approach, due to its dynamic structure.
This stage is handled using multi-threaded programming with fully dynamic task decomposition.
In particular, we propose a new task splitting approach based on a reasonable estimation of work, which
analyses previous simulation steps. Altogether, the combination of different parallelisation techniques lead
s to a concise and yet versatile framework for highly efficient physical simulation.
|
| |
B. Thomaszewski and W. Blochinger.
Physically based simulation of cloth on distributed memory
architectures. Parallel Computing, 33(6):377-390, 2007.
[ bib |
doi ]
Physically based simulation of cloth in virtual environments is a
computationally demanding problem. It involves modeling the internal material
properties of the textile (physical modeling) and also treating interactions
with the surrounding scene (collision handling).
In this paper, we present an approach to parallel cloth simulation designed for
distributed memory parallel architectures, particularly clusters built of commodity
components. We discuss parallel techniques for the physical modeling phase as well
as for the collision handling phase which can significantly reduce the respective
computation times.
To deal with the very fine granularity of the physical modeling phase we apply a
static data decomposition approach based on graph partitioning. In order to cope
with the high irregularity of the collision handling phase we employ task parallel
techniques based on fully dynamic problem decomposition. We show how both techniques
can be integrated into a robust parallel cloth simulation method which can
deal with considerably complex scenes. |
| |
W. Blochinger, M. Kaufmann, and M. Siebenhaller.
Visualization aided performance tuning of irregular task-parallel
computations. Information Visualization, 5(2):81-94, 2006.
[ bib |
doi ]
This paper deals with a visualization based
approach to performance analyzing and tuning of highly irregular
task-parallel applications. At its core lies a novel automatic
layout algorithm for execution graphs which is based on Sugiyama's
framework. Our visualization enables the application designer to
reliably detect manifestations of parallel overhead and to
investigate on their individual root causes. We particularly focus
on structural properties of task-parallel computations which are
hard to detect in a more analytical way, e.g. false sharing and false parallelism. In addition, we discuss
embedding our visualization into an integrated development
environment, realizing a seamless work-flow for implementation,
execution, analysis, and tuning of parallel programs. |
| |
W. Blochinger, C. Sinz, and W. Küchlin.
Parallel propositional satisfiability checking with distributed
dynamic learning. Parallel Computing, 29(7):969-994, 2003.
[ bib |
doi ]
We address the parallelization and distributed execution of an
algorithm from the area of symbolic computation: propositional satisfiability (SAT)
checking with dynamic learning.
Our parallel programming models are strict multithreading for the
core SAT checking procedure, complemented by mobile agents realizing a distributed
dynamic learning process.
Individual threads treat dynamically created subproblems, while mobile agents collect
and distribute pertinent knowledge obtained during the learning process.
The parallel algorithm runs on top of our parallel system platform DOTS
(Distributed Object-Oriented Threads System), which provides support for our parallel
programming models in highly heterogeneous distributed systems.
We present performance measurements evaluating the performance gains by our
approach in different application domains with practical significance. |
| |
W. Blochinger, W. Küchlin, C. Ludwig, and A. Weber.
An object-oriented platform for distributed high-performance
Symbolic Computation. Mathematics and Computers in Simulation, 49(3):161-178, 1999.
[ bib | .pdf |
doi ]
We describe the Distributed Object-Oriented Threads System (DOTS), a programming environment designed to support
object-oriented fork-join parallel programming in a heterogeneous
distributed environment. A mixed network of
Windows NT PC's and UNIX workstations is transformed by DOTS into
a homogeneous pool of anonymous compute servers forming together a
multicomputer. DOTS is a complete redesign of the Distributed
Threads System (DTS) using the object-oriented paradigm both in
its internal implementation and in the programming paradigm it
supports. It has been used for the parallelization of applications
in the field of computer algebra and in the field of computer
graphics. We also give a brief account of applications in the
domain of symbolic computation that were developed using DTS. |
|
Conference Papers |
|
|
M. Held and W. Blochinger.
Collaborative BPEL design with a rich internet application.
In Proc. of the Eighth IEEE International Symposium on Cluster
Computing and the Grid (CCGrid 2008), Lyon, France, May 2008, accepted.
[ bib ]
Workflow design is often an effort of distributed
and inhomogeneous teams. We present a collaborative workflow
design tool, which is implemented as a Rich Internet Application.
The tool provides a desktop like user experience, but can
easily be embedded into web sites. It enables synchronous as
well as asynchronous cooperation of design team members and
encourages a well coordinated design process.
|
| |
B. Thomaszewski, S. Pabst, and W. Blochinger.
Exploiting parallelism in physically-based simulations on multi-core
processor architectures.
In Proc. of Eurographics/ACM Siggraph Symposium on Parallel Graphics and
Visualization 2007, pages 69-76, Lugano, Switzerland, May 2007.
[ bib ]
As multi-core processor systems become more and more widespread, the demand for designing efficient parallel algorithms propagates also into th
e field of computer graphics. This is especially true for the physically-based
simulation, which is notorious for expensive numerical methods. In this paper we explore possibilities for accelerating
these algorithms on modern multi-core architectures. As an application we focus on physically-based cloth
simulation. In this context, two distinct problems can be identified: the physical model and the collision handling
stage both bearing potential bottlenecks for the simulation. From the parallelization point of view these two
components are substantially different. The physical model can be treated efficiently using static problem decomposition.
The collision handling problem, however, requires a different approach, due to its dynamically changing
structure. We address this problem using multi-threaded programming with fully dynamic task decomposition.
Furthermore, we propose a new task splitting approach based on a robust work estimate. The associated data is
derived from temporal coherence. Altogether, the combination of different parallelization techniques leads to a
concise and yet versatile framework for highly efficient physical simulation. |
| |
S. Schulz and W. Blochinger.
An integrated approach for managing peer-to-peer desktop grid
systems.
In Proc. of the Seventh IEEE International Symposium on Cluster
Computing and the Grid (CCGrid 2007), pages 233-240, Rio de Janeiro,
Brazil, May 2007.
[ bib |
doi ]
In this paper we propose a comprehensive integrated management architecture for large-scale Desktop Grid
systems. By virtualization of managed entities and automation of repetitive management tasks, we deal wit
h the additional complexity induced by the size of these systems. We introduce the concepts of peer-to-pe
er and disconnected management to cope with network segmentation and node volatility, which are both intr
insic to Desktop Grid systems. By studying real-world use cases, we demonstrate the applicability of our
solution. |
| |
B. Thomaszewski and W. Blochinger.
Parallel simulation of cloth on distributed memory architectures.
In Proc. of Eurographics/ACM Siggraph Symposium on Parallel Graphics and
Visualization 2006, Braga, Portugal, May 2006.
[ bib ]
The physically based simulation of clothes in virtual environments
is a highly demanding problem. It involves both modeling the
internal material properties of the textile and the interaction
with the surrounding scene. We present a parallel cloth simulation
approach designed for distributed memory parallel architectures,
in particular clusters built of commodity components. In this
paper, we focus on the parallelization of the collision handling
phase. In order to cope with the high irregularity of this problem
we employ a task parallel approach with fully dynamic problem
decomposition. This leads to a robust algorithm, regardless of the
complexity of the scene. We report on initial performance
measurements indicating the usefulness of our approach. |
| |
W. Blochinger, C. Dangelmayr, and S. Schulz.
Aspect-oriented parallel discrete optimization on the cohesion
desktop grid platform.
In Proc. of the Sixth IEEE International Symposium on Cluster
Computing and the Grid (CCGrid06), pages 49-56, Singapore, May 2006.
[ bib |
doi ]
Cohesion is an advanced Desktop Grid platform
which lays the system-level foundations for parallel programming
models and application specific frameworks. It is designed as an
extensible layered architecture built around an industrial strength
microkernel component. Cohesion provides and integrates advanced
functionality required for parallel computing, especially
a scalable group model, failure detection and various methods
for collective communication.
On top of Cohesion we implemented an aspect-oriented framework
for parallel discrete optimization. Our work particulary
addresses challenges arising from the highly dynamic nature
which is inherent in this type of advanced parallel application,
like dynamic problem decomposition, load balancing, termination
detection and fault tolerance.
We report on initial performance measurements indicating the
usefulness of our approach. |
| |
W. Blochinger and S. Sick.
A user-level connection cache for TCP based applications.
In Proc. of the IASTED Intl. Conference Parallel and
Distributed Computing and Systems (PDCS 2005), Phoenix, AZ, USA, Nov.
2005. ACTA Press.
[ bib ]
We present a user-level connection caching component that
supports a broad range of TCP-based distributed applications.
Our work focuses on fully distributed connection
caching in order to be able to specifically support applications
which exhibit more complex communication patterns
than (centralized) client/server communication. We present
performance measurements indicating the usefulness of our
approach. |
| |
W. Blochinger.
Towards robustness in parallel SAT solving.
In Parallel Computing: Current & Future Issues of High-End
Computing (Proc. of the International Conference ParCo 2005), Malaga,
Spain, Sept. 2006.
[ bib ]
This paper deals with a novel method for parallel Boolean satisfiability (SAT) solving. Our approach
focuses on improving robustness, which plays a key role in enabling practical usability of
parallel SAT. Specifically, we adaptively induce competition parallelism into parallel computations
based on exploratory decomposition in order to control work-anomalies. We present initial performance
measurements indicating the usefulness of our approach. |
| |
W. Blochinger, M. Kaufmann, and M. Siebenhaller.
Visualizing structural properties of irregular parallel computations.
In Proc. of the 2005 ACM Symposium on Software Visualization
(SoftVis 2005), pages 125-134, Saint Louis, Missouri, USA, 2005.
[ bib |
doi ]
An important task in parallel programming is the appropriate
distribution of work on the processors. This distribution is
usually dynamically changing and hard to predict, further it is
very sensitive to the change of parameters. Even with advanced
analysis tools this problem is hard to be solved. We propose to
visualize the program structure as it changes over the execution
time. We therefore present a new automatic layout algorithm based
on Sugiyama's framework, which enables the user to detect
structural patterns which might be fatal for the performance of
the program - patterns which might be impossible to detect in a
more analytical way. Furthermore it assists the user to find
appropriate timing parameters for load balancing. We integrate our
visualization into an integrated development environment that
supports the implementation, execution, and analysis of parallel
programs. |
| |
W. Blochinger, W. Westje, W. Küchlin, and S. Wedeniwski.
ZetaSAT - Boolean satisfiability solving on desktop grids.
In Proc. of the IEEE International Symposium on Cluster
Computing and the Grid (CCGrid 2005), Cardiff, UK, 2005.
[ bib |
doi ]
ZetaSAT is a research effort to enable efficient parallel Boolean
satisfiability (SAT) solving on the Desktop Grid. ZetaSAT is based
on the Desktop Grid platform ZetaGrid. Our work particularly
addresses specific issues arising when executing constraint
satisfaction problems of the kind of SAT in Desktop Grids, like
dynamic problem decomposition, load balancing, termination
detection, and domain specific fault tolerance. We report on
performance measurements indicating the usefulness of our approach. |
| |
M. Keckeisen and W. Blochinger.
Parallel implicit integration for cloth animations on distributed
memory architectures.
In Proc. of Eurographics/ACM Siggraph Symposium on Parallel Graphics and
Visualization 2004, Grenoble, France, June 2004.
[ bib | .pdf ]
We present a parallel cloth simulation engine designed for distributed memory parallel architectures, in particular
clusters built of commodity components. We focus on efficient parallel processing of irregularly structured and
real-world sized problems typically occurring in the simulation of garments. We report on performance measurements
showing a high degree of parallel efficiency and scalability indicating the usefulness of our approach. |
| |
W. Blochinger and W. Küchlin.
Cross-platform development of high performance applications using
generic programming.
In T. Gonzales, editor, Proc. of the IASTED Intl. Conference
Parallel and Distributed Computing and Systems (PDCS 2003), volume 2,
pages 654-659, Marina del Rey, CA, USA, November 2003. ACTA Press.
[ bib | .pdf ]
We investigate methods for creating highly portable parallel and distributed
applications using the programming language C++. The paper
elaborates on software engineering issues for efficiently mapping
abstract class interfaces to system functionality - typically
designed by applying the wrapper facade design pattern - on a
large number of target platforms. We introduce a novel mapping
technique which is based on generic programming. The paper
presents a detailed comparison of the proposed method with two
other commonly applied techniques, concluding that our approach
represents the most suitable technique for creating highly
portable C++ programs in terms of efficiency, ease of
portability, and enforcement of correctness. |
| |
W. Blochinger and W. Küchlin.
The design of an API for strict multithreading in C++.
In H. Kosch, L. Böszörményi, and H. Hellwagner, editors, Proc. of 9th Intl. Conf. Euro-Par 2003, number 2790 in LNCS, pages
722-731, Klagenfurt, Austria, August 2003. Springer-Verlag.
[ bib | .pdf ]
This paper deals with the design of an API for building distributed parallel
applications in C++ which embody strict multithreaded computations.
The API is enhanced with mechanisms to deal with highly irregular
non-deterministic computations often occurring in the field of
parallel symbolic computation.
The API is part of the Distributed Object-Oriented Threads System DOTS.
The DOTS environment provides support for realizing strict multithreaded
computations on highly heterogeneous networks of workstations. |
| |
W. Blochinger, C. Sinz, and W. Küchlin.
A universal parallel SAT checking kernel.
In H. R. Arabnia and Y. Mun, editors, Proc. of the Intl. Conf. on Parallel and Distributed Processing Techniques and Applications
PDPTA 03, volume 4, pages 1720-1725, Las Vegas, NV, U.S.A., June 2003.
CSREA Press.
[ bib | .pdf ]
We present a novel approach to parallel Boolean
satisfiability (SAT) checking. A distinctive feature of our
parallel SAT checker is that it incorporates all
essential heuristics employed by the state-of-the-art sequential
SAT checking algorithm. This property makes our parallel
SAT checker applicable in a wide range of different application
domains. For its distributed execution a combination of
the strict multithreading and the mobile agent programming model
is employed. We give results of run-time measurements for problem
instances taken from different application domains, indicating the
usefulness of the presented method. |
| |
W. Blochinger, C. Sinz, and W. Küchlin.
Parallel consistency checking of automotive product data.
In G. R. Joubert, A. Murli, F. J. Peters, and M. Vanneschi, editors, Proc. of the Intl. Conf. ParCo 2001: Parallel Computing -
Advances and Current Issues, pages 50-57, Naples, Italy, 2002. Imperial
College Press.
[ bib | .pdf ]
This paper deals with a parallel approach
to the verification of consistency aspects of an industrial
product configuration data base. The data base we analyze is used
by DaimlerChrysler to check the orders for cars and commercial
vehicles of their Mercedes lines. By formalizing the ordering
process and employing techniques from symbolic computation we
could establish a set of tools that allow the automatic execution
of huge series of consistency checks, thereby ultimately enhancing
the quality of the product data. However, occasional occurrences
of computation intensive checks are a limiting factor for the
usability of the tools. Therefore, a prototypical parallel
re-implementation using our Distributed Object-Oriented Threads
System (DOTS) was carried out. Performance measurements on a
heterogeneous cluster of shared-memory multiprocessor Unix
workstations and standard Windows PCs revealed considerable
speed-ups and substantially reduced the average waiting time for
individual checks. We thus arrive at a noticeable improvement in
usability of the consistency checking tools. |
| |
W. Blochinger, C. Sinz, and W. Küchlin.
Distributed parallel SAT checking with dynamic learning using
DOTS.
In T. Gonzales, editor, Proc. of the IASTED Intl. Conference
Parallel and Distributed Computing and Systems (PDCS 2001), pages
396-401, Anaheim, CA, August 2001. ACTA Press.
[ bib | .pdf ]
We present a novel method for distributed
parallel automatic theorem proving. Our approach uses a
dynamically learning parallel SAT checker incorporating
distributed multi-threading and mobile agents. Individual
threads process dynamically created subproblems, while agents
collect and distribute new knowledge created by the learning
process. As parallelization platform we use the Distributed
Object-Oriented Threads System (DOTS) that provides support for
both distributed threads and mobile agents. We present experiments
indicating the usefulness of the presented approach for different
application domains. |
| |
C. Sinz, W. Blochinger, and W. Küchlin.
PaSAT - parallel SAT-checking with lemma exchange: Implementation
and applications.
In H. Kautz and B. Selman, editors, LICS 2001 Workshop on
Theory and Applications of Satisfiability Testing (SAT 2001), volume 9 of Electronic Notes in Discrete Mathematics, Boston, MA, June 2001.
Elsevier Science Publishers.
[ bib | .pdf ]
We present PaSAT, a parallel implementation of a Davis-Putnam-style
propositional satisfiability checker incorporating dynamic search
space partitioning, intelligent backjumping, as well as lemma
generation and exchange; the main focus of our implementation is
on speeding up propositional encodings of real-world combinatorial
problems. We investigate and analyze the speed-ups obtained by
parallelization in conjunction with lemma exchange and describe
the effects we observed during our experiments. Finally, we
present performance measurements from the application of our
prover in the areas of formal consistency checking of product
documentation, cryptanalysis and hardware verification. |
| |
W. Blochinger.
Distributed high performance computing in heterogeneous environments
with DOTS.
In Proc. of Intl. Parallel and Distributed Processing Symp.
(IPDPS 2001), page 90, San Francisco, CA, U.S.A., April 2001. IEEE
Computer Society Press.
[ bib | .pdf ]
This paper deals with high performance computing in heterogeneous
clusters using the Distributed Object-Oriented Threads System
(DOTS). DOTS is a parallelization platform that enables the
programmer to build distributed parallel applications using a high
level and easy-to-use parallel programming paradigm. It integrates
a wide range of different systems (from realtime kernels to
mainframes) into a single system environment for high performance
computing. A special design focus of DOTS is to optimize usability
aspects. Additionally, the paper discusses three advanced
features of DOTS that especially support its deployment in
heterogeneous environments for high performance computing. For
defining the parallel environment and for documenting and
evaluating the results of program runs an integrated XML centric
approach is presented. A light-weight checkpointing mechanism
which is seamlessly integrated into DOTS and especially designed
for heterogeneous environments is described. DOTS is based on a
highly customizable communication kernel. Run time measurements
are presented that prove the efficiency of the communication
kernel in heterogeneous clusters. Also, performance measurements
of a non-trivial parallel application from the field of Symbolic
Computation are presented, which revealed good speedups on a
heterogeneous cluster of workstations. |
| |
W. Blochinger, R. Bündgen, and A. Heinemann.
Dependable high performance computing on a Parallel Sysplex
cluster.
In H. R. Arabnia, editor, Proc. of the Intl. Conf. on
Parallel and Distributed Processing Techniques and Applications (PDPTA
2000), volume 3, pages 1627-1633, Las Vegas, NV, U.S.A., June 2000. CSREA
Press.
[ bib | .pdf ]
In this paper we address the issue of dependable distributed
high performance computing in the field of Symbolic Computation.
We describe the extension of a middleware infrastructure designed
for high performance computing with efficient checkpointing
mechanisms. As target platform an IBM Parallel Sysplex Cluster is
used. We consider the satisfiability checking problem for boolean
formulae as an example application from the realm of Symbolic
Computation. Time measurements for an implementation of this
application on top of the described system environment are
given. |
| |
R.-D. Schimkat, W. Blochinger, C. Sinz, M. Friedrich, and W. Küchlin.
A service-based agent framework for distributed Symbolic
Computation.
In M. Bubak, R. Williams, H. Afsarmanesh, and B. Hertzberger,
editors, Proc. 8th Intl. Conf. on High Performance Computing and
Networking Europe, HPCN 2000, number 1823 in LNCS, pages 644-656,
Amsterdam, Netherlands, May 2000. Springer-Verlag.
[ bib | .pdf ]
We present Okeanos, a distributed service-based
agent framework implemented in Java, in which agents can act
autonomously and make use of stationary services. Each agent's
behaviour can be controlled individually by a rule-based knowledge
component, and cooperation between agents is supported through the
exchange of messages at common meeting points (agent
lounges). We suggest this general scheme as a new parallelization
paradigm for Symbolic Computation, and demonstrate its
applicability by an agent-based parallel implementation of a
satisfiability (SAT) checker. |
| |
M. Meißner, T. Hüttner, W. Blochinger, and A. Weber.
Parallel direct volume rendering on PC networks.
In H. R. Arabnia, editor, Proc. of the Intl. Conf. on
Parallel and Distributed Processing Techniques and Applications
(PDPTA '98), Las Vegas, NV, U.S.A., July 1998. CSREA Press.
[ bib | .pdf ]
Volume rendering is becoming a key technology in the field of scientific
computing, medical imaging, and many other fields with
non-invasive scanner technology. The process of generating images
from 3D volumetric data is highly demanding and pushes systems to
their computational limits. Many approaches have been presented in
the past using networks of workstations to distribute the
workload. In contrast, we present parallel direct volume rendering
on a network of PCs. To enable communication between processes
running on different machines we use an object oriented thread
library, called DOTS. This library enables the user to rapidly
parallelize its code without getting involved in network issues. |
| |
W. Blochinger, W. Küchlin, and A. Weber.
The distributed object-oriented threads system DOTS.
In A. Ferreira, J. Rolim, H. Simon, and S.-H. Teng, editors, Fifth Intl. Symp. on Solving Irregularly Structured Problems in Parallel
(IRREGULAR '98), number 1457 in LNCS, pages 206-217, Berkeley, CA, U.S.A.,
August 1998. Springer-Verlag.
[ bib | .pdf ]
We describe the design and implementation of the Distributed
Object-Oriented Threads System (DOTS). This system is a
complete redesign of the Distributed Threads System (DTS)
using the object-oriented paradigm both in its internal
implementation and in the programming paradigm it supports. DOTS
extends the support for fork-join parallel programming
from shared memory threads to a distributed environment. It is
currently implemented on top of the Adaptive Communication
Environment (ACE). A heterogeneous network of Windows NT PC's and
of UNIX workstations is transformed by DOTS into a homogeneous
pool of anonymous compute servers. DOTS has been used recently in
applications from computer graphics and computational number
theory. We also discuss the performance characteristics of DOTS
for a workstation cluster running under Solaris and a PC network
using Windows NT, as they were obtained from a prototypical
example. |
|
|
Misc |
|
|