W O L F G A N G   B L O C H I N G E R  :: 

Journal Articles

S. Schulz, W. Blochinger, M. Held, and C. Dangelmayr. COHESION - A microkernel based desktop grid platform for irregular task-parallel applications. Future Generation Computer Systems - The International Journal of Grid Computing: Theory, Methods and Applications, 24(5):354-370, 2008.
[ bib | doi ]

We present Cohesion, a novel approach to Desktop Grid Computing. A major design goal of Cohesion is to enable advanced parallel programming models and application specific frameworks. We focus on methods for irregularly structured task-parallel problems, which require fully dynamic problem decomposition. Cohesion overcomes limitations of classical Desktop Grid platforms by employing Peer-To-Peer principles and a flexible system architecture based on a microkernel approach. Arbitrary modules can be dynamically loaded to replace default functionality, resulting in a platform that can easily adapt to application-specific requirements. We discuss two representative example applications and report on the results of performance experiments that especially consider the high volatility of resources prevailing in a Desktop Grid.
B. Thomaszewski, S. Pabst, and W. Blochinger. Parallel techniques for physically-based simulation on multi-core processor architectures. Computers & Graphics, 32(1):25-40, 2008.
[ bib | doi ]

As multi-core processor systems become more and more widespread, the demand for efficient parallel algorithms also propagates into the field of computer graphics. This is especially true for physically-based simulation, which is notorious for expensive numerical methods. In this work, we explore possibilities for accelerating physically-based simulation algorithms on multi-core architectures. Two components of physically-based simulation represent a great potential for bottlenecks in parallelisation: implicit time integration and collision handling. From the parallelisation point of view these two component s are substantially different. Implicit time integration can be treated efficiently using static problem dec omposition. The linear system arising in this context is solved using a data-parallel preconditioned conjugate gradient algorithm. The collision handling stage, however, requires a different approach, due to its dynamic structure. This stage is handled using multi-threaded programming with fully dynamic task decomposition. In particular, we propose a new task splitting approach based on a reasonable estimation of work, which analyses previous simulation steps. Altogether, the combination of different parallelisation techniques lead s to a concise and yet versatile framework for highly efficient physical simulation.
B. Thomaszewski and W. Blochinger. Physically based simulation of cloth on distributed memory architectures. Parallel Computing, 33(6):377-390, 2007.
[ bib | doi ]

Physically based simulation of cloth in virtual environments is a computationally demanding problem. It involves modeling the internal material properties of the textile (physical modeling) and also treating interactions with the surrounding scene (collision handling). In this paper, we present an approach to parallel cloth simulation designed for distributed memory parallel architectures, particularly clusters built of commodity components. We discuss parallel techniques for the physical modeling phase as well as for the collision handling phase which can significantly reduce the respective computation times. To deal with the very fine granularity of the physical modeling phase we apply a static data decomposition approach based on graph partitioning. In order to cope with the high irregularity of the collision handling phase we employ task parallel techniques based on fully dynamic problem decomposition. We show how both techniques can be integrated into a robust parallel cloth simulation method which can deal with considerably complex scenes.
W. Blochinger, M. Kaufmann, and M. Siebenhaller. Visualization aided performance tuning of irregular task-parallel computations. Information Visualization, 5(2):81-94, 2006.
[ bib | doi ]

This paper deals with a visualization based approach to performance analyzing and tuning of highly irregular task-parallel applications. At its core lies a novel automatic layout algorithm for execution graphs which is based on Sugiyama's framework. Our visualization enables the application designer to reliably detect manifestations of parallel overhead and to investigate on their individual root causes. We particularly focus on structural properties of task-parallel computations which are hard to detect in a more analytical way, e.g. false sharing and false parallelism. In addition, we discuss embedding our visualization into an integrated development environment, realizing a seamless work-flow for implementation, execution, analysis, and tuning of parallel programs.
W. Blochinger, C. Sinz, and W. Küchlin. Parallel propositional satisfiability checking with distributed dynamic learning. Parallel Computing, 29(7):969-994, 2003.
[ bib | doi ]

We address the parallelization and distributed execution of an algorithm from the area of symbolic computation: propositional satisfiability (SAT) checking with dynamic learning. Our parallel programming models are strict multithreading for the core SAT checking procedure, complemented by mobile agents realizing a distributed dynamic learning process. Individual threads treat dynamically created subproblems, while mobile agents collect and distribute pertinent knowledge obtained during the learning process. The parallel algorithm runs on top of our parallel system platform DOTS (Distributed Object-Oriented Threads System), which provides support for our parallel programming models in highly heterogeneous distributed systems. We present performance measurements evaluating the performance gains by our approach in different application domains with practical significance.
W. Blochinger, W. Küchlin, C. Ludwig, and A. Weber. An object-oriented platform for distributed high-performance Symbolic Computation. Mathematics and Computers in Simulation, 49(3):161-178, 1999.
[ bib | .pdf | doi ]

We describe the Distributed Object-Oriented Threads System (DOTS), a programming environment designed to support object-oriented fork-join parallel programming in a heterogeneous distributed environment. A mixed network of Windows NT PC's and UNIX workstations is transformed by DOTS into a homogeneous pool of anonymous compute servers forming together a multicomputer. DOTS is a complete redesign of the Distributed Threads System (DTS) using the object-oriented paradigm both in its internal implementation and in the programming paradigm it supports. It has been used for the parallelization of applications in the field of computer algebra and in the field of computer graphics. We also give a brief account of applications in the domain of symbolic computation that were developed using DTS.

Conference Papers

M. Held and W. Blochinger. Collaborative BPEL design with a rich internet application. In Proc. of the Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), Lyon, France, May 2008, accepted.
[ bib ]

Workflow design is often an effort of distributed and inhomogeneous teams. We present a collaborative workflow design tool, which is implemented as a Rich Internet Application. The tool provides a desktop like user experience, but can easily be embedded into web sites. It enables synchronous as well as asynchronous cooperation of design team members and encourages a well coordinated design process.
B. Thomaszewski, S. Pabst, and W. Blochinger. Exploiting parallelism in physically-based simulations on multi-core processor architectures. In Proc. of Eurographics/ACM Siggraph Symposium on Parallel Graphics and Visualization 2007, pages 69-76, Lugano, Switzerland, May 2007.
[ bib ]

As multi-core processor systems become more and more widespread, the demand for designing efficient parallel algorithms propagates also into th e field of computer graphics. This is especially true for the physically-based simulation, which is notorious for expensive numerical methods. In this paper we explore possibilities for accelerating these algorithms on modern multi-core architectures. As an application we focus on physically-based cloth simulation. In this context, two distinct problems can be identified: the physical model and the collision handling stage both bearing potential bottlenecks for the simulation. From the parallelization point of view these two components are substantially different. The physical model can be treated efficiently using static problem decomposition. The collision handling problem, however, requires a different approach, due to its dynamically changing structure. We address this problem using multi-threaded programming with fully dynamic task decomposition. Furthermore, we propose a new task splitting approach based on a robust work estimate. The associated data is derived from temporal coherence. Altogether, the combination of different parallelization techniques leads to a concise and yet versatile framework for highly efficient physical simulation.
S. Schulz and W. Blochinger. An integrated approach for managing peer-to-peer desktop grid systems. In Proc. of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), pages 233-240, Rio de Janeiro, Brazil, May 2007.
[ bib | doi ]

In this paper we propose a comprehensive integrated management architecture for large-scale Desktop Grid systems. By virtualization of managed entities and automation of repetitive management tasks, we deal wit h the additional complexity induced by the size of these systems. We introduce the concepts of peer-to-pe er and disconnected management to cope with network segmentation and node volatility, which are both intr insic to Desktop Grid systems. By studying real-world use cases, we demonstrate the applicability of our solution.
B. Thomaszewski and W. Blochinger. Parallel simulation of cloth on distributed memory architectures. In Proc. of Eurographics/ACM Siggraph Symposium on Parallel Graphics and Visualization 2006, Braga, Portugal, May 2006.
[ bib ]

The physically based simulation of clothes in virtual environments is a highly demanding problem. It involves both modeling the internal material properties of the textile and the interaction with the surrounding scene. We present a parallel cloth simulation approach designed for distributed memory parallel architectures, in particular clusters built of commodity components. In this paper, we focus on the parallelization of the collision handling phase. In order to cope with the high irregularity of this problem we employ a task parallel approach with fully dynamic problem decomposition. This leads to a robust algorithm, regardless of the complexity of the scene. We report on initial performance measurements indicating the usefulness of our approach.
W. Blochinger, C. Dangelmayr, and S. Schulz. Aspect-oriented parallel discrete optimization on the cohesion desktop grid platform. In Proc. of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid06), pages 49-56, Singapore, May 2006.
[ bib | doi ]

Cohesion is an advanced Desktop Grid platform which lays the system-level foundations for parallel programming models and application specific frameworks. It is designed as an extensible layered architecture built around an industrial strength microkernel component. Cohesion provides and integrates advanced functionality required for parallel computing, especially a scalable group model, failure detection and various methods for collective communication. On top of Cohesion we implemented an aspect-oriented framework for parallel discrete optimization. Our work particulary addresses challenges arising from the highly dynamic nature which is inherent in this type of advanced parallel application, like dynamic problem decomposition, load balancing, termination detection and fault tolerance. We report on initial performance measurements indicating the usefulness of our approach.
W. Blochinger and S. Sick. A user-level connection cache for TCP based applications. In Proc. of the IASTED Intl. Conference Parallel and Distributed Computing and Systems (PDCS 2005), Phoenix, AZ, USA, Nov. 2005. ACTA Press.
[ bib ]

We present a user-level connection caching component that supports a broad range of TCP-based distributed applications. Our work focuses on fully distributed connection caching in order to be able to specifically support applications which exhibit more complex communication patterns than (centralized) client/server communication. We present performance measurements indicating the usefulness of our approach.
W. Blochinger. Towards robustness in parallel SAT solving. In Parallel Computing: Current & Future Issues of High-End Computing (Proc. of the International Conference ParCo 2005), Malaga, Spain, Sept. 2006.
[ bib ]

This paper deals with a novel method for parallel Boolean satisfiability (SAT) solving. Our approach focuses on improving robustness, which plays a key role in enabling practical usability of parallel SAT. Specifically, we adaptively induce competition parallelism into parallel computations based on exploratory decomposition in order to control work-anomalies. We present initial performance measurements indicating the usefulness of our approach.
W. Blochinger, M. Kaufmann, and M. Siebenhaller. Visualizing structural properties of irregular parallel computations. In Proc. of the 2005 ACM Symposium on Software Visualization (SoftVis 2005), pages 125-134, Saint Louis, Missouri, USA, 2005.
[ bib | doi ]

An important task in parallel programming is the appropriate distribution of work on the processors. This distribution is usually dynamically changing and hard to predict, further it is very sensitive to the change of parameters. Even with advanced analysis tools this problem is hard to be solved. We propose to visualize the program structure as it changes over the execution time. We therefore present a new automatic layout algorithm based on Sugiyama's framework, which enables the user to detect structural patterns which might be fatal for the performance of the program - patterns which might be impossible to detect in a more analytical way. Furthermore it assists the user to find appropriate timing parameters for load balancing. We integrate our visualization into an integrated development environment that supports the implementation, execution, and analysis of parallel programs.
W. Blochinger, W. Westje, W. Küchlin, and S. Wedeniwski. ZetaSAT - Boolean satisfiability solving on desktop grids. In Proc. of the IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2005), Cardiff, UK, 2005.
[ bib | doi ]

ZetaSAT is a research effort to enable efficient parallel Boolean satisfiability (SAT) solving on the Desktop Grid. ZetaSAT is based on the Desktop Grid platform ZetaGrid. Our work particularly addresses specific issues arising when executing constraint satisfaction problems of the kind of SAT in Desktop Grids, like dynamic problem decomposition, load balancing, termination detection, and domain specific fault tolerance. We report on performance measurements indicating the usefulness of our approach.
M. Keckeisen and W. Blochinger. Parallel implicit integration for cloth animations on distributed memory architectures. In Proc. of Eurographics/ACM Siggraph Symposium on Parallel Graphics and Visualization 2004, Grenoble, France, June 2004.
[ bib | .pdf ]

We present a parallel cloth simulation engine designed for distributed memory parallel architectures, in particular clusters built of commodity components. We focus on efficient parallel processing of irregularly structured and real-world sized problems typically occurring in the simulation of garments. We report on performance measurements showing a high degree of parallel efficiency and scalability indicating the usefulness of our approach.
W. Blochinger and W. Küchlin. Cross-platform development of high performance applications using generic programming. In T. Gonzales, editor, Proc. of the IASTED Intl. Conference Parallel and Distributed Computing and Systems (PDCS 2003), volume 2, pages 654-659, Marina del Rey, CA, USA, November 2003. ACTA Press.
[ bib | .pdf ]

We investigate methods for creating highly portable parallel and distributed applications using the programming language C++. The paper elaborates on software engineering issues for efficiently mapping abstract class interfaces to system functionality - typically designed by applying the wrapper facade design pattern - on a large number of target platforms. We introduce a novel mapping technique which is based on generic programming. The paper presents a detailed comparison of the proposed method with two other commonly applied techniques, concluding that our approach represents the most suitable technique for creating highly portable C++ programs in terms of efficiency, ease of portability, and enforcement of correctness.
W. Blochinger and W. Küchlin. The design of an API for strict multithreading in C++. In H. Kosch, L. Böszörményi, and H. Hellwagner, editors, Proc. of 9th Intl. Conf. Euro-Par 2003, number 2790 in LNCS, pages 722-731, Klagenfurt, Austria, August 2003. Springer-Verlag.
[ bib | .pdf ]

This paper deals with the design of an API for building distributed parallel applications in C++ which embody strict multithreaded computations. The API is enhanced with mechanisms to deal with highly irregular non-deterministic computations often occurring in the field of parallel symbolic computation. The API is part of the Distributed Object-Oriented Threads System DOTS. The DOTS environment provides support for realizing strict multithreaded computations on highly heterogeneous networks of workstations.
W. Blochinger, C. Sinz, and W. Küchlin. A universal parallel SAT checking kernel. In H. R. Arabnia and Y. Mun, editors, Proc. of the Intl. Conf. on Parallel and Distributed Processing Techniques and Applications PDPTA 03, volume 4, pages 1720-1725, Las Vegas, NV, U.S.A., June 2003. CSREA Press.
[ bib | .pdf ]

We present a novel approach to parallel Boolean satisfiability (SAT) checking. A distinctive feature of our parallel SAT checker is that it incorporates all essential heuristics employed by the state-of-the-art sequential SAT checking algorithm. This property makes our parallel SAT checker applicable in a wide range of different application domains. For its distributed execution a combination of the strict multithreading and the mobile agent programming model is employed. We give results of run-time measurements for problem instances taken from different application domains, indicating the usefulness of the presented method.
W. Blochinger, C. Sinz, and W. Küchlin. Parallel consistency checking of automotive product data. In G. R. Joubert, A. Murli, F. J. Peters, and M. Vanneschi, editors, Proc. of the Intl. Conf. ParCo 2001: Parallel Computing - Advances and Current Issues, pages 50-57, Naples, Italy, 2002. Imperial College Press.
[ bib | .pdf ]

This paper deals with a parallel approach to the verification of consistency aspects of an industrial product configuration data base. The data base we analyze is used by DaimlerChrysler to check the orders for cars and commercial vehicles of their Mercedes lines. By formalizing the ordering process and employing techniques from symbolic computation we could establish a set of tools that allow the automatic execution of huge series of consistency checks, thereby ultimately enhancing the quality of the product data. However, occasional occurrences of computation intensive checks are a limiting factor for the usability of the tools. Therefore, a prototypical parallel re-implementation using our Distributed Object-Oriented Threads System (DOTS) was carried out. Performance measurements on a heterogeneous cluster of shared-memory multiprocessor Unix workstations and standard Windows PCs revealed considerable speed-ups and substantially reduced the average waiting time for individual checks. We thus arrive at a noticeable improvement in usability of the consistency checking tools.
W. Blochinger, C. Sinz, and W. Küchlin. Distributed parallel SAT checking with dynamic learning using DOTS. In T. Gonzales, editor, Proc. of the IASTED Intl. Conference Parallel and Distributed Computing and Systems (PDCS 2001), pages 396-401, Anaheim, CA, August 2001. ACTA Press.
[ bib | .pdf ]

We present a novel method for distributed parallel automatic theorem proving. Our approach uses a dynamically learning parallel SAT checker incorporating distributed multi-threading and mobile agents. Individual threads process dynamically created subproblems, while agents collect and distribute new knowledge created by the learning process. As parallelization platform we use the Distributed Object-Oriented Threads System (DOTS) that provides support for both distributed threads and mobile agents. We present experiments indicating the usefulness of the presented approach for different application domains.
C. Sinz, W. Blochinger, and W. Küchlin. PaSAT - parallel SAT-checking with lemma exchange: Implementation and applications. In H. Kautz and B. Selman, editors, LICS 2001 Workshop on Theory and Applications of Satisfiability Testing (SAT 2001), volume 9 of Electronic Notes in Discrete Mathematics, Boston, MA, June 2001. Elsevier Science Publishers.
[ bib | .pdf ]

We present PaSAT, a parallel implementation of a Davis-Putnam-style propositional satisfiability checker incorporating dynamic search space partitioning, intelligent backjumping, as well as lemma generation and exchange; the main focus of our implementation is on speeding up propositional encodings of real-world combinatorial problems. We investigate and analyze the speed-ups obtained by parallelization in conjunction with lemma exchange and describe the effects we observed during our experiments. Finally, we present performance measurements from the application of our prover in the areas of formal consistency checking of product documentation, cryptanalysis and hardware verification.
W. Blochinger. Distributed high performance computing in heterogeneous environments with DOTS. In Proc. of Intl. Parallel and Distributed Processing Symp. (IPDPS 2001), page 90, San Francisco, CA, U.S.A., April 2001. IEEE Computer Society Press.
[ bib | .pdf ]

This paper deals with high performance computing in heterogeneous clusters using the Distributed Object-Oriented Threads System (DOTS). DOTS is a parallelization platform that enables the programmer to build distributed parallel applications using a high level and easy-to-use parallel programming paradigm. It integrates a wide range of different systems (from realtime kernels to mainframes) into a single system environment for high performance computing. A special design focus of DOTS is to optimize usability aspects. Additionally, the paper discusses three advanced features of DOTS that especially support its deployment in heterogeneous environments for high performance computing. For defining the parallel environment and for documenting and evaluating the results of program runs an integrated XML centric approach is presented. A light-weight checkpointing mechanism which is seamlessly integrated into DOTS and especially designed for heterogeneous environments is described. DOTS is based on a highly customizable communication kernel. Run time measurements are presented that prove the efficiency of the communication kernel in heterogeneous clusters. Also, performance measurements of a non-trivial parallel application from the field of Symbolic Computation are presented, which revealed good speedups on a heterogeneous cluster of workstations.
W. Blochinger, R. Bündgen, and A. Heinemann. Dependable high performance computing on a Parallel Sysplex cluster. In H. R. Arabnia, editor, Proc. of the Intl. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA 2000), volume 3, pages 1627-1633, Las Vegas, NV, U.S.A., June 2000. CSREA Press.
[ bib | .pdf ]

In this paper we address the issue of dependable distributed high performance computing in the field of Symbolic Computation. We describe the extension of a middleware infrastructure designed for high performance computing with efficient checkpointing mechanisms. As target platform an IBM Parallel Sysplex Cluster is used. We consider the satisfiability checking problem for boolean formulae as an example application from the realm of Symbolic Computation. Time measurements for an implementation of this application on top of the described system environment are given.
R.-D. Schimkat, W. Blochinger, C. Sinz, M. Friedrich, and W. Küchlin. A service-based agent framework for distributed Symbolic Computation. In M. Bubak, R. Williams, H. Afsarmanesh, and B. Hertzberger, editors, Proc. 8th Intl. Conf. on High Performance Computing and Networking Europe, HPCN 2000, number 1823 in LNCS, pages 644-656, Amsterdam, Netherlands, May 2000. Springer-Verlag.
[ bib | .pdf ]

We present Okeanos, a distributed service-based agent framework implemented in Java, in which agents can act autonomously and make use of stationary services. Each agent's behaviour can be controlled individually by a rule-based knowledge component, and cooperation between agents is supported through the exchange of messages at common meeting points (agent lounges). We suggest this general scheme as a new parallelization paradigm for Symbolic Computation, and demonstrate its applicability by an agent-based parallel implementation of a satisfiability (SAT) checker.
M. Meißner, T. Hüttner, W. Blochinger, and A. Weber. Parallel direct volume rendering on PC networks. In H. R. Arabnia, editor, Proc. of the Intl. Conf. on Parallel and Distributed Processing Techniques and Applications (PDPTA '98), Las Vegas, NV, U.S.A., July 1998. CSREA Press.
[ bib | .pdf ]

Volume rendering is becoming a key technology in the field of scientific computing, medical imaging, and many other fields with non-invasive scanner technology. The process of generating images from 3D volumetric data is highly demanding and pushes systems to their computational limits. Many approaches have been presented in the past using networks of workstations to distribute the workload. In contrast, we present parallel direct volume rendering on a network of PCs. To enable communication between processes running on different machines we use an object oriented thread library, called DOTS. This library enables the user to rapidly parallelize its code without getting involved in network issues.
W. Blochinger, W. Küchlin, and A. Weber. The distributed object-oriented threads system DOTS. In A. Ferreira, J. Rolim, H. Simon, and S.-H. Teng, editors, Fifth Intl. Symp. on Solving Irregularly Structured Problems in Parallel (IRREGULAR '98), number 1457 in LNCS, pages 206-217, Berkeley, CA, U.S.A., August 1998. Springer-Verlag.
[ bib | .pdf ]

We describe the design and implementation of the Distributed Object-Oriented Threads System (DOTS). This system is a complete redesign of the Distributed Threads System (DTS) using the object-oriented paradigm both in its internal implementation and in the programming paradigm it supports. DOTS extends the support for fork-join parallel programming from shared memory threads to a distributed environment. It is currently implemented on top of the Adaptive Communication Environment (ACE). A heterogeneous network of Windows NT PC's and of UNIX workstations is transformed by DOTS into a homogeneous pool of anonymous compute servers. DOTS has been used recently in applications from computer graphics and computational number theory. We also discuss the performance characteristics of DOTS for a workstation cluster running under Solaris and a PC network using Windows NT, as they were obtained from a prototypical example.

Misc