Preface

#### **Introduction**

In recent decades, the amount of data produced by scientific, engineering, and life science applications has increased by several orders of magnitude. In parallel, the applications themselves have become increasingly complex in terms of functionality, structure, and behavior. At the same time, the development and production cycles of such applications have tended to become shorter, due to factors such as market pressure and the rapid evolution of supporting and enabling technologies.

As a consequence, an increasing fraction of the cost of creating new applications and manufacturing processes shifts from the creation of new artifacts to the *adaptation* of existing ones. A key component of this activity is the *understanding* of the design, operation, and behavior of existing manufactured artifacts, such as software code bases, hardware systems, and mechanical assemblies. For instance, in the software industry, maintenance is estimated to account for more than 80% of the total cost of a software product's lifecycle, and software understanding for as much as half of these maintenance costs.

*Reverse engineering* encompasses the set of activities aimed at (re)discovering the functional, structural, and behavioral semantics of a given artifact, in order to leverage this information for the efficient use or adaptation of that artifact, or for the creation of related artifacts. Rediscovery of information is important when the original information is lost, unavailable, or cannot be efficiently processed within a given application context. Discovery of new information, on the other hand, is important when new application contexts aim to reuse information that is inherently present in the original artifact, but was not made explicitly available for reuse when that artifact was created.

Reverse engineering has shown increasing potential in various application fields during the last decade, due to a number of technological factors. First, advances in data analysis and data mining algorithms, coupled with the availability of cheap computing power, have made it possible to extract increasingly complex information from raw data, and to structure this information in ways that make it effective for answering specific questions on the function, structure, and behavior of the artifact under study. Second, new data sources, such as 3D scanners, cell microarrays, and a large variety of sensors, have made available new types of data from which detailed insights about mechanical and living structures can be extracted.


Given the above factors, reverse engineering applications, techniques, and tools have developed and diversified strongly. At the same time, however, the questions asked by end users and stakeholders have become increasingly complex. For example, while a decade ago the reverse engineering of a software application typically meant extracting the static structure of an isolated code base of tens of thousands of lines of code written in a single programming language, current software reverse engineering aims at extracting structural, behavioral, and evolutionary patterns from enterprise applications of millions of lines of code written in several programming languages, running on several machines, and developed by hundreds of individuals over many years. Similarly, reverse engineering the geometric and mechanical properties of physical shapes has evolved from the extraction of coarse surface models to the generation of part-whole descriptions of complex articulated shapes with the submillimeter accuracy required by manufacturing processes. This has fostered the creation of new reverse engineering techniques and tools.

This book gives an overview of recent advances in reverse engineering techniques, tools, and application domains. The aim of the book is, on the one hand, to provide the reader with a comprehensive sample of the possibilities that reverse engineering currently offers in various application domains and, on the other hand, to highlight the current research-level and practical challenges that reverse engineering techniques and tools face.

#### **Structure of this book**

To provide a broad view of reverse engineering, the book is divided into three parts: software reverse engineering, reverse engineering shapes, and reverse engineering in medical and life sciences. Each part contains several chapters covering applications, techniques, and tools for reverse engineering relevant to specific use cases in the respective application domain. An overview of the structure of the book is given below.

#### **Part 1: Software Reverse Engineering**

In part 1, we look at reverse engineering the function, structure, and behavior of large *software-intensive applications*. The main business driver behind software reverse engineering is the increasing effort and cost of maintaining existing software applications and of designing new applications that reuse existing legacy software. As this cost increases, detailed information on the structure, run-time behavior, and quality attributes of existing software applications becomes highly valuable.

In **Chapter 1**, Kienle *et al.* give a comprehensive overview of reverse engineering tools and techniques applied to embedded software. Apart from detailing the various pros and cons of applying existing reverse engineering technology to embedded software, they also discuss the specific challenges that embedded software poses to classical reverse engineering, and outline potential directions for improvement.

In **Chapter 2**, Campos *et al.* present an example of reverse engineering aimed at facilitating the development and maintenance of software applications that include a substantial amount of user interface code. Starting from the observation that understanding (and thus maintaining) user interface code is highly challenging, due to the typically non-modular structure of such code and its interactions with the rest of the application, they present a technique and tool that extracts user interface behavioral models from the source code of Java applications, and show how these models can be used to reason about the application's usability and implementation quality.

In **Chapter 3**, Favre presents a model-driven architecture approach aimed at supporting program understanding during the evolution and maintenance of large software systems and the modernization of legacy systems. Using a combination of static and dynamic analysis, augmented with formal specification techniques and a new metamodeling language, the chapter shows how platform-independent models can be extracted from object-oriented (Java) source code and refined to the level where they can be reused in different development contexts.

In **Chapter 4**, Rama *et al.* show how platform-independent models can be extracted from large, complex business applications. Given that such applications are typically highly heterogeneous, *e.g.* they involve several programming languages and systems interacting in a distributed manner, the fine-grained reverse engineering usually performed for desktop or embedded applications may not be optimal. The proposed approach therefore focuses on information at the service level. By reusing the extracted platform-independent models, the authors show how substantial cost savings can be achieved in the development of new applications on the IBM WebSphere and SAP NetWeaver platforms.

In **Chapter 5**, Li *et al.* present a different aspect of software reverse engineering. Rather than recovering information from source code, they analyze the behavior of several peer-to-peer (P2P) protocols, as implemented by current P2P applications. The aim is to reverse engineer the high-level behavior of such protocols, and how this behavior depends on parameters such as user behavior and application settings, in order to optimize the protocols for video streaming. Compared to the previous chapters, the target of reverse engineering here is the behavior of an entire set of distributed P2P applications, rather than the structure or behavior of a single program.

#### **Part 2: Reverse Engineering Shapes**


In part 2, our focus changes from software artifacts to *physical shapes*. Two main use cases are discussed. First, methods and techniques for reverse engineering the geometry and topology of complex shapes from low-level, unorganized 3D scanning data, such as point clouds, are presented. The focus here is on the robust extraction of shape information with guaranteed quality properties from such 3D scans, and on the efficient computation of such shapes from raw scans involving millions of sample points. Second, methods and techniques are presented that support the *manufacturing* of 3D shapes from information reverse engineered from previously manufactured shapes. Here, the focus is on guaranteeing the required quality- and cost-related metrics throughout the entire mechanical manufacturing process.


In **Chapter 6**, Keller *et al.* present a multiresolution method for extracting accurate 3D surfaces from unorganized point clouds. Attractive aspects of the method are its simplicity of implementation, its ability to capture the shape of complex surface structures with guaranteed connectivity properties, and its scalability to real-world point clouds of millions of samples. The method is demonstrated for surface reconstruction of detailed object scans as well as of spatially large point clouds obtained from environmental LiDAR scans.

In **Chapter 7**, Kaisarlis presents a systematic approach for geometric and dimensional tolerancing in reverse engineering mechanical parts. Tolerancing is a vital component of the accurate manufacturing of such parts, both for capturing dimensional variability in a physical model and for extracting tolerancing-related information from existing models and design artifacts using reverse engineering. A methodology is presented in which tolerancing is explicitly modeled by a family of parameterizable tolerancing elements that can be assembled into tolerance chains. Applications are presented in three case studies related to the manufacturing of complex mechanical assemblies for optical sensor devices.
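To make the notion of a tolerance chain concrete, the sketch below shows the classic one-dimensional stack-up calculation found in general tolerancing textbooks. It is not the tolerancing-element methodology of Chapter 7: the `Link` type, the choice of worst-case and root-sum-square (RSS) rules, and all numbers are assumptions made for illustration only. Each link contributes its nominal dimension with a sign, and the tolerances of the links combine into a tolerance on the closing dimension.

```python
"""Hypothetical sketch of a 1-D linear tolerance chain (stack-up).

Textbook worst-case / RSS arithmetic only; not the method of Chapter 7.
"""
from dataclasses import dataclass
import math


@dataclass
class Link:
    nominal: float    # nominal dimension, e.g. in mm
    tolerance: float  # symmetric tolerance (+/-), same unit
    direction: int    # +1 adds to the closing dimension, -1 subtracts


def stack_up(chain):
    """Return (nominal, worst_case_tol, rss_tol) of the closing dimension."""
    nominal = sum(link.direction * link.nominal for link in chain)
    # Worst case: every link sits at its tolerance limit in the same direction.
    worst = sum(link.tolerance for link in chain)
    # RSS: statistical estimate, assuming independent variations of the links.
    rss = math.sqrt(sum(link.tolerance ** 2 for link in chain))
    return nominal, worst, rss


if __name__ == "__main__":
    # Hypothetical example: a housing length minus two stacked parts gives a gap.
    chain = [Link(50.0, 0.10, +1), Link(30.0, 0.05, -1), Link(19.5, 0.05, -1)]
    nominal, worst, rss = stack_up(chain)
    print(f"gap = {nominal:.2f} mm, worst case ±{worst:.3f}, RSS ±{rss:.3f}")
```

The worst-case rule is conservative, while the RSS rule usually yields a tighter, statistically motivated bound; which one is appropriate depends on production volume and process capability.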

In **Chapter 8**, Chang presents a review of shape design and parameterization in the context of shape reverse engineering. Extracting parameterizable 3D NURBS surfaces from low-level scanned information, also called auto-surfacing, is an important modeling tool, as it allows designers to further modify the extracted surfaces at a high level. Although several auto-surfacing tools and techniques exist, not all of them satisfy the same requirements, or to the same degree. The review discusses nine auto-surfacing tools from the viewpoint of 22 functional and non-functional requirements, and presents detailed evaluations of four such tools in real-world auto-surfacing case studies.
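Why NURBS representations are convenient to edit can be illustrated with the one-dimensional case: a NURBS curve is a weighted combination of control points, and NURBS surfaces are tensor-product extensions of the same idea. The sketch below evaluates a NURBS curve with the standard Cox-de Boor recursion. It is a generic textbook formulation, not the algorithm of any auto-surfacing tool reviewed in Chapter 8; the function names and the quarter-circle example are assumptions chosen for illustration.

```python
import math


def bspline_basis(i, p, u, knots):
    """Cox-de Boor recursion: i-th B-spline basis function of degree p at u."""
    if p == 0:
        # Degree-0 basis is 1 on the half-open span [knots[i], knots[i+1]).
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # Close the last non-empty span so that u == knots[-1] is covered too.
        if u == knots[-1] and knots[i] < u == knots[i + 1]:
            return 1.0
        return 0.0
    left = right = 0.0
    d1 = knots[i + p] - knots[i]
    if d1 > 0:  # skip terms with a zero denominator (repeated knots)
        left = (u - knots[i]) / d1 * bspline_basis(i, p - 1, u, knots)
    d2 = knots[i + p + 1] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + p + 1] - u) / d2 * bspline_basis(i + 1, p - 1, u, knots)
    return left + right


def nurbs_curve_point(u, ctrl, weights, degree, knots):
    """Weighted B-spline combination of control points, divided by the weight sum."""
    num = [0.0] * len(ctrl[0])
    den = 0.0
    for i, (point, w) in enumerate(zip(ctrl, weights)):
        nw = bspline_basis(i, degree, u, knots) * w
        den += nw
        for k, coord in enumerate(point):
            num[k] += nw * coord
    return tuple(x / den for x in num)


if __name__ == "__main__":
    # Hypothetical example: a quarter of the unit circle as a rational quadratic curve.
    ctrl = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    weights = [1.0, math.sqrt(2) / 2, 1.0]
    knots = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # clamped knot vector
    for u in (0.0, 0.5, 1.0):
        x, y = nurbs_curve_point(u, ctrl, weights, 2, knots)
        print(f"u={u:.1f}: ({x:.4f}, {y:.4f}), radius {math.hypot(x, y):.4f}")
```

Moving a control point or raising its weight (which pulls the curve toward that point) changes the shape through a handful of meaningful parameters, which is the kind of high-level handle that makes auto-surfaced NURBS models attractive to designers.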

In **Chapter 9**, Mello *et al.* present a model for integrating mechanical reverse engineering (RE) with design for manufacturing and assembly (DFMA). Their work is motivated by the perceived added value, in terms of lean development and manufacturing, for organizations that succeed in combining the two types of activities. Using action research, they investigate the use of integrated RE and DFMA in two companies that manufacture home fixture assemblies and machine measuring instruments, respectively. Their detailed studies show concrete examples of the step-by-step application of integrated RE and DFMA, and highlight the possible cost savings and related challenges.

#### **Part 3: Reverse Engineering in Medical and Life Sciences**


In part 3, our focus changes from industrial artifacts to artifacts related to *medical and life sciences*. Use cases in this context relate mainly to the increasing amounts of data acquired in these application domains, which can support more detailed and/or accurate modeling and understanding of medical and biological phenomena. As such, reverse engineering has a different flavor here than in the first two parts of the book: rather than recovering information lost during an earlier design process, the aim is to extract new information on natural processes in order to better understand their dynamics.

In **Chapter 10**, Yuji *et al.* present a method to reverse engineer the structure and dynamics of gene regulatory networks (GRNs). Large amounts of gene-related data are available from various information sources, *e.g.* gene expression experiments, molecular interaction, and gene ontology databases. The challenge is to find relationships between transcription factors and their potential target genes, given that one has to deal with noisy datasets containing tens of thousands of genes that act according to different temporal and spatial patterns, interact strongly with each other, and exhibit subsampling. A computational data mining framework is presented which integrates all the above-mentioned information sources, and uses genetic algorithms based on particle swarm optimization techniques to find relationships of interest. Results are presented on two different cell datasets.

In **Chapter 11**, Mayo *et al.* present a reverse engineering activity that aims to create a predictive model of the dynamics of gas transfer (oxygen uptake) in mammalian lungs. The solution combines geometric modeling of the coarse-scale structure of the mammalian lung (the airways), mathematical modeling of the gas transport equations, and an efficient way to solve the resulting system of diffusion-reaction equations by means of several modeling and numerical approximations. The predictive power of the proposed model is then validated by comparing its results with actual experimental measurements. All in all, reverse engineering the complex physical process of respiration can complement or replace more costly measurement experiments.
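To give a feel for what solving a diffusion-reaction equation numerically involves, the sketch below integrates a deliberately simplified one-dimensional model, dc/dt = D d²c/dx² - k c, with an explicit finite-difference scheme and compares the result against its analytical steady state. This is a generic textbook toy problem, not the lung model of Mayo *et al.*: the geometry, the boundary conditions (fixed concentration at the inlet, zero flux at the closed end), the first-order uptake term, and all parameter values are assumptions made for illustration.

```python
import math


def simulate_uptake(D, k, c0, length, nx, t_end):
    """Explicit finite differences for dc/dt = D d2c/dx2 - k c on [0, length].

    Assumed boundary conditions: c = c0 at x = 0 (inlet), zero flux at x = length.
    """
    dx = length / (nx - 1)
    # Explicit Euler is stable for D*dt/dx^2 <= 1/2 (for small k*dt); 0.4 leaves a margin.
    dt = 0.4 * dx * dx / D
    steps = math.ceil(t_end / dt)
    c = [0.0] * nx
    c[0] = c0  # inlet value, never updated below
    for _ in range(steps):
        new = c[:]
        for i in range(1, nx - 1):
            lap = (c[i - 1] - 2.0 * c[i] + c[i + 1]) / (dx * dx)
            new[i] = c[i] + dt * (D * lap - k * c[i])
        # Zero-flux end: mirror ghost node c[nx] = c[nx - 2].
        lap_end = 2.0 * (c[nx - 2] - c[nx - 1]) / (dx * dx)
        new[nx - 1] = c[nx - 1] + dt * (D * lap_end - k * c[nx - 1])
        c = new
    return c


def steady_state(x, D, k, c0, length):
    """Analytical steady state: c0 cosh(m (L - x)) / cosh(m L), with m = sqrt(k / D)."""
    m = math.sqrt(k / D)
    return c0 * math.cosh(m * (length - x)) / math.cosh(m * length)


if __name__ == "__main__":
    # Hypothetical dimensionless parameters, chosen only to illustrate the behavior.
    D, k, c0, length, nx = 1.0, 4.0, 1.0, 1.0, 21
    c = simulate_uptake(D, k, c0, length, nx, t_end=5.0)
    dx = length / (nx - 1)
    for i in range(0, nx, 5):
        x = i * dx
        exact = steady_state(x, D, k, c0, length)
        print(f"x={x:.2f}  numerical={c[i]:.4f}  steady state={exact:.4f}")
```

Checking a solver against a case with a known closed-form answer, as done here, is a common first step before trusting it on realistic geometries, where validation against experimental measurements takes over.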

In **Chapter 12**, Cernescu *et al.* present a reverse engineering application in the context of dental engineering. The aim is to efficiently and effectively assess the mechanical quality of manufactured complete dentures in terms of their behavior under mechanical stress, *e.g.* to detect areas likely to underperform or crack in normal use. The presented reverse engineering pipeline covers 3D model acquisition by scanning and surface reconstruction, the creation of a finite element mesh suitable for numerical simulation, and the actual computation of stress and strain factors in the presence of induced model defects.

#### **Challenges and Opportunities**

From the material presented in this book, we conclude that reverse engineering is an active and growing field with an increasing number of applications. Technological progress is making available a growing number of data sources with high accuracy and large data volumes. There is also an increased demand, across all application domains surveyed, for cost-effective techniques able to reduce the total cost of developing, operating, and maintaining complex technical solutions. This demand creates a need for increasingly accurate and detailed information on the structure, dynamics, and semantics of the processes and artifacts involved in such solutions.

Reverse engineering can provide answers in these directions. However, several challenges remain. Numerous reverse engineering technologies and tools are still in the research phase, and need to be refined to deliver robust and detailed results on real-world datasets. Moreover, the growing variety of datasets and technologies available to reverse engineering poses challenges for the cost-effective development of end-to-end solutions able to extract valuable insights from such data.

> **Prof. Dr. Alexandru C. Telea**, Faculty of Mathematical and Natural Science, Institute Johann Bernoulli, University of Groningen, The Netherlands

**Part 1** 

**Software Reverse Engineering** 

**1**

**Software Reverse Engineering in the Domain of Complex Embedded Systems**

Holger M. Kienle<sup>1</sup>, Johan Kraft<sup>1</sup> and Hausi A. Müller<sup>2</sup>

<sup>1</sup>*Mälardalen University, Sweden* <sup>2</sup>*University of Victoria, Canada*

**1. Introduction**

This chapter focuses on tools and techniques for software reverse engineering in the domain of complex embedded systems. While there are many "generic" reverse engineering techniques that are applicable across a broad range of systems (e.g., slicing (Weiser, 1981)), complex embedded systems have a set of characteristics that make it highly desirable to augment these "generic" techniques with more specialized ones. Complex embedded systems also have characteristics that can require more sophisticated techniques than those typically offered by mainstream tools (e.g., dedicated slicing techniques for embedded systems (Russell & Jacome, 2009; Sivagurunathan et al., 1997)). Graaf et al. (2003) state that "the many available software development technologies don't take into account the specific needs of embedded-systems development . . . Existing development technologies don't address their specific impact on, or necessary customization for, the embedded domain. Nor do these technologies give developers any indication of how to apply them to specific areas in this domain." As we will see, this more general observation applies to reverse engineering as well.

Specifically, our chapter is motivated by the observation that the bulk of reverse engineering research targets software outside the embedded domain (e.g., desktop and enterprise applications). This is reflected by a number of review and survey papers on software reverse engineering that have appeared over the years, which do not explicitly address the embedded domain (Canfora et al., 2011; Canfora & Di Penta, 2007; Kienle & Müller, 2010; Müller & Kienle, 2010; Müller et al., 2000; van den Brand et al., 1997). Our chapter strives to help close this gap in the literature. Conversely, the embedded systems community seems to be mostly oblivious to reverse engineering. This is surprising given that, according to a study in the vehicular domain (Hänninen et al., 2006), software maintainability is an important concern in this domain. The study's authors "believe that facilitating maintainability of the applications will be a more important activity to consider due to the increasing complexity, long product life cycles and demand on upgradeability of the [embedded] applications."

Embedded systems are an important domain, which we believe should receive more attention from reverse engineering research. First, a significant part of software evolution is happening in this domain. Second, the reach and importance of embedded systems are growing with
