Liliana Favre

*Universidad Nacional del Centro de la Provincia de Buenos Aires Comisión de Investigaciones Científicas de la Provincia de Buenos Aires Argentina* 

#### **1. Introduction**

24 Will-be-set-by-IN-TECH

54 Reverse Engineering – Recent Advances and Applications

Miller, S. P., Tribble, A. C., Whalen, M. W. & Heimdahl, M. P. (2004). Proving the shalls early

Moore, M. M. (1996). Rule-based detection for reverse engineering user interfaces, *Proceedings of the Third Working Conference on Reverse Engineering, pages 42-8, Monterey, CA* . Müller, H. A., Jahnke, J. H., Smith, D. B., Storey, M.-A., Tilley, S. R. & Wong, K. (2000).

Myers, B. A. (1991). Separating application code from toolkits: Eliminating the spaghetti of

Paiva, A. C. R., Faria, J. C. P. & Mendes, P. M. C. (eds) (2007). *Reverse Engineered Formal Models*

SC4, I. S.-C. (1994). Draft International ISO DIS 9241-11 Standard, International Organization

Stamelos, I., Angelis, L., Oikonomou, A. & Bleris, G. L. (2002). Code quality analysis in open

Stoll, P., John, B. E., Bass, L. & Golden, E. (2008). Preparing usability supporting

Thimbleby, H. & Gow, J. (2008). Applying graph theory to interaction design, *EIS 2007*

Tidwell, J. (2005). *Designing Interfaces: Patterns for Effective Interaction Design*, O' Reilly Media,

Yoon, Y. S. & Yoon, W. C. (2007). Development of quantitative metrics to support

*Interaction Design and Usability*, Springer Berlin / Heidelberg, pp. 316–324. Young, R. M., Green, T. R. G. & Simon, T. (1989). Programmable user models for predictive

UI designer decision-making in the design process, *Human-Computer Interaction.*

evaluation of interface designs, *in* K. Bice & C. Lewis (eds), *CHI'89 Proceedings*, ACM

source software development, *Information Systems Journal* 12: 43–60.

Malardalen University, School of Innovation, Design and Engineering. Systa, T. (2001). Dynamic reverse engineering of Java software, *Technical report*, University of

*of Software Engineering*, ACM, New York, NY, USA, pp. 47–60.

Shan, S. Y. & et al. (2009). Fast centrality approximation in modular networks.

Thomas, J. M. (1976). A complexity measure, *Intern. J. Syst. Sci.* 2(4): 308.

Nielsen, J. (1993). *Usability Engineering*, Academic Press, San Diego, CA.

call-backs, *School of Computer Science* .

*Systems*. Berlin, Germany.

for Standardization.

Tampere, Finland.

Press, NY, pp. 15–19.

pp. 501–519.

Inc.

*and Engineering* .

validation of requirements through formal methods, *Department of Computer Science*

Reverse engineering: a roadmap, *ICSE '00: Proceedings of the Conference on The Future*

*for GUI Testing, 10th International Workshop on Formal Methods for Industrial Critical*

architectural patterns for industrial use. Computer science, Datavetenskap,

Nowadays, almost companies are facing the problematic of having to modernize or replace their legacy software systems. These old systems have involved the investment of money, time and other resources through the ages. Many of them are still business-critical and there is a high risk in replacing them. Therefore, reverse engineering is one of the major challenges for software engineering today.

The most known definition of reverse engineering was given by Chikofsky and Cross (1990): "the process of analyzing a subject system to (i) identify the system's components and their interrelationships and (ii) create representations of the system in another form or at a higher-level of abstraction". Reverse engineering is the process of discovering and understanding software artifacts or systems with the objective of extracting information and providing high-level views of them that can be later on manipulated or re-implemented. That is to say, it is the processes of examination, not a process of change such as forward engineering and reengineering. Forward engineering is the traditional process of moving from high-level abstractions and implementation-independent designs to the physical implementation of a system. On the other hand, software reengineering includes a reverse engineering phase in which abstractions of the software artifacts to be reengineered are built, and a forward engineering phase that moves from abstractions to implementations (Sommerville, 2004) (Canfora &Di Penta, 2007).

Reverse Engineering is also related with software evolution and maintenance. Software evolution is the process of initial development of a software artifact, followed by its maintenance. The ANSI/IEEE standard 729-1983 (Ansi/IEEE, 1984) defines software maintenance "as the modification of a software product after delivery to correct faults, to improve performance or other attributes, or to adapt the product to a changed environment". Reverse engineering techniques can be used as a mean to design software systems by evolving existing ones based on new requirements or technologies. It can start from any level of abstraction or at any stage of the life cycle.

Reverse engineering is hardly associated with modernization of legacy systems that include changes not only in software but in hardware, business processes and organizational strategies and politics. Changes are motivated for multiple reasons, for instance the constantly changing IT technology and the constantly changing business world.

MDA-Based Reverse Engineering 57

The initial diffusion of MDA was focused on its relation with UML as modeling language. However, there are UML users who do not use MDA, and MDA users who use other modeling languages such as some DSL (Domain Specific Language). The essence of MDA is the meta-metamodel MOF (Meta Object Facility) that allows different kinds of artifacts from multiple vendors to be used together in a same project (MOF, 2006). The MOF 2.0 Query, View, Transformation (QVT) metamodel is the standard for expressing transformations

The success of MDA-based reverse engineering depends on the existence of CASE (Computer Aided Software Engineering) tools that make a significant impact on the process automation. Commercial MDA tools have recently begun to emerge. In general, pre-existing UML tools have been extended to support MDA. The current techniques available in these tools provide forward engineering and limited facilities for reverse engineering (CASE MDA, 2011). The Eclipse Modeling Framework (EMF) (Eclipse, 2011) was created for facilitating system modeling and the automatic generation of Java code and several tools aligned with MDA are been developed. Modisco is an official Eclipse project dedicated to Model Driven Reverse Engineering (MDRE) from IT legacy systems and supports reverse

Validation, verification and consistency are crucial activities in the modernization of legacy systems that are critical to safety, security and economic profits. One of the important features for a rigorous development is the combination of tests and proofs. When artifacts at different levels of abstraction are available, a continuous consistency check between them could help to reduce development mistakes, for example checking whether the code is

The progress in the last decade in scalability and incremental verification of basic formal methods could be used as a complement of static and dynamic analysis with tests, assertions and partial formal specification. In this light, this chapter describes MDA reverse engineering of object-oriented code that is based on the integration of traditional compiler techniques such as static and dynamic analysis, metamodeling techniques based on MDA standards and, partial formal specification. We propose to exploit the source code as the most reliable description of both, the behavior of the software and the organization and its business rules. Different principles of reverse engineering are covered, with special emphasis on consistency, testing and verification. We propose a formal metamodeling technique to control the evolution of metamodels that are the essence to achieve interoperability between different software artifacts involved in reverse engineering processes. Rather than requiring that users of transformation tools manipulate formal specification, we want to provide formal semantic to graphical metamodeling notations and develop rigorous tools that permit users to directly manipulate metamodels they have created. As an example, we analyze the reverse engineering of Java code however the bases

The following sections include background on MDA and Case tools, foundations of innovative processes based on formal specification and, challenges and strategic directions

engineering of UML class diagrams (Modisco, 2011).

**1.2 Outline of the chapter** 

consistent with the design or is in compliance with assertions.

of our approach can be easily applied to other object-oriented languages.

that can be adopted in the field of MDA reverse engineering.

(QVT, 2008).

Large number of software systems have been developed and successfully used. These systems resume today key knowledge acquired over the life of the underlying organization; however, many of them have been written for technology which is expensive to maintain and which may not be aligned with current organizational politics.

Over the past two decades, reverse engineering techniques focused mainly on recovering high-level architectures or diagrams from procedural code to face up to problems such as comprehending data structures or databases or the Y2K problem. By the year 2000, many different kinds of slicing techniques were developed and several studies were carried out to compare them.

Over time, a growing demand of object-oriented reengineering appeared on the stage. New approaches were developed to identify objects into legacy code (e.g. legacy code in COBOL) and translate this code into an object-oriented language. Object-oriented programs are essentially dynamic and present particular problems linked to polymorphism, late binding, abstract classes and dynamically typed languages. For example, some object-oriented languages introduce concepts such as reflection and the possibility of loading dynamically classes; although these mechanisms are powerful, they affect the effectiveness of reverse engineering techniques. During the time of object-oriented programming, the focus of software analysis moved from static analysis to dynamic analysis, more precisely static analysis was complemented with dynamic analysis (Fanta & Rajlich, 1998) (Systa, 2000).

When the Unified Modeling Language (UML) comes into the world, a new problem was how to extract higher-level views of the system expressed by different kind of UML diagrams (UML, 2010a) (UML, 2010b). Relevant work for extracting UML diagrams (e.g. class diagram, state diagram, sequence diagram, object diagram, activity diagram and package diagram) from source code was developed (Tonella & Potrich, 2005).

#### **1.1 Reverse engineering today**

To date there are billions upon billions of lines of legacy code in existence, which must be maintained with a high cost. Instead of building from scratch, software industry has realized the advantages of modernizing existing systems. As the demands for modernized legacy systems rise, so does the need for frameworks for integrating or transforming existing systems. The Object Management Group (OMG) has adopted the Model Driven Architecture (MDA) which is an evolving conceptual architecture that aligns with this demand (OMG, 2011) (MDA, 2005).

MDA raises the level of reasoning to a more abstract level that places change and evolution in the center of software development process. The original inspiration around the definition of MDA had to do with the middleware integration problem in internet. Beyond interoperability reasons, there are other good benefits to use MDA such as to improve the productivity, process quality and maintenance costs.

The outstanding ideas behind MDA are separating the specification of the system functionality from its implementation on specific platforms, managing the software evolution from abstract models to implementations increasing the degree of automation and achieving interoperability with multiple platforms, programming languages and formal languages (MDA, 2005).

56 Reverse Engineering – Recent Advances and Applications

Large number of software systems have been developed and successfully used. These systems resume today key knowledge acquired over the life of the underlying organization; however, many of them have been written for technology which is expensive to maintain

Over the past two decades, reverse engineering techniques focused mainly on recovering high-level architectures or diagrams from procedural code to face up to problems such as comprehending data structures or databases or the Y2K problem. By the year 2000, many different kinds of slicing techniques were developed and several studies were carried out to

Over time, a growing demand of object-oriented reengineering appeared on the stage. New approaches were developed to identify objects into legacy code (e.g. legacy code in COBOL) and translate this code into an object-oriented language. Object-oriented programs are essentially dynamic and present particular problems linked to polymorphism, late binding, abstract classes and dynamically typed languages. For example, some object-oriented languages introduce concepts such as reflection and the possibility of loading dynamically classes; although these mechanisms are powerful, they affect the effectiveness of reverse engineering techniques. During the time of object-oriented programming, the focus of software analysis moved from static analysis to dynamic analysis, more precisely static analysis was complemented with dynamic analysis (Fanta & Rajlich, 1998) (Systa, 2000).

When the Unified Modeling Language (UML) comes into the world, a new problem was how to extract higher-level views of the system expressed by different kind of UML diagrams (UML, 2010a) (UML, 2010b). Relevant work for extracting UML diagrams (e.g. class diagram, state diagram, sequence diagram, object diagram, activity diagram and

To date there are billions upon billions of lines of legacy code in existence, which must be maintained with a high cost. Instead of building from scratch, software industry has realized the advantages of modernizing existing systems. As the demands for modernized legacy systems rise, so does the need for frameworks for integrating or transforming existing systems. The Object Management Group (OMG) has adopted the Model Driven Architecture (MDA) which is an evolving conceptual architecture that aligns with this

MDA raises the level of reasoning to a more abstract level that places change and evolution in the center of software development process. The original inspiration around the definition of MDA had to do with the middleware integration problem in internet. Beyond interoperability reasons, there are other good benefits to use MDA such as to improve the

The outstanding ideas behind MDA are separating the specification of the system functionality from its implementation on specific platforms, managing the software evolution from abstract models to implementations increasing the degree of automation and achieving interoperability with multiple platforms, programming languages and formal

package diagram) from source code was developed (Tonella & Potrich, 2005).

and which may not be aligned with current organizational politics.

compare them.

**1.1 Reverse engineering today** 

demand (OMG, 2011) (MDA, 2005).

languages (MDA, 2005).

productivity, process quality and maintenance costs.

The initial diffusion of MDA was focused on its relation with UML as modeling language. However, there are UML users who do not use MDA, and MDA users who use other modeling languages such as some DSL (Domain Specific Language). The essence of MDA is the meta-metamodel MOF (Meta Object Facility) that allows different kinds of artifacts from multiple vendors to be used together in a same project (MOF, 2006). The MOF 2.0 Query, View, Transformation (QVT) metamodel is the standard for expressing transformations (QVT, 2008).

The success of MDA-based reverse engineering depends on the existence of CASE (Computer Aided Software Engineering) tools that make a significant impact on the process automation. Commercial MDA tools have recently begun to emerge. In general, pre-existing UML tools have been extended to support MDA. The current techniques available in these tools provide forward engineering and limited facilities for reverse engineering (CASE MDA, 2011). The Eclipse Modeling Framework (EMF) (Eclipse, 2011) was created for facilitating system modeling and the automatic generation of Java code and several tools aligned with MDA are been developed. Modisco is an official Eclipse project dedicated to Model Driven Reverse Engineering (MDRE) from IT legacy systems and supports reverse engineering of UML class diagrams (Modisco, 2011).

Validation, verification and consistency are crucial activities in the modernization of legacy systems that are critical to safety, security and economic profits. One of the important features for a rigorous development is the combination of tests and proofs. When artifacts at different levels of abstraction are available, a continuous consistency check between them could help to reduce development mistakes, for example checking whether the code is consistent with the design or is in compliance with assertions.

#### **1.2 Outline of the chapter**

The progress in the last decade in scalability and incremental verification of basic formal methods could be used as a complement of static and dynamic analysis with tests, assertions and partial formal specification. In this light, this chapter describes MDA reverse engineering of object-oriented code that is based on the integration of traditional compiler techniques such as static and dynamic analysis, metamodeling techniques based on MDA standards and, partial formal specification. We propose to exploit the source code as the most reliable description of both, the behavior of the software and the organization and its business rules. Different principles of reverse engineering are covered, with special emphasis on consistency, testing and verification. We propose a formal metamodeling technique to control the evolution of metamodels that are the essence to achieve interoperability between different software artifacts involved in reverse engineering processes. Rather than requiring that users of transformation tools manipulate formal specification, we want to provide formal semantic to graphical metamodeling notations and develop rigorous tools that permit users to directly manipulate metamodels they have created. As an example, we analyze the reverse engineering of Java code however the bases of our approach can be easily applied to other object-oriented languages.

The following sections include background on MDA and Case tools, foundations of innovative processes based on formal specification and, challenges and strategic directions that can be adopted in the field of MDA reverse engineering.

MDA-Based Reverse Engineering 59

migration. Architecture Driven Modernization (ADM) is an OMG initiative related to extending the modeling approach to the existing software systems and to the concept of reverse engineering (ADM, 2010). One of ADM standards is Knowledge Discovery Metamodel (KDM) to facilitate the exchange of existing systems meta-data for various modernization tools (KDM, 2011). The following section presents the concepts of model,

A model is a simplified view of a (part of) system and its environments. Models are expressed in a well-defined modeling language. They are centered in a set of diagrams and

For instance, a model could be a set of UML diagrams, OCL specifications and text. MDA distinguishes different kinds of models which go from abstract models that specify the system functionality to platform-dependent and concrete models linked to specific platforms, technologies and implementations. MDA distinguishes at least the following

A CIM describes a system from the computation independent viewpoint that focuses on the environment of and the requirements for the system. It is independent of how the system is implemented. In general, it is called domain model and may be expressed using business models. The CIM helps to bridge the gap between the experts about the domain and the software engineer. A CIM could consist of UML models and other models of requirements. In the context of MDA, a platform "is a set of subsystems and technologies that provides a coherent set of functionality through interfaces and specified usage patterns, which any application supported by that platform can use without concern for the details of how the functionality provided by the platform is implemented". (MDA, 2005). An application refers to a functionality being developed. A system can be described in terms of one or more applications supported by one or more platforms. MDA is based on platform models

A PIM is a view of the system that focuses on the operation of a system from the platform independent viewpoint. Analysis and logical models are typically independent of

A PIM is defined as a set of components and functionalities, which are defined independently of any specific platforms, and which can be realized in platform specific models. A PIM can be viewed as a system model for a technology-neutral virtual machine that includes parts and services defined independently of any specific platform. It can be viewed as an abstraction of a system that can be realized by different platform-specific ways

textual notations that allow specifying, visualizing and documenting systems.

expressed in UML, OCL, and stored in a repository aligned with MOF.

implementation and specific platforms and can be considered PIMs.

on which the virtual machine can be implemented.

metamodel and transformation in more detail.

 Computation Independent Model (CIM) Platform Independent Model (PIM) Platform Specific Model (PSM) Implementation Specific Model (ISM)

**2.1 Basic MDA concepts** 

**2.1.1 Models** 

ones:
