**Contents**

Chapter 9 **Discretization of Random Fields Representing Material Properties and Distributed Loads in FORM Analysis 141**
Ireneusz Czmoch

Chapter 10 **Energy Savings in EAF Steelmaking by Process Simulation and Data-Science Modeling on the Reproduced Results 163**
Panagiotis Sismanis

Chapter 11 **Use of Renewable Energy for Electrification of Rural Community to Stop Migration of Youth from Rural Area to Urban: A Case Study of Tanzania 183**
Urbanus F Melkior, Josef Tlustý and Zdeněk Müller

Chapter 12 **Time Series and Renewable Energy Forecasting 207**
Mahmoud Ghofrani and Anthony Suherli

**Preface**
Advances in technology and system communication are being employed in systems of ever greater complexity. System dependability considers the technical complexity, size, and interdependency of the system. The stochastic character and the complexity of these systems require that reliability, availability, maintainability, and safety (RAMS) be kept under control. Dependability therefore contemplates faults/failures, downtimes, stoppages, worker errors, etc. Dependability also refers to emergent properties, i.e., properties generated indirectly by the analyzed system through its interaction with other systems [1]. Dependability, understood as a general description of system performance, requires the advanced analytics considered in this book. Dependability management and engineering are covered with case studies and best practices [2].

This book presents 12 chapters. Chapter 1 is an introductory chapter. Chapter 2 shows modeling strategies to improve the dependability of cloud infrastructures. Continuous anything for distributed research projects is considered in Chapter 3. A practical perspective on software fault injection is studied in Chapter 4. Chapter 5 shows a stochastic reward net-based modeling approach for availability quantification of data center systems. Chapter 6 presents a reliability and aging analysis of SRAMs within microprocessor systems. Advances in engineering software for multicore systems are described in Chapter 7. Modeling quality of service techniques for packet-switched networks is analyzed in Chapter 8. Discretization of random fields representing material properties and distributed loads in FORM analysis is presented in Chapter 9. Chapter 10 considers energy savings in EAF steelmaking by process simulation and data science modeling on the reproduced results. Chapter 11 presents the use of renewable energy for electrification of a rural community to stop migration of youth from rural areas to urban centers, with a case study of Tanzania. Finally, a case study of reliability in renewable energy systems is presented in Chapter 12.

This book covers a diversity of issues, from algorithms, mathematical models, and software engineering to design methodologies and technical or practical solutions. It intends to provide the reader with a comprehensive overview of the current state of the art, case studies, hardware and software solutions, analytics, and data science in dependability engineering.

**Fausto Pedro García Márquez** Ingenium Research Group, University of Castilla-La Mancha, Spain

**Mayorkinos Papaelias** School of Metallurgy and Materials, Birmingham University, United Kingdom

#### **References**

[1] F. P. G. Márquez and J. M. C. Muñoz, "A pattern recognition and data analysis method for maintenance management," International Journal of Systems Science, vol. 43, pp. 1014-1028, 2012.

[2] F. P. G. Márquez, I. P. G. Pardo, and M. R. M. Nieto, "Competitiveness based on logistic management: a real case study," Annals of Operations Research, vol. 233, pp. 157-169, 2015.

**Chapter 1**

**Introductory Chapter: Introduction to Dependability Engineering**

Fausto Pedro García Márquez and Mayorkinos Papaelias

DOI: 10.5772/intechopen.77013

Additional information is available at the end of the chapter

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### **1. Introduction**

Cloud computing presents some challenges that need to be overcome, such as planning infrastructures that maintain availability when failure events and repair activities occur [1]. Cloud infrastructure planning, which addresses the dependability aspects, is an essential activity because it ensures business continuity and client satisfaction. Redundancy mechanisms (cold standby, warm standby, and hot standby) can be allocated to components of the cloud infrastructure to maintain the availability levels agreed in SLAs. Mathematical formalisms based on state space, such as stochastic Petri nets, and combinatorial formalisms, such as reliability block diagrams [2], can be adopted to evaluate the dependability of cloud infrastructures considering the allocation of different redundancy mechanisms to their components [3]. Chapter 1 shows the adoption of the mathematical formalisms stochastic Petri nets and reliability block diagrams for the dependability evaluation of cloud infrastructures with different redundancy mechanisms.

International research projects involve large distributed teams made up of multiple institutions. Chapter 2 describes research artifacts that need to work together in order to demonstrate and ship the project results. Yet, in these settings, the project itself is almost never in the core interest of the partners in the consortium. This leads to a weak integration incentive and, consequently, to last-minute efforts. This in turn results in big bang integration, which imposes huge stress on the consortium and produces only non-sustainable results. In contrast, industry has been profiting from the introduction of agile development methods backed by "continuous delivery," "continuous integration," and "continuous deployment" [4]. Chapter 2 identifies shortcomings of this approach for research projects. It shows how to overcome those in order to adopt all three continuous methodologies regarding that scope. It also presents a conceptual as well as a tooling framework to realize the approach as "continuous anything." As a result, integration becomes a core element of the project plan. It distributes and shares the responsibility of integration work among all partners, while at the same time clearly holding individuals responsible for dedicated software components. Through a high degree of automation, it keeps the overall integration work low, but still provides immediate feedback on the quality of the software. Overall, this concept has been found useful and beneficial in several EU-funded research projects, where it significantly lowered integration effort and improved the quality of the software components, while also enhancing collaboration as a whole.


Software fault injection (SFI) is an acknowledged method for assessing the dependability of software systems. After reviewing the state of the art of SFI, Chapter 3 addresses the challenge of integrating it more deeply into software development practice. A well-defined development methodology incorporating SFI is presented (fault injection driven development, FIDD), which begins by systematically constructing a dependability and failure cause model [5], from which relevant injection techniques, points, and campaigns are derived [6]. The possibilities and challenges of end-to-end automation of such campaigns are analyzed. The suggested approach can substantially improve the accessibility of dependability assessment in everyday software engineering practice.

Availability quantification and prediction of IT infrastructure in data centers are of paramount importance for online business enterprises. Chapter 4 presents comprehensive availability models for practical case studies in order to demonstrate state space stochastic reward net models of typical data center systems for the quantitative assessment of system availability [7]. Stochastic reward net models of a virtualized server system, a data center network based on the DCell topology, and a conceptual data center for disaster tolerance are presented. The systems are then evaluated against various metrics of interest, including steady-state availability, downtime and downtime cost, and sensitivity analysis.

A majority of transistors in a modern microprocessor are used to implement static random access memories (SRAM) [8]. Therefore, it is important to analyze the reliability of SRAM blocks. During SRAM design, it is important to build in design margins to achieve an adequate lifetime. The two main wear-out mechanisms that increase a transistor's threshold voltage are bias temperature instability (BTI) and hot carrier injection (HCI). BTI and HCI can degrade transistors' driving strength and further weaken circuit performance. In a microprocessor, first-level (L1) caches are frequently accessed, which makes them especially vulnerable to BTI and HCI. In Chapter 5, the cache lifetimes due to BTI and HCI are studied for different cache configurations, namely cache size, associativity, cache line size, and replacement algorithm. As a case study, the failure probability (reliability) and the hit rate (performance) of the L1 cache in a LEON3 microprocessor are analyzed while the microprocessor runs a set of benchmarks [9]. The results provide essential insights that enable better performance-reliability trade-offs for cache designers.

The vast amounts of data to be processed by today's applications demand higher computational power [10]. To meet application requirements and achieve reasonable application performance, it becomes increasingly profitable, or even necessary, to exploit any available hardware parallelism. For both new and legacy applications, successful parallelization often comes at a high cost [11]. Chapter 6 proposes a set of methods that employ an optimistic semiautomatic approach, enabling programmers to exploit parallelism on modern hardware architectures. It provides a set of methods, including an LLVM-based tool, to help programmers identify the most promising parallelization targets and understand the key types of parallelism. The approach reduces the manual effort needed for parallelization. A contribution of this work is an efficient profiling method that determines the control and data dependences for performing parallelism discovery or other types of code analysis. A method is presented for detecting code sections where parallel design patterns might be applicable and for suggesting relevant code transformations. The approach efficiently reports detailed runtime data dependences. It accurately identifies opportunities for parallelism and the appropriate type of parallelism to use, whether task based or loop based.


Quality of service is the ability to provide different priorities to applications, users, or data flows, or to guarantee a certain level of performance to a data flow [12, 13]. Chapter 7 uses timed Petri nets to model techniques that provide quality of service in packet-switched networks and illustrates the behavior of the developed models through performance characteristics of simple examples. These performance characteristics are obtained by discrete event simulation of the analyzed models [14, 15].

Condition monitoring systems are usually employed in structural health monitoring [16, 17]. The reliability analysis of more complicated structures usually deals with finite element method (FEM) models. The random fields (material properties and loads) have to be represented by random variables assigned to random field elements. Adequate distribution functions and covariance matrices should be determined for a chosen set of random variables [18]. This procedure is called discretization of a random field. Chapter 8 presents the discretization of the random field for material properties with the help of the spatial averaging method for the one-dimensional homogeneous random field and the midpoint method of discretization of the random field. The second part of Chapter 8 deals with the discretization of random fields representing distributed loads. In particular, the discretization of the distributed load imposed on a Bernoulli beam is presented in detail. A numerical example demonstrates very good agreement between the reliability indices computed with the help of stochastic finite element method (SFEM) and first-order reliability method (FORM) analyses and the results obtained from analytical formulae.

The electric arc furnace (EAF)-based process route in modern steelmaking for the production of plates and special quality bars requires a series of stations for the secondary metallurgy treatment (ladle furnace (LF) and potentially vacuum degasser), up to the final casting for the production of slabs and blooms in the corresponding continuous casting machines. However, since every steel grade has its own melting characteristics, the melting (liquidus) temperature per grade is generally different and plays an important role in the final casting temperature, which has to exceed the melting temperature by an amount called superheat. The superheat is adjusted at the LF station by the operator, who decides mostly on personal experience; but, since the ladle has to pass through downstream processes, the liquid steel loses temperature, not only due to the duration of the processes until casting but also due to the ladle refractory history. Simulation software was developed in Chapter 9 in order to reproduce the phenomena involved in a melt shop that influence downstream superheats. Data science models were deployed in order to check the potential of controlling casting temperatures by adjusting liquid steel exit temperatures at the LF [19].

The electricity industry worldwide is turning increasingly to renewable sources of energy to generate electricity [20, 21]. Rural electrification is key in developing countries to encourage youth and skilled personnel to stay in rural areas for production and income-generation activities. The current lack of a grid network discourages skilled personnel from living in rural areas; instead, they migrate to urban centers. Tanzania, like other countries, has diverse renewable energy resources that need to be developed for electricity generation. Most of these sources are found in rural areas, where there is no reliable electricity, that is, the grid network is not extended there due to low population density. The government of Tanzania has put in place a policy that encourages small power producers (up to 10 MW) to develop and install electricity generation using renewable energy resources. Energy produced by a small power producer would be sold to the community directly or to the government-owned company for grid integration. Chapter 10 discusses three major environmentally friendly renewable energy sources found in Tanzania: wind energy, solar energy, and hydropower. The government is also setting strategies for empowering people in rural areas, particularly women and youth, through organizations such as local cooperatives and by applying a bottom-up approach, so that livelihoods in rural areas are enhanced through the effective participation of rural people and communities in the management of their own social, economic, and environmental development.

Reliability is a key criterion in every system, and engineering is no different [22]. Reliability in power systems or electric grids can be generally defined as the time the system is available (capable of fully supplying the demand) compared to the amount of time it is unavailable (incapable of supplying the demand) [23]. For systems with high uncertainties, such as renewable energy-based power systems, achieving a high level of reliability is a formidable challenge due to the increased penetration of intermittent renewable sources, such as wind and solar [24]. Careful and accurate planning is of the utmost importance to achieve high reliability in renewable energy-based systems [25]. Chapter 11 assesses wind-based power systems' reliability issues and provides a case study that proposes a solution to enhance the reliability of the system.

#### **Author details**

Fausto Pedro García Márquez1\* and Mayorkinos Papaelias2

\*Address all correspondence to: Faustopedro.garcia@uclm.es

1 Ingenium Research Group, University of Castilla-La Mancha, Spain

2 School of Metallurgy and Materials, Birmingham University, United Kingdom

#### **References**

[1] Sousa E, Lins F, Tavares E, Cunha P, Maciel P. A modeling approach for cloud infrastructure planning considering dependability and cost requirements. IEEE Transactions on Systems, Man, and Cybernetics: Systems. 2015;**45**:549-558

[2] Jiménez AA, Muñoz CQG, Márquez FPG. Dirt and mud detection and diagnosis on a wind turbine blade employing guided waves and supervised learning classifiers. Reliability Engineering & System Safety. 2018. In Press

[3] Sousa E, Lins F, Tavares E, Maciel P. Cloud infrastructure planning considering different redundancy mechanisms. Computing. 2017;**99**:841-864

[4] Booch G. Object Oriented Design with Applications. The Benjamin/Cummings Publishing, Pearson Education; 1991

[5] Muñoz CQG, Marquez FPG, Lev B, Arcos A. New pipe notch detection and location method for short distances employing ultrasonic guided waves. Acta Acustica United with Acustica. 2017;**103**:772-781

[6] Feinbube L, Pirl L, Tröger P, Polze A. Software fault injection campaign generation for cloud infrastructures. In: 2017 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C). 2017. pp. 622-623

[7] Nguyen TA, Min D, Park JS. A comprehensive sensitivity analysis of a data center network with server virtualization for business continuity. Mathematical Problems in Engineering. 2015;**2015**:1-20

[8] Liu T, Chen C-C, Wu J, Milor L. SRAM stability analysis for different cache configurations due to bias temperature instability and hot carrier injection. In: 2016 IEEE 34th International Conference on Computer Design (ICCD). 2016. pp. 225-232

[9] Keller AM, Wirthlin MJ. Benefits of complementary SEU mitigation for the LEON3 soft processor on SRAM-based FPGAs. IEEE Transactions on Nuclear Science. 2017;**64**:519-528

[10] Márquez FPG, Pedregal DJ, Roberts C. New methods for the condition monitoring of level crossings. International Journal of Systems Science. 2015;**46**:878-884

[11] Papaelias M, Cheng L, Kogia M, Mohimi A, Kappatos V, Selcuk C, et al. Inspection and structural health monitoring techniques for concentrated solar power plants. Renewable Energy. 2016;**85**:1178-1191

[12] Manupati V, Anand R, Thakkar J, Benyoucef L, Garsia FP, Tiwari M. Adaptive production control system for a flexible manufacturing cell using support vector machine-based approach. The International Journal of Advanced Manufacturing Technology. 2013;**67**:969-981

[13] García Márquez FP, Pliego Marugán A, Pinar Pérez JM, Hillmansen S, Papaelias M. Optimal dynamic analysis of electrical/electronic components in wind turbines. Energies. 2017;**10**:1111

[14] Strzeciwilk D, Zuberek WM. Modeling and performance analysis of QoS data. In: Romaniuk RS, editor. Proceedings of SPIE 10031, Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments. Vol. 10031. SPIE Proceedings; 28 September 2016. p. 1003158. DOI: 10.1117/12.2249385

[15] Jiménez AA, Muñoz CQG, Márquez FPG. Machine learning for wind turbine blades maintenance management. Energies. 2017;**11**:1-16

[16] Muñoz CQG, Marquez FPG, Liang C, Maria K, Abbas M, Mayorkinos P. A new condition monitoring approach for maintenance management in concentrate solar plants. In: Proceedings of the Ninth International Conference on Management Science and Engineering Management; 2015. pp. 999-1008

[17] García Márquez FP, Chacón Muñoz JM, Tobias AM. B-spline approach for failure detection and diagnosis on railway point mechanisms case study. Quality Engineering. 2015;**27**:177-185

[18] Fedor K, Czmoch I. Structural analysis of tension tower subjected to exceptional loads during installation of line conductors. Procedia Engineering. 2016;**153**:136-143

[19] Sismanis P. Using data-science models to predict technological factors affecting the mechanical properties of flat products. Journal of Chemical Technology & Metallurgy. 2017;**52**:299-313

[20] Pérez JMP, Márquez FPG, Hernández DR. Economic viability analysis for icing blades detection in wind turbines. Journal of Cleaner Production. 2016;**135**:1150-1160

[21] Melkior UF, Čerňan M, Müller Z, Tlustý J, Kasembe AG. The reliability of the system with wind power generation. In: 2016 17th International Scientific Conference on Electric Power Engineering (EPE); 2016. pp. 1-6

[22] Muñoz CQG, Jiménez AA, Márquez FPG. Wavelet transforms and pattern recognition on ultrasonic guided waves for frozen surface state diagnosis. Renewable Energy. 2018;**116**:42-54

[23] Pliego Marugán A, García Márquez FP, Lev B. Optimal decision-making via binary decision diagrams for investments under a risky environment. International Journal of Production Research. 2017;**55**:5271-5286

[24] Gómez Muñoz CQ, Arcos Jimenez A, García Marquez FP, Kogia M, Cheng L, Mohimi A, et al. Cracks and welds detection approach in solar receiver tubes employing electromagnetic acoustic transducers. Structural Health Monitoring; 2017

[25] Arabali A, Ghofrani M, Etezadi-Amoli M, Fadali MS, Moeini-Aghtaie M. A multi-objective transmission expansion planning framework in deregulated power systems with wind generation. IEEE Transactions on Power Systems. 2014;**29**:3003-3011

**Chapter 2**

**Modeling Strategies to Improve the Dependability of Cloud Infrastructures**

Erica Teixeira Gomes de Sousa and Fernando Antonio Aires Lins

DOI: 10.5772/intechopen.71498

Additional information is available at the end of the chapter

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract


Cloud computing presents some challenges that need to be overcome, such as planning infrastructures that maintain availability when failure events and repair activities occur. Cloud infrastructure planning that addresses the dependability aspects is an essential activity because it ensures business continuity and client satisfaction. Redundancy mechanisms (cold standby, warm standby, and hot standby) can be allocated to components of the cloud infrastructure to maintain the availability levels agreed in service level agreements (SLAs). Mathematical formalisms based on state space, such as stochastic Petri nets, and combinatorial formalisms, such as reliability block diagrams, can be adopted to evaluate the dependability of cloud infrastructures considering the allocation of different redundancy mechanisms to their components. This chapter shows the adoption of the mathematical formalisms stochastic Petri nets and reliability block diagrams for the dependability evaluation of cloud infrastructures with different redundancy mechanisms.

Keywords: dependability evaluation, state space models, non-state space models, redundancy mechanisms, maintenance policies

#### 1. Introduction

Ensuring the availability levels required by the different services hosted in a private cloud is a great challenge. The occurrence of defects in these services can cause the degradation of their response times and the interruption of a request's service due to the unavailability of the required resource. The interruption of these services can be caused by the occurrence of failure events in the hardware, software, power system, cooling system, and private cloud network. When the occurrence of defects is constant, users give less preference to hiring service providers due to the reduced availability, reliability, and performance of these services [1].


The dependability assessment can minimize the occurrence of faults and failure events [2] in the private cloud and promote the levels of availability and reliability defined in the SLAs, avoiding the payment of contractual fines. One option to ensure the availability of services offered in the private cloud is to assign redundant equipment to its components. Redundant devices allow service reestablishment, minimizing the effects of failure events. The major problem with this assignment is the estimation of the amount of redundant equipment and the choice of the type of redundancy that must be considered to guarantee the quality of the service offered. The estimation of the type and amount of redundant equipment should also consider the cost of each type of redundancy mechanism attributed to the components of the cloud infrastructure [3, 4].

#### 2. Basic concepts

Dependability denotes the ability of a system to deliver a reliable service. Dependability measures are reliability, availability, maintainability, performability, safety, testability, confidentiality, and integrity [2].

Dependability evaluation is related to the study of the effect of errors, defects and failures in the system, since these have a negative impact on the dependability attributes. A fault is defined as the failure of a component, subsystem or system that interacts with the system in question [5]. An error is defined as a state that can lead to a failure. A defect represents the deviation from the correct operation of a system. A summary of the main measures of dependability is shown below.

The reliability of a system is the probability (P) that this system performs its function satisfactorily, without the occurrence of defects, for a certain period of time (T). Reliability is represented by Eq. (1), where T is a random variable that represents the time for occurrence of defects in the system [3, 4].

$$R(t) = P\{T > t\}, \quad t \ge 0 \tag{1}$$


The probability of the occurrence of defects up to a time t is represented by Eq. (2), where T is a random variable that represents the time to system failure [3, 4].

$$F(t) = 1 - R(t) = P\{T \le t\} \tag{2}$$

Eq. (3) represents the reliability, considering the density function f(t) of the time to occurrence of failures (T) in the system [3, 4, 6].

$$R(t) = P\{T > t\} = \int_t^\infty f(t)\,dt \tag{3}$$

The Mean Time to Failure (MTTF) is the average time for defects to occur in the system. When this average time follows the exponential distribution with parameter λ, the MTTF is represented by Eq. (4) [3, 4, 6].


$$\text{MTTF} = \int_0^\infty R(t)\,dt = \int_0^\infty e^{-\lambda t}\,dt = \frac{1}{\lambda} \tag{4}$$
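As a quick numerical check, Eqs. (1)-(4) can be evaluated directly. The minimal Python sketch below assumes an exponential failure law with an illustrative failure rate (not a value taken from this chapter) and verifies that integrating R(t) reproduces 1/λ:

```python
import numpy as np
from scipy.integrate import quad

# Exponential failure law (Eqs. (1)-(4)). The failure rate below is an
# illustrative assumption, not a value taken from this chapter.
lam = 2e-4  # failures per hour

def R(t):
    """Reliability, Eq. (1): probability of surviving beyond time t."""
    return np.exp(-lam * t)

def F(t):
    """Probability of failure up to time t, Eq. (2)."""
    return 1.0 - R(t)

# MTTF as the integral of R(t) over [0, inf), Eq. (4); should equal 1/lambda.
mttf_numeric, _ = quad(R, 0, np.inf)
print(f"R(1000 h) = {R(1000.0):.4f}, F(1000 h) = {F(1000.0):.4f}")
print(f"MTTF numeric = {mttf_numeric:.1f} h vs analytic 1/lambda = {1.0 / lam:.1f} h")
```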

Failures can be classified in relation to time, according to the mechanism that originated them. The behavior of the failure rate can be represented graphically through the bathtub curve, which presents three distinct phases: infant mortality (1), useful life (2), and aging (3). Figure 1 shows the variation of the failure rate of hardware components as a function of time [7].

Figure 1. Bathtub curve.

During the infant mortality phase (1), a reduction in the failure rate occurs. Failures during this period are due to equipment manufacturing defects. In order to shorten this period, manufacturers submit the equipment to a process called burn-in, in which it is exposed to high operating temperatures. In the useful life stage (2), failures occur randomly. Equipment reliability values provided by manufacturers apply to this period. The service life of the equipment is not normally a constant; it depends on the level of stress to which the equipment is subjected during that period. During the aging phase (3), an increase in the failure rate occurs.

In high availability environments, one must be sure that the infant mortality phase has passed. In some cases, it is necessary to leave the equipment running in a test environment during this time. At the same time, care must be taken to have the equipment replaced before entering the aging phase.
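A common way to reproduce the three phases of the bathtub curve is with a Weibull hazard rate, which decreases for shape β < 1 (infant mortality), is constant for β = 1 (useful life), and increases for β > 1 (aging). This parametric view is an illustration added here, not part of the cited discussion [7]:

```python
import numpy as np

# Weibull hazard rate h(t) = (beta/eta) * (t/eta)**(beta - 1).
# beta < 1: decreasing hazard (infant mortality, phase 1)
# beta = 1: constant hazard   (useful life, phase 2)
# beta > 1: increasing hazard (aging, phase 3)
# Shape and scale values are illustrative assumptions.
def weibull_hazard(t, beta, eta=500.0):
    return (beta / eta) * (t / eta) ** (beta - 1.0)

t = np.array([10.0, 100.0, 400.0, 800.0])
for beta, phase in [(0.5, "infant mortality"), (1.0, "useful life"), (3.0, "aging")]:
    print(f"beta = {beta} ({phase}): h(t) =", np.round(weibull_hazard(t, beta), 6))
```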

The availability of a system is the probability that this system is operational for a certain period of time, or has been restored after a defect has occurred. Uptime is the period of time in which the system is operational, downtime is the period of time when the system is not operational due to a defect or repair activity occurring, and uptime + downtime is the time period of observation of the system. Eq. (5) represents the availability of a system [3, 4, 6].

$$\mathbf{A} = \frac{\text{uptime}}{\text{uptime} + \text{downtime}} \tag{5}$$

Computational systems and applications require different levels of availability and therefore can be classified according to these levels. The U.S. Federal Aviation Administration's National Airspace System Reliability Handbook classifies computer systems and applications according to their criticality levels [1]. These computational systems and applications can be considered safety-critical when the required availability is 99.99999%, critical when the required availability is 99.999%, essential when the required availability is 99.9%, and routine when the required availability is 99% [1].
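Rearranging Eq. (5), each availability class implies an annual downtime budget. The short sketch below is illustrative, using the class names cited above:

```python
# Annual downtime budget implied by an availability level A (Eq. (5)
# rearranged: downtime = (1 - A) * observation period). The class names
# follow the criticality levels cited above.
HOURS_PER_YEAR = 8760.0

for label, a in [("routine", 0.99), ("essential", 0.999),
                 ("critical", 0.99999), ("safety-critical", 0.9999999)]:
    downtime_min = (1.0 - a) * HOURS_PER_YEAR * 60.0
    print(f"{label:>15}: A = {a:.7f} -> {downtime_min:10.2f} min/year")
```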


The maintainability is the probability that a system can be repaired in a given period of time (TR). The maintainability is described by Eq. (6), where TR denotes the repair time. This equation represents maintainability, since the repair time TR has a density function G(t) [3, 4, 6].

$$V(t) = P\{TR \le tr\} = \int_0^{tr} G(t)\,dt \tag{6}$$


The Mean Time to Repair (MTTR) is the average time needed to repair the system. When the repair time distribution is exponential with parameter μ, the MTTR is given by Eq. (7) [3, 4, 6].

$$\text{MTTR} = \int_0^{\infty} \left(1 - G(t)\right) dt = \int_0^{\infty} e^{-\mu t}\,dt = \frac{1}{\mu} \tag{7}$$

Mean Time Between Failures (MTBF) is the mean time between successive system failures, represented by Eq. (8) [3, 4, 6].

$$\text{MTBF} = \text{MTTR} + \text{MTTF} \tag{8}$$
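As a worked illustration of Eqs. (7) and (8) with hypothetical parameter values (the availability line additionally uses the standard steady-state relation A = MTTF/MTBF, assumed here rather than quoted from this section):

```python
# Hypothetical values: exponential repairs at rate mu, given MTTF.
mu = 0.5            # repair rate (assumed): 0.5 repairs per hour
mttf = 1000.0       # Mean Time to Failure in hours (assumed)

mttr = 1.0 / mu     # Eq. (7): MTTR = 1/mu -> 2.0 hours
mtbf = mttr + mttf  # Eq. (8): MTBF = MTTR + MTTF

# Steady-state availability, A = MTTF/MTBF (assumed standard relation).
availability = mttf / mtbf
print(f"MTTR = {mttr} h, MTBF = {mtbf} h, A = {availability:.6f}")
```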

Performability describes the degradation of system performance caused by the occurrence of defects [3, 6].

#### 3. Redundancy mechanisms

The redundancy mechanisms provide greater availability and reliability to the system during the occurrence of failure events due to the maintenance of components operating in parallel, that is, a redundant system has a secondary component that will be available when the primary component fails. Thus, redundancy mechanisms are designed to avoid single points of failure and therefore provide high availability and disaster recovery if necessary [1, 8].

The redundancy mechanisms can be classified as active-active and active-standby. Active-active redundancy mechanisms are employed when the primary and secondary components share the workload of the system. When any of these components fails, the other component will be responsible for servicing the system users' requests. These redundancy mechanisms can be classified as N + K, where K secondary components identical to the N primary components are required for system workload sharing. In the N + 1 configuration, one secondary component identical to the N primary components is required for sharing the system workload. In the N + 2 configuration, two secondary components identical to the N primary components are required for system workload sharing [1].
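Assuming independent, identical components, the benefit of an N + K arrangement can be quantified with the classical k-out-of-n availability formula; the sketch below is added for illustration and is not a formula given in the chapter:

```python
from math import comb

def n_plus_k_availability(n: int, k: int, a: float) -> float:
    """Probability that at least n of n + k independent components,
    each with availability a, are operational (k-out-of-n formula)."""
    total = n + k
    return sum(comb(total, i) * a**i * (1 - a)**(total - i)
               for i in range(n, total + 1))

# Example: N = 4 primary components, a = 0.99 each, with 0, 1 or 2 spares.
for k in (0, 1, 2):
    print(f"N+{k}: {n_plus_k_availability(4, k, 0.99):.6f}")
```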

The active-standby redundancy mechanisms are employed when the primary components meet the requests of the system users and the secondary components are on hold. When the primary components fail, the secondary components will be responsible for servicing the system users' requests. The active-standby redundancy mechanisms can be classified as hot standby, cold standby and warm standby [1].

In the hot standby redundancy mechanism, the redundant modules in standby operate in synchronization with the active module, although their computation is not considered by the system; when a failure event is detected, the redundant module is ready to become operational immediately [1, 4].

In the cold standby redundancy mechanism, the redundant modules are turned off and are activated, after a time interval, only when a failure event occurs. By hypothesis, the de-energized inactive modules do not fail, whereas the active module has a constant failure rate λ.

In the warm standby redundancy mechanism, the redundant modules in standby operate in sync with the active module, although their computation is not considered by the system. If a fault event is detected, a redundant module is ready to become operational after a time interval. Systems with cold standby or warm standby sparing need more time for recovery than systems with hot standby sparing, but they have the advantages of lower power consumption and no wear on the standby modules [1, 4].

#### 4. Modeling techniques


The models adopted for dependability evaluation can be classified as combinatorial and state space. The combinatorial models capture the conditions that cause failures in the system or allow its operation when considering the structural relationships of its components. The best known combinatorial models are Reliability Block Diagram (RBD) and Fault Tree (FT) [9, 10]. State space-based models represent the behavior of the system (occurrence of failures and repair activities) through its states and the occurrence of events. These models allow the representation of dependency relations between the components of the systems. The most widely used state space-based models are Markov Chains (MC) and Stochastic Petri Nets (SPN) [9–12].

The SPN models provide great flexibility in the representation of aspects of dependability. However, these models suffer from problems related to the size of the state space for computational systems with a large number of components [11, 12]. RBD models are simple, easy to understand, and their solution methods have been extensively studied. These models can represent the components of cloud computing that do not have dependency relations, allowing an efficient representation and avoiding excessive growth of the state space [9, 10].

#### 5. Reliability block diagram

Reliability block diagram (RBD) is one of the most widely used techniques for the reliability analysis of systems [5].

The RBD allows the calculation of availability and reliability by means of closed formulas, since it is a combinatorial model. These closed formulas make computing the result faster than, for example, simulation [6].

In a reliability block diagram, components are represented with blocks combined with other blocks (i.e., components) in series, parallel or combinations of those structures. A diagram that has components connected in series requires each component to be running for the system to be operational. A diagram that has components connected in parallel requires that only one component is working for the system to be operational [13]. Thus, the system is described as a set of interconnected functional blocks to represent the effect of availability and reliability of each block on the availability and reliability of the system [14].

The availability and reliability of n blocks connected in series are obtained through Eq. (9) [6].

$$P_s = \prod_{i=1}^{n} P_i(t) \tag{9}$$


where:

Pi(t) denotes the reliability Ri(t), the instantaneous availability Ai(t) or the steady-state availability Ai of the block Bi.

The availability and reliability of n blocks connected in parallel are obtained through Eq. (10) [6].

$$P_p = 1 - \prod_{i=1}^{n} \left(1 - P_i(t)\right) \tag{10}$$

where Pi(t) denotes the reliability Ri(t), the instantaneous availability Ai(t) or the steady-state availability Ai of the block Bi.

Figure 2 shows the connection of the blocks in series and Figure 3 shows the connection of the blocks in parallel.

The reliability block diagram is mainly used in modular systems consisting of many independent modules, where each can be easily represented by a block.
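Eqs. (9) and (10) translate directly into code. The following sketch, with assumed block availabilities chosen only for illustration, evaluates both compositions:

```python
from math import prod

def series(p):    # Eq. (9): all blocks must work
    return prod(p)

def parallel(p):  # Eq. (10): at least one block must work
    return 1 - prod(1 - pi for pi in p)

blocks = [0.99, 0.995, 0.98]   # availabilities of blocks B1..B3 (assumed)
print(f"series:   {series(blocks):.5f}")    # ~0.96530
print(f"parallel: {parallel(blocks):.7f}")  # ~0.9999990
```

As expected, the series structure is weaker than its weakest block, while the parallel structure is far stronger than any single block.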

Figure 2. Reliability block diagram in series.


Figure 3. Reliability block diagram in parallel.

#### 6. Petri nets


The concept of Petri nets was introduced by Carl Adam Petri in 1962 with the presentation of his doctoral thesis "Kommunikation mit Automaten" (Communication with Automata) [15] at the Faculty of Mathematics and Physics of Darmstadt University in Germany. Petri nets are graphical and mathematical tools used for the formal description of systems characterized by properties of concurrency, parallelism, synchronization, distribution, asynchronism, and nondeterminism [15].

The applicability of Petri nets as a tool for the study of systems is important because it allows the mathematical representation and analysis of models, and also provides useful information about the structure and dynamic behavior of the modeled systems. Petri nets can be applied in many areas (manufacturing systems, software development and testing, administrative systems, among others) [16].

Petri nets present some notable characteristics: the dynamic representation of the modeled system with the desired level of detail; a graphical and formal description that allows obtaining information on the behavior of the modeled system through its behavioral and structural properties; the representation of synchronism, asynchronism, competition and resource sharing, among other behaviors; and wide applicability and documentation.

Petri nets are formed by places, transitions, arcs and markings. The places correspond to state variables, and the transitions to actions or events performed by the system. The performance of an action is associated with some preconditions, that is, there is a relation between the places and the transitions that allows or prevents the accomplishment of a certain action. After an action is performed, some places will have their information changed, that is, the action creates a postcondition. The arcs represent the flow of the marking through the Petri net, and the tokens represent the state in which the system is at a given moment. Graphically, places are represented by ellipses or circles, transitions by rectangles, arcs by arrows and markings by means of dots (Figure 4) [16].


The two elements, place and transition, are interconnected by directed arcs, as shown in Figure 5. The arcs that connect places to transitions (Place → Transition) correspond to the relationship between the true conditions (preconditions) that enable the execution of the actions. The arcs that connect transitions to places (Transition → Place) represent the relationship between actions and the conditions that become true with the execution of the actions (postconditions) [16].

The formal mathematical representation of a Petri net (PN) model is the quintuple PN = (P, T, F, W, M0) [15], where P is the finite set of places, T is the finite set of transitions, F ⊆ (P × T) ∪ (T × P) is the set of arcs, W is the arc weight function and M0 is the initial marking.
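As a sketch of how this quintuple can be represented programmatically (an illustration added here; the chapter itself prescribes no implementation), the class below stores places, transitions with input/output arc weights, and a marking, and fires enabled transitions:

```python
# Minimal marked Petri net PN = (P, T, F, W, M0); a sketch, not a full tool.
# F and W are merged into per-transition input/output weight maps.
class PetriNet:
    def __init__(self, places, transitions, marking):
        self.places = places              # P: list of place names
        self.transitions = transitions    # T: {name: (inputs, outputs)}
        self.marking = dict(marking)      # M0: {place: token count}

    def enabled(self, t):
        inputs, _ = self.transitions[t]
        return all(self.marking[p] >= w for p, w in inputs.items())

    def fire(self, t):
        if not self.enabled(t):
            raise ValueError(f"transition {t} is not enabled")
        inputs, outputs = self.transitions[t]
        for p, w in inputs.items():       # consume precondition tokens
            self.marking[p] -= w
        for p, w in outputs.items():      # produce postcondition tokens
            self.marking[p] += w

# Two-place net: one token cycles between ON and OFF via fail/repair.
net = PetriNet(
    places=["ON", "OFF"],
    transitions={"fail":   ({"ON": 1},  {"OFF": 1}),
                 "repair": ({"OFF": 1}, {"ON": 1})},
    marking={"ON": 1, "OFF": 0},
)
net.fire("fail")
print(net.marking)   # {'ON': 0, 'OFF': 1}
```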

#### 6.1. Properties of Petri nets

The study of the properties of Petri nets allows the analysis of the modeled system. Property types can be divided into two categories: the initial marking-dependent properties, named behavioral properties, and the marking-independent properties, named structural properties [15, 16].

#### 6.1.1. Behavioral properties

The behavioral properties are those that depend on the initial marking of the Petri net. The properties covered are reachability, boundedness, safeness, liveness and coverage.

Reachability indicates the possibility that a given marking can be reached by firing a finite number of transitions from an initial marking. Given a marked Petri net RM = (R, M0), the firing of a transition t0 alters the marking of the Petri net. A marking M' is reachable from M0 if there is a sequence of transitions which, when fired, leads to M'. That is, if the marking M0 enables the transition t0, firing this transition reaches the marking M1. The marking M1 enables t1 which, upon being fired, reaches the marking M2, and so on until the marking M' is obtained.

Figure 4. Elements of Petri net.

Figure 5. Example of Petri net.

Let pi ∈ P be a place of a marked Petri net RM = (R, M0). This place is k-bounded (k ∈ ℕ), or simply bounded, if for every reachable marking M ∈ CA(R, M0), M(pi) ≤ k.

The bound k is the maximum number of tokens that a place can accumulate. A marked Petri net RM = (R, M0) is k-bounded if the number of tokens in each place of RM does not exceed k in any reachable marking (max(M(p)) = k, ∀ p ∈ P).

Safeness is a particular case of boundedness. The concept of boundedness establishes that a place pi is k-bounded if the number of tokens that this place can accumulate is limited to k. A place that is 1-bounded is simply called safe.

Liveness is defined according to the firing possibilities of the transitions. A Petri net is considered live if, regardless of the markings reachable from M0, it is always possible to fire any transition of the Petri net through some sequence of transitions L(M0). The absence of deadlock in systems is strongly linked to the concept of liveness, since a deadlock in a Petri net is the impossibility of firing any transition. The fact that a system is deadlock-free does not mean that it is live; however, a live system implies a deadlock-free system.

The concept of coverage is associated with the concepts of reachability and liveness. A marking Mi is covered if there is a marking Mj ≠ Mi such that Mj ≥ Mi.
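Behavioral properties such as boundedness can be checked on small nets by exhaustively enumerating the reachability set. The sketch below reuses the hypothetical PetriNet class from the previous example; note that the enumeration only terminates for bounded nets:

```python
def reachable_markings(net):
    """Exhaustive exploration of the reachability set CA(R, M0).
    A sketch for small, bounded nets; not an industrial tool."""
    start = tuple(net.marking[p] for p in net.places)
    seen, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        for t in net.transitions:
            net.marking = dict(zip(net.places, current))  # reset state
            if net.enabled(t):
                net.fire(t)
                m = tuple(net.marking[p] for p in net.places)
                if m not in seen:
                    seen.add(m)
                    frontier.append(m)
    return seen

markings = reachable_markings(net)
bounds = {p: max(m[i] for m in markings) for i, p in enumerate(net.places)}
print(bounds)   # {'ON': 1, 'OFF': 1} -> every place is 1-bounded (safe)
```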

#### 6.1.2. Structural properties


The structural properties are those that depend only on the structure of the Petri net. These properties reflect marking-independent characteristics. The properties analyzed in this work are structural boundedness and consistency.

A Petri net R = (P, T, F, W, M0) is classified as structurally bounded if it is bounded for any initial marking.

The Petri net is considered consistent if, by firing a sequence of enabled transitions from a marking M0, it returns to M0 while firing every transition of the net at least once.

Let RM = (R, M0) be a marked Petri net and s a sequence of transitions; RM is consistent if M0 [s> M0 and every transition Ti fires at least once in s.

#### 6.2. Stochastic Petri net

Stochastic Petri Net (SPN) [11] is one of the extensions of Petri nets (PN) [15] used for performance and dependability modeling. A stochastic Petri net adds time to the Petri net formalism: the times associated with timed transitions are exponentially distributed, while the time associated with immediate transitions is zero. The timed transitions model activities through the associated times, so that the period during which a timed transition is enabled corresponds to the activity execution period, and the firing of the timed transition corresponds to the end of the activity. Different priority levels can be assigned to transitions; the firing priority of immediate transitions is higher than that of timed transitions. Priorities can resolve situations of confusion [12], and the firing probabilities associated with immediate transitions can resolve conflict situations [4, 5].

Timed transitions can be characterized by different memory policies such as Resampling, Enabling memory and Age memory [5]. The timed transitions can also be characterized by different firing semantics named single server, multiple server and infinite server [5].

#### 6.3. Phase approximation technique

SPN models consider only immediate transitions and timed transitions with exponentially distributed firing times. These transitions model actions, activities, and events. A variety of activities can be modeled through constructs such as throughput subnets and s-transitions. These constructs are used to represent expolynomial distributions, such as the Erlang, hypoexponential and hyperexponential distributions [9].

The phase approximation technique can be applied to model non-exponential actions, activities, and events through moment matching. The method calculates the first moment around the origin (mean) and the second central moment (variance) of the data and estimates the respective moments of the s-transition [9].

Performance and dependability data measured or obtained from a system (an empirical distribution) with mean μ and standard deviation σ can have their stochastic behavior approximated through the phase approximation technique. The inverse of the coefficient of variation of the measured data (Eq. (11)) allows the selection of the expolynomial distribution that best fits the empirical distribution. This empirical distribution can be continuous or discrete. Among the continuous distributions are the Normal, Lognormal, Weibull, Gamma, Continuous Uniform, Pareto, Beta and Triangular; among the discrete distributions are the Geometric, Poisson and Discrete Uniform [8].

$$\frac{1}{CV} = \frac{\mu_D}{\sigma_D} \tag{11}$$


The Petri net described in Figure 6 represents a timed activity with a generic probability distribution. Depending on the inverse of the coefficient of variation of the measured data (Eq. (11)), the activity is assigned one of these distributions: Erlang, hypoexponential or hyperexponential. When the inverse of the coefficient of variation is an integer different from one, the data are characterized by the Erlang distribution. When the inverse of the coefficient of variation is a number greater than one (but not an integer), the data are represented by the hypoexponential distribution. When the inverse of the coefficient of variation is a number smaller than one, the data are characterized by a hyperexponential distribution.

Figure 6. Empirical distribution.

#### 7. Modeling strategy


The dependability metrics can be calculated using state space-based models (e.g., SPN) and combinatorial models (e.g., RBD). The RBDs have an advantage in producing results, as their closed formulas allow faster calculation than the simulations and numerical analyses of the SPNs. However, SPNs have a greater power of representation [3, 17].

State space-based models can describe dependencies that allow the representation of complex redundancy mechanisms. However, these models can generate a very large or even infinite number of states when they represent highly complex systems [3, 12, 17].

The combination of state space-based models and combinatorial models allows for the reduction of complexity in the representation of systems. RBD models can represent the components of the cloud computing [6]. These RBD models are used to estimate the availability and downtime of the cloud computing when there is little dependency relation between the components of this environment and the redundancy mechanisms adopted. If there is a need to represent a greater dependency between the components of the cloud computing and the redundancy mechanisms used, SPN models are used to represent the computational cloud systems [11].

Figure 7 shows a basic SPN model that allows a representation of the cloud computing. In this SPN model, the ON and OFF places represent the working or faulted computational cloud. The attributes of the transitions of this SPN model are presented in Table 1.
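For this two-state model the stationary solution has the closed form A = MTTF/(MTTF + MTTR). The sketch below, with hypothetical XMTTF and XMTTR values, cross-checks the closed form against a simple Monte Carlo simulation of the ON/OFF cycle:

```python
import random

MTTF, MTTR = 1000.0, 2.0   # hours (hypothetical XMTTF, XMTTR values)

# Closed form for the two-state model of Figure 7.
print(f"analytic A  = {MTTF / (MTTF + MTTR):.6f}")

# Monte Carlo cross-check: alternate exponential up and down times.
random.seed(1)
up = down = 0.0
for _ in range(100_000):
    up += random.expovariate(1 / MTTF)
    down += random.expovariate(1 / MTTR)
print(f"simulated A = {up / (up + down):.6f}")
```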

Figure 8 shows a basic RBD model that allows a representation of the cloud computing. The parameters of the RBD model are presented in the Table 2.

Figure 7. Basic SPN model.


| Transition | Type | Time | Weight | Concurrence |
|---|---|---|---|---|
| MTTF | exp | XMTTF | – | SS |
| MTTR | exp | XMTTR | – | SS |

Table 1. Attributes of the SPN model transitions.

Figure 8. Basic RBD model.


| Parameters | Description |
|---|---|
| MTTFBlock | Mean Time to Failure |
| MTTRBlock | Mean Time to Repair |

Table 2. Parameters of the RBD model.

#### 7.1. Cloud computing model

Cloud computing consists of the Cloud Controller (Controller), Node Controller (Node) and Network equipment (Network). Figure 9 shows the RBD model of the cloud computing. The parameters of the RBD model of cloud computing are presented in Table 3. Figure 10 shows the SPN model of the cloud computing. The attributes of the SPN Model Transitions of cloud computing are presented in Table 4. In RBD and SPN models, the cloud controller and node controller are configured on different physical machines. The node controller enables the instantiation of virtual machines. The physical machines where the components of cloud computing are configured are connected through a switch and a router. All components of cloud computing must be operational for the cloud computing to be operational. These components can be described as Controller, Node, and Network. In this way, the operating mode of cloud computing is OM = (Controller ∧ Node ∧ Network).
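Because OM = (Controller ∧ Node ∧ Network) is a pure series structure, the RBD of Figure 9 reduces to Eq. (9). The sketch below uses hypothetical MTTF/MTTR values for the Table 3 parameters; the chapter itself does not fix numbers:

```python
def block_availability(mttf, mttr):
    """Steady-state availability of a single block, A = MTTF/(MTTF+MTTR)."""
    return mttf / (mttf + mttr)

# Hypothetical parameter values for illustration only.
components = {
    "Controller": block_availability(mttf=2000.0, mttr=4.0),
    "Node":       block_availability(mttf=1500.0, mttr=4.0),
    "Network":    block_availability(mttf=5000.0, mttr=2.0),
}

# OM = (Controller AND Node AND Network): series composition, Eq. (9).
a_cloud = 1.0
for name, a in components.items():
    a_cloud *= a
    print(f"A({name}) = {a:.6f}")
print(f"A(cloud) = {a_cloud:.6f}")
```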

Figure 9. RBD model of the cloud computing.

| Parameters | Description |
|---|---|
| MTTFController, MTTFNode, MTTFNetwork | Mean Time to Failure of the controller, node and network |
| MTTRController, MTTRNode, MTTRNetwork | Mean Time to Repair of the controller, node and network |

Table 3. Parameters of the RBD model of the cloud computing.

Figure 10. SPN model of the cloud computing.

| Transition | Type | Time | Weight | Concurrence | Enable function |
|---|---|---|---|---|---|
| ControllerMTTF, NodeMTTF, NetworkMTTF | exp | XMTTF | – | SS | – |
| ControllerMTTR, NodeMTTR, NetworkMTTR | exp | XMTTR | – | SS | – |
| CloudMTTF | imme | – | 1 | – | (#ControllerON = 0) OR (#NodeON = 0) OR (#NetworkON = 0) |
| CloudMTTR | imme | – | 1 | – | NOT((#ControllerON = 0) OR (#NodeON = 0) OR (#NetworkON = 0)) |

Table 4. Attributes of the SPN model transitions of the cloud computing.


Cloud computing consists of the Cloud Controller (Controller), Node Controller (Node) and Network equipment (Network). The Cloud Controller has a hot standby redundancy, but the other components (Node and Network) can also be assigned this redundancy. The main cloud controller (ControllerMain) and the redundant cloud controller (ControllerStandby) in hot standby are both operational [3, 4]. The operating mode of cloud computing with a redundant cloud controller in hot standby is OM = ((ControllerMain ∨ ControllerStandby) ∧ Node ∧ Network). Figure 11 shows the RBD model adopted to estimate the availability of cloud computing with a redundant cloud controller in hot standby.
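With the controller pair in hot standby, the RBD of Figure 11 places the two controllers in parallel (Eq. (10)) and that block in series with Node and Network. A sketch continuing the hypothetical availabilities of the previous example:

```python
def series(*blocks):
    result = 1.0
    for a in blocks:
        result *= a
    return result

def parallel(*blocks):
    result = 1.0
    for a in blocks:
        result *= (1.0 - a)
    return 1.0 - result

# Same assumed values as in the previous sketch (MTTF/(MTTF+MTTR)).
a_controller, a_node, a_network = 0.998004, 0.997340, 0.999600

a_no_redundancy = series(a_controller, a_node, a_network)
a_hot_standby = series(parallel(a_controller, a_controller), a_node, a_network)
print(f"no redundancy: {a_no_redundancy:.6f}")
print(f"hot standby:   {a_hot_standby:.6f}")
```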



Figure 11. RBD model of the cloud computing with redundant cloud controller in hot standby.

Cloud computing consists of the Cloud Controller (Controller), Node Controller (Node) and Network equipment (Network). The Cloud Controller has a cold standby redundancy, but the other components (Node and Network) can also be assigned this redundancy. The main cloud controller (ControllerMain) is operational and the redundant cloud controller (ControllerStandby) is non-active, waiting to be activated when the main cloud controller fails. Thus, when the main cloud controller fails, the activation of the redundant cloud controller occurs within a certain period of time, named the Mean Time to Active (MTA) [3, 4]. The operating mode of cloud computing with a redundant cloud controller in cold standby is OM = ((ControllerMain ∨ ControllerStandby) ∧ Node ∧ Network). Figure 12 shows the SPN model adopted to estimate the availability of cloud computing with a redundant cloud controller in cold standby.

Figure 12. SPN model of the cloud computing with redundant cloud controller in cold standby.

Cloud computing consists of the Cloud Controller (Controller), Node Controller (Node) and Network equipment (Network). The Cloud Controller has a warm standby redundancy, but the other components (Node and Network) can also be assigned this redundancy. The main cloud controller (ControllerMain) is backed by a non-active redundant cloud controller (ControllerStandby) that waits to be activated when the main cloud controller fails. The difference with respect to cold standby redundancy is that the main and redundant cloud controllers have a failure rate λ when they are in operation, while the redundant cloud controller has a failure rate φ when in standby, considering that 0 ≤ φ ≤ λ [3, 4]. The redundant cloud controller (ControllerStandby) starts in idle mode. When the main cloud controller (ControllerMain) fails, the timed SpareActive transition fires; this firing represents the start of the redundant cloud controller operation. The time associated with the SpareActive timed transition represents the Mean Time to Active (MTA). The SpareNActive immediate transition represents the return of the main module to operational mode. The operating mode of cloud computing with a redundant cloud controller in warm standby is OM = ((ControllerMain ∨ ControllerStandby) ∧ Node ∧ Network). Figure 13 shows the SPN model adopted to estimate the availability of cloud computing with a redundant cloud controller in warm standby.
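The standby behavior can also be solved numerically as a small continuous-time Markov chain. The sketch below is a deliberately simplified three-state abstraction of the controller subsystem (it ignores spare failures while in standby, so it illustrates the solution technique rather than reproducing the chapter's full SPN): state 0 is the main controller operational, state 1 is the main failed with the spare activating (mean activation time MTA), and state 2 is the spare operational while the main is repaired.

```python
import numpy as np

lam = 1 / 2000.0    # main controller failure rate (assumed)
alpha = 1 / 0.5     # activation rate = 1/MTA, with MTA = 0.5 h (assumed)
mu = 1 / 4.0        # repair rate of the main controller (assumed)

# Generator matrix Q of the simplified 3-state chain.
Q = np.array([
    [-lam,  lam,    0.0],
    [ 0.0, -alpha,  alpha],
    [ mu,   0.0,   -mu],
])

# Solve pi Q = 0 with sum(pi) = 1 by replacing one balance equation.
A = np.vstack([Q.T[:-1], np.ones(3)])
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.solve(A, b)

availability = pi[0] + pi[2]   # unavailable only while the spare activates
print(f"pi = {pi.round(6)}, controller availability = {availability:.6f}")
```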

Figure 13. SPN model of the cloud computing with redundant cloud controller in warm standby.

#### 8. Conclusions


This chapter presents concepts on dependability, redundancy mechanisms, stochastic Petri nets and reliability block diagrams. In addition, this chapter also shows how the mathematical formalisms of stochastic Petri nets and reliability block diagrams can be adopted for modeling cloud infrastructures with cold standby, warm standby and hot standby redundancy mechanisms. Reliability block diagrams are adopted to model cloud infrastructures with the hot standby redundancy mechanism, and stochastic Petri nets are used to model cloud infrastructures with the cold standby and warm standby redundancy mechanisms.

#### Author details

Erica Teixeira Gomes de Sousa\* and Fernando Antonio Aires Lins

\*Address all correspondence to: erica.sousa@ufrpe.br

Department of Statistics and Informatics, Federal Rural University of Pernambuco, Brazil

#### References


[1] Bauer E, Adams R. Reliability and Availability of Cloud Computing. Wiley Online Library; 2012

[2] Laprie JCC, Avizienis A, Kopetz H. Dependability: Basic Concepts and Terminology. Secaucus, NJ, USA: Springer-Verlag New York, Inc; 1992

[3] Kuo W, Zuo MJ. Optimal Reliability Modeling: Principles and Applications. Wiley; 2002

[4] Rupe JW. Reliability of computer systems and networks fault tolerance, analysis, and design. IIE Transactions. 2003;35(6):586-587

[5] Maciel P, Trivedi K, Matias R, Kim D. Performance and Dependability in Service Computing: Concepts, Techniques and Research Directions. IGI Global; 2011

[6] Xie M, Dai YS, Poh KL. Computing System Reliability: Models and Analysis. US: Springer; 2004

[7] Ebeling CE. An Introduction to Reliability and Maintainability Engineering. Waveland Pr Inc; 2009

[8] Schmidt K. High Availability and Disaster Recovery: Concepts, Design, Implementation. Vol. 22. Berlin Heidelberg: Springer-Verlag; 2006

[9] Sahner RA, Trivedi K, Puliafito A. Performance and Reliability Analysis of Computer Systems: An Example-Based Approach Using the SHARPE Software Package. New York, US: Springer; 1996

[10] Trivedi KS. Probability & Statistics with Reliability, Queuing and Computer Science Applications. 2nd ed. Wiley; 2001

[11] German R. Performance Analysis of Communication Systems with Non-Markovian Stochastic Petri Nets. New York, NY, USA: John Wiley & Sons, Inc; 2000

[12] Marsan MA, Balbo G, Conte G, Donatelli S, Franceschinis G. Modelling with Generalized Stochastic Petri Nets. ACM SIGMETRICS Performance Evaluation Review. Vol. 26. New York, NY, USA; 1998

[13] Trivedi KS, Hunter S, Garg S, Fricks R. Reliability analysis techniques explored through a communication network example. International Workshop on Computer-Aided Design, Test, and Evaluation for Dependability; 1996

[14] Smith DJ. Reliability, Maintainability and Risk: Practical Methods for Engineers. Butterworth-Heinemann; 2011

[15] Murata T. Petri nets: Properties, analysis and applications. Proceedings of the IEEE. 1989;77(4):541-580

[16] Maciel PRM, Lins RD, Cunha PRF. Introduction of the Petri Net and Applied. Campinas, SP: X Escola de Computação; 1996

[17] Balbo G. Introduction to Stochastic Petri Nets. Lectures on Formal Methods and Performance Analysis: First EEF/Euro Summer School on Trends in Computer Science, Berg en Dal, The Netherlands, July 3–7, 2000, Revised Lectures. Springer; 2000

**Chapter 3**


**Continuous Anything for Distributed Research Projects**

Simon Volpert, Frank Griesinger and Jörg Domaschka

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/intechopen.72045

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### **Abstract**


International research projects involve large, distributed teams made up from multiple institutions. These teams create research artefacts that need to work together in order to demonstrate and ship the project results. Yet, in these settings the project itself is almost never in the core interest of the partners in the consortium. This leads to a weak integration incentive and consequently to last minute efforts. This in turn results in Big Bang integration that imposes huge stress on the consortium and produces only nonsustainable results. In contrast, industry has been profiting from the introduction of agile development methods backed by Continuous Delivery, Continuous Integration, and Continuous Deployment. In this chapter, we identify shortcomings of this approach for research projects. We show how to overcome those in order to adopt all three methodologies regarding that scope. We also present a conceptual, as well as a tooling framework to realise the approach as Continuous Anything. As a result, integration becomes a core element of the project plan. It distributes and shares responsibility of integration work among all partners, while at the same time clearly holding individuals responsible for dedicated software components. Through a high degree of automation, it keeps the overall integration work low, but still provides immediate feedback on the quality of the software. Overall, we found this concept useful and beneficial in several EU-funded research projects, where it significantly lowered integration effort and improved quality of the software components while also enhancing collaboration as a whole.

**Keywords:** Continuous Delivery, Continuous Integration, Continuous Deployment, project management, software quality, DevOps, distributed software development

#### **1. Introduction**

The rise of agile software engineering strategies has leveraged the realisation of Continuous Integration proposed by Grady Booch as early as 1991 [18]. Continuous Integration propagates a constant integration of changes to code as opposed to Big Bang integration done at the end of a development cycle. It has, in turn, paved the path to Continuous Delivery and Continuous Deployment of software components and entire software platforms. The core idea of Continuous Delivery is to be able to roll out new releases at any time and not only at the end of larger development cycles. In order to reach this goal, Continuous Delivery demands the automation of all steps required to compile, bundle, test, and release the software. Testing ranges from unit tests targeting a single software component, over integration tests, acceptance tests to user acceptance tests. While this approach is emergently successful in industry, it is barely used for scientific software, neither is it used in collaborative research environments.

of the task, automating the necessary steps, monitoring the status of, and gaining confidence in the produced software. The main contribution of this chapter is a concise technical and organisational framework for Continuous Integration, Continuous Delivery, and Continuous Deployment in distributed (research) projects. The framework is based on our experience in half a dozen midsized EC projects of 5–25 partners and several smaller sized national-funded research projects. It fairly distributes work among partners and improves overall code quality.

Continuous Anything for Distributed Research Projects http://dx.doi.org/10.5772/intechopen.72045 25

The rest of this chapter is structured as follows. Section 2 identifies the requirements in more detail and presents related work. Sections 3–5 introduce background on Continuous Integration, Continuous Delivery, and Continuous Deployment, respectively. Section 6 presents our framework on both conceptual and tooling level while Section 7 discusses the approach. Section 8

International ICT research projects involve large, distributed teams (consortia) made up from multiple companies or institutions, so called partners or beneficiaries. These teams create research software artefacts that need to work together in order to demonstrate and ship the project results. In the following, we analyse the challenges of such constellations, and why this requires a special integration strategy. Finally, we carve out the requirements towards

Development in distributed, that is, non co-located, teams is challenging, as the distribution aspect hinders communication. For instance, meetings and synchronisation actions barely can happen in a timely or even ad hoc manner, causing delays. From a technical point of view this may lead to diverging developments at different locations. From an organisational point of view,

Koetter et al. [10] identify major problems in distributed teams and particularly with respect to research projects. At the core of their analysis, they identify the team distribution and lacking stakeholder commitment as major problems. The former complicates team communication leading to a lack of internal communication. The latter, a consequence of diverging goals and different

causes is further increased by different cultural and technical background, etiquette, company

In order to cope with the diversity and resulting centrifugal forces, it is common that project management applies rules: these range from a common toolset and document template to regular virtual and physical meetings; both intended to improve communication. Additionally,

Please note that "integration" as in "Continuous Integration" has a different meaning than "prototype integration". As defined later, the first tackles integration on code level, while the later addresses the integration of distributed software

The impact of these

(research) interests, leads to a lack of incentives for prototype integration.<sup>1</sup>

policies, and high personal fluctuation in research projects.

architecture. In order to avoid misunderstandings, we will always use the full terms.

fairly distributes work among partners and improves overall code quality.

concludes and gives an outlook on future work.

**2. Problem statement and related work**

such an integration strategy and discuss related work.

**2.1. Challenges with distributed research teams**

it causes overhead.

1

In contrast to industrial projects and even open-source software projects, in distributed research projects (for instance, larger EU-funded ICT projects) the project itself is almost never in the core interest of the partners forming the project consortium. Instead, every partner is interested in the niche aspect that made it join the project, that makes the consortium look complete, and that gives the consortium a complementary appearance. In reality, though, the agendas of the project partners are often driven by their individual interests; academic partners in particular have an obvious interest in the actual research aspect of the work and are less focussed on the provisioning of dependable, sustainable software artefacts. Neither are they primarily interested in the common, integrated, stable software platform. In practice, this may mean that Partner A wants to improve on an algorithm they have, while Partner B would define a domain-specific language (DSL) for a specific scope and Partner C will provide an improved kernel module for handling I/O on solid-state disks. From a research point of view, this means that the main work for Partner A will be on the definition of the algorithm, its implementation, and its evaluation in some limited, publishable scope. For Partner B, the main work will be on the definition of the DSL and on applying and realising it in two or three use-case scenarios. Partner C's work will be on the definition of the new approach and on the realisation and evaluation of the kernel module, probably for one specific version of Linux.

While the behaviour of all three partners is fully legitimate and understandable, it is the nature of a distributed research project that interdependencies between parts of the software exist. Usually, software integration is required at certain project milestones where prototypes should be released and new, emergent features be demonstrated to the public or at least the funders. Here, the lack of common interest in the project, in combination with the described "research-style" code quality, makes the integration a painful, cumbersome, and frustrating task. Our experience shows that in many projects the task of integrating software from different partners is outsourced into a dedicated project work package and then assigned to one or at most two partners that were not or only marginally involved in improving the algorithm from Partner A, developing the DSL from Partner B, and realising the kernel module of Partner C. Furthermore, in many projects the whole integration of all software components is done in a Big Bang style before a review or before an obligatory software release and, even worse, often performed by a single individual. This poor devil ends up integrating and fixing several dozen software components (s)he has not developed, does not own, and has never been responsible for.

We argue that instead of putting all integration responsibility and work on the shoulders of a single individual, it is far better to spread out the work among project partners and make it everybody's task. We further believe that the techniques and strategies offered by **Continuous Integration**, **Continuous Delivery**, and **Continuous Deployment** are beneficial for enforcing the distribution of the task, automating the necessary steps, monitoring the status of, and gaining confidence in the produced software. The main contribution of this chapter is a concise technical and organisational framework for Continuous Integration, Continuous Delivery, and Continuous Deployment in distributed (research) projects. The framework is based on our experience in half a dozen mid-sized EC projects of 5–25 partners and several smaller, nationally funded research projects. It fairly distributes work among partners and improves overall code quality.

The rest of this chapter is structured as follows. Section 2 identifies the requirements in more detail and presents related work. Sections 3–5 introduce background on Continuous Integration, Continuous Delivery, and Continuous Deployment, respectively. Section 6 presents our framework on both conceptual and tooling level while Section 7 discusses the approach. Section 8 concludes and gives an outlook on future work.

#### **2. Problem statement and related work**


International ICT research projects involve large, distributed teams (consortia) made up of multiple companies or institutions, so-called partners or beneficiaries. These teams create research software artefacts that need to work together in order to demonstrate and ship the project results. In the following, we analyse the challenges of such constellations and why they require a special integration strategy. Finally, we carve out the requirements towards such an integration strategy and discuss related work.

#### **2.1. Challenges with distributed research teams**

Development in distributed, that is, non-co-located, teams is challenging, as the distribution aspect hinders communication. For instance, meetings and synchronisation actions can barely happen in a timely or even ad hoc manner, causing delays. From a technical point of view, this may lead to diverging developments at different locations. From an organisational point of view, it causes overhead.

Koetter et al. [10] identify major problems in distributed teams, particularly with respect to research projects. At the core of their analysis, they identify team distribution and lacking stakeholder commitment as the major problems. The former complicates team communication, leading to a lack of internal communication. The latter, a consequence of diverging goals and different (research) interests, leads to a lack of incentives for prototype integration.<sup>1</sup> The impact of these causes is further increased by different cultural and technical backgrounds, etiquette, company policies, and high personnel fluctuation in research projects.

In order to cope with the diversity and resulting centrifugal forces, it is common that project management applies rules: these range from a common toolset and document template to regular virtual and physical meetings, both intended to improve communication. Additionally, most projects appoint a central technical responsible whose role is to break ties in technical discussions. Finally, technical work is often separated such that local teams at partner sites work independently on certain sub-topics, producing isolated assets.

<sup>1</sup> Please note that "integration" as in "Continuous Integration" has a different meaning than "prototype integration". As defined later, the former tackles integration on code level, while the latter addresses the integration of the distributed software architecture. In order to avoid misunderstandings, we will always use the full terms.


From our experience, these measures usually work fine and minimise the tension in the consortium. The lack of common goals usually gets masked by introducing a storyline every partner can agree on. Yet, we claim that the only aspect that cannot be handled by these measures is the work to be done for prototype integration, because it requires that components developed in isolation work together smoothly despite the weak communication and lack of common goals. The often-practised Big Bang integration of artefacts causes a lot of work, troubles the consortium, and results in poor software quality.

#### **2.2. The case for Continuous Anything**

Understanding and accepting that Big Bang integration causes pain and sub-optimal results leads to the insight that a different prototype integration strategy is needed. Ironically, the software development industry was facing similar issues decades ago [16], which led to the abandonment of the waterfall model and the introduction of so-called agile development methodologies. These were stated in the agile manifesto [17] and are being realised by methodologies such as extreme programming, Scrum, or Kanban.

All of these methodologies assume co-located teams with large common interests and a high intrinsic motivation to deliver high quality, usable software. In consequence, they cannot be applied directly to research projects that do not fulfil the necessary preconditions. Nonetheless, at the core of their prototype integration<sup>2</sup> methodology, agile methodologies rely on a highly automated, frequently executed, and constant process to reduce the possibility for human errors and to obtain continuously executable software artefacts.

While such an approach requires an upfront and constant investment in prototype integration, the overall amount of effort needed per partner and particularly per consortium is likely to be a lot less compared to Big Bang integration. This is due to the fact that changes are small and can be easily reviewed. Moreover, the use of automation allows dealing with the complexity of even larger and more diverse teams.

#### **2.3. Constraints and requirements**

We claim that automation can reduce the pain of prototype integration in (large) research projects. Yet, as with improving communication within the consortium, introducing an automated process will not happen for free. Work from the project management is needed to establish and enforce such a process, which may cause resistance.

Therefore, our major aim is to minimise the upfront investment of project partners and the management effort needed to enforce the strategy. The overall goal is to develop an automated prototype integration schema that takes into account the specific needs of research consortia. This is broken down into particular requirements and challenges presented in the following sections.

<sup>2</sup> For industry it should rather be "product integration".

#### *2.3.1. General requirements*


This section covers requirements towards automating prototype integration. We present them from the perspective of a research project, but they can be applied in different settings.

**R.1 (distribute work among partners):** As partners have no common interest in and no incentives for prototype integration, it is necessary not to burden one team or even one individual with this task, but to achieve a fair distribution of work among partners.

**R.2 (reduce manual burden):** Integration work distracts researchers from their work, and so does testing. Therefore, as much as possible of the prototype integration work and testing should be automated. This includes reporting whether the code is currently working or not.

**R.3 (denote responsible persons):** With a high degree of automation and the capability to identify non-working portions of the code, denoting responsible persons for individual software components, libraries, and features allows explicitly tasking them with fixing the non-working parts of the architecture.

**R.4 (make the product easy to start and use):** Having a project outcome that is easy to install, to start, and to demonstrate tremendously reduces the burden when planning for a review, a demo, or a webinar.

**R.5 (clarify big picture and software dependencies):** In large software projects, it is often the case that the big picture is forgotten. In particular, in research projects, researchers lose themselves in details of research questions. Hence, it is important to keep an eye on the overall architecture and ensure that the interactions between components work as intended.

**R.6 (make the software status visible):** Making the product (including its sub-products) visible helps consortium members understand what the others are doing. It also helps use-case partners give feedback on the project and the work currently done.

#### *2.3.2. Specific challenges of research projects*

Besides the generic requirements that can be found in many distributed teams, the fact that distributed research projects are often executed by loosely coupled beneficiaries creates further technical challenges.

**R.A (cater for closed code or even unavailable source code and binaries):** Due to different commercial interests of partners, it may be the case that some of them release source code only in a restricted manner or not at all. In some cases, partners do not even release binaries to the rest of the consortium. The overall prototype integration strategy has to be able to deal with this.

**R.B (support fluctuation of team members):** Research projects have a very high fluctuation of team members. As the budget is fixed, it happens that more people come into the project towards the end, if budget is still available. On the other hand, if the project is short on budget, expensive, that is, senior, researchers are moved away and juniors or even undergrads join. This forbids hidden, that is, implicit, agreements between individuals.

**R.C (keep track of targeted outcomes):** The fluctuation of team members and the dynamics of IT research lead to frequently changing technical goals of the research project. As a consequence, the current goals need to be documented and accessible to all project members in order to keep a common focus. Ideally, they are immediately visible to all contributors.


**R.D (support different configurations):** Often, research projects do not build generic solutions, but only demonstrators for specific use cases. The prototype integration strategy has to be able to deal with this and provide the capability to set up different environments.

**R.E (allow different programming languages and development methodologies):** Research projects barely start from scratch, as many partners continue earlier work. Further, the knowledge and suitability of languages is very specific to problem domains. Therefore, a prototype integration strategy has to be open to different programming languages.

#### **2.4. Approach**

In industrial contexts, solving integration and communication issues is realised by introducing three kinds of orthogonal, but complementary approaches: *Continuous Integration* handles the integration on code level per component. It builds and tests the component whenever a new version of the code is available. *Continuous Delivery* is concerned with taking the new version of a component and packing it in a shippable box. In addition, it runs integration tests to ensure the interplay with other components. Finally, *Continuous Deployment* takes the component and installs it in a pre-defined environment.

In Sections 3–5, we show that adapted processes for Continuous Integration, Continuous Delivery, and Continuous Deployment can indeed overcome integration issues in distributed research projects. In addition, Section 6 presents a set-up that is able to deal with the requirements from Section 2.3.

#### **2.5. Related work**

While there is a lot of literature on DevOps [15], agile methods, Continuous Integration, Continuous Delivery [13], and Continuous Deployment, not much can be found with respect to academia and academia/industry collaboration. Eckstein provides guidelines for distributed teams [11].

Rother [1] lays part of the foundation of what is today perceived as DevOps by presenting the production pipelines and methodologies of Toyota. Being more on the cultural side of the DevOps and CI/CD spectrum, Davis and Daniels [14] and Sharma [12] give some insights on how to bring these ideas to industry.

Especially Continuous Integration was significantly influenced by Duvall et al. [2]. There, the authors describe most of the paradigms important for Continuous Integration. These are still valid today and are considered as de-facto standard. Fowler's influential articles on Continuous Integration, for example [5], and testing, for example, through micro-service scenarios [4], lay the foundation on what is being perceived as Continuous Integration along with best practices.

Regarding academia, there is ongoing effort in bringing Continuous Integration and Continuous Delivery to teaching. Eddy et al. [7] describe how they implement a pipeline for supporting their lecture on modern development practices. An academic view on Continuous Experimentation is brought up by Fagerholm et al. [9] by investigating multiple use-cases of industry partners. They analyse the demands and propose solutions to create experimentation-happy environments utilising Continuous Integration and Continuous Delivery.

On academia/industry collaboration, Sandberg and Crnkovic [6] and Guillot et al. [8] investigate challenges between those parties and how to solve them with agile methods. Both analyse the adaptation of the rather strict Scrum methodology to said collaboration in multiple case studies, with positive results. However, that approach, too, is highly dependent on team agreement.

Koetter et al. analyse the characteristics and problems of software development in distributed teams in research projects [10]. They give a literature review of common problems and typical solutions. With the focus on Software Architecture, the authors summarise the issues and sketch solution approaches on a methodological level.

#### **3. Background: Continuous Integration**

This section gives an introduction to the concepts of Continuous Integration. The next sub-section defines the scope of the methodology and gives a definition. Later sub-sections introduce the general concept and the Continuous Integration loop in more detail and introduce basics of testing and best practices.

#### **3.1. Definition and scope**


Continuous Integration describes a methodology to always have the latest successfully built and tested version of a software component available [2]. At its core, it aims at removing diverging developments of different developers by enforcing that the code changes of every developer are integrated into a shared mainline "all the time" (hence, it focuses on the integration of the code of a single build artefact). Integrating small changes at high frequency reduces the chance of diverging code and the pain of code integration.

In Continuous Integration, the process of building and testing the component is usually described by scripts and is hence easy to reproduce by any developer and easy to automate. In consequence, it overcomes the issue of hard-to-reproduce builds, a recurring problem in traditional development environments where developers build and run their code inside IDEs that differ in version or even brand.
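To make this concrete, the following minimal sketch shows what such a scripted build-and-test step can look like. The concrete commands, the `src` and `tests` directories, and the use of pytest are illustrative assumptions, not tooling prescribed by this chapter; the point is that the same script runs identically on a developer machine and on the build server.

```python
#!/usr/bin/env python3
"""Minimal sketch of a scripted, reproducible build-and-test step."""
import subprocess
import sys

# Each step is an ordinary command; adapt these to the component's build tool.
STEPS = [
    ["python", "-m", "compileall", "src"],  # "build": byte-compile the sources
    ["python", "-m", "pytest", "tests"],    # run the automated test suite
]

def main() -> int:
    for step in STEPS:
        print("running:", " ".join(step))
        if subprocess.run(step).returncode != 0:
            print("FAILED:", " ".join(step))
            return 1  # a non-zero exit code marks the build as broken
    print("build and tests OK")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

Because the script is plain and self-contained, any developer can reproduce a build-server failure locally with a single command.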

#### **3.2. The integration pipeline**

A successful adoption of Continuous Integration in any environment has to rely on automation in order to achieve a permanent feedback loop. This is illustrated in **Figure 1**. At some point in time, developers working on a local version of the code will be finished with their work, for example, a new feature or a bug fix. Then, they commit (step i) their changes to the version control system shared by all developers of that component. In addition to the code, the repository contains additional data and procedures to build and also test the software.

**Figure 1.** The Continuous Integration feedback loop realising the integration pipeline.

Accordingly, a new commit triggers (step ii) a new build of the software component. In case the build is successful (step iii), tests of the code get executed. Here, build automation ensures that both building and testing run automatically and do not require any human interaction.

On a technical level, both the build step and the test step are executed on one or multiple build servers, which are tightly integrated with the code repository and get triggered by changes to the codebase. At the end of the build and test process, the build server (step iv) reports the status back to the users. Such a report includes information about failed builds or failed test cases. On success, the build server issues a versioned and downloadable build artefact.

For closing the Continuous Integration loop, other developers react to reports issued by the build server. In case of successful build and test steps, they are supposed to immediately integrate the changes into their own code base. This core concept behind Continuous Integration ensures that the code bases of different developers evolve compatibly.

In order to successfully implement this feedback loop, it is important for every developer to commit very often (commonly interpreted as at least once a day). This ensures that merge conflicts stay minor and are easier to resolve.
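To illustrate the loop end to end, the deliberately simplified sketch below plays the role of a build server: it polls the shared repository, rebuilds and retests on every new commit, and reports the status (steps i–iv). Real set-ups would use an existing build server such as Jenkins or GitLab CI triggered by repository hooks instead of polling; the build and test commands are placeholders.

```python
"""Toy sketch of the Continuous Integration feedback loop (steps i-iv)."""
import subprocess
import time

def head_revision() -> str:
    # Step (i) has already happened: developers committed to the shared repo.
    out = subprocess.run(["git", "rev-parse", "HEAD"],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

def build_and_test() -> bool:
    # Step (ii): build the component; step (iii): run its tests.
    if subprocess.run(["python", "-m", "compileall", "src"]).returncode != 0:
        return False
    return subprocess.run(["python", "-m", "pytest", "tests"]).returncode == 0

last_seen = None
while True:
    subprocess.run(["git", "pull", "--ff-only"])  # fetch new commits
    revision = head_revision()
    if revision != last_seen:                     # a new commit arrived
        ok = build_and_test()
        # Step (iv): report the status back to the developers.
        print(f"commit {revision[:8]}: {'SUCCESS' if ok else 'BROKEN BUILD'}")
        last_seen = revision
    time.sleep(60)                                # poll once per minute
```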

#### **3.3. Testing**

Testing is necessary, as a successful build process does not give any hints whether the code is actually working. Hence, testing increases confidence in the codebase, which creates an experiment-happy environment and reduces the risk introduced by possible ambiguity of requirements. Further, testing may yield information about code quality and runtime behaviour. Consequently, testing is the main vehicle to ensure reliability of and trust in the code. Obviously, this trust is higher, the higher the test coverage. Due to the many builds per day, testing can only be realised in an automated manner. In consequence, high automated test coverage is a core demand of Continuous Integration [2].

While there is no general agreement on a fixed number for the code coverage percentage, there are suggestions and guidelines [3] about that metric. In practice, however, the desired coverage degree is dependent on the project and the criticality of the code.

As with the whole Continuous Integration methodology, it is important that all team members have understood the importance of testing and practise it. Unit and Integration Tests are the minimum amount of tests necessary to achieve that. Consequently, they are our main concern in this chapter. Further details on testing of distributed applications are available elsewhere [1].

Unit tests target small portions of code in the codebase and usually operate on a class or routine level. They are built alongside the application and are executed after a successful build. Implementing them makes sure that individual parts of the component work as expected and intended. In contrast, integration tests are run against a fully built and unit-tested component. An integration test executes the software component as a whole and runs tests against its APIs, utilising mocks where necessary.
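The self-contained sketch below illustrates the distinction. The component (`SensorService`) and its routine (`parse_reading`) are invented for illustration; only Python's standard library is used, with `unittest.mock` standing in for a remote service in the integration-style test.

```python
"""Sketch: a unit test for a single routine versus an integration-style
test that exercises the component through its API with a mocked remote."""
import unittest
from unittest import mock

def parse_reading(raw: str) -> float:
    """Routine under unit test: parses a reading such as '42.0;C'."""
    value, _unit = raw.split(";")
    return float(value)

class SensorService:
    """Component under integration test; normally fetches data remotely."""
    def __init__(self, fetch):
        self._fetch = fetch  # callable standing in for the remote API

    def average(self) -> float:
        readings = [parse_reading(r) for r in self._fetch()]
        return sum(readings) / len(readings)

class UnitTests(unittest.TestCase):
    def test_parse_reading(self):
        self.assertEqual(parse_reading("42.0;C"), 42.0)

class IntegrationStyleTests(unittest.TestCase):
    def test_average_with_mocked_remote(self):
        remote = mock.Mock(return_value=["1.0;C", "3.0;C"])
        self.assertEqual(SensorService(remote).average(), 2.0)

if __name__ == "__main__":
    unittest.main()
```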

#### **3.4. Best practices**


While the Continuous Integration loop as detailed earlier is simple, a true realisation of the approach requires flanking measures on the management and organisational side.

In order to decrease the chance of a broken build and failing tests, developers have to build and test the application locally before committing their changes to the shared code repository. This practice leads to the desire that the automated test environment used on the build server and the test environment provided by the developer's IDE be compatible. Only then can the same tests be run in both environments, and only then is the effort for the developer to follow the principle of Continuous Integration minimal.

Having such a set-up, a developer will commit more often to the shared code repository when experiencing short feedback cycles from the Continuous Integration loop. Ideally, the time from committing code changes to a tested software artefact is as short as running the tests on the development machine or even shorter.

However, even performing local tests will not prevent that at some point a build or a test fails, leading to a broken build. While the build is broken, no developer should commit to the repository. Instead, everyone in the team is encouraged to contribute to fixing whatever caused the build to break. Only then are further commits to the code repository allowed.

In order to enable developers to notice that a build breaks and to trigger the process of handling this broken build, visibility of the current build and test status is a major concern. This can be as simple as a red or green badge shown on a dashboard or sent in an e-mail, but could also include bots that report the status on a messaging platform.
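A cheap way to support the practice of testing locally before committing is a client-side git hook. The sketch below, saved as `.git/hooks/pre-commit` and made executable, rejects a commit whenever the test suite fails; the pytest invocation and `tests` directory are illustrative assumptions.

```python
#!/usr/bin/env python3
"""Sketch of a git pre-commit hook: run the same tests as the build
server, so obviously broken code never reaches the shared repository."""
import subprocess
import sys

result = subprocess.run(["python", "-m", "pytest", "tests", "-q"])
if result.returncode != 0:
    print("pre-commit: tests failed, commit rejected")
sys.exit(result.returncode)  # a non-zero exit aborts the commit
```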

#### **3.5. Summary**

Summarising, Continuous Integration gives reproducible builds, versioned downloadable artefacts, and quick feedback on broken builds. Hence, it addresses many of the requirements brought up in Section 2: breaking down development into small units that are independently built and tested distributes work among partners (**R.1**) and, at the same time, identifies responsible persons (**R.3**). Automating the build and testing process tremendously reduces the manual burden (**R.2**). It also checks dependencies on build level (**R.5**) and makes the software status visible (**R.6**). The availability of ready-to-use binaries is a first step towards an easy-to-use prototype (**R.4**). The definition of unit tests and integration tests allows people joining the project late to confidently make changes to the source code (**R.B**). If used properly, tests also serve as a testimonial of the currently defined requirements of the project (**R.C**). The separation of build and test phases enables some support for closed source code (**R.A**).


On the downside, Continuous Integration does not address dependencies on the service level (**R.5**), nor does it allow for a full easy-to-use set-up (**R.4**). In consequence, it also does not make the full software status available (**R.6**). With respect to closed and unavailable source code (**R.A**), further means have to be established.

Yet, the use of Continuous Integration also introduces new requirements:

**R.CI.1 (additional project infrastructure):** The use of Continuous Integration requires more infrastructure to be brought into the project. These include a revision control system, a build server, and a test server. All of them have to be maintained and explained to the consortium, for example, through tutorials. The build server in addition has to support all programming languages used in the project (**R.E**).

**R.CI.2 (support for private code):** For those partners in the project that do not want to disclose their source code to the public, the project infrastructure needs to support a private repository.

**R.CI.3 (support for closed code):** For those partners in the project that do not want to disclose the source code of their components even to the project, additional mechanisms have to be established in order to connect these components to the overall application.

**R.CI.4 (team agreement):** For Continuous Integration to work properly, all project partners have to agree on its use and be willing to take their share of the load. This is a management issue that can be supported when lean technology and good documentation are applied.

#### **4. Background: Continuous Delivery**

This section details background on Continuous Delivery. It starts with a definition and the usage scope, then presents the delivery pipeline and further testing steps. In contrast to Continuous Integration, which has many challenges on the social level but a clearly defined build artefact at the end of its pipeline, the exact result of a run of the delivery pipeline is a design choice; the only demand is that it packages the binaries into something executable. Section 4.3 is concerned with the various possible approaches to packaging. Finally, when executing a component, various parameters may need to be configured, depending on the context the component is used in. We sketch possible design choices in Section 4.4.

#### **4.1. Definition and scope**


Continuous Delivery takes an executable binary (e.g. a build artefact) and packages it in a ready-to-run execution environment that resolves all internal and external dependencies, for example, to the operating system kernel, third-party libraries, and remote services. To this end, this automatic process creates packaged runtime environments for binaries and other artefacts. The rationale is that pre-configured and tested self-contained packages are easy to roll out in different environments, increasing the reliability of the roll-out process. In addition, abandoning manual actions strengthens maintainability and trust in the process.

When combined with Continuous Integration, Continuous Delivery provides a methodology that ensures that at any time a packaged, tested, and reliably deployable artefact is available based on the latest successful run of the integration pipeline.

#### **4.2. The delivery pipeline**

The delivery pipeline starts where Continuous Integration ends. It introduces the packaging step plus further automated and manual acceptance tests. A visual example of such a pipeline is shown in **Figure 2**.

Continuous Delivery starts with **build artefact(s)** that could be the outcome of Continuous Integration. The packaging step integrates one or more of them with any external dependencies and bundles them into packed artefacts (or simply artefacts). The pipeline is not necessarily linear and hence can generate more than one package. Depending on the process, packages for multiple architectures or use cases can be generated. While Continuous Integration contains a set of basic tests, Continuous Delivery introduces more sophisticated acceptance tests. These are a crucial part of any useful delivery strategy and often contain both manual and automatic steps that validate whether the software component behaves as expected. They will almost always include mocking of remote services.

**Figure 2.** Example delivery pipeline.


#### **4.3. Possible packing formats**

The outcome of a run of the delivery pipeline is at least one deployable artefact packing an application component. While the delivery concept *per se* does not foresee a specific format and basically an arbitrary number of formats is possible, the following four approaches have found widespread acceptance and are commonly used. They differ in the size of the package, the coupling between component and host, and possible interferences with other components on the same host.

Virtual machine images package the component together with a suited operating system and all required third-party libraries. Executing the image in a virtual machine on a hypervisor introduces very strong runtime isolation between the component instance (inside the virtual machine), the host installation (outside the virtual machine), and the host's hardware. It also creates barely any external dependencies and no direct interferences with other components on the same host. On the downside, virtual machine images are heavyweight in terms of size and resource usage. Container images are a lightweight alternative that also bundles the component with all third-party libraries. Yet, the container's operating system strongly depends on the hosting environment in terms of operating system version and kernel configuration. Still, isolation between co-located components is available.

Both virtual machine and container images create an isolated and fully self-contained environment for the component. A conceptually different approach is followed by configuration management tools and distribution packages. Both of them install software directly on the host platform and barely create any isolation between different components. Software distribution packages are special archives that wrap the binary and the files it ships with, but also contain hints to packages this binary depends on. Obviously, they integrate deeply with the dependency management of the host platform and utilise shared system resources and libraries directly. Configuration management tools provide a layer of abstraction, as they attempt to (re-)configure and change the host's environment to reach a state which ensures that the application can run. They may do so by using software distribution packages. Both approaches are rather lightweight in terms of storage size.

#### **4.4. Package configuration**

The major goal of Continuous Delivery is to always have the latest deployable package of a component available. In consequence, this means that when creating the package, it is not known in which environment it will run. For instance, IP addresses, port numbers, or paths to files may not have been defined yet, or can change over time. Consequently, when preparing a component for Continuous Deployment, it is important to foresee a configuration interface. Several approaches exist, ranging from environment variables as suggested by the 12FactorApp<sup>3</sup>, through configuration files as commonly used for components provided as Linux packages, to key/value stores or a database.

<sup>3</sup> https://12factor.net/de/config
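As an illustration, the fragment below sketches the environment-variable approach for a containerised component. This is a minimal sketch only; the service name, image, and variables are invented and merely stand in for whatever a concrete component would expose as its configuration interface.

```yaml
# Hypothetical docker-compose fragment: the same packaged component is
# configured per environment purely through environment variables, in the
# spirit of the 12FactorApp recommendation. All names and values are made up.
services:
  result-store:
    image: registry.example.org/project/result-store:1.4.2
    environment:
      DB_HOST: db.staging.internal    # differs between environments
      DB_PORT: "5432"
      UPLOAD_PATH: /var/data/uploads  # known only at deployment time
```

The same component could instead read a mounted configuration file or query a key/value store; the essential point is that the package itself stays environment-agnostic.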

The choice of a configuration approach influences the overall implementation of a component. In addition, there is a mutual influence between configuration and packaging format.

#### **4.5. Summary**


Summarising, Continuous Delivery gives tested, versioned, and downloadable artefacts that are shippable, installable, and configurable out-of-the-box. As with Continuous Integration, the high degree of automation reduces the manual burden (**R.2**). The use of Continuous Delivery helps the installation and management of the project outcomes (**R.4**) and at the same time helps remembering the big picture due to acceptance tests (**R.5**). The latter also contributes to the visibility of the software status (**R.6**). Similarly, acceptance tests help keeping track of desired outcomes (**R.C**) and support the fluctuation of team members (**R.B**).

On the downside, creating a larger deployable component from smaller parts may counteract the equal distribution of load over partners (**R.1**) and makes the naming of responsible persons harder (**R.3**). Yet, when Continuous Integration is used in addition, logical bugs should have been filtered out and only those produced by acceptance tests remain.

Continuous Delivery introduces the following additional requirements towards prototype integration and management.

**R.CDel.1 (packaging format):** Continuous Delivery requires a decision on one or more packaging formats per delivery pipeline.

**R.CDel.2 (configuration options):** Continuous Delivery requires a decision on the approach taken towards configuration per pipeline. It has to be consistent with the packaging format.

**R.CDel.3 (support for closed artefacts):** As with Continuous Integration, additional mechanisms have to be established for any kind of closed code or closed binaries.

#### **5. Background: Continuous Deployment**

This section gives background on Continuous Deployment. As before, we start with a definition and set the scope for this methodology. Then, we describe the deployment pipeline. Finally, we consider deployment environments and application state.

#### **5.1. Definition and scope**

*Deployment* as such describes the process of enacting an application or application component. In general, it covers the steps from acquiring the necessary and possibly distributed hardware resources, through installing and configuring the component(s) on these resources, to starting the necessary deliveries. The task of deciding in what order components should be started is referred to as *orchestration*; the task of enabling components to find each other is called *discovery* or *wiring*.
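Both concepts can be made concrete with a small sketch, assuming a docker-compose-style orchestrator and invented service names: `depends_on` expresses the start order (orchestration), while the service name doubling as a hostname lets components find each other (discovery/wiring).

```yaml
# Illustrative only: two components wired together by the orchestrator.
services:
  api:
    image: registry.example.org/project/api:latest
  web:
    image: registry.example.org/project/web:latest
    depends_on:
      - api                      # orchestration: start the API before the web front end
    environment:
      API_URL: http://api:8080   # discovery: the service name resolves as a hostname
```

Note that `depends_on` only fixes the start order; a robust wiring would additionally check that the dependency is actually ready to serve requests.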


Continuous Deployment describes a methodology to always have the latest version of all artefacts of an application deployed, such that updates to the application are visible in the deployment shortly after changes to the codebase. As with integration and delivery pipelines, the deployment pipeline is supposed to run automatically.

As all artefacts have gone through unit, integration, and acceptance tests, there is trust that individual artefacts work as expected. What is less reliable is the interplay of the components on an application-wide level. For that reason, in practice, multiple isolated environments are used and Continuous Deployment usually targets the least critical environment, which is not linked with production systems. Yet, some companies like Amazon and Netflix demonstrate that Continuous Deployment can go directly to production.

#### **5.2. The deployment pipeline**

The safety net of having multiple environments caters for incompatible version and interface changes of individual components. **Figure 3** shows an example of three traditionally used different environments as well as transitions between them.

The development environment contains the very latest version of the components' code and is automatically updated on every commit. In contrast, the production environment contains the actual live and fully functional environment facing users and customers. The staging environment is applied to validate updating the production environment to a newer version. Therefore, staging uses a snapshot of the production data.

**Figure 3.** Deployment pipeline.

It is important to note that besides their build versions, the packaged components do not differ from environment to environment. What differs is their configuration in the respective environment (cf. Section 4.4) and the process of updating them. The development environment is automatically installed from scratch with each new deployable artefact from Continuous Integration. This environment is then used by developers in order to test and validate the common application. It is also used for reviews by Q/A. If these are successful, the version of the development environment is instantiated in the staging environment by updating the previous installation. This serves as a blueprint for updating the production environment. In case it succeeds, Q/A will enact more tests and finally decide to upgrade the production environment.

#### **5.3. Application state**

Usually, at least one of the application components makes use of persistent state, such as data stored in a database or on the file system. In order to support automatic re-deployment in case of failures and a seamless upgrade from one version to another, this state has to be separated from the artefact produced by the delivery pipeline. Otherwise, software and state cannot be upgraded separately.

In consequence, state needs to remain available even if the application environment is torn down. In IaaS clouds or containers, this can be achieved through the use of block storage/volumes; in more traditional approaches, a remote file system, a NAS, or a SAN could be used. Hence, the location of the state has to be configurable.
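A minimal sketch, assuming Docker named volumes and an illustrative database image, shows how state can live outside the deployable artefact so that the container can be torn down and redeployed without losing data:

```yaml
# Hypothetical example: the database files live in the named volume
# `db-state`, which outlives any re-deployment of the `db` container.
services:
  db:
    image: postgres:9.6
    volumes:
      - db-state:/var/lib/postgresql/data   # configurable location of the state

volumes:
  db-state: {}   # persists even when the application environment is torn down
```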

#### **5.4. Summary**


Summarising, Continuous Deployment realises support for a constantly deployed instance of the project outcome. In addition to that, it enables the realisation of use-case-specific or demo-specific environments (**R.D**). Similar to Continuous Integration and Continuous Delivery, it helps distributing work among partners (**R.1**), reduces the manual burden (**R.2**), and makes the software status visible (**R.6**). Its orchestration is the missing link to make available an easy-to-start and easy-to-use environment (**R.4**) and clarifies the big picture (**R.5**). There are no immediate downsides to Continuous Deployment, but further requirements emerge:

**R.CDep.1 (environment planning):** The consortium has to agree on the number of environments and the desired flexibility in creating environments. In the most extreme case, the creation of a new environment is fully automated and developers can flexibly create new environments.

**R.CDep.2 (handling of state):** Continuous Deployment requires a decision on the approach taken towards handling application state. In addition, stateful components need to be able to find their state, to validate that it exists, and to initialise the storage location in case it does not exist. In case the state representation has changed, this has to be tolerated.

**R.CDep.3 (support for closed artefacts):** As before, additional mechanisms have to be established for any kind of closed binaries.


**R.CDep.4 (additional infrastructure):** In order to achieve the deployment and wiring of individual components, but also the whole project software, an orchestrator is necessary.

#### **6. A research-oriented solution to software releases**

In the following, we take the requirements put up in Section 2 and describe how we apply Continuous Integration, Continuous Delivery, and Continuous Deployment to support prototype integration as well as software releases for large-scale, widely distributed research projects. We also present how we address the additional requirements put up throughout Sections 3–5.

Section 6.1 presents the overall concepts and strategy we apply. The subsequent sections deal with the realisation of the individual pipelines. In each of these, we present our approach from a conceptual as well as a technical point of view and discuss the tools used. In addition, we present similar tools available on the market that could be used to provide the same or similar functionality.

#### **6.1. Overview and concept**

Sections 3–5 detail that the combined use of Continuous Integration, Continuous Delivery, and Continuous Deployment addresses almost all of the requirements established in Section 2. **Table 1** presents the coverage of requirements per methodology. Only the need to cater for closed code and binaries (**R.A**) is not naturally taken into account by any of the methodologies. It is, however, represented by the follow-up requirements R.CI.2, R.CI.3, R.CDel.3, R.CDep.3, and R.CDep.4 and has to be addressed by all three methodologies.

| | Continuous Integration | Continuous Delivery | Continuous Deployment |
|---|---|---|---|
| R.1 | X | | X |
| R.2 | X | X | X |
| R.3 | X | | |
| R.4 | | X | X |
| R.5 | | X | X |
| R.6 | | X | X |
| R.A | (X) | (X) | (X) |
| R.B | X | X | |
| R.C | X | X | |
| R.D | | | X |
| R.E | X | | |

**Table 1.** Requirement mapping.

The following paragraphs sketch our approach on a high level, from both a technical and a management perspective.

#### *6.1.1. General software set-up*


Our approach centres around a project-wide code hosting platform that supports private repositories for code that shall not go public (**R.CI.3**). This platform is enhanced with a project-wide build and test server (**R.CI.1**) and further with an orchestration service linked to these two (**R.CDep.4**). Whenever possible, we rely on private hardware to host the needed infrastructures as well as the various environments of the deployment pipeline. In case this cannot be achieved, we fall back to a public cloud provider such as Amazon EC2 or Microsoft Azure.

#### *6.1.2. Management process*

In order to be able to apply Continuous Anything, decisions on the management level are required. These include, first and foremost, the decision of the consortium to enact the methodology (**R.CI.4**). Once this decision is taken, the next step is to break down the overall project software into smaller components. This is a manual process that requires discussion and communication.

For each of the components, an individual software repository is created and a responsible person is assigned. In the ideal case, exclusively members from one local team are responsible for one of these components. Then, for each of the components, test cases are defined that detail how the component is supposed to interact with other components and, more importantly, that reflect requirements and goals of the project. Finally, an early integration pipeline for each component is realised that runs the tests. Only at this point is it necessary that the developers of a component agree on a common technology, including the programming language. Different components can agree on different languages.

In a next step, components are composed into deployable artefacts and for each of them a delivery pipeline is established. Acceptance tests are created and function both as a validation of the artefact's functionality and as a representation of the project's requirements. Furthermore, a strategy towards packaging (**R.CDel.1**) and configuration (**R.CDel.2**) is decided upon. While it is principally possible to use different formats and approaches for different artefacts, the delivery pipeline benefits from unifying these.

In a last step, the consortium agrees on the number of environments to be used as well as the transition paths between environments (**R.CDep.1**). This is also the step where services with closed binaries get integrated in the whole system (**R.CDep.3**).

#### **6.2. Continuous Integration**

As clarified in Section 3, Continuous Integration requires additional infrastructure to be provided by the project. In particular, it demands the operation of a code repository, a build server, and a test server.

#### *6.2.1. Concepts*

Due to the openness towards programming languages, the build server needs to cater for any reasonable programming language (**R.E**). We achieve that through specialised build environments. These build environments are used for the automated compiling and testing of the components' code (**R.2**) and are easily configurable and versioned (**R.B**) by the researchers themselves (**R.1, R.3**). The downloadable build artefacts are stored in the repository (**R.C**) with the appropriate access rights.


Access rights to each repository can be specifically set, with permissions ranging from private (**R.CI.2**) over internal to public. Sometimes a partner (e.g. an industry partner) is not able to share their code or configuration parameters with the whole consortium or even publicly. Therefore, we limit access to code and encrypt certain configuration variables so that only the actual owner has access to them (**R.CI.2**).

Regarding completely closed and private source code, which must not reside on the shared infrastructure (**R.CI.3, R.CDel.3**), we make the assumption that the build artefacts of these components are tested by their owners and that their binaries are made available for use in the project. For closed binaries, we establish a customised deployment process (**R.CDep.3**).

#### *6.2.2. Selected tooling*

For our approach, we selected Git enhanced with GitLab (**R.CI.1**) as a source code repository. The primary reasons for selecting this combination are the state-of-the-art version control provided by Git as well as the user interface and eco-system provided by GitLab. Each software component is stored in its own Git repository.

As a build server, we use GitLab Runner. On the one hand, this is due to its deep integration with GitLab; on the other hand, this is also due to its openness and flexibility in supporting exotic demands (e.g. **R.E**). It achieves this by enabling the use of custom build environments, giving the research teams a maximum amount of control.

Technically, each repository defines the build environment of the component stored in that repository. The environment also defines the integration pipeline and contains at least the two mandatory steps, compiling and testing. When triggered, builds and tests are executed in an instance of the defined build environment.
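As a sketch of what such a pipeline definition could look like, the following hypothetical `.gitlab-ci.yml` assumes a Java component built with Maven; the build environment, commands, and artefact paths would differ per repository.

```yaml
# Illustrative .gitlab-ci.yml for one component (all details are assumptions).
image: maven:3.5-jdk-8        # the repository-specific build environment

stages:
  - build
  - test

compile:
  stage: build
  script:
    - mvn compile             # mandatory step 1: compiling
  artifacts:
    paths:
      - target/               # the downloadable build artefact

unit-tests:
  stage: test
  script:
    - mvn test                # mandatory step 2: testing
```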

Since we do not impose any programming languages, we do not rely on any specific build and dependency management frameworks. The same is true for testing frameworks. Here, the only requirement is that they can be included in the pipeline.

#### *6.2.3. Tooling alternatives*

The functionality we achieve through our set-up can also be realised through the use of other tools. For instance, Mercurial or SVN could be used as source code repository. Jenkins and Travis are alternatives for build servers.

With respect to build automation, Maven is the de facto standard for Java, C programmers rely on make, and pip could be used for Python. Testing can be implemented with JUnit or one of its derivatives for other languages.

#### **6.3. Continuous Delivery**

From Section 4, it is clear that the main challenges with Continuous Delivery are deciding on the packaging format and the configuration strategy.

#### *6.3.1. Concept*


To be able to easily start and use a component (**R.4**), we package the artefact from Continuous Integration to make it executable. This process is automated by the build server (**R.2**). Once all the components from every partner have been packaged, getting an instance of the application as a whole requires comparatively low effort (**R.5**).

In contrast to the integration pipeline, not all repositories will have a delivery pipeline. Instead, multiple build artefacts can be combined into one packaged artefact. For each packaged artefact, a root repository is selected that defines the delivery pipeline.

Similar to the integration pipeline, the process of packaging the component is specific to each repository. While the packaging format should usually be consistent for every component, it might be necessary to integrate with other build artefacts and external components, which is the task of the delivery pipeline.

While Continuous Anything does not demand a specific packaging format on the concept level, the format should be (i) lightweight, to keep the delivery feedback cycle short, (ii) self-contained, to make acceptance testing easier, and (iii) configurable, to cater for usage in different scenarios.

In order to support closed artefacts (**R.CDel.3**), we enhance the build server with a custom API that maintainers of closed artefacts are supposed to invoke (either automatically or manually) when a new version of their binary is available. This will then trigger the delivery pipeline for that artefact, if available, or the delivery pipeline of artefacts that make use of it.
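One conceivable realisation, sketched here under the assumption that the custom API is backed by GitLab's pipeline trigger mechanism, is that the owner of a closed artefact fires a trigger request whenever a new binary version exists; the project ID, token variable, and branch are placeholders.

```yaml
# Hypothetical job on the closed-artefact owner's side: trigger the delivery
# pipeline of a dependent artefact through GitLab's pipeline trigger API.
notify-dependents:
  script:
    - curl --request POST
      --form "token=$TRIGGER_TOKEN"
      --form "ref=master"
      "https://gitlab.example.org/api/v4/projects/42/trigger/pipeline"
```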

#### *6.3.2. Selected tooling*

Our approach does not impose any specific packaging format. Yet, for artefacts with standard demands, we encourage the use of Docker images, as they offer a good trade-off between isolation and ease of use. Containers are not as heavyweight as virtual machines, but the software still runs isolated. The resulting Docker images are pushed to an image repository, which is internal to GitLab for private artefacts or the public Docker Hub for public ones.
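A delivery job along these lines could look as follows; this is a sketch that relies on GitLab's predefined CI variables and assumes a Dockerfile in the repository root as well as a `package` stage declared in the pipeline.

```yaml
# Hypothetical packaging job: build a Docker image from the build artefact
# and push it to the GitLab-internal image registry.
package:
  stage: package
  image: docker:latest
  services:
    - docker:dind             # Docker-in-Docker to build images inside CI
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
```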

For configuration, we suggest the use of environment variables for Docker containers, as encouraged by good practices. For acceptance testing, we use the Selenium framework, which enables recording and playback of user interactions on interfaces.

#### *6.3.3. Tooling alternatives*

For packaging scenarios that demand higher isolation, virtual machines are the best choice. Here, Packer is a tool for the automated generation of virtual machine images. For configuration management, Puppet is an option, whereas for instance the Debian Package Manager can be used for creating distribution packages.

For configuration through key/value stores, a myriad of different tools exists, ranging from Consul to classic databases, for example, MySQL, or even NoSQL databases, for example, MongoDB.


#### **6.4. Continuous Deployment**

Building on the decision we made in Section 6.3 by choosing Docker as packaging format, we need to align that to the additional requirements we set in Section 5. The following shows how we implement the Continuous Deployment of said containers.

#### *6.4.1. Concept*

We usually use the three basic environments **development**, **staging,** and **production** unless the project has special demands (**R.CDep.1**). Each repository with a delivery pipeline also has a deployment pipeline that automatically updates the development environment once a new packed artefact is available. Based on the development environment, the transitions between the other stages are handled as follows (**R.CDep.2**): (i) Upon manual decision, the artefacts deployed to development get redeployed in staging by overwriting the previous version. In addition, state from production is copied to staging for testing purposes. (ii) Going from staging to production is similar, except that no data are copied.
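A hedged sketch of these transitions in GitLab CI syntax, with an assumed project-specific `deploy.sh` script, could look as follows: the development environment is updated automatically, while the staging and production transitions require a manual decision.

```yaml
# Illustrative deployment jobs; script contents and names are assumptions.
stages:
  - deploy

deploy-development:
  stage: deploy
  script: ./deploy.sh development   # runs automatically for each packed artefact
  environment: development

deploy-staging:
  stage: deploy
  script: ./deploy.sh staging       # would also copy production state for testing
  environment: staging
  when: manual                      # promotion is a manual decision

deploy-production:
  stage: deploy
  script: ./deploy.sh production    # no data are copied in this transition
  environment: production
  when: manual
```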

All environments are handled by an orchestrator and operated on a project-hosted infrastructure (**R.CDep.4**). For enabling state transition (**R.CDep.2**), this infrastructure comes with volumes to persist state and supports mapping of state. For dealing with closed artefacts (**R.CDep.3**), we allow callbacks to be registered for each of them. These callbacks are used in order to trigger a new deployment or reset of the respective deployed artefact, as well as a transition of these deployed artefacts between their environments. The realisation of the callback is up to the party responsible for the artefact. In addition, we introduce another API at the build server that owners of a closed artefact shall use to notify the environment about changes in their environment.

#### *6.4.2. Selected tooling*

The deployment in our system is done by the Rancher orchestration tool. For artefacts realised as containers, Rancher applies rancher-compose and docker-compose.yml files. These describe the actual configuration and a representation of the artefact to be deployed. Here, we can define the container (or virtual machine) image to use, the location of the state, and the desired configuration. Rancher also enables integrating external components.
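A compose file of this kind might look like the following sketch; the image, volume driver, and variables are invented for illustration and would be defined per artefact.

```yaml
# Hypothetical docker-compose.yml as consumed by Rancher (compose v2 format).
version: '2'
services:
  component:
    image: registry.example.org/project/component:1.0.3  # artefact to deploy
    environment:
      SERVICE_PORT: "8080"                  # the desired configuration
    volumes:
      - component-state:/var/lib/component  # the location of the state

volumes:
  component-state:
    driver: rancher-nfs                     # assumed driver for shared, persistent state
```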

#### *6.4.3. Tooling alternatives*

For orchestration of containers and virtual machines a plethora of different tools exist. These are either cloud-provider specific such as Amazon CloudFormation and OpenStack Heat, or reside outside the platform. In earlier work, we compare the features of these tools [19].

#### **7. Discussions**


Koetter et al. [10] argue that due to the tight schedule and different commitment of partners, a prototype integration is hard to achieve as it is too costly. This leads to only partially integrated systems that do not fully support all features. We argue that with our system, we have a clear and easy integration workflow that can be adopted by almost any (distributed) team with modern software development lifecycles and be adapted to existing ones. Therefore, we believe that our approach tackles the reported requirements and issues. Yet, our approach described in Section 6 is just one solution out of an overwhelming number of choices to make regarding the selection of tools and methods to realise Continuous Anything. The best possible technical realisation depends on what is currently used at the sites of the consortium members and on their familiarity with tools.

Nevertheless, team agreement as well as a clearly communicated and implemented methodology is more important than tool selection. The latter should always follow actual needs.

#### **7.1. Project management culture**

While Continuous Anything comes naturally with the application of agile software development strategies, these are less of an option in distributed research projects. In this environment, it is hard, if not impossible, to organise for instance daily stand-up meetings or even weekly or bi-weekly sprints. As elaborated in Section 2, this is due to the different schedules of the various stakeholders, the fact that barely anyone in the distributed team is dedicated full time to software development, and particularly that people travel a lot in order to promote the actual research work they do.

A possible way to work around the lack of a central pillar of the overall approach is to isolate responsibilities as much as possible and to only assign people from individual organisations to particular software components and have them organise the development process internally. This is the task of the project management.

A further core task of the project management besides organising the necessary infrastructure is to make sure that the integration strategy is rigorously followed from the beginning. This comprises the absence of shadow code, a common, shared understanding of how to use version control systems and when to apply changes to the master branch and other branches.

#### **7.2. Software development culture**

From all the methodologies discussed, Continuous Integration can bring the most benefit. It is important to note, though, that everyone in the team has to agree on these practices being used. This might create some new pain points within the development team, but it is crucial that everyone understands the principles and shares a common goal. This explicitly means that a non-trivial effort should be spent on test coverage. It is worth mentioning that for research-oriented environments, daily commits of code are not necessary; weekly or half-weekly commits are sufficient.

Continuous Delivery is especially helpful for research projects, since the described Big Bang integrations usually happen multiple times during a project lifecycle, each time with a high risk of failing. This risk of failure puts a lot of stress on the whole consortium. In contrast, realising Continuous Delivery does not introduce new challenges beyond agreeing on a common packaging format.


Once Continuous Delivery has been realised, implementing Continuous Deployment requires little additional effort. It is nevertheless a crucial step, as it lowers the effort for all consortium members to get access to a running instance of the project outcome.

#### **8. Conclusions**

In this chapter, we have presented our approach of Continuous Anything, a combination of Continuous Integration, Continuous Delivery, and Continuous Deployment that supports prototype integration in distributed research projects.

Our approach makes prototype integration a core element of the project plan and puts it on the same level as project management and financial administration. It does so by defining a framework that distributes and shares the responsibility for integration work while, at the same time, clearly holding individuals responsible for dedicated software components. Through a high degree of automation, it keeps the overall integration work low but still provides immediate feedback on the quality of the integration status. It is important to note that the quality of individual software components remains in the hands of their developers; it is they who decide whether and which unit tests are necessary. In contrast, our framework requires that integration tests be available to ensure that the interfaces between components work as intended. This approach allows easy isolation of errors and the identification of the responsible programmers in case of failures or problems.
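As an illustration of such an interface-level integration test, the following is a minimal sketch assuming a component that exposes an HTTP health endpoint. The endpoint path, the `STAGING_URL` variable, and the response contract are hypothetical examples, not the actual interfaces of the projects described in this chapter.

```python
# integration/test_health_interface.py -- run against the automatically
# deployed instance; fails fast if the contract between components breaks.
import json
import os
import urllib.request

# Hypothetical convention: the pipeline exports the URL of the freshly
# deployed staging instance before running the integration test suite.
BASE_URL = os.environ.get("STAGING_URL", "http://localhost:8080")


def test_health_endpoint_contract():
    # Consumers rely on /api/v1/health answering 200 with a JSON body that
    # contains a "status" field; this test pins exactly that contract.
    with urllib.request.urlopen(f"{BASE_URL}/api/v1/health", timeout=10) as resp:
        assert resp.status == 200
        payload = json.loads(resp.read().decode("utf-8"))
    assert payload["status"] == "ok"
```

When such a test fails, the broken interface, and hence the component owner who needs to act, is immediately identifiable.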

#### **Acknowledgements**

This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreements No. 732667 (RECAP), 732258 (CloudPerfect), and 644690 (CloudSocket).

#### **Author details**

Simon Volpert, Frank Griesinger and Jörg Domaschka\*

\*Address all correspondence to: joerg.domaschka@uni-ulm.de

Institute of Information Resource Management, Ulm University, Ulm, Germany

#### **References**

[1] Rother M. Toyota Kata. United States: McGraw-Hill Professional Publishing; 2009

[2] Duvall PM, Matyas S, Glover A. Continuous Integration: Improving Software Quality and Reducing Risk. Pearson Education; 2007

[3] Marick B. How to misuse code coverage. In: Proceedings of the 16th International Conference on Testing Computer Software; 1999. pp. 16-18. http://www.exampler.com/testing-com/writings/coverage.pdf

[4] Fowler M. Testing Strategies in a Microservice-Architecture [Internet]. Nov 18, 2014. Available from: https://martinfowler.com/articles/microservice-testing/ [Accessed: July 15, 2017]

[5] Fowler M. Continuous Integration [Internet]. May 1, 2006. Available from: https://www.martinfowler.com/articles/continuousIntegration.html [Accessed: July 15, 2017]

[6] Sandberg AB, Crnkovic I. Meeting industry: Academia research collaboration challenges with agile methodologies. In: Proceedings of the 39th International Conference on Software Engineering: Software Engineering in Practice Track. Piscataway, NJ, USA: IEEE Press; 2017

[7] Eddy BP et al. CDEP: Continuous delivery educational pipeline. In: Proceedings of the SouthEast Conference. New York, NY, USA: ACM; 2017

[8] Guillot I et al. Case studies of industry-academia research collaborations for software development with agile. In: CYTED-RITOS International Workshop on Groupware. Springer; 2017

[9] Fagerholm F et al. The RIGHT model for continuous experimentation. Journal of Systems and Software. 2017;**123**:292-305

[10] Koetter F, Kochanowski M, Maier F, Renner T. Together, yet apart – The research prototype architecture dilemma. In: CLOSER 2017, Proceedings of the 7th International Conference on Cloud Computing and Services Science. Porto, Portugal: SciTePress; April 24-26, 2017. pp. 646-653

[11] Eckstein J. Agile Software Development with Distributed Teams: Staying Agile in a Global World. United States: Addison-Wesley; 2013

[12] Sharma S, editor. The DevOps Adoption Playbook: A Guide to Adopting DevOps in a Multi-Speed IT Enterprise. United States: Wiley; 2017

[13] Humble J, Farley D. Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation. UK: Pearson Education; 2010

[14] Davis J, Daniels K. Effective DevOps: Building a Culture of Collaboration, Affinity, and Tooling at Scale. US: O'Reilly Media, Inc.; 2016


[15] Kim G et al. The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations. IT Revolution Press; 2016

**Chapter 4**

**Software Fault Injection: A Practical Perspective**

Lena Feinbube, Lukas Pirl and Andreas Polze

Additional information is available at the end of the chapter

DOI: 10.5772/intechopen.70427

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

**Abstract**

Software fault injection (SFI) is an acknowledged method for assessing the dependability of software systems. After reviewing the state of the art of SFI, we address the challenge of integrating it more deeply into software development practice. We present a well-defined development methodology incorporating SFI—fault injection driven development (FIDD)—which begins by systematically constructing a dependability and failure cause model, from which relevant injection techniques, points, and campaigns are derived. We discuss possibilities and challenges for the end-to-end automation of such campaigns. The suggested approach can substantially improve the accessibility of dependability assessment in everyday software engineering practice.

**Keywords:** fault injection, dependability, fault tolerance, testing, test-driven development

**1. Introduction**

On 22 October 2012, a major service degradation at Amazon Web Services (AWS)<sup>1</sup> affected several popular online services for several hours. It was caused by a latent memory leak bug, activated under stress due to a failed domain name system (DNS) update after a hardware maintenance event. The leaky software agent repeatedly tried in vain to contact the replaced server. In this process, memory was leaked until customer requests could no longer be handled. AWS is a system so complex that it challenges exhaustive formal verification. The issue could, however, have been anticipated by structured experimental dependability assessment, e.g., using fault injection. Indeed, Netflix customers remained unaffected.<sup>2</sup> Resiliency testing had prepared the company for such events, and failover of Netflix servers worked quickly.

<sup>1</sup> http://aws.amazon.com/de/message/680342/

<sup>2</sup> http://techblog.netflix.com/2012/10/post-mortem-of-october-222012-aws.html
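To make the idea of fault injection concrete before going into detail, the following miniature sketch replaces a dependency with a deliberately faulty variant and verifies that the caller degrades gracefully. All names (hosts, functions, the fault model) are hypothetical illustrations, not taken from the AWS or Netflix systems discussed above.

```python
# fault_injection_sketch.py -- a miniature fault-injection experiment:
# swap a dependency for a deliberately faulty variant and check that the
# caller fails over instead of crashing.
import socket


def fetch_status(resolve=socket.gethostbyname):
    """Hypothetical caller: try the primary host first, then a replica."""
    for host in ("primary.example.org", "replica.example.org"):
        try:
            resolve(host)              # the injection point
            return f"reached {host}"
        except OSError:
            continue                   # fail over rather than crash
    return "degraded: no host reachable"


def test_failover_survives_injected_dns_fault():
    # Fault model (inspired by the outage above): DNS resolution of the
    # primary host fails; the service must still answer via the replica.
    def faulty_resolve(host):
        if host == "primary.example.org":
            raise OSError("injected DNS fault")
        return "10.0.0.2"

    assert fetch_status(resolve=faulty_resolve) == "reached replica.example.org"
```

Structured SFI campaigns generalise this pattern: the fault model and the injection points are derived systematically rather than chosen ad hoc, as the remainder of this chapter describes.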
