**2. Background and motivation aspects**

Scientific research is a robust and dynamic practice that employs multiple methods to investigate systems or phenomena, including experimentation, description, comparison, and modeling. According to ref. [4], these methods, although often used in combination, appear more effective when used alone. Experimental methods investigate the relationship(s) between two or more phenomena in a strictly controlled environment. Descriptive methods use observations and measurements of natural phenomena and their relationships to collect the relevant data set that describes their behavior. Comparison determines and quantifies relationships between two or more phenomena by observing different groups that, either by choice or circumstance, are exposed to different treatments. Scientific knowledge cannot be obtained from empirical data by purely logical means, because the ontologies of scientific and empirical knowledge differ significantly. Physics of Open Systems (POS), briefly introduced in the previous section [3], facilitates the generation of scientifically proven knowledge about the ontology of open systems via data mining techniques applied to the huge amounts of semi-structured, multimodal, and heterogeneous data dynamically generated throughout the SoS lifecycle. The identification of characteristic symmetries in an ontology model is used to simplify the structure and behavior of open systems over the state space defined by that model.

Modeling is a well-established mechanism for coping with complexity, the main obstacle in contemporary systems and solutions. The results of the modeling process are a single model, or a combination of physical and/or computer-based models, of natural systems and/or phenomena that are afterward used as a framework for experiments and/or observations. Scientific development (progress) addresses the scientific approach to overall and sustainable development concerning the wide variety of contemporary problems that are either global or domain specific [5].

Concerning the SoS discipline, despite the inherent complexity, the big-picture approach is usually the most promising one. An SoS exhibits an organized form of complexity and therefore cannot be accurately described by traditional analysis techniques. The key concept of complexity science is universality: the idea that many systems in different domains exhibit phenomena with common underlying features that can be described using the same scientific models. Complexity science can provide a comprehensive, cross-disciplinary analytical approach that complements traditional scientific approaches focused on a specific observed subject in each domain. Complex systems are often characterized by many components that interact with each other in multiple ways and, potentially, with their environment too. These components then form dynamic networks of interactions, with a wide variety of network topologies. They generally range from configurations with a small number of components involved in a large number of interactions, to configurations with an enormous number of components involved in a small number of interactions (**Figure 1**).

Interactions may generate novel information that makes it difficult to study components in isolation or to completely predict their future structure and/or behavior. The main challenge of complexity science is not only to have a sense of the parts and their connections but also to understand how these connections give rise to the whole. Advanced mathematical and computational modeling, analysis, and simulations are almost always required to investigate how these configurations are structured and change with time.

The growth of stakeholder-driven content has fueled a rapid increase in the volume and type of data that is generated, manipulated, analyzed, and archived. In addition, varied newer sets of sources, including sensors, Global Positioning Systems (GPS), automated trackers, and monitoring systems, are generating huge amounts of multidimensional data. These larger volumes of data sets, often termed *big data*, impose newer challenges and opportunities concerning storage, retrieval, analysis, visualization, and long-term archival. Computer-based analysis of massive data, emerging from complex systems' structure and behavior, enables the recognition of embedded data/information/knowledge/wisdom (DIKW) patterns that contribute to the further understanding of the structure and behavior of the whole and/or its parts, thereby fostering more accurate prediction of forthcoming structure and/or behavior. The second challenge arises directly from the multilevel and multidimensional nature of the artifacts that are consciously or unconsciously reflected through the complex systems' time- and configuration-dependent state transitions. **Figure 2** presents a cognitive DIKW pyramid that relates the LDOs that may be generated, with the inherent semantics in mind.

Different forms of data representation are well established and proven in engineering practice. This is generally not the case with information, knowledge, and wisdom, because they are dominantly context-dependent. The SoS persistency layer is a crucial component of SoS-based analytics, and possible drawbacks of LDO concepts need careful attention when specifying a supportive framework model. Although there is a huge amount of intellectual value embedded in arbitrary real-life systems

**Figure 1.** *Network representations of composite components [6, 7].*

**Figure 2.** *The representation of LDOs genesis.*

that are naturally represented in semi-structured or unstructured form, contemporary legacy enterprise systems still dominantly operate over structured repositories. Given the growing shift from SQL to NoSQL persistency, a hybrid repository model seems an appropriate solution to start with.

In ref. [8], the authors elaborate on Big Data management in the context of three inherent supportive dimensions: *technology* (dominantly related to storage, analytics, and visualization); *people* (addressing the human aspects); and *processes* (addressing technological and business approaches to management aspects). The semantic value, quality of data, and data security are stated as the dominant challenging issues concerning the Big Data foundation of arbitrary SoS artifacts. A framework-based approach to Big Data analytics in higher-education environments is presented in ref. [9]; although strictly conceptual, the proposed framework model may be applied beyond the scope of the education domain. In ref. [10], Linear Mixed Modeling (LMM) is promoted as a flexible approach for scientific experimental data analysis. The nature of experimental data opens a challenging question of data set quality metrics that may be proliferated to SoS dynamic configuration instances. The multilevel ontological generation of semantic relations, extracted from a significant amount of heterogeneous linguistic data persisting in Big Data repositories, has been proposed by Popova et al. [11]. The solution is based on a specific XML format that enforces the interoperability of information across the individual levels of the generated multilevel ontology for a particular problem domain. The software engineering perspectives of the Big Data foundation are surveyed in ref. [12]. The refinements of software development activities through the challenging aspects of corresponding Big Data concepts are discussed, with a particular accent on architecture design, software quality assurance, and data quality assessment. In ref. [13], the intelligent systems design processes are discussed through multidimensional modeling of knowledge and knowledge transfer between the internal components and external counterparts of arbitrary intelligent systems, viewed as a four-dimensional (*grade*, *atomization*, *abstractness*, *timing*) cellular architecture. The engineering aspects of spatial data are a challenging domain for engineering disciplines that are based on

## *The Foundation for Open Component Analysis: A System of Systems Hyper Framework Model DOI: http://dx.doi.org/10.5772/intechopen.103830*

different natural phenomena analysis and simulation, like the daylight illumination of residential buildings in ref. [14], where the representation of large data objects and their potential dimensionality reduction has a dramatic impact on urban planning.

Although computer-based analysis of massive data sets has been in the field for quite a long time, it is far from being a routine activity. From the architectural aspect, it is essential to separate the external repository from the internal dynamic storing and presentation layers, and thereby hide the particular characteristics of the persistent LDO form from its operational counterparts. This is the first pillar of the proposed SoS framework model.
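As an illustration of that separation, the following minimal Python sketch (all names are ours, not from any referenced framework) hides the concrete persistency form of an LDO behind an abstract repository interface, so the operational layers never depend on whether the store is SQL- or NoSQL-based:

```python
from abc import ABC, abstractmethod

class LDORepository(ABC):
    """Abstract persistency layer: operational components depend only on
    this interface, never on the concrete storage form of an LDO."""
    @abstractmethod
    def store(self, key, ldo): ...
    @abstractmethod
    def fetch(self, key): ...

class InMemoryNoSQLRepository(LDORepository):
    """Stand-in for a document (NoSQL) store; a relational implementation
    could be swapped in without touching any caller."""
    def __init__(self):
        self._docs = {}
    def store(self, key, ldo):
        self._docs[key] = dict(ldo)   # semi-structured document
    def fetch(self, key):
        return self._docs.get(key)

repo: LDORepository = InMemoryNoSQLRepository()
repo.store("ldo-1", {"kind": "sensor-batch", "dims": 12})
print(repo.fetch("ldo-1")["dims"])
```

A relational implementation of the same interface could be introduced later without changing any caller, which is exactly the hybrid-repository flexibility discussed above.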

The computational complexity of high-dimensional LDO processing is the main obstacle for real-time or near real-time applications. LDO complexity reduction therefore appears as a promising approach to the system/problem simplification process. Because less complex LDOs are easier to navigate, explore, visualize, and analyze in different contexts, they are more suitable for effective machine learning.

That is the main reason why several complexity reduction methods, based on different dimensionality reduction algorithms, have been proposed, formalized, applied, and verified. Among them, two main unsupervised methods are worth mentioning: hierarchical clustering (HC) and principal component analysis (PCA). HC builds a tree-like structure, with leaves representing the individual objects and nodes (pseudo-objects) representing the clustering points of the leaves with the highest degree of similarity. In further iterations, the individual clusters (as surrogates of the clustered objects) replace the whole group and appear as individual objects, with a certain accuracy payoff. PCA, on the other hand, creates a lower-dimensional representation of the initial data set on top of principal components, as patterns encoding the highest variance in the data set, trying to preserve as much of the original data set's variance as possible in the process of dimensionality reduction.
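The HC iteration described above can be sketched as a naive single-linkage agglomeration, in which the merged cluster replaces its constituents in each round. This is a deliberately simple, O(n³) illustration in plain NumPy; the function and data are ours, not from the cited literature:

```python
import numpy as np

def single_linkage(points, n_clusters):
    """Greedy agglomerative clustering: repeatedly merge the two clusters
    whose closest members are most similar (single linkage)."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single-linkage distance: closest pair across the clusters.
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a] = clusters[a] + clusters[b]  # merged cluster replaces
        del clusters[b]                          # its two constituents
    return clusters

pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]])
print(sorted(sorted(c) for c in single_linkage(pts, 3)))
```

Stopping the agglomeration earlier (fewer clusters) yields a simpler surrogate of the data set at the cost of accuracy, which is the payoff trade-off mentioned above.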

Reducing the complexity of a particular LDO has its payoff in the accuracy of the reduced counterpart. The quality of a dimensionality reduction method is measured by its ability to reach the lowest possible complexity with the highest possible accuracy. Being unsupervised, the HC and PCA methods are better suited for the generation of sustainable, simpler LDOs, and consequently simpler SoS configurations, than for their verification.

Due to the fundamental focus of this chapter, the rest of the section is devoted to a more detailed elaboration of solely the PCA-related publications. The mathematical elaboration is completely avoided due to the huge number of references that have extensively addressed the foundation. A remarkably complete, simplified, step-by-step analysis of the original PCA method is presented in ref. [15], through five consecutive steps that lead to the data set dimensionality reduction. It starts with the standardization of the initial variable (dimension) ranges to a comparable scale, to eliminate the possible supremacy of dominant instances, followed by the calculation of the covariance matrix of all possible pairs of scaled variables (dimensions), to uncover the correlation nature of each possible variable pair. In the third step, the principal components are isolated by computing the eigenvectors of the covariance matrix and ordering them by their eigenvalues in descending order. This isolates the principal components, which are the surrogates of correlated dimensions, in order of their significance. In the fourth step, the feature vector is created by selecting the representative subset of principal components that leads to the desired dimensional reduction with the preferred accuracy. The last step is the generation of the reduced-dimensional data set by recasting the data over the selected principal components. The mathematical foundation of linear PCA is gradually presented in ref. [16] and, joined with the context of the previously referenced article, completes its lightweight approach. In ref. [17], the original row-based two-dimensional principal component analysis (2DPCA) and its extensions in the 2D image processing domain are discussed.
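The five steps summarized above can be sketched in a few lines of NumPy. This is an illustrative reading of the procedure described in ref. [15], not the authors' code; variable names are ours:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Reduce an (n_samples, n_features) data set to n_components dimensions."""
    # Step 1: standardize each variable to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: covariance matrix of the scaled variables.
    C = np.cov(Z, rowvar=False)
    # Step 3: eigenvectors of the covariance matrix, ordered by descending
    # eigenvalue (principal components in order of significance).
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: feature vector = the leading n_components eigenvectors.
    W = eigvecs[:, :n_components]
    # Step 5: recast the data over the selected principal components.
    reduced = Z @ W
    retained = eigvals[:n_components].sum() / eigvals.sum()
    return reduced, retained

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 3] = X[:, 0] + 0.01 * rng.normal(size=100)  # one strongly correlated dimension
reduced, retained = pca_reduce(X, 2)
print(reduced.shape)
```

The `retained` value quantifies the accuracy payoff of the reduction: the share of the total variance kept by the selected components.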
An overview of several extension frameworks is presented: bilateral projection (nonlinear and iterative), kernel-based, supervised (with four variations), alignment-based, and random (with three variations). All of the elaborated extensions have exhibited increased robustness and performance in comparison to the original 2DPCA in image recognition, which has been a traditional application domain of PCA methods. Robust PCA is discussed in ref. [18] through an extensive survey of five published RPCA models, with their comparative analysis in the context of video stream background management. In ref. [19], a modification of classical PCA (MPCA) is presented, with a subspace learning framework based on multiple similarity measurements. In ref. [20], a comprehensive review of PCA and its future development is presented. Although the reference is almost six years old, it has been used for the sake of the overall problem domain clarification, where PCA is addressed as a linear exploratory tool for data analysis. Due to its application in different fields, the initial PCA has been modified in several ways to better suit domain-specific characteristics. The authors enlist and discuss: static and dynamic functional PCA (FPCA), simplified PCA (SPCA), robust PCA (RPCA), and symbolic data PCA (SDPCA). In ref. [21], the authors present a modification of nonlinear kernel PCA (KPCA) with an adaptive feature, Adaptive Kernel PCA (AKPCA), integrated with gray relation analysis (GRA) for fault detection in complex nonlinear chemical processes.
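As a rough illustration of the kernel-based extension idea, the following NumPy sketch performs KPCA with an RBF kernel. This is a textbook formulation, not the AKPCA method of ref. [21]; the function name and parameters are ours:

```python
import numpy as np

def kernel_pca(X, n_components, gamma=1.0):
    """Project data onto the leading principal components in an RBF
    kernel feature space (the nonlinear KPCA idea)."""
    # RBF (Gaussian) kernel matrix over all sample pairs.
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    # Center the kernel matrix in feature space.
    n = len(X)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # Eigendecompose and keep the top components.
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[order], vecs[:, order]
    # Normalized projections of the training points.
    return vecs * np.sqrt(np.clip(vals, 0, None))

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
Y = kernel_pca(X, 2)
print(Y.shape)
```

Because the eigenvalues are kept in descending order, the first projected coordinate carries at least as much variance as the second, mirroring the linear case.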

A comprehensive analysis of the different domain-specific PCA applications is far beyond a reasonable research effort. In ref. [22], state-space functional PCA (SS-FPCA), a three-level hierarchical model built under the state-space model framework, is applied to the identification of spatiotemporal patterns in satellite remote sensing of lake water quality, in the form of time series of spatial images with missing observations. The authors of ref. [23] elaborate the Principal Component-based support vector machine (PC-SVM), a hybrid machine learning technique that combines PCA and SVM to cope with potential software defects, especially concerning mission-critical software systems. In ref. [24], a contemporary and challenging study of the implementation of machine learning methods for the identification of patients affected by COVID-19, based on X-ray images, is presented. Two commonly used classifiers were selected: logistic regression (LR) and convolutional neural networks (CNN), joined with PCA for complexity reduction and for shortening the elapsed time to gain a quality diagnostic answer. The complex boundary generation method presented in ref. [25] illustrates a practical application of dimensionality variation through the recursive search for the optimal residential building outer shape form, based on a variable set of parameters.
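A hybrid pipeline of the PC-SVM kind can be illustrated schematically. In this sketch a simple nearest-centroid rule stands in for the SVM stage, so the snippet shows only the shape of such pipelines (reduce first, then classify), not the actual method of ref. [23]; all data and names are ours:

```python
import numpy as np

# Two synthetic Gaussian classes in 10-D; reduce with PCA, then classify
# with a nearest-centroid rule standing in for the SVM stage.
rng = np.random.default_rng(2)
A = rng.normal(loc=0.0, size=(40, 10))
B = rng.normal(loc=2.0, size=(40, 10))
X = np.vstack([A, B])
y = np.array([0] * 40 + [1] * 40)

# PCA stage: project onto the top-2 eigenvectors of the covariance matrix.
Z = X - X.mean(axis=0)
vals, vecs = np.linalg.eigh(np.cov(Z, rowvar=False))
W = vecs[:, np.argsort(vals)[::-1][:2]]
P = Z @ W

# Classification stage: nearest centroid in the reduced 2-D space.
c0, c1 = P[y == 0].mean(axis=0), P[y == 1].mean(axis=0)
pred = (np.linalg.norm(P - c1, axis=1) < np.linalg.norm(P - c0, axis=1)).astype(int)
accuracy = (pred == y).mean()
print(P.shape)
```

Because the between-class direction dominates the pooled covariance here, the leading principal component preserves the class separation, so even the crude classifier performs well on the reduced data.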

This short survey, and the much broader repertoire of similar research articles, fully qualifies the research motivation of this chapter: the formulation of a supportive SoS Hyper Framework Model.

Physics of Open Systems is an additional paradigm for the SoS Hyper Framework Model development, where the system is considered a tool through which the knowledge and sense of its complexity are harvested. The reader is referred to ref. [3] for the rest of the relevant influencers. The directly influencing POS intellectual machine analytical core technologies support systems: reconstruction through ontology model variations; examination based on communication model variations; design based on state model variations; empirical context formation (the generation of LDOs in this chapter's context); solution behavior generation (the dynamic representation of the varied model);


and visualization. This formulation is transformed into the formulation of the POS dimensions of the system model in the context of this chapter's SoS Hyper Framework Model proposal.

Software-supported frameworks of an arbitrary kind are usually targeted by model-driven software development (MDSD) and model-based systems engineering (MBSE) approaches. They are closely related to the general architecture modeling paradigms and constitute the core of different contemporary enterprise architecture (EA) frameworks. Several EA frameworks have been proposed and specified, among which the most advocated are:


• … (governed by SEI); it is expected to evolve into a de facto standard for the maturity classification of at least software development companies.

• *International Council On Systems Engineering* (INCOSE) System of Systems Framework [31] is closely related to the Model-Based Systems Engineering (MBSE) approach that emerged from the INCOSE projects in the SoS engineering domain. It is clustered over three main concepts: *model* (*a formally simplified version of an entity of interest*), *systems thinking* (*a holistic approach to interacting entities and their components*) [32], and *systems engineering* (*a transdisciplinary and integrative approach to the engineering of interacting entities, based on systems principles and concepts in the context of scientific, technological, and management methods application, through the entire life cycle*). MBSE does not strictly prescribe any process framework, but the arbitrarily selected process model has to address the four essential systems engineering domains: requirements/capabilities; the static structure (systems architecture or topology) [33]; the dynamic structure (systems behavior); and verification and validation aspects [34] (is it the right system? and, if so, is the system right?). In ref. [35] there is a remarkably well-illustrated approach to the breakdown of SoS mission needs into capabilities and functions, through an architecture framework and related ontology, that has inspired several concepts of the SoS Hyper Framework Model specification.

The related-work analysis shows that there is a tremendously large number of documents, studies, standards, procedures, and scientific articles that dominantly address particular aspects of the Principal Component Analysis approach to the dimensionality reduction problem, but far fewer references concerning the POS paradigm and its implementation aspects in the context of SoS.

On the other hand, there is also a lack of research concerning the interoperability framework approach with an integration mission. These facts favor the large-picture-based approach facilitating a Generic System of Systems Framework that sustainably orchestrates the domain-specific and generic concepts and dimensions of complex SoS configurations, which are open for arbitrary POS and PCA method extensions. The System of Systems Hyper Framework Model, presented in the next section of this chapter, is considered a first step toward the established goal. Collaborative frameworks, like the one elaborated in ref. [36], have served as an initial framework specification for the proposed model presented in this chapter.
