Advances in Bioengineering

The emergence of high-throughput technology has made it possible to monitor biomolecules such as proteins, metabolites and DNA/RNA transcripts with high speed and accuracy. Pathway modelling is an important approach for studying diverse biological pathways, such as metabolic, disease and signalling pathways, and helps in investigating protein metabolomics, signal transduction and gene regulation processes occurring at the cellular level. With its help, complex biological systems can be investigated. Most of the reactions in a signalling pathway are enzyme-catalysed protein activation reactions that are commonly referred to as a “signalling cascade”. The aim of this chapter is to discuss various mathematical, computational modelling and networking approaches for pathway modelling in the field of cancer in general and non-small cell lung cancer (NSCLC) in particular.


Introduction to Pathway Modelling
Complex biological systems are composed of numerous interacting elements that are diverse in nature and have different modes of regulation. High-throughput technologies have allowed us to unravel cellular activities and have given us an opportunity to study various biological systems (Chong et al. 2014). Biological pathways help in the investigation of the metabolism, gene regulation and signal transduction processes occurring at the cellular level (Donaldson and Calder 2010). In a biological pathway, a signal is received by a specific cell, which produces an appropriate cellular response (Cho and Wolkenhauer 2003). Whether the cell response is regulated positively or negatively depends on the signal being transduced; for example, the mitogen-activated protein kinase (MAPK) or extracellular signal-regulated kinase (ERK) pathway can respond with cellular differentiation or cellular proliferation depending on the growth factor present (Dhillon et al. 2007). Pathway modelling has made it possible to investigate such complex biological systems. It refers to the study of interactions among the various molecules, proteins or metabolites present within a cell that lead to signalling in response to a specific environment (Blair et al. 2012). A signalling pathway is a nonlinear series of reactions that processes chemical activities in response to signals sent from the exterior of the cell to the internal receptors. Most of the reactions in a signalling pathway are enzyme-catalysed protein activation reactions, which are commonly referred to as a "signalling cascade" (Gupta 2018). Prediction of signalling cascades is possible through a mathematical and computational modelling approach called systems biology. Systems biology combines molecular biology with the study of system network interactions and aids the understanding of various cellular and biological processes (Pfau et al. 2011). The main aim of systems biology is to reveal cellular mechanisms that can cause the modification of phenotypes and to design customized novel anti-cancer drugs for therapeutic studies (Thomas et al. 2016).
Cancer has been one of the most lethal diseases in recent decades. A massive amount of high-throughput data is being generated rapidly and deposited in publicly available databases, some of which are exclusively dedicated to cancer data. The persistently high mortality rate indicates that these data are not being effectively translated into effective medicines (Garland 2017). The aim of this chapter is to discuss various past approaches to pathway modelling in the field of cancer. We also discuss the signalling pathways associated with cancer and the protein-protein interactions involved therein.

Methods in Pathway Modelling
Various approaches are employed for pathway modelling, depending on the type of biological pathway to be studied (Fig. 1.1), and various models have been designed using them. The challenge in pathway modelling is to choose an appropriate approach for designing a model. There are two well-established methods for the computational modelling of biological pathways: mathematical modelling and network-based modelling. Additionally, our group is working on a molecular dynamics simulation-based modelling approach for pathways.

Mathematical Modelling Approach
Mathematical modelling analyses the network by transforming various reactions and processes into a mathematical form such as matrices. Certain in-silico approaches using mathematical models have been developed to validate hypotheses and predictions that would be difficult to assess using in-vivo techniques. These predictions provide valuable knowledge about specific disease progression pathways. The major approaches in mathematical modelling are the Boolean approach, the ordinary differential equation (ODE) approach and the stoichiometric approach. The complex chemical reactions in biological systems can be described using ODEs (Klipp and Liebermeister 2006). ODE models have been used extensively to determine the dynamic properties of many signalling pathways, specifically cancer signalling pathways. An ODE model of the tumour suppressor p53 and the oncogene Mdm2 was constructed, and this model exposed high variability in the oscillatory behaviour of the cells (Geva-Zatorsky et al. 2006). An ODE model for NF-κB was constructed to differentiate the role of NF-κB kinase isoforms (Hoffmann et al. 2002), and ODEs were also used to develop a model for the MAPK pathway (Orton et al. 2005).

Fig. 1.3 In the NF-κB pathway, there are two routes of activation, canonical and non-canonical. The canonical pathway initiates with the binding of TNF-α or IL-1 to its receptor. The binding recruits various proteins such as TRADD, TRAF2, IRAK, TAK, etc., which cause phosphorylation of IκB by IKK. This leads to degradation of IκB through the ubiquitin system. It further leads to activation of the p50 and p65 proteins, which translocate to the nucleus and activate transcription. The non-canonical pathway initiates with the binding of CD40L to the CD40 receptor, resulting in the recruitment of proteins such as NIK, cIAPs and TRAF2/3. The phosphorylation of IKK-α leads to the activation of the RelB and p52 proteins, which then translocate to the nucleus and initiate transcription

S. Bapat et al.
ODE modelling is the most popular approach due to its simplicity. Runge-Kutta is a well-known family of numerical integration algorithms designed to increase the accuracy and efficiency of approximate solutions to such problems. In several cases, a series of logarithmic or power-law functions is used to describe biological or chemical kinetics. As a result, the ODE approach has been successfully developed to tackle issues regarding biological network systems.
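
To make the ODE approach concrete, the following minimal sketch integrates a single protein activation reaction with the classical fourth-order Runge-Kutta scheme. The reaction itself, the rate constants `K_ACT` and `K_DEACT`, the signal level and all function names are illustrative assumptions; they are not taken from any of the models cited above.

```python
# Toy ODE model of one protein activation step, integrated with classical RK4.
# All parameter values are hypothetical and chosen only for illustration.

K_ACT = 0.5     # activation rate constant (assumed)
K_DEACT = 0.2   # deactivation rate constant (assumed)
SIGNAL = 1.0    # constant upstream signal level (assumed)
X_TOTAL = 1.0   # total protein, active plus inactive (assumed)


def activation_rate(t, x_active):
    """d[X*]/dt = k_act * S * (X_total - X*) - k_deact * X*."""
    return K_ACT * SIGNAL * (X_TOTAL - x_active) - K_DEACT * x_active


def rk4_step(f, t, y, h):
    """Advance y by one step of size h with the classical RK4 scheme."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def simulate(t_end=50.0, h=0.01, x0=0.0):
    """Integrate from t = 0 to t_end and return the final active amount."""
    t, x = 0.0, x0
    for _ in range(int(round(t_end / h))):
        x = rk4_step(activation_rate, t, x, h)
        t += h
    return x


def analytical_steady_state():
    """Steady state obtained by setting d[X*]/dt = 0."""
    return K_ACT * SIGNAL * X_TOTAL / (K_ACT * SIGNAL + K_DEACT)


if __name__ == "__main__":
    print(simulate(), analytical_steady_state())
```

For this linear toy system the numerical solution can be checked against the closed-form steady state; realistic pathway models couple many such equations and usually have no closed-form solution, which is why numerical integration is needed.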

Stoichiometric Approach
Stoichiometry refers to the estimation of the reactants and products involved in chemical reactions. The stoichiometric approach aims to find a pathway in which a particular node satisfies a variety of biochemically meaningful stoichiometric constraints (Planes and Beasley 2008). This approach makes direct use of reaction stoichiometry. It can be used efficiently to study and analyse the feasible steady states of a biological system (Materi and Wishart 2007). It is also used to evaluate the distribution of metabolic flux in a cell under a given set of conditions at a given moment (Cakir et al. 2004). Tools such as flux balance analysis (FBA) and metabolic pathway analysis (MPA) are used for stoichiometric pathway analysis (Orman et al. 2011). The stoichiometric approach was applied to Saccharomyces cerevisiae to study central carbon metabolism (Maaheimo et al. 2001). It was also used to characterize the metabolic network behaviour of a rat tumour cell line (C6 glioma) using the 13C isotope (Portais et al. 1993), and it helped in characterizing human blood cell metabolism to determine the important regulatory points by using extreme pathway analysis (Trinh et al. 2009). There is a considerable amount of work on metabolic flux distribution in cell metabolism using the stoichiometric approach (Llaneras and Picó 2008), compared with relatively little on pathway modelling. Hence, there is scope to explore the modelling of pathways using this approach (Fig. 1.4).
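
The steady-state idea behind the stoichiometric approach can be sketched as a mass-balance check: a flux vector v is feasible at steady state when the stoichiometric matrix S satisfies S·v = 0. The branched toy pathway, the reaction labels and the flux values below are hypothetical, and the sketch covers only the mass-balance constraint, not the optimization step of a full flux balance analysis.

```python
# Mass-balance check S . v = 0 for a toy branched pathway.
# Hypothetical reactions:
#   R1: -> A,  R2: A -> B,  R3: A -> C,  R4: B ->,  R5: C ->
# Each row lists the coefficient of one metabolite in R1..R5
# (positive = produced, negative = consumed).
STOICHIOMETRY = {
    "A": [1, -1, -1, 0, 0],
    "B": [0, 1, 0, -1, 0],
    "C": [0, 0, 1, 0, -1],
}


def net_production(stoichiometry, fluxes):
    """Return the net production rate of every metabolite (one row of S . v each)."""
    return {
        metabolite: sum(coeff * flux for coeff, flux in zip(row, fluxes))
        for metabolite, row in stoichiometry.items()
    }


def is_steady_state(stoichiometry, fluxes, tol=1e-9):
    """True when no metabolite accumulates or is depleted."""
    rates = net_production(stoichiometry, fluxes)
    return all(abs(rate) <= tol for rate in rates.values())


if __name__ == "__main__":
    balanced = [3, 2, 1, 2, 1]     # uptake of 3 split into branches of 2 and 1
    unbalanced = [3, 2, 2, 2, 1]   # branch R3 draws more A than is supplied
    print(is_steady_state(STOICHIOMETRY, balanced))
    print(net_production(STOICHIOMETRY, unbalanced))
```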

Network-Based Modelling Approach
Network-based modelling applies graph theory to find and connect associations between the nodes of a signalling pathway. Each node represents an entity such as a gene or a protein, and the interaction between two nodes is represented by an edge. The approaches used in network-based modelling of pathways include Bayesian networks, Gaussian networks, the maximum likelihood approach, hidden Markov modelling and latent variable models.
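
As a minimal sketch of the node-and-edge representation, the example below stores a pathway as an undirected graph and ranks nodes by degree to find hub candidates. The edge list is a simplified, hypothetical rendering of a growth factor to ERK cascade and is not a curated interaction dataset; the function names are likewise illustrative.

```python
from collections import defaultdict

# Hypothetical, simplified edge list (protein, protein) for illustration only.
EDGES = [
    ("EGF", "EGFR"), ("EGFR", "GRB2"), ("GRB2", "SOS"), ("SOS", "RAS"),
    ("RAS", "RAF"), ("RAF", "MEK"), ("MEK", "ERK"),
    ("ERK", "ELK1"), ("ERK", "MYC"), ("ERK", "FOS"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbouring nodes."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def degrees(adjacency):
    """Number of interaction partners of each node."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def hubs(adjacency, top=1):
    """Return the `top` nodes with the highest degree (ties broken by name)."""
    ranked = sorted(degrees(adjacency).items(), key=lambda item: (-item[1], item[0]))
    return [node for node, _ in ranked[:top]]


if __name__ == "__main__":
    graph = build_adjacency(EDGES)
    print(degrees(graph))
    print(hubs(graph))
```

Degree is only the simplest hub measure; it is used here because it needs nothing beyond the adjacency map.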

Bayesian Method
A Bayesian network can uncover statistical relationships among the random variables of a dataset. The Bayesian network approach provides graphical representations of various biomolecules such as proteins, genes, amino acids or metabolites. These biomolecules are treated as variables, and the relationships between them are predicted. A directed acyclic graph (DAG) is used to summarize the dependencies among the variables in Bayesian network analysis. The arcs of the DAG represent statistical dependence relations among the random variables, and each variable carries a local probability distribution. Bayesian network analysis not only refines existing knowledge and uncovers potential relationships within a signalling pathway, but also proves very useful for testing gene expression regulation problems (Sachs et al. 2002). The approach is widely used in metabolic pathway modelling, the construction of genetic networks and various causal modelling processes (Price and Shmulevich 2007). It has special features such as incremental learning, well-developed methods for parameter estimation and techniques for introducing unobservable or missing nodes, which make it of special interest for studying metabolic pathways (Conti et al. 2003). The Bayesian network approach performs well at capturing the probabilistic information of a biological pathway.
It was first used for modelling gene expression data when the sequencing of various micro-organisms was being carried out at a tremendous rate (Friedman et al. 2000). The Bayesian network approach is helpful for the graphical representation of gene expression data because it accounts for the relationships among the gene variables (Rangel et al. 2004). The Bayesian model can be applied in the construction of single-reaction metabolic pathways. It has also been used to deduce the relationships among various interacting proteins in a signalling pathway (Creixell et al. 2015). A Bayesian network model was used to study the activation of focal adhesion kinase (FAK) and extracellular signal-regulated kinase (ERK) in a signalling pathway (Fig. 1.5). Both of these activations resulted from the interaction between the integrin α5β1 and the extracellular matrix protein fibronectin (fn) (Sachs et al. 2002). Apart from these applications, the Bayesian modelling approach was used to integrate high-throughput genetic and protein data for the reconstruction of a detailed biological pathway. A Bayesian network model was also applied to study the cell cycle expression pattern of Saccharomyces cerevisiae; this study identified 800 different genes whose expression pattern varies across the stages of the cell cycle (Fu et al. 2017). In conclusion, the Bayesian network approach is capable of determining the relative probability of various statistically dependent models of unpredicted complexity and is an important approach for analysing data in the pathway modelling domain. It is of utmost importance in problem solving, especially in gene expression analysis problems.

Fig. 1.5 In the MAPK-RAF-ERK pathway, activation of the ERK pathway initiates with the binding of epidermal growth factor (EGF) to the epidermal growth factor receptor (EGFR), which initiates the recruitment of various downstream proteins such as SOS and GRB2, resulting in activation of the RAS protein. The activated Ras transmits the signal by activating RAF proteins (A-Raf, B-Raf, C-Raf). The signal is transmitted further to activate MEK 1/2. MEK 1/2 in turn activates ERK 1/2, which leads to the activation of transcription factors such as ELK1, ETS1/2, MYC and FOS, which lead to apoptosis
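
The way a Bayesian network combines a DAG with local probability distributions can be illustrated with a three-node chain. The structure (fibronectin signal, then FAK, then ERK) and every probability below are hypothetical values chosen for the sketch; they are not the network or the parameters of the studies cited above. The joint distribution is the product of the local distributions, and queries are answered by summing over the joint.

```python
from itertools import product

# Hypothetical chain Fn -> FAK -> ERK with binary variables (1 = active).
P_FN_ACTIVE = 0.5                       # P(Fn = 1), assumed
P_FAK_ACTIVE_GIVEN_FN = {1: 0.9, 0: 0.1}   # P(FAK = 1 | Fn), assumed
P_ERK_ACTIVE_GIVEN_FAK = {1: 0.8, 0: 0.2}  # P(ERK = 1 | FAK), assumed


def bernoulli(p_active, value):
    """Probability of `value` (0 or 1) when P(value = 1) is p_active."""
    return p_active if value == 1 else 1.0 - p_active


def joint(fn, fak, erk):
    """Joint probability as the product of the local distributions of the DAG."""
    return (
        bernoulli(P_FN_ACTIVE, fn)
        * bernoulli(P_FAK_ACTIVE_GIVEN_FN[fn], fak)
        * bernoulli(P_ERK_ACTIVE_GIVEN_FAK[fak], erk)
    )


def probability(**evidence):
    """Sum the joint over all assignments consistent with the evidence."""
    total = 0.0
    for fn, fak, erk in product((0, 1), repeat=3):
        assignment = {"fn": fn, "fak": fak, "erk": erk}
        if all(assignment[name] == value for name, value in evidence.items()):
            total += joint(fn, fak, erk)
    return total


def posterior_fn_given_erk_active():
    """P(Fn = 1 | ERK = 1) by Bayes' rule."""
    return probability(fn=1, erk=1) / probability(erk=1)


if __name__ == "__main__":
    print(probability(erk=1), posterior_fn_given_erk_active())
```

Exhaustive enumeration is only practical for very small networks; it is used here because it keeps the factorization visible.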

Gaussian Networking
The Gaussian network represents biological macromolecules such as proteins, amino acids and genes as an elastic mass-and-spring network. This elastic network is used to study, characterize and understand various aspects of their dynamic properties. The model is extensively used in studying cell signalling pathways and the pathway modelling of various proteins (Eungdamrong and Iyengar 2004). The Gaussian network model has proved to be a simple yet powerful approach for studying the dynamics of proteins. The Gaussian graphical model is an undirected graph in which each edge represents the pairwise correlation between two variables. The model is interpreted using linear regression: when two random variables, A and B for example, are each regressed on the remaining variables in the dataset, the Pearson correlation between the two sets of residuals gives the partial correlation coefficient between A and B. The Gaussian network is used for analysing various protein-protein, gene-gene or gene-protein interactions (Vella et al. 2017). The Gaussian network model is an undirected probabilistic model that is used to estimate the conditional dependencies between the variables of a system. It is widely used for the reverse engineering of genetic regulatory networks and for pathway modelling. The model was recently applied to biological datasets to elucidate the relationships between genomic features in the human genome, and it has been applied to various biological datasets for the analysis of mRNA expression data. It was also applied to a lipid-focused targeted metabolomics dataset of 1020 serum samples from a German population to study metabolic pathways, and to Raf signalling network modelling (Barupal et al. 2018). Raf is an important signalling protein that functions in the regulation of cellular proliferation in human immune system cells. Deregulation of the Raf signalling pathway leads to carcinogenesis. Hence, this pathway is described in the literature as among the most critical pathways (Leicht et al. 2007).
The Gaussian network approach has also helped in modelling the isoprenoid gene network in Arabidopsis thaliana, where it served as a tool to infer a gene network for isoprenoid biosynthesis (Wille et al. 2004). In conclusion, Gaussian network modelling is a valuable approach for rediscovering the metabolic reactions of a biological system. It can further help in the investigation of metabolomics data obtained from high-throughput technology, leading to accurate profiling of metabolic data and providing a comprehensive picture of cellular metabolism.
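
The regression-based definition of partial correlation used in Gaussian graphical models can be sketched for the simplest case of a single controlling variable. The three data vectors below are constructed, hypothetical values in which A and B both depend on C; the general case with many remaining variables would need multiple regression, which this sketch does not implement.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of the least-squares regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def partial_correlation(a, b, control):
    """Correlation of a and b after removing the linear effect of `control`."""
    return pearson(residuals(a, control), residuals(b, control))


# Hypothetical measurements: A rises with C, B falls with C.
C = [-3.0, -1.0, 1.0, 3.0]
A = [-5.0, -3.0, 1.0, 7.0]
B = [2.0, 4.0, -4.0, -2.0]

if __name__ == "__main__":
    print(pearson(A, B))                 # strong marginal correlation
    print(partial_correlation(A, B, C))  # vanishes once C is accounted for
```

In a Gaussian graphical model, an edge between A and B would be drawn only if the partial correlation remained clearly non-zero, which is not the case in this constructed example.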

Maximum Likelihood Approach
The maximum likelihood approach permits us to calculate the parameters of general models of network growth that can be expressed in terms of recursion relations. This approach needs a probabilistic model that reflects the nature of the data and gives insight into how the network has evolved. Calibration of a mathematical model by estimating the parameters of an ODE system from experimental data is often done by the maximum likelihood approach (Wiuf et al. 2006). Maximum likelihood estimation starts by establishing a mathematical expression for the sample data, called the likelihood function. The likelihood function is the probability of obtaining the observed set of data under a chosen probability model, and it contains several unknown parameters. The maximum likelihood estimators (MLE) are the values of these parameters that maximize the sample likelihood (Jiao et al. 2015). The estimated parameters are reported together with confidence intervals. The maximum likelihood approach is an iterative modelling process. It is used for phylogenetic modelling, studying genetic cross-over and gene expression analysis (Lu et al. 2018).
The maximum likelihood approach has been used for estimating kinetic rates in gene expression. These kinetic rates inform the reconstruction of genetic regulatory networks and are important for measuring the stability of gene expression (Tian et al. 2007). Domain-domain interactions were used to study protein-protein interactions (PPIs) by the MLE method (Deng et al. 2002). The MLE method was employed to predict protein-protein interaction pairs for Saccharomyces cerevisiae, Caenorhabditis elegans and Homo sapiens. The prediction was based on the observation that proteins with common signatures are most likely to interact with each other and produce an external or internal response (Mahdavi and Lin 2007). The PPIs predicted by MLE methods were used for the construction of metabolic pathways and help to fill the knowledge gap between proteins and pathways. The maximum likelihood approach was also applied to data obtained from DNA sequences of Nicotiana tabacum, Marchantia polymorpha and Oryza sativa for comparing synonymous and non-synonymous nucleotide substitution rates (Chumney 2012).
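
A small worked example may clarify what maximizing a likelihood function means in practice. The sketch assumes, purely for illustration, that waiting times between reaction events follow an exponential distribution with an unknown rate; the data values and function names are hypothetical. For this model the estimator has a closed form (number of events divided by total waiting time), and a simple grid search shows the same answer being found numerically.

```python
from math import log

# Hypothetical waiting times between events (arbitrary time units).
WAITING_TIMES = [0.5, 1.0, 1.5, 2.0]


def log_likelihood(rate, times):
    """Log-likelihood of an exponential model: sum of log(rate) - rate * t."""
    return sum(log(rate) - rate * t for t in times)


def mle_rate(times):
    """Closed-form maximum likelihood estimate for the exponential rate."""
    return len(times) / sum(times)


def grid_search_rate(times, low=0.01, high=5.0, steps=4990):
    """Numerical maximization: evaluate the log-likelihood on a regular grid."""
    best_rate, best_value = low, log_likelihood(low, times)
    for i in range(1, steps + 1):
        rate = low + (high - low) * i / steps
        value = log_likelihood(rate, times)
        if value > best_value:
            best_rate, best_value = rate, value
    return best_rate


if __name__ == "__main__":
    print(mle_rate(WAITING_TIMES), grid_search_rate(WAITING_TIMES))
```

Models without a closed-form estimator, such as ODE systems fitted to time-course data, rely on iterative numerical optimizers rather than the crude grid used here.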

Hidden Markov Model
Hidden Markov modelling (HMM) approaches are often applied to statistical modelling problems in protein modelling, pathway modelling, database searching and multiple sequence alignment. The basic mechanism of an HMM is that it describes a series of observations by a hidden stochastic process, referred to as the Markov process. It is a statistical model that predicts observed output events based on previous observed or unobserved events. An observed event is called a "symbol", and the unobserved or invisible factor underlying the observation is known as a "state". Each state has a probability distribution over the possible outcomes, which are known as tokens. Transitions among the states are governed by a set of probabilities known as transition probabilities, and the observations are generated by an associated probability distribution (Choo et al. 2004). The HMM approach is often used for identifying pathway information and modelling biological sequences (Fig. 1.6) (Qian and Yoon 2009).
The HMM approach has greatly influenced the field of computational biology. HMMs have been used for various biological sequence analyses, and their effectiveness in modelling the relation between two domains or events has made them popular in various fields. HMMs have aided in many tasks such as gene prediction, prediction of protein secondary structure, RNA structural alignment, DNA modelling, fast non-coding RNA annotation and prediction in pathway modelling (Yoon 2009). The approach was first implemented for the construction of genetic linkage maps. It has also proved important in distinguishing the coding and non-coding regions of DNA, and it was later used to model protein-binding sites in DNA. HMMs were successfully used to model protein superfamilies, which are more difficult to characterize than families (Siepel and Haussler 2004). HMMs were also applied to predict the secondary structure of proteins (Asai et al. 1993). Some studies used the HMM method for obtaining multiple sequence alignments. An HMM is a valuable tool for the representation of a protein family or family domain (Pachter et al. 2002).
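
The roles of states, symbols, transition probabilities and emission probabilities can be sketched with the forward algorithm, which computes the probability of an observed symbol sequence by summing over all hidden state paths. The two hidden states, the two symbols and every probability below are hypothetical values chosen for illustration only.

```python
from itertools import product

# Hypothetical two-state HMM: hidden states emit "GC"-rich or "AT"-rich symbols.
STATES = ("coding", "noncoding")
SYMBOLS = ("GC", "AT")
START = {"coding": 0.5, "noncoding": 0.5}
TRANSITION = {
    "coding": {"coding": 0.9, "noncoding": 0.1},
    "noncoding": {"coding": 0.2, "noncoding": 0.8},
}
EMISSION = {
    "coding": {"GC": 0.7, "AT": 0.3},
    "noncoding": {"GC": 0.4, "AT": 0.6},
}


def forward(observations):
    """Probability of a non-empty observation sequence under the HMM."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: START[s] * EMISSION[s][observations[0]] for s in STATES}
    for symbol in observations[1:]:
        alpha = {
            s: EMISSION[s][symbol] * sum(alpha[p] * TRANSITION[p][s] for p in STATES)
            for s in STATES
        }
    return sum(alpha.values())


def total_probability(length):
    """Sanity check: probabilities of all sequences of a given length sum to 1."""
    return sum(forward(list(seq)) for seq in product(SYMBOLS, repeat=length))


if __name__ == "__main__":
    print(forward(["GC", "GC", "AT"]))
    print(total_probability(3))
```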

Latent Variable Model
The latent variable model is a statistical model that relates a set of observable (manifest) variables to a set of latent (unobserved) variables. The responses on the indicators are generated according to the position of the latent variable. The latent variables can be categorical or continuous. The latent variable model is useful for the study of pathway modelling, regulatory networks and gene expression profiles (Tagore et al. 2008).
There are some other approaches, such as the density estimation approach, the Helmholtz machine approach and generative topographic mapping. The density estimation approach is used for various metabolic pathway analyses, pathway modelling and various immunological or clinical trials (Estivill-Castro and Houle 2001). The Helmholtz machine approach is used for studying metabolic activities associated with the brain and nervous system (Han et al. 2011). The generative topographic mapping approach is used in gene expression profiling, microarray analysis and pathway modelling (Tonella 2001).

Fig. 1.6 A hidden Markov model consists of two components: observable states (S1, S2, S3) and hidden states (y1, y2, y3, y4). It is assumed that the hidden states are modelled by a simple first-order Markov process and that they are all connected to each other. The connections between the hidden states and the observable states represent the probability of generating a particular observed state given that the Markov process is in a particular hidden state

Molecular Modelling Approach in Lung Cancer
Lung cancer is one of the most lethal cancers and a frequent cause of cancer-related death, accounting for approximately 1.79 million deaths globally in 2017 (Sun et al. 2007). Lung cancer is generally caused by occupational exposures. There are two subtypes of lung cancer: non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC). Eighty-five per cent of lung cancer patients suffer from NSCLC. Advances in radiotherapy, surgery, chemotherapy and molecular therapy have revolutionized lung cancer treatment (Herbst et al. 2018). Still, the clinical outcome for NSCLC remains unsatisfactory due to local tumour recurrence and metastasis. Studies have shown that critical mutations in the p53 and Ras genes may cause the persistence of DNA adduct formation in NSCLC. Extensive molecular genetic studies targeted at specific genes and pathways, together with genome-wide approaches, have shown NSCLC to have multiple genetic and epigenetic alterations (Zappa and Mousa 2016). Various pathways with crucial components have their functions altered in NSCLC, and these pathways are emerging as important with regard to targeted therapy. These signalling pathways are stimulated by oncogenes, which promote malignancy, proliferation and escape from apoptosis (Fig. 1.7). Mutated oncogenic proteins cause an addiction of tumour cells to their abnormal functions, a concept known as oncogene addiction. There is a need to focus on the genetic changes in NSCLC that lead to functional alterations in the signalling pathways, rather than studying individual factors (Ray et al. 2010).

Signalling Pathways in NSCLC

MAPK Pathway
The mitogen-activated protein kinase (MAPK) pathway consists of a set of proteins in the cell that communicate a signal from a receptor on the cell surface to the DNA in the nucleus (McCain 2013). It plays a vital role in cellular growth, survival and the regulation of gene expression, and it is one of the most studied pathways in cancer biology. The MAPK pathway consists of signalling molecules such as Ras, Raf, MEK and ERK. The pathway is activated by the binding of extracellular growth factors to a receptor tyrosine kinase (Germann et al. 2017). Its activation results in the transcription of genes that encode proteins required for essential cellular functions. Abnormal MAPK signalling leads to increased or uncontrolled cell growth and resistance to apoptosis. MAPK signalling begins with the activation of Ras by the receptor tyrosine kinase. Activation of Ras leads to membrane recruitment and activation of Raf proteins (Pan 2013). Raf then phosphorylates MEK, and MEK phosphorylates ERK, which directly or indirectly activates many transcription factors. Finally, the activation of these transcription factors results in the expression of genes that encode proteins for vital cellular functions. Many mathematical approaches are used for the reconstruction of the MAPK pathway. The Boolean approach was applied to the modelling of the MAPK pathway: a Boolean network was constructed for comparing the collected data with all the possible Boolean functions and input datasets. The Boolean method was chosen because it is less time consuming and simpler than other known approaches, and because it reduces the potential values to ON or OFF, which represent the presence or activity of a particular compound (Grieco et al. 2013).
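
The ON/OFF simplification of the Boolean approach can be sketched with a synchronous Boolean network for a linear cascade. The update rules below (each node simply copies the state of its upstream activator) and the node names are a deliberately minimal, hypothetical model; published Boolean models of the MAPK network contain many more nodes, feedback loops and logical gates.

```python
# Synchronous Boolean network for a hypothetical linear cascade:
# GrowthFactor -> RTK -> Ras -> Raf -> MEK -> ERK
RULES = {
    "RTK": lambda s: s["GrowthFactor"],
    "Ras": lambda s: s["RTK"],
    "Raf": lambda s: s["Ras"],
    "MEK": lambda s: s["Raf"],
    "ERK": lambda s: s["MEK"],
}


def step(state):
    """Apply all update rules at once, using the state of the previous time step."""
    new_state = dict(state)  # the input node GrowthFactor keeps its value
    for node, rule in RULES.items():
        new_state[node] = rule(state)
    return new_state


def run(growth_factor, steps):
    """Simulate from an all-OFF cascade and return the list of visited states."""
    state = {"GrowthFactor": growth_factor}
    state.update({node: False for node in RULES})
    trajectory = [state]
    for _ in range(steps):
        state = step(state)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    for t, state in enumerate(run(growth_factor=True, steps=5)):
        active = [node for node, on in state.items() if on]
        print(t, active)
```

With the growth factor ON, the activation front moves one node per step, so ERK switches ON only at the fifth update; with the growth factor OFF, the cascade stays silent.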
Apart from the Boolean approach, the ordinary differential equation (ODE) method has also been applied to the MAPK pathway to study different aspects of the MAPK cascades. The differential equation approach helps in studying the important dynamic properties and behaviour of the MAPK cascade with respect to particular structural features such as feedback loops, phosphatase activity, the role of scaffold proteins and double phosphorylation. The ODE method also helped in studying the deregulation of MAPK signalling and its effect on tissue homeostasis, which leads to an imbalance between cell proliferation and cell growth arrest and later to apoptotic cell death (Karreth and Tuveson 2009). A linear non-homogeneous first-order differential equation approach was formulated based on the reactions involved in MAPK signalling pathways. Along with this, the ODE method was used to determine the ultrasensitivity of the signalling cascade (Shuaib et al. 2016). Hence, both Boolean and ordinary differential equation methods have been used for modelling the signalling involved in the MAPK pathway. Figure 1.8 shows the activation of various pathways such as MAPK, RAS, PI3K, PLCε, RAL and AF6. The pathway was generated using the PathVisio tool (Kutmon et al. 2015). The primary step is the binding of a growth factor to the receptor tyrosine kinase (RTK). The SOS and GRB2 proteins are essential elements for the activation of RAS, converting RAS-GDP to RAS-GTP, which is the activated form. The activated RAS-GTP then feeds into various signalling pathways such as AF6, MAPK, PI3K, PLCε and RAL, all of which have different functions. The MAPK pathway leads to the expression of genes for cell differentiation, proliferation and survival. AF-6 is one of the unique mixed lineage leukaemia partners, which normally functions at cell-cell junctions (Beaudoin et al. 2012). PI3K is an important pathway that helps in the regulation of the cell cycle. Activation of PI3K leads to phosphorylation of Akt, which has several downstream effects on the cell cycle (Liu et al. 2009). The PDK1 pathway leads to the activation of many genes for the translation process. The Ral pathway plays an important role in cell biology and is involved in cell signalling for the expression of genes for translation activity (Moghadam et al. 2017). The PLCε pathway is responsible for a broad range of biological and pathophysiological processes, including nuclear transport (Dusaban and Brown 2015). This figure provides an easy snapshot of the activation of various pathways and their roles at the cellular level.

Fig. 1.7 Signalling network in non-small cell lung cancer (NSCLC) generated using the Cytoscape tool. The hub node proteins SOS1, HRAS, KRAS, BRAF and MAPK play a crucial role in the NSCLC pathway. The edges represent the interactions between the proteins

NF-κB Pathway
Nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB) is a protein complex that controls DNA transcription, cytokine production, cell differentiation and cell survival. It also plays a vital role in the regulation of the immune response to infection. NF-κB consists of a group of transcription factors that regulate inflammatory responses and hence protect cells from death due to cellular stress (Hayden et al. 2006). In addition, NF-κB regulates programmed cell death via anti-apoptotic signals. The proteins expressed by NF-κB increase the expression of cellular genes such as those for chemokines and cytokines (including interleukin 1β and tumour necrosis factor (TNF)), the major histocompatibility complex (MHC) and receptors involved in neutrophil migration and adhesion. NF-κB is also involved in the expression of genes for cell proliferation and apoptosis. cIAP1, cIAP2 and XIAP are among the proteins expressed by NF-κB. TRAF1 and TRAF2 are TNF receptor-associated factors involved in the inhibition of apoptosis. NF-κB transcription factors consist of members of the Rel protein family, which form both hetero-dimers and homo-dimers. NF-κB activation is controlled by two different pathways, namely, canonical and non-canonical (Shih et al. 2011). In canonical pathway activation, IKK (IκB kinase)-mediated phosphorylation takes place, followed by ubiquitination and finally IκB degradation, which results in the translocation of transcription factors to the nucleus and gene activation. The non-canonical pathway is independent of IκBα degradation and mainly involves the activation of NIK and regulation of the p100 NF-κB subunit. The Boolean network approach was applied to the NF-κB pathway to study the changes in the pathway due to the ageing process (Kang et al. 2011). The Boolean approach aided in studying the genes that are ON or OFF in two different cases, namely, young people and aged people. It was concluded that in the aged phenotype, genes such as TRAF5, IRAK1, CARD10 and PLCγ2 are constantly OFF, in contrast to the young phenotype (Schwab et al. 2017). Hence, the Boolean approach helped in studying the gene expression data of NF-κB.
An ordinary differential equation approach was also applied to the NF-κB pathway to study the reaction kinetics, focusing mainly on IκBα association and dissociation rates. The ODE approach has also aided in describing how the concentrations of nuclear and cytoplasmic NF-κB change with time. ODEs also helped in studying the kinetics of IKK, NF-κB, IκBα and the IKK inhibitor A20 present in the cytoplasmic and nuclear compartments of cells (Fumiã and Martins 2013). Overall, the ODE approach has been widely exploited to study various aspects of the NF-κB pathway.

RAS Pathway
Ras is the founding member of the RAS superfamily of proteins. Ras is expressed in nearly all cells and organs. The Ras protein belongs to the small GTPases, which are mainly associated with cellular signal transduction (Alanazi 2014). There are three distinct RAS genes, RAS-N, RAS-H and RAS-K, which act as intracellular switches and play an important role in the signal transduction pathway controlling cell growth and differentiation. These three genes are closely related and are found to be activated in human tumours by point mutations. Several studies have concluded that KRAS is expressed in all cell types (Fernández-Medarde and Santos 2011). When the Ras pathway is switched ON, it switches on the proteins for cell growth, differentiation and survival. Mutation in Ras can produce permanently activated Ras proteins, which lead to cancer through continuous cell growth and proliferation (Zenonos 2013).
Activation of the RAS pathway takes place through the binding of a ligand, such as a growth hormone or cytokine, to the RTK. This leads to dimerization of the receptors and autophosphorylation of selective tyrosine residues in the cytoplasmic domain of the receptor. These residues act as binding sites for various molecules such as growth factor receptor-bound protein 2 (GRB2). The guanine nucleotide exchange factor (GEF), especially the Son of Sevenless (SOS) protein, binds to the SH3 domain of the adaptor molecules (Lake et al. 2016). The formation of the GEF/SOS complex is the most essential part of RAS activation. GRB2 helps in the interaction of SOS with Ras-GDP. Finally, the Ras protein binds GTP with the release of GDP and activates the downstream processes (Badawi et al. 2016).
Various pathways contribute to cancer cell development in the metabolic system. The Ras/mitogen-activated protein kinase, RASS/NOREA and PI3K/Akt pathways are major signalling networks linking EGFR activation to cell proliferation and survival (Yuen et al. 2012) (Fig. 1.9). The Ras protein functions as a molecular switch that relays signalling events from the cell surface to the nucleus, regulating cell growth and differentiation. Ras switches between a guanosine triphosphate (GTP)-bound active form and a guanosine diphosphate (GDP)-bound inactive form. The active form of Ras binds to c-Raf-1, A-Raf, B-Raf, PI-3 kinase and RalGEF/RalGDS targets. Mutations generally occur in the active form of Ras, thereby resulting in abnormal cell growth and defective signalling mechanisms (Lee et al. 2016).
The protein Raf is an immediate downstream target of Ras in the MAP kinase pathway. Activation of Raf by Ras occurs partly or entirely through recruitment of Raf to the cellular membrane by farnesylated Ras. Evidence also exists for an additional allosteric mechanism, whereby Ras binding induces conformational changes in Raf that promote activation (Chong et al. 2003). MAPK pathways are evolutionarily conserved kinase modules that link extracellular signals to the machinery that controls fundamental cellular processes such as growth, proliferation, differentiation, migration and apoptosis. MAPK pathways are composed of three-tier modules in which a MAPK is activated upon phosphorylation by a mitogen-activated protein kinase kinase (MAPKK), which in turn is activated when phosphorylated by a MAPKKK (Zlobin et al. 2019). Mutations of RET, Ras and BRAF are mutually exclusive in thyroid papillary cancer and lung cancer. These results indicate that simultaneous mutations of multiple genes in the same signalling pathway are not required for NSCLC pathogenesis, but a single mutation in any of the genes may suffice (Halliday et al. 2019).
Activation of the key oncogene K-Ras alters the molecular mechanisms in the cell. A point mutation in the K-Ras protein leads to inactivation of GTPase activity and of RASSF1. This leads to the tumour suppressor RASSF1 forming an interaction with NORE1, a Ras effector. The RASSF1/NORE1 interaction causes a shift in Ras activity, resulting in cell proliferation. Mutation of K-Ras also activates the PI3K-Akt signalling pathway. PI3Ks are heterodimeric lipid kinases composed of catalytic and regulatory subunits (Vara et al. 2004). The regulatory subunit p85α is the only PI3K molecule which has somatic mutations in human cancers. The mutations are seen to occur predominantly in the helical or kinase domains of its catalytic subunit encoded by the phosphoinositide-3-kinase, catalytic, alpha polypeptide (PIK3CA) gene. Mutations of PIK3CA occur in many human cancers, making PIK3CA one of the two most commonly mutated oncogenes (along with KRAS) identified in human cancers (Fumarola et al. 2014). Interaction of K-Ras with phosphatidylinositol-4,5-bisphosphate 3-kinase leads to blocking of caspase activity, causing a shutdown of apoptosis in the cells (Tripathi et al. 2017). Finally, the K-Ras protein activates the Raf protein in the MAPK signalling pathway. The activation of MAPK1_3 in the mitogen-activated protein kinase (MAPK) cascade hampers various cellular functions and the proliferation of the cell. Molecular dynamics studies have been performed on the roles of individual proteins in the signalling pathways.
A molecular dynamics simulation-based approach, referred to as interaction correlation analysis, was applied to the PDZ2 domain to identify the possible signal transduction pathways (Kong and Karplus 2009). A residue correlation matrix was constructed from the interaction energy correlation between all residue pairs obtained from the molecular dynamics simulations. In another simulation study, both complexes examined were stable, but several rearrangements occurred in the Ras:RBD (Ras binding domain) simulations: the RBD loop 100-109 moved closer to Ras, Arg73 in the RBD moved towards Ras to form a salt bridge with Ras-Asp33, and Loop 4 of the Ras switch II region shifted upwards towards the RBD (Zeng et al. 1999). The signal transducer and activator of transcription 3 (STAT3) is a critical transducer of interleukin-6 (IL-6) signalling, and the mobility and dynamics of the STAT3 complex during IL-6 signalling have been investigated in living cells. The number of STAT3 molecules at the cytoplasmic membrane and in the cytoplasm decreased after IL-6 stimulation. In the nucleus, the diffusion speed of the STAT3 complex strongly decreased after IL-6 stimulation (Watanabe et al. 2004). Another study elucidates the crucial structural features of SG2NA proteins which are involved in various protein-protein interactions and reveals the extent of disorder present in the SG2NA structure, which is crucial for extensive interactions and multimeric protein complexes. The study also underscores the role of computational approaches in the preliminary examination of unknown proteins in the absence of experimental information (Soni et al. 2014).
The above studies describe the dynamics of an individual protein-protein complex in a pathway. No work currently describes large-scale dynamic simulation of the protein-protein complexes belonging to a signalling pathway in cancer. Also, there is no present study in which atomistic energies between interacting residues are calculated to determine the contact matrix in signalling protein complexes for a given trajectory. This work aims to determine the plausible interacting residues based on the contact maps generated for protein complexes.

Molecular Dynamics Approach in NSCLC Pathway
In the current study, a molecular dynamics simulation-based approach, referred to as interaction correlation analysis, is applied to the protein targets in NSCLC pathways. A residue correlation matrix is constructed from the interaction energy correlation between all residue pairs obtained from the molecular dynamics simulations. Importantly, these correlations reveal the energetic origin of the long-range coupling (Fig. 1.10). A conformational analysis was carried out for all the complexes. The starting structure of the K-Ras-PI3K complex (PDB ID: 1HE8) was retrieved from the RCSB Protein Data Bank. The PI3K and PIP3 complex (PDB ID: 1HE8) belonged to the PI3K-Akt pathway. The RASS-NOREA pathway protein complexes were K-Ras/RASSF5 (PDB ID: 3DDC) and RASSF5/MST2 (PDB ID: 4LGD). The MAPK pathway proteins were the SOS/RAS complex (PDB ID: 1BKD), RAS/Raf (PDB ID: 3KUD), Raf/MEK1 (PDB ID: 4MNE) and the MEK1/ERK complex (PDB ID: 4IC7). The resulting structures were then optimized by conjugate gradient minimization using the OPLS (Optimized Potential for Liquid Simulations) force field. A multiple sequence analysis was performed between the 3DDC, 1HE8 and 4G0N complexes. The sequences displayed maximum identity, with exceptions at the 12th residue, where glycine is mutated into valine, and at the 30th and 31st residues, where glutamate and valine are mutated in 3DDC (Fig. 1.11). Residues 29-38 were identified as the effector domain. The effector domains are seen to regulate a variety of signal transduction pathways.
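The construction of a correlation matrix from interaction energy time series can be sketched as follows. This is a minimal illustration that assumes the Pearson correlation coefficient as the correlation measure and takes each row of the input as the interaction energy of one residue pair sampled along the trajectory; the class and method names are hypothetical and the sketch is not the analysis code used in the study.

```java
/** Minimal sketch: Pearson correlation matrix between interaction energy time series. */
final class InteractionCorrelation {

    /** Pearson correlation of two equally long series; returns 0 if either series is constant. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int t = 0; t < n; t++) {
            meanX += x[t];
            meanY += y[t];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int t = 0; t < n; t++) {
            double dx = x[t] - meanX;
            double dy = y[t] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        double denom = Math.sqrt(sxx * syy);
        return denom == 0.0 ? 0.0 : sxy / denom;
    }

    /** series[k][t] is the interaction energy of residue pair k at trajectory frame t. */
    static double[][] correlationMatrix(double[][] series) {
        int k = series.length;
        double[][] corr = new double[k][k];
        for (int a = 0; a < k; a++) {
            for (int b = a; b < k; b++) {
                double r = pearson(series[a], series[b]);
                corr[a][b] = r;
                corr[b][a] = r; // the matrix is symmetric
            }
        }
        return corr;
    }

    public static void main(String[] args) {
        // Hypothetical energies (arbitrary units) for three residue pairs over four frames.
        double[][] energies = {{1, 2, 3, 4}, {2, 4, 6, 8}, {4, 3, 2, 1}};
        double[][] corr = correlationMatrix(energies);
        System.out.printf("r(0,1)=%.2f r(0,2)=%.2f%n", corr[0][1], corr[0][2]);
    }
}
```

Strongly positive or negative off-diagonal entries flag residue pairs whose interaction energies fluctuate together, which is the signal used to trace long-range coupling.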
The contact map is generated for the entire trajectories derived using the molecular dynamics tool GROMOS. The H-bond distances of amino acid residues such as Glu31, Arg67 and Ile36 decreased during the simulation. Further analysis revealed that these residues were common to the Ras-Raf and Ras-RASSF5 pathways, indicating their involvement in signalling. These findings provide guidelines for the design of potential inhibitors targeting the disease pathway. The study was successful in identifying residues in the interface region and in analysing the protein conformational changes during the course of dynamic simulation of the proteins involved in the NSCLC pathway.
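A contact map over a trajectory can be sketched as below. The sketch assumes one representative coordinate per residue (for example the Cα atom) and a distance cutoff supplied by the caller; both choices, as well as the class and method names, are assumptions made for illustration and do not reproduce the energy-based contact matrix of the study.

```java
/** Minimal sketch: distance-based contact map between two chains over a trajectory. */
final class ContactMap {

    /** map[i][j] is true when residue i of chain A lies within the cutoff of residue j of chain B. */
    static boolean[][] contacts(double[][] chainA, double[][] chainB, double cutoff) {
        boolean[][] map = new boolean[chainA.length][chainB.length];
        double cutoffSq = cutoff * cutoff;
        for (int i = 0; i < chainA.length; i++) {
            for (int j = 0; j < chainB.length; j++) {
                double dx = chainA[i][0] - chainB[j][0];
                double dy = chainA[i][1] - chainB[j][1];
                double dz = chainA[i][2] - chainB[j][2];
                map[i][j] = dx * dx + dy * dy + dz * dz <= cutoffSq;
            }
        }
        return map;
    }

    /** Fraction of frames in which each residue pair is in contact. */
    static double[][] contactFrequency(double[][][] framesA, double[][][] framesB, double cutoff) {
        int nA = framesA[0].length;
        int nB = framesB[0].length;
        double[][] freq = new double[nA][nB];
        for (int f = 0; f < framesA.length; f++) {
            boolean[][] map = contacts(framesA[f], framesB[f], cutoff);
            for (int i = 0; i < nA; i++) {
                for (int j = 0; j < nB; j++) {
                    if (map[i][j]) {
                        freq[i][j] += 1.0 / framesA.length;
                    }
                }
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        // Hypothetical coordinates in angstroms: two residues in chain A, one in chain B, two frames.
        double[][] a = {{0, 0, 0}, {10, 0, 0}};
        double[][][] framesA = {a, a};
        double[][][] framesB = {{{3, 4, 0}}, {{20, 0, 0}}};
        double[][] freq = contactFrequency(framesA, framesB, 5.0);
        System.out.println("contact frequency A0-B0: " + freq[0][0]);
    }
}
```

Residue pairs with a high contact frequency across the trajectory are natural candidates for the interface residues discussed above.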

Conclusion
MAPK, NF-κB and RAS pathways are the most studied pathways in the field of cancer biology. These pathways are activated by different external factors and thus lead to downstream signal transduction and the release of proteins, and these proteins are targeted by drugs designed for cancer. The MAPK pathway leads to cell growth, survival and proliferation, the NF-κB signalling pathway leads to ageing of cells, and the RAS pathway is also involved in cell proliferation and apoptosis. Mutations in these pathways may lead to uncontrolled cell growth and cancer. Hence, these pathways are considered to be among the most crucial pathways in cancer research. Our molecular dynamics simulation studies for NSCLC revealed that, apart from network and mathematical modelling approaches, a molecular dynamics approach can also help to reveal the energetics and contact information of the amino acid residues at the interface region of a protein complex.

What Is Bioinformatics?
Bioinformatics is an interdisciplinary field of information science that has been applied to molecular biology to produce and organize large amounts of sequence information. This sequence information consists of protein structure and genetic data (DNA and RNA). This information plays a vital role in decisions about having children, in finding the cause of a disease, in identifying inherited disorders, and so on (Pocock et al. 2000). Bioinformatics software tools help to perform operations on biological data such as sequencing DNA, representing gene data in computational form, predicting and modelling protein structure, analysing genome data and studying protein dynamics.

Application of Java in Bioinformatics
Java is a high-level, object-oriented, flexible, interpreted and platform-independent language that is also becoming popular for scientific computing. Java is free to download and can be easily extended with modules written in C or C++ (Pocock et al. 2000).
It offers a large set of utility functions that reduce the number of lines of code and speed up implementation compared with other languages. Other important features that enhance its utility include:
1. It is a platform-independent language, which means write once and run anywhere. This allows the developer to deploy the compiled file on any platform without further compilation.
2. Java supports reusability by offering thousands of APIs that enhance the efficiency of code.
3. It supports the common OOP concepts, including polymorphism, inheritance, abstraction and encapsulation.
4. It is a secure language, which is why it plays a vital role in enterprise applications.
5. Java supports parallel processing, which enables programmers to use multicore systems to make their applications run faster by using multiple processors at the same time.
6. Java has also adopted best practices in software engineering, including unit testing, continuous integration and code review.
7. It is more interactive than C++, providing GUI components that enable the development of scientific bioinformatics software (Guzzi 2019).
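The parallel processing feature mentioned above can be illustrated with the standard Java streams API. The following is a minimal sketch that computes the GC content of several sequences in parallel; the class name and the example sequences are hypothetical and no BioJava classes are involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Minimal sketch: computing GC content for many sequences with a parallel stream. */
final class ParallelGcContent {

    /** Fraction of G and C bases in a sequence; 0 for an empty sequence. */
    static double gcContent(String seq) {
        if (seq.isEmpty()) {
            return 0.0;
        }
        long gc = seq.chars()
                .filter(ch -> ch == 'G' || ch == 'C' || ch == 'g' || ch == 'c')
                .count();
        return (double) gc / seq.length();
    }

    /** Processes the sequences on the available cores; the result order matches the input order. */
    static List<Double> gcContentParallel(List<String> sequences) {
        return sequences.parallelStream()
                .map(ParallelGcContent::gcContent)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sequences = Arrays.asList("GGCC", "ATAT", "ATGC");
        System.out.println(gcContentParallel(sequences));
    }
}
```

For a handful of short sequences the parallel version brings no benefit; the approach pays off when many long sequences are processed independently of each other.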

Introduction to BioJava
Bioinformatics has enabled rapid advances in computational biology for sequence and structure comparison. BioJava was introduced in 2000 as an open-source project which is freely available at https://biojava.org/. It provides software modules for molecular biology to perform common bioinformatics routines. These modules are intended to provide Java APIs which can be easily applied and used without having to know how they are implemented (Lafita et al. 2019).

The BioJava Modules
BioJava provides several independent modules built using Maven. Maven is a software project management tool that manages a project's build, documentation and reporting, which makes BioJava easy to distribute. It enables rapid bioinformatics application development using the Java programming language. In the following sections, we describe a number of modules and highlight some of the new functionality included in the latest version of BioJava (Holland et al. 2008) (Fig. 2.1).

Core of BioJava
The core module provides classes and interfaces to work with nucleotide and protein sequences. It also supports parsing sequences from remote as well as local resources, conversion between file formats and translation from a gene sequence to a protein sequence (Lafita et al. 2019). BioJava 3 leverages newer features of Java. The core module is the base module that provides the common functionality required by the other modules to process data.
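The idea behind translating a gene sequence into a protein sequence can be sketched in plain Java using the standard genetic code. This sketch deliberately does not use the BioJava core API; the class and method names are hypothetical, unknown bases are rendered as X, stop codons as an asterisk, and an incomplete trailing codon is ignored.

```java
/** Minimal sketch: translating a DNA coding sequence with the standard genetic code. */
final class CodonTranslation {

    // Bases in the order used to index the codon table below.
    private static final String BASES = "TCAG";

    // Amino acids for the 64 codons TTT, TTC, TTA, TTG, TCT, ... GGG ('*' marks a stop codon).
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*W"
            + "LLLLPPPPHHQQRRRR"
            + "IIIMTTTTNNKKSSRR"
            + "VVVVAAAADDEEGGGG";

    /** Translates frame 1 of the given DNA sequence into one-letter amino acid codes. */
    static String translate(String dna) {
        String seq = dna.toUpperCase();
        StringBuilder protein = new StringBuilder();
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            int b1 = BASES.indexOf(seq.charAt(i));
            int b2 = BASES.indexOf(seq.charAt(i + 1));
            int b3 = BASES.indexOf(seq.charAt(i + 2));
            if (b1 < 0 || b2 < 0 || b3 < 0) {
                protein.append('X'); // codon contains a base other than A, C, G or T
            } else {
                protein.append(AMINO_ACIDS.charAt(16 * b1 + 4 * b2 + b3));
            }
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("ATGGCCTAA"));
    }
}
```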

Alignment Module
The alignment module contains the data structures and standard algorithms for pairwise and multiple sequence alignment.

Structure Module
The structure module provides data structures and algorithms to parse, compare and manipulate macromolecular structures in the required form. It also provides a GUI to view structures and structure alignments in Jmol.

ModFinder Module
The ModFinder module provides an API for identifying protein modifications in 3D protein structures (Gao et al. 2017).

Protein Disorder Module
The protein disorder module provides a way to detect disorder in protein molecules using a Java implementation of the RONN predictor. The BioJava API supports multithreading, which is reported to make it about 3.2 times faster than implementations in other languages such as C and C++. The module can be used in two ways, either by calling the library functions or by executing a command on the command line (Gao et al. 2017).

Web Service Access Module
Web-based tools are becoming increasingly popular in bioinformatics. Users can access bioinformatics web services using the REST protocol, which provides interoperability between computer systems on the Internet. Examples of bioinformatics web services are the BLAST URL API (QBlast) and HMMER.

The BioJava Packages
The following packages are available to perform operations in bioinformatics. These packages provide classes and interfaces to perform computational operations on molecular data (Lafita et al. 2019).

Sequence Matching
The package org.biojava.bio.search provides interfaces and classes to search sequence similarity, to implement filtering, to test value associated with specified key, and to find exact subsequence within a sequence (Lane and Brodley 1997).

Symbolic Representation for Sequence
The package org.biojava.bio.symbol provides the SymbolList facility for manipulating sequences, inserting and deleting gaps, translating symbols from one alphabet to another and encapsulating the mapping from source to destination alphabets, as well as a suffix tree implementation.

Biological Sequence Data
The package org.biojava.bio.seq.db defines a database of sequences with keys and iterators over all sequences, implementation for making ID for a sequence and interface for objects that allow retrieval of sequences by names.

Process and Produce Flat File of Sequences
The package org.biojava.bio.seq.io supports reading and writing arbitrary file formats and identifying sequence formats, alphabets, etc.

GUI Representation of the Sequences
The package org.biojava.bio.gui provides an interface LogoPainter to draw sequence logo in bars, rendering of the component so that same class can render data in different ways (Prasad 2015).

Sequence Database
The package org.biojava.bio.seq.db.biosql defines classes for a relational database schema for the storage of biological sequence data and annotation. An application can connect to the database by simply constructing an object of the class (Holland et al. 2008).

Input Output Utility
The package org.biojava.utils.io provides classes, interfaces, and methods for reading character streams, to read character into the character buffer, to skip character from character stream, and also to help to mark the present position in the stream.

Network Programming Utility
The package org.biojava.utils.net has interfaces and methods to obtain URL associated with an object.

To Manage and Generate XML Document
The package org.biojava.utils.xml is a utility package for generating and managing XML documents. It also provides a simple tool for creating Java objects from an XML document for configuring multiple applications.

To Generate HTML Reports from Blast Output
The package org.biojava.bio.program.blast2html is used to generate HTML from BLAST outputs.
3. Open JAR/Folder; this adds the attached .jar file to your project library so that the interfaces and classes of that package can be accessed (Fig. 2.4b).

Design and Implementation
Example 1: Demonstrate Using Pairwise Alignment of Alignment Module
Pairwise sequence alignment was performed for two sequences: putative oral cancer suppressor gene, intron 1, partial sequence, segment 2/2 (GenBank ID: BE628572.1) and Homo sapiens DnaJ gene, complete cds (GenBank ID: KU178862.1). BE628572.1 was taken as the query sequence and KU178862.1 as the template sequence. The source code illustrates local and global pairwise alignment of the two sequences using a substitution scoring matrix. The alignment gave a score of 1232 with the global method and 1340 with the local method. The maximum and minimum gap penalty scores were 5 and 2, respectively. Thus the code successfully produced the output for the two sequences.
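The dynamic programming that underlies global and local pairwise alignment can be sketched in plain Java as follows. This is not the BioJava alignment module and not the code used for the result above: it assumes a simple match/mismatch score and a linear gap penalty instead of a substitution matrix with separate gap penalties, all names are hypothetical, and it will not reproduce the reported scores.

```java
/** Minimal sketch: global (Needleman-Wunsch) and local (Smith-Waterman) alignment scores. */
final class PairwiseAlignmentSketch {

    /** Global alignment score; gap is the (negative) score added per gap position. */
    static int globalScore(String a, String b, int match, int mismatch, int gap) {
        int[][] h = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            h[i][0] = i * gap; // prefix of a aligned against gaps only
        }
        for (int j = 1; j <= b.length(); j++) {
            h[0][j] = j * gap; // prefix of b aligned against gaps only
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int diag = h[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                h[i][j] = Math.max(diag, Math.max(h[i - 1][j] + gap, h[i][j - 1] + gap));
            }
        }
        return h[a.length()][b.length()];
    }

    /** Local alignment score: cells are floored at zero and the best cell is returned. */
    static int localScore(String a, String b, int match, int mismatch, int gap) {
        int[][] h = new int[a.length() + 1][b.length() + 1];
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int diag = h[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int cell = Math.max(diag, Math.max(h[i - 1][j] + gap, h[i][j - 1] + gap));
                h[i][j] = Math.max(0, cell);
                best = Math.max(best, h[i][j]);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical short sequences and scoring values, for illustration only.
        System.out.println("global: " + globalScore("ACGT", "AGT", 2, -1, -2));
        System.out.println("local:  " + localScore("TTACGTT", "GGACGGG", 2, -1, -2));
    }
}
```

Because the local method may discard poorly matching flanks, its score is never lower than the global score for the same sequences and scoring scheme, which is consistent with the higher local score reported in this example.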

Example 3: Demonstrate Jmol Molecular Viewer Using Structure Module
The code below illustrates an application for visualizing the tertiary structure of the protein complex of 3-isopropylmalate dehydrogenase from Thiobacillus ferrooxidans with 3-isopropylmalate (PDB ID: 1A05) using the Jmol molecular viewer. It renders the structure of a protein in different styles such as cartoon, wireframe, CPK, backbone and ball and stick. In addition, various characteristics of the protein, such as secondary structural elements, protein chains, hydrophobicity, amino acids and elements, can be highlighted using different colour modes (Fig. 2.6).

Exception Handling in BioJava
An exception is an unexpected event that occurs during the execution of source code, i.e. at run time, which is why an exception is also called a run-time error. It disrupts the normal flow of execution. Java helps the programmer to identify and handle an exception by enclosing the code that may throw it in a try-catch block (Ebert et al. 2015). The programmer may face common problems such as NoClassDefFoundError or ClassNotFoundException; these occur when a class is loaded at run time but cannot be found on the classpath among the supplied JAR files (Rahmani et al. 2012).
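A try-catch block that guards against a missing class can be sketched as follows. The helper class and method names are hypothetical; the BioJava class name used in the example is passed only as a string, so the sketch compiles and runs whether or not the BioJava JAR files are on the classpath.

```java
/** Minimal sketch: handling a missing class with a try-catch block. */
final class ClassAvailability {

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            // The class, or a class it depends on, is missing from the classpath.
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.biojava.nbio.core.sequence.DNASequence";
        if (isOnClasspath(name)) {
            System.out.println(name + " is available");
        } else {
            System.out.println(name + " was not found; check that the JAR files are on the classpath");
        }
    }
}
```

Checking once at start-up and reporting a clear message is usually more helpful to the user than letting the error surface later in the middle of an analysis.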

How to Contribute in BioJava Open-Source Project?
The following steps give a step-by-step guide to contributing user-defined packages to an existing open-source project. A package may contain any number of classes, interfaces and methods that can be readily used by its members.
1. First identify a need in an existing open-source project.
2. Analyse the skills required to implement the project.
3. Create your own open-source project with the functionality you would like to build which existing projects do not offer.
4. BioJava is hosted on GitHub, which offers developers publicly accessible APIs, a UI and a Git repository that allow them to contribute new functionality or even a whole package. Developers can fork existing projects, add or update source code, fix or report bugs and send pull requests on GitHub.
5. You can reach the GitHub projects by following the "Trending" link https://github.com/trending/developers (Dabbish et al. 2012).

Conclusions
The BioJava library provides powerful APIs for analysing genetic data. It is a mature tool which can be used to develop bioinformatics applications in different research areas. Since it is a pure Java-based tool, the applications developed with it are platform independent, scalable and distributed (Rahmani et al. 2012).
BioJava is an open-source project that anyone can easily access and to which anyone can contribute new packages. BioJava version 5 is currently under development and will provide a full-fledged tool with a Maven repository to facilitate rapid application development in the field of bioinformatics.

Abstract
Attention deficit hyperactivity disorder (ADHD) is a neurodevelopmental disorder that affects the social and personal traits of children between the ages of 2 and 18, and the symptoms include inattentiveness and hyperactivity/impulsivity. Though this disorder is identified in childhood, it may persist into the teenage years in a few cases. ADHD is diagnosed on the basis of various rating scales that have been developed by experts. Additionally, MRI patterns are used to study the anatomical and functional features of the ADHD brain and the effect of medication. This chapter focuses on various machine learning models developed for accurate prediction of this disorder. The majority of machine learning studies were based on creating classification models, of which SVM and ANN have been shown to give the most accurate diagnosis. A better predictive model with good correlation coefficient (CC) values, specificity and sensitivity has been generated with a genetic programming-based algorithm. Numerous other relevant examples are also cited in this chapter. The contents of the chapter will help researchers to understand various techniques of ADHD prediction in order to provide better treatment for children who are suffering from similar neurodevelopmental disorders.

Keywords
Attention deficit hyperactivity disorder · Genetic programming · Machine learning · Predictive model

Attention Deficit Hyperactivity Disorder (ADHD)
Attention deficit hyperactivity disorder (ADHD) was first described as an "abnormal defect of moral control in children" by a British paediatrician, Sir George Still, in 1902. He observed that the children were intelligent but exhibited some uncontrollable behaviour (Lange et al. 2010). Earlier it was considered to be a result of poor parenting (Barkley 2015). Over the years, however, ADHD was recognised as a mental disorder and was listed in the Diagnostic and Statistical Manual of Mental Disorders (DSM), first released by the American Psychiatric Association (APA) in 1952. In the revised version, DSM-IV-TR, released in 2000, ADHD was clearly defined as "Persistent pattern of inattention and/or hyperactivity-impulsivity that is more frequently displayed and is more severe than is typically observed in individuals at comparable level of development" (American Psychiatric 2013). ADHD has been categorised into three subtypes:
• Predominantly hyperactive-impulsive
• Predominantly inattentive
• Combined hyperactive and inattentive (mixed type)
As per the statistical studies by the Centers for Disease Control and Prevention (CDC) in 2016, ADHD has been identified in children between 2 and 17 years of age (Division of Human Development and Disability 2018). Rarely, the disorder persists into adulthood. As per regional statistics, North America has the highest prevalence of ADHD medication use (4.48%), while Asia and Australia have the lowest (0.95%), with Europe in between (Raman et al. 2018).

Symptoms and Causes of ADHD
The most common traits of ADHD are inattentiveness, impulsivity and hyperactivity. Inattentiveness refers to the inability of the child to focus on his/her play activities and studies. Children with this disorder mostly seem restless and do not pay attention to any educational or play activity. They often get distracted by extraneous stimuli and have difficulty in organising tasks. Their hyperactive nature often drives them to intrude on others, as they have difficulty in waiting for their turn.
Earlier, it was believed that children with this disorder had a behavioural problem, and the blame was put on parents for not raising their children with good discipline. Later on, studies showed that foetal exposure to various environmental and genetic factors can lead to the birth of a child with behavioural and developmental anomalies (Fig. 3.1). Prenatal exposure to alcohol and tobacco used during pregnancy is a likely cause of this disorder. A statistical probabilistic study (Bouchard et al. 2010) suggested that the use of pesticides can cause higher urinary levels of organophosphate in pregnant women, which is another cause of the birth of a child with ADHD. Another study suggests that intake of food additives such as artificial food colours, preservatives and sugar at a young age causes hyperactivity in children. Exposure to lead, which is a neurotoxin, affects the development of brain tissue in children and in turn affects their behaviour at a young age.
The frontal lobe of the brain controls emotions, mood and impulses in humans. Children with a head injury in this area also show the symptoms of ADHD. Recent studies show that ADHD is one of the major heritable psychiatric disorders (Hart et al. 2014), as genetic factors strongly influence its prevalence. A child is more likely to have this disorder if a relative in the family has suffered from it.
An increase in the level of the neurotransmitter norepinephrine is found to be the main reason for the hyperactive, impulsive and inattentive behaviour (Fig. 3.2). This neurotransmitter is synthesised from dopamine, which regulates the emotional responses and movement of a person. Dopamine transporter and receptor genes are closely associated with ADHD. High amounts of dopamine can also cause various psychotic issues. Hence, maintaining proper levels of dopamine can control most neural disorders, including ADHD. A few studies carried out by scientists at Cardiff University in Wales identified duplicated or missing segments of DNA in children with ADHD.

Diagnosis and Prediction of ADHD
ADHD is identified if the child repeatedly shows any of the symptoms over a period of time, say 6 months. In most cases, the disorder is identified at school age, according to the child's social development and academic skills. Various questionnaire-based rating scales are associated with the diagnosis of ADHD. Other methods of diagnosing ADHD experimentally include capturing and analysing electroencephalogram (EEG) data, structural magnetic resonance imaging (sMRI) data and functional magnetic resonance imaging (fMRI) data, as well as computer-aided diagnosis (Fig. 3.3).

Rating Scales Associated with ADHD Diagnosis
The behavioural ratings of the child can be obtained from a series of interviews with the child and from the questionnaires filled in by both parents and teachers (Green et al. 1999). However, such ratings can overstate the condition, as respondents tend to over-report each minor symptom. In the majority of cases, a higher correlation is found between teacher and parent ratings of the children (Sims and Lonigan 2012). Rating scales are the critical assessment tools in the diagnosis of ADHD (Gomez et al. 2016; Kubo et al. 2018). The top three in the given summary list (Table 3.1) are commonly employed for diagnosis. The International Classification of Diseases (ICD) and the Diagnostic and Statistical Manual of Mental Disorders (DSM) have listed ADHD as a mental disorder and serve as references for clinicians and researchers in this area. A child's social behaviour depends on his/her ability to learn things and interact with the environment. Hence, measuring the intelligence quotient (IQ) also plays an important role in ADHD diagnosis (Katusic et al. 2011). Wechsler Intelligence Test (Wechsler 1991) is the most common IQ test in which an individual's verbal, reasoning, memory and working (Katusic et al. 2011).
Based on the DSM-V criteria for hyperactive/impulsive and inattentive symptoms, the Vanderbilt Assessment Scales were developed by the National Institute for Children's Health Quality. They include separate questionnaires for teachers and parents to assess and accurately predict the ADHD condition prevailing in children and thus serve as one of the best tools for clinicians to proceed with treatment (DuPaul et al. 2016). ADHD rating scales (ADHD-RS), based on the DSM-IV criteria, consist of 18 questions, 9 on symptoms of inattentiveness and 9 on hyperactivity/impulsivity. They are used to assess the common impairments in children with ADHD, including self-esteem, peer relationships, behavioural functioning and academic performance. Conners' rating scale (CRS) (Conners et al. 2011) is used in assessing any comorbid disorders in children and youth between 6 and 18 years. This scale is based on questions regarding their emotional, behavioural and academic performance. It has separate forms to be filled in by parents and teachers, as well as a self-report form (Farré-Riba and Narbonne 1997). The reports are easy to interpret and serve as a reliable tool for clinicians, psychiatrists, paediatricians, mental health workers, etc. (Kao and Thomas 2010).

Diagnostics Based on Functional MRI Data
Structural MRI (sMRI) and functional MRI (fMRI) data help to understand brain connectivity and the development of disease. Resting-state fMRI identifies the intrinsic activity patterns related to any region of the brain (Wang 2017). Functional magnetic resonance imaging (fMRI) is used in studying brain metabolism by measuring the cerebral blood oxygen level (Glover 2011). Subjects with ADHD lack inhibitory control, which is the main focus of task-based fMRI studies. Many studies on fMRI image processing and feature analysis converge on the conclusion that the activity and connectivity of the ADHD brain differ from those of the normally developed brain (Liang et al. 2012; Akdeniz 2017). In a meta-analysis of task-based fMRI studies, several frontal regions, the right superior temporal gyrus, the left inferior occipital gyrus, the right thalamus and the midbrain showed hypoactivity (decreased blood flow), while hyperactivity (increased blood flow) was observed in the right angular and middle occipital gyri (Cortese et al. 2012). Principal component analysis and independent component analysis are the most widely used techniques for feature selection in fMRI analysis. The surface area, grey matter volume, cortical folding index, cortical curvature index, average cortical thickness, etc. of each brain region in a scan are measured and analysed for disease identification and classification. MRI analysis also helps in studying the effect of drugs in therapeutics (Borsook et al. 2012; Wise and Tracey 2006). The effects of various stimulant drugs for ADHD, their administration in different dosages and their positive or negative effects can also be thoroughly studied with the help of these imaging techniques (Weyandt et al. 2013).
Thus, the fMRI analysis has a higher importance in ADHD diagnosis and treatment than sMRI.

Computer-Aided Disease Prediction
In this century, healthcare industries are carrying out continuous research on the improvement of disease diagnosis and treatment options. Various software tools enable researchers to collect and analyse huge amounts of historical and real-time data. Big data analytics and machine learning techniques have revolutionised the growth of the healthcare industry (L'Heureux et al. 2017). Using big data analysis, patient data can be collected from the molecular level up to the patient and population levels (Herland et al. 2014; Sheeran and Steele 2017). Data from various sources are combined and analysed by algorithms based on artificial intelligence and machine learning. As such analysis is critical in improving the diagnosis and treatment of diseases, machine learning methods have made decision-making with large amounts of data easier (Chen et al. 2017). Machine learning uses computer systems to make decisions using statistical analysis and optimisation techniques. It allows a computer to be trained by learning from past experience in order to make predictions and classify data. Machine learning focuses on improving the quality and consistency of care (Frandsen 2016). Various machine learning models have been developed by experts to analyse data quickly, and they prove to be more accurate and reliable in decision-making (Kononenko 2001).

Overview of Various Machine Learning Methods in Predictive Analysis
In machine learning, a subset of the data points in the dataset is used for training the model, the model is then tested with another set of data, and finally the model is validated, mostly by x-fold cross validation or leave-one-out cross validation. The features contributing to classification/prediction are extracted from the training dataset, and the model is trained by a particular machine learning algorithm (Fig. 3.4). Only a brief introduction to machine learning is provided here. For a detailed understanding, the comprehensive review by Vyas and group can be consulted (Vyas et al. 2015). Machine learning algorithms generally fall into two domains, supervised learning and unsupervised learning, based on the learning technique/model (Fig. 3.5). In supervised learning, labelled inputs are used to make decisions on the output, whereas in unsupervised learning there are no labelled data. Unsupervised methods usually rely on properties of the given data and are mostly used in analysis, not in prediction (Qiu et al. 2016). Predictive analysis is done by creating machine learning models, which can be either classification- or regression-based. A classification model predicts the category of data according to a selected number of parameters or features, whereas a regression model establishes a relationship between the selected parameters that in turn predicts future data (Sagar et al. 2017). The most commonly used classification models include decision trees, Bayesian classification, artificial neural networks, support vector machines and classification based on association and clustering techniques. Regression models include linear and logistic regression and evolutionary algorithms such as genetic algorithms and genetic programming. Even though regression models are more complex than classification models, they can provide accurate predictions with low error.
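The fold-based and leave-one-out validation schemes mentioned above can be sketched in a few lines of Python. This is only a minimal illustration, assuming contiguous folds without shuffling; the function names `k_fold_indices` and `leave_one_out_indices` are hypothetical and are not taken from any of the cited studies.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering range(n_samples)."""
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        splits.append((train, test))
        start = stop
    return splits


def leave_one_out_indices(n_samples):
    """Leave-one-out is the special case in which every fold holds one sample."""
    return k_fold_indices(n_samples, n_samples)


# Each sample appears in exactly one test fold and in the training set otherwise.
for train, test in k_fold_indices(6, 3):
    print("train:", train, "test:", test)
```

In practice the samples are usually shuffled before splitting, and the model is retrained on each training split and scored on the matching test split.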
Neurological disorders range from headache to chronic issues like Alzheimer's, stroke, tumours, etc. Nowadays, a large set of medical data regarding these, available from various hospitals and research institutes, can be analysed with the aid of computers for detection and treatment of the abnormalities (Siuly and Zhang 2016). Disease diagnosis can be done using symptoms, instrumental data or gene/molecular-level data analysis. Machine learning methods can be used to identify/predict the biomarker responsible for a particular disorder.
The comparative studies on the application of various machine learning models in the diagnosis of each disease show that the prediction accuracy differs from model to model (Fatima and Pasha 2017; Jain 2015) (Table 3.2). Decision trees are one of the most popular classifiers, which consist of a root node followed by various branches having labelled values (Rokach and Maimon 2005). Each branch meets at
Support vector machine (SVM) uses a kernel function to split the dataset into different classes. The kernel is a mathematical function, often non-linear, which defines the classification criteria. In linear classification, a separating hyperplane is chosen that maximises the margin, i.e. the distance between the hyperplane and the nearest data points of each class (Shmilovici 2005) (Fig. 3.7). The data points lying closest to the hyperplane, on the margin, are called support vectors. Being a fast and efficient machine learning model for pattern recognition and classification, SVM has obtained better classification accuracy in diagnosing breast cancer, diabetes from tongue images (Zhang et al. 2017), fatty liver disease (Wu et al. 2019), chronic kidney disease (Polat et al. 2017), heart disease, chest disease (Yahyaoui and Yumuşak 2018), skin disease (Parikh and Shah 2016) and many others. Machine learning algorithms have proved to be a strong tool for feature selection in neurodegenerative diseases like Alzheimer's and Parkinson's disease. A comparative study of six algorithms in validating the feature selection method for these two diseases concluded that SVM gives better accuracy in optimal feature selection (Tejeswinee et al. 2017).
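The geometry behind the linear SVM described above can be illustrated with a short Python sketch. It does not train an SVM; it only evaluates a hand-picked hyperplane on hypothetical two-dimensional toy data, to show how the decision value, the distance to the hyperplane and the margin relate to each other. All names and numbers are assumptions made for illustration.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def distance_to_hyperplane(w, b, x):
    """Geometric distance from point x to the hyperplane w.x + b = 0."""
    return abs(decision_value(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))


def margin(w, b, points):
    """Smallest distance of any point to the hyperplane.

    The points that attain this minimum play the role of support vectors.
    """
    return min(distance_to_hyperplane(w, b, p) for p in points)


# Hypothetical toy data and a hand-picked (not trained) hyperplane x1 + x2 - 3 = 0.
w, b = [1.0, 1.0], -3.0
positives = [[3.0, 2.0], [4.0, 3.0]]
negatives = [[1.0, 0.0], [0.0, 1.0]]

print(margin(w, b, positives + negatives))
```

An SVM training procedure would search over `w` and `b` for the hyperplane that makes this margin as large as possible while keeping the two classes on opposite sides.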
Computational models such as the artificial neural network (ANN) are closely analogous to the connections of the human brain (Fig. 3.8). They have made recent advancements. As a supervised learning algorithm, an ANN maps the input data to the desired output by passing them through various hidden layers. Owing to their ability to learn linear and non-linear relationships, ANNs are considered one of the most powerful computational tools in various disease predictions (Wahyunggoro et al. 2013). Multilayer perceptrons (MLP) and extreme learning machines (ELM) are feed-forward ANNs used in classification and regression studies. Recently, deep learning algorithms have been shown to provide better accuracy among the neural network models, as they have multiple hidden layers, which analyse the data much as the human brain does and make predictions based on a large number of features. Feature selection in several disorders has also proved better using various neural network algorithms (Moetesum et al. 2019). It has been observed that neural networks give better accuracy in classifying various diseases, for example, probabilistic neural networks (PNN) in

ADHD Prediction Using Machine Learning Models
ADHD often gets confused with autism spectrum disorder (ASD) (Taurines et al. 2012). An autistic child will usually be self-centred and focus on similar works, whereas a child with ADHD is hyperactive and very soon loses interest in any
In 2011, the ADHD Consortium along with the Neuro Bureau held a global competition involving researchers in this field from three continents to analyse ADHD individuals' data collected from eight research institutes. The efforts of the various teams helped to establish a preprocessed dataset, which is available to all researchers for providing better healthcare facilities to patients (Milham et al. 2012). The imaging and phenotypic dataset consisted of 973 subjects aged 7-21 years, of whom 362 were diagnosed with ADHD and 585 were typically developed children (TDC); the diagnosis was unavailable for the rest. ADHD individuals are further classified into three subcategories: ADHD inattentive type, ADHD hyperactive/impulsive type and ADHD combined type. The preprocessed dataset is available to researchers in this field through the Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC). The phenotypic data of these subjects include gender, age, IQ scores, diagnostic status and medication status. In the diagnosis of ADHD from experimental and nonexperimental data, the two major machine learning methods employed in classification are SVM and ANN. Hence, the following sections focus on examples of these methods.

SVM-Based Studies on ADHD
The majority of studies have reported SVM to be a better predictive model. A data-driven method for ADHD and non-ADHD classification based on an SVM model, using symptom questionnaires, was observed to be efficient in the clinical diagnostic prediction of ADHD, with excellent classification accuracy, sensitivity and specificity (Bledsoe et al. 2016). The area under the receiver operating characteristic (ROC) curve is a measure of classifier performance, where the curve plots the true positive rate against the false positive rate. In a comparative study of SVM and MLP classifiers, the accuracy of ADHD classification based on ROC area was evaluated. It was found that MLP has a relative accuracy gain of 3% over the SVM classifier (Radhamani and Krishnaveni 2016). Cortical features such as intracranial volume, surface area, subcortical volumes, folding index, etc. can be analysed from sMRI, which aids in predicting ADHD. In a study by Peng and team, 340 cortical features based on the thickness, surface, folding, curvature and volume of various brain areas were extracted, and ELM and SVM were applied to the dataset (Peng et al. 2013). It was observed that the ELM method provided faster results than SVM on a large dataset, with an accuracy of 90.18%, whereas linear and RBF-based SVM achieved only 84.73% and 86.33%, respectively.
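The ROC area used to compare classifiers above can be computed directly from predicted scores and true labels. The following Python sketch is a minimal illustration on hypothetical scores, assuming binary labels (1 for ADHD, 0 for control) with both classes present; the function names are illustrative and not those of any cited study.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points of the ROC curve."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points, tp, fp, i = [(0.0, 0.0)], 0, 0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples that share a score are moved across the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_area(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical classifier scores for six subjects.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_area(roc_points(labels, scores)))
```

An area of 1.0 corresponds to a classifier that ranks every positive above every negative, while 0.5 corresponds to random ranking.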
As the cognitive development of children can be well studied using event-related potential (ERP) data, ERP can also be considered a useful tool for diagnosing ADHD. From ERP recordings, the best classification features were identified using support vector machine-recursive feature elimination (SVM-RFE), and it was observed that the accuracy of the classifier improved as more features were considered (Öztoprak et al. 2017; Milham et al. 2012). Although the ERP technique has a few drawbacks due to the generation of time-varying signals, it provides better quantification results. The SVM technique has been used to classify an ERP dataset, with a reported accuracy of 96% after tenfold cross validation (Jahanshahloo et al. 2017).

ANN-Based Studies on ADHD
In ADHD children, some variations are also observed in their EEG signals while they perform cognitive tasks that capture their attention. A multilayer perceptron (MLP) model on EEG signals achieved a high accuracy in ADHD diagnosis (Mohammadi et al. 2016). Deep learning methods based on neural networks are closely analogous to the way the human brain learns and analyses data. A deep belief network (DBN), a class of deep learning method, applied to ADHD fMRI data was very effective in distinguishing ADHD subtypes as well as ADHD from typically developed children (TDC) (Kuang and He 2014). A few studies concentrate on a mixed approach of various machine learning algorithms for feature selection and classification of data points. A combination of a deep belief network and a Bayesian network was found to be a better platform for normalising fMRI data and extracting features based on various brain areas (Hao et al. 2015). The convolutional neural network (CNN) is another class of deep neural network, which is commonly used in image analysis. A 3D CNN-based model applied to sMRI and fMRI images facilitated learning patterns and extracting information based on spatial features (Zou et al. 2017). This study provided new insight into identifying ADHD biomarkers by neuroimaging. The validity of using imaging data in ADHD prediction is thus supported by various studies. However, since the studies are carried out in different regions, a large heterogeneity prevails. The nonimaging data, which include the phenotypic key of individuals, have shown better performance in diagnosis, whereas imaging data aid in generalisation of the method (Bohland et al. 2012). The difference in IQ between TDC and ADHD individuals shows the severity of the disorder. Medication status and dosages are also mentioned in the phenotypic key along with IQ measures. The imaging data just validate the classification of ADHD formed from the phenotypic data (Kyeong et al. 2015).

ADHD Prediction Based on Regression Models
Regression models are used for quantitative analysis that can predict the features responsible for the diagnostics. One such model is logistic regression, which establishes a relationship between one dependent variable and one or more independent variables. It has been observed that the diagnostic stability of ADHD depends on parental history of psychopathology and other socioeconomic factors (Grane et al. 2014). That study used logistic regression analysis to predict ADHD from child-level and parent-level factors. Genetic variations can also result from the influence of environmental factors, which complicates the identification of ADHD genetics. The ability of the random forest regression approach to analyse complex problems helped in predicting ADHD severity by identifying the responsible genes (van der Meer et al. 2017). Because studies on regression model-based ADHD prediction are limited, the authors developed a genetic programming model for this purpose based on the IQ levels of ADHD patients.

Genetic Programming
Genetic programming is an evolutionary algorithm that applies Darwin's principle of natural selection (Koza 1994) in a computer-aided approach. It functions as a regression model by creating a mathematical relation between various features in the dataset. Similar to the natural evolutionary process, new individuals are created by random transformation of parent individuals, that is, by making changes in the parent equation. The fitness of each individual is calculated, and the process is repeated until a satisfactory individual exists. The mathematical equation representing the best-fit individual is taken as the model relating the features involved (Fig. 3.9). In this evolutionary process, all the newly formed individuals will be unique (Banzhaf et al. 1998).
In genetic programming (GP), each individual in the population is expressed in the form of a tree structure. Each tree consists of a root node, functional nodes and branches. The root and functional nodes represent the mathematical functions to be performed. The terminal nodes represent the variables (features) on which the model is to be built. Figure 3.10 represents a sample tree structure in GP. Using crossover and mutation techniques, new trees are formed from the parent tree structures, and their complexity and fitness score are then evaluated. Several iterations are performed to obtain the best-fit model (Vyas et al. 2015).
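The tree representation described above can be sketched in Python with nested tuples. The example below is only illustrative: the feature names `x` and `y`, the function set and the toy data are hypothetical, and the crossover shown always swaps the right subtree, whereas a real GP system would pick crossover and mutation points at random.

```python
import math
import operator

# Function set for the root and functional nodes.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, sample):
    """Evaluate a tree given as (function, left, right), a feature name or a constant."""
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, sample), evaluate(right, sample))
    if isinstance(tree, str):
        return sample[tree]      # terminal node holding a variable (feature)
    return tree                  # terminal node holding a constant


def complexity(tree):
    """Number of nodes in the tree."""
    if isinstance(tree, tuple):
        return 1 + complexity(tree[1]) + complexity(tree[2])
    return 1


def rmse(tree, samples, targets):
    """Root mean square error of the tree's predictions; lower means fitter."""
    squared = [(evaluate(tree, s) - t) ** 2 for s, t in zip(samples, targets)]
    return math.sqrt(sum(squared) / len(squared))


def crossover(parent_a, parent_b):
    """Simplified crossover: the child takes parent_b's right subtree."""
    return (parent_a[0], parent_a[1], parent_b[2])


# Hypothetical data generated from target = 2 * x + y.
samples = [{"x": 1, "y": 2}, {"x": 3, "y": 0}, {"x": 0, "y": 5}]
targets = [4, 6, 5]

parent_a = ("+", ("*", 2, "x"), 1)    # encodes 2 * x + 1
parent_b = ("-", "x", "y")            # encodes x - y
child = crossover(parent_a, parent_b) # encodes 2 * x + y

for name, tree in [("parent_a", parent_a), ("parent_b", parent_b), ("child", child)]:
    print(name, complexity(tree), rmse(tree, samples, targets))
```

Selecting among individuals by both `rmse` and `complexity` reflects the preference for accurate but simple expressions.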
As GP models offer a unique way of problem-solving and provide good approximation models, they have brought about a remarkable change in molecular data analysis for cancer diagnosis (Worzel et al. 2009). A GP model has given better performance in predicting the protein-protein interactions responsible for cancer prognosis (Vyas et al. 2018). The application of GP in diagnosis has provided better accuracies in identifying several other diseases such as thrombosis and breast cancer (Werner and Fogarty 2001). In automated real-time epileptic seizure detection from EEG signals, a GP model was used and proved to be the most accurate model, with better computational speed (Bhardwaj et al. 2016). GP has also been established as an efficient model for discriminating the movement characteristics in Parkinson's disease (Lacy et al. 2013; Smith et al. 2007).
In the study on the GP-based ADHD prediction model, we considered verbal IQ, performance IQ, full-scale IQ, inattentiveness and IQ measure as the dependent variables, which are the most significant features describing the disorder. Using these features, a GP model was formulated with the basic mathematical functions. We observed the fitness scores of more than ten iterations with different complexities. Out of these, an expression with low complexity and low root mean square error (RMSE) was chosen as the best-fit model to predict ADHD. Fivefold cross validation and the more computationally demanding leave-one-out cross validation (LOOCV) were also applied to validate the accuracy of this model. We also classified the disorder using SVM and ANN. Even though these provide good accuracy in disease identification, the GP model gave a better correlation between the selected features and can hence be used as an accurate model for ADHD prediction.

Conclusion
Machine learning methods have revolutionised disease diagnosis, which aids in the early treatment of ADHD. This chapter explains various machine learning methods applied to neural disorders, especially ADHD prediction. Deep learning methods and SVM have shown better performance than other machine learning methods. One of the approaches to ADHD prediction is based on an evolutionary algorithm called genetic programming, which is a regression model-based prediction. The GP model of ADHD provided a high-accuracy model for the prediction of ADHD, with less error than classification models such as SVM and ANN.

Applications of Deep Learning in Drug Discovery
Ketan Dinkar Sarode

Abstract
Armed with advances in computational resources and high data throughput, artificial intelligence techniques have achieved remarkable success in diverse application areas in the past decade. In recent years the field of pharmaceutical drug discovery has seen an upsurge of deep learning applications that go beyond bioactivity predictive models and aid in various facets of the drug discovery process. One of the biggest strengths of deep neural networks is their ability to learn from complex nonlinear data without an explicit need for handpicking the features. This chapter aims to provide an overview of deep learning methods and their applications in the drug design field. The chapter begins by introducing concepts of machine learning, artificial neural networks, and deep neural network architectures. Application examples of these architectures such as RNN-based variational autoencoders for de novo molecular design, natural language processing, use of adversarial networks in GANs for obtaining valid molecular designs, bioactivity prediction, and image-based profiling of bioassays using CNNs are reviewed to bring out the variety of drug design challenges being addressed using deep learning techniques.

Introduction
Discovery and development of new drugs for better human health management remains a key research problem and a big challenge. Computational techniques are used as cost-effective alternatives to accelerate the drug discovery process. These techniques aim to find new hit molecules with desirable pharmacological properties from large libraries of chemical compounds. Commonly referred to as virtual screening (VS), such approaches scan through millions of compounds in search of a drug molecule that can efficaciously interact with the desired biological drug target. Depending on the availability of structural knowledge about the target molecule and target-ligand interaction information, appropriate computational approaches are used. These are broadly classified as ligand-based or structure-based methods (Schneider 2018; Śledź and Caflisch 2018). Computational approaches based on mechanistic molecular modeling face limitations in accuracy and scaling while handling large molecular systems and large compound libraries. In such cases data-driven machine learning approaches, which are the most popular class of artificial intelligence methods, have found much wider application (Smith et al. 2018; Zhang et al. 2017). Historically, quantitative structure activity relationship (QSAR) models trained using machine learning approaches were used to find mathematical relationships between the physicochemical and molecular properties of compounds and their biological activities (Topliss 2012). Recent years have seen remarkable technological advances that have led to an increase in the availability of complex pharmacological and pharmaco-omics data. These datasets are high dimensional and heterogeneous, coming from genomic, proteomic, or activity assay experiments and medical or cell assay images quantifying several molecular variables. Modeling such large datasets efficiently is an important and challenging problem.
ML methods like support vector machines (SVM), random forest (RF), and neural networks (NN) (Karthikeyan and Vyas 2014; Karthikeyan et al. 2005) have been useful for building conventional QSAR models but are not suitable for use with such high-dimensional, heterogeneous, and high-volume data (Zhang et al. 2017).
Deep learning (DL) approaches are better equipped for such problems because of their ability to learn from complex nonlinear data without explicit need for handpicking the features, ready availability of techniques for handling challenges of overfitting, and efficient computational methods for training (Chen et al. 2018;Goh et al. 2017b;Mamoshina et al. 2016). In recent years, DL has revolutionized several artificial intelligence fields such as computer vision, natural language processing, and automated game playing.
DL approaches are also finding applications in drug discovery with architectures like convolutional neural networks (CNNs), recurrent neural networks (RNNs), variational autoencoder (VAE), and generative adversarial network (GAN) (Ching et al. 2018;Gawehn et al. 2016). This chapter aims to provide an overview of these deep learning architectures and their applications in drug design field for chemists, biologists, and pharmacologists working in biomedicine. The chapter starts with a primer on machine learning describing the workflow and broad steps involved in solution of any machine learning problem. Next structures of artificial neural networks (ANNs) and deep neural networks (DNNs) are discussed. Subsequent sections discuss various DNN architectures and their applications in drug discovery.

Machine Learning Primer
Machine learning methods build mathematical models with predictive abilities. The mathematical model acts as a machine, and the learning refers to the estimation of an optimal set of parameter values in the model equations that convert input features accurately into label predictions. The input for any machine learning algorithm is typically in the form of some features and corresponding labels. Features are measurements or descriptors of the system which are expected to be most helpful in predicting the label or output of the model. The final aim of a machine learning method is to predict the label for any unseen data based on the input features. To perform this task accurately, the model needs to learn from several different samples of the features and their corresponding labels. The learning or training process carries out iterative corrections in the model parameter values such that model performance in terms of predicting accurate labels is improved. Model performance is measured with the help of a cost function. In supervised learning, the cost function measures the difference between model predictions and true labels, and the training process continues until the cost function is minimized. In drug discovery applications, different types of data such as physicochemical properties of compounds; 2D, 3D, or binary fingerprints; biological activity profiles; and gene expression profiles can be used as features. Features can be continuous, categorical, or binary. Similarly, labels can also be continuous (e.g., activity values), categorical, or binary (e.g., active or inactive) (Camacho et al. 2018). When the labels for samples in the training data are available, the learning process is referred to as supervised learning. Depending on whether the label is continuous or discrete, supervised machine learning problems use regression or classification models, respectively. For datasets without labels, unsupervised ML methods are used. 
Here rather than learning to predict the label, the machine learns to find patterns in the input features or learns to reduce the dimension of the feature space. Supervised methods find more applications as the model can be trained to make predictions of practical interest.
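The idea of iteratively correcting parameter values until a cost function is minimized can be illustrated with a small supervised regression example. The sketch below assumes a one-feature linear model, a mean squared error cost and plain gradient descent with a fixed learning rate; the data, learning rate and number of steps are hypothetical choices made only for illustration.

```python
def cost(w, b, xs, ys):
    """Mean squared difference between model predictions w * x + b and true labels."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, learning_rate=0.05, steps=2000):
    """Gradient descent: repeatedly correct w and b in the direction that lowers the cost."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical training data generated from label = 2 * feature + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = train(xs, ys)
print(w, b, cost(w, b, xs, ys))
```

The same loop structure carries over to more complex models; only the model equation, the cost function and the way the gradients are obtained change.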
Typical machine learning steps (Fig. 4.1) involve, first, processing of the input data to handle missing data points, feature scaling, and selection of the features most relevant for prediction of the output label; second, training of the machine, i.e., the mathematical model, to learn from training data; third, use of the trained model for prediction on test data to validate the prediction accuracy; and, fourth, use of the machine learning model as a label-predictive machine for the desired application.
For a problem from the drug discovery field where one might be interested in predicting the biological activities of novel drug compounds, the machine learning algorithm will learn from the known biological activities of a set of drug compounds. Typical machine learning approaches will use various physicochemical properties of the compounds as features and will try to correlate them with the corresponding biological activity values by tuning parameter values in the chosen mathematical model. The quality of the trained model can be evaluated based on its accuracy in recalling the labels from the training dataset and also in predicting test compounds with known activities which were not used for training. Such a validated model then has the potential to use its learnings to predict the biological activities of novel drug compounds based on their individual physicochemical properties/features without the need for experimentation.
Fig. 4.1 Steps involved in machine learning workflow. The input training data is represented in the form of features (x1, x2, x3, ...) available for several samples. A machine learning model of choice learns from the data in input feature space. The trained model needs to be validated on test dataset. A model with good predictive accuracy in validation exercises can then be used for predicting labels of new input data
ML methods can address any task where a pattern or mathematical mapping between features and labels can be learned for predictive applications. In domains where much mechanistic information is not available, ML methods can identify patterns or relationships from large datasets and use them for predictive applications. The quality of the features used in the input data, apart from the number of sample observations in the training data, affects the accuracy of machine learning models. The realization of the saying "garbage in, garbage out" cannot be better seen than in the case of machine learning applications (Camacho et al. 2018). The performance of any machine learning algorithm depends on the training data used. Not all features used for training a machine learning model need be informative for the prediction of the desired output label. In fact, irrelevant input features can lead to overfitting and thereby hamper the predictive accuracy of the model for unseen data. Another important aspect affecting machine learning performance is the choice of functional form for the mathematical model. For simple property prediction, linear regression models have been used in the past. Such regression models are the simplest examples of machine learning. The field has evolved to handle much more complex problems, and generally support vector machines (SVMs), random forests (RF), and artificial neural networks (ANNs) have been the most popular choices for mathematical model structures. Such model structures are able to handle nonlinearity in the feature space. 
But with increase in model dimensions (number of features), most of these model structures require huge number of parameters to be estimated, and regularization techniques need to be used to find balance between overfitting and underfitting during parameter estimation. Recent advances in deep learning make them better equipped for the challenges discussed above. Highly scalable and customizable architecture of deep neural networks makes them capable of handling high-dimensional and heterogeneous (e.g., images, sequence data, etc.) features. Widely applicable nature of DL framework has led to development of many efficient algorithm and hardware optimization frameworks. All these factors make DL methods attractive as model structures for machine learning applications. Next section describes the architecture of traditional ANNs and how it inspires deep neural networks (DNNs) architecture.

ANNs and Deep Learning
The architecture of ANNs is inspired by the structure of the human brain. A set of neuronal nodes receives an input signal, which is processed through a network of connected neurons to generate some output. The most basic artificial neural network architecture has three layers: an input layer, a hidden layer, and an output layer (Fig. 4.2). The neurons in these different layers may be fully or partially connected. The input layer neurons or nodes receive input in the form of features, and thus typically the number of input layer nodes is equal to the number of features in the training data. Nodes in the hidden layer receive input from the input layer and transform it based on a mathematical function known as the activation function. Each node in the hidden layer uses a different set of parameter values. In ANN terminology, the parameters are also referred to as connection weights. All the commonly used activation functions have the mathematical form shown in Eq. 4.1:
$$a_i = g\left(\sum_{j=1}^{n} w_{ij} x_j + x_0\right) \qquad (4.1)$$
where $a_i$ is the activation of the $i$th neuron in the hidden layer, $x_j$ is the $j$th input feature out of a total of $n$ inputs to the node, $w_{ij}$ are the weights for the connection between the $j$th input node and the $i$th hidden layer node, and $x_0$ is the bias term. This linear-weighted sum of inputs is then passed through a nonlinear activation function $g$. The most commonly used activation functions are the logistic or sigmoid function ($g(a) = 1/(1 + e^{-a})$), the hyperbolic tangent ($g(a) = \tanh(a)$) activation function, and the ReLU (rectified linear unit) function ($g(a) = \max(0, a)$). Logistic and tanh functions show saturation curves between limits 0 to 1 and −1 to 1, respectively. The logistic activation function is useful for classification applications, whereas the ReLU activation function helps in solving the vanishing gradients problem during training of ANNs. The hidden layer of the ANN transforms the input features into alternate features with the help of activation functions. 
These transformed features, i.e., activations $a_i$, are then used by the output layer nodes as inputs to calculate the final output of the ANN. Output layer nodes, like hidden layer nodes, also use activation functions to transform input activations into the final output.
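The neuron activation and the layer-by-layer transformation described above can be written out as a short forward pass. The sketch below is a minimal illustration with hand-picked, hypothetical weights; using ReLU in the hidden layer and a logistic function at the output is one possible choice assumed here, not a prescription from the text.

```python
import math


def sigmoid(a):
    """Logistic activation, saturating between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-a))


def relu(a):
    """Rectified linear unit."""
    return max(0.0, a)


def neuron(inputs, weights, bias, g):
    """Activation of one node: g(weighted sum of inputs + bias)."""
    return g(sum(w * x for w, x in zip(weights, inputs)) + bias)


def layer(inputs, weight_rows, biases, g):
    """Activations of all nodes in a fully connected layer."""
    return [neuron(inputs, w, b, g) for w, b in zip(weight_rows, biases)]


def forward(features, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer (ReLU) -> output layer (logistic)."""
    hidden = layer(features, hidden_w, hidden_b, relu)
    return layer(hidden, out_w, out_b, sigmoid)


# Hypothetical network with two input features, two hidden nodes and one output node.
features = [1.0, 2.0]
hidden_w = [[0.5, -1.0], [1.0, 1.0]]
hidden_b = [0.0, -1.0]
out_w = [[1.0, 0.5]]
out_b = [-1.0]

print(forward(features, hidden_w, hidden_b, out_w, out_b))
```

With these numbers the hidden activations are 0.0 and 2.0, the output node receives a weighted sum of 0.0, and the logistic function maps it to 0.5.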
The weights associated with all connections in ANN are estimated/learned by iterative optimization such that the accurate output can be predicted based on input features. The gradient based back-propagation optimization methods are used for training of ANNs.
Figure caption, panel (b): The neuron calculates the weighted sum of the inputs, adds the bias ($x_0$), and then passes it through a nonlinear activation function $g$ to obtain the output activation of the neuron as $a_i$
During early development of ANNs, applications were limited by the availability of big data, limited computing resources, and algorithmic challenges like estimation of a large number of parameters, overfitting, and diminishing gradients. In recent times the development of deep learning methods has led to the resurgence of this field. Compared to traditional ANNs, DL methods use a larger number of hidden layers and also show more flexibility in defining the nature of these hidden layers. DL frameworks nowadays can use several layers with many nodes due to the availability of powerful CPU, GPU, and cloud computing resources. The number of hidden layers indicates the depth of the neural network. In a feed-forward deep neural architecture, consecutive hidden layers are connected to each other such that each hidden layer receives input from the previous one and the transformed input features are then passed on as input to the next layer (Fig. 4.3). This kind of multiple and hierarchical feature mapping in different layers of a DNN allows it to construct the features most useful for success in a given task and provides DNNs with the ability to learn several complex tasks. Deep neural networks learn by estimating the weight parameters that minimize prediction error through the backpropagation process. In backpropagation, the error calculated at the output node is propagated back through the network to calculate corrections in the weights of the different hidden layers so as to minimize the activation error at each neuronal node and, as a result, the error at the final output node.
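The backpropagation step described above can be traced by hand on a very small network. The sketch below assumes one input, one sigmoid hidden neuron, one linear output neuron and a squared error cost; these choices, the parameter values and the learning rate are hypothetical and serve only to show how the output error is propagated back to the earlier weights through the chain rule.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(params, x):
    """Hidden activation h and output y of the two-neuron chain."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)
    y = w2 * h + b2
    return h, y


def loss(params, x, target):
    """Squared error between the network output and the target."""
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backward(params, x, target):
    """Gradients of the loss with respect to [w1, b1, w2, b2]."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    d_y = y - target              # error at the output node
    d_w2, d_b2 = d_y * h, d_y     # corrections for the output layer
    d_h = d_y * w2                # error propagated back to the hidden node
    d_z = d_h * h * (1.0 - h)     # through the derivative of the sigmoid
    d_w1, d_b1 = d_z * x, d_z     # corrections for the hidden layer
    return [d_w1, d_b1, d_w2, d_b2]


def train(params, x, target, learning_rate=0.5, steps=200):
    """Repeated backpropagation and gradient descent updates on one sample."""
    for _ in range(steps):
        grads = backward(params, x, target)
        params = [p - learning_rate * g for p, g in zip(params, grads)]
    return params


initial = [0.3, -0.1, 0.8, 0.2]   # hypothetical starting weights and biases
x, target = 1.5, 1.0
trained = train(initial, x, target)
print(loss(initial, x, target), loss(trained, x, target))
```

Deep learning frameworks automate exactly this bookkeeping for networks with many layers and nodes.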
The access to powerful computing resources has also accelerated algorithmic

Convolutional Neural Networks (CNNs)
The typical ANN/DNN architecture discussed in the earlier section represents each layer of the neural network as a vector of neuronal nodes. Such regular networks do not scale well for modeling multidimensional data, e.g., images. In the input layer, each node corresponds to a single feature. Thus, for modeling color images of size 100 × 100 × 3 (i.e., 100 pixels wide, 100 pixels high, 3 color channels), the input layer will have 100 × 100 × 3 = 30,000 nodes, and a single fully connected node in the hidden layer will have 30,000 weights and a bias parameter to be estimated. Such a huge number of parameters clearly prevents the practical use of several nodes in the hidden layers and also poses the problem of overfitting. The CNN architecture presents a practical solution to this problem (Krizhevsky et al. 2012). A CNN uses three main types of hidden layers: (i) convolution layer (conv), (ii) pooling layer (pool), and (iii) fully connected layer (FC).
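The parameter count quoted above, and how quickly it grows with the number of hidden nodes, can be checked with simple arithmetic. In the sketch below the hidden layer size of 1000 nodes is a hypothetical figure chosen only to illustrate the scaling.

```python
def dense_layer_parameters(n_inputs, n_nodes):
    """Each fully connected node has one weight per input plus one bias."""
    return n_nodes * (n_inputs + 1)


n_inputs = 100 * 100 * 3                                 # 30,000 input features
single_node = dense_layer_parameters(n_inputs, 1)        # 30,000 weights + 1 bias
hidden_layer = dense_layer_parameters(n_inputs, 1000)    # hypothetical 1000-node layer

print(single_node)    # 30001
print(hidden_layer)   # 30001000
```

A single modest hidden layer already requires about thirty million parameters for this image size.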
To deal with multidimensional data, unlike a regular neural network, the convolution layers of a CNN have neurons arranged in three dimensions. The three-dimensional layers consist of several 2D filters. Each 2D filter is an array of neural nodes and connects with only a segment of the layer before it instead of all the neurons in a fully connected manner. Still, the filter is able to interact with all the nodes in the previous layer through the convolution process. Figure 4.4 illustrates this idea with a schematic example. In this example, the input features are a 2D matrix A of size 6 × 6, and a 3 × 3 filter is convolved across the two dimensions of the input starting from the upper left corner. In the very first step, the receptive field of the filter is shown by the blue square. The convolution operation calculates the dot product between the filter and the receptive field (to obtain $C_{11} = -3$) and then shifts to the next position (receptive field shown by the red square). The process is iterated until the filter has been convolved across both dimensions of the input to obtain matrix C. While shifting across receptive fields, the filter parameters (B) remain the same, thus reducing the number of parameters compared with a fully connected regular neural network. Another type of hidden layer used in CNNs is the pooling layer. A pooling layer reduces the size of the feature maps by calculating the mean, maximum, or other statistics of various subregions in the feature maps.
Fig. 4.4 Illustration of the convolution operation performed by nodes in the convolution type of hidden layer in CNNs. The 6 × 6 2D matrix A forms the input for the hidden layer, and matrix B is a 3 × 3 filter defined by the connection weights of the hidden layer nodes. The output of this hidden layer is calculated by the convolution operation where, starting from the upper left corner of A, the dot product is calculated between the matrix A sub-segment (marked in blue color) and filter B. The output of this dot product is stored in C (1 * 1 + 2 * 0 + 2 * −1 + 7 * 1 + 3 * 0 + 5 * −1 + 2 * 1 + 4 * 0 + 6 * −1 = −3). The filter is then slid along both dimensions of A to fill matrix C with the corresponding dot products.
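The convolution and pooling operations can be written out directly. In the sketch below, the upper-left 3 × 3 block of A and the filter B follow the worked dot product given for the schematic example, while the remaining entries of A, the 2 × 2 pooling window and the function names are assumptions made for illustration. As is usual in CNNs, the filter is applied without flipping, with stride 1 and no padding.

```python
def convolve2d(a, b):
    """Slide filter b over matrix a (stride 1, no padding) and store the dot products."""
    n_rows = len(a) - len(b) + 1
    n_cols = len(a[0]) - len(b[0]) + 1
    c = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            c[i][j] = sum(
                a[i + p][j + q] * b[p][q]
                for p in range(len(b))
                for q in range(len(b[0]))
            )
    return c

def max_pool(c, size=2):
    """Maximum of each non-overlapping size x size subregion of a feature map."""
    return [
        [max(c[i + p][j + q] for p in range(size) for q in range(size))
         for j in range(0, len(c[0]) - size + 1, size)]
        for i in range(0, len(c) - size + 1, size)
    ]

# Upper-left 3 x 3 block and filter B follow the worked dot product in the text;
# all other entries of A are made up for illustration.
A = [
    [1, 2, 2, 0, 1, 3],
    [7, 3, 5, 1, 0, 2],
    [2, 4, 6, 2, 1, 0],
    [0, 1, 3, 4, 2, 1],
    [5, 2, 0, 1, 3, 2],
    [1, 0, 2, 3, 1, 4],
]
B = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]
C = convolve2d(A, B)   # 4 x 4 feature map; C[0][0] is the first dot product
pooled = max_pool(C)   # 2 x 2 map after 2 x 2 max pooling
```

The same nine filter weights are reused at all sixteen positions, which is the parameter sharing described in the text.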
In a typical CNN architecture (Fig. 4.5), the initial few hidden layers are convolution and pooling layers. The various filters in the convolution layers help in learning different characteristics of the feature map. For example, in image recognition applications of CNNs, some filters help in detecting edges while others help in detecting certain objects. The pooling layers reduce the size and variance of the feature maps. In the deeper layers, the smaller feature maps are concatenated into fully connected layers similar to a traditional ANN.
Using a common set of filter parameters allows a CNN to reduce the number of parameters to be learned, to scale efficiently to higher dimensional feature maps, and to learn relevant features while avoiding overfitting. These characteristics also lead to lower memory needs and higher computational speed in the case of CNNs. For applications like image processing and computer vision, CNNs outperform other machine learning algorithms (Krizhevsky et al. 2012; Lawrence et al. 1997).
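The saving in parameters can be made concrete with a short count. The sketch below compares fully connected nodes on a 100 × 100 × 3 input with convolution filters; the filter size of 3 × 3 and the figure of 100 nodes or filters are assumptions chosen only for the comparison.

```python
def fully_connected_params(n_inputs, n_nodes):
    """Each fully connected node has one weight per input plus one bias."""
    return n_nodes * (n_inputs + 1)

def conv_params(filter_height, filter_width, n_channels, n_filters):
    """Each filter shares one set of weights (plus one bias) across all positions."""
    return n_filters * (filter_height * filter_width * n_channels + 1)

n_inputs = 100 * 100 * 3                              # 30,000 input features
fc_one_node = fully_connected_params(n_inputs, 1)     # 30,000 weights + 1 bias
fc_hundred_nodes = fully_connected_params(n_inputs, 100)
conv_hundred_filters = conv_params(3, 3, 3, 100)      # hypothetical 3 x 3 filters
```

Under these assumptions, one hundred fully connected nodes need about three million parameters, whereas one hundred 3 × 3 filters need 2,800.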

Recurrent Neural Networks (RNNs)
Regular neural networks do not differentiate between static data and sequential data. The recurrent neural network (RNN) is a deep learning architecture designed for modeling sequential data such as linguistic text and audio for speech recognition. In traditional neural networks, all the input samples and corresponding outputs are independent of each other. Thus, outputs for a set of input features can be calculated independently in any order. In sequentially dependent data, e.g., when one needs to predict the next word in a sentence, information about the previous words in the sentence is important. To model sequential dependence in these types of data, RNNs allow sharing of parameters across sequential steps. Input features enter the RNN sequentially, and to compute the output of any current node, apart from the input for the current node and the associated parameter values, activations from the previous node are also used. Concretely, the information flow in an RNN is depicted in Fig. 4.6. The hidden layer neuron receives input $x_t$ (the subscript denotes the current sequential position t of input x) and the activation $h_{t-1}$ of the hidden node from the previous sequential step t−1. Using these inputs, the activation at the current step is calculated as $h_t = g(w_{hh} h_{t-1} + w_{xh} x_t + b)$, where g is a nonlinear activation function like ReLU; $w_{hh}$ is the recurrent connection weight for the hidden layer node; $w_{xh}$ is the weight for the connection between the input node and the hidden layer node; and b is a bias term. Using the current activation of the hidden node $h_t$, the output can be calculated as $y_t = w_{hy} h_t + c$, where $w_{hy}$ is the weight for the connection between the hidden layer node and the output node and c is a bias term.
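The recurrence can be traced step by step with a single hidden node. The following sketch implements the two equations above for scalar inputs; the weight values and the initial activation of zero are assumptions chosen so that the arithmetic is easy to follow.

```python
def relu(z):
    """Rectified linear unit, used here as the activation function g."""
    return max(0.0, z)

def rnn_step(x_t, h_prev, w_xh, w_hh, b, g=relu):
    """h_t = g(w_hh * h_{t-1} + w_xh * x_t + b)"""
    return g(w_hh * h_prev + w_xh * x_t + b)

def rnn_forward(xs, w_xh, w_hh, w_hy, b, c, h0=0.0):
    """Run the recurrence over a sequence, reusing the same weights at every step."""
    h, outputs = h0, []
    for x_t in xs:
        h = rnn_step(x_t, h, w_xh, w_hh, b)
        outputs.append(w_hy * h + c)   # y_t = w_hy * h_t + c
    return outputs

# Hypothetical weights: the hidden activations are 1.0, 2.5 and 4.25 in turn.
ys = rnn_forward([1.0, 2.0, 3.0], w_xh=1.0, w_hh=0.5, w_hy=2.0, b=0.0, c=1.0)
```

Because each hidden activation depends on the previous one, the sequence has to be processed in order, unlike the independent samples of a regular network.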
The backpropagation technique used for learning weights in RNNs is called backpropagation through time (BPTT). For longer sequences, the BPTT method faces the problem of vanishing gradients, which in practice leads to termination of the optimization. Long short-term memory (LSTM) networks (Sak et al. 2014) are variants of RNNs that can reduce the vanishing gradient problem. LSTM networks basically extend the memory of RNNs and are therefore more suitable for learning from important data peculiarities that are separated by very long time lags (or lags in any other sequential step unit).
Fig. 4.6 Information flow in an RNN. While processing data $x_t$ at sequential step t, the hidden layer weight parameters $w_{hh}$ and the hidden layer activations $h_{t-1}$ of the previous step are used. $w_{xh}$ are the weight parameters for the connection between the input node and the hidden layer, whereas $w_{hy}$ are the weight parameters for the connection between the hidden layer node and the output node $y_t$.

Autoencoders (AEs) and Variational Autoencoders (VAEs)
An autoencoder (AE) is a deep learning neural network used for unsupervised learning. The input features are used without labels, and the aim is to achieve nonlinear dimensionality reduction of the input. The accuracy of compression is determined by reconstructing the original data from the compressed latent space. Thus, AE neural networks have two components. The first component, the encoder, transforms information from the input layer into a hidden layer latent representation (or bottleneck) with fewer nodes (Fig. 4.7). Instead of using these reduced features for predictive purposes, the second component, the decoder, tries to reconstruct the inputs from these latent representations. The smaller number of units in the hidden layer (latent representation) necessitates discarding of information, and only information relevant for full reconstruction by the decoder is retained.
AE neural networks have been used for data denoising, since noise information is discarded during data compression (Lu et al. 2013). The unsupervised AE approach can also be used for applications like anomaly detection (Sakurada and Yairi 2014), where the available training data is skewed towards normal samples. Once the network has learned to accurately encode and decode the training samples, an anomalous sample can be detected by its high reconstruction error. Since the characteristics of a new anomaly cannot be known beforehand, differentiating it from normal occurrences serves the purpose of anomaly detection. Dimensionality reduction for data visualization, neural inpainting, and image segmentation are other popular applications.
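The reconstruction-error principle behind anomaly detection can be shown with a toy example. In the sketch below the encoder and decoder are fixed by hand as stand-ins for trained networks (a real AE would learn them), the samples are made up, and the rule of flagging errors above twice the largest training error is an arbitrary assumption.

```python
def encode(x):
    """Hand-fixed 2-D -> 1-D bottleneck (stand-in for a trained encoder)."""
    return (x[0] + x[1]) / 2.0

def decode(z):
    """Hand-fixed 1-D -> 2-D reconstruction (stand-in for a trained decoder)."""
    return (z, z)

def reconstruction_error(x):
    """Squared distance between a sample and its reconstruction."""
    x_hat = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

# Made-up "normal" training samples lying close to the line x1 = x2.
normal_samples = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]

# Hypothetical threshold rule: twice the largest error seen on normal samples.
threshold = 2 * max(reconstruction_error(x) for x in normal_samples)

def is_anomaly(x):
    """Flag a sample whose reconstruction error exceeds the threshold."""
    return reconstruction_error(x) > threshold

flags = [is_anomaly(x) for x in [(2.5, 2.6), (1.0, 4.0)]]
```

The first test sample resembles the training data and reconstructs well, while the second lies far from the pattern captured by the bottleneck and is flagged.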
AEs also have applications in building generative models. Here a main limitation is that the latent space in AE neural networks is not continuous, and therefore they are accurate only for predicting learned representations and do not allow interpolation.
Fig. 4.7 AEs process input features through an encoder neural network to obtain a latent representation with reduced dimension. The decoder neural network then converts the latent space representation into an output of the original input dimension. After training of AE networks, the latent representation encoded by the encoder can be converted by the decoder network back into its original input. (Panel labels: original input, encoder, latent representation, decoder, reconstructed output.)
Variational autoencoders (VAEs) have been developed to address this limitation and to make autoencoders useful for generative modeling. Latent spaces in VAEs are continuous and allow random sampling and interpolation. VAEs achieve this by learning the means and standard deviations of the latent space distribution rather than an encoding vector (Fig. 4.8). VAEs are powerful generative models which have been used in diverse fields (Semeniuta et al. 2017). They help in generating a random new output which is not part of the training data but still has similar properties.
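Random sampling from the latent space amounts to drawing a value around each learned mean, scaled by the learned standard deviation. The sketch below shows only this sampling step, written as mean plus standard deviation times standard normal noise; the means and standard deviations stand for a hypothetical encoder output, and no encoder or decoder network is trained here.

```python
import random

def sample_latent(means, std_devs, rng=random):
    """Draw z = mean + std * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(means, std_devs)]

# Hypothetical encoder output for one input: a mean and a standard deviation
# for each of two latent dimensions.
means = [0.5, -1.0]
std_devs = [0.1, 0.2]

rng = random.Random(0)   # seeded so that repeated runs give the same samples
samples = [sample_latent(means, std_devs, rng) for _ in range(1000)]

# The samples scatter around the learned means.
avg = [sum(s[d] for s in samples) / len(samples) for d in range(2)]
```

Each sampled vector would then be passed to the decoder, so that one input can give rise to many similar but not identical outputs.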

Generative Adversarial Networks (GANs)
Generative adversarial networks belong to the class of generative models. Thus, they are able to generate new samples by learning from the training samples. Other generative models like the VAEs discussed above also help in generating diverse and novel samples, but in certain applications, the validity of the generated samples is an issue. For example, in drug design applications, generative models can generate structures of novel molecules which may not be valid. In such cases generative adversarial networks (GANs) prove more useful. GANs have two components: a generator that produces samples and a discriminator that tries to identify whether the samples are model generated or real (Murugan 2018) (Fig. 4.9). The two components are framed in adversarial roles, where the generator iteratively learns to generate more and more realistic samples and the discriminator tries to improve its performance in differentiating the generated samples from real ones. Iteratively, both the generator and discriminator networks become better at their jobs. This competition between the two subnetworks ultimately makes GANs capable of generating samples that are indistinguishable from the corresponding real-world samples. The great potential of GANs has been widely recognized, and they are already being used to generate surprisingly good quality samples of images, music, and speech (Ledig et al. 2017).
Fig. 4.8 Schematic of a VAE (panel labels: original input, encoder, continuous latent space, decoder, reconstructed output).

Learning from 2D Images of Compounds
CNNs are a DL architecture which shows better performance in handling image data compared to other methods. Goh et al. (2017c) used CNNs to predict chemical properties of compounds based only on the 2D structure drawing of the molecule. This study highlights a strength of DL-based methods, namely that they do not require handpicking of features or feature engineering. With the minimal information of 80 × 80 pixel images of the compounds, the method showed results comparable with other studies using multitask DNNs with extended connectivity fingerprints (ECFP4). ECFPs are circular topological fingerprints of the compounds. Surprisingly, with some basic additional chemical information, the method showed further improvement in performance (Goh et al. 2018).

Learning from SMILES Representation
RNNs are a DL architecture ideal for handling sequence data. In the drug discovery field, RNNs have been used to model input features in the form of SMILES representations of the compound structures. SMILES (simplified molecular-input line-entry system) is a notation for representing chemical compound structures as ASCII strings. Goh et al. (2017a) built an RNN-based deep learning network, SMILES2vec, to correlate the compound structures provided in SMILES format with their chemical properties. The method outperformed existing methods for regression. A key finding of this study was that, without using any complex features for describing the properties of chemical compounds, the linear character string representation alone was sufficient for predictive applications. This study thus brings out the potential of deep learning methods to learn relevant features on their own without an explicit need for feature extraction and selection.
Fig. 4.9 GANs use two neural networks in adversarial roles. The generator network produces novel chemical structure samples using generative models like VAEs, while the discriminator network tries to differentiate between compound structures produced by the generator and samples from a real compound database. Through iterative training steps, the accuracy of both the generator in producing valid structures and the discriminator in classifying real and synthetic samples increases.
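Before a SMILES string can enter an RNN, each character has to be turned into a numeric vector. The sketch below shows one common way of doing this, a character-level one-hot encoding; it is an illustration and not the encoding of any particular study. Treating every character as a token is a simplifying assumption, since two-letter element symbols such as Cl or Br would normally be kept together.

```python
def build_vocab(smiles_list):
    """Map every character seen in the SMILES strings to an integer index."""
    chars = sorted(set("".join(smiles_list)))
    return {ch: i for i, ch in enumerate(chars)}

def one_hot_encode(smiles, vocab):
    """One row per character, with a single 1 at that character's index."""
    rows = []
    for ch in smiles:
        row = [0] * len(vocab)
        row[vocab[ch]] = 1
        rows.append(row)
    return rows

# Ethanol, acetic acid and benzene written as SMILES strings.
smiles_list = ["CCO", "CC(=O)O", "c1ccccc1"]
vocab = build_vocab(smiles_list)
encoded = one_hot_encode("CCO", vocab)   # 3 rows, one per character
```

The rows of `encoded` would be fed to the network one sequential step at a time.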
Bjerrum (2017) also used SMILES representations to model QSAR dataset using LSTM RNNs. In their studies, they found that instead of using a single canonical SMILES string for any given molecule, enumerating all SMILES representations in the dataset provided better results.

Learning from Small Datasets
Practical applicability of DL methods for predicting property or activity of small molecules is limited by availability of large training datasets. Altae-Tran et al. (2017) addressed the issue of large training set requirements for deep learning applications. They used an approach called one-shot learning that can work with significantly lower amounts of data. It uses the iterative refinement long short-term memory (LSTM) combined with graph convolutional neural networks to learn from small datasets. The developed models are accessible through an open-source DeepChem platform (DeepChem 2019).

Scoring of Protein-Ligand Interactions
Molecular docking is widely used in structure-based drug design approaches and involves scoring of protein-ligand interaction poses. Traditional scoring functions typically use molecular mechanics-based force fields for such purpose. Recently different studies have used CNNs to build models for scoring protein-ligand interactions. One such study (Ragoza et al. 2017) uses a 3D grid around binding site as input and trains CNN to learn key protein-ligand interaction features that correlate with binding affinities. The method was able to differentiate between correct and incorrect binding poses and outperformed AutoDock Vina in ranking poses. Gomes et al. (2017) and Pereira et al. (2016) have also used DNNs to model protein-ligand binding affinity for docking scoring.
One of the earliest uses of CNNs in structure-based drug discovery is that of Wallach et al. (2015), who used features from 3D structures of target-ligand complexes. A 3D box of 20 Å with 1 Å grid spacing was used to define the input feature space. Each grid cell contained numerical values representing structural features, such as an enumeration of atom types and more complex descriptors at that position. This method, AtomNet, outperformed other docking approaches on a varied set of benchmarks by a large margin and brought out the capability of CNNs to model ligand-protein interactions with performance comparable to or even better than docking methods.
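The idea of a grid-based input can be illustrated by assigning atoms to the cells of a 20 Å box with 1 Å spacing. This is a simplified sketch and not the AtomNet featurization: only per-cell counts of atom types are stored, and the atom coordinates, the box centre and the function names are assumptions made for the example.

```python
BOX_SIZE = 20.0   # box edge length in angstroms, as in the text
SPACING = 1.0     # grid spacing in angstroms, as in the text
CELLS_PER_AXIS = int(BOX_SIZE / SPACING)

def voxel_index(coord, box_center):
    """Return the (i, j, k) grid cell of a 3D coordinate, or None if outside the box."""
    index = []
    for value, center in zip(coord, box_center):
        cell = int((value - center + BOX_SIZE / 2) // SPACING)
        if not 0 <= cell < CELLS_PER_AXIS:
            return None
        index.append(cell)
    return tuple(index)

def voxelize(atoms, box_center):
    """Count atoms of each type per grid cell: {(i, j, k): {atom_type: count}}."""
    grid = {}
    for atom_type, coord in atoms:
        idx = voxel_index(coord, box_center)
        if idx is not None:
            counts = grid.setdefault(idx, {})
            counts[atom_type] = counts.get(atom_type, 0) + 1
    return grid

# Made-up atoms: (atom type, (x, y, z)) with coordinates in angstroms.
atoms = [
    ("C", (0.2, 0.3, 0.1)),
    ("N", (0.7, 0.4, 0.9)),
    ("O", (-9.5, 0.0, 9.5)),
    ("C", (15.0, 0.0, 0.0)),   # lies outside the box and is ignored
]
grid = voxelize(atoms, box_center=(0.0, 0.0, 0.0))
n_cells = CELLS_PER_AXIS ** 3
```

With these settings the box contains 8,000 cells, most of them empty, which is why a sparse dictionary is used here instead of a dense array.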

RNNs as Generative Models
In contrast to virtual drug screening applications, where one looks for promising drug molecules in a library of millions of compounds, in de novo drug design approaches, one aims to create novel active drug molecules (Hartenfeller and Schneider 2011). Segler et al. (2017) and Yuan et al. (2017) used RNNs as generative models to obtain predictions of novel drug compound structures based on training on library of known molecules. These studies used SMILES representations of the compounds in training library. The trained models were then able to predict novel and structurally valid SMILES strings by learning the probability distribution of characters from the training set. In generative mode of RNN, a character is generated based on the learned probability distribution which then forms input for the next sequential step for generating the next character. Such generative models have been shown to be able to generate novel drug compounds against Staphylococcus aureus and Plasmodium falciparum which were not present in the training set (Segler et al. 2017).
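The generation loop itself, in which each sampled character becomes the input for the next step, can be shown without training a network. In the sketch below a table of character-pair frequencies stands in for the probability distribution that a trained RNN would provide; this substitution, the start and end markers and the tiny training set are all assumptions made for illustration, and the generated strings are not guaranteed to be valid SMILES.

```python
import random
from collections import defaultdict

START, END = "^", "$"   # hypothetical start and end markers

def fit_bigram_counts(smiles_list):
    """Count how often each character follows each other character."""
    counts = defaultdict(lambda: defaultdict(int))
    for smiles in smiles_list:
        chars = [START] + list(smiles) + [END]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts

def sample_string(counts, rng, max_len=50):
    """Generate one string: sample a character, then feed it back as the next input."""
    current, generated = START, []
    for _ in range(max_len):
        options = counts[current]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt == END:
            break
        generated.append(nxt)
        current = nxt
    return "".join(generated)

training = ["CCO", "CCN", "CCCO", "CC(=O)O"]   # made-up training library
counts = fit_bigram_counts(training)
rng = random.Random(1)                          # seeded for repeatability
generated = [sample_string(counts, rng) for _ in range(5)]
```

An RNN replaces the lookup table with a distribution conditioned on the whole preceding string through its hidden state, which is what lets it learn longer-range rules such as ring closures and matching brackets.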

Autoencoder (AE) and Variational Autoencoder (VAE)
Similar to RNNs, VAEs can be used as generative models. Gómez-Bombarelli et al. (2018) used VAEs to learn from SMILES representations of chemical compounds in the ZINC database. Here the encoder maps the SMILES representation into a latent space which is a continuous distribution. The decoder network can sample from this latent space and generate novel compound structures (Blaschke et al. 2018). Thus, to generate structures similar to a given molecule, nearby points in the continuous latent space can be sampled as input for the decoder. With increasing distance in latent space, more diverse compound structures can be generated. The continuous nature of the latent space also allows interpolating between multiple molecules.
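Interpolation between two molecules reduces to walking along a line between their latent vectors. The sketch below uses straight-line interpolation, which is one simple choice among several; the two-dimensional latent codes are hypothetical, and decoding each point back into a SMILES string would require a trained decoder that is not shown.

```python
def interpolate(z_a, z_b, n_steps):
    """Evenly spaced points on the straight line from latent vector z_a to z_b."""
    points = []
    for step in range(n_steps + 1):
        t = step / n_steps
        points.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return points

# Hypothetical 2-D latent codes of two molecules produced by a trained encoder.
z_a = [0.0, 2.0]
z_b = [4.0, -2.0]
path = interpolate(z_a, z_b, n_steps=4)   # 5 points including both end points
```

Passing each point of `path` through the decoder would give a series of structures that change gradually from the first molecule to the second.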

Generative Adversarial Networks (GANs)
Generative models are helpful in obtaining novel drug compound structures but may sometimes produce syntactically invalid structures with strained and reactive groups. Generative adversarial networks (GANs) ameliorate this problem by coupling the generative network with discriminator network. Kadurin et al. (2017) used VAE as generator of new compound structures, whereas the discriminator classified generated structures as valid or invalid. Such setup showed better results in generating compounds with anticancer properties.

Biological Image Analysis
Image segmentation and classification capabilities of CNNs have been used for analysis of fluorescence microscopy and phase-contrast microscopy images (Kraus et al. 2016; Ronneberger et al. 2015). CNNs also find use in the analysis of cell culture or tissue-based assay systems, where they can be used for automated cell tracking (Ning et al. 2005) and colony counting (Ferrari et al. 2015).

Outlook
In the form of QSPR/QSAR models, machine learning models have been used in drug discovery for several years now. The major challenges faced in applying ML for drug discovery ranged from need to engineer and handpick useful features, limitations in handling complex multidimensional feature spaces, to computational limitations in building complex model structures. Various DL architectures discussed here show promising advances in addressing these challenges. Flexibility of DL architectures enables modeling of range of different types of data. For example, CNNs are able to model multidimensional images as input features, and architectures like RNNs and LSTM RNNs can model sequential data. This flexibility in input features representation makes DL an attractive option for interdisciplinary researchers such as chemists and drug research scientists. Chemical representations like 3D protein-ligand interactions, 2D structure images, or even simple text string representations like SMILES can be readily used for complex predictive application without explicit need for feature engineering and selection. Moreover, these architectures can even be combined with generative DL models like VAEs and GANs. These recent advances have allowed ML applications in drug research to move forward from just building activity or property predictive QSAR models to other facets of drug research. Thus, DL methods are finding applications in modeling bioassay images, accurate molecular docking scoring, and also de novo drug design. A possible limitation of DL methods is the need for massive training sets. Such training sets may not be available for specific applications. In this situation utility of transfer learning techniques needs to be explored more widely.
High-performance computational resource requirements for DL methods are becoming less and less limiting with advances in GPU-based technologies and cloud computing platforms. Access to such high-performance computational resources has also boosted research and algorithmic innovations in DL. Several of these advances are accessible through open-source libraries and are driving DL applications in drug research. All major technology companies are also aggressively developing their deep learning platforms, such as TensorFlow (2019). DeepChem (2019) is an open-source library specifically useful for drug discovery applications. Thus, in future, we are likely to see more applications of DL in drug discovery research. Wider use of DL techniques by non-experts will depend on the ease of choosing DL architectures and hyper-parameter settings.

Abstract
With the evolution of next-generation sequencing (NGS), a huge amount of complex data is generated in the computational analysis of large sequences. For instance, sequencing of a single whole genome of a simple organism alone generates 100 GB of data. In the variant calling pipeline of NGS, a number of different files are generated, such as fastq, bam and sam files, and all are gigabytes in size. The challenges of handling, storing and organizing large and complex datasets can be overcome by employing a big data analytics framework that consists of different components like Hadoop, NoSQL databases and massively parallel processing (MPP). Implementation of this framework will reduce the time required for identifying cancer mutations by accelerating the alignment step, which is a time-consuming step in mutation analysis. The chapter focuses firstly on highlighting the challenges in computational analysis of NGS data. It further explains the implementation of the Hadoop framework for reducing the processing time required for sequence alignment. The readers will also be made aware of illustrative examples of the use of big data analytics in mutation identification of various cancers. The knowledge base provided will accelerate the understanding of NGS in the application of routine health care and personalized medicine protocols.

Introduction
Next-generation sequencing (NGS) is a high-throughput sequencing technology that has the potential to sequence millions of sequence reads in a single run of a sequencing platform and has greatly reduced sequencing time and cost. NGS can be implemented in three different ways: WGS (whole genome sequencing), WES (whole exome sequencing) and transcriptome sequencing. In the clinical diagnostic field, WES plays an eminent role: the exome makes up only about 1% of the genome, yet a large share of disease-causing mutations is located in it, so sequencing the exome alone reduces time and cost compared with sequencing the whole genome. With the advent of NGS, the full range of mutations can be identified, including those that could not be detected using Sanger sequencing. NGS is carried out using different sequencing platforms like Roche454, Illumina and ABI/SOLiD, and for analysing the generated sequencing data, various computational tools are available, such as FastQC, Bowtie, BWA, VarScan, GATK, different genome browsers, and the Galaxy and SanGenix platforms. The computational analysis of NGS data in turn generates a vast amount of data (Wadapurkar and Vyas 2018). The technological revolution brought about by NGS has shifted biological data to big data, as shown in Fig. 5.1, which depicts the size of the files generated after execution of steps such as generation of raw sequence files, base calling, alignment and mutation identification (Bao et al. 2014). Among these, the first three steps generate very large amounts of data.

Challenges in Computational Analysis of NGS Data
As a result of NGS data analysis, terabytes, petabytes and exabytes of genomics data are generated from the different analysis steps, which create a number of intermediate files, and the size of the data increases very rapidly day by day. For instance, the EBI (European Bioinformatics Institute) database currently stores 20 petabytes of data, of which genomics data alone accounts for 2 petabytes (Marx 2013). The 1000 Genomes Project, a catalogue of human genetic variation launched in 2008, has generated more than 200 terabytes of sequencing data stored in the GenBank database, which is two to three times larger than the data submitted in the last few years (Tripathi et al. 2016).
As about 1 terabyte of data is generated per sample analysis, storing, organizing and handling these big, complex, heterogeneous sequencing data is very challenging and requires a huge amount of memory space, high-performance processors and computational techniques. Sequence alignment is the most tedious and time-consuming step in NGS data analysis. The sequence alignment BAM (Binary Alignment/Map) file of a small- to medium-sized set of sequence reads occupies around 2 GB of space, and this extends to 10 GB and 150 GB for the analysis of whole exome and whole genome reads, respectively. The storage overhead is increased further by the variant identification and annotation steps in downstream analysis (Roy et al. 2016).
It is a very difficult task to mine and interpret such large, complex data. Around 13 quadrillion DNA bases per year is the world's current sequencing capacity. This enormous amount of data is generated by laboratories and institutes across the world. Raw sequence reads alone take up a major part of the storage space, and transformations of these data occupy still more. So, there is a dire need to develop more databases to store these large datasets, as well as computational tools and techniques to analyse the stored data.
Genomic datasets are obtained from multiple data sources that are linked to the sequencing data and evaluated as big data. These data are maintained by different stakeholders, are created on multiple platforms and have different formatting requirements. This creates heterogeneity in accessing the data that ultimately causes trouble in deriving useful information from big data. Also, issues like selection bias and confounding can arise while selecting genomic datasets for linking. Some databases cover only a particular region; for example, HES (Hospital Episode Statistics) data is available only for England and not for the entire UK. Conventional data analytic techniques have failed to solve these issues (Wordsworth et al. 2018). So, researchers and scientists are trying to resolve these challenges using big data analytic techniques.

5.3 Introduction to Big Data Analytics

Big Data
Data characterized by complex structure, huge volume and tremendous velocity is defined as big data. Big data has the following four characteristics:
1. Volume: It refers to the amount of data, growing at a high rate and going beyond petabytes.
2. Velocity: It refers to the rate at which data is produced.
3. Variety: It refers to the different types of data: structured, semi-structured and unstructured. It includes different data formats like audio, image, video, sensor, text and web log data.
4. Veracity: It refers to the uncertainty of the available data. Veracity covers the data incompleteness and inconsistency that arise with a high volume of data (Walunj Swapnil et al. 2016).
There are many sources of big data in biology, among which genomics data is one of the most prominent. Genomics is all about genes, which carry a lot of information. A typical human genome contains thousands of genes, each made up of thousands to millions of base pairs. The human genome alone consists of more than 3.1 billion base pairs. Merely mapping a genome involves about 100 GB of data, and sequencing numerous genomes and tracking gene interactions pushes the data to petabytes in some cases. Traditional data analytics technologies fail for such huge genomics data, whereas big data analytics technologies can handle it effectively.

Big Data Analytics
Data analysis refers to the process of gathering, cleaning, transforming and modelling data with the objective of discovering the required information. The results so obtained are interpreted in order to propose conclusions and support the decision-making process. Data visualization techniques are used to represent the data in a simple way and to discover useful patterns hidden in it. Data analytics can be defined as the process of examining very large datasets using mathematics, statistics and computer software, with the aim of drawing conclusions that support decision-making systems and provide deep insights. It also draws on techniques from other fields such as machine learning, signal theory, computational intelligence, operations research and pattern recognition.
Data analysis projects typically consist of several phases: data retrieval, selection, cleaning, filtering, visualization and analysis, and finally data interpretation and evaluation. The whole data analytics process is iterative in nature. For simplicity, data analytics can be divided into the following four stages, which are considered the most common data analysis stages (Runkler 2012), as described in Fig. 5.2:
1. Data collection and preparation
2. Data preprocessing
3. Data analysis
4. Data post-processing

Data Collection and Preparation
The first stage of data analytics is to identify and specify the data requirements depending on the business needs. The problems that need to be solved are identified, and objectives are set.
In data collection, data is gathered according to these requirements from various sources like web pages and organizational databases. Data which is relevant to the analysis is selected. In feature selection, the most independent, discriminating and informative features are selected in order to achieve effective analysis of the data and more promising results.

Data Preprocessing
The collected data is first cleaned and then processed for effective analysis. It is then transformed into a format suitable for the analysis tool. This process involves tasks like data cleaning, filtering and transformation, which essentially improve data quality. The collected data may contain missing values, duplicates and errors. In this phase, different processes are used for filling in missing data, eliminating outliers and removing errors. Eliminating outliers and errors is very important for maintaining the accuracy of the model. The filtering methods are aimed at removing noise. The data is normalized to a uniform scale to bring all features into a common range.
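The three preprocessing tasks of filling in missing data, eliminating outliers and normalizing can be sketched for a single feature. The specific choices below (median imputation, the interquartile range rule with a factor of 1.5, and min-max scaling to the range 0 to 1) are common options assumed for illustration and are not prescribed by the text; the measurements are made up.

```python
import statistics

def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def remove_outliers(values, k=1.5):
    """Keep values within k interquartile ranges of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if q1 - k * iqr <= v <= q3 + k * iqr]

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

raw = [10.0, 12.0, None, 11.0, 13.0, 500.0, 12.0]   # made-up measurements
cleaned = remove_outliers(fill_missing(raw))         # 500.0 is dropped
normalized = min_max_normalize(cleaned)
```

The order of the steps matters: here the outlier is removed before normalization, since otherwise the value 500.0 would compress all other values into a narrow band near zero.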

Data Analysis
In this phase, various analysis techniques are applied, and conclusions are drawn based on the requirements. Data visualization techniques help to represent information in various graphical formats like charts, pictures and diagrams, which helps to communicate the data in a better way. Visualization techniques highlight patterns and relationships between variables in the data. Correlation analysis derives the relationship between continuous, quantitative variables and also measures the strength of that relationship. Correlation can be positive or negative, and it allows future trends among variables to be predicted. Regression analysis is also used to evaluate the relationship between dependent and independent variables. Different techniques are used to model and analyse the relationships among variables. This phase helps a business mainly in prediction and forecasting. Classification analysis helps to determine the category of new observations, depending upon the observations learned by the model in the training phase. Similarly, clustering analysis is carried out to group similar observations into clusters.
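Correlation and regression can both be computed from the same sums of deviations. The sketch below calculates the Pearson correlation coefficient and an ordinary least squares line for two made-up variables and then uses the fitted line for a simple prediction; the data and function names are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

xs = [1.0, 2.0, 3.0, 4.0, 5.0]     # made-up independent variable
ys = [2.1, 3.9, 6.2, 7.8, 10.1]    # made-up dependent variable
r = pearson_r(xs, ys)              # close to +1: strong positive correlation
slope, intercept = fit_line(xs, ys)
predicted = slope * 6.0 + intercept   # forecast for an unseen value x = 6
```

The sign of `r` gives the direction of the relationship and its magnitude the strength, while the fitted line is what supports prediction and forecasting.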

Post-processing
The results and conclusions are formulated, and reports are generated according to specifications. These reports support business decisions and the assessment of their impact. Feedback is collected from users and reviewed to improve business decisions and the data analysis.

Introduction to Hadoop
The main problems encountered while handling genomic data are:
1. Long access time for reading and writing very large sequence files from a single storage device.
2. Hardware failure causing data loss.
3. Difficulty in combining analysis results from multiple storage sources.
Existing technologies like redundant arrays of independent disks, high-performance computing and distributed systems can overcome the above challenges, but they require a lot of human intervention to handle big data analytics, and scalability remains a major bottleneck. To address the limitations of traditional approaches to storing and processing big data, Google proposed the MapReduce model as a solution. The model divides the task into small parts, assigns them to clusters of computers, executes these tasks in parallel and combines their results to form the result dataset. Hadoop implemented this MapReduce model as an open-source project and made it a successful big data analytics tool. Apache Hadoop is an open-source platform that targets distributed storage and parallel computing across clusters of computers using the MapReduce programming model. The MapReduce programming model is designed for batch query processing, running ad hoc queries against the whole dataset at once and returning the results in a reasonable amount of time. The Hadoop platform is mostly written in the Java programming language, with command line utilities written as shell scripts and some native code written in C. Hadoop also supports other programming models which fall under the infrastructure of large-scale data processing and distributed computing and are collectively referred to as the Hadoop ecosystem (White 2015).
Hadoop consists of the following four core modules which work together to accomplish overall task assigned to it.

Hadoop Architecture
Hadoop follows a master-slave architecture for distributed storage using HDFS and parallel processing using MapReduce, as shown in Fig. 5.3.

HDFS (Hadoop Distributed File System)
HDFS is a distributed file system that takes care of the storage part of Hadoop. The HDFS architecture follows the data locality principle, wherein data is stored locally on commodity computers referred to as data nodes and the computational logic is moved to them, which distinguishes it from other distributed file systems. The computational logic is a program written in a high-level language such as Python or Java. Such a program processes data stored in HDFS, which stores very large files reliably in a distributed environment as data blocks. The default block size in Hadoop 2.x is 128 MB, which is configurable. Replicas of each block are maintained to provide fault tolerance. The default replication factor is 3, which is also configurable.
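As a rough illustration of these figures, the following sketch estimates how many blocks a file occupies and how much raw cluster storage its replicas consume. The defaults (128 MB block size, replication factor 3) come from the text; the function name `hdfs_footprint` and the 3 GB example file are assumptions made purely for illustration.

```python
MB = 1024 * 1024

def hdfs_footprint(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Return (number of blocks, raw bytes stored across the cluster).

    Hypothetical helper: it only does the arithmetic and does not talk to HDFS.
    """
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    # Integer ceiling division: the last block holds only the remaining bytes.
    n_blocks = -(-file_size_bytes // block_size_bytes)
    # Every byte of the file is stored once per replica.
    raw_bytes = file_size_bytes * replication
    return n_blocks, raw_bytes

if __name__ == "__main__":
    blocks, raw = hdfs_footprint(3 * 1024 * MB)  # hypothetical 3 GB sequence file
    print(blocks, raw // MB)  # prints: 24 9216
```

Under these assumptions a 3 GB file is stored as 24 blocks and occupies 9 GB of raw disk space, which is the price paid for tolerating the loss of a data node.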

Daemons
A daemon, in computing terms, is a process that runs in the background. Each Hadoop daemon runs separately in its own JVM.

HDFS Daemons
Name Node: It acts as the master node and is configured on a high-end server. It manages the file system namespace and securely controls access to files by clients. It maintains the metadata of the files stored in the cluster, e.g. the location of stored blocks, file sizes, permissions, hierarchy, etc.
Data Node: It acts as a slave node and is configured on commodity hardware. It stores the actual data and serves the low-level read and write requests submitted by the file system's clients.

MapReduce
MapReduce is a processing model and software framework for applications that process massive volumes of data stored in the Hadoop Distributed File System. MapReduce implements a divide-and-conquer model to process data in parallel. It divides the submitted job into a set of independent tasks termed sub-jobs, executes these sub-jobs in parallel and then combines their results to obtain the final result of the submitted job. MapReduce divides the processing into two phases: the map phase and the reduce phase. The map phase comes first and incorporates all the complex logic or business rules to be executed; it is a heavyweight process. The reduce phase comes second and performs aggregation or summation; it is computationally lighter.
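The two phases can be sketched in a few lines of plain Python. This is a minimal single-machine simulation, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the nucleotide-counting task are assumptions chosen for illustration, and the loop over chunks stands in for map tasks that a cluster would run in parallel.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit one (key, value) pair per nucleotide in a chunk."""
    return [(base, 1) for base in chunk]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: aggregate all values that share a key."""
    return key, sum(values)

def run_job(sequence, chunk_size):
    """Split the input, run every map task, then reduce the grouped output."""
    chunks = [sequence[i:i + chunk_size]
              for i in range(0, len(sequence), chunk_size)]
    intermediate = []
    for chunk in chunks:  # on a cluster these map tasks would run in parallel
        intermediate.extend(map_phase(chunk))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

if __name__ == "__main__":
    print(run_job("ACGTACGGA", chunk_size=4))
```

Because each map task sees only its own chunk and each reduce call sees only one key, neither needs shared state, which is what allows the framework to distribute them across nodes.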

YARN
Apache YARN (Yet Another Resource Negotiator) is the resource management system of a Hadoop cluster. YARN was introduced in Hadoop 2 to overcome the restriction of the original MapReduce implementation to batch processing tasks. YARN provides a framework that supports a diverse set of tasks, including advanced modelling, interactive SQL and real-time streaming. It also provides the necessary APIs for allocating and releasing resources, job scheduling and monitoring.

YARN Daemons
Node Manager: This daemon process runs on the slave nodes. It is responsible for coordinating with the Resource Manager for task scheduling and for tracking resource utilization on the slave node. It works with two other components, viz. ApplicationMaster and Container, to handle MapReduce task scheduling and execution on the slave node.
Resource Manager: This daemon process runs on the master node. It is responsible for accepting jobs submitted by clients and scheduling them on the cluster, monitoring running jobs and allocating appropriate resources on the slave nodes. It communicates with the Node Manager daemon on each slave node to track resource utilization. It uses two other processes, named Application Manager and Scheduler, to execute MapReduce tasks and manage resources.

Advantages of Hadoop
• Hadoop provides a reliable analysis system (the MapReduce model) and distributed storage (HDFS).
• Hadoop is linearly scalable, as a Hadoop cluster can support growing data needs by adding more computers to the cluster. To support scalability, HDFS is designed to store massive amounts of data on a single platform, and MapReduce is designed to process enormous amounts of data.
• It is cost-effective, as it works with commodity hardware and does not require expensive high-end hardware. All slave nodes are commodity computers, and only the master nodes are high-end computers.
• It is highly flexible, as it can store and process structured, semi-structured and unstructured data, irrespective of type or format. Flexible storage provides access to full-fidelity data for a wide range of analytics and use cases.
• Hadoop is fault tolerant. Data is replicated on multiple nodes according to the configured replication factor, so even if one node fails, the required data can be read from another node that holds a replica. Hadoop also ensures that the replication factor is maintained after a node failure by re-creating the replicas held by the failed data node on another available node.
• Hadoop works on the principle of write once and read multiple times (HdfsDesign 2019).

Work Flow of Hadoop for Sequence Analysis
A MapReduce job is the analysis task that is submitted. It comprises the input, the MapReduce logic and configuration information (White 2015). Hadoop executes the job in two phases: the map phase and the reduce phase. Each phase accepts data as key-value pairs and also produces key-value pairs as output. The tasks are scheduled by YARN and executed on data nodes within the cluster. If one of the tasks fails, it is automatically rescheduled to run on a different node.
Input: Reference sequence
Map phase logic: Sequence alignment logic
Reduce phase logic: Combining sequence alignment results from all mappers
Final output: Alignment sequences
The following steps are executed for sequence analysis:
1. The input submitted to MapReduce is first divided into equal-sized pieces called splits. One map task is created for each split, and it executes the user-defined map logic for each record in the split.
2. The map task output is intermediate output, so it is stored on local disk instead of HDFS; once the job is complete, it can be discarded. The map output is processed by the reduce phase logic to produce the final output. The input to a single reduce task is normally the output from all mappers.
3. The sorted map outputs are transferred across the network to the node where the reduce task is scheduled, where they are merged and then passed to the reduce function.
4. The final output of the reduce phase is normally stored in HDFS for reliability.
5. Each HDFS block of the final output of the MapReduce algorithm is replicated for reliability.
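The steps above can be sketched as a toy pipeline on a single machine. This is only an illustration under strong assumptions: exact substring search stands in for a real aligner, the reads and reference are invented, the reads (not the reference) are what is split across map tasks here, and all function names are hypothetical. It does show the split, sorted map output, merge and reduce stages in order.

```python
from heapq import merge

def make_splits(records, split_size):
    """Step 1: divide the input records into equal-sized splits."""
    return [records[i:i + split_size]
            for i in range(0, len(records), split_size)]

def map_task(split, reference):
    """Steps 1-2: emit (position, read) for each read found in the reference.

    The output is sorted by key, like the intermediate map output on local disk.
    """
    output = []
    for read in split:
        position = reference.find(read)  # toy stand-in for an alignment tool
        if position != -1:
            output.append((position, read))
    return sorted(output)

def reduce_task(sorted_map_outputs):
    """Steps 3-4: merge the sorted map outputs into one ordered result."""
    return list(merge(*sorted_map_outputs))

def run(reference, reads, split_size=2):
    splits = make_splits(reads, split_size)
    map_outputs = [map_task(split, reference) for split in splits]
    return reduce_task(map_outputs)

if __name__ == "__main__":
    reference = "ACGTTGCAAGCT"                        # invented reference
    reads = ["TTGC", "ACGT", "GGGG", "AAGC", "GCAA"]  # invented reads
    print(run(reference, reads))
    # prints: [(0, 'ACGT'), (3, 'TTGC'), (5, 'GCAA'), (7, 'AAGC')]
```

Keeping each map output sorted is what lets the reduce side combine them with a cheap merge instead of a full re-sort.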

Applications in Identifying Cancer Mutations
To increase the survival rate of cancer patients, it is necessary to detect cancer at an early stage. Big data analytics can be applied in the diagnosis as well as the prognosis of cancer by efficiently identifying the full range of cancer mutations. Each patient's genome profile is stored in very large files and is therefore treated as big data. For storing, managing and analysing this big data, technologies such as Hadoop and MapReduce are applied to extract patterns and speed up the mutation identification pipeline, which ultimately facilitates the cancer diagnosis process and helps doctors take quick diagnostic decisions. For further analysis of genomic big data, different methods are used, such as classification, regression, neural networks, cluster analysis, machine learning and pattern recognition (Venkat Reddy Korupally and Subba Rao Pinnamaneni 2016). Big data analytical techniques are applicable to a number of cancer big datasets. These datasets take the form of cancer mutations, gene expression, genomics, transcriptomics and proteomics data of cancer patients and are hosted by different resources. The Cancer Genome Atlas (TCGA) hosts data on 33 types of cancer in the form of tumour and normal datasets from more than 11,000 patients and has contributed more than 2.5 petabytes of data. The International Cancer Genome Consortium (ICGC) contains data on 21 tumour sites, covers 70 projects and has been contributed to by more than 19,290 cancer donors. Researchers can access these datasets using the Amazon Web Services (AWS) cloud. The repository of TCGA, the Cancer Genomics Hub (CGHub), consists of more than 2.5 petabytes of data and is the largest database of cancer genomes in the world. COSMIC (Catalogue of Somatic Mutations in Cancer) is the biggest resource hosting data on somatic mutations and their impact on human cancer.
COSMIC covers around 2,002,811 point mutations in the coding regions of more than one million tumour samples, spanning most human genes. Big data analytics is already applied in some cancer types. For breast cancer, standard tests such as Oncotype DX and MammaPrint rely on cancer signatures generated from big data-driven cancer genomics. CancerLinQ is a big data project established by ASCO (American Society of Clinical Oncology) in 2010 that hosts information about treatments and outcomes shared by physicians and patients. In this database, the patient's identity is protected. The information is contributed by doctors and their patients, and both can access it. The database comprises more than 170,000 medical records of breast cancer patients. If a cancer patient with a particular mutation has resisted a targeted therapy, doctors who refer to the shared information in the database can avoid prescribing the same drug to another patient in whom the same mutation is detected. Thus, this project is very useful in diagnosing and treating patients (Barbosa 2016).
Prediction of slower-moving cancers or non-cancerous lesions and of aggressive triple-negative breast cancers has been performed using big data analytic techniques (Makler and Narayanan 2016; Coates et al. 2016; Swift and Stojdl 2016; Yang et al. 2015; Kim 2015). For instance, next-generation sequencing (NGS) data of breast cancer patients can be retrieved from sequence archives, and a computational pipeline for identifying breast cancer mutations in the retrieved data can be implemented on the Hadoop platform using the MapReduce algorithm. The map part of the algorithm performs the most tedious step of the computational analysis, namely alignment using the BWA tool on a cluster of computational nodes, and the reduce part combines the alignments from all nodes. After the MapReduce algorithm has run, mutations are called using the VarScan tool, and a final VCF (Variant Call Format) file of identified mutations is generated, as depicted in Fig. 5. As not all the identified mutations are clinically significant, filtering and prioritization of these mutations are the next steps in the analysis of NGS data. The clinical significance of identified mutations can be checked using the ClinVar database, which reports whether a mutation is pathogenic, benign, of uncertain significance, a risk factor, affects drug response, etc. The identified clinically significant mutations will ultimately help in NGS-based testing. Thus, by extracting knowledge in the form of clinically significant data from NGS big data, big data analytics will become a main component of NGS-based clinical diagnostics.
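The filtering step can be illustrated with a short sketch. It assumes that the VCF produced by the pipeline has already been annotated with a `CLNSIG` tag in the INFO column, as used in ClinVar's own VCF release; the label spellings in `KEEP`, the function names and the three records are assumptions or invented data and should be checked against the actual annotation source before use.

```python
# Labels treated as clinically significant in this sketch (assumed spellings).
KEEP = {"Pathogenic", "Likely_pathogenic", "drug_response", "risk_factor"}

def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=40;CLNSIG=Benign' into a dict."""
    info = {}
    for item in info_field.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info[key] = value
    return info

def clinically_significant(vcf_lines, keep=KEEP):
    """Return (chrom, pos, ref, alt, clnsig) for records whose CLNSIG is kept."""
    selected = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        info = parse_info(fields[7])  # INFO is the eighth VCF column
        labels = set(info.get("CLNSIG", "").replace("|", ",").split(","))
        if labels & keep:
            selected.append((chrom, int(pos), ref, alt, info["CLNSIG"]))
    return selected

if __name__ == "__main__":
    # Invented records for illustration only; not real variant calls.
    example = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "13\t1000\t.\tA\tG\t50\tPASS\tDP=40;CLNSIG=Pathogenic",
        "13\t2000\t.\tC\tT\t50\tPASS\tDP=35;CLNSIG=Benign",
        "13\t3000\t.\tG\tA\t50\tPASS\tDP=22",
    ]
    print(clinically_significant(example))
```

Records without a `CLNSIG` tag are dropped here; a real pipeline might instead keep them for manual review, since absence from ClinVar does not imply that a mutation is benign.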

Implementation Details
Hadoop has three operation modes:
1. Standalone mode
2. Pseudo-distributed mode (single-node cluster)
3. Fully distributed mode (multi-node cluster)
The following implementation was carried out on a single-node cluster, in which the master and slave run as separate processes on one node. Before executing any program on Hadoop, first verify that Hadoop is installed successfully, and then start all daemons for execution of the task.
Verify successful installation of Hadoop by using the following command (

Execution of Program: 1
The program demonstrates reading a sequence file on Hadoop and counting each sequence as one part. The input is the nucleotide sequence of the human BRCA2 gene, which is associated with breast cancer. The MapReduce model splits the sequence file into parts, and each part is then handled by separate map task logic. Figure 5.7 shows the input file, and Fig. 5.8 shows the output given by Hadoop after splitting the sequence into lines. The MapReduce logic is implemented in the Java programming language as mapper and reducer classes. To execute Java code on Hadoop, the Java source code is compiled, and a jar file is then created and run on Hadoop, as shown in Figs. 5.10, 5.11, 5.12 and 5.13.
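Since the Java source appears only in the figures, the logic of such a program can only be approximated here. The sketch below is not the authors' code: it uses Python and a line-oriented, tab-separated key-value convention in the style of Hadoop Streaming instead of Java mapper and reducer classes, and it assumes that "counting each sequence as one part" means emitting a count of 1 per sequence line. The names `mapper`, `reducer` and `run_local` are hypothetical.

```python
from itertools import groupby

def mapper(lines):
    """Emit 'sequence_line<TAB>1' for every non-empty sequence line."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):  # skip blanks and FASTA headers
            continue
        yield f"{line}\t1"

def reducer(sorted_pairs):
    """Sum the counts of identical keys; input must be sorted by key."""
    keyed = (pair.rsplit("\t", 1) for pair in sorted_pairs)
    for key, group in groupby(keyed, key=lambda kv: kv[0]):
        yield key, sum(int(value) for _, value in group)

def run_local(lines):
    """Simulate map, sort (shuffle) and reduce on one machine."""
    return dict(reducer(sorted(mapper(lines))))

if __name__ == "__main__":
    # Invented input lines; a real run would read the BRCA2 sequence file.
    sample = [">hypothetical header", "ACGT", "GGCA", "ACGT", ""]
    print(run_local(sample))  # prints: {'ACGT': 2, 'GGCA': 1}
```

The `sorted` call plays the role of the shuffle-and-sort stage that Hadoop performs between the map and reduce phases, which is why the reducer can rely on identical keys arriving consecutively.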

Conclusion
The effectiveness of cancer treatment and its long-term outcomes can be improved using big data analytics that spans clinical trials, clinical practice and real-world patients. The Hadoop framework can be used as a powerful data analytics technology for analysing genomic datasets. The scalability of Hadoop provides a solution to the growth of genomics data, and its MapReduce programming model reduces the time required for data analysis through parallel processing. Thus, in the coming decades, big data analytics will prove beneficial to oncologists, doctors and cancer patients in clinical diagnostics.

Medicinal Properties of Fruit and Vegetable Peels
Pranav Pathak

Abstract
Nowadays, increasing attention is given to improving the immune system and curing diseases using food and food-related products. In this context, fruits are extensively used to treat and prevent diseases. The utilization of fruits and vegetables generates a huge amount of waste during pre- and post-harvest processing. This waste is generally discarded in dump yards, which is hazardous to the environment. However, recent studies have confirmed that fruit and vegetable peel (FVP) waste can be a valuable source of bioactive compounds, due to the presence of steroids, phenolics, tannins, flavonoids, triterpenoids, glycosides, carotenoids, ellagitannins, anthocyanins, vitamin C and essential oil. These compounds can add value to FVP if extracted efficiently. Several economically valuable products with superior medicinal, nutritional and antioxidant properties can be obtained from FVPs by processes such as drying, size reduction, fermentation and solvent extraction. The bioactive compounds in FVP make it useful in the treatment of wounds, acne, diarrhoea, gastroenteritis and rotavirus enteritis, allergies, malaria, coughs, degenerative muscular diseases, bacterial/fungal infections, cancer, cardiovascular disorders, diabetes, liver diseases, dental plaque, inflammatory ailments including rheumatism, menstrual pain, etc. This chapter reviews the antioxidant, antiatherogenic, antimicrobial, antiallergenic, anti-inflammatory, antithrombotic, cardioprotective and vasodilatory properties of some commonly used FVPs. This will help to obtain the maximum health benefits and maximize industrial profits.

Keywords
Fruit peel waste · Bioactive compounds · Antioxidants · Medicinal properties · Health benefits

Introduction
Every fruit and vegetable contains about 5-50% peel (FVP), which is regarded as a leftover after utilization of the pulp. These peels are generally dumped as waste or burned in the open air, giving rise to the new problem of environmental pollution. In the past few years, many researchers have found numerous applications for FVP because of their physico-chemical characteristics. FVPs are mainly composed of cellulose, hemicellulose and lignin and may also contain functional groups of lignin, including aldehydes, ketones, alcohols, carboxyl, hydroxide, phenols and ethers, which can be easily converted into value-added products with health benefits. Also, their unique chemical composition, abundant availability and low cost make FVP an economic and eco-friendly option for valorization (Pathak et al. 2015, 2016a; Bhatnagar et al. 2015).
Problems related to nutrition and health are common. The main issue is finding a cheap and easily available source of bioactive compounds and phytochemicals, and fruit peels can be the best alternative. As discussed, FVP contains various bioactive compounds that can be good replacements for the synthetic substances commonly used in the food, cosmetic and pharmaceutical industries. The use of FVP as a phytochemical source is motivated by concerns about human safety, as synthetic molecules are suspected of causing or promoting negative health effects (Chatterjee 2014). In recent years, increased attention has been paid to using food products for curing diseases and improving the immune system. Vegetables and fruits are being progressively incorporated into the diet to prevent and treat diseases. The properties of fruits and their peels depend on, and vary with, several factors (Mphahlele et al. 2014; Pathak et al. 2015).
Recent studies show that FVPs are a major source of bioactive compounds from which maximum health benefits can be extracted. As shown in Fig. 6.1, FVPs are useful in the treatment of wounds, acne, diarrhoea, gastroenteritis and rotavirus enteritis, allergies, malaria, coughs, degenerative muscular diseases, bacterial/fungal infections, cancer, cardiovascular disorders, diabetes, liver diseases, dental plaque, inflammatory ailments including rheumatism, menstrual pain, etc. In this context, this chapter briefly reviews the antioxidant, antiatherogenic, antimicrobial, antiallergenic, anti-inflammatory, antithrombotic, cardioprotective and vasodilatory properties of some commonly seen FVPs (banana, guava, apple, mango, pineapple, orange, papaya, potato and tomato). This will help to obtain the maximum health benefits and maximize industrial profits.

Medicinal Properties of FVP
FVP can be a valuable source of bioactive compounds, due to the presence of steroids, phenolics, tannins, flavonoids, triterpenoids, glycosides, carotenoids, ellagitannins, anthocyanins, vitamin C and essential oil. The bioactive compounds in FVP make it useful in the treatment of the various health problems discussed in the section above. The following section explains the medicinal properties of the peels of fruits and vegetables that are widely cultivated and consumed worldwide.
Due to its high gallocatechin content, banana peel (BP) is a good functional food source against heart disease and cancer (Someya et al. 2002). A jelly prepared from BP has good antioxidant activity owing to the presence of dietary fibre and phenolic compounds such as flavonoids. Because of its nutritive and healthy properties, jelly is more advantageous than tablets and pills.
A novel cell adhesion inhibitor extracted from BP, 7,8-dihydroxy-3-methyl-isochromanone-4, can possibly be used in the treatment of atherosclerotic endothelium injury (Fu et al. 2012). Arabinoxylans present in BP could be used as a health-beneficial nutritional supplement. Also, extracts from yellow BP show good antiallergic and antibacterial effects in the treatment of infections caused by both Gram-negative and Gram-positive bacteria and may replace synthetic medicines (Chabuck et al. 2013; Tewtrakul et al. 2008). Phenols, peel oil, lipids and tannin extracted from BP exhibit good antimicrobial activity against Klebsiella pneumoniae and Proteus vulgaris and are thus used in the treatment of infections (Fapohunda et al. 2012). In male rats, significant wound-healing activity was found for a gel prepared from unripe BP and for alcoholic extracts from the bark (Atzingen et al. 2011; Rosida et al. 2014). These extracts also affect thyroid hormones, tissue lipid peroxidation, and insulin and glucose concentrations (Parmar and Kar 2008). In addition, BP extracts considerably suppress the regrowth of seminal vesicles and ventral prostates in castrated rats, a regrowth associated with increased testosterone activity. Therefore, BP can be used in the treatment of benign prostate hyperplasia (Akamine et al. 2009). The green biopolymer/HAP nanocomposite prepared from BP can be utilized for natural bone replacement (Kanimozhi et al. 2014). Thus, BP can be used for the treatment of various diseases, mainly due to the presence of phenolic compounds. However, one should be careful about pesticide residues in BP extracts, which may be present in some cases and are a major issue for commercial cultivation (Pathak et al. 2016b; Aurore et al. 2009; Rodríguez-Ambriz et al. 2008).

Guava Peel
Guava (Psidium guajava) is cultivated in various tropical and subtropical countries due to its capability to bear fruit throughout the year. The fruit is very healthy and has therapeutic value. Most people eat the fruit fresh. Commercially, it is mostly used in the production of juice, beverages, jams, canned slices, jelly, etc., which generates huge amounts of guava waste in the form of peels, bark, seeds, leaves and pomace. Guava peel (GP) contains minerals such as Ca (17.31 ppm), Mg (206.65 ppm) and Na (2.04 ppm), ascorbic acid (Packer et al. 2015) and phenolic compounds (596.67 mg/L) (Rejal 2010).
The antioxidant activity of GP, measured using the FRAP assay, was found to be 10.24 ± 0.24 mmol/100 g wet weight (Guo et al. 2003). The methanol (60%) extracts of GP (at 55 °C and 120 min) show the highest antioxidant activity (1021.00 μmol/L) (Rejal 2010). Aqueous extracts of GP exhibit antidiabetic and hypoglycaemic effects on the blood glucose level in healthy rats (Rai et al. 2009) and have the potential to reduce oxidative stress in the pancreas of diabetic rats (Budin et al. 2013). Owing to the presence of ferulic acid and gallic acid, aqueous extracts of GP show antimicrobial activity against S. aureus, P. aeruginosa, E. coli and L. monocytogenes (Abdelmalek et al. 2016).
Thus, GP has potential antidiabetic, hypoglycaemic and antimicrobial activity. Also, due to the presence of minerals and phenolic compounds, consuming guava fruit along with the peel makes it more beneficial to health.

Apple Peel
Apples are among the most popular and commonly consumed fruits all over the world. Apple peel (AP) makes up approximately 5-15% of the total weight of the fruit. AP has six times the polyphenolic content of the flesh (Balasuriya and Rupasinghe 2012; Massias et al. 2015). AP polyphenols have beneficial effects on oxidative stress and inflammation (Massias et al. 2015). In AP, triterpenes and flavonoids are two important groups of bioactive compounds with potential as dietary supplements to reduce blood cholesterol (Thilakarathnaa et al. 2012). The presence of anthocyanins and triterpenoids also enhances lifespan (Palermo et al. 2012). AP provides additional nutritive and medicinal benefits due to its high content of minerals such as Ca, Mg, Na, K, Fe and Zn, along with dietary fibre and bioactive compounds (Manzoor et al. 2012; Leontowicz et al. 2007).
The high amounts of anthocyanins, triterpenoids, flavonoids and phenolic compounds make AP rich in phytochemicals and suitable for use in several foodstuffs for health benefits. Moreover, these phytochemicals show a strong antiproliferative effect against Caco-2 colon cancer cells, MCF-7 breast cancer cells and human HepG2 liver cancer cells (Wolfe and Liu 2003; Vieiraa et al. 2011; He and Liu 2007). Flavonol molecules (quercetin glycosides and quercetin) available in AP are physiologically important and have various health benefits (Rupasinghe et al. 2011). In the cosmetic and food industries, AP may be a low-cost raw material for reducing glycation stress (Parengkuan et al. 2013). AP powder can be used in the preparation of functional foods and beverages, as it contains several bioactive phytochemicals and nutrients with putatively health-beneficial effects (Henríquez et al. 2013). Edible films made from AP polyphenols can also be used to protect food from pathogenic bacteria.

Mango Peel
Mango (Mangifera indica L.) is native to India and is widely cultivated across the globe due to its sweet taste, flavour, aroma and high nutritive content. Mango peel (MP) accounts for about 35-55% of the total weight of the fruit. MP is a good source of phytochemicals such as pectin, hemicelluloses, cellulose, carotenoids, proteins, lipids, polyphenols and vitamins C and E. The content of some constituents of MP (cellulose, reducing and non-reducing sugars and proteins) depends on the variety. The polyphenolic content of MP is also higher than that of the pulp (Imran et al. 2013; Ajila et al. 2007). MP contains phenolic compounds such as syringic acid, ellagic acid, quercetin and mangiferin pentoside (Ajila et al. 2010).
MP fibre has a high hydration capacity, which makes it useful in making dietary fibre-rich foods. MP dietary fibre has a higher antioxidant capacity than DL-α-tocopherol and French paradox (Koubala et al. 2012; Larrauri et al. 1997).
MP contains a substantial amount of mangiferin (C-glucosylxanthone), a heat-stable and pharmacologically active phytochemical. Its presence gives MP several bioactivities, such as antioxidant, anti-inflammatory, antidiabetic, antitumour and immunomodulatory activity (Luo et al. 2012). MP exhibits antioxidant properties (Berardini et al. 2005) as well as antibacterial and anti-inflammatory activity (Zgórka and Kawka 2001). β-Carotenoids available in MP have been shown to have high vitamin A activity and antioxidative capacity (Mercadante and Rodriguez-Amaya 1998). The crude ethanol extracts and ethyl acetate fraction of MP show good antifungal properties against pathogenic fungi (Rhizoctonia cerealis van der Hoeven and Rhizoctonia solani Kühn) (Qin et al. 2007). The solubility and the water and oil absorption values of MP powder play a substantial role in its application in food-based products (Sogi et al. 2013). Polyphenols, carotenoids and vitamins (C and E) in MP show considerable health-promoting activity, so there is huge potential for the development of MP-based functional foods. MP flour can be added as a healthy ingredient to food products such as bread, biscuits, sponge cakes, noodles and other bakery formulations (Aziz et al. 2012).
Thus, MP exhibits good antioxidant, anti-inflammatory, antidiabetic, antitumour, immunomodulatory, antibacterial and antifungal properties, mainly due to the presence of polyphenols and carotenoids.

Pineapple Peel
Pineapples (Ananas comosus) are harvested throughout the year. Only 52% of the total weight of the fruit is used for consumption; the pineapple peel (PAP; 35% of dry weight) and the leaves (13% of dry weight) are rejected as waste. However, PAP and leaves are a rich source of valuable bioactive compounds (Krishni et al. 2014; Bardiya et al. 1996; Foo et al. 2011).
The most important and valuable compound from PAP is bromelain. It has antithrombotic, fibrinolytic, antiedematous, anti-inflammatory and anticancer properties (Ketnawa et al. 2011; Chobotava et al. 2009; Bhui et al. 2009). In the food industry, it is also used as a meat tenderizer and dietary supplement (Maurer 2001). Due to the presence of phenolic antioxidants (2.01 mmol FRAP/100 g wet weight), PAP also exhibits antimicrobial and antioxidant activities (Guo et al. 2003).

Orange Peel
Orange (Citrus sinensis) is cultivated worldwide and includes a wide range of varieties. Orange peel (OP) makes up a large share of the fruit, about 40-50%, and is discarded as waste (Knappa and Nicholasa 1969). OP is mainly composed of cellulose, hemicellulose, chlorophyll, pectin, lignin, pigments and other low-molecular-weight hydrocarbons (Bhatnagar et al. 2015). Traditionally, orange waste has been used to improve lactation and microbial growth in ruminants, making it a good feed for ruminants reared for high milk yield and weight gain (Bampidis and Robinson 2006). OP also contains an essential oil that is mostly used as a flavouring agent in the food industry. d-Limonene is the primary biochemical in the oil obtained from OP (about 90%) and is used as a flavouring agent in the manufacture of food and medicines (Braddock et al. 1986; Hull et al. 1953).
In addition, OP exhibits anti-carcinogenic, germicidal and antioxidant properties, which are utilized for treating stomach upset, skin inflammation, colon and breast cancers, ringworm infections and muscle pain (Hameed 2012).

Papaya Peel
Papaya (Carica papaya L.) is a fruit cultivated throughout the year in tropical and subtropical countries. The papaya fruit contains about 12% peel (PaP) and 8.5% seeds by weight. Conventionally, PaP is used in cosmetics, animal feeds and many home remedies (Pathak et al. 2019). The vitamin content of the peel varies with the maturity of the fruit. The vitamin A content of the peel increases with the maturity level. Similarly, the vitamin C content is considerably high in very ripe and hard-ripe fruit (Chukwuka et al. 2013). Generally, the fat, carbohydrate and protein contents of PaP decrease upon ripening (Kumara and Wijetunga 2010). PaP is a source of valuable mineral nutrients, including trace minerals such as Fe, La, Na, Rb, Sc, Br, Zn, Cr and Cs, which are important for the human body. The concentration of some trace elements, especially Br, may sometimes be increased by pesticide use (de Matuoka e Chiocchetti et al. 2013). Studies indicate that unripe PaP has a high nutritive value, due to which it is recommended for consumption (Chukwuka et al. 2013).
The presence of bioactive molecules such as vitamins, minerals, dietary fibre and phenolic compounds makes PaP beneficial against pathological and physiological defects such as inflammation, cancer, aging and cardiovascular diseases, and it shows antithrombotic, antioxidant and anti-inflammatory activities (Contreras-Calderón et al. 2011; Parni and Verma 2014; Morais et al. 2015). PaP contains flavonoids and polyphenols, which make it effective against different oxidative stress insults. These compounds also exhibited anti-carcinogenic and anti-inflammatory activity against AOM-induced cytotoxicity in rat colon (Waly et al. 2014). PaP shows antibacterial activity against Gram-positive and Gram-negative microorganisms (Asghar et al. 2016; Orhue and Momoh 2013; Prakash et al. 2013; Khan et al. 2012; Rakholiya et al. 2014; Roy and Lingampeta 2014).
Traditionally, PaP is used in cosmetics and in many homemade therapies. The vitamin A present helps in restoring and rebuilding damaged skin and can be used as a skin-lightening agent. A mixture of honey and PaP acts as a skin moisturizer. Vinegar, PaP oil and essential oils (orange, rosemary and lavender) in bath water can be relaxing, nourishing and refreshing and can act as a muscle relaxant and pain reliever (Yogiraj et al. 2014; Aravind et al. 2013).

Potato Peel
Potato (Solanum tuberosum cv. Toyoshiro) is one of the most commonly grown vegetable all over the world and contains about 6-10 of peel (POP) from total potato weight. POP possess antioxidative, apoptotic, chemopreventive, anti-inflammatory and antibacterial properties due to presence of bioactive compounds like polyphenols, phenolic acids, lipids, pigments, lignin, dietary fibres, fatty acids, minerals, vitamins, etc. (Wu 2016;Amado et al. 2014;Liang and McDonald 2014;Jeddou et al. 2016;Sánchez Maldonado et al. 2014;Onyeneho and Hettiarachchy 1993). The lipid fraction present in the POP contains sterols, alcohols, triglycerides, sterol esters, phenolics and long-chain fatty acids (Liang and McDonald 2014). In addition, POP extracts have different phenolic acids like caffeic acid, chlorogenic acid, gallic acid, protocatechuic acid, p-hydroxybenzoic acid, p-coumaric acid and vanillic acid (Onyeneho and Hettiarachchy 1993). Due to the presence of these valuable compounds, POP can be used in various medical applications.
Due to the presence of starch, POP can be used as a burn-healing agent. POP bandages can be prepared and used to treat burns by applying the inner surface of the POP to the wound site. POP bandages offer several advantages, such as rapid epithelial regeneration, quicker return of skin texture and colour, and more comfortable and less painful bandage removal; in addition, the peels do not shrink during the application period (Keswani and Patil 1985).
POP shows defence against insect attack and possesses anti-inflammatory, antifungal and antibacterial activities due to the presence of steroidal alkaloids (Hossain et al. 2014). POP also has antioxidant properties. Extracts of POP can be successfully used in the treatment of CCl4-induced liver injury and also show a protective effect against diabetes and oxidative stress in rats (Singh et al. 2005). Thus, POP can be used as an important component of functional foods to obtain its maximum benefits.

Tomato Peel
Tomato (Lycopersicon esculentum) is the second largest warm-season fruit vegetable cultivated across the globe (Roja et al. 2017; Savatović et al. 2012). Tomato consists of about 27% peel (TP). TP is rich in lycopene, of which the water-insoluble fraction is about 72-92% (Kaur et al. 2008). TP is also rich in flavonol glycosides such as kaempferol and quercetin (Savatović et al. 2012; Chérif et al. 2010; Noura et al. 2018). These compounds make it worthwhile to incorporate TP into the daily diet.

Limitations of Utilization
These compounds can add value to the FVP if extracted efficiently. Several economically valuable products with superior medicinal, nutritional and antioxidant properties can be obtained from FVPs by processes such as drying, size reduction, fermentation and solvent extraction. The extraction of medicinally important compounds depends mainly on the correct selection of the process and on the efficiency of the selected process. If both are appropriate, the maximum benefits can be obtained. In addition, the chemical constituents of FVP depend on several parameters, such as season, location, application of fertilizers, availability of irrigation, varieties/cultivars, stage of maturation, pre- and/or post-harvest conditions, storage and transportation.

Conclusion
These days, more research is focused on the use of natural products as primary health medicines due to their pharmacological properties. In this regard, a wide range of fruit and vegetable sources are being studied for their health benefits. These have been identified as the cheapest sources of bioactive compounds and are popular because the resulting medications cost less than orthodox medicines. FVP has antioxidant, antiatherogenic, antimicrobial, antiallergenic, anti-inflammatory, antithrombotic, cardioprotective and vasodilatory properties. Thus, it can be useful in the treatment of wounds, acne, diarrhoea, gastroenteritis and rotavirus enteritis, allergies, malaria, coughs, degenerative muscular diseases, bacterial/fungal infections, cancer, cardiovascular disorders, diabetes, liver diseases, dental plaque, inflammatory ailments including rheumatism, menstrual pain and many more. This will help to obtain the maximum health benefits and maximize industrial profits. Thus, instead of throwing FVP into garbage bins, extracting phytochemicals from them makes them a good alternative to synthetic drugs. This will be beneficial to human health.

Introduction
Nanotechnology is an interdisciplinary science that draws on knowledge from physics, chemistry, biology, medicine and cosmetics. Nano-sized particles (NP) have physicochemical properties that differ from those of their bulk counterparts (Lindsay 2009). These properties make them useful in a wide variety of applications. Nanomaterials are synthesized using various methods, such as physico-chemical methods, which involve pyrolysis, vapour condensation, the sol-gel technique and chemical reduction. Biological synthesis of NP involves the use of plant extracts, fungi, bacteria and algae.
The biological synthesis of NP is believed to be more eco-friendly, as it involves much less use of chemicals, and is also more cost-effective than other methods (Mohite et al. 2015). Nano systems or NPs are of different types, such as carbon nanotubes, liposomes, dendrimers, metallic NP, nanocrystal quantum dots, polymeric micelles and polymeric NP (Lindsay 2009). These different types of NP have a wide variety of applications in medicine and industry, such as therapeutics, drug delivery, detection of different diseases, biosensors and cosmetics. Various studies have been carried out based on their proposed applications in medicine/drug delivery (Patil et al. 2019). Apart from their considerable potential benefits, the available literature suggests an NP-mediated potential risk of epigenetic toxicity, cell toxicity, genotoxicity and immunotoxicity (Fig. 7.1).
Environmental changes mediate changes at the molecular level, which is why identical twins can display notable phenotypic differences. Epigenetics means "above genetics", that is, a change in the phenotype without a change in the genotype, and these changes are heritable (Collins et al. 2003). The term "epigenetics" was coined by Conrad H. Waddington in the 1940s. Epigenetics provides an explanation of how a single genotype can result in multiple phenotypes (Ho and Tollefsbol 2014). These modifications are highly influenced by lifestyle and environmental factors such as pollutants, food and drugs. Various factors are crucial during the early developmental stages and modify an individual's risk of developing disease (Horowitz 2015). Epigenetic modifications involve DNA methylation, histone modifications and non-coding RNA-mediated modifications.
DNA methyltransferases (DNMT) catalyse the transfer of a methyl group from S-adenosyl methionine (SAM) to the fifth carbon of cytosine. Other modified bases are also present in DNA, namely N6-methyladenine (m6A), N4-methylcytosine (m4C) (Hattman 2005; Ratel et al. 2006) and 7-methylguanine (Achwal et al. 1983). Methylated bases can also be restored through demethylation, carried out by a group of enzymes known as ten-eleven translocation (TET) enzymes (Oswald et al. 2000; Zhang et al. 2007). Since DNA methylation and histone modification play an important role in the cell cycle and tumour growth, the consequences of epigenetic changes with respect to nanotoxicity need to be explored in more detail.
Thus, to understand the molecular mechanism of nanotoxicity, it is necessary to examine finer cellular changes at the level of the epigenome that lead to biological consequences. This analysis may provide an additional filter to complement common toxicological assays in defining NP-mediated effects. In this review, the epigenetic studies that have been carried out on NP exposure of different cells are summarized. Accumulating DNA damage, mutations and DNA abrasions eventually increase the risk of cancer. NP-induced cytotoxicity, DNA damage, protein expression modulation and oxidative stress leading to cell death have previously been reported at the genomic level but have not yet been extensively probed with respect to epigenetic modulation.

NP-Mediated Epigenetic Interactions
Inhalation of NP has been shown to cause pulmonary toxicity, lung inflammation, fibrosis and lung tumours in several laboratory rodent species (Bermudez et al. 2004; Samberg et al. 2010). In a press release on April 17, 2007, the American Association for Cancer Research (AACR) reported research presented at its 2007 annual meeting suggesting that NP could cause cancer (thyroid, breast, cervical, prostate, stomach, lung, bladder, oesophagus, colorectal and liver) and should be thoroughly investigated and used with caution. Alteration of epigenetic regulators due to disruption of the epigenome can be strongly related to the initiation and progression of some cancers. The epigenetic profile can be propagated through several cell generations; thus epigenetic modulation may persist even when later cell generations no longer come into contact with the NPs, and if ignored, it could lead to adverse effects. More recently, evidence of NP-mediated epigenetic variation has been building up (Fig. 7.2). Some of these epigenetic changes are described in the following sections.

AuNP
Gold NPs (AuNP) are the most popular NPs, employed in numerous applications such as sensors, imaging, targeted drug delivery and diagnostics. From the perspective of toxicity, however, gold NPs are known to mediate alterations in epigenetic signatures as well as in miRNA, causing modulation of gene expression. This evidence has been collated from in vitro nanotoxicity studies. Exposure of human foetal fibroblast cells (MRC5) to AuNP resulted in altered expression of microRNA-155 and in global chromatin condensation and reorganization (Ng et al. 2011). HeLa cells undergoing radiation therapy using 50 nm AuNP showed amplified accumulation of DNA abrasions with increased γ-H2AX expression (Berbeco et al. 2012). Exposure of HeLa cells to AuNP along with 2-mercapto-1-methylimidazole led to global dimethylation of histone H3 proteins (Polverino et al. 2014). Treatment of hepatocarcinoma cells (HepG2) with quercetin-coated AuNP led to cytotoxicity along with repression of histone deacetylases (Bishayee et al. 2015). Similarly, exposure of human embryonic stem cells (hESC) to folate-coated AuNP resulted in hypomethylation with an increase in global DNA hydroxymethylation (Senut et al. 2016).

AgNP
In the case of silver NP (AgNP), toxicity is induced by oxidative stress and correlates well with cytotoxicity and genotoxicity, increased ROS, DNA damage, apoptosis and necrosis (Arora et al. 2008; Foldbjerg et al. 2009). The epigenetic toxicity caused by AgNP is well documented, including modification of miRNA expression in human Jurkat T cells (Eom et al. 2014), global methylation of histone protein H3 in mouse erythroleukemia (MEL) cells (Qian et al. 2015), and increased expression of DNMT and DNA hypermethylation in mouse hippocampal neuronal cells (Mytych et al. 2017). NIH3T3 cells exposed to AgNPs showed DNA hypermethylation through the p53 and p21 pathway, together with altered cellular responses and changes in bulk histone gene expression (Gurunathan et al. 2018). A study on triple-negative breast cancer cells, in contrast, showed charge-dependent toxicity and cell death irrespective of particle size; the toxicity was associated with alterations in oxidative stress, Wnt signalling and histone H3 phosphorylation at Serine 10 and Lysine (9/14) residues (Surapaneni et al. 2018).

TiO2 NP
The study conducted by Xiangliang Yang (2009) on mice showed that titanium dioxide (TiO2) NP triggered single- and double-stranded DNA breaks, causing chromosomal damage as well as inflammation. In clinical nanomedicine, nanoparticles serve as "intelligent" vehicles for drug delivery or as local heaters in cancer therapy. Human lung adenocarcinoma epithelial cells (A549) exposed to TiO2 NPs showed an increase in the expression of γ-H2AX protein, a potential DNA damage marker (Toyooka et al. 2012). Another report on TiO2-treated A549 cells described hypermethylation of the PARP-1 promoter (Bai et al. 2015). TiO2-treated human lung fibroblast (MRC5) cells showed global DNA hypomethylation and changes in DNMT expression levels (Patil et al. 2016).

SiO2 NP
Nano-silicon dioxide (nano-SiO2) has been widely used in a number of fields including plastics, rubber, ceramics, coatings and adhesives. Previous studies have demonstrated that SiO2 NP can induce pulmonary inflammation, myocardial ischemic damage, atrioventricular blockage and increases in fibrinogen concentration and blood viscosity. Gong et al. (2010) found that nano-SiO2 could induce cytotoxicity and protein alterations in the human epidermal keratinocyte cell line HaCaT, as well as epigenetic changes: exposure caused hypermethylation of the PARP-1 promoter, eventually decreasing its expression (Gong et al. 2010).

Carbon NP and Derivatives
Pacheco et al. (2007), researchers at the University of Massachusetts, reported dose-dependent and time-dependent increases in DNA damage in breast cancer cells exposed to aqueous C60 fullerenes. A549 cells exposed to carbon NP respond differently, with hypermethylation of the global genomic DNA (Li et al. 2016).

Quantum Dots
Likewise, Angela and others reported quantum dot (QD)-induced genotoxic and epigenomic changes leading to cell death. In human breast adenocarcinoma cells (MCF-7), QDs induce histone protein hypoacetylation (Choi et al. 2008). Global changes in miRNA expression have been reported after exposure of NIH/3T3 cells to QDs (Li et al. 2011).

Cadmium (Cd) NP
Heavy metal NPs such as cadmium (Cd) NP are known to cause epigenetic toxicity and are potential carcinogens. Exposure of TRL1215 cells (rat liver cells) to Cd NP resulted in DNA hypomethylation and inhibition of DNA methyltransferase and was also observed to cause changes in the rate of cell division and in normal cell morphology (Takiguchi et al. 2003).

Copper Oxide NP
N2A (mouse neuroblastoma) cells treated with CuO NP (30-40 nm in size) displayed cytotoxic and genotoxic effects without any change in global DNA methylation levels (Perreault et al. 2012). Rats exposed to copper oxide NP showed a pro-inflammatory response, seen through the differential expression of genes. Although there was no change in the methylation of inflammation-responsive genes, hypermethylation was observed in the Fas-associated death domain gene alone (Costa et al. 2018).

Conclusion
This review suggests that NPs mediate toxicity by affecting gene regulation through modification of microRNAs or epigenetic signatures. The literature survey of nanotoxicity studies shows that epigenetic variation, DNA methylation and histone modulators can serve as early biomarkers of nanotoxicity, as summarized in Table 7.1. NPs are widely used in various products including cosmetics, hair tonics, conductive ink and lubricant oil (Sanderson 2007). This increased use of NPs in industrial products results in frequent exposure through ingestion, inhalation and dermal contact. In this scenario, it is important to analyse the impact of these NPs on the human body upon exposure. The summarized effects of various NPs on different cell lines (Table 7.1) strongly indicate that, although NPs have various applications including drug delivery, they modulate epigenetic profiles.

Abstract
Protein misfolding has interestingly been referred to as the 'dark side' of the protein world. The cytotoxicity of misfolded and unfolded polypeptides is due to an overwhelmed quality control system, mainly comprising molecular chaperones to assist in folding, the unfolded protein response (UPR) in the endoplasmic reticulum and the heat shock response (HSR) in the cytosol, which are aimed at clearing misfolded proteins and their early aggregates. When misfolded/unfolded polypeptides exceed the quality control measures of the chaperone-ubiquitin-proteasome clearing system, they form toxic pre-fibrillar aggregates which interact with the cell membrane, disrupting redox potential due to aggregate organization into non-specific membrane pores. In most cases, increases in intracellular free Ca2+ and consequent disruption of the redox potential are among the earliest biochemical alterations in exposed cells. An improved understanding of the mechanisms of protein misfolding and intermediate structures that lead from monomers to oligomers ready to aggregate could provide crucial impetus to therapeutic interventions such as upregulating molecular chaperone machinery, use of antibodies and high throughput screening of promising candidate molecules.

Protein Folding Problem in Aggregation
Protein folding chiefly involves the interaction of a relatively small number of residues to form a folding nucleus, about which the remainder of the structure rapidly assembles (Kmiecik and Kolinski 2007). The resulting conformations, attained through a stochastic search of the many conformations accessible to a polypeptide chain, usually exhibit the highest possible thermodynamic stability under physiological conditions. The rudimentary native-like topology is also a result of the distribution of hydrophobic and polar interactions between key residues. While the mechanism by which the sequence encodes these characteristics is unclear, it is accepted that the sequence favours preferential interactions of specific residues as the structure becomes increasingly compact, and the final topology is said to be achieved during the final stages of folding. In the absence of these pivotal interactions, however, the protein cannot fold to a stable globular structure. This process can therefore also be considered a method of structural quality control. The secondary structures in turn, stabilized chiefly by hydrogen bonds between the amide and carbonyl groups of the main chain, are an important step in the later stages of the protein folding process (Eberhardt and Raines 1994). The time taken to complete the folding process increases with the complexity of the molecule and with the average separation in the sequence between residues that are in contact with each other in the native structure (contact order) (Stefani 2008). One or more intermediates have been observed in proteins with more than 100 residues, suggesting that larger proteins generally fold in independent modules/domains, establishing native-like folds within local regions, with optimum interactions to establish the overall structure.
The final native structure is said to be established once all the native-like interactions have been formed both within and in between domains: ensured by the unique locking in of all side chains in a closely packed arrangement. This step is marked by the exudation of water from the protein core (which mainly comprises hydrophobic residues) (Dill et al. 2008).
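The contact order mentioned above can be illustrated with a short sketch. It assumes the commonly used "relative contact order" normalisation (average sequence separation between contacting residues, divided by chain length), which is not defined in this chapter; the function name and the two contact lists are hypothetical and serve only to contrast a mostly local topology with a mostly non-local one.

```python
def relative_contact_order(contacts, chain_length):
    """Average sequence separation of native contacts, normalised by chain length.

    contacts: iterable of (i, j) residue-index pairs that are in contact
              in the native structure (assumed to be supplied by the user).
    chain_length: total number of residues in the chain.
    """
    contacts = list(contacts)
    if not contacts:
        raise ValueError("at least one contact is required")
    if chain_length <= 0:
        raise ValueError("chain_length must be positive")
    total_separation = sum(abs(j - i) for i, j in contacts)
    return total_separation / (len(contacts) * chain_length)


if __name__ == "__main__":
    # Hypothetical 10-residue chains: one with local contacts only,
    # one with contacts between residues far apart in the sequence.
    local_contacts = [(1, 4), (2, 5), (3, 6)]
    nonlocal_contacts = [(1, 10), (2, 9), (3, 8)]
    print("local topology:    ", relative_contact_order(local_contacts, 10))
    print("non-local topology:", relative_contact_order(nonlocal_contacts, 10))
```

A higher value means that, on average, contacting residues lie further apart in the sequence, which is the situation the text associates with slower folding.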
Understanding the molecular bases of misfolding may not only help to elucidate the physicochemical features of protein folding but is also a fundamental prerequisite for understanding and controlling disorders that are linked to protein aggregation (Fig. 8.1), such as Alzheimer's and Parkinson's diseases, type 2 diabetes, cystic fibrosis and some forms of emphysema among other disorders, where the presence of proteinaceous deposits (amyloidoses) is believed to result in clinical symptoms.

Misfolding Leads to Aggregation
In 1998, it was first shown that all proteins exhibit a propensity for aggregation when partially unfolded, demonstrating that protein aggregation is not a unique property of particular amino acid sequences: even normally folded proteins could, under destabilizing conditions (Stefani 2004) such as acidic pH values, high temperature, lack of ligands or moderate concentrations of salts or co-solvents, where tertiary interactions are destabilized while secondary interactions remain intact (Martin et al. 2008), unfold and aggregate in vitro into assemblies indistinguishable from those formed in vivo. Protein aggregation is said to arise primarily from either unfolded or native states: inclusion bodies and other aggregates formed during protein folding are assumed to be products of hydrophobic aggregation of the unfolded or denatured states, while amyloid fibrils and other extracellular aggregates arise from native-like conformations (Fink 1998). The characteristics and properties of the intermediates may be significantly different from those of the native (and unfolded) conformation. In the presence of destabilizing conditions, the equilibrium shifts to favour the population of partly folded molecules. These molecules may be refolded by molecular chaperones or upon re-establishment of normal physiological conditions, or cleared by the ubiquitin-proteasome machinery. The nucleation of disordered aggregates results from misfolded molecules overwhelming this restoration machinery. The equilibrium may also shift towards the population of ordered aggregates due to mutations that increase the mean hydrophobicity or reduce the net charge of the misfolded/unfolded molecules. Here, the misfolded protein begins to resemble a molten globule-like structure: secondary interactions are largely maintained, and inward-facing hydrophobic residues become solvent exposed.
This reduced physicochemical stability in unfolded monomers leads to the formation of oligomeric assemblies, as seen on the path to fibrillization, and eventually to stable mature fibrils. The appearance of pre-fibrillar aggregates, however, can be suppressed by molecular chaperones, which favour the population of correct native forms, mark misfolded proteins for degradation or detach monomers, favouring their clearance, and so result in the clearance of amyloid assemblies (Stefani 2004). It follows that proteins have evolved to select against sequences with a high propensity for aggregation (e.g. several hydrophobic residues and a high tendency for β-sheet formation). Factors such as steric hindrance of interactions favouring aggregation and highly polar flanking sequences (resulting in higher solubility limits) usually hinder amyloid formation.

Cellular Mediators of Appropriate Protein Misfolding: Chaperones
Amino acid sequences such as alternating polar and hydrophobic sequences favouring β-sheet structure have experienced a selective disadvantage during evolution due to their high propensity for aggregation, despite the common mechanisms mediating both aggregation and appropriate protein folding (Alberts et al. 2002). This process, referred to as kinetic partitioning, suggests that mutations could have been selected on the basis of facilitating folding at the expense of aggregation. Kinetic partitioning may have been aided by the presence of molecular chaperones and degradation-clearance mechanisms and by the conformational states of different polypeptides under varying physiological conditions and stages. Consistent with this, most mutations associated with familial forms of deposition diseases lower the stability and function of the native protein and increase the population of partially unfolded states, resulting in a high propensity for aggregation (Chiti et al. 2002; Sánchez et al. 2011). Most newly synthesized proteins are initially translocated to the endoplasmic reticulum (ER) where, guided by a series of molecular chaperones and folding catalysts, they fold into their designated three-dimensional conformations. Appropriately folded proteins are then translocated to the Golgi complex and delivered into the extracellular environment, while improperly folded proteins are ubiquitinated and degraded in the cytoplasm by proteasomes.

Molecular Chaperones and Other Folding Catalysts
Molecular chaperones increase the efficiency of the overall process by reducing the probability of competing reactions such as aggregation. The pivotal role of molecular chaperones is demonstrated by their upregulation under cellular stress, which acts as a destabilizing condition that produces partially folded proteins. Molecular chaperones not only protect proteins as they fold but also rescue misfolded and even aggregated proteins and give them a second chance to fold correctly (Fig. 8.1). Additionally, the slow steps in the folding process are accelerated by several classes of folding catalysts (Bukau and Horwich 1998; Hartl 2002). Peptidyl-prolyl isomerases increase the rate of cis-trans isomerization of peptide bonds involving proline residues (Shaw 2002). Protein disulphide isomerases enhance the rate of formation and reorganization of disulphide bonds (Ellgaard and Ruddock 2005). Active intervention by molecular chaperones is an ATP-dependent process and occurs mainly in the ER, where folding takes place before protein release from the Golgi apparatus. Within the ER, folding is mediated by a wide range of molecular chaperones and folding catalysts which ensure that all folded proteins satisfy quality checks prior to export (Hammond and Helenius 1995; Kaufman et al. 2002). These quality control mechanisms mainly involve a series of glycosylation and de-glycosylation reactions that help to distinguish correctly folded proteins from misfolded ones (Hammond and Helenius 1995). Conversely, the quality control mechanisms tend to reduce the overall efficiency of protein folding. For instance, the aggressive clearance mechanisms in the ER clear a significant percentage of proteins before they attain their optimal conformation.
Similarly, clearance processes may offset molecular machinery that might otherwise favour protein aggregation; these processes involve ER membrane carriers performing reverse transport of proteins unable to fold in the ER lumen, the ATP-dependent proteolytic complexes in the mitochondria and the components of the ubiquitin-proteasome pathway (Stefani 2004; Braakman and Hebert 2013; Araki and Nagata 2011). Additionally, mutations inactivating any component of the quality control and clearance systems, or destabilizing environmental conditions such as oxidative stress, heat shock or other chemical modification, may impair the clearance machinery and directly result in a rapidly growing number of misfolded proteins in the cell. There are instances of inhibition of the ubiquitin-proteasome system by two unrelated aggregation-prone proteins (a huntingtin fragment with a polyglutamine repeat and a folding mutant of the cystic fibrosis transmembrane conductance regulator). This may be another potential mechanism linking protein aggregation to cellular dysregulation and death (Bence 2001). Working in close association with intracellular quality controls, extracellular controls comprising proteases such as neprilysin and IDE are present at the cell membrane and in the extracellular spaces. These proteases have been shown to digest Aβ and other aggregate precursors not only in their monomeric form but also as aggregates (Ling et al. 2003; Edland 2004; Kanemitsu et al. 2003). Similarly, clusterin was shown to affect amyloid formation in vitro (Wilson and Easterbrook-Smith 2000).
The common structural features of protein aggregates (viz. amyloid aggregates) during both early (protofibril) and late (mature fibril) stages point towards common early biochemical modifications in cells. These modifications may be a response to the presence of toxic aggregates and may eventually lead to the impairment of quality control and clearance machinery. Several studies have reported early changes in available Ca2+ and ROS in cells exposed to toxic aggregates or producing aggregating molecules. Additionally, annular-shaped assemblies with a central pore are characteristic of the heterogeneous population of pre-fibrillar aggregates of several different proteins (Lin 2001; Zhu et al. 2000; Kourie 2001; Butterfield et al. 2001; Milhavet and Lehmann 2002; Hyun et al. 2002). Aggregation is in many ways the final state of misfolded/unfolded peptides and proteins, often characterized as intrinsically disordered proteins/peptides (IDPs). IDPs represent categories of states in which side chain and backbone positions deviate significantly from their equilibrium positions, in sharp contrast to polymerization processes that are initiated with structured monomers and driven mainly by nucleotide binding or hydrolysis. The final product of interactions between ensembles of unstructured monomeric states is the fibril, generally believed to be an ordered β-sheet structure (Frieden 2007; Chiti et al. 2003; Rousseau et al. 2006).

Future Directions
Aggregation may be initiated by any factor causing a rise in the concentration of amyloidogenic precursor(s). Such factors include a shift of the equilibrium in favour of partially folded molecules (due to mutations, environmental changes or chemical modifications reducing the conformational stability of the protein) or an increase in the expression level of the affected protein, and hence of its whole equilibrium population, including partially folded molecules. Certain mutations may enhance aggregation simply by kinetically favouring the assembly of the unfolded or partly folded monomers into the early oligomeric pre-fibrillar species: physicochemical features such as mean hydrophobicity, net charge and propensity for alpha and beta structure formation affect the tendency of an unfolded or partially folded polypeptide chain to aggregate. This may explain why peptides and natively unfolded proteins such as α-synuclein and tau, when carrying specific mutations that enhance their mean hydrophobicity or reduce their mean net charge, exhibit a higher propensity for aggregation. A natively folded protein may also misfold and aggregate, provided it meets a suitable template favouring a specific conformational modification.
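As a rough illustration of two of the physicochemical features named above, the sketch below computes the mean hydrophobicity and net charge of a peptide sequence. Several choices here are assumptions of the example rather than methods described in this chapter: the Kyte-Doolittle hydropathy scale is used as one common hydrophobicity scale, the net charge counts only Lys/Arg as +1 and Asp/Glu as -1 (ignoring histidine, the termini and pH effects), and both sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values (one common scale; assumed for this sketch).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence):
    """Mean Kyte-Doolittle hydropathy per residue of a one-letter sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def net_charge(sequence):
    """Simplified net charge: (+1 per K/R) minus (1 per D/E)."""
    seq = sequence.upper()
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return positive - negative


if __name__ == "__main__":
    # Hypothetical peptide and a Glu -> Val variant at position 4.
    wild_type = "MKDEELLKEA"
    variant = "MKDVELLKEA"
    for name, seq in (("wild type", wild_type), ("variant", variant)):
        print(name, round(mean_hydropathy(seq), 2), net_charge(seq))
```

In this toy comparison the substitution both raises the mean hydropathy and reduces the magnitude of the net charge, the two changes the text links to a higher propensity for aggregation; real aggregation predictors combine these with further terms such as secondary structure propensity.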

Abstract
Enzymes are natural protein catalysts that carry out the specialized conversion of substrate to product in chemical reactions. Enzyme technology uses enzymes as biocatalysts to manufacture new products in bulk in highly dynamic fields such as food, fine chemicals, pharmaceuticals, biofuels, and biopolymers. The most common types of industrial enzymes are proteases, amylases, lipases, cellulases, and xylanases. Enzymes are now increasingly being used in medical applications such as therapeutics, drug delivery, diagnostics, new drug development, bioanalysis, and biosensors. Examples of biomedical enzymes are cytochrome oxidase, creatine kinase, streptokinase, urokinase, trypsin, chymotrypsin, and serratiopeptidase. New advancements in "white biotechnology," mainly in protein engineering, have offered important techniques for the effective development of new enzymes using directed evolution. The present paper aims to provide a review of industrial enzymes, emphasizing recent advances in enzyme engineering and applications in the medical field.

Keywords
Enzyme technology · Industrial enzymes · Biomedical enzymes · Directed evolution · White biotechnology

Introduction
Bioprocess technology has great potential for increasing the production of various products for human needs. Enzyme technology, a field of bioprocess technology, helps to develop new processes to manufacture new products in bulk by utilizing enzymes as biocatalysts, in order to fulfill the ever-increasing demand in sectors such as food, fine chemicals, pharmaceuticals, and, more recently, therapeutics (Lokko et al. 2018; Bhatia n.d.). The three-dimensional structure of enzymes gives them high specificity for particular types of substrates and plays an important role in metabolic and biochemical reactions. Enzymes are categorized into specific classes according to the types of substrates they act on. The most common types of industrial enzymes are cellulases, proteases, mannanases, amylases, pectinases, lipases, etc. (Singh et al. 2016; Vittaladevaram 2017). Enzymes are considered a greener substitute for the use of chemicals in industry (Ji et al. 2018; Kaur and Sekhon 2012). The "green" status of enzymes is due to the following properties:
1. Most enzymes work under moderate conditions, thus reducing the energy consumption otherwise needed by many chemically catalyzed reactions and hence lowering greenhouse gas emissions.
2. During the manufacturing of enzymes, the by-products generated are not toxic, and water consumption and chemical waste production are lower.
3. Reuse and inactivation of enzymes are both economically and environmentally feasible.
Thus enzymes are bioeconomic, biosafe, and biodegradable tools that have become an integral part of our everyday products.
As listed in Table 9.1, enzymes can be obtained from animal, plant, and microbial sources, the latter including fungi, yeast, and bacteria.

Biomedical Application of Enzymes
Biomedical applications of enzymes have been in demand since the mid-1950s. In the 1960s, Christian de Duve introduced enzymes as replacement therapy for lysosomal storage diseases (LSDs), a group of genetic disorders (Desnick et al. 2019). About three decades later, the first recombinant enzyme drug, the clot buster Activase®, was approved by the Food and Drug Administration (FDA) and marketed. Since then, several enzymes have been introduced into the marketplace as diagnostic, therapeutic, supplementary, manipulative, and analytical enzymes, among others.

Therapeutic Enzymes
Owing to recent developments in the bulk production of pure enzymes, their downstream processing, and their target specificity, enzymes have found a place in pharmaceuticals and medicine. In contrast to industrial applications, therapeutic use favors enzymes that are highly pure and have a low Km and a high Vmax. Enzymes are rapidly gaining importance as therapeutic agents for the treatment of several human diseases (Mane and Tale 2015; Mohanty and Khasa 2019; Kunamneni et al. 2018), and thus an array of enzymes with excellent therapeutic potential is commercially produced. Among the most recent is the anti-HIV enzyme Tre recombinase, which efficiently excises proviral DNA from the host genome (Hauber et al. 2013). A few examples of therapeutic enzymes of bacterial origin are listed in Table 9.2.
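The preference for a low Km and a high Vmax can be illustrated with the Michaelis-Menten rate law, v = Vmax[S]/(Km + [S]). The sketch below assumes that the enzyme follows simple Michaelis-Menten kinetics, which the text does not state explicitly; the function name and all numerical values are hypothetical and chosen only for illustration.

```python
def michaelis_menten_rate(substrate, vmax, km):
    """Initial rate v = Vmax * [S] / (Km + [S]) for one substrate concentration."""
    if substrate < 0 or vmax < 0 or km <= 0:
        raise ValueError("substrate and vmax must be >= 0, km must be > 0")
    return vmax * substrate / (km + substrate)


# Two hypothetical enzymes with the same Vmax but different Km values,
# compared at the same low substrate concentration (arbitrary units).
substrate = 0.05
rate_low_km = michaelis_menten_rate(substrate, vmax=100.0, km=0.01)
rate_high_km = michaelis_menten_rate(substrate, vmax=100.0, km=1.0)

print(f"low-Km enzyme:  {rate_low_km:.1f}")   # 83.3
print(f"high-Km enzyme: {rate_high_km:.1f}")  # 4.8
```

In this example the substrate concentration lies well below the higher Km, so the low-Km enzyme already runs close to its maximum rate while the high-Km enzyme reaches only a small fraction of it.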

Fibrinolytic Enzymes
This class of therapeutic enzymes includes thrombolytic agents. A thrombus, or fibrin clot, that develops in a blood vessel obstructs blood flow to a tissue or organ, leading to myocardial infarction and other serious diseases. Fibrinolytic enzymes clear thrombi by converting plasminogen to plasmin, which degrades fibrin; this leads to thrombolysis, preventing coagulation of blood and/or dissolving an existing thrombus. Several enzymes with efficient fibrinolytic activity, such as streptokinase, urokinase, alteplase, nattokinase, retavase, and tenecteplase, are used to dissolve fibrin clots (Dubey et al. 2011).

Oncolytic Enzymes
Bacterially directed enzyme pro-drug therapy (BDEPT) is one of the most promising approaches for selective and localized tumor destruction (Lehouritis et al. 2013; Harrison Jr. and Krais 2018). Numerous enzymes are used to deliver drugs and to inhibit metastasis, angiogenesis, and cell growth (Aguera et al. 2018). Asparaginase, for example, is a promising therapeutic enzyme for the treatment of acute lymphocytic leukemia. Most cancerous cells require an exogenous supply of L-asparagine because they are deficient in aspartate-ammonia ligase activity and are thus unable to synthesize this amino acid. Intravenous administration of asparaginase does not affect normal cells but restricts the growth of cancer cells by depriving them of the nonessential amino acid asparagine (Fernandes et al. 2016). Similarly, arginine-degrading enzyme is used to inhibit melanoma and hepatocellular carcinomas (Fernandes et al. 2016). Proliferation, neovascularization, and metastasis of tumor cells are prevented by localized degradation of chondroitin sulfate proteoglycans using chondroitinase (Denholm et al. 2001). Lipases can activate tumor necrosis factor and are administered in the treatment of malignant tumors. Enzymes such as neuraminidase and ribonuclease make neoplastic cells sensitive to the immune response by trimming sialic acid residues from the cell surface (Aguera et al. 2018). Some enzyme inhibitors, such as mesupron and methotrexate (Kwaan et al. 2013), are used as oncolytic drugs. Tumor malignancy can be correlated with the plasminogen-activating activity of urokinase, which makes urokinase a potential drug target. Mesupron, launched in 2012, is a serine protease inhibitor of urokinase and is used as an anticancer agent. Similarly, methotrexate inhibits dihydrofolate reductase, an enzyme required for nucleotide synthesis. Blocking nucleotide synthesis is more toxic to rapidly growing tumor cells than to nondividing cells; therefore, methotrexate is a drug of choice for chemotherapy.

Wound Healing
Proteolytic enzymes and glycolytic enzymes have anti-inflammatory actions. These enzymes can digest denatured proteins found in necrotic tissue; therefore, they are used for debridement of wounds and in treating burn-damaged tissue and incisional, traumatic, and pyogenic wounds (Fini et al. 1992). Bacteria causing wound infection, such as Streptococcus pneumoniae, Bacillus anthracis, and Clostridium perfringens, can be destroyed by enzymes such as lysozyme and RNase A, which act against these pathogens by breaking down the protective peptidoglycan layer.
Serratiopeptidase (isolated from Serratia sp.) acts as an anti-inflammatory agent, speeds up liquefaction of pus and sputum, and enhances the action of antibiotics. Papain A derived from papaya helps in defibrination of wounds, prevents corneal scar deformation, prevents edema and inflammatory responses, and accelerates wound healing. A wide range of proteolytic enzymes of plant and bacterial origin has been considered for the removal of burnt dead skin, which in turn helps antibiotics to work better and speeds up recovery. For example, collagenase helps to break up and remove dead skin in skin burns and skin ulcers and thus supports the repair mechanism. Chitinase has antimicrobial properties that are used for the treatment of several infections, and it also shows activity against new drug-resistant bacterial strains. Many diseases are caused by malfunctioning enzymes or dysregulated enzyme production. Such enzymes can be inactivated by competitive or noncompetitive inhibitors, which offers a prospective mode of therapy (Bjelakovi and Pavlovi n.d.; Bretner 2015). Table 9.3 lists a few representative enzyme inhibitors used in several disorders.
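The difference between competitive and noncompetitive inhibition can be sketched with the standard textbook rate equations. The example assumes simple Michaelis-Menten kinetics and pure noncompetitive inhibition (the inhibitor binds free enzyme and enzyme-substrate complex with the same Ki); the function name and all values are hypothetical.

```python
def inhibited_rate(substrate, vmax, km, inhibitor, ki, mode):
    """Michaelis-Menten rate in the presence of a reversible inhibitor.

    competitive:    apparent Km rises by (1 + [I]/Ki), Vmax is unchanged
    noncompetitive: apparent Vmax falls by (1 + [I]/Ki), Km is unchanged
    """
    if km <= 0 or ki <= 0:
        raise ValueError("km and ki must be positive")
    factor = 1.0 + inhibitor / ki
    if mode == "competitive":
        return vmax * substrate / (km * factor + substrate)
    if mode == "noncompetitive":
        return (vmax / factor) * substrate / (km + substrate)
    raise ValueError("mode must be 'competitive' or 'noncompetitive'")


# Hypothetical enzyme (Vmax = 100, Km = 1) with inhibitor present at [I] = Ki,
# measured at a saturating substrate concentration.
saturating = 1e6
competitive = inhibited_rate(saturating, 100.0, 1.0, inhibitor=1.0, ki=1.0,
                             mode="competitive")
noncompetitive = inhibited_rate(saturating, 100.0, 1.0, inhibitor=1.0, ki=1.0,
                                mode="noncompetitive")
print(f"competitive: {competitive:.1f}, noncompetitive: {noncompetitive:.1f}")
```

At saturating substrate the competitive inhibitor is outcompeted and the rate returns to roughly Vmax, whereas the noncompetitive inhibitor still halves the rate when [I] equals Ki.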

Biomarker Enzymes
The metabolic activity of a cell depends on enzyme production; thus, the synthesis of enzymes is tightly regulated. A slight change in this homeostatic balance could suggest cellular stress, damage to the cell, or a disease condition. Thus, assays of enzyme activity can make important contributions to the diagnosis and management of disease (Raja et al. 2011).
Hepatobiliary diseases are diagnosed using lipases, glutamyltransferase, sorbitol dehydrogenase, and amylase. Hepatic parenchymal diseases are detected by glutamate dehydrogenase (GLDH). Lipases and amylase act as biomarkers for acute pancreatitis and pancreatic injury. Alanine aminotransferase and creatine kinase are also linked to hepatic parenchymal diseases, along with myocardial and muscle disease. Similarly, muscle disease can be correlated with aldolase (ALD). Lactate dehydrogenase (LDH) and hydroxybutyrate dehydrogenase (HBD) are also associated with myocardial infarction, hemolysis, and liver disease. Alkaline phosphatase is involved in bone and hepatobiliary diseases. Prostate carcinoma can be diagnosed by monitoring acid phosphatase (ACP). The association of aspartate aminotransferase with hemolysis is well established. Not only an increase in enzyme concentration but also a deficiency of an enzyme can indicate a disease condition (Eckfeldt and Levitt 1989; Jung et al. 1987; Werner et al. 1982). Thus, monitoring enzyme levels may help to predict one of several medical conditions and to diagnose the damaged site early (Fig. 9.1).

Enzyme Replacement Therapy
Enzyme replacement therapy (ERT) is the systemic delivery of a deficient enzyme to rescue cellular function in patients. Over the past two decades, enzyme therapy for lysosomal storage disorders (LSDs) (Mokhtariye et al. 2019; Desnick et al. 2019) has become well established and integral to the specific treatment of LSDs. Some common LSDs are Fabry disease, Hurler's disease, Gaucher's disease, and Hunter's disease (Fig. 9.2). Fabry disease is caused by excessive deposition of globotriaosylceramide (Gb3) in the kidneys, heart, nerves, and blood vessels, owing to deficiency of the enzyme alpha-galactosidase A. It is treated with recombinant human alpha-galactosidase A, which prevents Gb3 accumulation by breaking it down. Gaucher's disease is caused by deficiency of the enzyme glucocerebrosidase, which leads to lipid accumulation in the spleen and liver. The enzymes imiglucerase, velaglucerase alfa, and taliglucerase alfa are used to treat the disease. Deficiency of iduronate-2-sulfatase leads to Hunter syndrome, which is treated by enzyme replacement with idursulfase. Alglucosidase alfa is used for the treatment of patients with Pompe disease. Galsulfase is used to treat Maroteaux-Lamy syndrome.
ERT for lysosomal storage diseases is regarded as a major milestone in the treatment of inborn errors of metabolism. This treatment principle has been considered for other disorders, such as celiac disease (CD), chronic pancreatitis (CP), and lactose intolerance (de la Iglesia-García et al. 2017; Rosado et al. 1984; Singu and Annapure 2018).

Analytical Enzymes
With the advancement of technology, enzyme assays have been transformed into biosensors, nanoparticles, and micro-assays that play an important part in biochemical research, disease diagnostics, and many other analyses (Doubnerová 2012). Enzymes catalyze more than 5000 biochemical reactions, which establishes them as bioreceptors in biosensors and immunoassays. Enzymes are used because they are (1) able to differentiate between a wide variety of analytes; (2) not consumed in chemical reactions and do not alter the equilibrium of a reaction; and (3) able to detect lower concentrations more precisely than other macromolecules.
A wide range of enzymes forms the basis of immunochemical techniques such as enzyme immunoassay (EIA), enzyme-linked immunosorbent assay (ELISA), and enzyme-multiplied immunoassay test (EMIT), because of the specificity and sensitivity of enzymes and as an alternative to radioisotopes. Enzyme-based immunoassays include diagnostics for noninfectious, infectious, and autoimmune diseases. The enzymes frequently used are horseradish peroxidase, alkaline phosphatase, and galactosidase; for pesticide detection, acetylcholinesterase, butyrylcholinesterase, alkaline phosphatase, organophosphorus hydrolase, malate dehydrogenase, and tyrosinase are employed (Asal et al. 2018; Economou et al. 2017; Pérez et al. 2018; Zhu et al. 2019).
Enzymes are the most widely used receptor molecules in biosensor applications, and oxidoreductases are preferred over all other classes of enzymes. Biosensors exploit not only enzyme-catalyzed reactions but also enzyme inhibition. Inhibition-based biosensors rely on estimating enzyme activity before and after exposure to a target analyte. For example, pesticide biosensors use cholinesterases or ureases as biological receptors (Colmati et al. 2019). Heavy metal biosensors make use of acetylcholinesterase, alkaline phosphatase, urease, invertase, peroxidase, L-lactate dehydrogenase, tyrosinase, and nitrate reductase (Hashemi Goradel et al. 2018). Biosensors for anti-nutrients, additives, and preservatives employ amperometric transduction with enzymes such as alcohol oxidase, carboxypeptidase, L-aspartase, oxalate oxidase, β-glucosidase, cholinesterase, sorbitol dehydrogenase, and sulphite oxidase (Taylor 2011). However, enzyme-based biosensors have an inherently limited lifetime, work efficiently only under optimum conditions, and are expensive.
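The readout of an inhibition-based biosensor, which compares enzyme activity before and after exposure to the analyte, is commonly expressed as percent inhibition. The following is a minimal sketch under that assumption; the function name and the activity values are hypothetical, and a real sensor would additionally need a calibration curve relating inhibition to analyte concentration.

```python
def percent_inhibition(activity_before, activity_after):
    """Percent loss of enzyme activity after exposure to the analyte."""
    if activity_before <= 0:
        raise ValueError("activity_before must be positive")
    if activity_after < 0:
        raise ValueError("activity_after must not be negative")
    return 100.0 * (activity_before - activity_after) / activity_before


# Hypothetical signal (for example, current in microamperes) from a
# cholinesterase electrode before and after incubation with a sample.
signal_before = 2.0
signal_after = 0.5
inhibition = percent_inhibition(signal_before, signal_after)
print(f"{inhibition:.1f} % inhibition")  # 75.0 % inhibition
```

An unchanged signal gives 0% inhibition, and a signal that drops to zero gives 100%.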

Future of Biocatalyst
Enzymes have gained a substantial market share in recent years, owing to the high acceptance of enzyme-based pharmaceuticals for the treatment of several chronic disorders and digestive diseases. Improvements in the biomedical and biotechnology fields, such as the advent of protein engineering, have widened the applications of enzymes and further supplemented market growth. However, the most challenging disadvantages of enzyme catalysts are that they are fragile (Fig. 9.3), that a lot of work is needed to find optimum conditions, and that bulk production of pure enzyme is not cost-effective (Singu and Annapure 2018; Ji et al. 2018; Bhatia n.d.).

Stability
Denaturation and loss of enzyme activity are major shortcomings of using enzymes. To overcome these disadvantages, novel approaches have been proposed, such as the production of industry-friendly extremozymes, which show higher stability under extreme conditions owing to properties such as halostability, pH stability, thermostability, cold adaptivity, and organic solvent tolerance.

Optimization
Enzymes with a low Km and a high Vmax are ideal candidates for industrial application. Molecular modeling and computer simulations (Priyadarshini and Singh 2019) are being exploited to increase protein stability, protein-substrate binding, and catalytic rate. Likewise, OptZyme is used to design mutations that improve Km, kcat, or kcat/Km, thereby improving enzymatic activity (Grisewood 2013).
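The ratio kcat/Km, often called catalytic efficiency, gives a single number for ranking enzyme variants. The sketch below only illustrates how candidate mutants could be compared once their kinetic constants are known; it does not reproduce how OptZyme designs mutations, and the variant names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EnzymeVariant:
    name: str
    kcat: float  # turnover number, s^-1
    km: float    # Michaelis constant, M

    @property
    def efficiency(self):
        """Catalytic efficiency kcat/Km in M^-1 s^-1."""
        return self.kcat / self.km


# Hypothetical wild type and two mutants.
variants = [
    EnzymeVariant("wild_type", kcat=10.0, km=1e-3),
    EnzymeVariant("mutant_A", kcat=20.0, km=5e-4),
    EnzymeVariant("mutant_B", kcat=5.0, km=1e-4),
]

# Rank from most to least efficient.
ranked = sorted(variants, key=lambda v: v.efficiency, reverse=True)
for v in ranked:
    print(f"{v.name}: kcat/Km = {v.efficiency:.2e} M^-1 s^-1")

best = ranked[0]
```

Here mutant_B ranks first even though its kcat is the lowest, because its much lower Km more than compensates; which constant matters most depends on the substrate concentration in the intended process.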

Bioeconomy Screening
In this review, we came across several enzyme-producing organisms, including bacteria, filamentous fungi, yeasts, and plants, but these account for only a small fraction of the enzymes explored so far, which suggests that a world of new catalytic activities remains undiscovered owing to technical limitations. Currently, metagenomics (Distaso et al. 2017; Wilson and Piel 2013; Fernández-Arrojo et al. 2010) has attracted attention as one of the likely technologies for mining biocatalysts and thus providing biomolecules that meet the ever-growing industrial demand in a short period of time. Nanotechnology has offered a new class of nanomaterials called nanozymes (Huang et al. 2019; Wang et al. 2018; Wu et al. 2019), which mimic enzyme activity and are cost-effective and highly stable (Fig. 9.3). The specificity, sensitivity, and catalytic activity of nanozymes can be enhanced by hybridizing them with molecularly imprinted polymers (MIPs), creating next-generation biocatalysts.

Conclusion
Among all the macromolecules synthesized by living organisms, enzymes have been thoroughly exploited for various purposes. This review article gives a brief account of unconventional but important lifesaving therapeutic and pharmaceutical applications (Fig. 9.4). With the improvement of modern biotechnology and protein engineering, a new area of enzyme engineering has evolved that mostly deals with the purification and stability of these vital enzymes. A literature survey of recent studies clearly suggests that many clinically important enzymes have been explored, isolated, and purified. In silico approaches such as molecular modeling, computer simulations, and metagenomic techniques, along with tools such as OptZyme for identifying novel enzymes, have taken over from the traditional ways of identifying and utilizing enzymes, thereby saving the time, money, and resources used to produce enzymes. The major drawback of enzymes, their instability under industrial conditions, can be addressed by extremozymes and by the introduction of nanomaterial-based hybrids. It is time to discover new avenues to exploit enzymes for the betterment of the healthcare sector and to extend the horizons of enzymology.

Abstract
Ionizing radiations are an indispensable part of today's disease diagnostics and treatments. Human exposure to ionizing radiations (IR) such as X-rays and gamma (γ) rays has increased drastically. Bone marrow (BM) is one of the tissues most sensitive to radiation. Hematopoietic failure due to ionizing radiation is a major cause of mortality after exposure to a moderate or high dose of total body irradiation (TBI). ω-3 and ω-6 polyunsaturated fatty acids (PUFAs) are nutraceuticals essential for the body. Because these compounds are found in nature and are essential nutrients, unlike synthetic drugs, they are safe for our body. The antioxidant potential of PUFAs protects stem cells, and reports show that they control apoptotic and oxidative pathways. Intake of specific PUFAs and their metabolites can boost stem cell regeneration after radiation damage, which suggests a promising application for PUFAs as an additional treatment to radiotherapy for the recovery of bone marrow cells.

Keywords
Radiation injury · Total body irradiation · Polyunsaturated fatty acids · Hematopoiesis · Recovery

Ionizing Radiations Severely Damage Hematopoiesis
Increased exposure to ionizing radiations in modern society is a matter of serious concern (Brenner and Hall 2007). Ionizing radiations are extensively used in medical practice: in research (e.g., radioisotopes used to track biomolecules inside the body), in diagnosis (e.g., X-rays), and in therapy (e.g., radiotherapy); they are also used in industry and in construction. Bone marrow (BM) is one of the tissues most sensitive to radiation in our body. Hematopoietic failure due to total body irradiation (TBI) can lead to death (Shouse et al. 1931). Acute and transient BM suppression typically results from exposure to a low dose of TBI, which mainly injures hematopoietic progenitor cells (HPCs) and, to a lesser degree, hematopoietic stem cells (HSCs). In this case, HSCs can proliferate and differentiate to regain HPC numbers and restore hematopoietic balance. But if the dose of TBI is too high, IR also severely damages HSCs and impairs their self-renewal by inducing HSC apoptosis, differentiation, and senescence and by damaging the HSC niche, which may eventually lead to BM failure and death of the organism. HSCs are multipotent adult stem cells. To date, they are the best characterized and best understood adult stem cells in the mammalian system. HSCs are generated in the bone marrow (Smith 2003). HSCs differentiate to produce blood cells of the lymphoid and myeloid lineages (Ogawa 1993; Nakahata et al. 1982) (Fig. 10.1). HSCs were first discovered in the aftermath of the atomic explosions at Hiroshima and Nagasaki at the end of World War II, in the mid-twentieth century, during efforts to understand the effect of radiation on normal tissues (Eaves 2015). The bone marrow was then identified as the most radiosensitive tissue (Till and McCulloch 1961).
Further experiments on mice showed that radiation indeed compromised the ability of the BM to produce enough leukocytes to prevent infection and enough platelets to avoid excessive bleeding, leading to death. Researchers then sought strategies to rescue the damage caused by radiation and found that mice infused with healthy cells from nonirradiated BM could be saved from radiation-induced lethality. The work of Till and McCulloch (McCulloch and Till 1960) then confirmed the presence of hematopoietic activity in the bone marrow.
HSCs are primarily found in the bone marrow (0.005% of the total cells/~1% of the total mononuclear cells) but have also been found in the peripheral blood in very low numbers. There are two types of stem cells in the bone marrow: hematopoietic stem cells, which generate all types of blood cells (Fig. 10.1), and bone marrow stromal stem cells (also called mesenchymal stem cells (MSCs) or skeletal stem cells). MSCs have non-hematopoietic mesodermal lineage differentiation potential. They also secrete cytokines that exert a paracrine effect on HSCs.
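To give a sense of how rare HSCs are, the quoted frequency of 0.005% of total bone marrow cells can be turned into a simple estimate. The sketch below takes that figure at face value; the function names and the sample sizes are hypothetical.

```python
# Frequency quoted in the text: 0.005% of total bone marrow cells are HSCs.
HSC_FRACTION_OF_TOTAL = 0.005 / 100


def expected_hsc_count(total_cells):
    """Expected number of HSCs in a sample of bone marrow cells."""
    return total_cells * HSC_FRACTION_OF_TOTAL


def cells_needed(target_hscs):
    """Number of bone marrow cells expected to contain the target number of HSCs."""
    return target_hscs / HSC_FRACTION_OF_TOTAL


hscs_in_million = expected_hsc_count(1_000_000)
cells_for_thousand = cells_needed(1_000)
print(f"HSCs per million marrow cells: about {hscs_in_million:.0f}")
print(f"Cells needed for 1000 HSCs: about {cells_for_thousand:.0e}")
```

At this frequency, one million marrow cells contain on the order of 50 HSCs, and about 20 million cells are needed to obtain 1000 of them.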
Reactive oxygen species (ROS) are responsible for most of the cellular changes induced by ionizing radiation (Weiss and Landauer 2000). Overproduction of ROS increases oxidative stress in cells which can damage cellular biomolecules (Ornoy 2007).
The treatment of acute radiation syndrome (ARS) has made substantial progress (Ito et al. 2007). Research in the field of hematopoietic signaling and bone marrow transplantation (BMT) has reached the molecular and genetic level (Hirama et al. 2003; Waselenko et al. 2004; Vávrová et al. 2002). However, stem cell transplantation (SCT) for ARS has limitations, such as donor availability, high mortality associated with conditioning, and complications resulting from transplantation. The bone marrow niche plays a pivotal role in the recovery of hematopoiesis after irradiation injury. Both HSCs and their niche are damaged after irradiation exposure (Waselenko et al. 2004). Hence, finding a safe and effective way to modulate stem cell fate toward proliferation is warranted.

Modulation of Stem Cell Fate Is Essential to Control Cell Proliferation
A number of factors and signaling pathways are involved in stem cell proliferation and differentiation, such as intracellular signaling molecules, extrinsic stimuli, transcription factors, nuclear receptors, and chromatin remodeling (Inniss et al. 2006; Jaenisch and Young 2008). Hence, alternative approaches were developed that aimed to modulate stem cell fate. These efforts made it possible to reprogram somatic cells into induced pluripotent stem cells (iPSCs). Commonly used methods to control stem cell fate include gene modifications, such as upregulation or knockdown of genes involved in stem cell pluripotency, and treatment with various growth factors, cocktails, peptides, hormones, etc. (Morrison and Spradling 2008; Vaca et al. 2008; Yau et al. 2012). The newly developed approach focuses on the use of functional cocktails of molecules such as growth factors, hormones, and peptides to modulate stem cell fate (Chan et al. 2009; Kawamori et al. 2010; Liu et al. 2010). However, these methods have been found to be of limited practical use (Kang et al. 2014).

Effect of Diet on Health
Diet plays a significant role in the development of a human being. A diet rich in fruits and vegetables lowers the rate of chronic diseases such as cardiovascular disease and cancer (Heidemann et al. 2008; Lorenzo et al. 2009). Many pathological conditions of the renal, cardiac, and central nervous systems were significantly inhibited by dietary restriction (DR) (Turturro et al. 2002). DR affects total tumor formation and lymphoid nodules in mice (Bronson et al. 1991) and delays the aging process (Blackwell et al. 1995). Diet also has a significant effect on health and tissue homeostasis. Vitamins and minerals are essential for embryonic and organ development, but their potential role in regulating the fate of stem cells has not been fully studied. Recent reports demonstrate the role of certain vitamins and minerals in promoting the proliferation and differentiation of stem cells (Kawamori et al. 2010; Chan et al. 2009; Liu et al. 2007). Nutrients that have health benefits are termed nutraceuticals. Dr. Stephen DeFelice coined the term "nutraceutical" by combining "nutrition" and "pharmaceutical" in 1989 (DeFelice 2002). Nutraceuticals have disease-preventing, health-promoting, or medicinal properties. They generally comprise adequate quantities of proteins, carbohydrates, minerals, vitamins, lipids, or other necessary nutrients (Zeisel 1999; Whitman et al. 2001). They can serve as ideal supplements in chemotherapy or radiotherapy because of their potential to reduce the side effects associated with these treatments and their significant advantages in reducing healthcare costs (Braunstein et al. 2006). Maintaining an appropriate nutrient balance is essential to keeping an individual healthy. On the other hand, excess intake of any nutrient may not be beneficial or may even be harmful to health.
Our understanding of the potentially harmful effects of nutraceuticals is increasing with growing knowledge from studies in nutrition and food chemistry (Zeisel 1999;Whitman 2001). This knowledge could help us to develop suitable combinations of nutraceuticals for personalized therapy (Zhao 2007).

Polyunsaturated Fatty Acids Reduce Oxidative Damage
Lipids are important structural parts of cellular membranes. They also serve as an important energy substrate in adipose tissue. In addition, there is an increasing amount of evidence indicating that lipids, particularly the ω-3 and ω-6 PUFAs (often designated n-3 and n-6 PUFAs, respectively), play a substantial role in cellular signaling and gene regulation and are thus associated with important body functions (Kang et al. 2014). The n-6 and n-3 PUFAs are essential for humans, as mammals cannot synthesize them de novo and they must be obtained from food. n-6 PUFAs cannot be converted to n-3 PUFAs, and vice versa, in humans (Kang 2011). Apart from fish oil, PUFAs are also present in other foods such as walnut, canola, tofu, and flaxseed. These foods are recommended by the American Heart Association (AHA) (Ariel and Serhan 2007).
These foods are rich in alpha-linolenic acid (ALA), an omega-3 fatty acid. Linoleic acid (LA; an n-6 fatty acid) and ALA use the same enzymes to undergo a sequence of double bond formation and chain extension reactions that convert them to the corresponding 20-carbon derivatives: arachidonic acid (AA; 20:4n-6) and eicosapentaenoic acid (EPA; 20:5n-3). Docosahexaenoic acid (DHA; 22:6n-3) is also an important n-3 PUFA. It can be derived from EPA or obtained from the diet (Fig. 10.2). DHA is an important component of the brain and central nervous system. DHA is processed to derive key lipid mediators: resolvins and protectins (Ariel and Serhan 2007). LA and ALA are metabolized through the enzymes cyclooxygenase (COX), lipoxygenase, and cytochrome P450 into lipid mediators, including prostaglandins (PG), leukotrienes (LT), thromboxanes (TX), and resolvins (Rv) (Figs. 10.3 and 10.4). The mediators of n-6 and n-3 PUFAs often show opposing effects; for instance, mediators of n-6 PUFAs are pro-inflammatory, while mediators of n-3 PUFAs are anti-inflammatory in nature (James et al. 2000).
PUFAs and their mediators exert their action in several ways. They alter the physical and chemical properties of membranes and thereby control membrane-bound ion channels and receptors (Turk and Chapkin 2013). Eicosanoids derived from PUFAs, such as prostaglandin E2 (PGE2), a derivative of AA, can affect pathways leading to cell growth and proliferation (Yun et al. 2011). More notably, metabolites and eicosanoids derived from PUFAs are ligands of a number of important transcription factors such as activator protein-1 (AP1), nuclear factor kappa B (NF-κB), sterol regulatory element-binding proteins (SREBP), and peroxisome proliferator-activated receptors (PPARs) (Rajasingh and Bright 2006). PUFAs also alter the arrangement of lipid rafts in the cellular membrane and thereby modify cellular mechanisms (Langelier et al. 2010; Shaikh 2012). Lipid rafts play an important part in the regulation of stem cells (Yamazaki et al. 2006; Lands 2012). PUFAs also affect energy metabolism, which controls various cellular processes (Gennero et al. 2006). Taken together, it is highly plausible that PUFAs and their mediators play an important part in stem cell generation and differentiation.

PUFAs Enhance the Regeneration of Stem Cells in Healthy Conditions as Well as After Radiation Damage
PUFAs have also been shown to be beneficial in humans and in animal models of obesity, diabetes, cancer, and heart disease (Zhao 2007). n-6 and n-3 PUFAs often show opposing effects on inflammation, and their effects differ with the cell type and organs involved, as well as with the respective quantity of PUFAs in the diet (Eritsland 2000). PUFAs also influence hematopoiesis and thrombopoiesis in in vivo (De Lorgeril 2007; Bruno and Tassinari 2011; Holub 2002) and in vitro (Rizzo et al. 1999; Harris 2010; Mutanen and Freese 1996) models. Many reports on immortalized leukemic cell lines show that AA and its metabolites are involved in the regulation of stem cell proliferation (Krishnamurti et al. 2002). DHA and AA have been shown to affect platelet function, activation, and aggregation (Guillot et al. 2008; Nelson et al. 1997). One report showed that when PUFAs were used as additives in megakaryocyte (MK) expansion media, a beneficial effect on MK and platelet (PLT) generation was observed (Shabrani et al. 2012). Studies have shown that oral administration of PUFAs enhanced hematopoiesis and thrombopoiesis in mice (Limbkar et al. 2017). Preventing apoptosis of HSCs promotes better hematopoiesis, megakaryopoiesis, and thrombopoiesis (Weiss and Landauer 2003). PUFAs can act as both proapoptotic and antiapoptotic agents. Similarly, they act as pro- or antioxidant agents in a target cell depending on their dose (Ariel and Serhan 2007). In vitro data suggested that the presence of PUFAs promoted antiapoptotic and antioxidant properties in cells. Reports also show that PUFAs were beneficial in the recovery of hematopoiesis in sublethally irradiated mice (Siddiqui et al. 2011). Thus, PUFAs hold great promise for lessening the impairment after radiation exposure, both by protecting against damage and by enhancing stem cell proliferation postirradiation (Limbkar et al. 2016).

Future Perspective
The promise of PUFAs for alleviating irradiation damage in tissues demands further comprehensive investigation. Delineating the molecular pathways by which PUFAs and their metabolites curb radiation-induced stress and promote stem cell proliferation in irradiated individuals is warranted. Unbiased analysis of gene, protein, and metabolite levels with respect to stem cell signaling should now be achievable with modern analytical technologies such as genomics, epigenomics, proteomics, and metabolomics. The long-term effect of PUFAs should also be tested with combinations of n-6 and n-3 PUFAs in a healthy diet. Finally, the priority of the research should be to develop PUFA-based oral formulations that would benefit patients undergoing radiotherapy or suffering from hematological disorders.

Abstract
Biosensors are currently used in various fields such as clinical diagnostics, food analysis, bioprocess monitoring, and environmental monitoring. Bacterial infections are a major concern worldwide. Rapid, on-site identification and detection of pathogens in different sample types, e.g., water, food, and clinical samples, is the need of the hour. Traditional methods, owing to their inherent limitations, do not offer a solution to these challenges.
Recent developments in electrochemical biosensors are promising and address the requirement for rapid, on-site detection of pathogens. These biosensors are sensitive and portable and possess point-of-care utility. The integration of nanotechnology with biosensors has advantages in the field of diagnosis and has paved the way for better treatment and prognosis. Exploiting the surface, electronic, and electrocatalytic properties of nanoprobes and nano-transducer mediators can lead to improved platforms for detection strategies. Progress in the field of biosensors for the detection of pathogens would make significant contributions to advanced medical technology.

Introduction
There are widespread incidences of bacterial infections worldwide. The origin of pathogens causing infections is water, food or hospital acquired. Some examples are E. coli, S. aureus, Salmonella, Campylobacter, Clostridium, Streptococcus, etc. The food-borne diseases are causing higher level of incidence which includes food poisoning, allergies, infectious diseases and chronic diseases. E. coli, S. aureus, Bacillus cereus, Clostridium perfringens and Vibrio cholerae are the most common foodborne pathogens. Salmonella typhimurium, Vibrio cholerae, Legionella, E. coli O157:H7, C. jejuni and P. aeruginosa are commonly occurring waterborne pathogens (Alahi 2017) (Avelar-gonzález et al. 2015. Pathogenic P. aeruginosa forms biofilms when associated with S. aureus as it invades the tissue and causes infections (Dusane et al. 2019). Infections caused by these bacteria result in acute gastrointestinal illness, acute respiratory illness, hepatitis, dermatitis and death. Hospitalacquired infections include transmission from samples like blood, urine, sputum, etc. Hospital-acquired infections may get transferred from contagious infections or from contaminated air if proper care is not taken (Review 2017). According to The Global Burden of Diseases, Injuries, and Risk Factors study 2016 (GBD 2016), Shigella and enterotoxigenic E. coli (ETEC) were the second leading cause of diarrhoeal mortality in 2016. It was observed among all age groups. About 13.2% deaths were because of diarrhoea. ETEC was the leading cause of diarrhoea mortality in 2016 among all age groups, resulted in 51,186 deaths (26,064) and about 3.2% (1.8-4.7) of diarrhoea deaths. ETEC was also responsible for about 4.2% (2.2-6.8) of diarrhoea deaths in children younger than 5 years (Khalil et al. 2016). With such a huge effect on human health and mortality, it is imperative to identify and detect these pathogens in time to control the spread of infections and treat infected patients . 
In many instances, patients report to the clinic when the infection is at an advanced stage. In such a situation, the first step is to identify the causative agent so that specific treatment can be given. In many situations, a time span of a few hours is critical to save the life of a patient. The facile, rapid and cost-effective detection of pathogens is a challenge worldwide. There is an immediate demand for an effective diagnostic method and tool for identification and detection of pathogens within a few hours of reporting of the incidence. The field of biosensors is fast evolving in this direction in order to meet the current and future challenges.

Methods for Detection of Pathogens
There are some established traditional methods for identification and detection of pathogens. These include microscopy, culture, chromogenic media-based detection and serological methods like ELISA (Straub and Chandler 2003; Anon n.d.). Culturing has been the gold standard since its inception, and serological methods like ELISA have been mainstream tools for some time. Both these approaches are lengthy and face the risk of low sensitivity, cross-contamination and reduced viability of the culture during transportation. The traditional methods face one more challenge: the sample must be processed before it is subjected to a specific method. The sample processing largely depends on whether the sample is water, food or hospital based. In the case of water, the low concentration of pathogens in a large volume of sample is a challenge, and it requires enrichment and concentration of the samples prior to detection. In the case of food and clinical samples, cross-reactivity and extraction of the sample are a challenge. Moreover, traditional methods require technical skills and need removal of interfering agents from the sample, and results depend on the operator, the interpreter and the media used. Microscopic methods are highly subjective, lack the desired sensitivity and are difficult to quantitate, and portability and point-of-care utility are difficult to achieve with these methods (Lazcka et al. 2007).
The detection method used should be reliable, specific, sensitive, reproducible, fast, automated and cost-effective.
Molecular methods are another approach currently in practice. They involve hybridization studies, amplification-based detection (PCR) and DNA microarrays. Culturing techniques take several days for detection of pathogens. ELISA involves many complex steps even though it is more sensitive than culturing techniques (Rajapaksha and Elbourne 2018); these techniques cannot be used for field detection because the sample must be processed first. The sample processing involves many steps, including cell lysis and DNA extraction, prior to subjecting the sample to PCR. Molecular methods are more sensitive than culturing techniques but cannot be used for real-time and rapid detection. They also need technically skilled personnel and a laboratory with expensive equipment (Groundwater et al. 2017).

Biosensors
Biosensors are analytical devices that convert a biological response into an electrical signal. Fabrication of biosensors and their materials, transducing devices and immobilization methods requires multidisciplinary research in chemistry, biology and engineering. The materials used in biosensors are categorized into three groups based on their mechanisms: the biocatalytic group comprising enzymes, the bio-affinity group including antibodies and nucleic acids, and the microbe-based group containing microorganisms. Nowadays, many fields such as clinical diagnostics, environmental sciences and many more make use of sensors. Non-enzymatic sensors have also been developed (Mehrotra 2016). Recently, various novel methods have been developed for detection of pathogens in near real time, with improved sensitivity, reproducibility and portability. With a portable device, real-time and on-field detection is possible (Setterington and Alocilja 2012).

Electrochemical Sensors
Electrochemical detection systems have overcome this issue, and commercially available glucose biosensors and blood gas sensors are classical examples of this (Kuss et al. 2018). Apart from their high sensitivity, specificity and simplicity of instrumentation, electrochemical biosensors can be expanded into multiplex detection platforms (Privett et al. 2013). They are highly sensitive, specific and reproducible (Veloso et al. 2012).
The integration of nanotechnology with biomedicine and diagnostics has revolutionized the field and opened new opportunities for better treatment and prognosis. Developing biosensors with nanoprobes and nano-transducer mediators that exploit their surface, physicochemical, electronic and electrocatalytic properties can lead to improved tools for pathogen detection. The progress in the field of biosensors for detection of multiple pathogens would make significant contributions to advanced medical technology.

Nanomaterials in Detection
Different nanoparticles such as AuNPs, AgNPs, graphene nanocomposites, metal oxides, metal sulphides and quantum dots are used in biosensing applications. They exhibit optical, electrical, thermal and catalytic properties. Metal nanoparticles also act as catalysts in different physicochemical reactions. Redox processes that are monitored electroanalytically are catalysed by metal nanoparticles (e.g. platinum). Metal nanoparticles facilitate electron transfer and can be conjugated with biomolecules and ligands (Begon and Garcìa 2002). The electrocatalytic properties of metal and semiconductor nanoparticles enhance the electrochemical signal (Bangal et al. 2005). Molecule- and polymer-functionalized sensing surfaces of metal and semiconductor nanoparticles have been developed for electrochemical sensors and nano-devices.
Carbon-based nanomaterials and quantum dots have applications in electrochemical sensors. Because of their electrocatalytic activity and biocompatibility, quantum dots have vast applications in electrochemical sensors and the biomedical field.

Electrochemical Methods
Electrochemical biosensors are of the following four types: amperometric, potentiometric, impedimetric and voltammetric. The amperometric and potentiometric biosensors are the most commonly used electrochemical biosensors. In an amperometric biosensor, the potential between the two electrodes is set, and the current produced by the oxidation or reduction of electroactive species is measured and correlated to the concentration of the analyte of interest. In a potentiometric biosensor, the measured electric potential is due to changes in the charge distribution (Amiri et al. 2018). A conductometric biosensor measures the ability of an analyte (e.g. electrolyte solutions) or a medium (e.g. nanowires) to conduct an electrical current between electrodes and reference nodes. Impedance biosensors measure the electrical impedance of an interface in alternating current steady state under constant direct current bias conditions. Electrochemical sensors are emerging tools in current practice. Different techniques such as cyclic voltammetry, differential pulse voltammetry and square wave voltammetry are used in electrochemical studies (Manuscript and Sensors 2013).
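The correlation between measured current and analyte concentration in an amperometric sensor is usually established through a calibration curve. The following is a minimal Python sketch of that step, assuming a linear current response and the commonly used 3σ/slope convention for the limit of detection. The function names and all numerical values are hypothetical and are not taken from any of the studies cited in this chapter.

```python
from statistics import mean, stdev


def fit_calibration(concentrations, currents):
    """Least-squares line: current = slope * concentration + intercept."""
    cx, cy = mean(concentrations), mean(currents)
    sxx = sum((x - cx) ** 2 for x in concentrations)
    sxy = sum((x - cx) * (y - cy) for x, y in zip(concentrations, currents))
    slope = sxy / sxx
    intercept = cy - slope * cx
    return slope, intercept


def limit_of_detection(blank_currents, slope, k=3.0):
    """LOD = k * (standard deviation of blank signal) / slope.

    k = 3 is an assumed convention; other factors are also in use.
    """
    return k * stdev(blank_currents) / slope


def concentration_from_current(current, slope, intercept):
    """Invert the calibration line to estimate an unknown concentration."""
    return (current - intercept) / slope


# Hypothetical calibration standards (arbitrary concentration units)
standards = [0.0, 1.0, 2.0, 3.0, 4.0]
currents = [0.5, 2.5, 4.5, 6.5, 8.5]        # microamperes
blanks = [0.48, 0.52, 0.50, 0.49, 0.51]     # repeated blank readings

slope, intercept = fit_calibration(standards, currents)
lod = limit_of_detection(blanks, slope)
unknown = concentration_from_current(5.5, slope, intercept)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"LOD = {lod:.4f}, unknown sample = {unknown:.2f}")
```

In practice, bacterial concentrations span several orders of magnitude, so the calibration is often carried out against the logarithm of the concentration; the same fitting step applies after that transformation.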

Nanomaterial-Enabled Electrochemical Detection of Pathogens
Electrochemical biosensors, depending on the capture agent, can be categorized into DNA-based sensors, enzyme-based sensors, aptamer-based sensors or immunosensors. As nanotechnology has progressed, nanoparticles have found many applications in electrochemical sensors (Anon 2010). Nanoparticles are used as nanoprobes and transducers, and they play a major role in signal enhancement and amplification. Nanomaterials have advanced electrochemical studies by enabling a faster signal response in the electron transfer of biomolecules and electrochemical biosensors with higher specificity (Journal n.d.). Most often, the nanomaterial in an electrochemical nanosensor functions as a selective template for immobilization of target molecules and for signal transduction with bio-barcode DNA or aptamer (Fig. 11.1), antibody-functionalized NPs, enzyme labels and antibody labels (Amiri et al. 2018). The nanomaterials can be used either as the transducer platform or as the probe for the electrochemical detection. Semiconductor quantum dots, carbon-based nanomaterials (Muniandy et al. 2019a) and metal nanoparticles are promising materials for electrochemical biosensors (Shur 2008). Recently, graphene oxide (GO)/reduced graphene oxide (Fig. 11.1) has been in huge demand for the development of label-free biosensors (Muniandy et al. 2019b). GO nanosheets contain abundant surface oxygen groups, including hydroxyl, epoxide, carbonyl and carboxyl groups, which provide an array of reaction sites for interaction with other nanomaterials, for instance, gold NPs and carbon nanotubes. For reduced graphene oxide (rGO), interaction with another nanomaterial increases its solubility in hydrophilic media. AuNPs are used more in single-label-dependent approaches, whereas AgNPs are compatible with label-free approaches for pathogen detection in electrochemical sensors. Label-free gold-silver core-shell nanoparticles have been reported for the detection of E. coli with a detection limit of up to 90 CFU/mL (Hazani et al. 2019).

Semiconductor QD-Based Detection of Pathogens
Photoelectrochemical sensors based on quantum dots have also been studied for biological and chemical detection. These sensors consist of quantum dots immobilized onto an electrode so that, upon illumination, a photocurrent is generated which depends on the type and concentration of the respective analyte in the immediate environment of the electrode (Yue et al. 2013). Semiconductor quantum dots also show stable electrochemical properties; therefore, they are used in electrochemical sensing methods to detect various organic and inorganic polluting agents (Sahu 2019). PbS, CdS and ZnS quantum dots can be used for the simultaneous electrochemical detection of pathogens (Samples 2017; Pedrero et al. 2017). Thus, the integration of quantum dots in electrochemical studies has very promising real-life applications.
As listed in Table 11.1, Shi et al. (2018) reported the detection of E. coli without culture enrichment and without any sample processing. The detection limit obtained was less than 10 CFU/mL. They used cyclic voltammetry for enzymatic detection on a glassy carbon electrode. The total assay time was 84 min. Chan et al. developed an impedimetric immunosensor for detection of

Electrochemical Detection of Multiple Pathogens
Multiplexed electrochemical immunoassays have been developed with sensor miniaturization and automated detection, which gives them high sensitivity, low cost, low power requirements and high compatibility with advanced micromachining technologies. With electrochemical sensors, sensitive detection of multiple pathogens is possible. Simultaneous detection of multiple biological pathogens is an advantage in developing a sensor for early detection of pathogens. A microfluidic-based device was used for simultaneous detection of S. aureus, and the limit of detection was 100 CFU/mL. An electrochemical sensor was constructed for multiplexed detection of E. coli and S. aureus based on a 2 × 2 junction array formed with gold tungsten wires on single-walled carbon nanotube and polyethylenimine. The detection time is rapid, and the LODs for E. coli and S. aureus were 10 μL and 100 μL, respectively, and the limit of detection for E. coli was 100 CFU/mL (Yamada et al. 2016; Laczka and Doblin 2020). The antibodies used to capture pathogens were linked to streptavidin-coated magnetic beads (MB@SA). The prepared biosensor displayed excellent performance, and this method could be expanded readily for detecting other pathogenic bacteria and would be of great value for future applications in food safety. Ai et al. developed an electrochemical disinfection method for E. coli and S. aureus in drinking water based on a ferrocene-PAMAM-multiwalled carbon nanotube-chitosan nanocomposite-modified pyrolytic graphite electrode. A potential of 0.4 V was applied for 10 min, and the pathogens were killed. This confirmed that an electrochemical method for the disinfection of pathogens had been established. Multiple detection of pathogens can be performed with square wave voltammetry. Stripping peaks are observed with the use of the quantum dots CdS, ZnS and PbS (Shang et al. 2013). The virulent npcRNA genes used as targets for the detection and identification of V. cholerae, Salmonella sp. and Shigella sp. were VrrA, StyR-36 and CssrB, respectively. A multiplexed detection strategy was reported using the npcRNA target genes of these enteric bacteria. For multiplex pathogen detection, the LOD values were 51 aM (VC-PbS), 53 aM (SA-CdS) and 38 aM (SH-ZnS). The sensitivity for multiplex detection was comparable to that for single detection. The relative standard deviation (RSD) values for four repetitive measurements were 0.50-6.42% (Vijian et al. 2015).
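The relative standard deviation quoted above is a simple measure of repeatability. As a minimal sketch, assuming the usual definition (sample standard deviation divided by the mean, expressed in percent), it can be computed as follows. The replicate values and the function name are hypothetical and do not reproduce the data of the cited study.

```python
from statistics import mean, stdev


def relative_standard_deviation(values):
    """Return the RSD in percent: 100 * sample standard deviation / |mean|."""
    m = mean(values)
    if m == 0:
        raise ValueError("RSD is undefined when the mean is zero")
    return 100.0 * stdev(values) / abs(m)


# Four hypothetical repeat measurements of a stripping peak current (microamperes)
replicates = [10.0, 10.2, 9.8, 10.0]
rsd = relative_standard_deviation(replicates)
print(f"RSD = {rsd:.2f}%")
```

A low RSD across repeat measurements indicates that the sensor response is reproducible under the conditions tested.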
As shown in Fig. 11.2, Brandao et al. described the fabrication of a magnetoelectrochemical immunosensor for S. typhimurium. For comparison, both micro- and nano-sized magnetic beads were coated with monoclonal antibodies against S. typhimurium. A second, polyclonal anti-Salmonella antibody labelled with HRP was used as the electrochemical reporter. As shown in the figure, different strategies were applied for the electrochemical detection of Salmonella. Figure 11.2d shows simultaneous detection of three pathogens: Salmonella, E. coli and Campylobacter. By using square wave voltammetry, stripping peaks of Cd, Zn and Pb were observed. The immunosensor was fabricated by immobilizing a mixture of antibodies, against the three target bacteria, onto the surface of a multiwalled carbon nanotube-polyallylamine-modified SPE (MWCNT-PAH/SPE). The sandwich assay was carried out by adding three specific antibodies conjugated with different quantum dots (CdS, PbS and CuS for E. coli O157:H7, Campylobacter and Salmonella, respectively). After a dissolution step, the metallic component of the QDs was released and three stripping peaks were observed using square wave anodic stripping voltammetry (SWASV). The authors demonstrated that the MWCNT-PAH/SPE film enhanced the peak currents (compared to bare SPE and PAH/SPE) due to its particular electrical properties. The time of detection was about 4 h, and the detection limit of the assay was found to be 400 CFU/mL for Salmonella and Campylobacter and 800 CFU/mL for E. coli. The potential of this immunosensor for multiplexed analysis in food samples was proven by the authors by analysing fresh bovine milk spiked with high concentrations (10^4 CFU/mL) of the three target bacteria (Vijian et al. 2015).
Fig. 11.2 (caption fragment) (c) Salmonella detection with an ELIME (enzyme-linked immunomagnetic-electrochemical)-based sandwich assay that involves three sequential procedures: washing-blocking-coating, two sequential incubations for the immuno-recognition events, and the electrochemical detection using eight-well/SPE strips (Fabiani et al.)
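The decoding step of such a quantum dot-labelled sandwich assay can be illustrated with a short Python sketch. It only uses the label assignment stated above (CdS, PbS and CuS for E. coli O157:H7, Campylobacter and Salmonella, respectively); the threshold, the example peak currents and the function name are hypothetical, and a real assay would rely on calibrated, background-corrected peak currents.

```python
# Label assignment as described in the text: the metal released from each
# quantum dot gives a distinct stripping peak for one target pathogen.
QD_LABELS = {
    "Cd": "E. coli O157:H7",  # CdS-conjugated antibody
    "Pb": "Campylobacter",    # PbS-conjugated antibody
    "Cu": "Salmonella",       # CuS-conjugated antibody
}


def pathogens_from_peaks(peak_currents, threshold):
    """List the pathogens whose metal stripping peak exceeds the threshold.

    peak_currents: dict mapping a metal symbol to its peak current.
    threshold: hypothetical cut-off in the same units as the currents.
    """
    detected = []
    for metal, pathogen in QD_LABELS.items():
        if peak_currents.get(metal, 0.0) > threshold:
            detected.append(pathogen)
    return detected


# Hypothetical peak currents (microamperes) from one SWASV scan
sample = {"Cd": 1.2, "Pb": 0.05, "Cu": 0.8}
print(pathogens_from_peaks(sample, threshold=0.1))
```

Because each label dissolves to a different metal ion, a single voltammetric scan can report on all three targets at once, which is the basis of the multiplexing described above.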

Conclusion
Various nanoparticles such as carbon-based materials, semiconductor quantum dots and metal nanoparticles have potential applications in the field of electrochemical biosensors. In the detection of pathogens, various modification strategies, bioconjugation methods and fabrication processes are used according to the mode of detection and the target pathogen. Current research in the field of electrochemical biosensors and nanotechnology is evolving towards miniaturized, portable and low-cost rapid biosensors for detection of pathogens within a few hours and for point-of-care diagnostics.

Future Perspective
A rapid, sensitive and low-cost detection method for pathogens has huge significance in terms of early diagnosis. Lab-on-chip devices using electrochemical and microfluidic systems for the detection of pathogens will have greater potential for practical use in the near future. Owing to the integration of nanotechnology with electrochemical biosensors, the development of next-generation biosensors that meet the challenges of pathogen detection seems possible in the near future. Electrochemical biosensors have been developed for detection of pathogens, and multiplexed detection is also possible with them. Electrochemical biosensors based on nucleic acids or aptamers display high sensitivity and low detection limits; however, their stability and accuracy have to be improved. Electrochemical immunosensors, which rely on the binding of antigen and antibody, are widely used in the detection of pathogens. These biosensors have high accuracy. A further utility of nanomaterial-enabled electrochemical biosensors is that multiple pathogens can be detected simultaneously.
Point-of-care diagnosis for bacterial pathogens can be achieved with the advent of nanomaterial-based electrochemical biosensors owing to their physicochemical and electronic properties.
system through chemical synthesis method, the higher rate of toxicity of such chemicals and also the free drugs have been of great concern. A clear requirement for a delivery system that stems from a biological source, that could nullify the toxicity issue and that could increase the bioavailability of the drug while maintaining its therapeutic value has thus been very evident in the recent past. Biopolymers have emerged as a candidate with huge potential to be used as an efficient element in the formulation of nano-drug delivery carriers that transport the desired therapeutic drug with significant efficacy. These bio-nanoparticles have the advantages of increased stability in biological fluids, higher cellular uptake, a controlled delivery mechanism, and a targeted distribution profile to provide a systemic treatment with least or no toxicity. With their enormous potential, these biopolymeric nanocarriers have thus been in the spotlight in the formulation of drug delivery systems for countless healthcare applications.

Drug Delivery Systems
Drug delivery is a multidisciplinary arena that draws on knowledge from the fields of pharmaceutical sciences, chemistry, medicine, and engineering. Briefly, it is the methodology or procedure of administering a pharmaceutical compound to achieve the desired therapeutic effect in animals or humans. Any successful medicinal regime depends on the utilization of the pharmaceutically active agent (drugs or therapeutics). These agents are not necessarily effective by themselves, and their efficacy depends on how they are administered. Drug administration has a significant effect on overall pharmacokinetics, absorption, distribution, metabolism, excretion, therapeutic efficacy, and toxicity (Tibbitt et al. 2016).
In an ideal scenario, drugs applied in vivo precisely target the disease-causing organs, tissues, or cells of interest and maintain the therapeutic concentration for the desired time duration. In reality, however, drug transportation is a complex phenomenon, and controlling it is even more difficult to achieve. In conventional delivery methods, the drug is either administered orally in the form of a capsule or tablet or given intravenously to the hepatic system via injections. There is another mode of administration where the drug is given locally to the eyes, lungs, or body cavities for local effects. Systemic administration of therapeutic drugs by any of the above-mentioned routes can be challenging where the barrier characteristics of tissues are strong or where tissues are poorly vascularized (Lavik et al. 2012). With the conventional system, poor organ-/cell-specific targeting, an erratic drug release rate, and stability issues hinder the overall efficacy of the delivery mechanism (Fig. 12.1).
For a number of novel cell-specific drugs, advanced drug delivery systems (DDS) that overcome the limitations of conventional drug administration mentioned above have been designed with the help of a wide spectrum of interdisciplinary approaches combining biological, material, and chemical strategies.
To surpass such obstacles, there is a prime requirement for efficient non-conventional drug delivery mechanisms that bypass first-pass metabolism, minimize the administered drug quantity, and have high location-specific potential with higher therapeutic efficiency. In this context, various smarter drug delivery systems have been envisaged and have slowly entered the research domain in the past two decades. The motivation of all smart drug delivery systems is to optimize the therapeutic effect by regulating the time, the drug flow rate, and hence the quantity of drug administered. This is achieved either by developing a new, safer-to-use drug with optimized parameters or by modifying existing drugs to make them safer and better to use. Delivery of these drugs can be achieved using various modes such as intravenous or oral administration (in the form of tablets, liquids, capsules, ointments, aerosols, and so on).
In the past 10-15 years, extensive research and development has taken place in the field of novel drug delivery systems. These advanced systems can be defined as technologies that are engineered to improve the efficacy of therapeutics in vivo with increased stability, minimal degradation and loss, increased bioavailability, controlled pharmacokinetics and pharmacodynamics, and a well-defined drug release profile, localizing the therapeutic effect in a highly specific fashion with minimal or no toxicity or immunogenicity (Bhagwat and Vaidhya 2013). Broadly, these DDS fall into two focus areas, namely targeted systems and systems providing controlled drug release, each with multifaceted advantages (Fig. 12.2).
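The notion of a well-defined, controlled release profile can be made concrete with two idealized textbook kinetic models. The Python sketch below is illustrative only: it assumes zero-order release (a constant release rate) and first-order release (a rate proportional to the drug remaining in the carrier). The dose, the rate constants and the function names are hypothetical and are not taken from the works cited in this chapter.

```python
import math


def zero_order_release(t, dose, k0):
    """Cumulative amount released at time t at a constant rate k0, capped at the dose."""
    return min(dose, k0 * t)


def first_order_release(t, dose, k1):
    """Cumulative amount released at time t when the rate is proportional to the drug left."""
    return dose * (1.0 - math.exp(-k1 * t))


dose = 100.0  # mg loaded in the carrier (hypothetical)
k0 = 10.0     # mg/h, zero-order rate constant (hypothetical)
k1 = 0.5      # 1/h, first-order rate constant (hypothetical)

for t in (0, 2, 4, 8, 12):
    z = zero_order_release(t, dose, k0)
    f = first_order_release(t, dose, k1)
    print(f"t = {t:>2} h  zero-order: {z:6.1f} mg  first-order: {f:6.1f} mg")
```

A zero-order profile releases the same amount per unit time until the carrier is exhausted, whereas a first-order profile releases quickly at the start and tails off; real carriers often show more complex behaviour than either idealization.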

Limitations of Conventional Dosage
General:

Nanoparticle-Mediated Drug Delivery System (DDS)
By definition, a nanoparticle-mediated drug delivery system applies nanotechnology in the field of nanomedicine to deliver drugs to the patient's body, with widespread applications to check, overhaul, and govern biological systems using advanced strategies and assemblies in the nano dimension range (1-100 nm) (Parveen et al. 2012; Sarkar et al. 2011; Ranghar et al. 2014).
With the constant demand for more controlled, site-specific, and less toxic drugs, new systems that offer interesting delivery routes are being invented. Various drug targeting techniques have been in use for several decades, e.g. niosome nanoparticles, resealed erythrocytes, microspheres, monoclonal antibodies, liposomes, magnetic microparticles, and so on (Parashar et al. 2013; Kumar et al. 2011; Jain et al. 2010; Shingade et al. 2012; Bala et al. 2014; Guo et al. 2010; Miri et al. 2013). Out of all these, the nanoparticle-mediated drug delivery system has emerged as a forefront player. Nanoparticles (nanospheres and nanocapsules) are generally in the solid state, either crystalline or amorphous. These nanostructured components adsorb and/or encapsulate the required medicinal moiety, shielding it against the harsh biochemical and enzymatic conditions inside the human body. In recent years, various engineered or synthesized biodegradable biopolymer-based nanoconjugate formulations have attracted immense attention as potential delivery machinery because of their wide-spectrum advantages (Wu et al. 2013a; Hong et al. 2013; Ita 2015). Such nanoparticles are fabricated or synthesized to accumulate specifically in (or act upon) target cells, refining their efficacy (Hudson and Margaritis 2014). The nanotechnology-based market has been reported to be worth almost $1 trillion of the global marketplace (Banik and Brown 2014; Ninganagouda 2015). This gives a clear idea of the impact of these modes of DDS. Innovative delivery systems with pharmaceutically beneficial effects, fabricated at the nanometre scale, tend to have altered and valuable physicochemical characteristics (Banik and Brown 2014), very different from those of the initial substance of the same conformation. Currently, various forms of these nanoformulations are in use, like nanowires (Zhang et al. 2014; Alivov et al. 2014; Bataille et al. 1982), nanoshells (Ranghar et al. 2014; Lai et al. 2013; Amin et al. 2009), quantum dots, nanopores (Bhatt and Aqil 2010; Wu et al. 2013b; Stevenson et al. 2012; Yang et al. 2012), gold nanoparticles, etc. (Bhagwat and Vaidhya 2013). At the nanoscale, the high surface-to-volume ratio, the ability to tune surface properties, and the wide-spectrum or multifunctional character create promising prospects for these unique nanomaterials in the pharmaceutical field.
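The high surface-to-volume ratio mentioned above follows directly from geometry. As a minimal sketch, assuming an idealized spherical particle (real nanocarriers may deviate from this shape), the ratio equals 6/d for a particle of diameter d, so it grows as the particle shrinks. The function name and the chosen diameters are illustrative.

```python
import math


def sphere_surface_to_volume(diameter_nm):
    """Surface-to-volume ratio of a sphere in nm^-1 (algebraically equal to 6 / diameter)."""
    radius = diameter_nm / 2.0
    area = 4.0 * math.pi * radius ** 2
    volume = (4.0 / 3.0) * math.pi * radius ** 3
    return area / volume


# Compare particles across and beyond the 1-100 nm range quoted in the text
for d in (1, 10, 100, 1000):
    print(f"d = {d:>4} nm  surface/volume = {sphere_surface_to_volume(d):.3f} nm^-1")
```

Under this assumption, a 10 nm particle exposes one hundred times more surface per unit volume than a 1000 nm particle, which is why surface functionalization has such a strong influence on nanocarrier behaviour.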

Advantages of Novel Drug Delivery System
Merits of nanoparticle-based drug delivery system:
• Increased bioavailability.
• Increased efficacy: specific targeting of the drug molecule to the affected tissue or organ, leaving normal tissues unaffected.
• Biocompatibility: use of biopolymers, less toxic chemicals, and natural resource-based systems.
• Sustenance of the total amount of drug administered over the dose periods.
• Better treatment probability for many chronic illnesses, e.g. arthritis, cancer, asthma, etc.
• Reduction in the occurrence of undesired systemic side effects related to high blood plasma drug concentration.
• Prevention of first-pass metabolism and gastrointestinal tract degradation.
• Reduction in the total amount of drug administered over the treatment period, reducing the occurrence of systemic and local side effects.
• Versatility: pH-dependent, charge-based, and size-dependent systems release the drug according to the body's requirement.
• Better patient compliance, as fewer and less frequent doses are required to maintain the optimum therapeutic response.

Biopolymer-Based Drug Delivery System
Even though inorganic, organic, or organic/inorganic hybrid materials have been used for nanoparticle fabrication, polymeric nanoparticles have also been exploited in various therapeutic applications in healthcare as biomaterials for delivery carriers of therapeutic molecules (drugs and genes) (Nitta and Numata 2013). Polymeric materials, because of their wide range of benefits and high efficacy, have been considerably utilized in the nanomedicine sector for numerous pharmaceutical drug delivery system preparations. Biopolymer-based drug delivery systems offer several benefits over metal- and ceramic-based systems. Properties like biodegradability, flexibility in synthesis methods, the potential for modification as per requirement, and the ability to be fabricated into different morphological entities like fibres, films, particles, and gels make polymer-based models highly desirable for delivery system formulation. The charge, material constituent, functional groups, molecular weight, structural factors, etc. influence the properties of the formed systems, such as release profile, degradation rate, mechanical parameters, cellular internalization, and overall stability. These polymer-based nanoparticles are categorized as either naturally occurring or synthetically fabricated polymer compounds. Even though the properties (chemical, conformational, and functional) of many polymers from synthetic sources like poly(glycolic acid), the copolymer poly(lactide-co-glycolide), polyurethanes, polyesters, poly(lactic acid), and polyethylene glycol (PEG) are easily controllable with minimal variations for nanomaterial synthesis, the drawbacks of these polymers include lesser biocompatibility and additional inflammatory, chemically toxic, and immunogenic reactions. Natural polymers or biopolymers, on the other hand, are raw materials that occur naturally in the biological milieu; examples include pullulan, chitosan, heparin, collagen, albumin, etc. The main attractive properties of these materials are that they are biocompatible and enzymatically biodegradable. Because of their capacity for surface alteration, suitability for targeted drug delivery, pharmacokinetic control, and lower toxicity, these biopolymer drug delivery systems are well sought after in therapeutics.
Biopolymeric nanoparticles are usually prepared through techniques like solvent evaporation, diffusion, spontaneous emulsification, emulsification, and polymerization methods. These synthesis procedures significantly impact the properties of the prepared nanoconjugates, especially their basic and advanced integral biological characteristics. Thus, for successful circulation, internalization, and distribution, along with persistence in the host system, optimization of these biopolymer-based systems is equally important. For systemic circulation and target-specific drug delivery, various surface modifications and functionalizations are carried out to increase the stability, specificity, and pharmacokinetic effect of the nanoconjugate.

Protein Nanoparticles
Gelatin and albumin were the first naturally occurring proteins used for the synthesis of nanocarriers. These systems are less immunogenic, non-toxic, highly stable, biodegradable, and easy to fabricate, making them potential candidates for biopolymeric nanoparticle synthesis. Also, the defined primary structure of these biological moieties makes it easier to modify their surface and to design drug conjugates as per requirements (Nitta and Numata 2013; Joye and McClements 2014).
Albumin: Albumin, a blood plasma protein, has been a remarkable molecule in terms of drug delivery system formulation because of its inherent properties of being biodegradable, less toxic, and biocompatible. It not only helps in transportation, metabolism, and distribution of various endogenous and exogenous biological moieties but also has immense antioxidant properties, protecting the cells from the toxic effects of chemicals and free radicals. These unique properties have made it a desired protein for nanotechnology-based drug delivery systems. Albumin nanoparticles can be prepared by pH-induced dissolution, heat treatment emulsification, chemical treatment emulsification, or self-assembly methods (Hudson and Margaritis 2014). Human serum albumin (HSA) and bovine serum albumin (BSA) have both been extensively exploited in the fabrication of nanotechnology-based carrier systems for antibodies, antiviral drugs, and anticancer drugs.
Collagen: Collagen is the most abundant constituent protein in a mammalian system and a structural building component of all vertebrates. It has high biodegradability, biocompatibility, and bioavailability, with ample scope for surface modifications, making it a preferred material for the fabrication of biopolymer-based nanocarriers (Banik and Brown 2014; Memon et al. 2013). These nanocarriers are inherently thermally stable, and once taken up by the reticuloendothelial system, they enable easy uptake of drug molecules into the cells.
Gelatin: Partial hydrolysis of collagen and heat dissolution give rise to the naturally water-soluble biological macromolecules named gelatin. These non-toxic, non-carcinogenic molecules, in both type 1 and type 2 forms, show properties like non-irritability, biocompatibility, low immunogenicity and antigenicity, and increased biodegradability. As gelatin contains multiple functional groups, it offers wide possibilities for modification (cross-linking and derivatization) as per requirement (Joye and McClements 2014; Sanjay et al. 2012).
Silk proteins (sericin and fibroin): The structural components of silk fibre are the gum-like sticky protein sericin and the hydrophobic glycoprotein fibroin, which sericin envelops. Fibroin is less immunogenic, histocompatible, and non-toxic. Its properties like increased surface area, high porosity, biocompatibility, and biodegradability make it a preferred choice for application in biomaterial-based drug delivery systems. On the other hand, the hydrophilic glycoprotein sericin has inherent antioxidant properties along with those of protein nanoparticles. It also helps in cell growth and wound healing without eliciting any immunogenic response (Banik and Brown 2014; Koutsopoulos 2012).
Keratin: These cysteine-rich structural biological moieties contain disulphide bonds, giving them increased mechanical strength. These properties are used in the formation of various coating materials in biomaterial and tissue engineering applications.

Polysaccharide Nanoparticles
Polysaccharide-based nanocarriers have received increased attention in the past few years because of their improved properties and ample scope for modification when formulating nano-based drug constituents for delivering proteins, peptides, and nucleic acids in vivo.
nanoconjugates). However, the high water solubility of pullulan restricts its use in drug delivery applications. Generally, pullulan in its hydrophobized form is utilized as a drug delivery transporter with the potential to build stable nanoparticles in colloidal suspension. These molecules can form films on surfaces, which gives them a distinct potential to capture various biomolecules. Their shelf life is high because they act as an oxygen barrier. Pullulan has been exploited for its affinity towards specific organs like the liver, where it has been found to accumulate more than other tested polymeric compounds. This property can be used extensively for medicinal purposes in site-specific drug delivery systems.
γ-Polyglutamic acid (PGA): PGA is a polyamino acid composed of repeating blocks of D- and L-glutamic acid and is inherently water-soluble, biodegradable, non-immunogenic, and biocompatible. Approaches like emulsification-diafiltration, self-assembly, and ionic gelation are used to make biopolymer-based nano-drug carriers with PGA. These biopolymers have been used for delivering drug components like DNA, proteins, and biopharmaceuticals for various diseases like malaria (Cherif et al. 2011).

Synthesis of Biopolymeric Nanocarriers
Broadly, the methodology of synthesizing biopolymeric nanocarriers can be divided into two major groups: the top-down approach and the bottom-up approach. In the top-down approach, a preformed biopolymer solution is taken as the starting material, and subsequent processing breaks it down into nanoscale moieties. In the bottom-up strategy, monomers self-assemble and thereby form nanoscopic aggregates.
Emulsification is considered the most widely applied strategy in the former category; it involves mixing two immiscible liquids in the presence of surfactant compounds. Nanoscale formulations can be achieved by high-shear stirring and homogenization, by ultrasonication, or through simple (water-in-oil) or double (water-in-oil-in-water) emulsions. With crosslinking, nanoparticles of water-soluble biopolymers can be formed in this way. The double emulsion, on the other hand, gives rise to nanocapsules. Although these solvent-based methods have great potential, some of the components involved denature when parameters are altered, which can hinder the whole process. Here, the extrusion technique, also in the top-down category, comes in handy. In this process, a biopolymer-drug mixture is injected through a nozzle into a separate solution in which a change in parameters such as temperature or viscosity causes the polymer to aggregate into nanoparticles. The size of the resulting nanocarriers can be controlled by altering parameters such as pH, temperature, and viscosity.
The bottom-up approach, on the other hand, is based on the self-assembly of biopolymeric monomers under the effect of parameters such as pH, concentration, ionic strength, and temperature. Coacervation, nanoprecipitation, and inclusion complexation are a few of the techniques that play a vital role.
In coacervation, biopolymers interact with each other in a liquid-liquid phase separation, forming a separate phase that encapsulates the active component (the drug of interest). Hydrophobic forces, hydrogen bonding, and electrostatic interactions contribute to this kind of strategy. Depending on the types of polymer used, the process is called simple or complex coacervation.
Nanoprecipitation is a unique strategy for hydrophobic biopolymers where water-miscible solvents like ethanol and acetone are used to dissolve the biopolymers. When these polymers are added to aqueous solution, the organic solvent diffuses to form biopolymeric aggregates. After the whole process is finished, the solvent is removed through evaporation. Thus, the strategy concludes with synthesized nanoparticles in aqueous phase.
The inclusion complexation process requires a supramolecular aggregate with cavities that host the guest molecule to be carried. Hydrogen bonding, hydrophobic interactions, van der Waals forces, etc. play a crucial role in these processes.
Even though various techniques are involved in synthesizing these biopolymeric nanocarriers, drying is common to almost all of them, because the solution form makes the entire delivery system prone to drug leakage and hydrolysis. The dried form helps the nanocarriers retain their inherent stability. Drying can be achieved by either spray-drying or freeze-drying. In spray-drying, the prepared nanoparticle solution is injected through a hot air stream, inducing solvent evaporation. For heat-sensitive compounds, freeze-drying, a multistep dehydration technique, is preferred.

Nanoparticle-Based Smart Drug Delivery Applications
The whole purpose of synthesizing these nanocarriers is to make them efficient and smart, with full control over drug release profiles. They are designed to overcome challenges such as crossing anatomical barriers and maintaining effective drug concentrations over the desired period, releasing the drug of interest right at the site of interest. Release occurs either by natural diffusion or when triggered by parameters such as pH, temperature, radiation, or ultrasound. Overall, these drug carriers are fabricated to protect, transport, and deliver the drug of interest in a highly specific, efficient, and controlled manner, which is what makes them 'smart' in the true sense.
Cancer therapy: Cancer has been at the centre of major research endeavours. Even after so much activity, ample shortcomings remain in cancer treatment. Current strategies such as chemotherapy, radiotherapy, and surgery, alone or in combination, are not able to provide an altogether safe treatment. Chemotherapy, with its highly toxic effects, damages healthy cells while targeting cancerous ones and thus creates a cascade of cell-damaging pathways that ultimately deteriorates health in the long term. Surgery, on the other hand, is often ineffective because it cannot remove all minute traces of cancerous cells deep inside some organs or sites, leaving a high possibility of recurrence. Thus, the main aim of smart biopolymer nanocarriers is to deliver the cancer drug to the target site with the minimum amount of drug required to achieve the highest efficacy, selectively targeting only cancerous cells and leaving healthy cells and sites intact and unharmed. This can be achieved either by active targeting, in which nanoparticles are functionalized with specific peptides, proteins, or antibodies based on tumour receptors, or by passive targeting, in which passive diffusion delivers drugs through the interstitial space. PCL-PEG and PLGA polymers are a few examples that have been used to deliver anticancer drugs like CPT, PTX, and DOX to the target site (Calzoni et al. 2019; Mallakpour and Behranvand 2016). Immunotherapy, on the other hand, uses molecules to boost the immune system, enabling the body to detect and eliminate cancer cells with optimum efficacy.

Future of Biopolymeric Nanocarriers
The basic advantages of a biopolymeric nanocarrier are that it is biodegradable, biocompatible, and non-toxic and that it has an optimum biodistribution profile, with wide opportunities for modification to achieve high precision in drug loading and delivery to the target site. Biopolymer-based drug carriers have been used widely, from transdermal to organ-specific delivery, from wound healing to cancer treatment, and from minor infections to life-threatening diseases (Sarkar et al. 2011; Singh and Lillard 2009; Hans and Lowman 2006). Advancements in this field have helped to target infections such as human immunodeficiency virus type 1 (HIV-1) and herpes simplex virus type 1 (HSV-1) and to deliver antifungal agents (Gopi and Amalraj 2016). Even though there has been immense development and intense research activity in these areas, many obstacles and limitations remain for biopolymer-based delivery systems. More research focused on locating the exact functionalization technique, standardizing protocols for surface modification, optimizing experimental parameters, choosing an appropriate carrier candidate (proteins/polysaccharides), and understanding the interaction between drug and biopolymer for optimum binding and release will help in formulating smarter biopolymer-based drug carrier systems that target diseases with the highest efficacy and lowest side effects. Knowledge of chemical science, material science, physiological studies, and nanoscience would together help bring biopolymer-based drug delivery systems to the forefront in dealing with complicated and life-threatening ailments.
males had higher values than subject females, and these increased after treatment in ECOG3 and ECOG4 among the significant HRV measures; the deviation by gender is represented in Fig. 13.1.
Subject males had lower values of SD2 in ECOG3 and ECOG4 and of L mean in ECOG3 pre- and post-treatment. Control males had lower values of all HRV measures than control females except for the mHR, LF/HF, α1, α2, ApEn, and SampEn features. Using one-way ANOVA with p = 0.05 between the treatment and no-treatment groups, mRR and RMSSD were found to be significant. In males who had undergone treatment, SDNN, STDHR, RMSSD, NN50, pNN50, and TiNN were significant. In males who had not undergone treatment, SDNN, RMSSD, NN50, pNN50, and TiNN were significant. In females who had undergone treatment, mRR, SDNN, mHR, STDHR, RMSSD, NN50, pNN50, TI, and TiNN were significant. In females who had not undergone treatment, SDNN, STDHR, RMSSD, NN50, pNN50, and TiNN were significant. Using Student's t-test, LF/HF was significant between the ECOG2 treatment and no-treatment groups, whereas NN50, SD1, SD1/SD2, α1, and α2 were significant between the ECOG3 treatment and no-treatment groups. Between the ECOG4 treatment and no-treatment groups, SD1/SD2 and α2 were significant. Between ECOG3 males with and without treatment, NN50, LF, LF/HF, SD1, ApEn, and SampEn were significant. Between the ECOG3 male and female treatment groups, SD1/SD2 was significant. Between ECOG4 males with and without treatment, α2 was significant. Between ECOG4 females with and without treatment, HF, SD2, and SD1/SD2 were significant. Between ECOG4 males with treatment and ECOG4 females with treatment, α2 was significant. Between ECOG4 males without treatment and ECOG4 females without treatment, HF, LF/HF, and SD1/SD2 were significant.
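Several of the time-domain measures named above (mRR, mHR, SDNN, RMSSD, NN50, pNN50) follow standard definitions and can be computed directly from a series of RR intervals. The following is a minimal sketch, assuming artifact-free RR intervals in milliseconds; the function name and the example series are illustrative and are not taken from the study, and mHR is computed here as the mean of the instantaneous heart rates, which is one of several conventions.

```python
import math
from statistics import mean, stdev


def time_domain_hrv(rr_ms):
    """Return common time-domain HRV measures for RR intervals given in ms."""
    if len(rr_ms) < 3:
        raise ValueError("need at least three RR intervals")
    # Successive differences between adjacent RR intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    # NN50: number of successive differences larger than 50 ms in magnitude.
    nn50 = sum(1 for d in diffs if abs(d) > 50)
    return {
        "mRR": mean(rr_ms),                              # mean RR interval (ms)
        "mHR": mean(60000.0 / rr for rr in rr_ms),       # mean heart rate (beats/min)
        "SDNN": stdev(rr_ms),                            # sample standard deviation (ms)
        "RMSSD": math.sqrt(mean(d * d for d in diffs)),  # root mean square of differences
        "NN50": nn50,
        "pNN50": 100.0 * nn50 / len(diffs),              # NN50 as a percentage
    }


# Made-up RR series (ms), for illustration only; not data from the study.
example_rr = [800, 810, 790, 870, 860, 800]
measures = time_domain_hrv(example_rr)
```

Frequency-domain (LF, HF, LF/HF) and nonlinear measures (SD1, SD2, α1, α2, ApEn, SampEn) need additional processing and are not covered by this sketch.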

Discussion
After the treatment, HRV measures increased, indicating that sympathetic activity was higher before the treatment and decreased afterwards while parasympathetic activity increased. Considering gender as a confounder, male subjects had higher values of HRV measures than control females, and these increased in ECOG3 and ECOG4. In contrast, control males exhibited decreased HRV measures compared with their female counterparts. Subjects undergoing a combination of chemotherapy and radiotherapy had lower HRV measures than those undergoing only chemotherapy. Healthy subjects have higher HRV than cancer patients (De Couck et al. 2013). The present findings are in line with findings in alcoholic liver cirrhosis, which showed lowered values of all time domain and spectral analysis parameters, including SDNN, RMSSD, and pNN50, in comparison with controls. These values indicate vagal impairment and sympathetic predominance (Fleisher et al. 2000). The subjects had a clinical history of loss of appetite, abdominal pain, diarrhea, ascites, vomiting, constipation, edema, and cachexia. Chemotherapy has been reported to reduce HRV measures (Fadul et al. 2010; Fagundes et al. 2011; Hirvonen et al. 1989; Salminen et al. 2003; Brouwer et al. 2006). However, Salminen gave eight cycles of combination chemotherapy to cancer patients and found neither cardiotoxicity nor any HRV change (Salminen et al. 2003). Brouwer analyzed HRV 22 years after treatment in malignant bone tumor patients treated with doxorubicin and found decreased HRV compared with healthy subjects (Brouwer et al. 2006). Moreover, the induction treatment of acute lymphoblastic leukemia alters the vagal chronotropic control of the heart because of the injection of vincristine. The consequences are similar to the pattern observed in vagal blockade and in diabetic cardioneuropathy and vanish slowly after the treatment is over (Hirvonen et al. 1989).
Previous work also suggested that after non-Hodgkin's lymphoma patients were treated with doxorubicin, there was attenuation of the sympathetic tone that had predominated before the treatment (Bruchfeld et al. 2010). The current HRV findings therefore indicate that sympathetic activity is enhanced and parasympathetic activity slows down at times of stress. The sinus node in the heart is affected by the sympathetic and parasympathetic branches of the autonomic nervous system, which in turn affects the heart rate (Nousiainen et al. 2001). In acute myeloblastic leukemia patients in critical and poor clinical states, HRV was found to be reduced (Tiller et al. 1996). HRV was also found to be reduced in leukemia patients in whom sympathetic activity increased because of a reduction in parasympathetic activity (Drzewoski and Zawadzka 1992).
With the application of treatment (chemotherapy or radiotherapy), LF and LF/HF decreased and HF increased in ECOG4, indicating parasympathetic dominance. Sympathetic dominance occurred with disease, but gradually, with the treatment given, parasympathetic dominance took over (Nevruz et al. 2007). The current study did not include subjects who had undergone surgery. Patients who have undergone surgery and are severely sick have ANS dysfunction to some extent, although its degree remains unknown (Shukla and Aggarwal 2017b; Ushiyama et al. 2008; Laitio et al. 2007; Gang and Malik 2002). The current study excluded patients with diabetes, mental illness, hypertension, cardiac anomalies, and infectious disease in order to obtain exclusive information on the effect of treatment on lung cancer using HRV.

Conclusion
Increased HRV measures after the treatment lead to parasympathetic dominance and an improvement in the PS of patients. These objective findings can help clinicians to evaluate the PS and to improve the quality of life of their patients.

High Rates of PLMD Have Been Found in People with
• Spinal cord injury
• Spinal cord tumors
• Multiple system atrophy
• Sleep-related eating disorder (SRED)
• Bruxism (American academy of sleep medicine foundation 1998)

According to Age Group
PLMD occurs in both children and adults and increases with age; by the age of 60 years, nearly 34% of the population is affected. It is common in the 5 to 9 years age group and increases again after 40 years, according to Table 14.1 (Trenkwalder et al. 2006). A higher risk of PLMD is found in shift workers, people who snore, alcohol consumers (Wolters 2007), coffee drinkers, and people under excessive stress or taking hypnotics (Beena et al. 2015).

Comparison with Other Sleep Disorder
A survey comparing RLS (restless leg syndrome)/PLMD with other sleep disorders, based on the NHP (National Health Portal), found that the number of patients suffering from insomnia was almost double the number with RLS/PLMD. Few patients were found to have no disease. Of all the disorders compared, OSA (obstructive sleep apnea) affected the greatest number of patients. PLMD is closely related to restless leg syndrome (RLS) (Leonard 2017). A study of 133 people found that 80% of those with RLS also had PLMD, but the converse was not found. Patients diagnosed with PLMD mostly have RLS (symptoms include tingling, pulling, pain, and involuntary movement during the daytime). Roughly 0.9-8.3% of the Asian population is affected by RLS. About 20% of people suffering from insomnia have PLMD. The prevalence of PLMD in the general population is 3.9%, that is, nearly 3 crore people, and it is more prevalent in women. In the pediatric population, it is 11.9%, and it is highly associated with obstructive sleep apnea (OSA) and attention deficit hyperactivity disorder (ADHD) (Restless Leg Syndrome Fact Sheet 2017).

Review of Patients Diagnosed According to Countries
A survey conducted by the Delhi Sleep Laboratory on 1000 uremia patients found that nearly 708 of them had PLMD, as shown in Table 14.2 (David et al. 2007). Furthermore, another rigorous survey of PLMS, conducted using the ICSD (International Classification of Sleep Disorders) criteria, suggests the following: a PLMS index of 5-10 kicks per hour is considered mild, a PLMS index of 10-25 kicks per hour is categorised as moderate, and a PLMS index of >25 kicks per hour is stated as severe, as shown in Table 14.3.
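The severity bands quoted above can be expressed as a small lookup. This is only a sketch of the stated thresholds; the function name is hypothetical, and because the text lists 10 kicks per hour in both the mild and moderate ranges, assigning exactly 10 to "moderate" is an assumption made here.

```python
def classify_plms_index(kicks_per_hour):
    """Map a PLMS index (kicks per hour) to the severity bands quoted in the text."""
    if kicks_per_hour < 0:
        raise ValueError("PLMS index cannot be negative")
    if kicks_per_hour < 5:
        return "below mild threshold"
    if kicks_per_hour < 10:
        return "mild"
    # Assumption: the shared boundary value 10 is treated as moderate.
    if kicks_per_hour <= 25:
        return "moderate"
    return "severe"


for index in (3, 7, 10, 25, 26):
    print(index, classify_plms_index(index))
```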

Polysomnography Laboratory
The Indian Society for Sleep Research (IISR) initially opened more than 300 polysomnography centers across Andhra Pradesh, Delhi, and Mumbai. The number has since increased to more than 500 across India. At present, 100 centers operate in Ahmedabad and Delhi and more than that in Mumbai. Our team collaborated with SPARSH hospital in Ahmedabad, where we reviewed multiple patients throughout the whole night. We found that a patient must sleep with various sensors attached over the whole body for a two-night sleep study, as shown in Fig. 14.2, for the assessment of any abnormal disorder occurring during the night. At the end of the two-night sleep study, patients receive a report, as shown in Fig. 14.3, which states the type of sleep disorder observed during the study. Moreover, all these centers are limited to diagnosis, and no therapy or treatment is presently provided for any of these disorders. Our major motive is therefore to provide therapy for the same.

Patient Data Reviewed at Sparsh Hospital
During the sleep study, sample data of the patients were taken at Sparsh Hospital, as stated in Table 14.4. These data help in analyzing the severity of periodic limb movement disorder in certain sets of patients.
These are a few of the drugs used to treat the respective diseases (Picchietti and Walters 1999), but their side effects cause periodic limb movement disorder, and many patients are affected each year because of these drugs.

Limitations
• The abovementioned drugs, if taken by the patient for a longer period, risk impairing mental activity and thinking skills or causing memory loss.
• The average cost of these drugs per pack is nearly 2304 rupees.

Other Medications
PLMD can be managed with home remedies such as a hot bath and leg massage, but these do not reliably alleviate muscle pain or stop involuntary movement. Therefore, most patients visit doctors, who prescribe drugs that have many side effects, as mentioned above.

Need of Adopting New Technique over Present Medications
If a person with PLMD sleeps beside a debilitated or elderly person, the movements could harm that person. Patients with PLMD generally find that their muscles are strained or cramped in the morning, and in some cases, the disorder causes lack of sleep, so that patients regularly wake up tired. Another major drawback of PLMD drugs is excessive daytime sleepiness. Secondly, an increase in nocturnal blood pressure, both systolic and diastolic, is closely associated with PLMD. This in turn directly raises the risk of cardiovascular disease and mortality. Globally, nearly 45% of pregnant women are diagnosed with PLMS>5 (mild PLMD rates), and 25% suffer from PLMS>15 (moderate PLMD rates). With PLMS<15, there is no change in systolic blood pressure, but diastolic blood pressure increases, while with PLMS>15, systolic blood pressure decreases, but there is no change in diastolic pressure, as shown in Fig. 14.4. A higher risk of cardiac arrest has been found in patients with higher rates of PLMD, as nocturnal systolic and diastolic blood pressure rises sharply during this activity (Coelho et al. 2010). PLMD is sometimes associated with diabetes mellitus, uremia, chronic lung disease, leukemia, essential hypertension, severe congestive heart failure, and multiple sclerosis. In order to limit the growing effects of such severe diseases, we are attempting an approach that compares physiological signals and provides a therapeutic technique. Our present aim is to provide real-time monitoring as well as therapy, initially by setting up a controller unit for experimental purposes, which would gradually be replaced by a wireless setup in future expansion.

Analysis of PLMD by Co-relating Physiological Signal
PLMD can be observed with the help of a neurodiagnostic test that combines the electroencephalograph (EEG), used for the analysis of brain signals, and the electromyograph (EMG), used for the analysis of muscular movement, by comparing the recordings with the known standard frequencies during sleep. Generally, sleep is classified into two types, that is, rapid eye movement (REM) and non-rapid eye movement (non-REM) (Neil 2017). PLMD mainly shows abnormality during the alpha state of EEG, which has a frequency range of 8-12 Hz, and the type of sleep is REM. Normally, EMG has a negative value, corresponding to the paralyzed state of the skeletal muscle during REM sleep, as shown in Fig. 14.5.
The resting-state potential of a neuron is −70 millivolts, while for skeletal muscle, it is −95 millivolts, as shown in Table 14.5. Exceptionally, the EMG depolarizes to a positive value of 8 microvolts, as mentioned in Table 14.6. This signifies the involuntary muscle movement of the lower extremities during sleep.
Our proposed mechanism is therefore to control the involuntary muscle movement by providing electrical stimulation through evoked potentials.

Flow Chart and Methodology
Diagnosis of PLMD is done by co-relating physiological signals. In PLMD, abnormality is found in the alpha state of EEG and the REM stage of sleep. According to Fig. 14.6, the first set of electrodes is placed on the head, using an electro-cap or button electrodes positioned over the occipital and frontal lobes, from where the alpha waves of the brain (8-12 Hz) are acquired; this setup continuously monitors the EEG signals. The second electrode system continuously measures the flexion movement of the lower extremities and should be placed between the knee and ankle. This region can be divided into three parts, of which the most mobile is the lower one-third; disposable surface electrodes are placed longitudinally at one-third of the tibial nerve in order to record the abnormal EMG. When the EEG has a frequency between 8 and 12 Hz (alpha waves) and an amplitude of 20-200 microvolts, and the EMG shows a spike of 8 microvolts, the signals go to the controller unit. The decision-making tool continuously co-relates both conditions throughout the night, and whenever the abnormal spike is obtained, a voltage of reverse polarity is produced, which then goes to the stimulating circuit. Here, voltage-to-current conversion is done, and the current is delivered back to the posterior tibial nerve, which runs from the spine to the big toe and divides into three segments:
1. Medial plantar
2. Medial calcaneal
3. Lateral plantar
This link is continuous to the bottom of the foot. In this way, the resting potential of the muscle and neurons should be brought back to the normal level by returning the +8 microvolt spike to baseline, which in turn would stop the involuntary limb movement.
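The decision step described above can be sketched as a simple rule on one pair of EEG/EMG readings. This is an illustrative outline only, not the authors' controller: the class and function names are hypothetical, the dominant EEG frequency is assumed to be estimated upstream, and treating the "8 microvolt spike" as a threshold (EMG at or above 8 microvolts) is an assumption.

```python
from dataclasses import dataclass

# Thresholds quoted in the text.
ALPHA_BAND_HZ = (8.0, 12.0)        # EEG alpha frequency range
EEG_AMPLITUDE_UV = (20.0, 200.0)   # EEG amplitude range in microvolts
EMG_SPIKE_UV = 8.0                 # abnormal EMG spike level in microvolts


@dataclass
class Sample:
    """One synchronized reading (hypothetical structure)."""
    eeg_freq_hz: float   # dominant EEG frequency, estimated upstream
    eeg_amp_uv: float    # EEG amplitude in microvolts
    emg_uv: float        # EMG value in microvolts


def should_stimulate(sample):
    """True when both the EEG and the EMG conditions hold at the same time."""
    in_alpha = ALPHA_BAND_HZ[0] <= sample.eeg_freq_hz <= ALPHA_BAND_HZ[1]
    amp_ok = EEG_AMPLITUDE_UV[0] <= sample.eeg_amp_uv <= EEG_AMPLITUDE_UV[1]
    emg_spike = sample.emg_uv >= EMG_SPIKE_UV  # assumption: threshold comparison
    return in_alpha and amp_ok and emg_spike


def corrective_voltage_uv(sample):
    """Reverse-polarity voltage matching the spike, or 0.0 if no action is needed."""
    return -sample.emg_uv if should_stimulate(sample) else 0.0


print(corrective_voltage_uv(Sample(eeg_freq_hz=10.0, eeg_amp_uv=50.0, emg_uv=8.0)))
```

The voltage-to-current conversion and the stimulation hardware are outside the scope of this sketch.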

Future Prospects
The approach can also help in providing significant treatment of restless leg syndrome and somnambulism with minor changes in the electronic circuitry (Lamm et al. 2012). Our target market would be all primary polysomnography centers as well as sleep study labs in hospitals. Presently, no such treatment is available for sleep disorders, so there is no competition in the market. Worldwide, it can be introduced to the healthcare sector, especially for sleep disorders. The experimental layout shown in Fig. 14.6 could be designed in a more advanced version by replacing it with a wireless set-up in order to achieve market compatibility. Our keen interest is to alleviate these hurdles by using an artificial intelligence system to replace the whole controller unit and provide real-time monitoring and therapy. Polysomnography centers are widely available but are limited to the diagnosis of sleep disorders, with numerous electrodes attached to the body of the patient. Our goal is to deliver real-time analysis and monitoring and to control involuntary movement by evoked potentials, with the whole procedure done at home.
using the said devices prove that there is a need of a preventive measure to ensure a healthy and fit living. This chapter emphasizes the need of a good posture and also the need of a preventive measure in the current scenario where the world is functioning mostly via the technology and digital inventions.

Introduction
We are living in an era that works on computers, spending approximately half of our day sitting in cubicles. The computerized world has increased the rate of spinal disorders (Paris 1980). Poor posture can have wide-ranging detrimental effects on the body, including shoulder, back and neck pain, degenerative disease of the vertebral discs, kyphosis, scoliosis and spondylitis. Therefore, paying attention to our posture should be an integral part of our overall health plan. The forward head posture, abbreviated as FHP (Haughie 2013) and commonly known as 'tech neck' or 'text neck', has also been linked to neurological problems, headaches or migraine and heart disease (Wong 2008). Proper posture is believed to be the state of musculoskeletal balance that involves a minimal amount of stress and strain on the body. Although proper posture is desired, many people do not exhibit it. In the cervical segment, i.e. vertebrae 1 to 7 of the spine, one of the most frequently observed abnormal postures in and out of a clinical setting is the forward head posture.
FHP is a condition in which the head protrudes forward, in an anterior position with respect to the theoretical plumb line of the body. The theoretical plumb line is the imaginary vertical line running through the centre of gravity of the body when the body is in the standard anatomical position. The altered posture of the spine decreases the efficiency of the musculature, so that extra muscular action is required to maintain the balanced position of the head and neck. As the head bends further forward, the effective load of the head increases, and thus the tension and strain exerted at the base of the cervical spine increase to balance it. The bony structures of the cervical column are not well positioned to support this increasing load. The moment arm and resistance arm that determine the mechanical advantage of the head-neck lever system keep varying, resulting in a varying torque at the base of the cervical spine for maintenance of head-neck balance. Thus, the overall stress and strain acting on the supporting structures, especially the muscles of the neck, increase and affect the spinal curvature (Kendall 1993).
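The growth of the torque at the base of the cervical spine with forward flexion can be illustrated with a deliberately simple lever model. This is a sketch under stated assumptions, not a measured relationship: the head is treated as a point mass at a fixed distance from a pivot at the base of the cervical spine, the gravitational moment is taken as m·g·r·sin(θ) with θ measured from the vertical, and the mass (5 kg) and lever arm (0.15 m) are placeholder values chosen only for illustration.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def gravitational_moment(head_mass_kg, lever_arm_m, flexion_deg):
    """Moment (N*m) of the head's weight about the pivot for a given flexion angle."""
    return head_mass_kg * G * lever_arm_m * math.sin(math.radians(flexion_deg))


# Placeholder values (assumptions, not data from the chapter).
HEAD_MASS_KG = 5.0
LEVER_ARM_M = 0.15

for angle in (0, 15, 30, 45, 60):
    moment = gravitational_moment(HEAD_MASS_KG, LEVER_ARM_M, angle)
    print(f"{angle:2d} deg -> {moment:.2f} N*m")
```

In this idealized model the moment is zero in the upright position and rises with the flexion angle, which is the qualitative behaviour the paragraph describes; real head-neck mechanics involve several joints and muscle lines of action.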

Comparison Between Normal and FHP Posture
In a clinical setting, pain of the neck or back is frequently associated with poor posture. The problem in forward head posture is mainly the abnormal static posture, that is, the body's alignment in an abnormal position for prolonged periods of time. FHP can have numerous possible causes, one of which is sitting in one position for long periods of time. In this situation, the head effectively begins to weigh more as it protrudes, causing the muscles supporting the head and neck to fatigue after prolonged unsupported and abnormal sitting. Hence, as the muscles, which are the main support, tire, good posture is lost, resulting in forward head posture.
Also, the load experienced by the spine increases dramatically when we flex the head forward at varying craniovertebral angles. Loss of the natural curve of the cervical spine leads to increased stress. These stresses may lead to early degeneration, wear and tear and, in severe cases, surgery to correct the abnormality. The increasing rate of spinal ailments has led to research into alternative and precautionary methods, apart from the usual detection of disorders from X-ray, CT and MRI scans, in the form of instruments that would help in preventing spinal disorders and treating them before they reach their peak (Joseph Mercola 2014). Therefore, the purpose of this survey is to prevent neck pain and misalignment of the cervical column, thereby decreasing the chances of severe ailments in the future.

Forward Head Posture Correction Collar
A forward head posture correction collar, or cervical collar, is available. It includes a shoulder collar assembly and a chin-mastoid piece for positioning the head of the wearer on the collar, together with a means for interconnecting the chin piece to the collar assembly so that it can be manually adjusted with respect to the shoulder collar assembly in the Z-direction. This helps in adjusting and correcting the supported part of the wearer's head from the forward head position to the normal or corrected position (New Jersey Patent No. 11/172453, 2006).

FitNeck
Forward head posture or 'text neck' is a significant health risk over time. FitNeck corrects this issue by strengthening specific stabilizer muscles in the neck that have been proven to reverse text neck. By effectively realigning the head back over the shoulders, overall health can be improved dramatically. Designed in collaboration with chiropractors and doctors, it is a posture device that properly strengthens critical neck muscles to realign the head back over your body (Indiegogo, Inc. n.d.).

Demographic Survey Using Google Forms
We conducted a preliminary survey to judge the need for a neck corrective device. The following inferences were obtained from it:
• The responses were mainly from our own age group of students, i.e. 47% from the 20-30 age group and 29.7% from the 10-20 age group.
• Out of 411 responses, 62.8% suffer from neck pain that ranges from 1 to 3 on the pain scale, so the pain is relatively mild.
• About 74% experience occasional pain, and 87.6% have not been diagnosed medically. This shows there is a need to prevent such occurrences.
• 16.1% exhibit correct posture while sitting, whereas 81% exhibit proper posture while standing. Our focus should therefore be on making a wearable device that is comfortable to use while the person is sitting.
Hence, the survey conducted via Google Forms helped us in assessing the severity of neck pain in the younger generation, its duration and periodicity, whether the pain is work-related or medically diagnosed, and its rating on the pain scale.
We can thus say that neck pain is a matter of concern among the youth, as they are more invested in technology and spend hours sitting in poor postures while using such devices. This is sufficient evidence that some kind of preventive measure is needed to ensure healthy and fit living.
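To make the proportions above easier to interpret, the reported percentages can be converted into approximate respondent counts. The following is a minimal sketch, assuming that every percentage quoted in the survey refers to the same 411 responses (the text states this only for the pain-scale figure); the labels and the helper name approximate_count are illustrative.

```python
# Survey figures as quoted in the text. Applying all of them to the same
# denominator of 411 responses is an assumption made for illustration.
TOTAL_RESPONSES = 411

reported_shares = {
    "age 20-30": 47.0,
    "age 10-20": 29.7,
    "neck pain rated 1-3": 62.8,
    "occasional pain": 74.0,
    "not medically diagnosed": 87.6,
    "correct sitting posture": 16.1,
    "correct standing posture": 81.0,
}


def approximate_count(percent, total=TOTAL_RESPONSES):
    """Convert a reported percentage into a rounded respondent count."""
    return round(total * percent / 100)


approximate_counts = {
    label: approximate_count(percent)
    for label, percent in reported_shares.items()
}

for label, count in approximate_counts.items():
    print(f"{label}: about {count} of {TOTAL_RESPONSES}")
```

Under this assumption, the gap between sitting and standing posture corresponds to roughly 66 versus 333 respondents, which is the contrast that motivates a device designed for seated use.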

Editing Neck Model in Graphical User Interface (GUI)
The GUI is used only for editing existing OpenSim models. We therefore simulated neck flexion by focusing on the muscle of interest, i.e. the sternocleidomastoid. The cervical range of motion (ROM) gives us the craniovertebral angle. According to the simulation, the curve generated has a flexion-extension range of 42.43779° (as shown in Fig. 15.1), whereas the standard value is 40° (Vasavada 1998).
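The agreement between the simulated and the standard range can be quantified as an absolute and a relative deviation. The short sketch below uses only the two values quoted above; the function name rom_deviation is illustrative and is not part of OpenSim.

```python
# Values quoted in the text.
SIMULATED_ROM_DEG = 42.43779  # flexion-extension range from the GUI simulation
REFERENCE_ROM_DEG = 40.0      # standard value (Vasavada 1998)


def rom_deviation(simulated, reference):
    """Return (absolute deviation in degrees, relative deviation in percent)."""
    absolute = simulated - reference
    relative_percent = 100.0 * absolute / reference
    return absolute, relative_percent


abs_dev, rel_dev = rom_deviation(SIMULATED_ROM_DEG, REFERENCE_ROM_DEG)
print(f"Deviation: {abs_dev:.2f} degrees ({rel_dev:.1f}% of the reference)")
```

The simulated range therefore exceeds the reference by about 2.4°, or roughly 6% of the standard value.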

Simulation in Python
OpenSim's functionality can be accessed through the following programming environments: the scripting shell in the OpenSim GUI (a Jython interpreter embedded in the application), C++, MATLAB and Python. The simulation was scripted in Python, and the scripting output is a model depicting the neck joint. The code starts by defining the bodies required in the model, that is, the skull, cervical spine, clavicles and sternum, along with the sternocleidomastoid muscles. Each body is defined by functional properties such as body name, mass, centre of mass and inertia. The steps incorporated to build and run the code are joint definition, muscle attachment, a controller for excitation or movement, display of visual geometry, model configuration, simulation and printing/saving of the model file. The algorithm for neck joint simulation is shown in Fig. 15.2. A joint is defined functionally by its parent body and its child body. The Anaconda distribution (Python 2.7) with the Spyder development environment requires the Simbody visualizer for displaying the running model. Muscles need functional properties such as tendon slack length, optimal force, maximum isometric force, optimal fibre length, pennation angle, etc. The sternomastoid and cleidomastoid muscles, the main muscles involved in flexion and extension, originate at the sternum and clavicles respectively and insert at the skull base, that is, at the mastoid process or the occipital area. The output of the code is shown in Fig. 15.3.
In the output shown in Fig. 15.3, the geometric shapes can be taken as an approximation of the following body parts:
• Sphere: skull
• Grey ellipsoid: cervical spine
• Blue ellipsoids: clavicles
• Green ellipsoid: sternum
• Blue lines: cleidomastoid muscles
• Red lines: sternomastoid muscles
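The structure of the model described in this section can be summarised in a small, library-independent sketch. It is not the script used for the simulation and does not call the OpenSim API; the class names, the parent-child assignment of the neck joint, the attachment coordinates and all numerical values are placeholders chosen for illustration, not anatomical data. The sketch is written for Python 3.8 or later, whereas the original work used Python 2.7.

```python
from dataclasses import dataclass, field
from math import dist
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Body:
    name: str
    mass: float            # kg
    centre_of_mass: Vec3   # m
    inertia: Vec3          # principal moments of inertia, kg*m^2


@dataclass
class Joint:
    name: str
    parent: str            # name of the parent body
    child: str             # name of the child body


@dataclass
class Muscle:
    name: str
    origin: Tuple[str, Vec3]      # (body name, attachment point in m)
    insertion: Tuple[str, Vec3]
    max_isometric_force: float    # N
    optimal_fibre_length: float   # m
    tendon_slack_length: float    # m
    pennation_angle: float        # rad

    def path_length(self) -> float:
        """Straight-line origin-to-insertion distance (one shared frame assumed)."""
        return dist(self.origin[1], self.insertion[1])


@dataclass
class NeckModel:
    bodies: Dict[str, Body] = field(default_factory=dict)
    joints: List[Joint] = field(default_factory=list)
    muscles: List[Muscle] = field(default_factory=list)

    def _require(self, *names: str) -> None:
        missing = [n for n in names if n not in self.bodies]
        if missing:
            raise ValueError(f"undefined bodies: {missing}")

    def add_body(self, body: Body) -> None:
        self.bodies[body.name] = body

    def add_joint(self, joint: Joint) -> None:
        self._require(joint.parent, joint.child)
        self.joints.append(joint)

    def add_muscle(self, muscle: Muscle) -> None:
        self._require(muscle.origin[0], muscle.insertion[0])
        self.muscles.append(muscle)


def build_neck_model() -> NeckModel:
    model = NeckModel()
    # Step 1: bodies (placeholder masses and inertias).
    for name, mass in [("skull", 4.5), ("cervical_spine", 1.0),
                       ("clavicle_r", 0.1), ("clavicle_l", 0.1), ("sternum", 0.3)]:
        model.add_body(Body(name, mass, (0.0, 0.0, 0.0), (0.01, 0.01, 0.01)))
    # Step 2: joint, defined by its parent and child bodies (assignment assumed).
    model.add_joint(Joint("neck_joint", parent="cervical_spine", child="skull"))
    # Step 3: muscles, from sternum or clavicle (origin) to skull base (insertion).
    for side, sign in (("r", 1.0), ("l", -1.0)):
        mastoid = ("skull", (sign * 0.05, 0.12, 0.0))
        attachments = {
            "sternomastoid": ("sternum", (0.0, 0.0, 0.0)),
            "cleidomastoid": (f"clavicle_{side}", (sign * 0.05, 0.0, 0.0)),
        }
        for muscle_name, origin in attachments.items():
            model.add_muscle(Muscle(f"{muscle_name}_{side}", origin, mastoid,
                                    max_isometric_force=100.0,
                                    optimal_fibre_length=0.10,
                                    tendon_slack_length=0.03,
                                    pennation_angle=0.0))
    return model


model = build_neck_model()
for muscle in model.muscles:
    print(f"{muscle.name}: path length {muscle.path_length():.3f} m")
```

Checking that every joint and muscle refers to an already defined body mirrors the order of the steps in the text: bodies come first, then joints, then muscle attachments. The controller, visual geometry and simulation steps depend on the OpenSim library itself and are therefore left out of this sketch.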

Applications
The simulation of the spine in general, and of the cervical region in particular, can have a significant influence on developing measures to prevent, diagnose and treat spinal disorders (Hojun Yeom 2014). Abnormal curvatures of the spine are among the most preventable disorders. A survey of this kind is very useful for identifying the target area for designing a preventive and diagnostic device. It also makes it easier to take any necessary action regarding cervical disorders (Addison 1990).