**3. Gene expression profiling**

188 Front Lines of Thoracic Surgery

blood results in the blue appearance of affected infants. Defects contributing to this condition include transposition of the great arteries (TGA), tetralogy of Fallot (TOF), tricuspid atresia, pulmonary atresia, Ebstein's anomaly of the tricuspid valve, double outlet right ventricle (DORV), persistent truncus arteriosus (PTA) and total anomalous pulmonary venous connection. The second main type of congenital heart disease, left-sided obstructive lesions, includes hypoplastic left heart syndrome (HLHS), mitral stenosis, aortic stenosis, aortic coarctation and interrupted aortic arch (IAA). The third type of congenital heart disease, septation defects, can affect septation of the atria (atrial septal defects, ASDs), septation of the ventricles (ventricular septal defects, VSDs) or formation of structures in the central part of the heart (atrioventricular septal defects, AVSDs). Other types of congenital defect that do not fit into the above three main categories are bicuspid aortic valve (BAV) and patent ductus arteriosus (PDA).

Fig. 1. Congenital heart defects. Diagram of heart illustrating the structures that are affected by congenital heart diseases. AC, aortic coarctation; AS, aortic stenosis; ASD, atrial septal defect; AVSD, atrioventricular septal defect; BAV, bicuspid aortic valve; DORV, double outlet right ventricle; Ebstein's, Ebstein's anomaly of the tricuspid valve; HLHS, hypoplastic left heart syndrome; HRHS, hypoplastic right heart; IAA, interrupted aortic arch; MA, mitral atresia; MS, mitral stenosis; PDA, patent ductus arteriosus; PS, pulmonary artery stenosis; PTA, persistent truncus arteriosus; TA, tricuspid atresia; TAPVR, total anomalous pulmonary venous return; TGA, transposition of the great arteries; TOF, tetralogy of Fallot; VSD, ventricular septal defect.

Depending on the severity of the congenital heart disease, mortality and morbidity vary but can be serious. The number of surgeries needed to correct many of the anatomical defects can weaken the affected children and considerably compromise their quality of life. Congenital heart surgery has made tremendous gains over the past 10 years; however, recovery and outcome statistics continue to point to the need for improvements in this increasingly younger patient population. Paediatric patients undergoing cardiac surgery continue to need mechanical assist devices, as well as prolonged inotropic support or an open chest, despite a technically perfect repair. Moreover, perioperative myocardial damage with low cardiac output remains the most common cause of morbidity and death after repair of congenital lesions (Hammon, 1995). It is therefore essential to improve our understanding of the genetic mechanisms and pathways associated with the different congenital heart conditions and their response to the surgical stress of ischaemia and reperfusion injury and cardiopulmonary bypass.

In the pre-human genome sequencing era, the ability to identify a relevant set of causative genes for multigenic diseases such as cardiomyopathies was limited. Classical genetic approaches were developed to find single loci or genes with the power to cause Mendelian disorders. Such disorders can result from a single base change in the deoxyribonucleic acid (DNA) that leads to a significant alteration in protein abundance or function. However, in disorders resulting from multiple gene variants that collectively contribute to an individual's multigenic defect, new genomics approaches are needed. With the arrival of such genomics technologies, the gene microarray approach has emerged as a real opportunity to perform gene expression analysis of disease-relevant tissues. While DNA defines the inherent genetic make-up of a person, it is the transcription of DNA into RNA (which may in turn be translated into protein) that integrates the dynamic interaction of an individual with the environment. Consequently, microarrays allow us to measure mRNA abundances that correlate with a particular disease state, clinical outcome, or therapeutic response, giving us an unprecedented opportunity to investigate the genomic contribution to cardiovascular diseases (Cook and Rosenzweig, 2002; Goldsmith and Dhanasekaran, 2004; Napoli et al., 2003).

#### **3.1 Microarray technology**

Expression profiling is the study of changes in the expression levels of large numbers of genes simultaneously. The concept of microarray technology is simple: specific DNA sequences, called "probes," are selected to "target" genes of interest. Microarrays are solid substrates of glass, plastic, or silicon carrying hundreds or thousands of microscopic spots of DNA (Figure 2). Each DNA spot, fixed to the solid support, contains hundreds to thousands of identical probes. Each probe has a sequence of nucleotide bases that is complementary and unique to a single gene. Thanks to advances in the technology, it is now common to use microarrays carrying probes for the entire human genome.

Fig. 2. Schematic of a microarray experiment. cDNA, complementary deoxyribonucleic acid; cRNA, complementary ribonucleic acid.

Gene Expression Profiling – A New Approach in the Study of Congenital Heart Disease 191

In a typical microarray experiment, RNA is extracted from a tissue or sample of interest. The small part of the RNA population that represents the transcribed genes, mRNA, is amplified, and fluorescent tags are attached to each molecule. The labelled mRNA is then incubated with the microarray, allowing the tagged mRNA molecules to hybridize to the probes containing the specific complementary sequence of genes. Complementary mRNAs bind to probes on the array during hybridization. The microarray is then washed and placed in a scanner where a laser is directed onto each spot, causing the tags to fluoresce. The resulting fluorescence intensity is proportional to the number of tagged RNAs that have hybridized to the probes in each spot. The measured intensity represents the expression level or activity of that gene. Consequently, an expression profile can be generated for a particular tissue at different stages of health or disease.

The identified gene signatures are useful for two reasons. First, they represent an initial step in identifying disease-associated genes, or "candidate genes". The genes that comprise the genomic signature are associated with disease susceptibility or a particular treatment outcome, and these candidate genes might substantially improve our understanding of disease biology and subsequently lead to the identification of potential targets for new treatment strategies (King et al., 2005). Second, the gene signatures can serve as a disease biomarker. Gene expression data could offer a remarkably detailed patient phenotype that could be used to classify patient populations correctly according to their disease risk or response to therapies. In both cases, genomic data represent a means to distinguish between patients who are otherwise alike by classical clinical variables (Cook and Rosenzweig, 2002; Goldsmith and Dhanasekaran, 2004; Napoli et al., 2003). This is still a work in progress in cardiovascular research, but we can look to the cancer research field to see where the future lies.

#### **3.2 Standardization**

Experimental conditions can affect microarray technology in a way that leads to considerable variability and low reproducibility of the results. To make it possible to compare data obtained from different laboratories, efforts have been directed towards the definition of standards in gene expression studies (Stoeckert et al., 2002). These standards cover all experimental steps in microarray investigations and extend from sample selection and experimental design to the functional classification of altered genes. Complex statistical algorithms are increasingly used for data modeling and expression change identification. Additionally, comparative approaches have been proposed to evaluate the performance of various algorithms on gene expression data (Bolstad et al., 2003; Cope et al., 2004; Irizarry et al., 2006).

#### **3.3 Experimental design**

Microarray experiments should always be replicated. There are two types of replication: biological replication, where multiple homogeneous samples are used on multiple arrays, and technical replication, where RNA from the same sample is used on multiple arrays. Whereas biological replication allows the estimation of both measurement and biological variability, technical replication allows only the estimation of measurement variability. Ideally, the experimental design should include both types. Although there is no agreement about the optimum number of biological replicates, at least 5 per group is generally considered the minimum.

RNA pooling is another crucial issue. Pooling RNA from biological replicates can reduce variability among arrays, and it can be necessary for samples of limited quantity. However, the analysis of a single pool can be misleading because it prevents both the estimation of biological variability and the detection of outlier samples that could change the results. An alternative approach is multiple pooling, in which, for example, 25 samples are divided into 5 pools of 5 samples and each pool is run on a separate microarray. Many studies have used pooling; in contrast, not many have used multiple pooling.
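The multiple-pooling layout described above can be sketched in a few lines. This is only an illustration: the sample identifiers, the random assignment and the fixed seed are assumptions made for the example, not part of any published protocol.

```python
import random

def make_pools(sample_ids, n_pools, seed=0):
    """Randomly assign samples to n_pools pools of (near-)equal size."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # fixed seed so the layout is reproducible
    # Deal the shuffled samples round-robin: pool sizes differ by at most one.
    return [ids[i::n_pools] for i in range(n_pools)]

samples = [f"S{i:02d}" for i in range(1, 26)]  # 25 hypothetical sample IDs
pools = make_pools(samples, n_pools=5)

for k, pool in enumerate(pools, start=1):
    print(f"pool {k}: {pool}")
```

Because each sample ends up in exactly one of the five pools, between-pool differences still carry some information about biological variability, which a single pool cannot provide.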

#### **3.4 Data normalization**


Because of the variability of microarray data, each array must be brought onto the same scale as the others before two or more arrays can be compared. This normalization, performed by removing systematic variation between the arrays and rendering different experiments comparable, remains an issue that is not yet fully resolved. Although many of the early microarray studies in the literature ignored this issue, a statistically rigorous approach is needed.

Early software allowed for array-to-array comparisons by using a scaling factor to normalize gene expression patterns across arrays. An example of this approach is the global method, which consists of rescaling each chip data set by its total intensity. However, in general, these algorithms assume that intensity differences between arrays are linearly related (Schadt et al., 2000), and such a linear relationship often does not hold true.
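As a minimal sketch of the global method, the function below rescales each array so that its total intensity matches a common target. The intensity values are invented, and choosing the mean of the array totals as the target is an assumption of this example.

```python
def global_scale(arrays, target=None):
    """Rescale each array so that its total intensity equals a common target."""
    totals = [sum(a) for a in arrays]
    if target is None:
        # Assumption for this sketch: use the mean total intensity as the target.
        target = sum(totals) / len(totals)
    return [[x * target / t for x in a] for a, t in zip(arrays, totals)]

raw = [
    [100.0, 200.0, 700.0],  # array 1 (hypothetical intensities for three genes)
    [50.0, 100.0, 350.0],   # array 2: same profile, scanned half as bright
]
scaled = global_scale(raw)
print(scaled)
```

A single multiplicative factor per array removes an overall brightness difference like the one above, but it cannot correct differences that depend on intensity, which is the limitation of the linearity assumption.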

More advanced normalization approaches have since been developed. The invariant set method is based on the assumption that there is a subset of unchanged genes between any two samples compared by microarray analysis, so that their fluorescence values can be used to normalize the entire expression data set (Li and Wong, 2001). LOcally WEighted Scatterplot Smoothing (LOWESS) is a widely used technique based on non-linear regression (Yang et al., 2002). Quantile normalization uses a non-parametric procedure to give each chip the same intensity distribution (Bolstad et al., 2003). However, one of the most statistically robust normalization methods is the "Robust Multiarray Average" technique (RMA) (Irizarry et al., 2003). This method corrects microarray data for local background, applies quantile normalization across arrays, and uses a robust linear model to estimate expression values on a log scale. It has been demonstrated that RMA performs better than other normalization methods for Affymetrix data (Irizarry et al., 2003).
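To make the idea of quantile normalization concrete, here is a small pure-Python sketch. It is not the implementation of Bolstad et al.: ties are broken by position rather than averaged, and the intensities are invented.

```python
def quantile_normalize(arrays):
    """Give every array the same intensity distribution (mean of sorted values)."""
    n_genes = len(arrays[0])
    # Sort each array, then average across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n_genes)
    ]
    normalized = []
    for a in arrays:
        # Gene indices ordered by increasing intensity on this array.
        order = sorted(range(n_genes), key=lambda g: a[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]  # replace the value by the mean at its rank
        normalized.append(out)
    return normalized

arrays = [
    [5.0, 2.0, 3.0],  # hypothetical array 1
    [4.0, 1.0, 9.0],  # hypothetical array 2
]
normalized = quantile_normalize(arrays)
print(normalized)  # [[7.0, 1.5, 3.5], [3.5, 1.5, 7.0]]
```

After normalization both arrays contain exactly the same set of values; only the assignment of those values to genes differs, which preserves each array's gene ranking.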

#### **3.5 Identification of differentially expressed genes**

Fold change (FC) was the first method used to identify significantly deregulated genes. Although this simple technique can be efficient for narrowing attention to sets of differentially expressed sequences, it does not exploit the full potential of genome-scale experiments, which lies in an inclusive analysis of the entire repertoire of transcripts in a cell as it goes through a biological process. Additionally, FC is now considered an inadequate statistical test since it does not incorporate variance and offers no associated level of confidence. To compare groups of data, the parametric t-test and the analysis of variance (ANOVA) are more robust and commonly used. However, because of the small sample sizes typical of microarray experiments, standard parametric methods are not recommended and should be replaced by non-parametric or moderated tests.
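The following sketch illustrates why fold change alone is an inadequate test. Two hypothetical genes have the same log2 fold change but very different within-group variance. The t statistic used here is Welch's unequal-variance form, which is a choice made for this example rather than a method prescribed in the text, and all values are invented log2 intensities.

```python
import math
from statistics import mean, variance

def log2_fold_change(control, disease):
    """Difference of group means; the inputs are assumed to be log2 intensities."""
    return mean(disease) - mean(control)

def welch_t(control, disease):
    """Welch's t statistic: mean difference divided by its standard error."""
    se = math.sqrt(variance(control) / len(control) + variance(disease) / len(disease))
    return (mean(disease) - mean(control)) / se

# Two hypothetical genes, five biological replicates per group.
stable_ctrl, stable_dis = [8.0, 8.1, 7.9, 8.0, 8.0], [9.0, 9.1, 8.9, 9.0, 9.0]
noisy_ctrl, noisy_dis = [6.0, 10.0, 7.0, 9.0, 8.0], [7.0, 11.0, 8.0, 10.0, 9.0]

fc_stable = log2_fold_change(stable_ctrl, stable_dis)
fc_noisy = log2_fold_change(noisy_ctrl, noisy_dis)
t_stable = welch_t(stable_ctrl, stable_dis)
t_noisy = welch_t(noisy_ctrl, noisy_dis)

print(f"stable gene: log2FC={fc_stable:.2f}, t={t_stable:.2f}")
print(f"noisy gene:  log2FC={fc_noisy:.2f}, t={t_noisy:.2f}")
```

Both genes show a two-fold change, yet only the low-variance gene yields a large t statistic, so a fold-change cutoff would treat as equivalent two results that carry very different levels of evidence.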

Statistical analysis of microarray data faces an important challenge because testing the expression level of tens of thousands of transcripts (multiple testing) may produce hundreds of false positives. Therefore, multiple test corrections should be applied. Since the Bonferroni approach is too conservative when the number of tests is large, a good alternative is to control the false discovery rate (FDR) (Benjamini and Hochberg, 1995). The FDR, defined as the expected proportion of false positives among the genes declared significant, is at present widely used for gene expression data analysis.
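The Benjamini-Hochberg step-up procedure can be sketched as follows. The p-values are invented for illustration, and the 0.05 threshold is an assumption of the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by increasing p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

p_values = [0.001, 0.008, 0.012, 0.041, 0.20]  # hypothetical per-gene p-values
bh = benjamini_hochberg(p_values)
bonferroni = [min(1.0, p * len(p_values)) for p in p_values]

n_bh = sum(q < 0.05 for q in bh)
n_bonf = sum(q < 0.05 for q in bonferroni)
print(f"significant at 0.05: BH={n_bh}, Bonferroni={n_bonf}")
```

In this toy example the Benjamini-Hochberg adjustment retains three genes whereas Bonferroni retains two, which illustrates the more conservative behaviour of the latter.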


#### **3.6 Data organization and presentation**

In order to detect and extract patterns within microarray data, statistical algorithms can be applied. The basic assumption of many expression profiling investigations is that knowing where and when a gene is expressed provides information about the function of the gene. Therefore, organizing genes on the basis of similarities in their expression profiles is a crucial starting point in the analysis (Bassett et al., 1999). However, similarity of gene expression profiles does not guarantee similarity of function or mechanistic pathway, and it may occur entirely by chance. Nonetheless, clustering genes on the basis of their expression patterns is well accepted (Eisen et al., 1998). Typically, cluster analysis is applied when data sets from more than three experiments are available. It is generally performed in a two-way manner (clustering both genes and samples). Cluster analysis is now one of the most popular statistical techniques applied to large-scale microarray data.
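As an illustration of grouping genes by similarity of expression profile, the sketch below performs a simple agglomerative clustering. The distance (one minus the Pearson correlation), the average-linkage rule, the gene names and the values are all assumptions chosen for the example; real analyses normally rely on established statistical packages.

```python
import math

def pearson(x, y):
    """Pearson correlation of two profiles (undefined for constant profiles)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cluster(profiles, n_clusters):
    """Average-linkage agglomerative clustering with distance 1 - Pearson r."""
    names = list(profiles)
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b]) for a in names for b in names}
    clusters = [[n] for n in names]  # start with one cluster per gene
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(dist[p] for p in pairs) / len(pairs)  # average linkage
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the two closest clusters
        del clusters[j]
    return clusters

# Hypothetical expression profiles across four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.5, 4.0, 2.0],
}
clusters = cluster(profiles, n_clusters=2)
print(clusters)
```

Genes whose profiles rise together end up in one cluster and those that fall together in the other, regardless of absolute expression level, because the correlation distance depends only on the shape of the profile.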

Another method, discriminant analysis, differs from the descriptive cluster analysis in that it belongs to the group of predictive algorithms. At the start of the analysis, the samples are divided into two or more classes. Then, based on a training set in which the scientist knows *a priori* the source class of each sample, the algorithm predicts the class of new samples from their expression profiles.
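The train-then-predict workflow can be sketched with a nearest-centroid classifier. This is a deliberately simple stand-in for discriminant analysis, not the method itself; the class labels and the two-gene expression profiles are hypothetical.

```python
def centroid(profiles):
    """Per-gene mean across the profiles of one class."""
    return [sum(values) / len(profiles) for values in zip(*profiles)]

def train(training_set):
    """training_set maps a class label to a list of expression profiles."""
    return {label: centroid(profiles) for label, profiles in training_set.items()}

def predict(centroids, profile):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(profile, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Hypothetical training set: two classes, two genes per profile.
training_set = {
    "class_1": [[9.0, 3.0], [8.5, 3.5]],
    "class_2": [[4.0, 7.0], [4.5, 6.5]],
}
centroids = train(training_set)

pred_a = predict(centroids, [8.8, 3.2])
pred_b = predict(centroids, [4.2, 6.9])
print(pred_a, pred_b)  # class_1 class_2
```

In practice, predictive performance should be estimated on samples that were not used for training, for example by cross-validation.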

The different clustering techniques can usefully organize tables of gene expression values. However, the resulting organized but still large collection of numbers remains hard to assimilate. Therefore, powerful data visualization methods and tools have been developed. These approaches present clustering results in simple graphical displays such as dendrograms (Figure 3). Dendrograms represent relationships among genes by a tree whose branch lengths reflect the degree of mathematically defined similarity in expression between the genes (Alon et al., 1999). Visual assimilation can be made more intuitive by combining clustering methods with a representation of each data point by a color that quantitatively and qualitatively reflects the original experimental observations (Eisen et al., 1998).

Fig. 3. Hierarchical clustering of two groups of three samples. Hierarchical groups are displayed on the heat map and dendrogram at the left side of the heat map.

#### **3.7 Network and pathway analysis**


More recently, new analysis methods have been developed to help interpret microarray data within biological and physiopathological contexts. These approaches are gene interaction network analysis and pathway analysis. In gene interaction network analysis (Figure 4), automated text mining software is usually used to scan the scientific literature to identify gene-gene interactions. A human expert can improve the quality of these gene relationships. When a list of differentially expressed genes is presented to the program, a search system returns relevant interaction networks. In pathway analysis, the pathways, made up of gene-gene interactions, are accepted by the scientific community and entered into the system by human experts. Because a higher level of confidence is needed before a set of interacting genes is called a pathway, these databases are smaller but more accurate. It is, however, possible to identify new pathways based on gene network interactions if the interactions can be validated and accepted by the peer review community. Many bioinformatics programs are now available that enable in-depth analysis of any interrelated biological data, finding common regulators and associating pathway components with like-behaving biological entities and processes.

Fig. 4. Example of interaction network analysis (Ghorbel et al., 2010). The associations between the network entities were based on available PubMed citations.

#### **3.8 Data archiving**

Successful interpretation of expression profiling investigations is likely to be dependent on the integration of experimental data with external information resources. Since the amount of data from multiple studies involving multiple cell types and tissues from multiple research groups is fast growing, data archiving is becoming an important issue. It is ideal if all gene expression data would be deposited in the public domain and are freely accessible.

Gene Expression Profiling – A New Approach in the Study of Congenital Heart Disease 195

Such an endeavour would require a user-friendly and powerful database system, together with standardized correction and normalization procedures, so that data sets from different projects can be compared (Granjeaud et al., 1999). Public data repositories have been developed at the European Bioinformatics Institute (ArrayExpress) and the National Center for Biotechnology Information (Gene Expression Omnibus), institutions that pursue these internationally recognized standards. Several scientific journals now require submission of microarray data to such databases as a prerequisite for manuscript publication.
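To make the idea of a shared normalization procedure concrete, the following minimal sketch applies quantile normalization, one widely used option for oligonucleotide arrays. The text does not say which procedure any given project uses, so the choice of method, the function name `quantile_normalize` and the intensities are all illustrative assumptions.

```python
def quantile_normalize(arrays):
    """Force every array to share the same intensity distribution.

    `arrays` is a list of equally long lists, one per microarray, where
    position i in each list is the (log-scale) intensity of probe i.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must measure the same probes")

    # Reference distribution: mean intensity at each rank across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        # Ties are broken by probe order, which is a simplification.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


# Made-up intensities for three arrays and four probes (not real data).
raw = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 6.0, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(raw)
for array in normalized:
    print([round(v, 3) for v in array])
```

After normalization every array contains the same set of values, and only the ranking of probes within each array is retained. This is what makes arrays from different experiments comparable on a common scale.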

**4. Exploring transcriptomic alterations in congenital heart diseases**

The refinement of the human genome sequence and its associated annotation will soon have a great impact on the diagnosis and treatment of cardiovascular diseases when this information is coupled with new technologies such as DNA microarrays. Microarrays provide a genomic approach to exploring the genetic markers and molecular mechanisms leading to congenital heart disease and heart failure.

The first genome-wide gene expression study of congenitally malformed human hearts set out to identify genes associated with dysdevelopment as well as genes involved in the adaptation processes of the heart (Kaynak et al., 2003). The authors compared genes dysregulated in defined congenitally malformed hearts with the molecular response to pressure overload leading to hypertrophy and with the chamber-specific cardiac molecular portrait (Kaynak et al., 2003). By comparing gene expression in atria and ventricles, the study found, in addition to well-known chamber-specific genes such as the atrial and ventricular myosin light chains, diverse previously unknown chamber-specific genes for muscle contraction, extracellular components, cell growth and differentiation, and energy metabolism (Kaynak et al., 2003). The comparison of tetralogy of Fallot (TOF) and right ventricular hypertrophy (RVH) showed distinct molecular portraits, with genes of various functional classes (Kaynak et al., 2003), even though right ventricular hypertrophy is part of TOF. Besides genes involved in the cell cycle, a characteristic feature of the TOF signature is the upregulation of ribosomal proteins, whereas RVH showed a hypertrophy-specific expression pattern of genes mainly involved in stress response, cell proliferation and metabolism (Kaynak et al., 2003). In addition, to obtain a molecular portrait that is not influenced by biomechanical adaptation processes, right atrial (RA) samples from patients with ventricular septal defect (VSD), an intact tricuspid valve and normal RA pressure were studied (Kaynak et al., 2003). These samples showed a VSD-specific molecular signature dominated by genes that were downregulated relative to the other RA samples (Kaynak et al., 2003). A thorough literature study of the genes downregulated in VSD revealed that a major part is involved in cell proliferation and differentiation during embryogenesis as well as in apoptosis (Kaynak et al., 2003). Overall, this investigation indicated distinct gene expression profiles associated with tetralogy of Fallot, ventricular septal defect and right ventricular hypertrophy (Kaynak et al., 2003). Furthermore, the study design suggested that alterations associated with primary genetic abnormalities can be distinguished from those associated with the adaptive response of the heart to the malformation (right ventricular pressure overload hypertrophy) (Kaynak et al., 2003).

The evidence for the developmentally regulated capacity of the immature heart to generate an adaptive response to reduced oxygen availability, although compelling, is derived largely from acute experimental interventions in animal models. Further, the reductionist approach of these studies, which typically address a single target, fails to capture the global complexity of the response. Nevertheless, the general observation that the human neonatal heart copes well with dramatic degrees of hypoxia and hypertrophy is in accordance with these experimental findings. To determine the molecular basis of this putatively adaptive response, Konstantinov et al. conducted microarray-based differential gene expression profiling on tissue samples from patients of varying ages undergoing repair of right ventricular (RV) obstructive heart lesions, including TOF, focusing on potential age-related differences (Konstantinov et al., 2004). Their findings seem to confirm the existence of a protective reprogramming response that is most evident in the neonatal myocardium and is subject to the hemodynamic and metabolic stress imposed by structural congenital heart disease (Konstantinov et al., 2004). Indeed, this study showed that neonatal myocardium has a unique pattern of gene expression dominated by genes with cardioprotective, antihypertrophic and antiproliferative properties, reflecting a stress-induced protective program (Konstantinov et al., 2004).

Another investigation examined differential gene expression during RVH in the developing heart with TOF (Sharma et al., 2006). Using high-density DNA microarrays, the authors showed that more than 200 myocardial genes are up- or downregulated in patients with TOF (Sharma et al., 2006). Among other changes, the expression of ECM proteins such as collagens and fibronectin was predominantly elevated, whereas MMP and TIMP expression either remained unchanged or decreased in TOF patients (Sharma et al., 2006). These results point to myocardial fibrosis, which may account for diminished function in patients with TOF (Sharma et al., 2006). They also provide further evidence that the myocardium in patients with TOF displays a complex, differential gene expression pattern, with drastically increased mRNA levels of ECM proteins (collagen Iα and III and fibronectin) and of the VEGF/VEGF-R system (Sharma et al., 2006). The authors concluded that this up-regulation of genes involved in ECM homeostasis is associated with RVH and diminished cardiac function in TOF patients. Furthermore, the VEGF/VEGF-receptor (R) system could play an important role in enhanced myocardial angiogenesis that could be stunted due to limited vascular remodelling (Sharma et al., 2006).

In contrast to the previous investigation examining gene expression changes in cyanotic TOF in comparison to normal hearts (Sharma et al., 2006), we recently determined the global gene expression profiles associated with chronic hypoxia by comparing gene expression of cyanotic and acyanotic patients with TOF (Ghorbel et al., 2010). Our data showed that, overall, the transcriptional profile in the cyanotic group was characterized by increased expression level of genes with literature-validated apoptosis and growth/morphogenesis/remodeling properties. It also showed decreased expression levels of genes with cardiac function, cell survival, and cytoprotective properties (Ghorbel et al., 2010). The molecular signatures identified suggest a reprogramming response in the cyanotic myocardium activated by the chronic hypoxia imposed by the structural congenital heart disease (Ghorbel et al., 2010). The difference between the adaptive and injury-related responses would dictate the overall fate of heart cells.
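A two-group comparison of this kind can be sketched as follows. This is not the analysis pipeline of Ghorbel et al. (2010), which is not described here; it is a generic illustration that assumes log2-scale expression values, uses an exact permutation test on the difference in group means, and adjusts the resulting p-values with the Benjamini-Hochberg false discovery rate procedure. The gene labels, values and function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_pvalue(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    hits = count = 0
    # Enumerate every way of relabelling n_a samples as "group A".
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # tolerance for rounding error
            hits += 1
        count += 1
    return hits / count


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical log2 expression values: (cyanotic samples, acyanotic samples).
expression = {
    "GENE_A": ([8.1, 8.4, 8.0, 8.3], [6.9, 7.2, 7.0, 7.1]),
    "GENE_B": ([5.0, 5.2, 4.9, 5.1], [5.1, 4.9, 5.2, 5.0]),
    "GENE_C": ([6.0, 6.1, 5.9, 6.2], [6.9, 7.1, 7.0, 7.2]),
}

genes = list(expression)
pvalues = [permutation_pvalue(*expression[g]) for g in genes]
adjusted = benjamini_hochberg(pvalues)
for gene, p, q in zip(genes, pvalues, adjusted):
    cyanotic, acyanotic = expression[gene]
    log2_fold_change = mean(cyanotic) - mean(acyanotic)
    print(f"{gene}: log2FC={log2_fold_change:+.2f} p={p:.4f} q={q:.4f}")
```

The permutation test is used here only because it needs no distributional assumptions and no external libraries. With the thousands of genes on a real array, a moderated test statistic and larger groups would normally be preferred.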

In addition to investigating the gene expression profiling of the myocardium in response to congenital heart defect, other studies focused on gene expression alterations in response to surgery. One of these studies examined the gene expression profiles during intra-operative myocardial ischemia-reperfusion in corrective cardiac surgery of ventricular septal defect (Arab et al., 2007). It described the sequential changes in gene expression in the human ventricle during surgically imposed ischemia-reperfusion. The annotation of several genes exhibiting differential expression suggested that the elicited transcriptional response in this context is compensatory and adaptive. In silico functional clustering of several genes comprising this response revealed dominant up-regulation of transcripts encoding elements of pro-hypertrophic cellular growth factor pathways, involving multiple levels of regulation, including receptors, cognate signaling kinases, and programmatically linked transcription factors. The majority of genes up-regulated in response to cardioplegic arrest have literature-confirmed cytoprotective properties, including several that have been previously validated as endogenous mediators of ischemic preconditioning. This study provided evidence that myocardial ischemic stress associated with repair of VSD induces a net protective transcriptional response (Arab et al., 2007). It showed that reversible myocardial ischemia-reperfusion during cardiac surgery is associated with an immediate genomic response that predicts a net cardioprotective phenotype (Arab et al., 2007).

The molecular signatures identified with microarray technology can be interpreted as either mechanistically relevant to the congenital heart disease pathogenesis or as markers of disease progression. We believe that this approach can also be used to identify endogenous patterns of gene profiles that are activated in response to the primary disease-causing pathway and have the effect of generating a counteracting and highly adaptive pattern of gene activation, which serves to suppress aberrant disease-related molecular pathways.

intraoperative myocardial ischemia-reperfusion in cardiac surgery. J Thorac Cardiovasc Surg *134*, 74-81, 81 e71-72.

Bassett, D.E., Jr., Eisen, M.B., and Boguski, M.S. (1999). Gene expression informatics--it's all in your mine. Nat Genet *21*, 51-55.

Bell, J. (2004). Predicting disease using genomics. Nature *429*, 453-456.

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate - A practical and powerful approach for multicomparison testing. J R Stat Soc *57*, 289-300.

Bolstad, B.M., Irizarry, R.A., Astrand, M., and Speed, T.P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics *19*, 185-193.

Bruneau, B.G. (2008). The developmental genetics of congenital heart disease. Nature *451*, 943-948.

Collins, F.S., Green, E.D., Guttmacher, A.E., and Guyer, M.S. (2003). A vision for the future of genomics research. Nature *422*, 835-847.

Cook, S.A., and Rosenzweig, A. (2002). DNA microarrays: implications for cardiovascular medicine. Circ Res *91*, 559-564.

Cooper, W.O., Hernandez-Diaz, S., Arbogast, P.G., Dudley, J.A., Dyer, S., Gideon, P.S., Hall, K., and Ray, W.A. (2006). Major congenital malformations after first-trimester exposure to ACE inhibitors. N Engl J Med *354*, 2443-2451.

Cope, L.M., Irizarry, R.A., Jaffee, H.A., Wu, Z., and Speed, T.P. (2004). A benchmark for Affymetrix GeneChip expression measures. Bioinformatics *20*, 323-331.

Eisen, M.B., Spellman, P.T., Brown, P.O., and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A *95*, 14863-14868.

Ghorbel, M.T., Cherif, M., Jenkins, E., Mokhtari, A., Kenny, D., Angelini, G.D., and Caputo, M. (2010). Transcriptomic analysis of patients with tetralogy of Fallot reveals the effect of chronic hypoxia on myocardial gene expression. J Thorac Cardiovasc Surg *140*, 337-345 e326.

Goldsmith, Z.G., and Dhanasekaran, N. (2004). The microrevolution: applications and impacts of microarray technology on molecular biology and medicine (review). Int J Mol Med *13*, 483-495.

Granjeaud, S., Bertucci, F., and Jordan, B.R. (1999). Expression profiling: DNA arrays in many guises. BioEssays *21*, 781-790.

Hammon, J.W., Jr. (1995). Myocardial protection in the immature heart. Ann Thorac Surg *60*, 839-842.

Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U., and Speed, T.P. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics *4*, 249-264.

Irizarry, R.A., Wu, Z., and Jaffee, H.A. (2006). Comparison of Affymetrix GeneChip expression measures. Bioinformatics *22*, 789-794.

Jenkins, K.J., Correa, A., Feinstein, J.A., Botto, L., Britt, A.E., Daniels, S.R., Elixson, M., Warnes, C.A., and Webb, C.L. (2007). Noninherited risk factors and congenital cardiovascular defects: current knowledge: a scientific statement from the American Heart Association Council on Cardiovascular Disease in the Young: endorsed by the American Academy of Pediatrics. Circulation *115*, 2995-3014.
