**Molecular Portrait of Clear Cell Renal Cell Carcinoma: An Integrative Analysis of Gene Expression and Genomic Copy Number Profiling**

Cristina Battaglia et al.\*

*Dept. of Biomedical Sciences and Technologies, University of Milano, Milano, Doctoral School of Molecular Medicine, University of Milano, Milano, Italy* 

#### **1. Introduction**

Renal cell carcinoma (RCC) has an incidence of about 3 to 10 cases per 100,000 individuals, with a predilection for adult males over 60 years old (1.6:1 male/female ratio) (Chow, 2010; Nese, 2009). In Europe, about 60,000 individuals are affected by RCC every year, with about 18,000 deaths annually and an incidence rate for all stages that has risen steadily over the last three decades. Although inherited forms occur in a number of familial cancer syndromes, such as the well-known von Hippel-Lindau (VHL) syndrome, RCC is commonly sporadic (Cohen & McGovern, 2005; Kaelin, 2007) and, as recently highlighted by the National Cancer Institute (NCI), influenced by the interplay between exposure to environmental risk factors and the genetic susceptibility of exposed individuals (Chow et al., 2010). Being poorly symptomatic in early phases, many cases become clinically detectable only when already advanced and, as such, therapy-resistant (Motzer, 2011). Based on histology, RCC can be classified into several subtypes, i.e., clear cell (80% of cases), papillary (10%), chromophobe (5%) and oncocytoma (5%), each characterized by specific histopathological features, malignant potential and clinical outcome (Cohen & McGovern, 2005). Patient stratification is normally achieved using prognostic algorithms and nomograms based on multiple clinico-pathological factors such as TNM stage, Fuhrman nuclear grade, tumor size, performance status, necrosis and other hematological indices (Flanigan et al., 2011), although the most efficient predictors of survival and recurrence are based on nuclear grade alone (Nese et al., 2009). As recently reviewed by Brannon and Rathmell (2010), a finer RCC subtype classification could be obtained by exploiting the vast amount of genomic and transcriptional data presented in numerous studies. For instance, several authors proposed a molecular classification of RCC based on differential gene expression profiles, with each subtype characterized by the activation of distinct gene sets (Brannon, 2010; Furge, 2004; Skubitz, 2006; Sültmann, 2005; Zhang, 2008), while others identified RCC-specific biomarkers (e.g., CA9, Ki67, VEGF proteins, phosphorylated AKT, PTEN, HIF-1). Lately, it has been reported that microRNAs, a class of small non-coding RNA molecules, could contribute to RCC development at different levels and may represent a new group of potential tumor biomarkers (Redova et al., 2011). Despite numerous efforts to dissect the molecular features of RCC through functional genomics, no transcriptional signature or biomarker has yet gained approval for clinical application (Arsanious, 2009; Eichelberg, 2009; Lam, 2007; Yin-Goen, 2006). The identification of novel molecular markers to improve early diagnosis and prognostic prediction, and of candidate targets for new therapeutic approaches, therefore remains of primary importance for this pathology.

<sup>\*</sup> Eleonora Mangano<sup>3</sup>, Silvio Bicciato<sup>4</sup>, Fabio Frascati<sup>3</sup>, Simona Nuzzo<sup>4</sup>, Valentina Tinaglia<sup>1,2</sup>, Cristina Bianchi<sup>5</sup>, Roberto A. Perego<sup>5</sup> and Ingrid Cifola<sup>3</sup>

*<sup>1</sup>Dept. of Biomedical Sciences and Technologies, University of Milano, Milano, Italy;*

*<sup>2</sup>Doctoral School of Molecular Medicine, University of Milano, Milano, Italy;*

*<sup>3</sup>Institute for Biomedical Technologies, National Research Council, Segrate, Italy;*

*<sup>4</sup>Center for Genome Research, University of Modena and Reggio Emilia, Modena, Italy;*

*<sup>5</sup>Dept. of Experimental Medicine, University of Milano-Bicocca, Milano, Italy*

Among the RCC histotypes, clear cell renal carcinoma (ccRCC) is the most frequent and aggressive subtype and is characterized by a specific pattern of chromosomal alterations (Yoshimoto et al., 2007) that represents a molecular fingerprint potentially useful for diagnostic and prognostic applications (Klatte et al., 2007). Currently, the standard clinical treatment comprises surgical resection followed by IFN-α- and/or IL-2-based immunotherapy, although therapy toxicity still represents a major problem (Molina & Motzer, 2011). The development of approaches targeting specific biological pathways that are typically deregulated in this tumor is opening the way to new opportunities for therapeutic intervention (Pal et al., 2010). One of the most investigated processes is the hypoxia pathway (Cohen & McGovern, 2005; Kaelin, 2007; Wouters & Koritzinsky, 2008), which is genetically linked to ccRCC through one of its key players, the VHL (von Hippel-Lindau) gene, completely inactivated in all inherited forms and in 80% of sporadic cases. Cloned in 1993, the VHL gene (located at the 3p25.3 locus) is currently known as the main tumor suppressor gene involved in the very early steps of RCC pathogenesis (Banks et al., 2006). Normally, the function of VHL is to ubiquitinate the two hypoxia-inducible factors HIF-1α and HIF-2α, targeting them for proteasomal degradation (Kaelin, 2008). In ccRCC, bi-allelic VHL inactivation, by a combination of deletion and mutation/methylation (Banks et al., 2006), prevents the degradation of HIF-1α and HIF-2α which, in turn, can activate the transcription of a series of hypoxia-inducible genes, such as VEGF, VEGFR, EGFR, PDGF, IGF, GLUT-1, CXCR4, TGF-α, CA9 and EPO, involved in processes like angiogenesis, survival, cell motility, pH regulation and glucose metabolism (Baldewijns et al., 2010). The complete loss of VHL function results in the up-regulation of a panel of genes that contributes to the ccRCC phenotype and represents a list of potential prognostic markers (Klatte et al., 2007) and/or therapeutic targets (Gong et al., 2010). Additionally, the transcription factor HIF-1 is commonly activated in cancer (Semenza, 2008) and is linked to oncogenic/tumor suppressor molecules implicated in cross-communication, such as the tuberous sclerosis complex (TSC) and the mammalian target of rapamycin (mTOR) (Maxwell, 2005). As such, ccRCC represents an ideal model for developing novel targeted therapies directed against the hypoxia pathway, and many molecules are already used in clinical trials targeting either HIF-1, or the upstream pathways regulating HIF (such as the Akt-mTOR signal transduction pathway), or the downstream genes induced by HIF (e.g., VEGF and VEGFR) (Baldewijns et al., 2010). Intriguingly, recent evidence indicates that the 20% of sporadic RCC cases with wild-type VHL (and active VHL function) also present a peculiar pattern of altered genes, suggesting the involvement of other, still partially unknown, alternative regulatory mechanisms (Gordan et al., 2008).

At the DNA level, studies based on traditional cytogenetic and comparative genomic hybridization (CGH) techniques identified a panel of chromosomal aberrations typical of ccRCC (Höglund, 2004; Klatte, 2009). Moreover, high-density single nucleotide polymorphism (SNP) array technology, which interrogates thousands of SNP markers distributed throughout the human genome, has significantly improved the detection of chromosomal aberrations and made it possible to detect regions with loss of heterozygosity (LOH), which is important information for the identification of novel tumor suppressor genes. SNP arrays have been widely applied to characterize tumor genomic instability (Brenner & Rosenberg, 2010; Lisovich, 2011) and, more recently, to perform genome-wide DNA profiling of ccRCC tissue samples (Beroukhim, 2009; Chen, 2009; Cifola, 2008). Overall, ccRCC is characterized by recurrent genetic anomalies at characteristic chromosomes, such as deletions with LOH on chromosomes 3p (also involving the VHL locus), 6q, 8p, 9p, and 14q, and duplications of chromosomes 5q and 7. Much evidence suggests that this peculiar pattern of genomic instability represents a tumor-specific molecular fingerprint that has a role in cancer pathogenesis and may be useful in diagnostic and prognostic applications (Gunawan, 2001; Klatte, 2009; Perego, 2008). Furthermore, a comprehensive study showed that cytogenetic alterations could be associated with ccRCC tumorigenesis and malignant progression (Zhang et al., 2010b).

| Microarray repository code | Dataset name | ccRCC samples | Normal samples | Reference |
|---|---|---|---|---|
| GSE781<sup>a</sup> | Lenburg | 9 | 8 | Lenburg et al., 2003 |
| GSE15641<sup>a</sup> | Jones | --- | 23 | Jones et al., 2005 |
| GSE6344<sup>a</sup> | Gumz | --- | 10 | Gumz et al., 2007 |
| GSE7023<sup>b</sup> | Furge | --- | 13 | Furge et al., 2007 |
| GSE14762<sup>b</sup> | Wang | --- | 12 | Wang et al., 2009 |
| GSE2109<sup>b</sup> | Bittner | 188 | --- | International Genomics Consortium |
| GSE11151<sup>b</sup> | Yusenko | --- | 3 | Yusenko et al., 2009 |
| E-TAM-282<sup>b</sup> | Cifola | 16 | 11 | Cifola et al., 2008 |
| GSE17895<sup>b</sup> | Dalgliesh | 83 | 13 | Dalgliesh et al., 2010 |
| GSE12606<sup>b</sup> | Stickel | 3 | 3 | Stickel et al., 2009 |
| GSE11904<sup>c</sup> | Gordan | 21 | --- | Gordan et al., 2008 |
| GSE14994<sup>d</sup> | Beroukhim | 26<sup>e</sup> | 11 | Beroukhim et al., 2009 |

Table 1. Independent datasets included in the ccRCC compendium. The Affymetrix platforms used to obtain the original data are: <sup>a</sup>HG-U133A, <sup>b</sup>HG-U133 Plus 2.0, <sup>c</sup>HG-U133A 2.0, and <sup>d</sup>HT-HG-U133A. Samples from the Beroukhim dataset (<sup>e</sup>) were used only in the integrative analysis of gene expression and copy number, since no grading annotation was available.

Advances in high-throughput genome-wide profiling technologies have allowed an unprecedented, comprehensive view of the cancer genome landscape. In particular, high-density microarrays and sequencing-based strategies have been widely used to identify genetic (gene dosage, allelic status, and mutations in gene sequence) and epigenetic (DNA methylation, histone modification, and microRNA) aberrations in cancer (Majewski & Bernards, 2011). The integrative approach of analyzing parallel dimensions has enabled the identification of genes that are often disrupted by multiple mechanisms but at low frequencies by any one mechanism, and of pathways that are often disrupted at multiple components but at low frequencies at individual components (Chari et al., 2010). In recent years, there has been an increasing tendency to combine genome-wide DNA copy number (CN) analysis with transcriptional profiles to investigate how alterations in DNA content (aneuploidy) can influence global expression patterns. In cancer research, this combined approach helps filter the large amount of array-based data and, by narrowing down the hundreds of differentially expressed genes to those whose altered expression is attributable to underlying chromosomal alterations, highlights candidate genes that are actively involved in the causation or maintenance of the malignant phenotype. This approach has been applied to a wide range of tumor types, including breast (Hyman, 2002; Pollack, 2002), bladder (Harding et al., 2002), prostate (Saramäki et al., 2006), pancreatic (Heidenblad et al., 2005) and rectal (Grade et al., 2006) cancers and melanoma (Akavia et al., 2010), demonstrating a strong genome-wide correlation between aneuploidy-associated genomic imbalances and global gene expression levels. Most studies focused on amplified and overexpressed genes and calculated that a fraction ranging from 44% to 62% of amplified genes showed concomitant up-regulated expression levels (Hyman et al., 2002). This suggests the presence of an aneuploidy-induced deregulation of the cancer transcriptome that occurs in addition to the transcriptional and mutational deregulation of oncogenes and tumor suppressor genes. This combined approach is exemplified in the study by Garraway et al., in which the analysis of CN data obtained by SNP arrays drives the investigation of pre-existing gene expression profiles (Garraway et al., 2005). Specifically, CN data were used to organize cancer samples into subgroups characterized by specific chromosomal aberrations associated with contiguous SNP chromosomal clusters. This genomic-based sub-grouping constituted the new phenotypic labeling of the samples in the gene expression analysis, i.e., samples from the NCI-60 cancer cell line panel were re-grouped into two new classes based on the presence or absence of amplification at 3p14-p13 before performing the supervised analysis. The differential expression profiles inside the SNP cluster characterizing the amplification at 3p14-p13 identified the MITF gene as a novel melanoma-specific oncogene. This study clearly demonstrated the usefulness of an integrative approach to investigate candidate regions and genes specifically involved in tumor etiology and potentially useful as novel specific cancer biomarkers.
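The CN-driven regrouping described for the Garraway et al. study can be illustrated with a minimal sketch. It assumes that each sample has already been labeled for the presence or absence of an amplification, and it ranks genes by a Welch t statistic between the two classes. The gene names, the toy values and the choice of test statistic are illustrative assumptions, not details taken from the original study.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(a, b):
    """Welch's t statistic for two independent samples (0.0 if no variance)."""
    denom = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / denom if denom > 0 else 0.0


def rank_genes_by_cn_class(expression, amplified, min_group=2):
    """Rank genes by differential expression between CN-defined classes.

    expression: dict mapping gene -> list of expression values (one per sample)
    amplified:  list of bools, True where the sample carries the amplification
    """
    ranked = []
    for gene, values in expression.items():
        amp = [v for v, flag in zip(values, amplified) if flag]
        non = [v for v, flag in zip(values, amplified) if not flag]
        if len(amp) < min_group or len(non) < min_group:
            continue  # too few samples in one class to estimate a variance
        ranked.append((gene, welch_t(amp, non)))
    # Largest absolute statistic first
    return sorted(ranked, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: three amplified and three non-amplified samples
amplified = [True, True, True, False, False, False]
expression = {
    "GENE_A": [9.1, 8.8, 9.4, 6.0, 6.3, 5.9],  # higher in amplified samples
    "GENE_B": [7.0, 7.2, 6.9, 7.1, 7.0, 7.3],  # essentially unchanged
}
ranking = rank_genes_by_cn_class(expression, amplified)
```

In practice the ranking would be restricted to genes mapping inside the amplified region and corrected for multiple testing; both steps are omitted here for brevity.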

Clearly, the rapid development of these innovative analytical procedures requires novel and increasingly sophisticated mathematical and statistical algorithms. For instance, an important issue is how to combine and compare microarray expression data of single genes with DNA copy number data of whole chromosomal regions. Thus, there is increasing interest in developing computational tools able to link single differentially expressed genes to their chromosomal location, in order to calculate differentially expressed chromosomal regions and thus assemble regional transcriptional activity maps (Akavia, 2010; Schäfer, 2009). To address the integrative analysis of gene expression and copy number data in tumor samples, we recently developed a computational tool named Position RElated Data Analysis (*preda*, Ferrari et al., 2011). *preda* is particularly suited for the identification of chromosomal regions with concomitant and coordinated copy number and transcriptional imbalances (SODEGIRs, Bicciato et al., 2009), thus providing an opportunity for upgrading the information content of genomic data and for discovering novel cancer biomarkers.
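The idea of linking gene-level calls to chromosomal position and looking for regions where expression and copy number change in the same direction can be sketched as follows. This is a deliberately simplified illustration and not the *preda* algorithm: it assumes that each gene on a single chromosome already carries a discrete expression call and a discrete CN call (+1, 0 or -1), and it merely reports runs of consecutive genes where the two calls agree. The function name and the `min_genes` parameter are hypothetical.

```python
def concordant_regions(positions, expr_calls, cn_calls, min_genes=3):
    """Find runs of position-sorted genes with concordant, non-zero calls.

    positions:  genomic coordinate of each gene (single chromosome)
    expr_calls: +1 (up-regulated), -1 (down-regulated) or 0 per gene
    cn_calls:   +1 (gain), -1 (loss) or 0 per gene
    Returns a list of (start, end, sign, n_genes) tuples.
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    regions = []
    run, sign = [], 0
    for i in order + [None]:  # None is a sentinel that closes the last run
        s = 0
        if i is not None and expr_calls[i] == cn_calls[i]:
            s = expr_calls[i]
        if s != 0 and s == sign:
            run.append(i)  # the current run continues
            continue
        if len(run) >= min_genes:
            regions.append((positions[run[0]], positions[run[-1]], sign, len(run)))
        run, sign = ([i], s) if s != 0 else ([], 0)
    return regions


# Hypothetical toy chromosome with eight genes
positions = [100, 200, 300, 400, 500, 600, 700, 800]
expr_calls = [-1, -1, -1, 0, 1, 1, 1, 1]
cn_calls = [-1, -1, -1, -1, 1, 1, 0, 1]

deleted_like = concordant_regions(positions, expr_calls, cn_calls)
relaxed = concordant_regions(positions, expr_calls, cn_calls, min_genes=2)
```

With the default threshold only the three-gene run of concordant loss and down-regulation is reported; lowering `min_genes` to 2 also reports the two-gene run of concordant gain and up-regulation.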

In this chapter, we describe a general framework for depicting the molecular portrait of ccRCC through the integrative analysis of gene expression and copy number profiles obtained from publicly available datasets. The chapter is structured in Methods, Results and Discussion and addresses three major issues: i) the analysis and the functional characterization of a large compendium of gene expression data; ii) the identification of chromosomal alterations in ccRCC samples from SNP copy number data; iii) the integrative analysis of gene expression and copy number data.

#### **2. Methods**

#### **2.1 Gene expression analysis of ccRCC**

To characterize the transcriptional portrait of ccRCC, we retrieved 12 datasets containing microarray gene expression data of clear cell renal carcinoma and normal samples annotated with clinical information. All data were measured on several releases of the Affymetrix Human Genome HG-U133 arrays (i.e., HG-U133A, HG-U133 Plus 2.0, HG-U133A 2.0 and HT-HG-U133A) and were downloaded from the public microarray data repositories Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/; 11 datasets) and ArrayExpress (http://www.ebi.ac.uk/arrayexpress/; 1 dataset). Prior to analysis, we reorganized all datasets by manually annotating and tagging all samples, and renamed each original dataset after the first author of the corresponding publication. This reorganization resulted in a compendium of 426 samples comprising 320 ccRCCs and 106 normal renal tissues (Table 1). ccRCC samples were further annotated according to nuclear grade and divided into a low-grade (n=197) and a high-grade (n=123) class, with the low-grade class comprising 29 G1 and 168 G2 samples and the high-grade class including 97 G3 and 26 G4 samples.
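The grade-based partition of the tumor samples can be checked with a few lines of arithmetic. The sketch below encodes the G1 to G4 counts reported in the text, maps each nuclear grade to its class, and recomputes the class sizes and the size of the compendium; the variable and function names are illustrative only.

```python
from collections import Counter

LOW_GRADE = {"G1", "G2"}
HIGH_GRADE = {"G3", "G4"}


def grade_class(grade):
    """Map a nuclear grade to the low-grade or high-grade class."""
    if grade in LOW_GRADE:
        return "low"
    if grade in HIGH_GRADE:
        return "high"
    raise ValueError(f"unknown grade: {grade}")


# Counts per grade as reported in the text
grade_counts = {"G1": 29, "G2": 168, "G3": 97, "G4": 26}
normal_samples = 106

class_counts = Counter()
for grade, n in grade_counts.items():
    class_counts[grade_class(grade)] += n

total_ccrcc = sum(class_counts.values())
compendium_size = total_ccrcc + normal_samples
```

The recomputed totals agree with the figures quoted above: 197 low-grade and 123 high-grade tumors, 320 ccRCC samples and 426 samples overall.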


Table 1. Independent datasets included in the ccRCC compendium. The Affymetrix platforms used to obtain the original data are: <sup>a</sup>HG-U133A, <sup>b</sup>HG-U133 Plus 2.0, <sup>c</sup>HG-U133A 2.0, and <sup>d</sup>HT-HG-U133A. Samples from the Beroukhim dataset (<sup>e</sup>) were used only in the integrative analysis of gene expression and copy number, since no grading annotation was available.

The integration and normalization of gene expression signals obtained using different types of microarray in different experiments is the most critical step in the meta-analysis of publicly available data, since their direct integration may give misleading results due to dissimilar experimental conditions, laboratory-dependent bias, etc. Although Robust Multi-array Average (RMA; Irizarry et al., 2003) is the most effective signal quantification method, it cannot be applied directly to data obtained from different platforms (e.g., the HG-U133A and the HG-U133 Plus 2.0 arrays), due to differences in the number, type and physical position of probes. We therefore implemented a procedure, called the Virtual Chip, to create a custom virtual microarray grid that integrates the geometry and probe content of two or more types of Affymetrix arrays (Fallarino et al., 2010). Once the virtual grid is defined, all raw data (represented by the so-called CEL files) are re-organized to match a single platform, i.e., the virtual chip. At this point, raw data originally from different types of microarrays become homogeneous in terms of platform and can be preprocessed and normalized using standard approaches such as RMA. The Virtual Chip method allows combining data directly at the level of probe fluorescence intensity and has the advantage that gene expression signals are generated with a single step of background correction, normalization and
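The principle of bringing arrays from different platforms onto one common probe grid before normalizing them together can be sketched in a few lines. The example below is an assumption-laden illustration, not the Virtual Chip implementation: it assumes that the shared grid is simply the set of probe identifiers present on every platform, and it applies a plain quantile normalization (the normalization step used within RMA) to the shared probes. Probe identifiers and intensities are invented, and ties are handled naively.

```python
def common_probes(*arrays):
    """Probe identifiers present on every array (the assumed shared grid)."""
    shared = set(arrays[0])
    for array in arrays[1:]:
        shared &= set(array)
    return sorted(shared)


def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one column per array)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across arrays
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]
    normalized = []
    for col in columns:
        ranks = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(ranks):
            out[idx] = reference[rank]  # replace each value by its reference quantile
        normalized.append(out)
    return normalized


# Hypothetical probe-level intensities from two different platforms
array_a = {"p1": 5.0, "p2": 2.0, "p3": 3.0, "p9": 1.0}
array_b = {"p1": 4.0, "p2": 1.0, "p3": 6.0, "p7": 8.0}

shared = common_probes(array_a, array_b)
columns = [[array[p] for p in shared] for array in (array_a, array_b)]
normalized = quantile_normalize(columns)
```

After normalization both arrays share the same intensity distribution over the common probes, which is what makes a joint downstream analysis meaningful.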

Molecular Portrait of Clear Cell Renal Cell Carcinoma:

**2.2 Genomic copy number analysis of ccRCC** 

and 0.5, respectively.

Signal2Noise as metric and 1,000 permutations of phenotype labels.

An Integrative Analysis of Gene Expression and Genomic Copy Number Profiling 29

taken from the Molecular Signatures Database (http://www.broadinstitute.org/ gsea/msigdb/index.jsp; version 3.0) and a list of 145 genes associated to HIF and VHL genes was downloaded from the NCBI Pathway Interaction Database (http://pid.nci.nih.gov). Gene sets have been considered significantly enriched at FDR≤0.25 when using

To assess copy number alterations in ccRCC, we used two datasets composed of 27 sporadic ccRCC samples profiled by Affymetrix Human Mapping 100K SNP arrays and downloaded from AE (E-TAM-283, E-TAM-284; Cifola et al., 2008) and 26 sporadic ccRCC samples profiled by Affymetrix Human Mapping 250K Sty SNP array and downloaded from GEO (GSE14994; Beroukhim et al., 2009). The genomic copy number values were quantified using Partek Genomics Suite and the presence of copy number alterations, i.e., chromosomal segments affected by amplification or deletion, was calculated using Partek Genomic Segmentation (GS) algorithm. Partek baseline generated from 90 Mapping 100K Hind/Xba HapMap trio samples (available at Affymetrix website; http://www.affymetrix.com/ support/technical/sample\_data/hapmap\_trio\_data.affx) and 270 Mapping 250K Sty HapMap samples (available at GEO, GSE5173) were used as diploid reference. In the Genomic Segmentation analysis, the cut-off values to identify gains and losses were set to 2.3 and 1.7, respectively, each segment was computed using a minimum of 10 consecutive filtered probe sets, and the threshold p-value and the signal to noise ratio were set to 0.001

**2.3 Integrative analysis of gene expression and genomic copy number in ccRCC** 

To address the integrative analysis of gene expression and copy number data we applied *preda* (Position RElated Data Analysis) tool, an R package for detecting regional variations of genomic features from high-throughput data (Ferrari et al., 2011). *preda* is particularly suited for the identification of chromosomal regions with coordinated copy number and transcriptional imbalances (SODEGIRs, Bicciato et al., 2009). In *preda*, custom-designed data structures allow to efficiently manage different types of genomics signals and annotations, different choices of smoothing functions and statistics empower a variety of flexible and robust workflows, and tabular and graphical representations facilitate downstream biological interpretation of results. The computational framework directly integrates copy number and gene expression profiles at genome-wide level, by statistically assessing the gene dosage and transcription statuses on common genomic positions. We applied *preda* to both Cifola and Beroukhim datasets (Table 1). Briefly, Cifola dataset comprises a subset of 11 ccRCC cases profiled by both Affymetrix Human Mapping 100K and HG-U133 Plus 2.0 arrays (Cifola et al., 2008), while Beroukhim dataset includes 26 ccRCC and 11 normal samples analyzed using both Affymetrix Human Mapping 250K and HT-HG-U133A arrays (Beroukhim et al., 2009). Copy number log-ratios were calculated using CNAG software (version 3.3.0.1, http://www.genome.umin.jp/; Nannya, 2005; Yamamoto, 2007), while gene expression levels were estimated using RMA algorithm. Both types of data were used as input to *preda* to identify regions harboring both down-regulated genes and CN loss or both up-regulated genes and CN gain (SODEGIR deleted and SODEGIR amplified signatures, respectively). To further validate the presence of areas of deletion and amplification in a larger panel of samples, we intersected the list of genes associated to the SODEGIR signatures with the list of

summarization. The construction of the virtual grid is inspired by the generation of custom Chip Definition Files (CDFs), i.e., of ad-hoc probe designs and array topologies. In custom CDFs, probes matching the same transcript, but belonging to different probe sets, are aggregated into putative custom-probe sets, each one including only those probes with a unique and exclusive correspondence with a single transcript. Similarly, probes matching the same transcript but located at different coordinates on different types of arrays may be merged in custom-probe sets and arranged in a virtual platform grid, whose geometry can be arbitrarily set. As for any other microarray geometry, this virtual grid may be used as a reference to create a virtual CDF file containing the probes of the Virtual Chip and their coordinates on the virtual platform. The probes included in the virtual CDF are those shared among the platforms of interest, with the additional condition of generating custom probe set of at least 4 probes. The virtual CDF can be derived from any custom CDF, e.g., those developed by Dai and publicly accessible at the Molecular and Behavioral Neuroscience Institute Microarray Lab (Dai et al., 2005). Finally, the virtual CDF can be used as the geometry file in RMA as far as the original CEL files are properly re-mapped to match the topology described in the virtual CDF. Re-mapped CEL files, called virtual CEL files, are homogeneous in terms of platform and gene expression data can be generated with a single step of background correction, normalization and summarization directly from the fluorescence signals of all microarrays composing the meta-dataset. 
In this particular case, expression values of the meta-dataset were generated from intensity signals using the combined HG-U133A/HG-U133 Plus 2.0/HG-U133A 2.0/HT-HG-U133A virtual-CDF file, the custom definition files for Affymetrix human arrays based on Entrez (version 12.1.0; http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/12.1.0/entrezg .asp), and the transformed virtual-CEL files. Intensity values of meta-probe sets have been background adjusted, normalized using quantile normalization, and gene expression levels calculated using median polish summarization (RMA algorithm; Irizarry et al., 2003). The final meta-dataset comprised gene expression values for a total of 11809 Entrez gene IDs and 426 samples.
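The probe-selection rule for the virtual CDF (probes shared among all platforms, with at least 4 probes per custom probe set) can be illustrated with a minimal Python sketch. The data layout, the function name `build_virtual_probe_sets`, and the toy probe identifiers are assumptions made for illustration; this is not the authors' Virtual Chip implementation, which also handles probe coordinates and CEL file re-mapping.

```python
from typing import Dict, Set

MIN_PROBES = 4  # minimum probes per custom probe set, as stated in the text


def build_virtual_probe_sets(
    platforms: Dict[str, Dict[str, Set[str]]]
) -> Dict[str, Set[str]]:
    """For each gene, keep only probes present on every platform.

    `platforms` maps a platform name to {gene_id: set of probe identifiers}
    (a hypothetical layout). Genes whose shared probe set has fewer than
    MIN_PROBES probes are dropped.
    """
    per_platform = list(platforms.values())
    # Genes represented on all platforms
    shared_genes = set.intersection(*(set(p) for p in per_platform))
    virtual = {}
    for gene in shared_genes:
        # Probes for this gene that exist on every platform
        shared_probes = set.intersection(*(p[gene] for p in per_platform))
        if len(shared_probes) >= MIN_PROBES:
            virtual[gene] = shared_probes
    return virtual


# Toy input: probe identifiers are invented for illustration only
toy_platforms = {
    "platform_1": {
        "gene_1": {"p1", "p2", "p3", "p4", "p5"},
        "gene_2": {"q1", "q2", "q3"},
    },
    "platform_2": {
        "gene_1": {"p1", "p2", "p3", "p4", "p9"},
        "gene_2": {"q1", "q2", "q3"},
        "gene_3": {"r1", "r2", "r3", "r4"},
    },
}
virtual_sets = build_virtual_probe_sets(toy_platforms)
```

In this toy case only `gene_1` survives: `gene_2` has just three shared probes and `gene_3` is missing from one platform.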

The meta-dataset was analyzed using the Analysis of Variance (ANOVA) package of Partek Genomics Suite software (version 6.5, http://www.partek.com/; Partek Inc., St Louis, MO, USA) to identify differentially expressed genes (DEGs) between ccRCC samples and normal renal tissues. Specifically, genes were defined as differentially expressed if the average expression values in the two groups differed by at least 2-fold and the False Discovery Rate (FDR; Benjamini-Hochberg method) of the statistical comparison was less than 0.05. Differentially expressed genes were functionally characterized in terms of Gene Ontology (GO) biological process (BP) using the DAVID tool (http://david.abcc.ncifcrf.gov/; Huang, 2009a, 2009b) with an FDR≤0.001. Ingenuity Pathways Analysis (IPA, version 9.0) was applied to assess functional connections that are statistically overrepresented among the differentially expressed genes. Briefly, in IPA, a p-value calculated by a right-tailed Fisher's Exact Test quantifies the probability of observing the fraction of focus genes in the canonical pathway as compared to the fraction expected by chance in the reference set, under the assumption that each gene is equally likely to be picked by chance. Finally, we investigated whether expression levels in ccRCCs and normal tissues were associated with elevated expression of biologically relevant gene sets using Gene Set Enrichment Analysis (GSEA, http://www.broadinstitute.org/gsea/index.jsp; Subramanian et al., 2005) on the meta-dataset. In particular, 217 BioCarta and 186 KEGG gene sets were taken from the Molecular Signatures Database (http://www.broadinstitute.org/gsea/msigdb/index.jsp; version 3.0), and a list of 145 genes associated with the HIF and VHL genes was downloaded from the NCBI Pathway Interaction Database (http://pid.nci.nih.gov). Gene sets were considered significantly enriched at FDR≤0.25 when using Signal2Noise as metric and 1,000 permutations of phenotype labels.
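The DEG selection rule (at least 2-fold difference and Benjamini-Hochberg FDR below 0.05) can be sketched in a few lines of Python. This is an illustrative sketch, not the Partek implementation: it assumes that group means are on the log2 scale (as produced by RMA) and that per-gene p-values from the ANOVA are already available; the function names and toy values are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, log2_tumor_mean, log2_normal_mean, pvalues,
                min_fold=2.0, max_fdr=0.05):
    """Keep genes with |fold change| >= min_fold and BH FDR < max_fdr."""
    fdr = benjamini_hochberg(pvalues)
    min_log2_diff = math.log2(min_fold)
    return [
        g
        for g, t, n, q in zip(genes, log2_tumor_mean, log2_normal_mean, fdr)
        if abs(t - n) >= min_log2_diff and q < max_fdr
    ]


# Toy values, invented for illustration only
genes = ["A", "B", "C", "D"]
tumor = [10.0, 8.0, 7.0, 9.5]
normal = [8.0, 8.2, 9.0, 9.0]
pvals = [0.001, 0.5, 0.01, 0.02]
degs = select_degs(genes, tumor, normal, pvals)
```

Gene `D` passes the FDR threshold but not the fold-change threshold, which shows why both criteria are applied jointly.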

#### **2.2 Genomic copy number analysis of ccRCC**


To assess copy number alterations in ccRCC, we used two datasets: 27 sporadic ccRCC samples profiled by Affymetrix Human Mapping 100K SNP arrays and downloaded from ArrayExpress (AE; E-TAM-283, E-TAM-284; Cifola et al., 2008), and 26 sporadic ccRCC samples profiled by Affymetrix Human Mapping 250K Sty SNP arrays and downloaded from GEO (GSE14994; Beroukhim et al., 2009). Genomic copy number values were quantified using Partek Genomics Suite, and copy number alterations, i.e., chromosomal segments affected by amplification or deletion, were identified using the Partek Genomic Segmentation (GS) algorithm. Partek baselines generated from 90 Mapping 100K Hind/Xba HapMap trio samples (available at the Affymetrix website; http://www.affymetrix.com/support/technical/sample\_data/hapmap\_trio\_data.affx) and from 270 Mapping 250K Sty HapMap samples (available at GEO, GSE5173) were used as diploid references. In the Genomic Segmentation analysis, the cut-off values to identify gains and losses were set to 2.3 and 1.7, respectively; each segment was computed using a minimum of 10 consecutive filtered probe sets; and the threshold p-value and the signal-to-noise ratio were set to 0.001 and 0.5, respectively.
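As an illustration of how the gain and loss cut-offs act on segmented data, the following Python sketch labels segments by their mean copy number. It is not the Partek Genomic Segmentation algorithm, which also applies the p-value and signal-to-noise thresholds when defining segments; the strict inequalities, the function name `call_segment`, and the toy segments are assumptions.

```python
GAIN_CUTOFF = 2.3   # mean copy number above this is called a gain
LOSS_CUTOFF = 1.7   # mean copy number below this is called a loss
MIN_PROBES = 10     # minimum consecutive probe sets per segment


def call_segment(mean_copy_number, n_probes):
    """Label a segment as 'gain', 'loss' or 'neutral'.

    Returns None when the segment is supported by too few probe sets.
    Whether the cut-offs are inclusive is not stated in the text;
    strict inequalities are assumed here.
    """
    if n_probes < MIN_PROBES:
        return None
    if mean_copy_number > GAIN_CUTOFF:
        return "gain"
    if mean_copy_number < LOSS_CUTOFF:
        return "loss"
    return "neutral"


# Toy segments (name, mean copy number, number of probe sets); invented values
segments = [
    ("seg_a", 1.2, 150),
    ("seg_b", 2.8, 80),
    ("seg_c", 2.0, 200),
    ("seg_d", 3.0, 5),
]
calls = {name: call_segment(cn, n) for name, cn, n in segments}
```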

#### **2.3 Integrative analysis of gene expression and genomic copy number in ccRCC**

To integrate gene expression and copy number data, we applied *preda* (Position RElated Data Analysis), an R package for detecting regional variations of genomic features from high-throughput data (Ferrari et al., 2011). *preda* is particularly suited for the identification of chromosomal regions with coordinated copy number and transcriptional imbalances (SODEGIRs; Bicciato et al., 2009). In *preda*, custom-designed data structures allow efficient management of different types of genomic signals and annotations, different choices of smoothing functions and statistics support a variety of flexible and robust workflows, and tabular and graphical representations facilitate downstream biological interpretation of results. The computational framework directly integrates copy number and gene expression profiles at the genome-wide level by statistically assessing gene dosage and transcription status at common genomic positions. We applied *preda* to both the Cifola and Beroukhim datasets (Table 1). Briefly, the Cifola dataset comprises a subset of 11 ccRCC cases profiled by both Affymetrix Human Mapping 100K and HG-U133 Plus 2.0 arrays (Cifola et al., 2008), while the Beroukhim dataset includes 26 ccRCC and 11 normal samples analyzed using both Affymetrix Human Mapping 250K and HT-HG-U133A arrays (Beroukhim et al., 2009). Copy number log-ratios were calculated using CNAG software (version 3.3.0.1, http://www.genome.umin.jp/; Nannya, 2005; Yamamoto, 2007), while gene expression levels were estimated using the RMA algorithm. Both types of data were used as input to *preda* to identify regions harboring both down-regulated genes and CN loss, or both up-regulated genes and CN gain (SODEGIR deleted and SODEGIR amplified signatures, respectively). To further validate the presence of areas of deletion and amplification in a larger panel of samples, we intersected the list of genes associated with the SODEGIR signatures with the list of differentially expressed genes obtained from the ANOVA comparison of the 320 ccRCCs with the 106 normal samples of the meta-dataset (Table 1). Differentially expressed genes and genes included in the SODEGIR signatures were annotated using the GeneDistiller 2 tool (http://www.genedistiller.org; Seelow et al., 2008). Literature mining was performed using the PubMatrix tool (http://pubmatrix.grc.nia.nih.gov/; Becker et al., 2003), applying specific keywords such as *cancer*, *renal cell carcinoma*, *amplification*, *methylation*, *oncogene*, *tumor suppressor* and *biomarker*.
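The validation step, in which SODEGIR signature genes are intersected with the meta-dataset DEGs, amounts to a set intersection. The Python sketch below additionally requires the direction of differential expression to match the signature (down-regulated for SODEGIR deleted, up-regulated for SODEGIR amplified); this direction matching is an assumption, as are the function name and the placeholder gene identifiers.

```python
def validate_signature(sodegir_genes, deg_direction, expected_direction):
    """Return signature genes that are also DEGs in the expected direction.

    `deg_direction` maps each DEG to "up" or "down" (tumor vs. normal).
    """
    return sorted(
        g for g in set(sodegir_genes)
        if deg_direction.get(g) == expected_direction
    )


# Placeholder identifiers, not real results
deg_direction = {"GENE_A": "down", "GENE_B": "up", "GENE_C": "down"}
sodegir_deleted = ["GENE_A", "GENE_B", "GENE_D"]
sodegir_amplified = ["GENE_B", "GENE_C"]

validated_deleted = validate_signature(sodegir_deleted, deg_direction, "down")
validated_amplified = validate_signature(sodegir_amplified, deg_direction, "up")
```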

#### **3. Result**

#### **3.1 Differential gene expression profiling of ccRCC**

The aim of this analysis was to functionally characterize the transcriptional profiles that differentiate cancer specimens from normal tissues. We based our initial analysis on gene expression data, taking advantage of bioinformatics techniques that allow direct interrogation of differentially expressed genes for activation of specific signaling pathways. The cohort of 426 samples composing the meta-dataset was analyzed by ANOVA to identify differentially expressed genes between ccRCC and normal renal tissues. This comparison identified 1036 genes modulated at least 2-fold in ccRCC at an FDR below 0.05. The fold change distribution ranged from -210 to 41, although the majority of DEGs showed a 2- to 4-fold modulation. As depicted in the clustering map of Figure 1, the 534 up-regulated and 502 down-regulated genes grouped the meta-dataset samples into two clearly defined patterns of transcriptional activation in tumor samples as compared to normal tissues.
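Negative fold changes such as -210 follow the signed convention in which a down-regulated gene is reported as the negative of the normal-to-tumor ratio. A minimal Python sketch of this conversion, assuming log2-scale group means (as produced by RMA) and a hypothetical function name, is:

```python
def signed_fold_change(log2_tumor_mean, log2_normal_mean):
    """Convert a log2 difference into a signed fold change.

    Up-regulation gives values >= 1, down-regulation gives values <= -1,
    so a 4-fold decrease is reported as -4 rather than 0.25.
    """
    diff = log2_tumor_mean - log2_normal_mean
    ratio = 2.0 ** abs(diff)
    return ratio if diff >= 0 else -ratio


fc_up = signed_fold_change(10.0, 8.0)    # tumor higher by 2 log2 units
fc_down = signed_fold_change(8.0, 10.0)  # tumor lower by 2 log2 units
```

Under this convention no fold change lies strictly between -1 and 1, which is why the reported range spans negative and positive values.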

The functional and biological characterization of the 1036 differentially expressed genes using Gene Ontology (GO) annotation highlighted that the most significant processes and pathways altered in ccRCC are consistent with the important role of aerobic metabolism typically associated with epithelial cancers (Figure 2). In particular, we observed a down-regulation of genes associated with metabolism and transport, counteracted by the up-regulation of genes associated with signal transduction and cell communication. The GO functional characterization indicated that ccRCC reduces the expression of genes related to oxido-reductase activity, amine catabolism, amine and hexose biosynthesis, fatty acid metabolism, excretion and secretion, response to hormone, and ion transport (Figure 2, panel A), while inducing the transcription of genes related to the immune response, response to wounding, defense response, angiogenesis, response to oxygen level, cell proliferation, chemotaxis, cell adhesion and motility, and T-cell activation (Figure 2, panel B).

A further functional characterization of the differentially expressed genes using the knowledge database of Ingenuity Pathway Analysis (IPA) pointed out *cancer* and *genetic disorder* as the most significantly enriched categories (p-value≤0.0001 and more than 200 genes). Specifically, IPA associated the modulated genes with the categories of *renal cancer* (ACAT1, BTG2, C7orf68, CA9, CD70, CDH6, CLCNKB, CP, CSF1R, DEFB1, EDNRA, EGF, EPCAM, FGFR3, GPC3, IGF2BP3, IGFBP2, INHBB, KDR, KNG1, MME, MMP9, MUC1, MYC, NR3C1, PDGFRA, RRM2, SFRP1, SLC6A3, TIMP1, TOP2A, TUBA1A, TUBB2A, VEGFA), *cancer progression* (AHR, BCL6, CCND1, CDKN1B, CXCL12, IFI16, KIF2A, MYC, NR4A1, PLAGL1, TGFB1), *angiogenesis* (ANGPTL3, ANGPTL4, ANXA3, APOH, AQP1, ARHGAP24, BTG1, COL4A2, COL4A3, CXCR4, EGF, ITGA5, KDR, MTDH, SERPINE1, SPARC, VASH1, VEGF), *cell cycle* (AHR, CCND1, DEGS1, NEFL, CDKN1B, MMP9), *cell binding* (ABCA1, CAV1, CD2, COL4A3, CXCL12, CXCR4, FGF1, GPC3, IGFBP3, ITGA5, ITGAM, ITGB2, KNG1, SCARB1, SDC1, SERPINE1, SLC6A3, SPARC, ST6GAL1, TGFB1, TLR2, UMOD, VCAM1, VEGFA, VWF), *cell adhesion* (ADAM10, ADAM9, ANGPT2, C3, CCL5, CD93, CDH13, CR2, CXCL12, CYFIP2, FXYD5, INHBB, ITGA4, ITGA5, ITGAM, ITGB2, KDR, KLK6, MARCKS, PECAM1, PLAU, PLXND1, POSTN, ROCK1, SERPINE1, SLIT2, TGFB1, TIMP1, VCAM1, VEGFA, ZEB2), *chemotaxis* (ADAM10, CCL20, CCL5, CD36, CDH13, CXCL11, CXCL12, CXCR4, EGF, HMGB2, KDR, PDGFRA, PLAU, RARRES2, SERPINE1, SLIT2, TGFB1, TLR2, VEGFA), and *fragmentation of DNA* (ABCB1, AIFM1, BNIP3, CLU, DNASE1L3, EGF, FAS, NOX4, SFRP1, SOD2). Moreover, the IPA network analysis resulted in 20 networks, each including more than 13 focus molecules, and confirmed the previous GO findings of functional activities in mechanisms related to *cell death*, *cell to cell signaling and interaction*, *cellular movement*, and *cancer*. Table 2 lists the top four networks, which are mainly enriched in up-regulated genes.
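The category p-values above come from the right-tailed Fisher's Exact Test described in Section 2.1, which is equivalent to the upper tail of a hypergeometric distribution. The Python sketch below shows that calculation under stated assumptions: the function name, argument names and toy counts are hypothetical, and IPA's actual reference set and any multiple-testing handling are not reproduced.

```python
from math import comb


def right_tailed_fisher(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the reference set
    K: reference genes annotated to the category
    n: focus (differentially expressed) genes
    k: focus genes annotated to the category
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy counts, invented for illustration only
p_value = right_tailed_fisher(k=2, n=4, K=5, N=20)
```

A small p-value indicates that the category contains more focus genes than expected if genes were drawn at random from the reference set.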

Fig. 1. Clustering map of ccRCC and normal samples based on the list of 1036 differentially expressed genes identified by ANOVA in the comparison between cancer and normal specimens. Each row represents a single gene and each column an experimental sample. Samples are separated into two main groups enriched for ccRCC (upper yellow bar) and normal tissues (upper blue bar). The map has been obtained using the hierarchical clustering of dChip (Li & Wong, 2001) with Pearson correlation and centroid as distance metric and linkage, respectively.

Fig. 2. Functional characterization in terms of GO Biological Process of the 502 down-regulated genes (panel A) and of the 534 up-regulated genes (panel B). On the X-axis the log(FDR) of the DAVID enrichment test is reported.

To gain further insight into the biological pathways engaged in the ccRCC phenotype, we used bioinformatics classifiers, or gene signatures, that register a modulated activity (either activation or inactivation) of specific signaling pathways in tumor samples. In particular, Gene Set Enrichment Analysis (GSEA) identified 25 inactivated and 50 activated pathways in cancer samples. The inactivated signaling modules relate to amino acid metabolism, glucose and lipid metabolism, molecule transport, drug metabolism, glycolysis and gluconeogenesis, oxidative metabolism and immune signaling (Table 3).

| Molecules in Network | Score (a) | Focus Molecules (b) | Top Functions |
|---|---|---|---|
| ACTN1, ANGPTL4, ARPC1B, BARD1, BTG1, CASP1, CASP4, CD2, CD70, CLU, CORO1C, CSTA, DNASE1L3, EDN1, GLIPR1, GLUL, GPR65, IFIH1, IL7, IL7R, KDM3A, LGALS1, MAL, NOL3, NR3C1, PLAGL1, PLP2, SCARB1, SERPINB1, SERPINE1, STAT5a/b, TMSB10/TMSB4X, TNFAIP6, TNFAIP8, TNFRSF1B | 34 | 34 | Cell Death, Inflammatory Response, Cellular Growth and Proliferation |
| APOH, BCR, CCL5, CD14, COL4A1, COL4A2, DDX58, Fibrinogen, HLA-F, IFN Beta, IL12 (complex), ISG15, ITGAM, ITGB2, KNG1, LY96, MMP9, NFkB (complex), P38 MAPK, PLAT, PLAU, POSTN, PYCARD, ROCK1, TAP1, TGFB1, TIMP1, TLR1, TLR2, TLR3, TLR7, TNIP1, TRAF3IP2, TRIB3, VCAM1 | 26 | 30 | Cell-To-Cell Signaling and Interaction, Inflammatory Response, Hematological System Development and Function |
| ADAM10, AHR, Akt, ANXA1, BAZ1A, C3, C3AR1, CASR, CDH13, CR2, CXCL12, CXCR4, EGF, EIF4EBP1, ERBB4, ERK1/2, GJB1, IGFBP2, IGFBP3, IL1RL1, ITGA5, KDR, KL, LDL, MYOF, PI3K (complex), PLG, PRKCZ, PTPRC, Ras homolog, RCAN1, SLC6A3, TCF4, VDR, VEGFA | 26 | 30 | Cellular Movement, Inflammatory Disease, Cellular Growth and Proliferation |
| ACTG2, AGTR1, ANK2, AUH, BDKRB2, CCNDBP1, CLMN, COL5A1, COL5A2, COL5A3, CSDA, CTH, FBL, GNL2, ID2, IL7, MYH10, NAP1L1, NCL, NTRK2, PLK2, PMP22, PTPN3, RB1, RRAD, S100A2, SPTBN1, TNFRSF1B, TOP2A, TP53, TP73, TP53I3, TSPAN1, TUBA1A, UBE2D1 | 16 | 23 | Cancer, Neurological Disease, Cellular Development |

Table 2. Top four significant networks identified by the IPA network analysis on the list of differentially expressed genes (red, up-regulated DEG; green, down-regulated DEG; black, not regulated). (a) The score column indicates the -log(p-value); (b) the focus molecules column quantifies the number of modulated genes in the network.

Table 3. List of pathways identified as inactivated in the cancer phenotype by GSEA. All pathways belong to gene sets derived from the KEGG pathway database. The ES and FDR columns indicate the enrichment score (i.e., the degree to which a gene set is overrepresented at the top or bottom of a ranked list of genes) and the statistical significance (i.e., the estimated probability that a gene set with a given ES represents a false positive finding).

Among the most activated pathways (Table 4), we found associations with cancer (renal cell carcinoma and chronic myeloid leukemia) and oncogenic signatures characterized by the presence of several well-known cancer genes (CCND1, MYC, RB1, TP53, RUNX1, AKT2, KRAS, CRKL, CSK, MDM2, NRAS, MET, RAP1A, APC, SHC1, PTEN, ATR, ATM, VAV1, LYN, ROCK1). Some of these signatures are interconnected through key genes, such as the tumor suppressor gene TP53 and the oncogene MYC. As expected, given the fundamental role of hypoxia in renal cell carcinoma, the *HIF and VHL* gene set was activated in ccRCC, as illustrated by the high ES and by the clear-cut pattern of expression of HIF- and VHL-regulated genes in ccRCCs and normal tissues (Table 4 and Figure 3). Among the most active players of this signature are genes associated with *angiogenesis* (EDN1, VEGFA), *cell survival* (ATM, MYC), *glucose influx* (SLC2A1), *pH control* (CA9), *oxidative and iron metabolism* (PGK1, HK2, CP, HMOX1) and *HIF processing* (EGLN3, EGLN1). Additional gene sets were related to *cell fate and survival*, *cell to cell signaling* and *kinase signaling*. Furthermore, several pathways activated in ccRCC are associated with *immune signaling*, including for instance NFKB, TOLL-like receptor, T cell receptor, and NK cell signaling, in which many cytokines (e.g., IL18, CCL5, IL8, CCL4, IL7) and their receptors (e.g., IL7R, IL2RG) are also involved. Finally, the enrichment analysis evidenced a role for genes involved in *DNA repair and replication* (e.g., MSH2, POLD2, RFC2, RFC4, RFC5, PCNA, SSBP1, LIG1).

| GSEA gene set | ES | FDR |
|---|---|---|
| VEGF pathway | 0.478 | 0.222 |
| Chronic myeloid leukemia | 0.410 | 0.216 |
| Renal cell carcinoma | 0.441 | 0.206 |
| Notch signaling pathway | 0.388 | 0.245 |
| Calcineurin pathway | 0.539 | 0.235 |
| Dorso ventral axis formation | 0.586 | 0.232 |
| Apoptosis | 0.372 | 0.232 |
| Raccycd pathway | 0.479 | 0.243 |
| PTEN pathway | 0.550 | 0.220 |
| Chemical pathway | 0.555 | 0.246 |
| PML pathway | 0.601 | 0.192 |
| Systemic lupus erythematosus | 0.510 | 0.233 |
| Viral myocarditis | 0.544 | 0.155 |
| Leishmania infection | 0.621 | 0.196 |
| Graft versus host disease | 0.682 | 0.148 |
| Asthma | 0.687 | 0.166 |
| Allograft rejection | 0.711 | 0.180 |
| Nucleotide excision repair | 0.513 | 0.152 |
| DNA replication | 0.682 | 0.205 |
| Mismatch repair | 0.687 | 0.242 |
| Type I diabetes mellitus | 0.636 | 0.173 |
| Glycosaminoglycan biosynthesis chondroitin sulfate | 0.666 | 0.157 |
| T cell receptor signaling pathway | 0.413 | 0.227 |
| NFKB pathway | 0.449 | 0.232 |
| Natural killer cell mediated cytotoxicity | 0.457 | 0.211 |
| HIVNEF pathway | 0.460 | 0.240 |
| TOLL like receptor signaling pathway | 0.486 | 0.224 |
| HCMV pathway | 0.503 | 0.241 |
| NOD like receptor signaling pathway | 0.510 | 0.235 |
| Cytosolic DNA sensing pathway | 0.540 | 0.230 |
| IL7 pathway | 0.565 | 0.239 |
| CSK pathway | 0.577 | 0.248 |
| Autoimmune Thyroid disease | 0.596 | 0.237 |
| Intestinal immune network for IGA production | 0.609 | 0.207 |
| NKT pathway | 0.618 | 0.227 |
| NKCELLS pathway | 0.645 | 0.219 |
| NO2IL12 pathway | 0.684 | 0.237 |
| TH1TH2 pathway | 0.733 | 0.221 |
| HIF and VHL | 0.518 | 0.197 |

Biological contexts covered by the table: angiogenesis, cancer, cell differentiation, cell to cell signaling, cell fate and survival, DNA repair, glyco-metabolism, hypoxia, and immuno signaling.

Biological context GSEA gene set ES FDR

Differentiation Taste transduction -0.601 0.227

Drug metabolism Drug metabolism cytochrome P450 -0.662 0.137

Glyco-metabolism Pyruvate metabolism -0.624 0.141

Immuno signaling Vibrio cholerae infection -0.465 0.185 Lipid metabolism Glycerolipid metabolism -0.517 0.152

metabolism Citrate cycle TCA cycle -0.718 0.196 Molecule transport Aldosterone regulated sodium reabsorption -0.612 0.213

Table 3. List of pathways identified as inactivated in the cancer phenotype by GSEA. All pathways belong to gene sets derived from the KEGG pathway database. The ES and FDR

overrepresented at the top or bottom of a ranked list of genes) and the statistical significance (i.e., the estimated probability that a gene set with a given ES represents a false positive

columns indicate the enrichment score (i.e., the degree to which a gene set is

Valine leucine and isoleucine degradation -0.807 0.141 Propanoate metabolism -0.804 0.168 Beta alanine metabolism -0.769 0.138 Glycine serine and threonine metabolism -0.723 0.133 Arginine and proline metabolism -0.695 0.153 Tryptophan metabolism -0.655 0.152 Histidine metabolism -0.654 0.157 Alanine aspartate glutamate metabolism -0.604 0.144 Lysine degradation -0.580 0.140 Selenoamino acid metabolism -0.572 0.145 Cysteine and methionine metabolism -0.492 0.174

Cardiac muscle contraction -0.462 0.198

Metabolism of xenobiotics by cytochrome P450 -0.638 0.139

Glycolysis and gluconeogenesis -0.504 0.227

Fatty acid metabolism -0.691 0.150

Peroxisome -0.588 0.143

Butanoate metabolism -0.763 0.171 Retinol metabolism -0.641 0.168

Amino acid metabolism

Mitochondrial

Oxidative metabolism

finding).

including for instance NFKB, TOLL like receptor, T cell receptor, and NK cell, in which also many cytokines (i.e. IL18, CCL5, IL8, CCL4, IL7) and their receptors (i.e. IL7R, IL2RG) are involved. Finally, the enrichment analysis evidenced a role for genes involved in *DNA repair and replication* (e.g. MSH2, POLD2, RFC2, RFC4, RFC5, PCNA, SSBP1, LIG1).


Molecular Portrait of Clear Cell Renal Cell Carcinoma:

Cytoplasm PPP1R1A, COPG,

Extracellular Space SPOCK1, IGFBP1,

Plasma Membrane RARRES, OSMR

PPP1R1A, EMX2, NAT8, RCAN2).

Endoplasmic Reticulum

and low grade samples.

expression (Figure 4).

low grade high grade

linkage, respectively.

An Integrative Analysis of Gene Expression and Genomic Copy Number Profiling 37

BHMT, HAO2, ADH1B, MGAM, FMO2, ALDOB, GBA3, BBOX1, ABP1, DNASE1L3), *Gprotein coupled receptors* (EDNRB, AGTR1, RGS5), *growth factors* (IGFBP1, PDGFD), *transmembrane receptors* (TMEM204, OSMR) and five *transcription regulators* (TFPI2,

Cellular location Up-regulated Down-regulated

Membrane FMO2, SLC27A2, SLC17A3

Nucleus DNASE1L3, EMX2, XIST, AUTS2,

Table 5. Cellular location of the 44 differentially expressed genes identified between high

Fig. 4. Clustering map of high and low grade ccRCC samples based on the list of 44 differentially expressed genes identified by ANOVA in the comparison between high and low grade samples. Each row represents a single gene and each column an experimental sample. Samples are separated into two main groups enriched for low (upper blue bars) and high grade (upper orange bars). The map has been obtained using the hierarchical clustering of dChip (Li and Wong2001) with Pearson correlation and centroid as distance metric and

Despite the intrinsic heterogeneity of the meta-dataset (due to the combination of different experimental sets), when applied to cluster the 320 ccRCC samples, the gradedependent specific transcriptional signature was able to segregate the high-grade phenotypes in an homogenous group characterized by a general down regulation of gene

PCK1, FABP4, BBOX1, GBA3, C13orf15, C5orf23, ALDOB, SCGN, ADH1B, HAO2, BHMT, APOLD1

SLC6A3, AGTR1, RGS5, SLC47A1, SLC17A4, EDNRB, SLCO2A1, TMEM204, MGAM, NAT8

TFPI2, MT1X EMCN, ABP1, PDGFD, UMOD

RCAN2

KRT19, SOD2


Table 4. List of pathways identified as activated in the cancer phenotype by GSEA. All pathways belong to gene sets derived from BioCarta and KEGG pathway databases, with the exception of the HIF and VHL list that has been derived from NCBI Pathway Interaction Database. The ES and FDR columns indicate the enrichment score (i.e., the degree to which a gene set is overrepresented at the top or bottom of a ranked list of genes) and the statistical significance (i.e., the estimated probability that a gene set with a given ES represents a false positive finding).

ccRCC normal

Fig. 3. Standardized gene expression levels of the 145 genes composing the HIF and VHL signaling pathway in ccRCC (upper yellow bar) and normal samples (upper blue bar). Each row represents a single gene and each column an experimental sample. Genes are ordered according to GSEA enrichment score. The map has been obtained using the hierarchical clustering of dChip (Li & Wong, 2001).

We finally investigated whether exists a grade-dependent specific transcriptional signature and compared the two groups of ccRCC cases previously classified as high (G3 and G4) and low grade (G1 and G2) classes. ANOVA differential analysis identified 44 differentially expressed genes (10 up-regulated and 34 down-regulated genes in high grade) that have been grouped according to their cellular localization to highlight putative grade-dependent clinical biomarkers (Table 5). Among the modulated genes, we found *transporters* (COPG, SLC27A2, FABP4, SLCO2A1, SLC17A4, SLC47A1, SLC17A3, SLC6A3), *enzymes* (SOD2,

Kinase signaling PAR1 pathway 0.419 0.241

Molecule transport Snare interactions in vesicular transport 0.455 0.232

Transcription RNA degradation 0.482 0.231

Table 4. List of pathways identified as activated in the cancer phenotype by GSEA. All pathways belong to gene sets derived from BioCarta and KEGG pathway databases, with the exception of the HIF and VHL list that has been derived from NCBI Pathway Interaction Database. The ES and FDR columns indicate the enrichment score (i.e., the degree to which a gene set is overrepresented at the top or bottom of a ranked list of genes) and the statistical significance (i.e., the estimated probability that a gene set with a given ES represents a false

Fig. 3. Standardized gene expression levels of the 145 genes composing the HIF and VHL signaling pathway in ccRCC (upper yellow bar) and normal samples (upper blue bar). Each row represents a single gene and each column an experimental sample. Genes are ordered according to GSEA enrichment score. The map has been obtained using the hierarchical

We finally investigated whether exists a grade-dependent specific transcriptional signature and compared the two groups of ccRCC cases previously classified as high (G3 and G4) and low grade (G1 and G2) classes. ANOVA differential analysis identified 44 differentially expressed genes (10 up-regulated and 34 down-regulated genes in high grade) that have been grouped according to their cellular localization to highlight putative grade-dependent clinical biomarkers (Table 5). Among the modulated genes, we found *transporters* (COPG, SLC27A2, FABP4, SLCO2A1, SLC17A4, SLC47A1, SLC17A3, SLC6A3), *enzymes* (SOD2,

Oncogenic signaling

positive finding).

ccRCC normal

clustering of dChip (Li & Wong, 2001).

P38MAPK pathway 0.459 0.219

MTOR signaling pathway 0.431 0.226 FCER1 pathway 0.443 0.239 WNT pathway 0.458 0.230 GCR pathway 0.471 0.245 P53 signaling pathway 0.559 0.166 GSK3 pathway 0.566 0.236 ARF pathway 0.572 0.231 ATRBRCA pathway 0.627 0.185 BHMT, HAO2, ADH1B, MGAM, FMO2, ALDOB, GBA3, BBOX1, ABP1, DNASE1L3), *Gprotein coupled receptors* (EDNRB, AGTR1, RGS5), *growth factors* (IGFBP1, PDGFD), *transmembrane receptors* (TMEM204, OSMR) and five *transcription regulators* (TFPI2, PPP1R1A, EMX2, NAT8, RCAN2).


Table 5. Cellular location of the 44 differentially expressed genes identified between high and low grade samples.

Despite the intrinsic heterogeneity of the meta-dataset (due to the combination of different experimental sets), when applied to cluster the 320 ccRCC samples, the gradedependent specific transcriptional signature was able to segregate the high-grade phenotypes in an homogenous group characterized by a general down regulation of gene expression (Figure 4).

low grade high grade

Fig. 4. Clustering map of high and low grade ccRCC samples based on the list of 44 differentially expressed genes identified by ANOVA in the comparison between high and low grade samples. Each row represents a single gene and each column an experimental sample. Samples are separated into two main groups enriched for low (upper blue bars) and high grade (upper orange bars). The map has been obtained using the hierarchical clustering of dChip (Li and Wong2001) with Pearson correlation and centroid as distance metric and linkage, respectively.
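The clustering named in the caption of Fig. 4 (Pearson correlation as distance metric, centroid as linkage) can be illustrated with a small sketch. This is not dChip's implementation: the function names, the use of 1 minus Pearson r as the distance, the size-weighted centroid update and the toy expression values are all assumptions made for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return cov / (sd_x * sd_y)


def centroid_clustering(profiles, n_clusters=2):
    """Agglomerative clustering of samples.

    profiles: dict mapping sample name -> list of expression values.
    Distance between two clusters is 1 - Pearson r of their centroids;
    the centroid of a merged cluster is the size-weighted mean profile.
    Returns a list of sets of sample names.
    """
    clusters = [({name}, list(values)) for name, values in profiles.items()]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters whose centroids are most correlated.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = 1.0 - pearson(clusters[i][1], clusters[j][1])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        (members_i, cen_i), (members_j, cen_j) = clusters[i], clusters[j]
        n_i, n_j = len(members_i), len(members_j)
        merged = [(n_i * a + n_j * b) / (n_i + n_j) for a, b in zip(cen_i, cen_j)]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((members_i | members_j, merged))
    return [members for members, _ in clusters]


# Hypothetical expression values for four samples over four genes.
toy_profiles = {
    "low_1": [5.0, 4.0, 3.0, 1.0],
    "low_2": [6.0, 5.0, 3.5, 1.5],
    "high_1": [1.0, 2.0, 4.0, 6.0],
    "high_2": [0.5, 2.5, 4.5, 5.0],
}
groups = centroid_clustering(toy_profiles, n_clusters=2)
```

In this toy example the two samples with decreasing profiles and the two with increasing profiles end up in separate groups, which mirrors the kind of separation between low and high grade samples described for Fig. 4.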

#### **3.2 Copy number profiling of ccRCC**

Genetic studies on ccRCC clinical samples have characterized recurrent alterations in specific chromosomal regions (i.e. deletions of chromosomes 3p, 6q, 8p, 9p and 14q, and amplifications of chromosomes 5q and 7). To confirm the copy number signature of ccRCC, we analyzed the CN profiles of two independent datasets generated with SNP arrays of different resolution. As shown in Figure 5, the genome-wide assessment of copy number alterations in 27 and 26 sporadic ccRCC samples, profiled by Affymetrix Human Mapping 100K and 250K Sty arrays, respectively, revealed that all autosomes were affected by CN gain, CN loss or both. In the Cifola dataset (panel A), the most frequently amplified regions were on chromosomes 4q, 5 (p and q arms), 7 (p and q arms), 11p and 12q, whereas the most recurrent deleted region was on chromosome 3p. The longest recurrent amplifications were on chromosomes 1 (p and q arms), 2 (p and q arms), 3q, 11q, 16q, 18q and 19p, often spanning two or more consecutive megabases. These DNA alterations occurred in 6 to 12 samples. Similarly, the CN profile of the Beroukhim dataset (panel B), obtained with a denser SNP array, showed that the most frequently amplified regions were on chromosomes 5 (p and q arms), 7 (p and q arms), 11p, 12q, 19 and 20, whereas the most recurrent deleted regions were on chromosomes 3p, 6, 8q, 9 and 14. Overall, the CNA profiles obtained from the two datasets largely overlapped, confirming the typical ccRCC genomic signature. Owing to the higher density of the SNP array used in their study, Beroukhim et al. were able to better discriminate some CNAs as compared with the Cifola dataset (i.e. the loss on chromosomes 8p, 11q, 14q and 15, and the gain on chromosomes 11p, 12, 19 and 20).

Fig. 5. Visualization of the CNA frequencies occurring in the Cifola (panel A) and Beroukhim (panel B) datasets. Regions of DNA copy number gain (red bars) and copy number loss (blue bars) are represented along each chromosome (from 1 to 22, ordered horizontally). The X chromosome was omitted from this analysis.
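The CNA frequencies summarized in Fig. 5 are, in essence, counts of how many samples carry a gain or a loss in a given region. A minimal sketch of such a tally follows; the call encoding (+1 gain, -1 loss, 0 no change), the function name and the three toy samples are assumptions for illustration and do not reproduce the pipeline used for the Cifola or Beroukhim data.

```python
def cna_frequencies(calls):
    """Count, for each region, how many samples show a gain and how many a loss.

    calls: dict mapping sample id -> dict mapping region -> call,
           where a call is +1 (gain), -1 (loss) or 0 (no change).
    Returns a dict mapping region -> (n_gain, n_loss).
    """
    freq = {}
    for sample_calls in calls.values():
        for region, call in sample_calls.items():
            n_gain, n_loss = freq.get(region, (0, 0))
            if call > 0:
                n_gain += 1
            elif call < 0:
                n_loss += 1
            freq[region] = (n_gain, n_loss)
    return freq


# Hypothetical copy number calls for three samples and three regions.
toy_calls = {
    "S1": {"3p": -1, "5q": 1, "7": 0},
    "S2": {"3p": -1, "5q": 1, "7": 1},
    "S3": {"3p": -1, "5q": 0, "7": 0},
}
freq = cna_frequencies(toy_calls)
```

Here `freq["3p"]` records a loss in all three toy samples, which is the kind of per-region count that a frequency plot along each chromosome displays.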

#### **3.3 Integrative analysis of gene expression and copy number data**

To identify chromosomal regions with coordinated copy number and transcriptional imbalances (SODEGIRs), we performed the integrative analysis on the two independent datasets with paired gene expression and copy number data (namely, Cifola and Beroukhim). In the Cifola dataset, *preda* analysis revealed segments of amplified SODEGIR located at 5q21.3-q35.3 (from 130 to 180 Mb) and a single deleted SODEGIR at 3p14.1-p22.3 (from 35 to 60 Mb) (Figure 6, panel A). Similar imbalanced regions were found for chromosomes 3 and 5 in the Beroukhim dataset (Figure 6, panel B), although the lower probe density of the gene expression platform used in that study (i.e., the HG-U133A arrays) did not allow as fine a resolution of the chromosomal segments as in the Cifola dataset.

Fig. 6. SODEGIR amplified (red) and deleted (green) chromosomal regions identified by *preda* in the integrative analysis of gene expression and copy number data for the Cifola (panel A) and Beroukhim (panel B) datasets.

To further study the influence of gene dosage associated with structural position as one of the mechanisms of transcriptional regulation, the genes located in the SODEGIRs (199 and 147 genes in the deleted and amplified SODEGIRs, respectively) were intersected with the list of differentially expressed genes identified by ANOVA in the comparison between ccRCC and normal tissues of the meta-dataset. Overall, 68% of the genes associated with the deleted signature (136 out of 199 genes) were down-regulated in the meta-dataset, while 61% of the genes associated with the amplified signature (90 out of 147 genes) were up-regulated at a statistically significant level. The most strongly down-regulated genes showed fold changes ranging from -2 to -10 (PTH1R, ACY1, ACOX2, IL17RB, HYAL1, UQCRC1, ACAA1, DNASE1L3, SEMA3G, ABHD14A, AMT, APEH, ALS2CL, CISH, MYL3, SEMA3B, HIGD1A, PLXNB1, PDHB), while the most strongly up-regulated genes showed fold changes ranging from 2 to 3.5 (TNFAIP8, LOX, SPARC, CSF1R, TCERG1, LOXL2, SPARCL1, YIPF5, RPS14, ABLIM3, TNIP1, STK10, CLK4). IPA annotation grouped these genes into the biological categories of *transcription and translation regulator, transmembrane receptor, enzyme* and *kinase* (Table 6). Gene Distiller and PubMatrix highlighted that genes of the deleted SODEGIR are associated with *tumor suppressor function* (DLEC1, TMEM158, PTH1R, SETD2, LIMD1, FAM107A, BAP1), *epigenetic modification* (STAC, CTDSPL, DLEC1, PRSS50, SETD2, IP6K1, SEMA3B, TUSC2, PARP3, PRKCD) and *chromosomal deletion* (DLEC1, LIMD1, LTF, RBM6, IRFd2, TUSC2, COL7A1), whereas genes of the amplified SODEGIR are enriched in *oncogenes* (CSF1R, PDGFRB, LOX, DUSP1, SPARC, ITK, FLT4, GNB2L1, LARS, CD74, F12, MAML1, SQSTM1) and associated with *gene amplification* (CSF1R, PDGFRB, LOX, NSD1).
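The intersection described above amounts to a set overlap followed by a proportion. The sketch below shows that step and re-derives the two percentages from the counts quoted in the text (136 of 199 and 90 of 147); the function name and the toy gene symbols are hypothetical, and the real gene lists are not reproduced here.

```python
def concordant_fraction(region_genes, regulated_genes):
    """Return (n_hits, n_region, fraction) for the genes of a region that
    also appear in a list of concordantly regulated genes."""
    region = set(region_genes)
    if not region:
        raise ValueError("region_genes must not be empty")
    hits = region & set(regulated_genes)
    return len(hits), len(region), len(hits) / len(region)


# Toy example with hypothetical gene symbols.
toy_result = concordant_fraction(["A", "B", "C", "D"], ["B", "C", "X"])

# Percentages recomputed from the counts reported in the text.
deleted_pct = round(100 * 136 / 199)    # deleted SODEGIR, down-regulated genes
amplified_pct = round(100 * 90 / 147)   # amplified SODEGIR, up-regulated genes
```

With the reported counts, `deleted_pct` and `amplified_pct` evaluate to 68 and 61, in agreement with the percentages given in the paragraph.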

Molecular Portrait of Clear Cell Renal Cell Carcinoma:

al., 2009).

An Integrative Analysis of Gene Expression and Genomic Copy Number Profiling 41

the adaptation of cancer cells to low oxygen levels (Baldewijns, 2010; Bristow & Hill, 2008) and to their continuous proliferation even in the presence of compromised DNA repair mechanisms (Semenza, 2008). These results find an additional confirmation from the analysis of genomic data presented in this chapter. Indeed, the application of different bioinformatics tools resulted in a list of genes (e.g., VEGFA, MYC, CA9, SLC2A1, BNIP3, CXCR4, EGLN3 alias PDH3, SERPINA1, KDR, ATM, CP) highly activated in ccRCC and related to hypoxia signaling, known to be targets of the transcription factor HIF-1 or involved in cancer and pathways (as apoptosis and angiogenesis) which have been already targeted for therapeutic intervention in RCC (Pantuck et al., 2003). As expected, among the up-regulated genes, there is the well-known cancer gene MYC (Gordan, 2007, 2008) that several studies indicated as modulated by HIF-1 (Dang, 2008; Gordan, 2007; Podar & Anderson, 2010) and playing a fundamental role in ccRCC proliferation (Tang et

Focusing the investigation to genes and pathways more specifically associated to ccRCC, the analysis of molecular profiles confirmed the presence of the *adipogenic signature* characterized by the up-regulation of genes such as FABP7, NR3C1, ANGPTL4, CAV2, CAV1, and the down-regulation of FABP1 and of the transcription factors TFCP2L1 and GATA3, as previously reported by Tun et al. (Tun et al., 2010). Loss of cell-cell adhesion and cell polarity is commonly observed in epithelial tumors and correlates with their invasion into adjacent tissues and generation of metastases. Many evidences indicate that loss of cell polarity and cell-cell adhesion may also be important in early stages of neoplastic transformation (Coradini et al., 2011). Disruption of intercellular junctions and alterations in cell polarity are specific hallmarks of epithelial cancer cells. In fact, most human tumors arising in epithelial tissues gradually lose their polarized morphology and acquire a mesenchymal phenotype (*epithelial-mesenchymal transition*, EMT) (Thiery, 2003, 2009). Accordingly, and in concordance with Tun et al. (Tun et al., 2010), we observed the upregulation of several EMT-associated genes (TGFB1, SPARC, VIM, MTHFD2, HSPG2, PROCR, COL3A1, ZEB2), indicating the involvement of this biological process in cancer cell progression and spreading in host tissues, as confirmed very recently by a study on the protein expression of important EMT mediators in ccRCC (Mikami et al., 2011). Among the other up-regulated genes and pathways (Table 4), the up regulation of gene transcription factor 4 (TCF4) confirmed previous evidences of the interplay between Wnt/-catenin and PI3K/Akt signaling cascades and its involvement in tumor development and progression (Chen et al., 2011). 
Furthermore, the activation of a series of *immuno pathways*, especially antigen presenting and processing pathways, is quite striking in ccRCC and has been recently demonstrated by the proteomic identification of tumor antigen-derived peptides in RCC (Seliger et al., 2011). In particular, the CD74 up-regulation is suggested to be linked to the PI3K/Akt- and MEK/ERK-dependent intracellular signaling cascades, both associated

with NF-kB nuclear translocation and DNA-binding activity (Liu et al., 2008).

represent candidate biomarkers for further investigations (Table 7).

Overall, the elucidation of the functional role of the ccRCC activated signaling pathways could be useful for the identification of novel cancer markers or for the development of molecular–targeted therapeutic agents. Taking into account the biological localization and functional roles of genes up regulated in ccRCC, we propose a series of genes that could


Table 6. Biological function of the subset of differentially expressed genes located into SODEGIRs.

#### **4. Discussion**

In this chapter we illustrated the identification of distinct molecular profiles in ccRCC samples using experimental data available in public repositories and published in peerreviewed articles (Brannon & Rathmell, 2010). To exemplify how genomic data can be exploited to functionally characterize the molecular characteristics of renal carcinoma, we downloaded more than 500 ccRCC samples from public repositories of genomic data and, after manual selection, we created a compendium (meta-dataset) of gene expression and copy number profiles in 320 ccRCCs, annotated with the nuclear grade information, and 106 normal samples mainly representing adjacent renal tissues from the same surgical specimen. The bioinformatics analysis of gene expression profiles allowed the identification of lists of differentially expressed genes and of gene signatures activated in the cancer phenotype. Additionally, the comprehensive analysis of copy number profiles highlighted characteristic chromosomal aberrations affecting ccRCC cases and the integration of gene expression and copy number data revealed the presence of chromosomal regions with concomitant transcriptional and gene dosage imbalances.

As recently reviewed by Pal et al. (Pal et al., 2010), several gene expression and proteomic studies carried out on fresh and archival ccRCC tissues (Perroud, 2009; Seliger, 2009) evidenced a series of molecular processes and pathways involved in ccRCC tumorigenesis (Banumathy & Cairns, 2010) and indicated that ccRCC progression is strictly associated to

IL17RB CD74, FLT4

Amplified SODEGIR 5q21.3 q35.3

TCERG1, FEM1C, CNOT8, ZNF354A, NSD1, SQSTM1, MED7, MAML1, SOX30,

LOX, LOXL2, DDX41, LTC4S, GM2A, THG1L, GNB2L1, DPYSL3, MGAT1, LARS, MGAT4B, HINT1,

HNRNPAB, PGGT1B, G3BP1, GFPT2, PPIC, B4GALT7

CSF1R, STK10, CLK4, ITK, PDGFRB, CSNK1A1, HK3, CSNK1G3, MAPK9

MXD3, RPS14

3p14.1-p22

RAD54L2, LIMD1, ZNF197, ZNF35, SMARCC1, EIF1B

DAG1, NISCH, PLXNB1,

HEMK1, ARIH2, TKTL1, GMPPB, PARP3, MLH1, DHX30, SETD2, LARS2, ABHD5, P4HTM, ABHD6, CYB561D2, RPP14, ENTPD3, PLCD1, EXOSC7, ALAS1, PDHB, AMT, ABHD14A, DNASE1L3, ACAA1, UQCRC1, HYAL1, ACOX2

MAP4K2, PRKAR2A, MST1R, OXSR1, ULK4, PRKCD, CAMKV, ACVR2B, NPRL2, MAPKAPK3, NME6, IP6K1

Table 6. Biological function of the subset of differentially expressed genes located into

In this chapter we illustrated the identification of distinct molecular profiles in ccRCC samples using experimental data available in public repositories and published in peerreviewed articles (Brannon & Rathmell, 2010). To exemplify how genomic data can be exploited to functionally characterize the molecular characteristics of renal carcinoma, we downloaded more than 500 ccRCC samples from public repositories of genomic data and, after manual selection, we created a compendium (meta-dataset) of gene expression and copy number profiles in 320 ccRCCs, annotated with the nuclear grade information, and 106 normal samples mainly representing adjacent renal tissues from the same surgical specimen. The bioinformatics analysis of gene expression profiles allowed the identification of lists of differentially expressed genes and of gene signatures activated in the cancer phenotype. Additionally, the comprehensive analysis of copy number profiles highlighted characteristic chromosomal aberrations affecting ccRCC cases and the integration of gene expression and copy number data revealed the presence of chromosomal regions with concomitant

As recently reviewed by Pal et al. (Pal et al., 2010), several gene expression and proteomic studies carried out on fresh and archival ccRCC tissues (Perroud, 2009; Seliger, 2009) evidenced a series of molecular processes and pathways involved in ccRCC tumorigenesis (Banumathy & Cairns, 2010) and indicated that ccRCC progression is strictly associated to

Biological category Deleted SODEGIR

Transcription and translation regulator

Transmembrane

receptor

Enzyme

Kinase

SODEGIRs.

**4. Discussion** 

transcriptional and gene dosage imbalances.

the adaptation of cancer cells to low oxygen levels (Baldewijns, 2010; Bristow & Hill, 2008) and to their continuous proliferation even in the presence of compromised DNA repair mechanisms (Semenza, 2008). These results find an additional confirmation from the analysis of genomic data presented in this chapter. Indeed, the application of different bioinformatics tools resulted in a list of genes (e.g., VEGFA, MYC, CA9, SLC2A1, BNIP3, CXCR4, EGLN3 alias PDH3, SERPINA1, KDR, ATM, CP) highly activated in ccRCC and related to hypoxia signaling, known to be targets of the transcription factor HIF-1 or involved in cancer and pathways (as apoptosis and angiogenesis) which have been already targeted for therapeutic intervention in RCC (Pantuck et al., 2003). As expected, among the up-regulated genes, there is the well-known cancer gene MYC (Gordan, 2007, 2008) that several studies indicated as modulated by HIF-1 (Dang, 2008; Gordan, 2007; Podar & Anderson, 2010) and playing a fundamental role in ccRCC proliferation (Tang et al., 2009).

Focusing on the genes and pathways more specifically associated with ccRCC, the analysis of molecular profiles confirmed the presence of the *adipogenic signature*, characterized by the up-regulation of genes such as FABP7, NR3C1, ANGPTL4, CAV2 and CAV1, and by the down-regulation of FABP1 and of the transcription factors TFCP2L1 and GATA3, as previously reported (Tun et al., 2010). Loss of cell-cell adhesion and cell polarity is commonly observed in epithelial tumors and correlates with their invasion into adjacent tissues and with the generation of metastases. Considerable evidence indicates that loss of cell polarity and cell-cell adhesion may also be important in the early stages of neoplastic transformation (Coradini et al., 2011). Disruption of intercellular junctions and alterations in cell polarity are specific hallmarks of epithelial cancer cells. In fact, most human tumors arising in epithelial tissues gradually lose their polarized morphology and acquire a mesenchymal phenotype (*epithelial-mesenchymal transition*, EMT) (Thiery, 2003, 2009). Accordingly, and in agreement with Tun et al. (2010), we observed the up-regulation of several EMT-associated genes (TGFB1, SPARC, VIM, MTHFD2, HSPG2, PROCR, COL3A1, ZEB2), indicating the involvement of this biological process in cancer cell progression and spreading in host tissues, as confirmed very recently by a study on the protein expression of important EMT mediators in ccRCC (Mikami et al., 2011). Among the other up-regulated genes and pathways (Table 4), the up-regulation of transcription factor 4 (TCF4) is consistent with previous evidence of the interplay between the Wnt/β-catenin and PI3K/Akt signaling cascades and of its involvement in tumor development and progression (Chen et al., 2011).
Furthermore, the activation of a series of *immune pathways*, especially the antigen processing and presentation pathways, is quite striking in ccRCC and has been recently demonstrated by the proteomic identification of tumor antigen-derived peptides in RCC (Seliger et al., 2011). In particular, CD74 up-regulation is suggested to be linked to the PI3K/Akt- and MEK/ERK-dependent intracellular signaling cascades, both associated with NF-κB nuclear translocation and DNA-binding activity (Liu et al., 2008).

Overall, elucidating the functional role of the signaling pathways activated in ccRCC could be useful for the identification of novel cancer markers or for the development of molecularly targeted therapeutic agents. Taking into account the biological localization and functional roles of the genes up-regulated in ccRCC, we propose a series of genes that could represent candidate biomarkers for further investigation (Table 7).



| Symbol | Description | References |
|---|---|---|
| ANXA4 | annexin A4 | Shi et al., 2004; Jones et al., 2005; Seliger et al., 2009 |
| CA9 | carbonic anhydrase IX | Atkins et al., 2004; Pantuck et al., 2005; Zhao et al., 2006; Osunkoya et al., 2009; Zhou et al., 2010 |
| CAV1 | caveolin-1 | Campbell et al., 2003; Waalkes et al., 2011 |
| CD70 | CD70 molecule | Junker et al., 2005; Law et al., 2009 |
| CD74 | class II major histocompatibility complex-associated invariant chain | Young et al., 2001; Liu et al., 2008 |
| CDH6 | cadherin 6, type 2, K-cadherin (fetal kidney) | Shimazui et al., 2004; Paul et al., 2004 |
| CP | ceruloplasmin (ferroxidase) | Osunkoya et al., 2009; Zhou et al., 2010 |
| CXCR4 | CXC chemokine receptor-4 | Staller et al., 2003; Struckmann et al., 2008 |
| EGLN3 | prolyl hydroxylase-3 (PHD3) | Zhao et al., 2006; Sato et al., 2008; Tanaka et al., 2011; Dalgliesh et al., 2010 |
| IGFBP3 | insulin-like growth factor binding protein 3 | Yao et al., 2005; Takahashi et al., 2005; Chuang et al., 2008 |
| MMP9 | matrix metallopeptidase 9 (gelatinase B, 92kDa gelatinase, 92kDa type IV collagenase) | Struckmann et al., 2008; Mikami et al., 2011 |
| NNMT | nicotinamide N-methyltransferase | Yao et al., 2005; Seliger et al., 2009; Kim et al., 2010; Teng et al., 2011 |
| STC2 | stanniocalcin 2 | Meyer et al., 2009 |
| VEGFA | vascular endothelial growth factor A | Skubitz & Skubitz, 2002; Lam et al., 2005; Liu et al., 2010 |

Table 7. List of candidate biomarker genes up-regulated in ccRCC.

In particular, *Annexin A4* (ANXA4) is a member of the annexin family of calcium-dependent phospholipid-binding proteins and can exist as a soluble protein as well as a membrane-associated protein. ANXA4 could play an important role in regulating cellular functions at the level of cell-cell interaction, cell adhesion and motility. Although an increased protein expression level of ANXA4 has been confirmed in ccRCC by global proteomic analysis (Seliger et al., 2009), its possible implication in the carcinogenesis of RCC deserves further study. *Carbonic anhydrase 9* (CA9) is a transmembrane member of the carbonic anhydrase family that catalyzes the reversible hydration of carbon dioxide into bicarbonate and a proton, thus enabling tumor cells to maintain a neutral pH despite an acidic microenvironment. CA9 is not expressed in healthy renal tissue but is expressed in most ccRCCs through HIF-1 accumulation driven by hypoxia and inactivation of the VHL gene.

CA9 expression can be detected in the tumor by immunohistochemistry (IHC) and in blood and tissue by ELISA and RT-PCR (Truong & Shen, 2011). In metastatic disease, high CA9 expression detected by IHC was reported as a powerful prognostic marker for better survival and for sensitivity to IL-2 treatment, although the robustness of this association is still debated (Atkins, 2004; Pantuck, 2005). Almost no data are currently available on the association between CA9 expression and response to targeted drugs. The prognostic value of CA9 in ccRCC could be explained by the frequent VHL gene inactivation driving an early activation of the HIF pathway. The poorer prognosis associated with tumors expressing low levels of CA9 could be attributed to the simultaneous over-expression of EGFR, which contributes to the activation of the Akt-mTOR pathways. Targeting CA9 by inhibitors, radioimmunotherapy, monoclonal antibodies or vaccination is promising and offers new avenues for clinical research (Tostain et al., 2010). Recently, it was reported that serum CA9 levels are significantly higher in ccRCC than in non-ccRCC samples and may help in the differential diagnosis of RCC. Serum CA9 levels also correlate with tumor size in ccRCC patients (Zhou et al., 2010). The role of *caveolin-1* (CAV1) in RCC pathogenesis is still controversial, as it is considered to be involved in both suppression and promotion of tumor growth and development. However, its increased expression has been used as a marker of less favorable outcome in patients with clinically confined ccRCC (Campbell et al., 2003) and in patients with distant metastasis (Waalkes et al., 2011), suggesting that it is a candidate prognostic marker for RCC aggressiveness. *CD70 protein* (CD70) is a type II transmembrane protein belonging to the tumor necrosis factor family. It is the ligand for CD27, a glycosylated transmembrane protein of the tumor necrosis factor receptor family.
CD70 protein has been found to be expressed at a high level in ccRCCs by IHC (Junker et al., 2005). The role of this protein in tumorigenesis and its utility as a diagnostic marker in serum and urine or as a therapeutic tool deserve further study. *Cadherin-6* (CDH6) is an adhesion molecule that has been shown to be a marker of poor prognosis and of metastasis development in ccRCC (Paul, 2004; Shimazui, 2004). *Ceruloplasmin* (CP) is a protein involved in iron metabolism; it is regulated by HIF-1 (Martin et al., 2005) and has been associated with metastatic potential and tumor progression. The serum CP protein level has been found to be elevated in RCC and other malignancies as compared to healthy controls, indicating its potential as a cancer biomarker (Osunkoya et al., 2009). *CXC chemokine receptor-4* (CXCR4) is a target of the VHL-HIF pathway, and its high expression has been associated with poor survival (Staller, 2003; Struckmann, 2008). *Prolyl hydroxylase-3* (PHD3/EGLN3) is a member of the PHD family, which is involved in the degradation of HIF proteins in cooperation with the VHL protein under normoxic conditions. PHD3 was found to be frequently over-expressed in RCC tissues, with high specificity for cancer samples (Zhao et al., 2006), and its usefulness as a novel tumor antigen for RCC immunotherapy has been recently demonstrated in clinical serum samples from RCC patients (Sato, 2008; Tanaka, 2011). *Insulin-like growth factor binding protein 3* (IGFBP3) is one of the most over-expressed genes in ccRCC (Takahashi, 2005; Yao, 2005); its increased protein expression has been demonstrated in 74% of ccRCCs by IHC and is associated with higher Fuhrman nuclear grade (Chuang et al., 2008). *Matrix metallopeptidase 9* (MMP9) has been reported to be increased in ccRCC and associated with survival.
Statistical analysis indicated that elevated Snail, MMP2 and MMP9 protein expression is significantly correlated with worse disease-free and disease-specific survival of RCC patients (Mikami et al., 2011). MMP9, TIMP1 and CXCR4 have been studied both in vitro and in vivo, and the data strongly indicated that VHL coordinately regulates the expression of the metastasis-associated genes CXCR4/CXCL12 and MMP2/MMP9, but the exact regulatory molecular mechanism remains to be determined (Struckmann et al., 2008). Some of the genes mentioned here have been validated at the protein level, such as *nicotinamide N-methyltransferase* (NNMT) and *enolase 2* (ENO2), whose protein expression was found to be increased in RCC by Western blot (Teng et al., 2011). Increased cytoplasmic expression of *stanniocalcin 2* (STC2) was found to correlate with other conventional indicators of RCC aggressiveness and with shorter overall survival. STC2 could become an additional tissue biomarker that may be useful in the post-operative risk stratification of RCC patients (Meyer et al., 2009). The increased expression of *vascular endothelial growth factor A* (VEGFA) was predictive of distant metastasis development and lymph node involvement and was significantly associated with poor survival (Lam et al., 2005). These studies have paved the way for the development of new therapeutic agents to block VEGF signaling and the cascade of events leading to tumor formation. In a randomized phase II clinical trial on 116 metastatic ccRCC patients, the use at high doses of a neutralizing antibody against VEGFA (bevacizumab) resulted in a significant prolongation of the time to progression of disease (Yang et al., 2003).

According to the canonical classification of ccRCC (Flanigan et al., 2011), the Fuhrman nuclear grade is one of the most important parameters for RCC prognosis prediction (Nese et al., 2009), together with stage, age, tumor position and size, necrosis and a few molecular biomarkers (e.g., CA9). Notably, a recent grade-dependent proteomic characterization reported that MYC, HIF-1 and p53 are the major hubs of the network obtained by analyzing formalin-fixed paraffin-embedded ccRCC tissues (Perroud et al., 2009). Chen et al. (2009) analyzed the correlation between chromosome aberrations and clinicopathological variables, including tumor stage and nuclear grade, and observed a significant association between LOH at chromosomes 9, 14q and 18q and higher nuclear grade. In the present study, we identified SOD2, KRT19 and OSM as potential grade-dependent ccRCC biomarkers. Briefly, *manganese superoxide dismutase* (SOD2) belongs to the antioxidant gene family and has emerged as a key enzyme with a dual role in tumorigenic progression (Hempel et al., 2011). Recently, SOD2 has been indicated as a marker for circulating tumor cells in prostate cancer (Giesing et al., 2010) and as potentially predictive of lymph node metastasis in tongue squamous cell carcinoma (Liu et al., 2010). *Keratin 19* (KRT19) encodes one of the cytoskeletal cytokeratins and has been identified as a novel candidate tumor suppressor gene epigenetically inactivated in RCC cell lines and primary tumors (Morris et al., 2008). This gene was found to be functionally related to miR-492 and crucially involved in the neoplastic progression of malignant embryonic liver tumors (von Frowein et al., 2011).
*Oncostatin M* (OSM) is a member of the IL-6 cytokine family implicated in signal transduction; its receptor (OSMR) was found to be increased at both gene copy number and expression levels in gastric cancer (Junnila et al., 2010) and in cervical squamous cell carcinomas, in association with poor survival (Scotto et al., 2008). However, to our knowledge, no previous studies link OSMR to renal carcinogenesis. The clinical application of these genes as potential grade-dependent ccRCC biomarkers deserves further investigation in well-curated and extensive collections of ccRCC cases.
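As an illustration of how the grade dependence of a candidate gene might be checked in a larger case collection, the following minimal sketch computes a Spearman rank correlation between Fuhrman grade and an expression value. It is not the statistical procedure used in this study; the function names and the toy values in the usage note are hypothetical, and a real analysis would also need a significance test and multiple-testing correction.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(grades: Sequence[float], expression: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(grades) != len(expression):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(grades), average_ranks(expression))
```

For example, `spearman([1, 1, 2, 2, 3, 3], [0.1, 0.2, 0.5, 0.6, 0.9, 1.0])` returns a value close to 1, reflecting expression that rises with grade; a value close to -1 would indicate expression that falls with grade.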

The analysis of copy number levels in a total of 53 ccRCC samples profiled with SNP arrays (Beroukhim, 2009; Cifola, 2008) identified and confirmed the typical genomic signature of ccRCC, as recently shown with higher-density SNP arrays (Dalgliesh et al., 2010). The most frequent CN alterations in ccRCC samples are the deletion of 3p and the amplification of 5q.


Similarly, Chen et al. detected gains of chromosome 5q33.1-qter and losses of chromosome 3p21.31-p22.3 in 58% and 80%, respectively, of the 80 RCC samples analyzed using Illumina 317K SNP arrays (Chen et al., 2009). Notably, these regions have a great influence on the expression levels of the resident genes, as previously demonstrated by integrative genomic studies (Beroukhim, 2009; Bicciato, 2009; Cifola, 2008; Furge, 2004). Accordingly, the comprehensive integrative analysis showed that the two most significant chromosomal regions with coordinated copy number and transcriptional imbalances (SODEGIRs) are localized on the same chromosomal arms (Figure 6). Although the integrative analysis presented here was conducted using a completely different approach from that applied by Beroukhim et al. (2009), both studies identified 12 over-expressed genes located at the 5q peak region (GNB2L1, MGAT1, RUFY, RNF130, MAPK9, CANX, SQSTM1, LTC4S, TBC1D9B, HNRPH1, FLT4). Among them, the ubiquitin-binding protein *sequestosome 1* (SQSTM1) was also found in the focal amplification region at 5q35.3 by Chen et al. (2009) and was reported to be over-expressed in breast and prostate tumors (Kitamura, 2006; Thompson, 2003). Moreover, we confirmed that *lysyl oxidase* (LOX) is over-expressed in ccRCC, as previously shown by Cifola and co-workers (Cifola et al., 2008) and recently confirmed at the proteomic level (Liu et al., 2010). LOX is one of the critical HIF-1 targets mediating tumor progression and catalyzes the cross-linking of collagens and elastin in the extracellular matrix, thereby regulating tissue tensile strength (Erler & Giaccia, 2006). Paradoxically, LOX has been reported to be both up-regulated and down-regulated in cancer cells, especially in colorectal cancer (Baker, 2011; Pez, 2011).
Mechanistic investigations revealed that LOX activates the PI3K-Akt signaling pathway, thereby up-regulating HIF-1 protein synthesis in a manner requiring LOX-mediated hydrogen peroxide production. In agreement with these results, cancer cell proliferation was stimulated by secreted and active LOX in a HIF-1-dependent fashion (Pez et al., 2011). Our data suggest that the transcriptional modulation of LOX might also be driven by genomic imbalance. Among the significantly down-modulated genes located in the deleted SODEGIR on chromosome 3p14.1-p22, two potential tumor suppressor genes are worth mentioning: *deleted in lung cancer* (DLEC1), previously reported as a candidate tumor suppressor silenced by methylation in RCC cell lines and primary tumors and shown to have growth inhibitory function in vitro (Zhang et al., 2010a), and *SET domain containing 2* (SETD2), which encodes a histone H3 methyltransferase and was found to be affected by inactivating mutations in 12-17% of ccRCCs, together with other components of the chromatin modification machinery (Dalgliesh et al., 2010).
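The idea of a region with coordinated copy number and transcriptional imbalance can be made concrete with a deliberately simplified sketch. It is not the SODEGIR procedure applied in this chapter (Bicciato et al., 2009); it only flags runs of adjacent genes whose copy number state and expression change point in the same direction. The `Gene` record, the fold-change cutoff and the minimum run length are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    symbol: str      # gene label (hypothetical in the toy data)
    position: int    # order of the gene along the chromosome arm
    cn_state: int    # -1 = loss, 0 = neutral, +1 = gain
    log2_fc: float   # tumor vs. normal expression, log2 fold change


def concordance(gene: Gene, fc_cutoff: float = 1.0) -> int:
    """Return +1 for gain with up-regulation, -1 for loss with down-regulation, else 0."""
    if gene.cn_state > 0 and gene.log2_fc >= fc_cutoff:
        return 1
    if gene.cn_state < 0 and gene.log2_fc <= -fc_cutoff:
        return -1
    return 0


def concordant_runs(
    genes: List[Gene], min_len: int = 3, fc_cutoff: float = 1.0
) -> List[Tuple[int, List[str]]]:
    """Collect runs of at least min_len adjacent genes sharing the same non-zero concordance."""
    runs: List[Tuple[int, List[str]]] = []
    current_sign = 0
    current: List[str] = []
    for gene in sorted(genes, key=lambda g: g.position):
        sign = concordance(gene, fc_cutoff)
        if sign != 0 and sign == current_sign:
            current.append(gene.symbol)
            continue
        # The run ends here: keep it only if it is long enough.
        if current_sign != 0 and len(current) >= min_len:
            runs.append((current_sign, current))
        current_sign = sign
        current = [gene.symbol] if sign != 0 else []
    if current_sign != 0 and len(current) >= min_len:
        runs.append((current_sign, current))
    return runs
```

Under these assumptions, a stretch of gained and over-expressed genes is reported with sign `+1` and a stretch of deleted and under-expressed genes with sign `-1`. A real analysis would work on smoothed, genome-wide statistics with significance testing rather than on hard per-gene thresholds.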

Although some of these genes could represent novel candidate biomarkers, their role in ccRCC etiology requires further investigation. Given the heterogeneity of tumor tissues, the functional analysis of the molecular mechanisms associated with ccRCC progression should preferably be conducted on primary cultures as an in vitro model of ccRCC. Indeed, primary cultures from RCC and normal tissues at early passages retain the phenotypic features (Bianchi, 2010; Perego, 2005) and genomic profile (Cifola et al., 2011) of the corresponding original tissues, while providing more homogeneous cytological material. The integrative analysis of molecular profiles of RCC primary cultures may be particularly useful to elucidate the role of some of the many genes and pathways typically found deregulated in this pathology and to highlight key players in RCC biology.

Molecular Portrait of Clear Cell Renal Cell Carcinoma:

*kidney*. Cancer Res, 69(11):4674–4681.

*A3*. Am J Pathol, 176(4):1660–1670.

Nucleic Acids Res, 37(15):5057–5070.

*us?* Curr Oncol Rep, 12(3):193–201.

*patterns*. Genes Cancer, 1(2):152–163.

*instability*. Nat Rev Cancer, 8(3):180–192.

*cancer*. Cancer Metastasis Rev, 29(1):73–93.

*cancer*. Nat Rev Urol, 7(5):245–257.

*clear cell renal cell carcinoma*. J Urol, 179(2):445–449.

Cancer, 89(10):1909–1913.

890.

An Integrative Analysis of Gene Expression and Genomic Copy Number Profiling 47

Becker, K. G., Hosack, D. A., Dennis, G., Lempicki, R. A., Bright, T. J., Cheadle, C., & Engel, J. (2003). *PubMatrix: a tool for multiplex literature mining.* BMC Bioinformatics, 4:61. Beroukhim, R., Brunet, J. P., Napoli, A. D., Mertz, K. D., Seeley, A., Pires, M. M., Linhart, D.,

Bianchi, C., Bombelli, S., Raimondo, F., Torsello, B., Angeloni, V., Ferrero, S., Stefano, V. D.,

Bicciato, S., Spinelli, R., Zampieri, M., Mangano, E., Ferrari, F., Beltrame, L., Cifola, I., Peano,

Brannon, A. R. & Rathmell, W. K. (2010). *Renal cell carcinoma: where will the state-of-the-art lead* 

Brannon, A. R., Reddy, A., Seiler, M., Arreola, A., Moore, D. T., Pruthi, R. S., Wallen, E. M.,

Brenner, B. M. & Rosenberg, D. (2010). *High-throughput SNP/CGH approaches for the analysis of* 

Bristow, R. G. & Hill, R. P. (2008). *Hypoxia and metabolism. hypoxia, DNA repair and genetic* 

Campbell, L., Gumbleton, M., & Griffiths, D. F. R. (2003). *Caveolin-1 overexpression predicts* 

Chari, R., Thu, K. L., Wilson, I. M., Lockwood, W. W., Lonergan, K. M., Coe, B. P., Malloff,

Chen, L., Huang, K., Han, L., Shi, Z., Zhang, K., Pu, P., Jiang, C., & Kang, C. (2011).

Chen, M., Ye, Y., Yang, H., Tamboli, P., Matin, S., Tannir, N. M., Wood, C. G., Gu, J., & Wu,

Chuang, S. T., Patton, K. T., Schafernak, K. T., Papavero, V., Lin, F., Baxter, R. C., Teh, B. T.,

*poor disease-free survival of patients with clinically confined renal cell carcinoma*. Br J

C. A., Gazdar, A. F., Lam, S., Garnis, C., MacAulay, C. E., Alvarez, C. E., & Lam, W. L. (2010). *Integrating the multiple dimensions of genomic and epigenomic landscapes of* 

*catenin/tcf-4 complex transcriptionally regulates AKT1 in glioma*. Int J Oncol, 39(4):883–

X. (2009). *Genome-wide profiling of chromosomal alterations in renal cell carcinoma using high-density single nucleotide polymorphism arrays*. Int J Cancer, 125(10):2342–2348. Chow, W. H., Dong, L. M., & Devesa, S. S. (2010). *Epidemiology and risk factors for kidney* 

& Yang, X. J. (2008). *Over expression of insulin-like growth factor binding protein 3 in* 

*-*

*genomic instability in colorectal cancer*. Mutat Res, 693(1-2):46–52.

Worrell, R. A., Moch, H., Rubin, M. A., Sellers, W. R., Meyerson, M., Linehan, W. M., Kaelin, W. G., & Signoretti, S. (2009). *Patterns of gene expression and copy-number alterations in von-Hippel Lindau disease-associated and sporadic clear cell carcinoma of the* 

Chinello, C., Cifola, I., Invernizzi, L., Brambilla, P., Magni, F., Pitto, M., Zanetti, G., Mocarelli, P., & Perego, R. A. (2010). *Primary cell cultures from human renal cortex and renal-cell carcinoma evidence a differential expression of two spliced isoforms of annexin* 

C., Solari, A., & Battaglia, C. (2009). *A computational procedure to identify significant overlap of differentially expressed and genomic imbalanced regions in cancer datasets*.

Nielsen, M. E., Liu, H., Nathanson, K. L., Ljungberg, B., Zhao, H., Brooks, J. D., Ganesan, S., Bhanot, G., & Rathmell, W. K. (2010). *Molecular stratification of clear cell renal cell carcinoma by consensus clustering reveals distinct subtypes and survival* 

#### **5. Conclusion**

As shown in this chapter, the availability of high-density molecular data, such as gene expression and copy number profiles, together with bioinformatics approaches for their analysis, makes it possible to draw a finer molecular portrait of ccRCC and to confirm previous findings about important genes and gene regulatory pathways associated with this renal cancer subtype. The genome-wide integration of DNA copy number data and transcriptional profiles elucidates the interplay between DNA content and global expression patterns and highlights candidate genes that are actively involved in the causation or maintenance of the malignant phenotype. Altogether, these data indicate the presence of candidate *driver* genes important for ccRCC development that undoubtedly deserve further investigation, since they may constitute novel specific cancer biomarkers.

#### **6. Acknowledgment**

This work was supported by grants from the Italian Ministry of University and Research: FIRB 2003 (n. RBLA03ER38\_004); PRIN 2008 (GDB); FIRB 2007 (Rete nazionale per lo studio del proteoma umano, n. RBRN07BMCT); AIRC Special Program Molecular Clinical Oncology "5 per mille". VT is the recipient of a fellowship from the Scuola di dottorato di medicina molecolare, Università degli Studi di Milano. SN is a PhD student of the School of Biosciences and Biotechnology, curriculum Genetics and Molecular Biology of Development, University of Padova.

























**3** 

**The VHL-HIF Signaling in Renal Cell Carcinoma: Promises and Pitfalls** 

Christudas Morais1\*, David W. Johnson1,2 and Glenda C. Gobe1

*1Centre for Kidney Disease Research, School of Medicine, The University of Queensland,* 

*2Department of Renal Medicine, The University of Queensland at Princess Alexandra Hospital, Brisbane,* 

*Australia* 

\* Corresponding Author

**1. Introduction** 

Renal cell carcinoma (RCC) is the third most common genitourinary cancer behind prostate and bladder cancer, accounting for 3% of all adult malignancies (Curti, 2004). It is a highly metastatic and heterogeneous disease with at least 16 histologic subtypes (Eble et al., 2001; Lopez-Beltran et al., 2006), among which clear cell (70-80%), papillary (10-15%) and chromophobe (5%) are the most common (Curti, 2004). Up to 25% of patients with RCC have distant metastases at presentation. Another 50% develop metastases or local recurrence during follow-up, despite treatment of the primary tumor (Thyavihally et al., 2005). Average survival after a diagnosis of metastatic RCC is about 4 months, and only 10% of patients survive for one year. Close to 300,000 new cases of RCC are diagnosed worldwide each year, with a male-to-female ratio of 3:2 and an estimated mortality of approximately 100,000 (Arai & Kanai, 2010; Ferlay et al., 2010). The incidence of RCC has been rising steadily, probably because of incidental findings from imaging techniques performed for other reasons. It can occur at any age, but is most frequently diagnosed in the 40-70 year old group (Eble et al., 2001; Pascual & Borque, 2008; Arai & Kanai, 2010).

Well before the advent of the modern era of genetics and molecular biology, surgeons and pathologists were aware of the hyper-vascular nature of RCC (Corn, 2007). The subsequent isolation of the von Hippel-Lindau (VHL) gene in 1993 led to the important discoveries that aberrant VHL is the most important risk factor for RCC and that VHL negatively regulates hypoxia-inducible factor (HIF) and thus the downstream angiogenesis pathway, so that loss of VHL function engenders increased vascularity. In this chapter, we will focus on the role of the VHL-HIF pathway in RCC, advancements in novel therapeutics targeting this pathway and future directions.

**2. Von Hippel-Lindau syndrome** 

VHL disease, commonly known as the VHL syndrome, is named after the German ophthalmologist Eugen von Hippel, and the Swedish neuropathologist Arvid Lindau, who

