mir-221 and mir-21 [59]. Murakami *et al.* identified eight miRNAs with altered expression in HCC, which discriminated HCC samples from non-tumor samples with 97.8% accuracy [66]. Similarly, Huang *et al.* identified 24 aberrantly expressed miRNAs [67]. Toffanin *et al.* studied miRNA profiles of HCC samples that had previously been profiled for mRNA and copy number (CN) changes [65]. The authors identified three subclasses of HCC based on miRNA profiles. Other studies identified miRNA signatures that predicted metastatic potential, recurrence and survival [64, 68-69]. Because miRNAs are stable in blood, circulating miRNAs have more recently been reported as diagnostic markers for various cancers, including HCC [70-72]. Therefore, identifying miRNAs and their protein-coding target genes is important for understanding the mechanisms of hepatocarcinogenesis, and may reveal new biomarkers for diagnosis and prognosis as well as new therapeutic targets.

**6. Deep sequencing of HCC using next-generation sequencing technologies**

Recent advances in genomics technologies first appeared as the microarray revolution and, more recently, as high-throughput parallel sequencing techniques. The steady improvement of the fluorescence-based standard Sanger technique seems to have reached its technical limits. At the same time, soaring demand for low-cost, high-output sequencing has driven the development of technologies that allow massively parallel sequencing, producing millions to billions of sequences at once [73-74]. It was therefore inevitable that standard sequencing methods would be replaced by these newly emerging technologies, called next-generation sequencing technologies. They initially appeared to be relatively expensive and difficult for practical use, and were believed to be useful only for whole genome sequencing of different species, but these perceptions soon changed radically. Today, the state of DNA sequencing technologies is in greater flux than ever before. This evolution brings new possibilities not only for large-scale genomic sciences, from medicine to agriculture and plant sciences, together with new challenges in data storage and analysis, but also for practical uses such as routine clinical diagnostics [74-78]. Currently, several methods are well established and have made a significant impact on genomics, with a reputable track record across many published applications; some are still building confidence, and others are yet to be tested and perfected [78-84].

The first study of HCC using next-generation sequencing technology for deep sequencing appeared only recently [5]. Using Illumina's Genome Analyzer IIx system (GAIIx), Totoki *et al.* sequenced genomic libraries from a normal Japanese male and a hepatitis C-positive HCC sample. The sequence reads from both samples matched the human reference sequence almost completely, covering 99.79% and 99.69% of it for the lymphocyte (normal male) and HCC genomes, respectively. Approximately 3 million nucleotide variations were recorded in each genome. The lymphocyte genome carried 84,555 more variations than the HCC genome, perhaps because of chromosomal alterations in the tumor genome, and 11,731 of the changes in HCC were somatically acquired.

The study reported several interesting results on nucleotide changes. First, the occurrence of somatic substitutions varied between genic and intergenic regions: it was significantly lower in the genic regions (coding and noncoding exons, and introns) than in the intergenic regions. This was explained either by negative selection against lethal mutations in the genic regions or by the existence of specific molecules responsible for repairing transcribed regions. Second, germline variations were significantly less frequent in the coding regions than in the non-coding regions. Third, the ratio of nonsynonymous to synonymous variations (N/S) differed between variations of somatic and germline origin in HCC: the ratio for germline variations was significantly lower than that for somatic substitutions. To explain this, the authors pointed either to positive selection acting on exons and favoring the survival of tumor cells, or to negative selection acting differently on somatic variations and germline substitutions in the coding exons. Fourth, the predominant somatic substitutions were T>C/A>G and C>T/G>A transitions. Fifth, in addition to 81 confirmed somatic substitutions common to both genomes (all in protein coding regions), 670 small deletions and insertions were identified, of which seven were validated. Some of these changes appeared more critical because they were located in genes previously annotated as tumor suppressors in HCC and other cancer types. The authors then resequenced exons potentially harboring malignant changes in 96 HCC and control samples as well as 21 HCC cell lines. These efforts yielded two critical somatic mutations, p.Phe190Leu and p.Gln212X, in *LRRC30*.
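The strand-symmetric notation used above for substitution classes (for example T>C/A>G) can be illustrated with a short sketch. This is not the pipeline used by Totoki *et al.*; the function names and the toy substitution list are hypothetical and serve only to show how a substitution and its reverse-complement equivalent fall into one class, and how transitions are distinguished from transversions.

```python
from collections import Counter

# Watson-Crick complements, used to fold a substitution onto the opposite strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}


def collapse(ref: str, alt: str) -> str:
    """Return the strand-symmetric class label, e.g. 'T>C/A>G'.

    The label is written with the pyrimidine reference base (C or T) first,
    followed by the equivalent change on the opposite strand.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in PURINES:
        # Re-express the change on the opposite strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}/{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"


def is_transition(ref: str, alt: str) -> bool:
    """A transition stays within purines (A<->G) or within pyrimidines (C<->T)."""
    return (ref.upper() in PURINES) == (alt.upper() in PURINES)


def spectrum(substitutions):
    """Count substitutions per strand-symmetric class."""
    return Counter(collapse(ref, alt) for ref, alt in substitutions)


# Hypothetical toy input (not data from the study): (reference, variant) pairs.
toy_calls = [("A", "G"), ("T", "C"), ("G", "A"), ("C", "A")]
counts = spectrum(toy_calls)
```

With this toy input, `A>G` and `T>C` are counted together under `T>C/A>G`, which is why the two are reported as one class of transition in the text.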

In addition to the nucleotide changes and the small deletions and insertions, 22 verified chromosomal rearrangements were identified. These rearrangements were mostly intra-chromosomal and lay in close proximity to some known copy number regions. They led to four different fusion transcripts that involve transcriptional regulation of BCORL1-ELF4 [5]. Finally, using a deep whole exome sequencing approach (76X or more coverage), a nonsense mutation in the TSC1 gene was identified in a subset of tumor cells.

As this study demonstrates, further next-generation sequencing studies have the potential to reveal novel genes and mutations, and likely critical pathways, that can be used for biomarker discovery and for the identification of novel therapeutic targets for HCC. Next-generation technologies have also already proven useful for genomic studies of several other cancers [85-94]. Moreover, once prices become affordable, these techniques will create an opportunity to search genome-wide for DNA- and/or RNA-level differences and methylation patterns in many cancer types, and will open the door to routine diagnostics and personalized medicine [86, 95-96].
