64 Basic Biology and Applications of Actinobacteria

provide a global understanding of the gene repertoire of a given species or genus, elucidating the essential genes involved in processes such as replication, transcription and translation, as well as the accessory genes, which are also important for characterizing variability in genetic patterns, and allow the analysis of genomic plasticity [36].

In another aspect, comparative analyses between different strains within the same phylogenetic clade make it possible to recognize similarities and differences among genomes, to clarify which sequences can drive phenotypic changes in organisms, and to elucidate the mechanisms of virulence among pathogenic organisms or in the case of environmental microorganisms. From this premise, the pan-genome concept emerged [37].

Regarding *C. ulcerans*, a study of 19 strains identified 4120 genes composing the pan-genome, of which 1405 were present in the core genome and 2715 in the accessory genome; proteins involved in pili formation and the *tox* gene were found in a large part of the genomes. Furthermore, variations in the transmembrane and secreted proteins among the different species were identified, contributing to the variability in pathogenicity between them. This study made possible a greater understanding of the virulence of this emerging pathogen [38].

The pan-genome is constituted by the core genome, which comprises the genes present in all analyzed strains; the accessory genome, which comprises genes shared by two or more, but not all, strains and includes the genes the bacteria need to survive in a specific environment; and species-specific genes belonging to a single lineage, which can be acquired via horizontal transfer [37, 39]. The representatives of the genus *Corynebacterium* are an interesting object for studies of comparative genomics and evolution, due to their diverse lifestyles [40].

This approach was used in *C. jeikeium* by comparing 17 plasmids from different clinical isolates, which identified that plasmid pK43 can act as a natural vehicle for transferring genes that confer antimicrobial resistance between multiresistant strains and possibly between other members of the corynebacteria group, such as *C. diphtheriae* [41].

In *C. pseudotuberculosis*, the pan-genome of 15 strains revealed differences between the biovars of this species: biovar *ovis* presented clonal behavior, while biovar *equi* has greater genetic diversity [42]. Recently, a study of strains isolated from equines corroborated the diversity of this biovar and also revealed a wide repertoire of resistance genes and virulence factors, such as beta-lactamases, recombination endonucleases and phage integrase [43].

In a comparative analysis of *Corynebacterium jeikeium*, *Corynebacterium urealyticum*, *Corynebacterium kroppenstedtii*, *Corynebacterium resistens* and *Corynebacterium variabile*, it was possible to identify 83 regulatory genes, including 56 DNA-binding transcriptional regulators and nine sigma factors. Furthermore, 44 regulatory proteins were present in the core genome. These genes shared by the strains are involved in the generation of short-chain volatile acids, which are related to the odor formation process of the human body, showing the importance of this approach in lipophilic corynebacteria [44].

**4. Transcriptomics**

The genomic approach makes it possible to know the DNA sequence of a given organism; however, this knowledge alone does not define how genes respond to external stimuli. For a protein to be synthesized, the DNA must first be transcribed into an RNA molecule, which is later translated into a protein molecule. However, genes are not active all the time in the cell; they are expressed when needed to act in cellular biological processes. The set of genes expressed in a cell under a certain physiological condition or stage of development at a specific time is called the transcriptome [45].

Transcriptome studies aim to analyze the collection of all transcripts and provide information about gene regulation; they also allow the functions of uncharacterized genes to be inferred, helping to understand the biology of the organism analyzed. One application of this approach is the use of the data generated to provide more information about the host defense response to the survival and proliferation of bacterial pathogens, which enables an understanding of the pathogenesis of infectious diseases [46].

Due to the diverse applications of transcriptomics, new technologies and high-throughput methods have been developed for large-scale analysis, such as hybridization-based methods (microarray) and sequencing-based methods (RNA sequencing, RNA-Seq) [47].

Microarray technology is considered a large-scale method because it generates the expression profile of thousands of transcripts simultaneously. Studies with microarray technology have identified clusters of genes involved in specific physiological responses to variations in the environmental conditions faced by microorganisms [48], such as ammonia limitation. Ammonia is used as a source of nitrogen, which is essential for almost all complex macromolecules in bacteria. A study analyzing the response of *C. glutamicum* in ammonia-limited medium demonstrated altered expression of 285 genes, many of which encode transport proteins and proteins involved in metabolism, nitrogen regulation, energy generation and protein turnover [49].
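The kind of comparison that underlies a count of genes with altered expression can be illustrated with a log2 fold-change calculation between two conditions. The following is a minimal sketch, not the procedure used in [49]: the gene names, signal intensities, pseudocount and two-fold threshold are hypothetical values chosen only for illustration, and real microarray analyses also involve normalization and statistical testing across replicates.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """Return log2((treated + pseudocount) / (control + pseudocount))."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def classify_expression(intensities, threshold=1.0):
    """Label each gene as induced, repressed or unchanged.

    intensities: dict mapping gene name -> (control, treated) signal.
    threshold: absolute log2 fold change required to call a change
               (1.0 corresponds to a two-fold difference; assumed cutoff).
    """
    calls = {}
    for gene, (control, treated) in intensities.items():
        lfc = log2_fold_change(treated, control)
        if lfc >= threshold:
            calls[gene] = "induced"
        elif lfc <= -threshold:
            calls[gene] = "repressed"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical intensities: (standard medium, limiting medium).
toy_intensities = {
    "geneA": (100.0, 400.0),  # roughly four-fold higher under limitation
    "geneB": (800.0, 150.0),  # roughly five-fold lower under limitation
    "geneC": (500.0, 520.0),  # essentially unchanged
}

calls = classify_expression(toy_intensities)
for gene, call in sorted(calls.items()):
    print(gene, call)
```

The pseudocount is a common convenience that avoids division by zero for undetected transcripts; its value, like the threshold, is a design choice rather than a fixed standard.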

Other studies with *C. glutamicum* evaluated the expression levels of genes essential to the survival of the bacterium in stressful environments. The transcriptional profile of this species grown with citrate as the source of carbon and energy, compared to glucose, demonstrated that *citM* and *tctCBA*, encoding citrate uptake systems, were induced, while the *ptsG*, *ptsS* and *ptsF* genes, encoding the glucose uptake system, were repressed. Additionally, genes encoding tricarboxylic acid cycle enzymes, malic enzyme, PEP carboxykinase, gluconate-glyceraldehyde-3-phosphate dehydrogenase and ATP synthase were induced [50].

The microarray technique advanced research on important organisms, such as members of the genus *Corynebacterium*. Nevertheless, this technique has some limitations, such as high noise interference, inability to detect transcripts with a low copy number per cell, low transcript coverage, and dependence on prior knowledge of the genome for probe design, consequently generating little information about the transcript sequence [47].

As a result of these limitations and the advent of NGS platforms, a promising alternative technique, RNA-Seq, was developed. This technique makes it possible to obtain more accurate, faster and more reliable analyses from cDNA sequencing. The advantages of this method are low or absent noise interference, detection of small transcripts that would not be detected by other methods, low cost, and reduced time and work for sample preparation. RNA-Seq is considered an ideal tool for the analysis of complete transcriptomes and is applied in the exploration of expression profiles and the characterization of differentially expressed genes. Thus, it represents an important tool for uncovering the mechanisms of virulence and pathogenicity in microorganisms [51, 52].

In this context, two studies with *C. pseudotuberculosis* simulated the stress conditions faced by the bacterium during infection of the host. The first study used strain *C. pseudotuberculosis* 1002, biovar *ovis*, which was subjected to three stress conditions: thermal, acidic and osmotic. Most of the identified targets were related to oxidation and reduction, cell division and the cell cycle, and the stimulon of the three stresses included induced genes that participate in mechanisms of virulence, defense against oxidative stress, adhesion and regulation, revealing that they have an important role in the infection process [53]. The other study, with strain 258, biovar *equi*, applied thermal stress under conditions similar to those used for strain 1002. Here, 113 genes were considered induced, among which *hspR*, *grpE*, *dnaK* and *clpB* were highlighted due to their expression rates and participation in the adaptation of the pathogen to high temperatures [54].

Recently, the first RNA-Seq analysis of *C. diphtheriae* was performed, which sought to investigate changes in the transcription profile between a wild-type strain and a Δ*dtxR* mutant and to detect operon structures from the transcriptome data of the wild-type strain. The authors revealed that approximately 15% of the genome was differentially transcribed and that DtxR may also play a role in other regulatory functions, in addition to regulating iron metabolism and diphtheria toxin. Finally, they identified 471 operons subdivided into 167 sub-operon structures [55].

One of the representatives of the genus whose regulation of gene expression has been most studied is *C. glutamicum*. The RNA-Seq approach elucidated the regulatory responses to several industrially relevant parameters, such as the dissolved oxygen (DO) concentration, which is important in industrial microbial processes, providing new information on the relationship between oxygen supply and bacterial metabolism [56].

Regarding amino acid production, the L-lysine-producing *C. glutamicum* ATCC21300 showed 543 differentially expressed genes compared to wild-type *C. glutamicum* ATCC13032, notably *bioA*, *bioB*, *bioD*, NCgl1883, NCgl1884 and NCgl1885, which are involved in biotin metabolism or transport. The *bioB* gene was overexpressed about 20-fold, and when it was disrupted, lysine production was reduced to approximately 76% and the genes NCgl1883, NCgl1884 and NCgl1885 were repressed [57]. Genes involved in the production of L-valine were also analyzed: 1155 differentially expressed genes were identified, among which *ilvBN*, *ilvC*, *ilvD* and *ilvE* were overexpressed, improving the carbon flux used to produce valine. Thus, work involving this approach helps to better understand *C. glutamicum* for the generation of biotechnological products [58].

The RNA-Seq technique can also be applied to identify operon structures, although this approach requires a reliable genome annotation and a low proportion of genes with unknown function. Through these data, transcription start sites (TSSs) can be identified and corrected, allowing a more detailed analysis of promoters and their classification according to their location relative to the protein-coding sequences (CDSs). For example, see [59, 60].

The Genus *Corynebacterium* in the Genomic Era http://dx.doi.org/10.5772/intechopen.80445 67

**5. Proteomics**

The central basis of molecular biology involves understanding how cells work and interact with each other. These cellular processes occur through the activity of biomolecules that act together through specialized mechanisms. This whole process involves storing genetic information in the DNA molecule and the unidirectional flow of this information to RNA and proteins. Proteins make up a large part of the cell's molecular machinery, and their global analysis provides the information needed to understand how cells work. This analysis is referred to as proteomics [61].

In 1995, the term "proteome" was defined as the set of proteins produced by a cell or tissue at a given time and condition [62]. As early as 1996, the term "proteomics" appeared to define the large-scale characterization of the entire protein content of a cell line, tissue or organism [63]. The study of the proteome currently refers not only to knowledge of the protein content of a given organism under a given condition, but also includes the quantification, location, modifications, interactions and function of these proteins [64, 65].

This area has three strands: expression, structural and functional. Expression proteomics generally involves studies that investigate the pattern of protein expression in abnormal cells. This classification encompasses qualitative and quantitative expression analyses of total proteins under two different conditions. Structural proteomics analyzes the three-dimensional conformation and structural complexities of functional proteins. This strand makes it possible to identify all the proteins of a complex system and characterize the possible interactions among these proteins and protein complexes. Functional proteomics reveals the function of proteins based on their interactions with specific protein complexes and the detailed description of the cell signaling pathways in which they are involved [66].
