3. Inferring molecular coregulatory events from the integration of collected functional genomic readouts

The development of mid- and high-throughput strategies for analyzing genome sequences, their variants, gene expression, and even proteome composition has given the scientific community the means to interrogate each of these layers of complexity in a variety of model systems and tissues and, in addition, to integrate them into a regulatory view. As illustrated in the previous section, several studies have described major functional genomic readouts focused on brain development in normal and disease settings.

Although comprehensive, in most cases these studies provide lists of relevant players (gene variants, differentially expressed genes, etc.) on the basis of statistical descriptors but do not address their potential relationships. Yet, from a biological point of view, each of the players composing the system under study is expected to influence, directly or indirectly, the behavior of the others. As a consequence, the current challenge is to move toward an integrative view that treats the various "deregulated events" as interconnected entities, by incorporating multiple types of readouts with the support of computational solutions.

From a historical perspective, the article by Walsh et al., published in Science in 2008, is one of the first major studies aiming at identifying neurodevelopmental programs involved in a disease context such as schizophrenia [54]. In this study, the authors hypothesized that the collective contribution of the rare structural variants found in neurological/neurodevelopmental syndromes accounts for these disorders. In the specific case of schizophrenia, they demonstrated a difference of at least 3-fold between controls and individuals with schizophrenia in the frequency of rare structural variants within coding regions. Furthermore, they focused on structural mutations that disrupt genes and evaluated their functions with computational solutions that query for gene enrichment in one or more functionally defined pathways (PANTHER and Ingenuity Pathway Analysis). This strategy per se aims at establishing gene relationships on the basis of their annotation to a given program (or pathway), even though in this case such relationships are inferred in-silico.
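The idea of querying a gene list for enrichment in a functionally defined pathway can be illustrated with a one-sided hypergeometric test. The sketch below is only an assumption about how such a test may be written; it is not the procedure implemented by PANTHER or Ingenuity Pathway Analysis, and the gene identifiers, set sizes, and function names are hypothetical.

```python
from math import comb


def hypergeom_tail(n_universe, n_pathway, n_hits, n_overlap):
    """Probability of observing at least n_overlap pathway genes when
    n_hits genes are drawn without replacement from n_universe genes,
    n_pathway of which are annotated to the pathway."""
    total = comb(n_universe, n_hits)
    upper = min(n_pathway, n_hits)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def pathway_enrichment(disrupted_genes, pathway_genes, universe):
    """Return (overlap size, one-sided p-value) for one pathway."""
    disrupted = set(disrupted_genes) & universe
    pathway = set(pathway_genes) & universe
    overlap = disrupted & pathway
    p_value = hypergeom_tail(
        len(universe), len(pathway), len(disrupted), len(overlap)
    )
    return len(overlap), p_value


# Hypothetical toy data: 1000 annotated genes, a 20-gene pathway,
# and 30 genes disrupted by structural variants (5 of them in the pathway).
universe = {f"gene{i}" for i in range(1000)}
pathway = {f"gene{i}" for i in range(20)}
disrupted = {f"gene{i}" for i in range(5)} | {f"gene{i}" for i in range(500, 525)}

n_shared, p = pathway_enrichment(disrupted, pathway, universe)
print(f"shared genes: {n_shared}, p-value: {p:.3g}")
```

In practice such a test is repeated for every pathway in the annotation resource, so the resulting p-values would need a multiple-testing correction before interpretation.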

Since then, further studies have incorporated other types of data, such as RNA-Seq transcriptomic analysis to identify the genes differentially expressed between controls and individuals with schizophrenia, which are then associated with biological functions by Gene Ontology analysis [55–57]. Furthermore, computational solutions for enhancing data integration have been developed, as in the case of NETBAG, which integrates multiple types of genetic variation, such as single nucleotide variants (SNVs), rare copy number variants (CNVs), and variants from genome-wide association studies (GWAS), to identify highly connected gene clusters potentially related to functional roles. NETBAG was initially described in the context of de novo CNVs in autism [58] and schizophrenia [59].

Beyond correlating changes in gene expression with genetic variations, further efforts are required to stratify the information, for instance with gene co-expression strategies. This approach aggregates genes on the grounds of their expression levels, under the hypothesis that co-expressed genes are the consequence of a common regulatory force, e.g., the action of transcription factors. The result can be represented as a network in which a pair of genes is connected when their co-expression relationship is significant. This strategy was applied by Voineagu and colleagues to resolve consistent differences between the transcriptomes of autistic and normal brain samples [23]. Specifically, they resolved gene expression levels in cortical regions (suggesting cortical abnormalities in the context of autism) and, in addition, identified discrete modules of co-expressed genes, demonstrating the advantage of this strategy for enhancing analytical resolution. Since then, various studies have combined gene co-expression analysis with genome-wide association study (GWAS) data [60, 61], or have incorporated multiple human brain regions and developmental stages, as a way to identify specific biological processes and defined brain regions associated with autism [62, 63].
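A minimal sketch of the co-expression idea is given below, assuming that a pair of genes is connected when the absolute Pearson correlation of their expression profiles reaches a fixed cutoff, and that modules are taken as the connected components of the resulting network. This is a deliberate simplification for illustration: dedicated packages such as WGCNA rely on soft thresholding and hierarchical clustering instead. The expression values, gene names, and the 0.9 cutoff are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = sqrt(var_x * var_y)
    return cov / denom if denom > 0 else 0.0


def coexpression_edges(expression, cutoff=0.9):
    """Connect every gene pair whose |correlation| reaches the cutoff."""
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= cutoff:
            edges.append((gene_a, gene_b, r))
    return edges


def connected_modules(genes, edges):
    """Group genes into modules (connected components of the network)."""
    neighbours = {gene: set() for gene in genes}
    for gene_a, gene_b, _ in edges:
        neighbours[gene_a].add(gene_b)
        neighbours[gene_b].add(gene_a)
    seen, modules = set(), []
    for gene in sorted(genes):
        if gene in seen:
            continue
        stack, component = [gene], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        modules.append(sorted(component))
    return modules


# Hypothetical expression levels of five genes across five samples.
expression = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10.5],
    "geneC": [5, 3, 4, 1, 2],
    "geneD": [9, 1, 7, 2, 8],
    "geneE": [8.5, 1.5, 7, 2, 9],
}

edges = coexpression_edges(expression, cutoff=0.9)
found_modules = connected_modules(expression, edges)
print(found_modules)
```

With real data, the cutoff would be replaced by a significance criterion that accounts for the number of samples and of gene pairs tested.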

While gene co-expression networks are expected to be the consequence of the action of defined master transcription factors (TFs), the identity of these factors remains unknown in this type of analysis. The combination of chromatin immunoprecipitation (ChIP) with massively parallel sequencing (ChIP-Seq) has provided the means to scrutinize the genomic locations at which given TFs are bound. Furthermore, on the basis of their proximity to annotated coding regions, it is possible to infer their transcriptional regulation activity over proximal genes. Following this strategy, factors like TBR1 [64] or Auts2 [65, 66], initially identified by rare genetic variant studies, were analyzed by ChIP-Seq to reveal their direct targets. In both cases, they were found on genomic regions adjacent to autism spectrum disorder (ASD)-related genes. A similar strategy has been applied to map the gene targets of the chromatin modifier CHD8 (chromodomain helicase DNA-binding protein 8) [67], previously shown to be mutated in rare genetic variant studies [68].
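The proximity rule mentioned above can be sketched as follows, under the assumption that each ChIP-Seq peak is assigned to the nearest annotated transcription start site (TSS) on the same chromosome, provided that it lies within a maximum distance. The coordinates, gene names, and the 10 kb window are hypothetical, and published studies may use different assignment rules.

```python
from bisect import bisect_left


def nearest_gene(peak_position, tss_sorted, max_distance=10_000):
    """Return the gene whose TSS is closest to the peak, or None when the
    closest TSS is farther than max_distance.

    tss_sorted: list of (position, gene) tuples for one chromosome,
    sorted by position.
    """
    positions = [position for position, _ in tss_sorted]
    index = bisect_left(positions, peak_position)
    # Only the TSS just before and just after the peak can be the nearest.
    candidates = tss_sorted[max(index - 1, 0): index + 1]
    if not candidates:
        return None
    position, gene = min(candidates, key=lambda item: abs(item[0] - peak_position))
    return gene if abs(position - peak_position) <= max_distance else None


def assign_peaks(peaks, tss_by_chromosome, max_distance=10_000):
    """Map each putative target gene to the list of peaks assigned to it."""
    targets = {}
    for chromosome, position in peaks:
        tss_sorted = tss_by_chromosome.get(chromosome, [])
        gene = nearest_gene(position, tss_sorted, max_distance)
        if gene is not None:
            targets.setdefault(gene, []).append(position)
    return targets


# Hypothetical annotation and peak summits.
tss_by_chromosome = {
    "chr1": [(1_000, "geneA"), (50_000, "geneB"), (120_000, "geneC")],
}
peaks = [("chr1", 1_500), ("chr1", 48_000), ("chr1", 80_000), ("chr2", 500)]

targets = assign_peaks(peaks, tss_by_chromosome)
print(targets)
```

Peaks that fall far from any annotated TSS, or on a chromosome without annotation, are left unassigned in this sketch rather than forced onto a distant gene.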

Although powerful for identifying the target genes of a given factor, ChIP-Seq assays remain challenging to scale to the large number of TFs, epigenetic modifications, and/or chromatin remodelers that could be associated with neurodevelopmental events. In fact, prioritizing the list of TFs to be immunoprecipitated remains a key step, which is currently handled by computational strategies. In this context, we have recently developed TETRAMER, a computational approach able to reconstruct gene regulatory networks by integrating transcriptomes provided by the user with annotations of TF-target gene relationships retrieved from various databases [69]. Furthermore, TETRAMER simulates the propagation of transcription regulation over the reconstructed connectivity to identify master TFs, which can then be prioritized for experimental assays. This strategy was initially used to identify novel master TFs implicated in neurogenesis by reconstructing gene regulatory networks from temporal transcriptomes [70]; it was then extended to a collection of more than 3000 transcriptomes covering 300 cell/tissue types and representing 14 different anatomical systems of the human body. Among them, 58 cell/tissue types composing the human nervous system were analyzed, and their relevant master TFs and related gene regulatory networks were inferred. As illustrated in Figure 2, this type of analysis makes it possible to compare the fraction of TFs shared between different nervous system cell/tissue types, thus highlighting relevant players implicated in their transcriptional regulation. Figure 2 also depicts a comparison between the TFs retrieved in frontal cortex and hypothalamus, revealing the presence of factors like TBR1 or ARNT2, previously identified as presenting rare genetic variants associated with autism disorders [64, 71], or NPAS3, previously described as a master regulator of neuropsychiatric-related genes [72].

Systems Biology Perspectives for Studying Neurodevelopmental Events

DOI: http://dx.doi.org/10.5772/intechopen.85072

Figure 2.

Comparison of 58 nervous system cell/tissue types on the basis of their master TF co-regulatory networks. The pairwise fraction of common TFs is displayed as a percentage (heatmap). The inset displays the identity of the major TFs retrieved in frontal cortex compared with those retrieved in hypothalamus. The illustrated data are extracted from the analysis performed over more than 3000 Affymetrix arrays corresponding to 300 cell/tissue types describing 14 different systems of the human body (Cholley et al. [69]).

Overall, the analytical strategies described above clearly suggest the necessity of incorporating various types of genetic and functional genomic readouts, such that their inter-relationships enhance our comprehension of the phenomena under study. This is even more relevant when studying neurodevelopment and its related diseases, which are the consequence of multigenic events. Furthermore, data integration is systematically supported by computational developments, as witnessed by the various tools and computational strategies devoted to inferring relationships among the available data, but also to modeling systems behavior. Notably, machine learning strategies that model the maturity and regional identity reached in neuronal in-vitro assays by comparison with human fetal brain data provide the means to take advantage of in-vitro systems that reconstitute the in-vivo events as closely as possible [73]. In a similar manner, major efforts like the "Blue Brain Project" are currently combining data assessment with computational modeling to reconstruct cell atlases, for instance of the mouse brain [74], strongly suggesting that over the coming years major discoveries in neuroscience might arise from such multidisciplinary efforts.

4. Perspectives for the coming years: from the use of new in-vitro 3D brain tissue models and single-cell strategies to big-data systems biology

The majority of transcriptome or related studies in human brain have used postmortem tissues as the source of material. As a consequence, technical concerns such as potential RNA degradation caused by pre- and postmortem factors (environment, collection methods, or postmortem interval) could directly influence the quality of the readouts [75–77]. The use of animal models as an alternative is losing interest due to the reported differences, for instance in human corticogenesis relative to mouse models, which are further supported by human-specific gene signatures and/or divergences in gene regulatory programs [78–80]. Even if only a small percentage of genes have different trajectories between non-human primates and humans, in contrast to rodents, this model can help to understand brain development but cannot reproduce all the features found in humans [79, 81]. In fact, comparison between non-human


Neurodevelopment and Neurodevelopmental Disorder


