**4. Data-driven modeling**

**Figure 2.** A detailed depiction of the acidic patch region and the acidic triad of H2A that complexes the canonical arginine anchor. (A) LANA was the first peptide model of a binding domain to be crystallized (green, PDB: 1zla). (B) The first crystallized protein domain was RCC1 (pink, PDB: 3mvd). (C) They all share the acidic patch as a common binding epitope for the arginine anchor residue, as is also the case for the PRC1 complex with its acidic patch-binding RING domain (cyan, PDB: 4r8p). (D) Interestingly, acidic patch binding is not necessarily limited to a single arginine anchor residue. In the nucleosome-bound structure of the deubiquitinase complex SAGA-DUB (yellow, PDB: 4zux), three arginine residues are an essential part of the acidic patch binding, none of which occupies the position in the center of the acidic triad. The combination of nucleosomal DNA and acidic patch binding is shown in the structures of RCC1 and PRC1. (E) PRC1 (cyan, PDB: 4r8p), besides binding the acidic patch, also engages DNA with its UbcH5c subunit. (F) The same holds true for RCC1 (pink, PDB: 3mvd), which contacts DNA with its unstructured tail region.

28 Chromatin and Epigenetics

An attractive alternative to traditional structure determination methods is to model the structure of a complex based on experimental information on the interaction [79, 80]. In such data-driven modeling of a complex structure, the two interaction partners are docked together, guided by the experimental data and respecting their biophysical properties. The exact binding interface and relative orientation of the binding partners are typically refined over several steps. A prerequisite for this approach is the availability of the 3D structures of the interacting partners. Several molecular docking programs allow experimental data to be incorporated and thereby increase the accuracy of the resulting structures [81]. To this end, data from diverse biophysical techniques are translated into restraints that guide the docking process [82–84]. The types of information include the interaction interface, distances, and the shape of the complex and its subunits. Techniques that can provide this information are listed in **Table 3**.
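How interface data can act as a docking restraint may be easier to see in a small sketch. The Python example below is illustrative only and is not the scoring function of any particular docking program: it assumes that each restraint is "ambiguous" (any atom pair between two residue sets may satisfy it), that the candidate distances are combined by an r⁻⁶ sum so that the shortest contacts dominate, and that a pose is penalized when the combined distance exceeds a hypothetical upper bound of 2.0 Å. All function names and the bound are assumptions made for this example.

```python
def effective_distance(distances):
    """Combine atom-atom distances (in Å) into one effective distance.

    An r^-6 sum is used, so the shortest contacts dominate the result.
    """
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)


def restraint_violation(distances, upper_bound=2.0):
    """Return by how much (Å) the effective distance exceeds the bound."""
    return max(0.0, effective_distance(distances) - upper_bound)


def score_pose(restraints, upper_bound=2.0):
    """Sum of squared violations over all restraints for one docking pose.

    `restraints` is a list of lists; each inner list holds the candidate
    atom-atom distances (Å) measured in the pose for one restraint.
    """
    return sum(restraint_violation(d, upper_bound) ** 2 for d in restraints)


if __name__ == "__main__":
    # Hypothetical pose: one restraint with a single 3.0 Å contact and
    # one restraint with two 2.0 Å contacts.
    pose = [[3.0], [2.0, 2.0]]
    print(f"restraint penalty: {score_pose(pose):.3f}")
```

A lower penalty means the pose agrees better with the experimental interface data. In practice such a term would be combined with physical energy terms rather than used on its own.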

Interestingly, all three classes of information can be provided by NMR spectroscopy. Data on intermolecular distances and shape can be gathered from paramagnetic relaxation enhancement (PRE) and the nuclear Overhauser effect (NOE), while information on binding interfaces and binding affinity can be obtained from chemical shift perturbation (CSP). The use of these NMR methods in docking studies is reviewed in detail elsewhere [79]. An overview of publications that used data-driven docking to investigate nucleosome-protein complexes is given in **Table 4**.
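As a rough illustration of how CSP data can be turned into an interface definition, the following Python sketch computes a combined ¹H/¹⁵N perturbation per residue and flags residues above a cutoff. The weighting factor of 0.14 for ¹⁵N and the cutoff of mean plus one standard deviation are common conventions assumed here, not values taken from the studies discussed in this chapter. The residue numbers and shifts are invented, and for methyl-TROSY data the nuclei and scaling would differ.

```python
import math
from statistics import mean, pstdev


def combined_csp(d_h, d_n, n_scale=0.14):
    """Combined chemical shift perturbation (ppm) from 1H and 15N changes."""
    return math.sqrt(d_h ** 2 + (n_scale * d_n) ** 2)


def interface_residues(shifts_free, shifts_bound, n_scale=0.14, n_sd=1.0):
    """Flag residues whose CSP exceeds mean + n_sd * standard deviation.

    Both inputs map residue number -> (1H shift, 15N shift) in ppm.
    Returns the sorted list of flagged residues and the per-residue CSPs.
    """
    csp = {}
    for res, (h_free, n_free) in shifts_free.items():
        if res not in shifts_bound:
            continue  # peak not assigned or broadened away in the bound state
        h_bound, n_bound = shifts_bound[res]
        csp[res] = combined_csp(h_bound - h_free, n_bound - n_free, n_scale)
    values = list(csp.values())
    cutoff = mean(values) + n_sd * pstdev(values)
    return sorted(r for r, v in csp.items() if v > cutoff), csp


if __name__ == "__main__":
    # Invented example shifts for four residues.
    free = {10: (8.0, 120.0), 11: (8.2, 118.0), 12: (7.9, 121.0), 13: (8.5, 119.0)}
    bound = {10: (8.0, 120.0), 11: (8.21, 118.0), 12: (8.2, 122.0), 13: (8.5, 119.1)}
    hits, per_residue = interface_residues(free, bound)
    print("putative interface residues:", hits)
```

The flagged residues could then serve as the "active" residues of an interface restraint in a docking run, ideally after filtering for solvent accessibility.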

### **4.1. Bringing data-driven modeling to nucleosome complexes (LSD1-CoREST)**

A pioneering study applied data-driven modeling to the nucleosome complex of lysine-specific demethylase 1 (LSD1) and CoREST [86]. Both proteins cooperate in the demethylation of mono- and dimethylated H3K4. While it was possible to solve the crystal structure of LSD1-CoREST, the structure of its nucleosome-bound state remains elusive. Yang et al. gained insight into the molecular basis of the LSD1-CoREST interaction by identifying point mutations that interfere with the ability of LSD1-CoREST to demethylate a methylated peptide model of the histone H3 tail. Since it was previously shown that LSD1 recognizes a specific stretch of the H3 tail [94], it was possible to employ modeling to identify intermolecular interactions between the peptide and both the LSD1 active site and the LSD1-CoREST interface (**Figure 3B**). Lastly, NMR titration experiments of the CoREST SANT2 domain with DNA revealed a DNA-binding interface on SANT2. These pieces of interaction data were used to guide a docking approach, resulting in a complete structural model of the LSD1-CoREST-nucleosome complex (**Figure 3A**). Given the lack of direct experimental data on the nucleosome interaction, this is a prime example of combining crystal structures, mutagenesis and NMR data to overcome the limitations of the separate techniques.

**Table 3.** Biochemical and biophysical techniques for structural analysis of protein complexes.


Recognition of Nucleosomes by Chromatin Factors: Lessons from Data-Driven Docking-Based…

http://dx.doi.org/10.5772/intechopen.81016


**Figure 3.** (A) Structural model of LSD1-CoREST bound to the nucleosome. The DNA binding of the SANT2 domain was elucidated by NMR spectroscopy. A previously identified binding motif in the H3 tail sequence was docked onto the interface of the amine oxidase domain (AOD) and the SWIRM domain, revealing a second binding epitope. (B) Close-up of the resulting model of the H3 tail model peptide bound to AOD-SWIRM, highlighting how the tail is positioned on the interface of both domains. Figure generated using the author-provided PDB file [86].

### **4.2. NMR-based structural biology of nucleosome-protein complexes**


| Protein | Role | Data source | Reference |
| --- | --- | --- | --- |
| PSIP1-PWWP | Trimethyl lysine reader H3K36 | NMR | [67, 68, 85] |
| CoREST/LSD1 | Demethylase | Crystallography/NMR | [86] |
| Rad6-Bre1 | Ubiquitin ligase | XL-MS | [70] |
| LANA | Viral protein | ssNMR | [87] |
| NSD1 | Methyltransferase H3K36 | Mutagenesis | [88] |
| RNF169 | Ubiquitin reader | NMR, SAXS | [69, 89] |
| H1 | Linker histone | NMR | [43, 90] |
| ISW2 | Chromatin remodeler | XL-MS | [91] |
| Rad18 | DNA repair factor | NMR | [89] |
| RCC1 | Ran-recruitment | Crystallography | [62] |
| PHF1 Tudor | Trimethyl lysine reader H3K36 | Crystallography/NMR | [92] |
| HMGN2 | Chromatin decompaction | NMR | [93] |


**Table 4.** Structural models of nucleosome-protein complexes based on biophysical data.


Over recent years, several studies have demonstrated that state-of-the-art solution NMR can offer high-resolution and site-specific characterization of the structures and dynamics of nucleosome-protein complexes. NMR has the particular advantage of its sensitivity to dynamics and the ease with which interactions can be studied, allowing detailed insights into molecular recognition processes. NMR allows studies when systems are dynamic or (partially) disordered, properties that typically hamper high-resolution structure determination by crystallography and cryo-EM.

The molecular size of nucleosomes, and even more so of complexes with effector proteins, poses a challenge to traditional NMR methods. However, this challenge can be overcome through the use of a methodology designed for high-molecular-weight systems. This method, methyl group-based transverse-relaxation-optimized spectroscopy (methyl-TROSY), relies on the highly sensitive observation of NMR signals of protein methyl groups [95]. Here, a specific isotope-labeling scheme is used, which typically results in the observation of isoleucine, leucine, and valine (ILV) methyl groups. The methyl-TROSY NMR spectra can subsequently be used to delineate binding sites of effector proteins on the nucleosome surface and vice versa [68, 69, 93, 96]. More detailed structural information can be extracted through the use of so-called spin labels, which can generate long-range distance restraints between the interaction partners [97, 98]. Whichever approach is used, NMR-based interaction data are of unique value in the modeling of nucleosome-protein complexes.
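To make the link between spin-label data and distance restraints more concrete, the sketch below converts a measured PRE rate into an approximate electron-proton distance using the widely used Solomon-Bloembergen-type relation r = [K/Γ₂ · (4τc + 3τc/(1 + ω²τc²))]^(1/6). It assumes a nitroxide spin label acting on ¹H, for which K ≈ 1.23 × 10⁻³² cm⁶ s⁻², and a single effective correlation time. The input values are invented, and the function name is hypothetical.

```python
import math

# Constant for a nitroxide spin label (S = 1/2) acting on 1H, in cm^6 s^-2.
# Assumed here; other paramagnetic centers need a different value.
K_NITROXIDE_1H = 1.23e-32


def pre_distance_angstrom(gamma2, tau_c, field_mhz):
    """Estimate the electron-proton distance (Å) from a PRE rate.

    gamma2    : paramagnetic contribution to the transverse rate (s^-1)
    tau_c     : correlation time of the electron-proton vector (s)
    field_mhz : 1H Larmor frequency of the spectrometer (MHz)
    """
    if gamma2 <= 0:
        raise ValueError("gamma2 must be positive")
    omega_h = 2.0 * math.pi * field_mhz * 1e6  # rad/s
    spectral = 4.0 * tau_c + 3.0 * tau_c / (1.0 + (omega_h * tau_c) ** 2)
    r_cm = (K_NITROXIDE_1H * spectral / gamma2) ** (1.0 / 6.0)
    return r_cm * 1e8  # convert cm to Å


if __name__ == "__main__":
    # Invented example: 20 s^-1 PRE, 10 ns correlation time, 600 MHz.
    r = pre_distance_angstrom(gamma2=20.0, tau_c=10e-9, field_mhz=600.0)
    print(f"estimated distance: {r:.1f} Å")
```

Because of the sixth-root dependence, even large uncertainties in Γ₂ or τc translate into modest distance errors, which is one reason such data are usually applied as loose upper and lower bounds rather than exact distances.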

### **4.3. Expanding data sources for nucleosome complex models to NMR (HMGN2)**

Kato *et al.* were the first to use the methyl-TROSY approach for the study of nucleosome-protein interactions [93]. Importantly, they reported the NMR signal assignments of the ILV-methyl groups for all histones in the nucleosome. These assignments are essential for determining protein-binding sites on the nucleosome surface. The approach was demonstrated using high mobility group nucleosomal protein 2 (HMGN2), which regulates a variety of chromatin functions. HMGN2 was found to bind both the acidic patch and nucleosomal DNA. Based on these NMR data, supported by mutagenesis, it was possible to determine a structural model of the complex (**Figure 4A**). HMGN2 binds to the nucleosome like a staple, using two main interaction sites. On one side, HMGN2 is anchored to the acidic patch by a canonical arginine anchor in the N-terminal region of the binding domain, while the lysine-rich motif in its C-terminal region binds to nucleosomal DNA (**Figure 4B**). This binding mode provided a structural basis for the antagonistic function of HMGN2 towards linker histone H1 in nucleosome binding.

### **4.4. Latest applications of NMR to investigate structures of nucleosome complexes (RNF169 & Rad18)**

Two recent studies relied on methyl-TROSY NMR-derived binding data to elucidate the recognition of ubiquitinated nucleosomes. Both focused on the interaction between H2A ubiquitinated at K13/15 and the DNA repair factor RNF169. The work of Kitevski-LeBlanc *et al.* established the molecular basis of this interaction. The α-helical MIU2 (motif interacting with ubiquitin) domain binds to a hydrophobic patch on the K13/15-conjugated ubiquitin, while a disordered region anchors RNF169 on the nucleosome by binding to the acidic patch. They subsequently reconstructed a model structure that presents both epitopes in their nucleosome-bound state (**Figure 5A**). The work of Hu *et al.* combined traditional NOESY-based structure determination at the level of histone dimers with interaction studies at the nucleosome level and complemented these with SAXS data to arrive at a final model [89]. The authors also extended their findings to an NMR-based structural model for the complex with the DNA repair factor Rad18. Both RNF169 and Rad18 are known to interfere with the binding of 53BP1 to nucleosomes ubiquitinated at H2A K13/15. These NMR-based structural models have made it possible to propose hypotheses on the molecular mechanism of this interference.

**Figure 4.** (A) Structural model of HMGN2 (red) bound to the nucleosome. The binding occurs along the nucleosome surface and is driven by interactions with the acidic patch and nucleosomal DNA, resulting in HMGN2 competing with H1 for nucleosome binding. (B) Close-up view of the acidic patch-binding N-terminal HMGN2 region, depicting the canonical arginine anchor R26 surrounded by the Glu 91, Asp 89, Glu 60 acidic triad motif of H2A. Figure generated using the author-provided PDB file [93].

**Figure 5.** (A) Structural model of nucleosome-bound RNF169 (red) and ubiquitin (green). (B, top) The proposed main acidic patch anchoring residue R700 (conserved position throughout the docking solutions) is shown in the conserved arginine anchor position within the acidic triad (Glu 60, Asp 89, Glu 91). (B, bottom) Side chain interactions between RNF169 MIU2 (red) and ubiquitin (green). Figure generated using the author-provided PDB file [69].

For PHF1-Tudor, a crystal structure bound to a trimethylated H3 tail peptide was already available. The additional importance of the nucleosomal context and the synergistic binding mechanism can be understood from the corresponding nucleosome-bound structure (**Figure 6A**). In the case of PSIP1-PWWP, the domain structure was solved by NMR and, together with NMR titration data, used to determine a structural model of the nucleosome-bound protein (**Figure 6B**) [67, 68, 85]. Both structural models highlighted the importance of the nucleosomal context in H3K36me3 recognition, emphasizing that complex formation critically depends on two synergistic binding processes. First, the aromatic residues that form the aromatic cage bind to the trimethylated lysine H3K36me3. This recognition of the PTM is crucial for binding, but the readers reach their full binding affinity only when their positive surface residues also interact with the nucleosomal DNA. This makes both studies outstanding examples of the synergistic interplay of epitopes in nucleosome-binding proteins (**Figure 6C**, **D**). The insights derived from these structural models were used to design experiments to validate the models and may offer tools for further research. In the case of PSIP1-PWWP, the structural model sparked current efforts in the design of nucleosome-mimicking peptides to modulate the PSIP1-chromatin interaction.

a binding site for nucleosomal DNA, resulting in a simultaneous binding mechanism of both trimethyl lysine and nucleosomal DNA.

### **4.6. LANA goes solid state**

The studies mentioned above illustrate the potential of data-driven modeling of nucleosome-protein complexes based on state-of-the-art solution NMR. Recent advances in solid-state
