
Recognition of Nucleosomes by Chromatin Factors: Lessons from Data-Driven Docking-Based…

http://dx.doi.org/10.5772/intechopen.81016


22 Chromatin and Epigenetics

and plays crucial roles in controlling cell fate and protecting genome integrity. The fundamental unit of chromatin is the nucleosome, in which 147 base pairs (bp) of DNA are wrapped around an octameric protein complex composed of two copies each of the histone proteins H2A, H2B, H3 and H4 [1–3]. Nucleosomes are arranged as beads on a string, forming a 10-nanometer (nm) wide fiber that subsequently condenses into higher-order structures [4]. As the basis of chromatin, nucleosomes are responsible for its dynamics: chromatin state and changes in DNA accessibility are determined at the nucleosome level. These changes are mediated through interactions of both histone proteins and nucleosomal DNA with a wide range of protein complexes that control the structure of chromatin. These complexes interpret, write and erase post-translational modifications or act as ATP-dependent nucleosome remodelers, which allows changes in the functional state of chromatin and regulation of DNA-templated processes. While promoting a large variety of effects on chromatin structure, nucleosome-interacting proteins share the molecular basis of recognizing and binding the nucleosome. Understanding the basis of chromatin dynamics therefore demands understanding the molecular basis of nucleosome-protein interactions.

In particular, insights into the molecular mechanistic basis of how histone-modifying enzymes install or remove post-translational modifications (writers and erasers, respectively) and how these modifications are recognized by effector proteins (readers) are of immense interest, especially in drug development. Deregulation of these proteins is strongly connected to pathological outcomes, including cardiovascular diseases, neurological disorders, metabolic disorders and cancer [5]. So-called epigenetic drugs that target the nucleosome interaction of these chromatin factors offer new therapeutic potential [6–9]. A selection of epigenetic drugs, including those currently undergoing clinical trials, is described in detail elsewhere [10]. Advancement in their development requires insights into the underlying molecular mechanism of nucleosome recognition, enabling control over subsequent modification of the chromatin state.

In the following, we will review the molecular basis of nucleosome-protein interactions, focusing on the different binding epitopes presented by the nucleosome. After an overview of the nucleosome-protein structures determined by crystallography or cryo-electron microscopy (cryo-EM), we highlight several studies in which experimental data from nuclear magnetic resonance spectroscopy (NMR), cross-link-based mass spectrometry (XL-MS) or mutational analysis were used to build atomistic structural models of nucleosome complexes. Throughout, we emphasize the role of these data-driven models in deepening our understanding of nucleosome recognition.

**2. Nucleosome-binding epitopes**

Consisting of DNA and histone proteins, the nucleosome offers a selection of distinct interaction surfaces for binding of effector proteins with high levels of specificity (**Figure 1**).

**Figure 1.** A schematic depiction of different modes of nucleosome recognition. Reported types of epitopes are histone tails including PTMs (A), the H2A-H2B acidic patch (B), the canonical histone surface (C), specific surface motifs formed by histone variants (D) or nucleosomal DNA (E). Many synergistic combinations of binding epitopes are known, such as histone mark and DNA (F), acidic patch and DNA (G) or all three epitopes (H).

Histone proteins possess a globular tertiary structure with exposed, disordered N-terminal tails. Histone tails are known to carry a wide range of covalent post-translational side-chain modifications (PTMs), such as mono-, di- and trimethylation (Lys, Arg), acetylation (Lys), phosphorylation (Ser, Thr) and ubiquitination (Lys) [11, 12]. This cosmos of modifications remains dynamic because the covalent modifications are reversible. Modified histones are recognized by so-called reader protein domains specific for the respective modification (**Figure 1A**). Interestingly, nucleosome-interacting proteins can possess more than one reader domain, which allows crosstalk between different post-translational modifications. Examples of PTM reader domains are Chromo, Tudor, PHD and MBT domains for methylated lysine residues, bromodomains for acetylated lysine residues and 14-3-3 proteins for phosphorylated serine [11, 13] (**Table 1**). The most recent addition to the list is the YEATS domain, which recognizes crotonylated lysine [14–16]. Reader domains often have structurally conserved motifs that are able to complex a specific modification. The "Royal Family" of reader domains is a particularly instructive example in this respect. This superfamily includes the Tudor, Chromo, MBT, PWWP and plant Agenet domains, which bind methylated lysine (Tudor, Chromo, MBT, PWWP, plant Agenet) or arginine (Tudor) residues. Most domains of this family contain a barrel-shaped structure formed by 3–5 antiparallel β-strands that holds a cluster of aromatic residues forming the so-called aromatic cage [17]. The aromatic cage presents an electron-rich yet hydrophobic surface that is ideally suited to bind methylated lysines through cation-π interactions [18]. The structural features and similarities of these domains, as well as their substrate specificity, have been the subject of literature reviews [19–21].
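The pairing of reader domain families with the modifications they recognize can be written down as a small lookup table. The Python sketch below is illustrative only: it contains just the domain families named in the text, the short mark labels (`Kme`, `Rme`, `Kac`, `Sph`, `Kcr`) and the function names `readers_for` and `marks_read_by` are assumptions made for this example, and the two-domain protein in the usage lines is hypothetical.

```python
# Reader-domain families and the modification classes they recognize,
# restricted to the examples named in the text.
# The mark labels are shorthand assumed for this sketch:
#   Kme = methyl-lysine, Rme = methyl-arginine, Kac = acetyl-lysine,
#   Sph = phospho-serine, Kcr = crotonyl-lysine.
READER_DOMAINS = {
    "Chromo": {"Kme"},
    "Tudor": {"Kme", "Rme"},
    "PHD": {"Kme"},
    "MBT": {"Kme"},
    "PWWP": {"Kme"},
    "Agenet": {"Kme"},
    "Bromodomain": {"Kac"},
    "14-3-3": {"Sph"},
    "YEATS": {"Kcr"},
}


def readers_for(mark):
    """Return the sorted names of domain families that recognize `mark`."""
    return sorted(name for name, marks in READER_DOMAINS.items() if mark in marks)


def marks_read_by(domains):
    """Return the set of marks a protein can read, given its reader domains.

    More than one reader domain in a single protein means more than one
    readable mark, which is the basis for crosstalk between modifications.
    """
    marks = set()
    for name in domains:
        marks |= READER_DOMAINS[name]
    return marks


if __name__ == "__main__":
    print(readers_for("Rme"))
    # Hypothetical protein carrying a PHD finger and a bromodomain.
    print(sorted(marks_read_by(["PHD", "Bromodomain"])))
```

In this sketch, a protein that combines a PHD finger with a bromodomain reads both methyl-lysine and acetyl-lysine. The table deliberately ignores methylation state and sequence context, both of which affect binding in real reader domains.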

Reader domains can, in addition to the post-translational modification, show specificity for a defined amino acid sequence motif around the epigenetic mark that supports complex formation. For example, the WD40 domain of the EED (embryonic ectoderm development) protein selectively reads out trimethylated lysine in an A-R-K-S sequence motif (as for H3K27me3) but not in an R-T-K-Q motif (as for H3K4me3) [37].

| (sub)Domain | Modification | Protein | Function |
|---|---|---|---|
| **Royal family** | | | |
| Tudor | Kme1, Kme2, Kme3, Rme2 | 53BP1 | DNA damage response [24] |
| | | TDRD3 | Transcription activation [25] |
| MBT | Kme1, Kme2 | L3MBTL1 | Transcriptional repression [26, 27] |
| PWWP | Kme3 | PSIP1 | Transcriptional co-activation, DNA repair [28, 29] |
| Chromo | Kme, Kme2, Kme3 | CHD1 | Chromatin remodeling [30, 31] |
| | | HP1 | Heterochromatin [32] |
| | | MRG15 | Splicing [33] |
| Plant Agenet | Kme, Kme2, Kme3 | FMRP | DNA damage response [34] |
| **Bromodomain** | KAc | BRD2/3 | Transcriptional regulation [35] |
| **14-3-3** | Sph | 14-3-3ζ | Transcriptional activation [36] |

<sup>a</sup> See Refs. [21–23] for more in-depth discussion.

**Table 1.** Overview of selected reader domains for post-translational modifications<sup>a</sup>.

Next to histone tails, the nucleosome also possesses intrinsic docking platforms on its histone surface. The most prominent of these is composed of histones H2A and H2B. While the histone octamer is overall highly positively charged, there is a patch on the H2A-H2B dimer surface formed by acidic residues with negative surface charge. This structural feature is named the acidic patch and engages in a multitude of interactions with specific binding domains (**Figure 1B**), including the tail of histone H4 of adjacent nucleosomes, an interaction that promotes chromatin compaction. A common feature observed for acidic patch-interacting proteins is a positively charged arginine residue that interacts with a triad of acidic residues on H2A (Glu61, Asp90, Glu92). This is referred to as the arginine anchor [38]. It is often supported by surrounding positively charged residues interacting with acidic H2A/H2B interface residues.

Other parts of the histone core surface may also mediate protein-nucleosome interactions (**Figure 1C**). First, a solvent-exposed cleft between H4 and H2B was shown to be involved in binding interactions with Sir3 or 53BP1 [39, 40]. Interestingly, these proteins bind simultaneously to both the H4-H2B cleft and the acidic patch, using one nucleosome-binding domain for each epitope. Second, incorporation of non-canonical histones in nucleosomes introduces specific interaction surfaces that allow histone variant-specific nucleosome binding (**Figure 1D**). Examples are CENP-N and CENP-C, which recognize the incorporated histone H3 variant CENP-A [41, 42].

Finally, the nucleosomal DNA is a major protein interaction site. First, it forms the binding site of linker histone H1 [43–45] (see also Section 4.9). Second, it is often involved in additional synergistic interactions with nucleosome-binding domains (**Figure 1E**). Third, recent studies have identified transcription factor proteins that primarily bind to nucleosomal DNA. These so-called pioneer factors bind their DNA target sites while these sites are embedded in the nucleosome [46–48]. The structural details of these interactions are, however, still lacking.

Throughout the advances in studies on nucleosome binding, it has become clear that binding of effector proteins in many cases involves interactions of nucleosome-binding domains with multiple nucleosome epitopes (**Figure 1G**, **H**). However, due to its size and complexity as well as the stability and dynamics of complex formation, the nucleosome is a challenging system for structural biology.

**3. Crystal clear: lessons from crystallography and single particles**

High-resolution three-dimensional structures of the complexes, typically obtained by crystallography and, increasingly, cryo-electron microscopy, play a key role in research on protein interactions. These structures enable the identification of binding sites and intermolecular interactions, offering a guided approach to design binding-deficient mutants or competitive binders. The history of nucleosome structural biology peaked with the publication of the high-resolution crystal structure of the nucleosome in 1997 [3]. Luger *et al.* crystallized the nucleosome together with a palindromic version of human α-satellite DNA [49]. This milestone study provided the foundation for also studying the structures of nucleosomes in complex with chromatin factors. **Table 2** lists the structures of nucleosome-protein complexes
