**Abstract**

High-confidence methods are needed to determine the glycosylation profiles of both complex biological samples and recombinant therapeutic proteins. A common glycan analysis workflow releases N-glycans from glycoproteins with PNGase F, or O-glycans by hydrazinolysis, prior to analysis. This approach is limited because it does not reveal glycan attachment sites. Alternative proteomics-based workflows are emerging that use site-specific proteolysis to generate peptide mixtures, followed by selective enrichment strategies to isolate glycopeptides. Methods designed for the analysis of complex samples can yield a comprehensive snapshot of individual glycan species, the attachment site of each glycan and, in many cases, the identity of the protein that carries it. This chapter highlights advances in enzymes that digest glycoproteins into distinct fragments and new strategies for enriching specific glycopeptides.

**Keywords:** glycoproteomics, glycopeptide enrichment, lectin, Fbs1, BGL, O-glycoprotease, alpha-lytic protease

### **1. Introduction**

Protein glycosylation is a post-translational carbohydrate ('glycan') modification of eukaryotic proteins that may affect their folding, stability, localization and biological function. Glycan profiles differ between cell types and are known to be altered in carcinogenesis [1, 2], inflammation [3] and Alzheimer's disease [4], among other conditions. Importantly, circulating plasma proteins may serve as biomarkers, because altered glycosylation profiles may signal specific types of disease. Glycosylation may be assessed on a global level by isolating total protein from tissue, cells or serum, releasing the glycans and analyzing them by reliable methods such as ultra-performance liquid chromatography (UPLC) coupled to mass spectrometry (MS). Profiling released glycans is useful in some cases, but knowing the site of attachment (glycosite) and the glycan structure at each glycosite is more valuable. This "total picture" is attainable with liquid chromatography coupled to tandem mass spectrometry (LC–MS/MS) at the glycopeptide level. Furthermore, rigorous characterization of therapeutic proteins is critical to ensure reproducible glycosylation, safety and efficacy, as the absence, presence or type of glycan is known to dictate the efficacy of some therapeutic molecules [5].

Glycans are attached to certain asparagine residues (N-linked glycans) or serine/threonine residues (O-linked glycans). N-linked glycosylation occurs at the Asn-X-Ser/Thr/Cys consensus sequence (where X is any amino acid except proline) on proteins that pass through the eukaryotic secretory pathway. There are three structural classes of N-glycans, which share a common trimannosyl chitobiose core motif (Man3GlcNAc2) (**Figure 1a**). This core may be further decorated with variable combinations of mannose, fucose, galactose, sialic acid, N-acetylgalactosamine (GalNAc) and N-acetylglucosamine (GlcNAc). In contrast to N-glycans, O-glycans are appended to the hydroxyl oxygen of Ser or Thr residues, and no strong consensus sequence defines an O-glycosite. There are eight structural classes of O-glycans, defined by the core di- or trisaccharides that occupy a glycosite (**Figure 1b**). Each of these cores can be further elaborated with other sugars, yielding a large variety of possible O-glycan structures. Over 10% of secreted human proteins carry some form of O-glycan modification. A second form of O-glycosylation occurs on nuclear and cytoplasmic proteins, where a single β-linked GlcNAc is attached to Ser or Thr residues. β-O-GlcNAcylation is an essential, dynamic modification that is important in cell signaling and differentiation [6]. Finally, chemical groups (*e.g*., sulfate, phosphate, acetate, methyl) may also occur at various positions on certain N- and O-glycan sugars [7].
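Because the N-glycosylation consensus sequence is a simple sequence pattern, candidate N-glycosites can be listed with a short scan. The Python sketch below is illustrative only; the function name `find_sequons` and the toy sequence are hypothetical. It reports every Asn-X-Ser/Thr/Cys motif in which X is not proline. A sequon marks a candidate site only, since not every potential glycosite is occupied, and O-glycosites cannot be found this way because they lack a strong consensus sequence.

```python
import re

# Asn-X-Ser/Thr/Cys with X != Pro. The lookahead makes each match zero-width,
# so overlapping sequons such as "NNST" are all reported.
SEQUON = re.compile(r"(?=(N[^P][STC]))")


def find_sequons(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based Asn position, motif) for each candidate N-glycosite.

    A sequon is necessary but not sufficient: the site may still be unoccupied.
    """
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(seq)]


if __name__ == "__main__":
    toy = "MKNGSLPNPTAENVTQNNSTC"  # hypothetical toy sequence, not a real protein
    for position, motif in find_sequons(toy):
        print(position, motif)
```

For the toy sequence, the scan reports NGS, NVT, NNS and NST at positions 3, 13, 17 and 18. It skips NPT at position 8 because proline occupies the X position.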

#### **Figure 1.**

*Basic structures of N-glycans (a) and core structures of O-glycans (b). a, N-glycans can be categorized into three basic types: High mannose, Complex and Hybrid. The core structure (Man3GlcNAc2) of N-glycans is indicated by the orange triangle. A GlcNAc residue can attach to the β-mannose of the N-glycan core, resulting in a bisecting N-glycan (illustrated in Complex N-glycan). The reducing-end GlcNAc (indicated by an arrow) of N-glycans can also be modified with a fucose (illustrated in Complex N-glycan). An N-glycan is attached to a peptide through its reducing-end GlcNAc, which is linked to an asparagine (N) residue. b, Eight core structures of O-glycans. Each O-glycan core starts with a GalNAc (reducing end, indicated by an arrow), and further sugars can be added to the non-reducing end of the core structures. In O-glycopeptides, O-glycans are attached to the hydroxyl group of serine (S) or threonine (T) via the reducing-end GalNAc.*

#### *Improving the Study of Protein Glycosylation with New Tools for Glycopeptide Enrichment DOI: http://dx.doi.org/10.5772/intechopen.97339*

Protein glycosylation is remarkable in its structural complexity, which reflects the way glycans are synthesized and transferred to proteins. Glycans are assembled by complex biosynthetic pathways involving many different enzymes. Individual monosaccharides are linked together by glycosyltransferases, each with its own sugar and stereochemical specificity. For example, the elaborate 14-sugar mammalian N-glycan precursor (Glc3Man9GlcNAc2) consists of only three types of monosaccharides, yet its assembly requires the coordinated action of 13 different glycosyltransferases. Over 200 different glycosyltransferases shape glycan structures in the mammalian glycome [8]. Expression of some of these enzymes varies by tissue and cell type and is subject to epigenetic regulation, resulting in significant structural variation of glycans. A glycoform is a single protein isoform carrying a defined glycan at each glycosite. As such, proteins naturally exist as collections of glycoforms. Additionally, a potential glycosite may be left unoccupied in some copies of a protein. Together, these attributes underscore the technical challenges associated with deconvoluting any given glycome.
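A little arithmetic shows how quickly glycoform heterogeneity grows. Assume, for illustration, that each glycosite independently carries one of g glycans or is left empty. Under that assumption, the number of possible glycoforms is (g1 + 1) × (g2 + 1) × … over all sites. This is an upper bound, because a real protein need not display every combination. In the sketch below, the site names and glycan labels are hypothetical.

```python
from itertools import product
from math import prod


def count_glycoforms(site_glycans: dict[str, list[str]], allow_unoccupied: bool = True) -> int:
    """Product over sites of the number of options per site (an upper bound)."""
    extra = 1 if allow_unoccupied else 0  # the extra option is "site left empty"
    return prod(len(glycans) + extra for glycans in site_glycans.values())


def enumerate_glycoforms(site_glycans: dict[str, list[str]], allow_unoccupied: bool = True):
    """Yield each glycoform as {site: glycan or None}, where None means unoccupied."""
    sites = list(site_glycans)
    options = [site_glycans[s] + ([None] if allow_unoccupied else []) for s in sites]
    for combo in product(*options):
        yield dict(zip(sites, combo))


if __name__ == "__main__":
    # Hypothetical protein with three glycosites and illustrative glycan labels.
    sites = {
        "site1": ["G0F", "G1F", "G2F"],
        "site2": ["Man5", "Man6"],
        "site3": ["G2FS1"],
    }
    print(count_glycoforms(sites))                          # 4 * 3 * 2 = 24
    print(count_glycoforms(sites, allow_unoccupied=False))  # 3 * 2 * 1 = 6
    print(len(list(enumerate_glycoforms(sites))))           # 24
```

Allowing partial occupancy raises the count from 6 to 24 in this toy case, even though only six distinct glycans are involved.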

Glycan structure has been analyzed in several different ways, but the most common approaches use one of two strategies: (i) analysis of glycans released from glycoproteins or (ii) bottom-up proteomics analysis of peptide/glycopeptide mixtures. Standard N-glycan profiling methods begin with release of N-glycans from a glycoprotein by the enzyme PNGase F. The released glycans are typically labelled at their reducing ends with a fluorophore and separated by high/ultra-performance liquid chromatography (H/UPLC) or capillary electrophoresis (CE) with fluorescence detection and optional inline mass detection [9]. Glycan structures are assigned to observed peaks by comparing mobility and mass data to glycan reference databases [10]. Exoglycosidases with precise specificities can be used to further confirm structural assignments [11, 12]. For O-glycans, no enzyme that releases a broad range of elaborated O-glycan structures has been identified. O-glycans can be released chemically by hydrazinolysis, but this can damage some released glycans [13, 14]. In addition, released N- and O-glycans may be permethylated and analyzed directly by LC–MS/MS or MALDI-MS [15]. For both N- and O-glycans, however, profiling released glycans provides a catalog of the structures present in a sample but no information about where they are attached in a protein.
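The mass part of peak assignment amounts to comparing an observed mass with theoretical masses within a tolerance. The Python sketch below computes monoisotopic masses from standard residue masses and reports library entries that fall within a parts-per-million (ppm) window. It assumes neutral, unlabelled, non-permethylated glycans; a fluorophore label or permethylation would shift every mass. The three-entry library and the observed value are hypothetical stand-ins for a real reference database. The sketch also ignores chromatographic or electrophoretic mobility, which real assignments rely on as well.

```python
# Standard monoisotopic residue masses (Da) of glycan building blocks.
RESIDUE_MASS = {
    "Hex": 162.05282,     # hexose, e.g. mannose or galactose
    "HexNAc": 203.07937,  # N-acetylhexosamine, e.g. GlcNAc
    "dHex": 146.05791,    # deoxyhexose, e.g. fucose
    "NeuAc": 291.09542,   # N-acetylneuraminic acid (sialic acid)
}
WATER = 18.01056  # a free reducing end adds one H2O to the residue sum

# Hypothetical mini-library standing in for a real reference database.
LIBRARY = {
    "Man5": {"HexNAc": 2, "Hex": 5},
    "G0F": {"HexNAc": 4, "Hex": 3, "dHex": 1},
    "G2F": {"HexNAc": 4, "Hex": 5, "dHex": 1},
}


def glycan_mass(composition: dict[str, int]) -> float:
    """Neutral monoisotopic mass of an unlabelled, non-permethylated glycan."""
    return sum(RESIDUE_MASS[r] * n for r, n in composition.items()) + WATER


def ppm_error(observed: float, theoretical: float) -> float:
    return (observed - theoretical) / theoretical * 1e6


def match_mass(observed: float, library: dict, tol_ppm: float = 10.0) -> list[tuple[str, float]]:
    """Return (name, ppm error) for every library entry within tol_ppm, best first."""
    hits = []
    for name, composition in library.items():
        err = ppm_error(observed, glycan_mass(composition))
        if abs(err) <= tol_ppm:
            hits.append((name, err))
    return sorted(hits, key=lambda hit: abs(hit[1]))


if __name__ == "__main__":
    for name, comp in LIBRARY.items():
        print(name, round(glycan_mass(comp), 4))
    print(match_mass(1234.4340, LIBRARY))  # hypothetical observed mass
```

An observed mass of 1234.4340 Da matches only Man5 (about 0.5 ppm). A mass that matches nothing within the window returns an empty list. Isomeric structures share a composition and therefore a mass, which is one reason mobility data and exoglycosidase digestion remain necessary.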

## *Fundamentals of Glycosylation*

A more data-rich approach uses bottom-up proteomics to analyze peptide/glycopeptide mixtures. In this approach, a glycoprotein is treated with a protease (*e.g*., trypsin) to generate a pool of peptides that are then analyzed by mass spectrometry (typically LC–MS/MS). The data are processed by search software (*e.g*., Byonic and O-Pair Search) that matches spectra against protein and glycan mass reference databases to generate a peptide map and identify attached glycans. This method has several advantages: the same workflow can yield information about both N- and O-glycans (and other protein modifications), it identifies glycosites, it can determine both glycan occupancy and the range of glycan structures at each glycosite, and it can be quantitative. This approach (termed the 'multi-attribute method', MAM) is gaining traction in the pharmaceutical industry for monitoring the purity of biologic drugs and is expected to become the industry standard for final product characterization [16]. Despite these benefits, glycoproteomics analyses still face technical challenges. For example, proteases commonly used in proteomics (*e.g*., trypsin) often generate large peptides that may carry multiple glycans (especially O-glycans, which tend to be clustered within proteins). Such glycopeptides can be too large to detect by MS, or their glycosites can be difficult to assign with high confidence. Better approaches to generating glycopeptides are therefore needed. Additionally, glycopeptides represent a small portion of a peptide mixture and often ionize poorly. New methods that address sample complexity through enrichment of specific glycopeptides are emerging.
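An in silico digest illustrates why large, multiply glycosylated tryptic peptides are a problem. The Python sketch below applies the common simplified trypsin rule (cleave after Lys or Arg unless the next residue is Pro, with no missed cleavages) to a hypothetical, mucin-like toy sequence. For each resulting peptide, it counts N-glycan sequons and Ser/Thr residues as candidate O-glycosites. Real search software models missed cleavages and other details that this sketch omits, and the function names are illustrative only.

```python
import re


def trypsin_digest(sequence: str) -> list[str]:
    """Simplified trypsin rule: cut after K or R unless the next residue is P.

    Assumes complete digestion (no missed cleavages), unlike real samples.
    """
    # Zero-width split points: preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence.upper())
    return [p for p in pieces if p]  # drop the empty string left after a C-terminal K/R


def site_summary(peptide: str) -> tuple[int, int]:
    """Count N-glycan sequons (N-X-S/T/C, X != P) and Ser/Thr residues."""
    sequons = len(re.findall(r"(?=N[^P][STC])", peptide))
    ser_thr = sum(peptide.count(aa) for aa in "ST")
    return sequons, ser_thr


if __name__ == "__main__":
    # Hypothetical toy sequence with a Ser/Thr-rich, mucin-like stretch.
    toy = "AGSTTPSPTSTTSAPVTSPK" + "DNGSAR" + "EKPLNQTMR"
    for peptide in trypsin_digest(toy):
        n_sequons, n_ser_thr = site_summary(peptide)
        print(f"{peptide:<22} len={len(peptide):>2} sequons={n_sequons} Ser/Thr={n_ser_thr}")
```

The digest yields three peptides. The first is 20 residues long and contains 11 Ser/Thr residues with no tryptic cleavage site between them, so several O-glycans could sit on one peptide. This is the situation in which site assignment becomes ambiguous. The Lys in "EKPL" is not cleaved because it is followed by proline.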

The field has begun to address these issues by developing new reagents for glycoproteomics. Novel proteases, including some with specificity for O-glycans, have recently been characterized and validated in glycoproteomics workflows. In parallel, reagents and methods that permit selective enrichment of glycopeptides have been applied to reduce sample complexity. In this chapter, we review advances in glycopeptide generation and enrichment methods that are helping to improve glycopeptide analysis. We also present a case study illustrating how N-glycopeptide enrichment can be used to address glycan heterogeneity in Wnt signaling.
