### **2.2.3 Glyco-epitope on synovial lubricin verified by LC-MS**

Although immunoassays with lectins and anti-carbohydrate antibodies are a convenient way to detect glyco-epitopes, they commonly provide little detailed structural information. Furthermore, some glyco-epitopes may not be detectable because of steric hindrance or detection limits. In addition, specific antibodies are lacking for certain glyco-epitopes. For example, there is currently no antibody available that can distinguish 3-*O*-sulfation from 6-*O*-sulfation. Therefore, to scrutinize the results obtained by Western blot/lectin blot of *O*-linked oligosaccharides on lubricin, purified samples were also subjected to β-elimination with mild base. Released oligosaccharides were then analyzed by LC-MS with an online graphitized carbon column as previously described (Estrella et al., 2010).

Glycoproteomics of Lubricin-Implication

of Important Biological Glyco- and Peptide-Epitopes in Synovial Fluid 139

Fig. 3. Examples of core 1 and core 2 *O*-linked oligosaccharides found on synovial lubricin, determined by LC-MS using the [M-H]- ions as precursors. (A) MS2 spectrum of the core 1 *O*-glycan (T antigen) at *m/z* 384; (B) MS2 spectrum of a mono-sialylated core 2 *O*-glycan with one α2,3-linked NeuAc at *m/z* 1040; (C) MS2 spectrum of the ion at *m/z* 1477, indicating a terminal sLex [NeuAcα2,3Galβ1,4(Fucα1,3)GlcNAc] epitope; (D) MS2 spectrum of the ion at *m/z* 667, in which the product ion at *m/z* 282 indicates a sulfate group linked to GlcNAc. Symbols: purple diamond, sialic acid (NeuAc); yellow circle, galactose (Gal); blue square, *N*-acetylglucosamine (GlcNAc); yellow square, *N*-acetylgalactosamine (GalNAc); red triangle, fucose (Fuc); S, sulfate.

Consistent with findings from a previous study of lubricin (Estrella et al., 2010), core 1 *O*-linked oligosaccharides, including the T antigen (Galβ1,3GalNAc-*O*-Ser/Thr) and sialyl T antigen, were the predominant *O*-linked oligosaccharides. As illustrated in Fig. 3A, MS2 of the ion at *m/z* 384 ([M-H]-) indicates a composition of Hex1HexNAc1, corresponding to the T antigen. The Z ion fragment at *m/z* 204.1 is consistent with a reduced HexNAc, while the C ion fragment at *m/z* 179.0 indicates a terminal Hex. Compared with MS2 spectra in UniCarb-DB (2011 version) (Hayes et al., 2011), the structure is consistent with Galβ1,3GalNAc, which accounts for approximately 10% of the total *O*-glycans on lubricin. Together with the mono-sialylated [NeuAcα2,3Galβ1,3GalNAc] and [Galβ1,3(NeuAcα2,6)GalNAc] and the di-sialylated [NeuAcα2,3Galβ1,3(NeuAcα2,6)GalNAc] structures, core 1-based structures accounted for up to 82% of total *O*-glycans, based on the total ion count.

A small proportion of core 2 oligosaccharides, accounting for the remaining 18% of the total *O*-glycans detected, was found in this and a previous study (Estrella et al., 2010). Three representative MS2 spectra of core 2 *O*-linked oligosaccharides are shown, with ions at *m/z* 1040, 1477 and 667 (Fig. 3B, C and D). The [M-H]- ion at *m/z* 1040 (NeuAc1Hex2HexNAc2) corresponds to a mono-sialylated core 2 *O*-linked oligosaccharide, while the ion at *m/z* 1477 ([M-H]-, NeuAc2Hex2deHex1HexNAc2) is the same core with one additional sialic acid and one fucose. This structure has a sequence indicative of a sialyl Lewis-type terminal glyco-epitope. These sialylated structures, together with sialylated core 1 *O*-glycans, are consistent with the positive WGA and MAA lectin blots. The Western blot results showed that lubricin was positive for the sLex-specific antibody but negative for the sLea-specific antibody. This suggests that synovial lubricin carries the sLex [NeuAcα2,3Galβ1,4(Fucα1,3)GlcNAc] epitope (spectrum in Fig. 3C) on core 2 structures. Sulfated core 2 *O*-glycans were also found in this study (Fig. 3D) and in the previous study (Estrella et al., 2010). Because good antibodies and lectins are lacking, this epitope could be identified only by MS and not by lectin analysis. This argues for LC-MS and lectin analysis as complementary techniques in glycomics studies.
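The compositions quoted above can be checked against the reported precursor ions with simple mass arithmetic. The following is a minimal sketch, assuming standard monoisotopic residue masses and that β-elimination released the glycans as reduced alditols; the function name `alditol_mz` is illustrative, and the composition used for the sulfated ion at *m/z* 667 (Hex1HexNAc2 plus one sulfate) is an assumption, since the text does not state it.

```python
# Monoisotopic residue masses (Da) of common monosaccharide residues.
RESIDUE = {
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "deHex": 146.0579,     # fucose
    "NeuAc": 291.0954,
    "Sulfate": 79.9568,    # SO3 added to a hydroxyl group
}
WATER = 18.0106            # one water completes the free oligosaccharide
H2 = 2.0157                # two hydrogens added on reduction to the alditol
PROTON = 1.00728


def alditol_mz(composition):
    """Return the [M-H]- m/z of a reduced (alditol) oligosaccharide."""
    neutral = sum(RESIDUE[name] * count for name, count in composition.items())
    neutral += WATER + H2
    return neutral - PROTON


# Compositions from the text; the sulfated one is an assumed composition.
examples = {
    "T antigen (reported m/z 384)": {"Hex": 1, "HexNAc": 1},
    "mono-sialylated core 2 (reported m/z 1040)": {"NeuAc": 1, "Hex": 2, "HexNAc": 2},
    "sialyl Lewis-type core 2 (reported m/z 1477)": {"NeuAc": 2, "Hex": 2, "deHex": 1, "HexNAc": 2},
    "sulfated core 2, assumed (reported m/z 667)": {"Hex": 1, "HexNAc": 2, "Sulfate": 1},
}

for label, comp in examples.items():
    print(f"{label}: calculated {alditol_mz(comp):.2f}")
```

Each calculated value should fall within one mass unit of the nominal *m/z* given in the text, which supports the stated compositions but does not by itself distinguish isomers such as sLex and sLea.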

### **2.2.4 Identification of synovial lubricin fragment by proteomic analysis**

Though several proteomic analyses using synovial fluid samples have been carried out (Ruiz-Romero & Blanco, 2010), lubricin (or its fragments) appeared in only a few reports (Gobezie et al., 2007; Kamphorst et al., 2007; Estrella et al., 2010). To fully characterize synovial lubricin, the enriched samples were also subjected to proteomic analysis.

When the dominant band (area 2, Fig. 4B-2) was analyzed, 28.5% of the lubricin sequence could be identified; this band is believed to represent the fully glycosylated, full-length secreted lubricin. The unidentified portion was located mostly in the mucin-like domain of lubricin (Fig. 4C). Lubricin was also detected in all other pieces of the gel, indicating that lubricin existed as fragments or splice variants. Sequences of all exons could be detected except exon 1, which comprises the *N*-terminal 24-amino acid signal sequence. In addition to area 2 (Fig. 4B-2), where full-length lubricin was detected, remarkably high sequence recovery of lubricin was also found in the low-mass region below 65 kDa (Fig. 4B-5). The identified peptides came from both the *N*- and *C*-terminal regions, implying that these fragments were generated by proteolytic cleavage close to or within the mucin-like domain. Examples of LC-MS2 spectra of tryptic peptides from the *N*- and *C*-terminal regions of lubricin are shown in Fig. 5. Both *N*- and *C*-terminal fragments of lubricin have been found in other studies (Flannery et al., 1999; Rhee et al., 2005b). These data, together with the data presented here, suggest that lubricin is present in synovial fluid as both full-length and degraded protein. Few peptides (7.7%) were recovered from the area above the lubricin band (Fig. 4B-1). This is probably caused by inefficient reduction and by trace amounts of lubricin multimers, which have recently been found in synovial fluid (Schmidt et al., 2009). The dominant bands in areas 3 and 5 are fibronectin and the *C*-terminal fragment of lubricin, respectively (Fig. 4B, Jin et al., unpublished results).
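Sequence recovery figures such as the 28.5% quoted above are the fraction of residues covered by at least one identified peptide. The sketch below shows one way to compute this from peptide positions. It is not the GPM implementation; the function name `coverage` is illustrative, and of the three spans used only 25-33 and 1265-1274 come from the Fig. 5 caption, while 30-40 is a hypothetical overlapping peptide added to show that shared residues are counted once.

```python
def coverage(peptide_spans, protein_length):
    """Percent of residues covered by at least one peptide.

    peptide_spans holds (start, end) pairs, 1-based and inclusive.
    Overlapping or nested spans are counted only once.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(peptide_spans):
        start = max(start, last_end + 1)   # skip residues already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return 100.0 * covered / protein_length


LUBRICIN_LENGTH = 1404  # amino acids, as stated in the Fig. 4 caption

# Two spans from the Fig. 5 caption plus one hypothetical overlapping span.
spans = [(25, 33), (30, 40), (1265, 1274)]
print(f"{coverage(spans, LUBRICIN_LENGTH):.2f}% of the sequence covered")

# Conversely, 28.5% recovery corresponds to roughly this many residues:
print(round(0.285 * LUBRICIN_LENGTH))
```

On this basis, the 28.5% recovery in area 2 corresponds to about 400 of the 1404 residues.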

In addition to lubricin, co-purified proteins were also identified by the proteomic approach. Table 1 lists the top three proteins identified in each gel area, which comprised seven unique proteins and their fragments. Except for serum albumin, these proteins are glycoproteins. The presence of the lower-molecular-weight fibronectin in the high-molecular-weight area (area 1) confirmed inefficient reduction and suggests the presence of fibronectin dimers or oligomers. Alternatively, serum albumin and fibronectin have both been reported to bind to lubricin *in vitro* (Schmid et al., 2002) and may have been attached to lubricin during the purification. The possible association of lubricin with these proteins or their fragments is under investigation by our group.
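The count of seven unique proteins can be verified directly from the entries of Table 1. The following sketch transcribes the fifteen table rows (protein, UniProt accession, number of peptides, coverage) and counts distinct accessions; the variable names are illustrative, and the gel area of each row is deliberately left out of the data.

```python
# Rows of Table 1: (protein, UniProt accession, peptides identified, coverage %).
rows = [
    ("Fibronectin", "P02751", 42, 24.9),
    ("Apolipoprotein B-100", "P04114", 13, 4.0),
    ("Alpha-2-macroglobulin", "P01023", 15, 15.0),
    ("Aggrecan core protein", "P16112", 15, 6.9),
    ("Apolipoprotein B-100", "P04114", 111, 30.9),
    ("Alpha-2-macroglobulin", "P01023", 36, 36.2),
    ("Serum albumin", "P02768", 50, 75.5),
    ("Apolipoprotein B-100", "P04114", 52, 16.9),
    ("Fibronectin", "P02751", 85, 44.9),
    ("Fibronectin", "P02751", 51, 30.9),
    ("Lubricin", "Q92954", 39, 25.4),
    ("Fibronectin", "P02751", 38, 23.1),
    ("Serum albumin", "P02768", 33, 60.4),
    ("Lubricin", "Q92954", 69, 28.5),
    ("Basement membrane-specific heparan sulfate proteoglycan core protein",
     "P98160", 14, 4.0),
]

# Group by accession so that repeated identifications count once.
unique = {accession: name for name, accession, _, _ in rows}

print(len(rows), "entries,", len(unique), "unique proteins")
for accession, name in sorted(unique.items()):
    print(accession, name)
```

Fifteen entries (three per gel area) collapse to seven accessions, in agreement with the text.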

| Protein identified | MW (kDa)\* | Peptides identified | Protein ID | Coverage (%) |
|---|---|---|---|---|
| Fibronectin | 262.4 | 42 | P02751 | 24.9 |
| Apolipoprotein B-100 | 515.2 | 13 | P04114 | 4.0 |
| Alpha-2-macroglobulin | 163.2 | 15 | P01023 | 15.0 |
| Aggrecan core protein | 250.0 | 15 | P16112 | 6.9 |
| Apolipoprotein B-100 | 515.2 | 111 | P04114 | 30.9 |
| Alpha-2-macroglobulin | 163.2 | 36 | P01023 | 36.2 |
| Serum albumin (HSA) | 69.2 | 50 | P02768 | 75.5 |
| Apolipoprotein B-100 | 515.2 | 52 | P04114 | 16.9 |
| Fibronectin | 262.4 | 85 | P02751 | 44.9 |
| Fibronectin | 262.4 | 51 | P02751 | 30.9 |
| Lubricin | 151.0 | 39 | Q92954 | 25.4 |
| Fibronectin | 262.4 | 38 | P02751 | 23.1 |
| Serum albumin | 69.3 | 33 | P02768 | 60.4 |
| Lubricin | 151.0 | 69 | Q92954 | 28.5 |
| Basement membrane-specific heparan sulfate proteoglycan core protein | 468.5 | 14 | P98160 | 4.0 |

Gel areas: 1 to 5. The assignment of rows to gel areas is not recoverable, apart from the lubricin entry with 28.5% coverage, which the text places in area 2.

\* Molecular weight of apoprotein obtained from protein database.

Table 1. Proteins identified in the enriched synovial fluid sample. The reduced and alkylated lubricin sample was separated by 3-8% Tris/acetate NuPAGE. The entire gel lane (Fig. 4B) was cut into five pieces (1 to 5), which were subjected to in-gel digestion with trypsin. The resulting peptides were analyzed by nano-LC-MS2. Proteins were identified from peptide MS/MS spectra searched against the UniProt human protein database using GPM software. The three top-ranked proteins from each of the five cut areas are listed with their molecular weight in kDa and the number of unique peptides. The sequence recoveries (%) of these proteins and their UniProt identification numbers are also listed.

Fig. 4. Proteomic analysis of enriched synovial lubricin. (A and B) The reduced and alkylated lubricin sample was separated by 3-8% Tris/acetate NuPAGE, and protein bands were visualized with Coomassie blue. The gel was cut into five pieces (1 to 5) and subjected to LC-MS/MS analysis after trypsin digestion. The graph (A) shows the recoveries (%) of lubricin sequence from the different cut areas. (C) Peptide map of lubricin recovered from the different gel areas. The horizontal axis represents the lubricin amino acid sequence (1404 amino acids in total). E1 to E12 indicate the end of each exon. \*fibronectin; \*\**C*-terminus of lubricin.

### **2.2.5 Characterization of lubricin mucin-like domain**

Because the sequence of the mucin-like domain is still largely unknown, several approaches were tried to characterize this heavily *O*-glycosylated domain. As shown in Fig. 4C, the resolved peptides from lubricin included both the *N*- and *C*-termini (Fig. 5). The sequenced *N*-terminus spanned residues 25 to 334, while the sequenced *C*-terminus spanned residues 1094 to 1383 (full length, 1404 amino acids). Only one peptide (A888LENSPKEPGVPTTK902) within the mucin-like domain (348-855), which contains 59 imperfect/perfect 8-amino acid repeats (KxPxPTTx), was found in area 2. It is believed that, because of heavy *O*-glycosylation, a protein domain with this modification is normally not accessible to proteases, hence the low recovery obtained. Synovial lubricin, however, could be completely digested with trypsin under both reducing and non-reducing conditions (Fig. 6A). The digestion was so complete that lubricin-specific antibodies gave negative results in Western blot (data not shown). These data suggest that the mucin domain of lubricin differs from the mucin domains of traditional mucous mucins, which are not susceptible to trypsin. A difference in glycosylation between lubricin and traditional mucins was also suggested by treatment with *O*-sialoglycoprotein endopeptidase from *Pasteurella haemolytica*, which cleaves heavily sialylated mucin domains but had only a minor effect on lubricin. As shown in Fig. 6B, after overnight incubation the density of the Coomassie blue-stained band was significantly diminished. However, when the same digested samples were probed with a mouse monoclonal antibody, Western blot showed that the epitope of the antibody, which recognizes part of the unglycosylated *N*-terminal region, was still attached to the large mucin domain. Only a small shift in migration in SDS-PAGE was observed after the endopeptidase treatment.

Fig. 5. MS2 spectra of one *N*-terminal (A) and one *C*-terminal (B) peptide derived from reduced and alkylated lubricin, searched against the UniProt and NCBI human protein databases using GPM software. The *N*-terminal peptide spans amino acids 25 to 33 of the protein sequence. The *m/z* 976.41 is the [M+H]+ precursor ion and *m/z* 488.71 is the [M+2H]2+ ion. The ID number assigned to this peptide in the GPM database is 1193. The *C*-terminal peptide spans amino acids 1265 to 1274. The *m/z* 1156.59 is the [M+H]+ precursor ion and *m/z* 578.80 is the [M+2H]2+ ion. The ID number assigned to this peptide in the GPM database is 2538.
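The singly and doubly charged precursor values given in the Fig. 5 caption are related by simple proton arithmetic, which offers a quick consistency check on the peptide assignments. The sketch below converts an [M+H]+ value to the *m/z* expected at a higher charge state; the function name `mz_at_charge` is illustrative and the proton mass is the standard value.

```python
PROTON = 1.007276  # mass of a proton in Da


def mz_at_charge(mh_plus, charge):
    """Convert a singly protonated [M+H]+ m/z to the m/z at a given charge."""
    neutral_mass = mh_plus - PROTON
    return (neutral_mass + charge * PROTON) / charge


# [M+H]+ values quoted in the Fig. 5 caption for the two lubricin peptides.
for label, mh_plus in [("N-terminal peptide (25-33)", 976.41),
                       ("C-terminal peptide (1265-1274)", 1156.59)]:
    print(f"{label}: [M+2H]2+ = {mz_at_charge(mh_plus, 2):.2f}")
```

Both results agree with the caption values of 488.71 and 578.80 to two decimal places, which confirms that these ions are doubly protonated species of the same peptides.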

To show that the low recovery of the mucin domain was due to glycosylation, sialidase A and *O*-glycanase were used to remove the majority of the *O*-linked oligosaccharides (Fig. 6C and 7). Desialylation with sialidase A decreased the size of lubricin on Ag-PAGE, verifying that synovial lubricin contains sialic acid (Fig. 6C). When lubricin was further treated with *O*-glycanase, which cleaves core 1 type *O*-linked glycans (Galβ1,3GalNAcα1-*O*-Ser/Thr) from glycoproteins and glycopeptides, its size decreased dramatically and came close to the calculated molecular weight of lubricin without post-translational modification, i.e. 148 kDa. These results suggest that lubricin is heavily glycosylated and that core 1 type *O*-linked oligosaccharides are the predominant *O*-glycans on lubricin. Bands obtained after sialidase A treatment, with or without subsequent *O*-glycanase treatment, were subjected to LC-MS/MS analysis after trypsin digestion. Sialidase A alone recovered 19.1% of the lubricin sequence (Fig. 7); most of the peptides were located in the mucin-like domain, including 18 random repeats of EPAPTTPK. In contrast, removal of glycosylation using both sialidase and *O*-glycanase gave up to 48% recovery of the lubricin sequence (Fig. 7). Removal of the core 1 *O*-glycans exposed more of the protein core and made it accessible for digestion, allowing peptides from the mucin domain repeats to be recovered and detected by LC-MS. The resolved sequence covered almost the entire mucin-like domain of lubricin, and the repeat region without glycosylation could be identified (Fig. 8).
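The recovery of EPAPTTPK repeats after deglycosylation follows from the cleavage rule of trypsin, as the sketch below illustrates. It assumes the commonly used rule (cleavage C-terminal to K or R unless the next residue is P) and a hypothetical stretch of three perfect repeats, not the actual lubricin sequence; `trypsin_digest` and `REPEAT_MOTIF` are illustrative names, and the pattern encodes the KxPxPTTx motif given in the text.

```python
import re


def trypsin_digest(sequence):
    """In silico digest: cleave after K or R unless the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])   # C-terminal remainder
    return peptides


# KxPxPTTx, where x is any residue.
REPEAT_MOTIF = re.compile(r"K.P.PTT.")

# Hypothetical stretch of three perfect repeats (not the real sequence).
toy_repeats = "EPAPTTPK" * 3

print(trypsin_digest(toy_repeats))
print(len(REPEAT_MOTIF.findall("K" + toy_repeats)), "perfect repeats matched")

# The single mucin-domain peptide found in area 2 is an imperfect repeat:
print(REPEAT_MOTIF.search("ALENSPKEPGVPTTK"))
```

Each repeat ends in a lysine followed by glutamate, so an unglycosylated repeat region is cut into identical eight-residue peptides. The area 2 peptide ALENSPKEPGVPTTK does not match the strict pattern, which is in line with the text's description of imperfect repeats.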

#### **2.3 Discussion**

Although several biomarkers in SF and serum have been associated with RA and OA, no single biomarker has sufficient discriminating power to clearly indicate prognosis. Hence, the quest for a new, more efficient single biomarker for cartilage-degrading diseases continues. On the other hand, measuring multiple biomarkers at the time of diagnosis would improve diagnostic accuracy and might even allow early diagnosis. SF lubricin, a candidate biomarker, is an important lubricant in SF, and its expression level is also associated with inflammation. Lubricin has not been fully characterized because of its size and heavy *O*-glycosylation. In this study, SF lubricin was characterized by both glycomic and proteomic means, indicating that, in addition to the level of lubricin in SF, both its glycosylation and its degradation are potential markers of disease progression and inflammation.

In combination with our previous study (Estrella et al., 2010) and this study (Fig. 2 and 3), synovial lubricin was shown to possess predominantly core 1 *O*-linked oligosaccharides. Even in a low amounts, with the aid of liquid chromatography-mass spectrometry (LC-MS), small proportions of core 2 oligosaccharides were found to carry sulfate group. In addition,

specific antibodies showed negative in Western blot (data not shown). This data suggests that the mucin domain of lubricin is different from mucin domains of traditional mucous mucins which are not susceptible to trypsin. Difference in glycosylation between lubricin and traditional mucins was also suggested by the treatment with *O*-sialoglycoprotein endopeptidase from *Pasteurella haemolytica* which cleaves heavily sialylated mucin-domain, but only had minor effect on lubricin. As shown in Fig. 6B, after overnight incubation, the density of Coomassie blue stained band was significantly diminished. However, when the digested same samples were probed with mouse monoclonal antibody, Western blot showed that epitope of the antibody, recognizing part of the unglycosylated *N*-terminal region, was still attached to the large mucin domain. Only a small shift in the migration in

In order to show that the reason for low recovery of the mucin domain was due to glycosylation, sialidase A and *O*-glycanase were used to remove the majority of *O*-linked oligosaccharides (Fig. 6C and 7). Desialylation with sialidase A decreased the size of lubricin on Ag-PAGE verified that synovial lubricin contained sialic acid (Fig. 6C). When further treated with *O*-glycanase, which cleaves core 1 type *O*-linked glycan (Galβ1,3GalNAcα1-*O*-Ser/Thr) on glycoproteins and glycopeptides, the size of lubricin decreased dramatically and was close to the calculated molecular weight of lubricin without posttranslational modification, i.e. 148 kDa. These results suggested that lubricin is heavily glycosylated and core 1 type *O*-linked oligosaccharides are the predominant *O*-glycans on lubricin. Bands after sialidase A treatment with or without subsequent *O*-glycanase treatment were subjected to LC-MS/MS analysis after trypsin digestion. Sialidase A alone recover 19.1% of lubricin sequence (Fig. 7), most of the peptides were located in the mucin-like domain including 18 random repeats of EPAPTTPK. In contrast, removal of glycosylation using both sialidase and *O*-glycanase gave up to 48% recovery of the lubricin sequence (Fig. 7). By removal of core 1 *O*-glycans, more protein core was revealed and made accessible for digestion providing peptides from the mucin domain repeated to be recovered and detected by LC-MS. Resolved sequence covered almost entire mucin-like domain of lubricin and

Though several biomarkers in SF and serum have been associated with RA and OA, no single biomarker has sufficient discriminating power to clearly indicate prognosis. Hence, the quest to find new, more efficient single biomarker for cartilage degrading diseases remains. On the other hand, measurement of multiple biomarkers at the time of diagnosis would improve diagnosis accuracy and even early diagnosis. As a candidate biomarker, SF lubricin has been found to be an important lubricant in SF, but expression level is also associated with inflammation. Lubricin has not been characterized fully because of its size and heavily *O*-glycosylation. In this study, SF lubricin was characterized by both glycomic and proteomic means, indicating that in addition to the level of lubricin in SF, both the glycosylation and its degradation are potential marker for disease progression and

In combination with our previous study (Estrella et al., 2010) and this study (Fig. 2 and 3), synovial lubricin was shown to possess predominantly core 1 *O*-linked oligosaccharides. Even in a low amounts, with the aid of liquid chromatography-mass spectrometry (LC-MS), small proportions of core 2 oligosaccharides were found to carry sulfate group. In addition,

SDS-PAGE was observed after the endopeptidase treatment.

repeat region without glycosylation could be identified (Fig. 8).

**2.3 Discussion** 

inflammation.

Fig. 5. MS2 spectra of one *N*-terminal (A) and one *C*-terminal (B) peptide derived from reduced and alkylated lubricin that was searched against UniProt and NCBI human protein database using GPM software. The position of the *N*-terminal peptide in the protein sequence starts from amino acid 25 and ends at 33. The *m/z* 976.41 is the [M+H]+ precursor ion and *m/z* 488.71 is the [M+H]2+. The assigned ID number for this peptide in the GPM database is 1193. The position of the *C*-terminal peptide starts from amino acid 1265 and ends at 1274. The *m/z* 1156.59 is the [M+H]+ precursor ion and *m/z* 578.80 is the [M+H]2+.The assigned ID number for this peptide in the GPM is 2538.

Glycoproteomics of Lubricin-Implication

treatment were in blue.

of Important Biological Glyco- and Peptide-Epitopes in Synovial Fluid 145

Fig. 7. Peptides recovery of bands excised from sialidase A and *O*-glycanase (Fig. 6C). The mucin-like domain is in capital (encoded by exon 6). Sequences recovered after sialidase A treatment were underlined, while sequences recovered after sialidase A and *O*-glycanase

With increased mechanical stress and protease activity associated with OA and RA, the fragmentation of lubricin shown here opens up a new possibility for disease-specific biomarkers. A few fragments of lubricin were detected in synovial fluid, which were enriched together with intact protein. The *O*-glycosylation domain is supposed to protect against proteolytic cleavage. In the case of lubricin, however, it was extensively degraded by trypsin but resistant to *O*-sialoglycoprotein endopeptidase (Fig . 5A). Similarly, lubricin has been found to be extensively degraded by papain, trypsin and pronase and to a lesser extent by pepsin (Flannery et al., 1999). Other proteases, such as neutrophil elastase (a serine protease) and cathepsin B (a cysteine protease), are also able to degrade lubricin *in vitro* (Jones et al., 2003; Elsaid et al., 2005). Interestingly, lubricin tryptic peptides were detected as low as the 30- 65 kDa region (Table 1 and Fig. 4). These fragments are unlikely to contain the full mucin-like domain, but more likely an *N*- or *C*-terminal domain with a portion of mucin-like domain (non-glycosylated *N-*terminus has a mass of 33.8 kDa and the *C*-terminus 35.4 kDa). So far, it is not clear whether they were from unique cleavages along lubricin sequence or just randomly excised *in vivo*. Evidence has indicated *N*-terminus of lubricin is more sensitive to neutrophil elastase (Elsaid et al., 2005). Purified neutrophil elastase has been shown to damage cartilage explants *in vitro* (Burkhardt et al., 1988). Also neutrophil elastase, and not MMPs, can destroy the superficial layer of cartilage where lubricin locates. Consequently, MMPs have better access to cartilage molecules in less superficial layers of cartilage (Jasin & Taurog, 1991). Are lubricin fragments associated with inflammation or pathophysiology of degenerative joint disease? Or does lubricin fragmentation patterns in OA or RA differ from those in healthy individuals? 
Recent studies reported the fragment of lubricin in SF (Gobezie et al., 2007; Kamphorst et al., 2007). In the work of Kamphorst et al., they found two lubricin *C*-terminal fragments (R1285PALNYPVYGETTQV1299 and D1373QYYNIDVPSRTA1385) in OA SF but not in healthy SF (Kamphorst et al., 2007). In the current study with enriched RA synovial lubricin,

Fig. 6. Proteomic analysis of lubricin under various conditions. All purified samples were reduced and alkylated before separation by 3-8% Tris/acetate NuPAGE gel. (A) Samples (5 µg) were incubated with trypsin. Aliquots were taken out at different time (0 to 16 hours). SDS-PAGE gel was stained with Coomassie blue. (B) Samples (5 µg) were treated with *O*sialoglycoprotein endopeptidase from *Pasteurella haemolytica.* A duplicated gel was blotted to PVDF membrane and probed with moue monoclonal anti-lubricin antibody. (C) Enriched lubricin sample (8 µg/lane) was treated with sialidase in absence or presence of *O*-glycanase at 37oC overnight. The resultant products were separated by Ag-PAGE under reducing condition.

a sLex epitope was also found present on a small proportion of the core 2 oligosaccharides. However, unlike sulfation, the level of fucosylation on lubricin was very low. Though in comparison with LC-MS, immunoassay seems less efficient but very specific to certain glycoepitopes. For example, MS2 spectra of ion at *m/z* 1477 suggested a Lewis type epitope. Without further fragmentation and known retention time on LC, it is not easy to define this structure of sLea or sLex. With sLea- and sLex-specific antibody, immunoblot demonstrated lubricin was modified with sLex. HAA is a lectin specific to terminal GalNAcα1- on *N*- or *O*-glycans. The lack of antibody recognition to Tn-antigen despite HAA reactivity indicated that exposed GalNAcα1- to protein backbone was only sparingly found. Additionally, synovial lubricin was shown to contain PNA binding epitopes (Fig. 2B). This is consistent with that PNA can been used as an affinity ligand to enrich synovial lubricin (Jay et al., 2001; Teeple et al., 2011). The result from the glycomic study using both LC-MSn and antibody/lectins showed the presence of a trace amount of Tn antigen, high abundant sialylated and unsialylated core 1 and several sialylated, fucosylated and sulfated core 2 oligosaccharides to be present on lubricin.

Clues to the involvement of lubricin in disease and inflammation can be found in its glycosylation. Glycomic analysis showed that approximately 50% of the lubricin *O*-glycans contain terminal galactose, such as the T antigen. This makes lubricin a potential ligand for galectins, a mammalian lectin family that recognizes terminal galactose. Increased expression of galectin-3 has been reported in synovial fluid from RA patients (Ohshima et al., 2003). Galectin-3 is believed to play a pro-inflammatory role in joint diseases, in which galectin-3 together with soluble fibrinogen was found to regulate neutrophil activation, degranulation and survival (Fernandez et al., 2005). Another attractive glyco-epitope on lubricin, sLex, is reminiscent of selectin ligands, which are involved in leukocyte trafficking. For instance, L-selectin on the surface of synovial neutrophils, as well as soluble L-selectin, has been reported in synovial fluid, although in low amounts (Humbria et al., 1994; De Clerck et al., 1995).


Fig. 7. Peptide recovery from bands excised after sialidase A and *O*-glycanase treatment (Fig. 6C). The mucin-like domain (encoded by exon 6) is shown in capital letters. Sequences recovered after sialidase A treatment are underlined, while sequences recovered after sialidase A and *O*-glycanase treatment are in blue.
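The assignment of a recovered mucin-domain peptide can be sanity-checked by comparing its theoretical precursor *m/z* with the observed value. As a minimal sketch, the example below computes the *m/z* of the doubly protonated repeat peptide EPAPTTPK (Fig. 7 and 8), for which a precursor at *m/z* 420.73 is reported in Fig. 8. It uses standard monoisotopic residue masses and assumes an unmodified peptide. The helper name `peptide_mz` is hypothetical and is not part of the GPM software.

```python
# Monoisotopic amino acid residue masses in Da (only those needed here).
RESIDUE_MASS = {
    "E": 129.04259,
    "P": 97.05276,
    "A": 71.03711,
    "T": 101.04768,
    "K": 128.09496,
}
WATER = 18.01056    # N-terminal H plus C-terminal OH
PROTON = 1.00728


def peptide_mz(sequence, charge):
    """Theoretical m/z of an unmodified peptide carrying `charge` protons."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


mz_repeat = peptide_mz("EPAPTTPK", charge=2)
print(f"{mz_repeat:.2f}")
```

The result rounds to 420.73, in agreement with the precursor ion reported for this peptide when it is read as a doubly protonated species.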

With increased mechanical stress and protease activity associated with OA and RA, the fragmentation of lubricin shown here opens up a new possibility for disease-specific biomarkers. A few fragments of lubricin were detected in synovial fluid and were enriched together with the intact protein. The *O*-glycosylation domain is expected to protect against proteolytic cleavage. Lubricin, however, was extensively degraded by trypsin but resistant to *O*-sialoglycoprotein endopeptidase (Fig. 6A and B). Similarly, lubricin has been found to be extensively degraded by papain, trypsin and pronase and to a lesser extent by pepsin (Flannery et al., 1999). Other proteases, such as neutrophil elastase (a serine protease) and cathepsin B (a cysteine protease), are also able to degrade lubricin *in vitro* (Jones et al., 2003; Elsaid et al., 2005). Interestingly, lubricin tryptic peptides were detected as low as the 30-65 kDa region (Table 1 and Fig. 4). These fragments are unlikely to contain the full mucin-like domain, and more likely consist of an *N*- or *C*-terminal domain with a portion of the mucin-like domain (the non-glycosylated *N*-terminus has a mass of 33.8 kDa and the *C*-terminus 35.4 kDa). So far, it is not clear whether they arose from specific cleavages along the lubricin sequence or were randomly excised *in vivo*. Evidence indicates that the *N*-terminus of lubricin is more sensitive to neutrophil elastase (Elsaid et al., 2005). Purified neutrophil elastase has been shown to damage cartilage explants *in vitro* (Burkhardt et al., 1988). Moreover, neutrophil elastase, but not MMPs, can destroy the superficial layer of cartilage where lubricin is located. As a consequence, MMPs gain better access to cartilage molecules in the deeper layers of cartilage (Jasin & Taurog, 1991).

Are lubricin fragments associated with inflammation or the pathophysiology of degenerative joint disease? Do lubricin fragmentation patterns in OA or RA differ from those in healthy individuals? Recent studies have reported fragments of lubricin in SF (Gobezie et al., 2007; Kamphorst et al., 2007). Kamphorst et al. found two lubricin *C*-terminal fragments (R1285PALNYPVYGETTQV1299 and D1373QYYNIDVPSRTA1385) in OA SF but not in healthy SF (Kamphorst et al., 2007). In the current study with enriched RA synovial lubricin, the first sequence (1285-1300) was found distributed throughout the gel, while the second sequence (1373-1385) was only detected in areas 3 and 5, areas lower than the lubricin area (Fig. 4B and C). Additionally, several new peptides derived from both the *N*- and *C*-termini were found in this study. These fragments could simply be by-products of joint degeneration. Alternatively, they might play a regulatory role. For example, fragments from fibronectin and aggrecan have been reported to correlate with joint diseases (Homandberg et al., 1997a; Homandberg et al., 1997b; Struglics et al., 2009). Interestingly, these two proteins were also found in this study (Table I). It is not clear whether they form a protein complex with lubricin or were merely co-purified with it.

Fig. 8. MS2 spectrum of a peptide derived from the desialylated and *O*-glycanase-treated lubricin sample, searched against the UniProt and NCBI human protein databases using GPM software. The ion at *m/z* 420.73 is the [M+2H]2+ precursor ion. The assigned ID number for the peptide in the GPM database is 3740. This peptide sequence is in the mucin domain of lubricin and is repeated 18 times in lubricin.

Besides two flanking protein domains, lubricin has a mucin-like domain in the middle. To our knowledge, this is the first report of a protein sequence within this domain (Fig. 7 and 8). It should be noted that among the 59 imperfect repeat units, 18 are perfect repeats of EPAPTTPK (Fig. 7 and 8). *O*-glycanase treatment greatly increased the peptide recovery. As discussed above, proteolytic cleavage sites on lubricin are probably located within the mucin-like domain, and here we show that the mucin domain of lubricin is indeed accessible to proteolytic enzymes. This is probably because the *O*-glycosylation of lubricin is dispersed, in contrast to the continuous, dense bottle-brush-like *O*-glycosylation of traditional mucins, which hinders access to cleavage sites. Being able to sequence the mucin-like domain of lubricin will facilitate the mapping of authentic proteolytic cleavage sites on lubricin *in situ* and the investigation of the effect of cytokine-regulated proteases on synovial lubricin during joint diseases.

**3. Conclusion**

In summary, using glycoproteomics we fully characterized the major glycoprotein in SF as lubricin. Knowledge of the *O*-glycosylation and proteomic properties of lubricin allowed us to identify RA- or OA-specific glyco-epitopes and fragments, enabling us to better understand how the glycosylation of lubricin is influenced by inflammation of the joint.

**4. Acknowledgment**

This work was supported by the Swedish Research Council (621-2010-5322), EU Marie Curie Program (PIRG-GA-2007-205302) and Reumatikerförbundet (R-85481). The mass spectrometers were obtained by grants from the Swedish Research Council (342-2004-4434) and from Knut and Alice Wallenberg's foundation (KAW2007.0118).

**5. References**

Arnett, F. C., Edworthy, S. M., Bloch, D. A., McShane, D. J., Fries, J. F., Cooper, N. S., Healey, L. A., Kaplan, S. R., Liang, M. H., Luthra, H. S., Medsger, T. A., Mitchell, D. M., Neustadt, D. H., Pinals, R. S., Schaller, J. G., Sharp, J. T., Wilder, R. L. & Hunder, G. G. (1988). The American Rheumatism Association 1987 revised criteria for the classification of rheumatoid arthritis. *Arthritis Rheum*, Vol.31, No.3, (Mar 1988), pp. 315-24, ISSN 0004-3591

Bao, J. P., Chen, W. P. & Wu, L. D. (2011). Lubricin: a novel potential biotherapeutic approaches for the treatment of osteoarthritis. *Mol Biol Rep*, Vol.38, No.5, (Jan 2011), pp. 2879-85, ISSN 0301-4851

Bayless, K. J., Meininger, G. A., Scholtz, J. M. & Davis, G. E. (1998). Osteopontin is a ligand for the alpha4beta1 integrin. *J Cell Sci*, Vol.111, No.9, (May 1998), pp. 1165-74, ISSN 0021-9533

Brockhausen, I. & Anastassiades, T. P. (2008). Inflammation and arthritis: perspectives of the glycobiologist. *Expert Rev Clin Immunol*, Vol.4, No.2, (Mar 2008), pp. 173-91, ISSN 1744-8409

Burkhardt, H., Rehkopf, E., Kasten, M., Rauls, S. & Heimann, P. (1988). Interaction of polymorphonuclear leukocytes with cartilage in vitro. Catabolic effects of serine
