**3. Analysis of proteoforms of recombinant therapeutic proteins: challenges**

Similar challenges are associated with recombinant therapeutic proteins. The importance of therapeutic proteins has increased continually over the past years [12, 13]. Currently, several types of therapeutic proteins [14] are on the market, including monoclonal antibodies (mAbs), erythropoietin (EPO), insulin, human growth hormone and many more. The therapeutic protein market is dominated by monoclonal antibodies, with sales of approximately \$123 billion in 2017, and is expected to grow further with the emerging biosimilar market [13]. Therapeutic proteins have several advantages over small-molecule drugs because of their higher specificity towards drug targets, which in most cases are also proteins [15]. This enables therapeutic proteins to target specific key steps in disease pathology [16].

This group of man-made proteins presumably has a significantly higher number of proteoforms per gene than is found *in vivo*, resulting in a huge number of proteoforms within a single recombinant therapeutic protein (rTP) product. The heterogeneity develops during the production of an rTP, mainly in upstream processing. The first event increasing the heterogeneity is alternative splicing [17–19]. The second critical step is protein biosynthesis at the ribosomes, in which errors can occur. Proteolytic cleavage may happen at any stage after the protein has left the ribosome, not only within the host cell but also extracellularly if host cell proteases have not been removed during purification of the target protein. Many therapeutic proteins, such as conventional monoclonal antibodies or erythropoietin [20], are posttranslationally modified by glycans. The glycan chains in particular add a further factor multiplying the heterogeneity of proteoforms. An example of a therapeutic glycoprotein is etanercept, which is decorated with O- and N-glycans. Commercial preparations of etanercept used as drugs show a very high degree of complexity [21]. It can be assumed that therapeutic fusion proteins administered to patients, such as etanercept, contain hundreds of species that differ in their exact atomic composition. In addition to glycosylation, all other forms of posttranslational modification are possible, depending on the nature of the protein, the type of host cell and the upstream parameters.
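To illustrate how independent sources of variation multiply the number of species, the following minimal sketch counts theoretical proteoforms as the product of the number of states at each variable site. The site names and state counts are hypothetical and are not taken from etanercept or any other real product. The sketch also assumes that all sites vary independently, which gives an upper bound rather than an observed number.

```python
from math import prod


def count_theoretical_proteoforms(states_per_site):
    """Upper bound on distinct species if every site varies independently.

    states_per_site maps a site label to the number of alternative
    states (including the unmodified state) observed at that site.
    """
    return prod(states_per_site.values())


# Hypothetical example values, for illustration only
example_sites = {
    "N-glycosylation site 1": 5,  # five alternative glycan structures
    "N-glycosylation site 2": 5,
    "O-glycosylation site": 3,
    "C-terminal lysine": 2,       # present or clipped
    "Met oxidation": 2,           # oxidized or not
}

total = count_theoretical_proteoforms(example_sites)
print(total)  # 5 * 5 * 3 * 2 * 2 = 300
```

Even this small hypothetical set of five variable sites yields 300 theoretical species, which is in line with the assumption that a single product can contain hundreds of species.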

*Why is the heterogeneity of recombinant therapeutic proteins much higher than the heterogeneity of gene products in vivo?* Host cells used for the production of recombinant therapeutic proteins are optimized to synthesize a large excess of recombinant protein [22]. However, increased expression does not usually translate into an increase in the correctly processed, bioactive form of the recombinant protein [22]. Consequently, the probability increases that these overexpressed recombinant proteins are subject to errors during synthesis, side reactions of enzymes and spontaneous chemical reactions. As a result, the number of low-quality recombinant species is much higher than in a native cell in an intact organism [23]. It has been reported that overexpression of recombinant therapeutic proteins is also accompanied by an increase in high molecular weight aggregates and misfolded forms [24]. Thus, it can be assumed that the cellular systems that usually remove low-quality or incorrectly processed proteins are swamped by

*Proteoforms - Concept and Applications in Medical Sciences*

**2. Analysis of proteoforms: challenges**

Therapeutic proteins are a suitable training ground for developing methods for the comprehensive analysis of proteoforms, because they are known to comprise a large number of proteoforms. Although a therapeutic protein product contains only trace amounts of impurities such as host cell proteins, which are difficult to detect because of their very low concentration, the analysis of its proteoforms is very challenging because of their large number, their similarity and their low concentration compared to the main proteoform.

The most common method in proteomics is the bottom-up or shotgun approach. It relies on the proteolytic cleavage of proteins by proteases such as trypsin. The resulting peptide mixture is subjected to liquid chromatography coupled to tandem mass spectrometry (LC–MS/MS). Proteins are identified from the LC–MS/MS data by comparing the peptide fragment spectra against in-silico fragment spectra generated from a protein database [4]. As a rule of thumb, a protein is considered identified if at least two unique peptides representing parts of its sequence are identified. Thus, a sequence coverage of 100% is often not obtained. In that case, it can only be stated that one or several products (proteoforms) of a defined gene have been identified; no information about the identity of the underlying proteoform is obtained. The identified tryptic peptides may even originate from several different proteoforms. For the characterization of a therapeutic protein, bottom-up proteomics is a standard method. The signals in the LC–MS chromatograms represent tryptic peptides of all proteoforms of the therapeutic protein. A defined tryptic peptide that is present in all proteoforms forms one single monoisotopic signal, whose intensity represents the sum of this peptide from the different species. An individual proteoform can only be detected if it yields a tryptic peptide, for example a defined phosphopeptide, that is unique to this proteoform. However, it cannot be excluded that several proteoforms contain that peptide. As a result, bottom-up proteomics is helpful for obtaining LC–MS chromatograms that can be used as fingerprints of a therapeutic protein, but it gives no information about the number and composition of the proteoforms within the therapeutic protein product.

The detection of a low-abundance proteoform is especially difficult, since a unique tryptic peptide of such a proteoform is present in a low amount and its signal in a bottom-up LC–MS chromatogram therefore has a low intensity. Thus, if the detection of different proteoforms is of interest, top-down mass spectrometry (TDMS) is the method of choice, because it uses the intact proteoform for analysis instead of proteolytic peptides.

To perform a TDMS analysis, a purified individual intact proteoform is transferred into the mass spectrometer. From the MS spectrum of the intact ions, the molecular weight can be determined. Various techniques are available for fragmentation of the intact proteoform, such as HCD, CID, ETD, EThcD, ECD, UVPD and IRMPD, yielding different types of fragments that complement each other [5]. After fragmentation, the proteoform can be identified by interpreting the fragment spectrum. Several software tools are available for analyzing intact TDMS data [6–8]. The review of Schaffer et al. is recommended as an introduction to TDMS [9]. Robust protocols for the mass analysis of intact proteins with TDMS were recently published by Donnelly et al. [10]. TDMS requires sample mixtures of low complexity to obtain high-quality spectra of proteoforms. Aebersold et al. estimated the number of proteoforms being present in the human organism in the

these inadequate proteins [25], so that these species are neither further processed in the cell nor eliminated. Besides the enzymatic reactions, which mainly take place in upstream processing, chemical reactions that modify the recombinant therapeutic proteins can occur during the whole production process, including the final product fill and finish or storage [26, 27]. A very common reaction is the oxidation of methionine, which can happen at nearly every stage of production and can affect the efficacy of the product.

*Is any risk associated with the large number of species?* Fortunately, severe side effects associated with species that are not exactly identical to the target protein have been reported only very rarely. An unfortunate case with dramatic consequences for a few patients was reported by Seidl et al. [28]. In this case, tungsten ions, a contaminant introduced into the glass vials during their production, induced the dimerization of erythropoietin. As a result, a few patients developed autoantibodies against erythropoietin, which destroyed the remaining cells in these patients that were producing the native hormone. Since therapy with erythropoietin was no longer possible, these patients depended on blood transfusions for survival. Non-human glycan structures bound to therapeutic proteins, which can occur when they are produced in mouse cells, can induce hypersensitivity reactions [29, 30].

More common than severe side effects is the phenomenon that species showing even small differences in their atomic composition compared with the target species are less potent than the target species. For example, deamidation, which causes a +1 Da shift of the molecular weight, can decrease the efficacy of a therapeutic protein [31], as observed with recombinant human interleukin (rhIL)-15 [32]. Deamidation converts asparagine or glutamine to aspartic acid or glutamic acid, respectively. As a result, the polar, uncharged amides are changed into negatively charged carboxylic acids, which affects the surface-charge density and surface hydrophobicity of the protein and thereby explains the change in efficacy. Deamidation of asparagine can occur spontaneously at the physiological pH of 7.4 [32]. A further important modification of proteins is the disulfide bond (S-S), a covalent bond formed by the oxidation of the thiol groups (SH) of two cysteine residues [33], which decreases the molecular weight of a protein by 2 Da. Disulfide bonds have an impact on protein stability as well as on activity [33]. Du et al. stated that, during the manufacturing process, extensive reduction of antibodies has been observed after the harvest operation or Protein A affinity chromatography, and that multiple process parameters correlate with the extent of the reduction [34]. The topic "disulfide bonds of therapeutic proteins" is discussed in depth by Lakbub et al. [35].
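The nominal shifts quoted above (+1 Da for deamidation, −2 Da for disulfide bond formation) can be checked against monoisotopic atomic masses. The following is a minimal sketch: the function and variable names are illustrative, the atomic masses are standard monoisotopic values rounded to six decimals, and the intact mass of 25,000 Da is a hypothetical example rather than the mass of a specific product.

```python
# Monoisotopic atomic masses in Da, rounded to six decimals
MONO = {"H": 1.007825, "N": 14.003074, "O": 15.994915}

# Net change in atom counts caused by each modification
DEAMIDATION = {"N": -1, "H": -1, "O": +1}  # -CONH2 -> -COOH
DISULFIDE = {"H": -2}                      # 2 x -SH -> -S-S-


def mass_shift(delta):
    """Monoisotopic mass change for a dict of element -> atom count change."""
    return sum(MONO[element] * n for element, n in delta.items())


def shift_in_ppm(delta, intact_mass):
    """Mass change relative to the intact mass, in parts per million."""
    return mass_shift(delta) / intact_mass * 1e6


HYPOTHETICAL_INTACT_MASS = 25_000.0  # Da, assumed for illustration

for name, delta in (("deamidation", DEAMIDATION), ("disulfide", DISULFIDE)):
    print(f"{name}: {mass_shift(delta):+.6f} Da, "
          f"{shift_in_ppm(delta, HYPOTHETICAL_INTACT_MASS):+.1f} ppm")
```

On these assumptions, deamidation gives about +0.984 Da and disulfide formation about −2.016 Da. Relative to an intact protein of tens of kilodaltons, such shifts amount to only tens of ppm, which shows how small the differences between species can be at the level of the intact protein.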

More details about sources and effects of microheterogeneity are described in the excellent reviews of Beyer [36] and Ambrogelly [37].

*How large are the differences of the individual proteoforms of a therapeutic protein?* Proteoforms can vary in all known chemical properties, such as size, isoelectric point (pI) [38] and hydrophobicity [39]. The pIs of recombinant erythropoietin vary from 3.5 to 6 [38, 40]. Therapeutic proteins are characterized by the presence of size variants arising from the manufacturing process or from storage conditions when exposed to chemical, physical or conformational stress [41]. These size variants may include N-terminally clipped proteins, truncated forms, fragments representing low molecular weight species or improperly assembled therapeutic proteins. The formation of dimers or of multimers, in which more than two monomers form a complex, is a problem associated with many therapeutic proteins [42]. Such aggregates can induce adverse immune responses in patients [43]. The proteoforms of recombinant erythropoietin vary within a range of 4–6 kDa [20]. Besides these larger differences in size, the composition of

batches [50].

*Preparing Proteoforms of Therapeutic Proteins for Top-Down Mass Spectrometry*

**4. Separation of proteoforms of therapeutic proteins**

**4.1 Separation of proteoforms of therapeutic proteins with liquid chromatography**

atoms of many proteoforms derived from one single gene can be very similar within subtypes of proteoforms such as the family of acidic proteoforms. As a result, the separation of charge variants by ion exchange is usually successful, but a single fraction might contain not only one but multiple

Liquid chromatography (LC) is the most common method for the purification and fractionation of therapeutic proteins [37]. The proteoforms are separated either by size-exclusion chromatography (SEC), which makes use of the different path lengths through the chromatographic particles related to the size of the proteins, or by adsorption chromatography. The latter separates molecules by their different velocities while crossing a column filled with chromatographic particles. The velocities depend on the affinities of the molecules towards the stationary phase: the higher the affinity, the more slowly a molecule migrates. Depending on the chemistry of the functional groups of the stationary phase, different forms of liquid chromatography based on adsorption to the stationary phase are possible, highlighted in bold in **Table 1**.

**Table 1** gives an overview of the different types of separation methods and their frequency of application, with a focus on therapeutic proteins and, in addition, with respect to proteoforms. A comparison of the numbers in column 2 with those in column 3 clearly shows that the topic of proteoforms is not yet addressed very often. The selected reviews give deeper insights into the different separation methods. *Affinity chromatography* using chromatographic material derivatized with protein A is the most common and effective method for the purification of recombinant monoclonal antibodies [45]. For the separation of proteoforms of recombinant monoclonal antibodies, it is not very relevant.

*Ion exchange chromatography (IEX)*: Charge variants of therapeutic proteins, such as acidic or basic species, can be separated by ion exchange chromatography (IEX) [46]. IEX of proteins is performed with an oppositely charged ionic group on the stationary phase, as either anion exchange or cation exchange chromatography. Elution buffers decrease the electrostatic interactions of the proteins with the IEX material, thereby decreasing the affinity of the proteins towards the stationary phase. Elution can be either pH-based or salt-based [47]. Salt-based elution is used for IEX with online ultraviolet (UV) detection. Coupling IEX directly to MS is only possible if the elution buffer system is volatile [48]. Acidic species are often related to PTMs such as sialic acid or deamidation of asparagine, while basic variants are formed by aspartate isomerization, succinimide formation, and variants of C-terminal lysine and N-terminal glutamine [49]. IEX gives relative quantitative information about charge variants, which can be important for the qualification of manufacturing

*Hydroxyapatite chromatography (HAP)* is based on a material consisting of crystals of calcium hydroxyapatite, described by the formula Ca₅(PO₄)₃(OH). HAP can be described as mixed-mode chromatography. The Ca²⁺ ions can act as an anion exchanger via electrostatic interactions. In addition, carboxylic groups can form metal coordination bonds with the Ca²⁺ ions. The anionic phosphate groups of HAP adsorb positively charged molecules by electrostatic interactions. Phosphate, chloride-ion and calcium-ion gradients are common, as are multi-component gradients [39]. Therefore, finding appropriate eluents

*DOI: http://dx.doi.org/10.5772/intechopen.89644*

proteoforms [44].

**chromatography**
