**2. Analysis of proteoforms: challenges**

The most common method in proteomics is the bottom-up, or shotgun, approach. It relies on the proteolytic cleavage of proteins by proteases such as trypsin. The resulting peptide mixture is analyzed by liquid chromatography coupled to tandem mass spectrometry (LC–MS/MS). Proteins are identified from the LC–MS/MS data by comparing the peptide fragment spectra against in-silico fragment spectra generated from a protein database [4]. As a rule of thumb, a protein is considered identified if at least two unique peptides representing parts of its sequence are identified. Sequence coverage of 100% is therefore often not obtained. In that case, it can only be stated that one or several products (proteoforms) of a defined gene have been identified; no information about the identity of the underlying proteoform is obtained. The identified tryptic peptides may even originate from several different proteoforms. Bottom-up proteomics is a standard method for the characterization of therapeutic proteins. The signals in the LC–MS chromatograms represent tryptic peptides of all proteoforms of the therapeutic protein. A defined tryptic peptide that is present in all proteoforms forms a single monoisotopic signal, and its intensity represents the sum of this peptide from the different species. An individual proteoform can only be detected if it yields a tryptic peptide, for example a defined phosphopeptide, that is unique to this proteoform. However, it cannot be excluded that several proteoforms contain that peptide. As a result, bottom-up proteomics is helpful for obtaining LC–MS chromatograms that can serve as fingerprints of a therapeutic protein, but it gives no information about the number and composition of proteoforms within the therapeutic protein product.
The detection of a low-abundance proteoform is especially difficult, because a unique tryptic peptide of such a proteoform is present in a low amount and its signal in a bottom-up LC–MS chromatogram therefore has a low intensity. Thus, if the detection of different proteoforms is of interest, top-down mass spectrometry (TDMS) is the method of choice, because it analyzes the intact proteoform instead of proteolytic peptides.
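The limitation described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three proteoform sequences and their abundances are invented, a lowercase `s` is used as an ad hoc marker for a phosphoserine, and trypsin is modeled with the common simplification that it cleaves C-terminal to K or R unless the next residue is P, with no missed cleavages. The sketch digests each toy proteoform in silico, sums the idealized signal of every peptide over the proteoforms that share it, and lists the peptides that are unique to a single proteoform.

```python
import re
from collections import defaultdict

# Toy proteoforms of one hypothetical gene product (sequences are invented).
# A lowercase "s" marks a phosphoserine; this notation is an assumption.
PROTEOFORMS = {
    "unmodified": "MKTAYSPEKLLGSRDEFGHK",
    "phospho": "MKTAYsPEKLLGSRDEFGHK",
    "truncated": "TAYSPEKLLGSRDEFGHK",  # lacks the N-terminal "MK"
}
# Relative abundances in arbitrary units (invented).
ABUNDANCE = {"unmodified": 70, "phospho": 25, "truncated": 5}


def tryptic_digest(sequence):
    """Cleave C-terminal to K or R unless followed by P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_sources(proteoforms):
    """Map each peptide to the set of proteoforms that produce it."""
    sources = defaultdict(set)
    for name, seq in proteoforms.items():
        for pep in tryptic_digest(seq):
            sources[pep].add(name)
    return dict(sources)


def peptide_signal(sources, abundance):
    """Idealized signal: summed abundance of all proteoforms sharing a peptide."""
    return {pep: sum(abundance[n] for n in names) for pep, names in sources.items()}


def unique_peptides(sources):
    """Peptides produced by exactly one proteoform, grouped by that proteoform."""
    unique = defaultdict(list)
    for pep, names in sources.items():
        if len(names) == 1:
            unique[next(iter(names))].append(pep)
    return dict(unique)


if __name__ == "__main__":
    sources = peptide_sources(PROTEOFORMS)
    for pep, signal in peptide_signal(sources, ABUNDANCE).items():
        print(f"{pep:8s} signal={signal:3d} from {sorted(sources[pep])}")
    print("unique:", unique_peptides(sources))
```

In this toy set only the phosphorylated proteoform yields a unique peptide (`TAYsPEK`). The truncated and the unmodified forms produce only peptides that are shared with other proteoforms, so their presence could not be inferred from the peptide signals alone, and the unique peptide of the less abundant phosphorylated form gives a correspondingly weaker signal than the shared peptides.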

For a TDMS analysis, a purified individual intact proteoform is transferred into the mass spectrometer. From the MS spectrum of the intact ions, the molecular weight can be determined. Various techniques are available for fragmentation of the intact proteoform, such as HCD, CID, ETD, EThcD, ECD, UVPD and IRMPD, yielding different types of fragments that complement each other [5]. After fragmentation, the proteoform can be identified by interpreting the fragment spectrum. Several software tools are available for analyzing intact-protein TDMS data [6–8]. The review by Schaffer et al. is recommended as an introduction to TDMS [9]. Robust protocols for mass analysis of intact proteins with TDMS were recently published by Donnelly et al. [10]. TDMS requires sample mixtures of low complexity to obtain high-quality spectra of proteoforms. Aebersold et al. estimated the number of proteoforms present in the human organism to be in the range of approximately a billion [11]. Thus, very efficient purification steps prior to TDMS are required to tackle the huge number of individual proteoforms in cells, tissues and body fluids. Besides the excessive number of individual proteoforms, their dynamic range is a further challenge.

*Preparing Proteoforms of Therapeutic Proteins for Top-Down Mass Spectrometry*

*DOI: http://dx.doi.org/10.5772/intechopen.89644*

**3. Analysis of proteoforms of recombinant therapeutic proteins: challenges**

Similar challenges are associated with recombinant therapeutic proteins. The importance of therapeutic proteins has been increasing continually over the past years [12, 13]. Currently, several types of therapeutic proteins [14] are available on the market, including monoclonal antibodies (mAbs), erythropoietin (EPO), insulin, human growth hormone and many more. The therapeutic protein market is dominated by monoclonal antibodies, with sales of approximately \$123 billion in 2017, and is expected to grow further with the upcoming biosimilar market [13]. Therapeutic proteins have several advantages over small-molecule drugs owing to their higher specificity towards drug targets, which in most cases are also proteins [15]. This enables therapeutic proteins to target specific key steps in disease pathology [16].

This group of man-made proteins presumably has a significantly higher number of proteoforms per gene than is found *in vivo*, resulting in a huge number of proteoforms within a single recombinant therapeutic protein (rTP) product. The heterogeneity develops during the production of an rTP, mainly in upstream processing. The first event increasing the heterogeneity is alternative splicing [17–19]. The second critical step is protein biosynthesis at the ribosomes, in which errors can occur. Proteolytic cleavage may happen at any stage after the protein has left the ribosome, not only within the host cell but also extracellularly if host cell proteases have not been removed during purification of the target protein. Many therapeutic proteins, such as conventional monoclonal antibodies or erythropoietin [20], are posttranslationally modified with glycans. The glycan chains in particular are an additional factor multiplying the heterogeneity of proteoforms. An example of a therapeutic glycoprotein is etanercept, which is decorated with O- and N-glycans. Commercial preparations of etanercept used as drugs show a very high degree of complexity [21]. It can be assumed that therapeutic fusion proteins administered to patients, such as etanercept, contain hundreds of species that differ in their exact atomic composition. In addition to glycans, all other forms of posttranslational modification are possible, depending on the nature of the protein, the type of host cell and the upstream parameters.

*Why is the heterogeneity of recombinant therapeutic proteins much higher than the heterogeneity of gene products in vivo?* Host cells used for the production of recombinant therapeutic proteins are optimized to synthesize a large excess of recombinant protein [22]. However, increased expression does not usually correlate with an increase in the correctly processed, bioactive form of the recombinant protein [22]. Consequently, the probability increases that these overexpressed recombinant proteins are subject to errors during synthesis, side reactions of enzymes and spontaneous chemical reactions. As a result, the number of low-quality recombinant species is much higher than in a native cell in an intact organism [23]. It was reported that overexpression of recombinant therapeutic proteins is also accompanied by an increase in high-molecular-weight aggregates and misfolded forms [24]. Thus, it can be assumed that the cellular systems that usually remove low-quality or incorrectly processed proteins are swamped by
