**Abstract**

Many proteoforms derived from a single gene have a very similar atomic composition, which makes their analysis very challenging. This problem is particularly pronounced for many overexpressed recombinant proteins, especially recombinant therapeutic glycoproteins from large-scale production. In contrast to small-molecule drugs, which consist of a single defined molecule, therapeutic protein preparations are heterogeneous mixtures of dozens or even hundreds of very similar species. With current mass spectrometry, high-quality spectra of intact proteoforms can be obtained only if the mixture of individual proteoform ions entering the gas phase at the same time is of low complexity. Therefore, an effective separation is required before mass spectrometric analysis, yielding fractions that each contain only a small number of individual proteoforms. This applies not only to recombinant therapeutic proteins, because of their enormous heterogeneity, but also to top-down proteomics in general. The purification of proteoforms is the bottleneck in the mass spectrometric analysis of intact proteoforms. This review focuses on the current state of the art, particularly in liquid chromatography, for preparing proteoforms for top-down mass spectrometric analysis. Therapeutic proteins were chosen as the focus because this group of proteins is the most challenging with regard to proteoform analysis.

**Keywords:** proteoforms, top-down mass spectrometry, therapeutic proteins, liquid chromatography, protein purification parameter screening, displacement chromatography

## **1. Introduction**

The analysis of proteoforms, often also termed protein species or isoforms, is the next level in proteomics. The first comprehensive definition of this subgroup of proteins was published by Jungblut et al. [1] and Schlüter et al. [2], using the term "protein species". In 2013, Smith and Kelleher [3] introduced the term "proteoform", which is now widely accepted in the proteomics community. The concept of "proteoform" is nearly identical to the concept of "protein species". The only difference is that the proteoform concept is gene-centric, whereas the protein species concept is chemistry-centric.

Therapeutic proteins are a suitable training ground for developing methods for the comprehensive analysis of proteoforms, because they are known to comprise a large number of proteoforms. A therapeutic protein product contains only trace amounts of impurities such as host cell proteins, which are difficult to detect because of their very low concentration. The analysis of the product's own proteoforms is equally challenging because of their large number, their close similarity to one another, and their low concentration relative to the main proteoform.
