*2.3.4 Specific applications*

Y2H has been used extensively to detect PPIs since its conception (**Table 3**). The use of yeast allows for low-cost high-throughput testing of interactions. This technique has now been used to detect interactions between 90% of human proteins [27]. However, Y2H is not suited for ePPI discovery. The expression of human proteins in yeast may result in non-native post-translational

#### **Figure 4.**

*General schematic for the protein-fragment complementation approach. Two proteins suspected of interaction are each tagged with half of a split reporter protein. The reporter activity is detected only if the split reporter is brought together by an interaction between the two tagged proteins.*


#### **Table 3.**

*Specific protein-fragment complementation approaches covered in this section. These differ primarily in the specifics of the split reporter activity.*

modifications relevant for function, but more importantly, Y2H actively selects against ePPIs because the interaction must occur in the nucleus to drive transcriptional readout.

To complement the classic Y2H approaches and overcome the pitfalls related to ePPIs, several systems specifically targeting membrane proteins have been developed. The membrane yeast two-hybrid (MYTH) [28] and its mammalian counterpart, mammalian membrane two-hybrid (MaMTH) [29], require that at least one protein being tested is anchored to the plasma membrane. Both of these approaches use a split ubiquitin system where one of the two halves is fused to a membrane protein and a transcription factor. Tethering the transcription factor to the membrane protein keeps it out of the nucleus, preventing reporter expression. When the membrane protein interacts with a protein containing the second half of ubiquitin, a cleavage event occurs, releasing the transcription factor to translocate to the nucleus and initiate reporter expression. In combination with targeted libraries, this approach has been used for the high-throughput detection of interactions between receptor tyrosine kinases and phosphatases [30]. However, since these techniques rely on the endogenous ubiquitin machinery for cleavage, they require that both binding partners be expressed in the same cells, limiting the applicability of these techniques for detection of in-trans interactors.

#### **2.4 Proximity labeling**

#### *2.4.1 Concept description*

Proximity labeling techniques identify possible PPIs by covalently modifying proteins that are in close proximity, typically within a few nanometers. In most cases, the label includes an affinity tag such as biotin, which allows the labeled proteins to be purified and identified using MS (**Figure 5**).

#### **Figure 5.**

*General schematic of a proximity labeling experiment. A target-of-interest is fused to an enzyme that labels nearby proteins with an affinity tag. That tag can then be used to isolate the proteins for identification.*

*Unbiased Identification of Extracellular Protein–Protein Interactions for Drug Target… DOI: http://dx.doi.org/10.5772/intechopen.97310*

#### *2.4.2 Concept pros*

The different proximity proteomics methods have represented some of the most significant advances in the field of PPI detection, and in particular membrane protein interaction discovery. From the initial development of BioID and its further iterations in BioID2 and TurboID, as well as the more recently developed MicroMap, these techniques have substantially increased the sensitivity for detection of a range of interactions, including weak, transient interactions by translating them into permanent covalent linkages. In addition, proximity proteomics approaches are generally applicable to complex physiological systems and cellular models of interest, and can bypass over-expression of proteins-of-interest and laborious libraries. Furthermore, these approaches also offer the advantage of temporal control, though currently this is typically on the tens of minutes time-scale.

#### *2.4.3 Concept cons*

While proximity labeling typically does not require any special equipment, the unbiased identification of proximal proteins requires access to MS. Since these approaches fundamentally read out proximity, the PPI is inferred. Especially in complex physiological contexts, the possibility of identifying neighboring but not directly interacting proteins means that this approach faces the greatest challenge when it comes to data interpretability. Additionally, the most popular proximity proteomics methods, BioID and APEX, require the protein-of-interest to be fused to a bulky tag and the fusion protein to be over-expressed. Although generally applicable to physiologically relevant systems, experimental conditions may require optimization to ensure that the overall behavior of the target-of-interest is not altered by tagging or over-expression.

Currently, these techniques have been applied primarily to detect interactions between proteins on the same cells. In many cases, the utility for in-trans interactions remains to be demonstrated.

#### *2.4.4 Specific applications*

The different proximity labeling techniques vary based on what enzymes or chemistries are used to accomplish the labeling. The field of proximity proteomics has been predominantly driven by the development of enzyme-catalyzed proximity labeling. These techniques typically use a promiscuous biotin ligase (from BioID to the much faster TurboID) or a peroxidase (usually APEX or horseradish peroxidase (HRP)) to create a highly reactive biotin species that can only diffuse a short distance before reacting with nearby proteins or water [31]. When a target protein is tagged with one of these enzymes and substrate is added, proteins in its vicinity are biotinylated, allowing them to be isolated and identified using MS. The particular techniques differ slightly in their tradeoffs. The peroxidases tend to be more broadly reactive and require less labeling time. However, they require the addition of a biotin conjugate and hydrogen peroxide, both of which could be toxic to cells. While the biotin ligases do not have this problem, they are on average slower, though the more recently engineered TurboID can achieve efficient labeling within minutes (**Table 4**) [31].

This concept has been incorporated into specific techniques for identifying ePPIs like selective proteomic proximity labeling assay using tyramine (SPPLAT). SPPLAT uses an HRP-conjugated antibody recognizing a cell surface protein.


#### **Table 4.**

*Specific approaches mentioned in this section for using proximity labeling and cross-linking. These techniques differ in the labeling enzymes that they use (for labeling) or the chemistries of the substrate.*

Since the antibody cannot diffuse across the plasma membrane, it specifically targets ePPIs without any tagging of proteins, thus enabling studies in unmodified cellular settings [32]. Another approach is enzyme-mediated activation of radical sources (EMARS), which also uses an HRP-conjugated antibody. However, EMARS uses a biotin fused to an aryl azide group, giving it a large labeling radius of 200–300 nm and making it more suitable for characterizing entire microdomains than individual ePPIs [33]. Though the use of antibodies has advantages, genetically tagging a protein with HRP can allow these types of techniques to be performed in the physiological context in an organism. For example, the use of a CD2-HRP fusion protein with a membrane-impermeable biotin-phenol allowed the identification of cell-type specific neural protein cross talk in the fly brain [34].

The newest addition to the proximity labeling family is MicroMap, which uses entirely orthogonal chemistry to the existing techniques. MicroMap uses an antibody to detect the target-of-interest, which is then recognized by a secondary antibody conjugated to a photocatalyst. The photocatalyst absorbs blue light to catalyze the activation of a biotin conjugate molecule in its vicinity. This approach uses a more reactive chemical moiety than the biotin ligase or peroxidase approaches, which allows for an even smaller radius of labeling and is thus more likely to detect direct PPIs. Using MicroMap, the authors proposed a new set of putative binding partners for key immune receptors such as PD-L1 [35].
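Because proximity labeling reads out neighborhood rather than binding, the MS output is usually compared against a control sample before any protein is treated as a candidate. The following is a minimal sketch of that comparison, not a method taken from the cited studies. It assumes spectral counts are available for a labeled sample and a matched control (for example, a no-enzyme or no-substrate sample); the protein names, counts, pseudocount and fold-change cutoff are all hypothetical.

```python
import math

def log2_enrichment(labeled, control, pseudocount=1.0):
    """Log2 ratio of labeled over control counts for every protein seen in either sample.

    The pseudocount (an assumption of this sketch) avoids division by zero for
    proteins that are absent from one of the two samples.
    """
    proteins = set(labeled) | set(control)
    return {
        p: math.log2((labeled.get(p, 0) + pseudocount) / (control.get(p, 0) + pseudocount))
        for p in proteins
    }

def candidate_neighbors(labeled, control, min_log2_fc=2.0):
    """Proteins enriched at least min_log2_fc over control, strongest first."""
    scores = log2_enrichment(labeled, control)
    kept = [(p, s) for p, s in scores.items() if s >= min_log2_fc]
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Hypothetical spectral counts; none of these values come from a real experiment.
labeled_counts = {"bait": 120, "neighbor_1": 63, "neighbor_2": 7, "abundant_background": 200}
control_counts = {"neighbor_1": 3, "abundant_background": 180}

candidates = candidate_neighbors(labeled_counts, control_counts)
candidate_names = [name for name, _ in candidates]
```

In this toy input the abundant background protein is dropped because it is nearly as abundant in the control, while the remaining proteins stay on the candidate list. They are still only proximal candidates and would need orthogonal validation as direct binders.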

#### **2.5 Direct protein–protein interaction screens**

#### *2.5.1 Concept description*

Direct interaction screens encompass a wide variety of techniques that have several features in common. First, there is a query protein that is the target-of-interest. Second, the query protein is tested for binding to a library containing possible binding partners presented as recombinant proteins or receptors expressed on cells. Third, a positive signal in the screen directly reads out an interaction between the query and a given binding partner in the library, using detection methods that vary depending on the approach. Major distinguishing factors between the various direct PPI-screening techniques include the level of multimerization of the target protein (from monomers to oligomeric proteins), the form of the library of binding partners being screened (protein-based vs. cell-based formats), and the degree of purification required (purified protein vs. conditioned media) (**Figure 6**).

#### *2.5.2 Concept pros*

This approach typically allows for the opportunity to control most aspects of the screen such as protein concentration and buffer conditions. This approach is also


#### **Figure 6.**

*General schematic for direct protein interaction screens. The library of proteins can be directly immobilized on a solid substrate or be expressed on cells. Alternative multimerization methods have been developed to present the query protein of interest, reviewed in the text.*

generally amenable to scale-up. Therefore, many modern libraries have high coverage of at least specific protein families. The simplicity and the fact that the readout reflects direct PPIs also generally lead to straightforward data analysis.

#### *2.5.3 Concept cons*

Many of these approaches use purified proteins and may require the use of ectodomains rather than the full-length protein. While the ectodomain is sufficient for binding in many instances, this requirement makes it difficult to identify ePPIs that use multiple ectodomains or transmembrane domains for binding, a behavior documented for the family of seven-transmembrane-domain-containing G protein-coupled receptors (GPCRs). These approaches also typically require that the target-of-interest be screened against a library, which needs to be comprehensive for truly unbiased identification of ePPIs. The generation and maintenance of a large library, either as recombinant proteins or plasmids for expression on cells, can be costly and may require access to automation.

#### *2.5.4 Specific applications*

While published work tends to take advantage of specific combinations of the type of target presentation and the type of library, significant mixing and matching is possible due to the similarity in the overall conceptual framework. Therefore, we discuss the major types of target presentation and library separately and mention any incompatibilities. Also, since there is a large diversity of library formats, we divide the formats into protein-based libraries and cell-based libraries, though the same target presentation strategies can be used for both (**Table 5**).


#### **Table 5.**

*Lists of target presentations and protein and cell-based library formats covered in this section.*

#### *2.5.4.1 Target presentation methods*

To be able to directly assay interactions, targets-of-interest are typically presented as recombinant proteins for this type of approach. While secreted proteins are soluble and can be directly screened, membrane proteins tend to misfold and aggregate if they are extracted from membranes because of their hydrophobic transmembrane domains [12, 13]. Since ectodomains are usually the portion of transmembrane proteins available for direct ePPIs, typically only ectodomains are used for direct ePPI screening.

A number of multimerization approaches that increase query protein avidity and therefore facilitate detection of transient interactions have been developed. In particular, there are three dominant strategies: dimerization induced by fusing ectodomains to the constant Fc region of antibodies [36], pentamerization induced by fusing ectodomains to the rat cartilage oligomeric matrix protein (COMP) [37] and higher order multimerization using small beads with high protein-binding capacity, usually in the form of protein A-coated or streptavidin-coated beads.

While increased multimerization is a major factor for increasing sensitivity with target presentation, the readout method to measure target binding also varies. Using Fc-tagged dimers allows the detection of the target using a variety of secondary antibodies or protein A/G that bind to Fc regions with high affinity [36]. However, using enzymatic readouts can add a high degree of signal amplification that allows for increased sensitivity. Therefore, the approach used to generate the largest ePPI networks to date uses pentamerization combined with an enzymatic β-lactamase colorimetric assay [37–40]. As for bead-based approaches, the specific readout can be magnetic, fluorescent or chemiluminescent depending on the specific screening method used [41, 42].

Lastly, some of the recent high-throughput technologies use conditioned media enriched for the target-of-interest rather than purified proteins. Using conditioned media involves direct capture of secreted proteins or protein ectodomains in the absence of protein purification, thus minimizing potential inactivation of the proteins due to purification steps. The use of conditioned media can also save time and resources, helping to make the approaches more accessible to different laboratories and more amenable to scaling up [37–40].

#### *2.5.4.2 Protein-based library formats*

Different protein-based library formats can allow for different levels of throughput and information collected about the binding interactions. The most common and high-throughput approaches are generally qualitative, detecting whether the interaction is present, but not providing quantitative information such as kinetic parameters. One example of this type of library is the protein microarray, which for ePPIs, contains different purified secreted proteins or ectodomains directly spotted on slides. Only small amounts of each protein are used, allowing for the dense tiling of thousands of proteins per slide. The compact format allows slides to be covered with a small volume of fluorescently-labeled target protein, rinsed and imaged using microscopy [43]. While this is a convenient format, the construction of the protein microarrays is often costly because it requires all of the proteins to be purified.

Another type of library for qualitative ePPI identification uses plate-based screening formats. The use of plates allows for the easy addition of proteins and controlled washes without the need for specialized microfluidics. While purified proteins can be used, plate-based formats allow for the direct capture of secreted tagged proteins from conditioned media. Capture of biotinylated proteins using streptavidin-coated plates [37] or Fc-tagged proteins using Protein A-coated plates [38–40] followed by washing allows for the effective purification of library proteins


in wells while adding sensitivity by multimerizing (in the case of multivalent binding of streptavidin to biotin) or capturing already multimerized proteins. This approach also allows the use of enzymatic liquid phase readouts, such as β-lactamase-based colorimetric assays or luciferase-based luminescence assays, which provide an additional degree of signal amplification. The plate-based approach also gives one value per well, allowing for simple data analysis and the greatest interpretability.
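The one-value-per-well readout is what keeps the analysis simple. As an illustration only, the sketch below scores each well against the rest of the plate with a robust z-score (median and median absolute deviation) and flags wells above a cutoff. The scoring method, the cutoff of 5 and the absorbance values are assumptions made for this example and are not taken from the cited screens.

```python
from statistics import median

def robust_z_scores(signals):
    """Robust z-score per well, using the plate median and the scaled MAD."""
    center = median(signals.values())
    mad = median(abs(v - center) for v in signals.values())
    scale = 1.4826 * mad  # scaling makes the MAD comparable to a standard deviation
    if scale == 0:
        raise ValueError("no spread in plate signal; cannot score wells")
    return {well: (v - center) / scale for well, v in signals.items()}

def call_hits(signals, threshold=5.0):
    """Wells whose robust z-score exceeds the (hypothetical) threshold."""
    scores = robust_z_scores(signals)
    return sorted(well for well, z in scores.items() if z > threshold)

# Hypothetical absorbance readings: background wells plus one strong binder.
plate = {"A1": 0.10, "A2": 0.12, "A3": 0.09, "A4": 0.11, "A5": 1.50, "A6": 0.10}
hits = call_hits(plate)
```

Using the median rather than the mean keeps a handful of strong binders from inflating the background estimate, which suits screens where most wells are expected to be negative.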

While the plate-based approach is generally the most scalable option, other techniques trade some scale for quantitative information on ePPIs. In particular, microfluidics, automation and miniaturization have pushed label-free biophysical techniques to be more high-throughput. For example, the combination of microfluidics and either surface plasmon resonance (SPR) or magneto-nanosensors has increased the scale enough to study all combinatorial interactions between a small number of proteins, making it especially adept at addressing complex cross-talk between small interaction networks [44, 45]. While SPR is the gold standard technique for biophysical characterization of protein interactions and calculation of kinetic parameters, the magneto-nanosensor platform provides higher sensitivity and therefore requires less material to detect weak ePPIs. However, it requires the use of magnetic nanoparticles conjugated to the target-of-interest. The nanoparticles are flowed over patches of library proteins printed on magneto-nanosensors that detect a change in electrical resistance if a nanoparticle is nearby [45]. Another technique that provides similar information is biolayer interferometry (BLI), which translates protein binding into a light interference signal. While typically less sensitive than SPR and the magneto-nanosensor platform, BLI excels in its ease-of-use. BLI uses small, disposable sensors that can be coupled to targets-of-interest, typically through the capture of tags such as Fc-tags or biotin. The sensors are then simply dipped into wells containing the potential binding partners in solution. With advances in automated and miniaturized BLI setups, it can be used to screen for interactions in high-throughput, provided that libraries of recombinant proteins are available. This technology helped identify the PVR-TIGIT interaction [46], which is the mechanistic foundation of anti-TIGIT immunotherapy [47].
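To make "kinetic parameters" concrete, the sketch below uses the standard 1:1 binding model commonly fitted to SPR or BLI sensorgrams. It assumes a single binding site, no mass-transport limitation, and hypothetical rate constants; the function names are illustrative and do not correspond to any instrument software.

```python
import math

def dissociation_constant(k_on, k_off):
    """Equilibrium K_D (M) from k_on (1/(M*s)) and k_off (1/s)."""
    return k_off / k_on

def association_response(t, conc, k_on, k_off, r_max):
    """Sensor response at time t (s) during the association phase of a 1:1 model.

    conc is the analyte concentration (M) and r_max the response at saturation.
    """
    k_obs = k_on * conc + k_off               # observed rate of approach to equilibrium
    r_eq = r_max * conc / (conc + k_off / k_on)  # plateau response at this concentration
    return r_eq * (1.0 - math.exp(-k_obs * t))

# Hypothetical weak, fast-dissociating interaction.
k_on, k_off = 1e5, 0.1
kd = dissociation_constant(k_on, k_off)
plateau = association_response(t=1000.0, conc=kd, k_on=k_on, k_off=k_off, r_max=100.0)
```

With these assumed rates the K_D is about 1 µM, and injecting analyte at a concentration equal to K_D gives a plateau at half of the saturating response, which is the usual sanity check for this model.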

#### *2.5.4.3 Cell-based library formats*

Even though protein-based libraries have many advantages such as storability and easy data interpretation, they can often fail to detect ePPIs because of biochemical challenges associated with membrane proteins. Many membrane proteins lose activity when truncated into soluble ectodomains or extracted from membranes. In addition, the complex cellular membrane environment can provide important protein and non-protein co-factors, orient and cluster membrane proteins and assist in high-order complex formation. Therefore, hard to purify receptors are often screened against cDNA libraries expressing membrane proteins directly on cells. This is especially true for important drug targets like GPCRs and ion channels [48] which have multiple transmembrane domains and typically small extracellular regions.

To screen for interactions using cell-based formats, libraries are used to either induce loss-of-function (lack of binding) or gain-of-function (increased binding). In the loss-of-function approach, possible binders of a target-of-interest are knocked down or knocked out either randomly using chemical mutagens or transposons like gene trap [49] or in a targeted manner with siRNA, zinc-finger nuclease, transcription activator-like effector nuclease (TALEN) or clustered regularly interspaced short palindromic repeat (CRISPR) libraries [50]. When the target-of-interest is incubated with the cells, the target should not interact if the interaction partner has been depleted. However, it may be difficult to identify cells that bind the target-of-interest. In addition, the interaction must be simple enough that knocking down one interaction partner causes a detectable decrease in binding.

To avoid these limitations, an alternative approach is to overexpress receptors that may participate in ePPIs. This is most commonly done using cDNA libraries. The DNA libraries are spotted on slides [51] or plated into wells [52], with each spot or well containing a vector encoding a different protein. Cells are then added and transfected to induce them to overexpress the protein and display it on the plasma membrane. If the cells are expressing the receptor for the target-of-interest, this can be detected by increased target binding to the surface of the cell. This approach has been successfully used to deorphanize secreted factors [53], identify interactions between immune receptors [54], and identify glycan-dependent recognition of specific ligands [55]. However, the generation and management of cDNA libraries that have significant coverage of membrane proteins can be expensive and not accessible to many investigators. In addition, selective expression of the myriad possible receptor isoforms that may participate in ligand binding makes truly comprehensive cDNA libraries infeasible. One way to address isoform-specific expression while facilitating library management is to use CRISPR activation (CRISPRa). In this implementation, CRISPR-Cas9 fused to transcriptional activation domains is coupled to guide RNAs selectively targeting cell-surface genes to overexpress receptor proteins. A high-coverage CRISPRa guide library targeting most cell surface proteins has recently been used to identify novel receptor-ligand interactions [56].

#### **2.6 Computational models**

#### *2.6.1 Concept description*

Computational models cover a large range of concepts that attempt to predict PPIs based on existing knowledge of the biochemistry of protein binding and features of proteins, such as the sequence, conserved residues or structural features.

#### *2.6.2 Concept pros*

Computational models can offer relatively less resource-consuming and faster alternatives to experimental research. They allow for the theoretical exploration of PPIs without regard for experimental challenges related to expression of proteins or development of workflows or platforms. Modern machine learning approaches may also identify unintuitive features that are the most predictive for interactions such as unappreciated modifications. They can also draw from larger pools of information, taking into account protein expression patterns, genetic variations and dysregulation in disease.

#### *2.6.3 Concept cons*

Computational modeling approaches to identify PPIs, not to mention ePPIs, are still in their infancy, with overall low rates of accuracy. Many are based on our existing knowledge of experimentally determined interactions, which may have biases and is incomplete. Approaches that attempt to model binding interfaces are too computationally expensive to be high-throughput even when experimentally determined protein structures exist [57].

#### *2.6.4 Specific applications*

Since computational approaches remain immature for human ePPIs, we will mostly highlight the different computational resources and a few different


#### **Table 6.**

*List of computational resources and approaches covered in this section.*

approaches rather than try to describe a list of the major algorithms. However, this is a rapidly developing area mirroring the explosion in available experimental datasets, including data from all of the approaches mentioned so far as well as expression data for cell types, tissue and now single cells identifying which proteins are at the same places at the same times (**Table 6**) [58, 59].

The increased availability of comprehensive databases for PPIs, and more recently ePPIs, has fueled diverse computational approaches. Efforts like STRING [60] and BioGRID [61], which collect and curate public data on PPIs, are often drawn on for model development and are also important resources for individual researchers looking for the next interaction to drug. There are also many databases that document the progress of specific approaches, like BioPlex, which contains thousands of human interactions identified by AP/MS [18], as well as the Research Collaboratory for Structural Bioinformatics Protein Data Bank, which captures many structures showing the molecular details of PPIs [62]. However, even here ePPIs have posed a challenge because we do not have a definitive list of all proteins that reach the cell surface in various tissues and cell types, though ongoing efforts are trying to experimentally answer that question [59, 63–65]. Recently, the human surfaceome was estimated using a machine learning model to predict the cell surface localization of almost 3000 proteins [66].

Modeling approaches that actually attempt to predict ePPIs vary in the types of information they try to account for. While not yet applied to human ePPIs, the use of residue-residue coevolution in combination with structure modeling successfully predicted many ePPIs in bacteria [67]. Another approach, PICTree, focused on the structurally related immunoglobulin superfamily (IgSF) of proteins, using knowledge of family members with known binding partners and sequence conservation to predict new interactions [68]. Lastly, some approaches use broad information sets about a gold standard set of interactions. For example, FpClass trains its model on everything from amino acid makeup to post-translational modifications to expression patterns [69]. However, this still resulted in an estimated false discovery rate of 60%, which shows that while modeling can assist in hypothesis generation, there is more work to be done before modeling can take the place of experimental approaches.

#### **3. Summary**

Extracellular protein–protein interactions are an important set of possible drug targets. They are commonly dysregulated in disease and can be targeted to alter disease phenotypes. Practically, ePPIs are exposed on the cell surface, making them easier to access using therapeutic approaches. However, because of challenges associated with ePPI biochemistry, most membrane proteins and secreted factors do not have identified interactions. Elucidating the extracellular interaction networks in humans as well as their dysregulation during disease will be key to understanding basic biology and fueling new or improved drug development efforts. To tackle this daunting challenge, researchers have applied genetic, chemical, biochemical and computational approaches to come up with an ever-growing list of ePPIs. Here we have reviewed the progress made in the last decade in technologies suitable for the study of ePPIs. In particular, we discuss those approaches that can be applied to the high-throughput screening of ePPIs in an unbiased fashion.

#### **4. Future perspectives**

As costs are continually falling on readouts like sequencing and mass spectrometry, and as throughput increases with better automation and computational analysis, the future looks bright in the field of ePPI identification. More and more techniques will cross over the categories that we have laid out, finding middle points that balance the various tradeoffs of ease, interpretability, and physiological relevance.

One exciting development that 2020 brought was the release of two large-scale efforts using ePPI-optimized pentamer-based direct interaction screening approaches. These efforts each systematically tested hundreds of thousands of pairwise interactions, focusing on the IgSF of single-pass transmembrane proteins, the largest family of secreted and membrane-expressed proteins in the human genome [39, 40]. These large interaction networks identified hundreds of new interactions and present the most extensive ePPI network maps to date.
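The number of pairwise tests grows roughly with the square of the library size, which is why screens of this kind quickly reach the hundreds of thousands. The sketch below counts tests for an all-against-all design. The library size of 500 is a hypothetical value, not the size used in the cited studies, and whether homophilic pairs or both bait and prey orientations are tested is treated here as a design choice.

```python
def pairwise_tests(n_proteins, include_self=True, both_orientations=False):
    """Number of binary tests in an all-against-all screen of n_proteins.

    include_self counts homophilic pairs (a protein tested against itself).
    both_orientations counts each pair twice, once per bait/prey assignment.
    """
    if n_proteins < 0:
        raise ValueError("n_proteins must be non-negative")
    if both_orientations:
        total = n_proteins * n_proteins
        return total if include_self else total - n_proteins
    distinct = n_proteins * (n_proteins - 1) // 2
    return distinct + n_proteins if include_self else distinct

library_size = 500  # hypothetical library size
unordered = pairwise_tests(library_size)
ordered = pairwise_tests(library_size, both_orientations=True)
```

For this assumed library, testing each unordered pair once requires 125,250 wells, and testing both orientations requires 250,000, so doubling the library size roughly quadruples the screening effort.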

Once the interactions are found, we need to be able to manipulate them in humans to cure diseases. While not the topic of this chapter, several exciting developments on the drug development front hold much promise for targeting ePPIs. New highly selective inhibitors that recognize the transmembrane domains of proteins, such as the isoform-selective inhibitor of the Nav1.7 channel, can provide novel classes of chemical inhibitors of transmembrane proteins to disrupt ePPIs [70]. While cytokines often offer desirable ways to manipulate many immune functions, using them is often like playing with fire because of their many disparate effects. However, with improvements in protein design, completely artificial cytokine mimics can now be made that are highly selective for desired activities and counter-selected for undesired ones [71].

One major challenge that lies ahead is not just to identify ePPIs but to identify disease-relevant human ePPIs. Along these lines, a recently published map of the IgSF highlighted the power of big data integration, showing that the combination of clinical data with a focus on the protein pair participating in ePPIs gave greater predictive value than each of the proteins alone [39], suggesting that targeting specific ePPIs may be more beneficial than targeting an individual protein. Another challenge is the reliance on animal models. Plasma membrane and secreted factors are some of the least conserved of all proteins [72], having to evolve to adapt to our unique physiology. As more complex human ePPI networks are discovered, it will be a challenge to understand their impacts at the organismal level. Whether it be organoid systems or better functional assays, with the rapid growth in ePPI identification technologies, we will soon have to find high-throughput ways to ask: what do they do?

#### **5. Executive summary**

• Extracellular protein–protein interactions (ePPIs) make for good drug targets because they control many biological processes and are accessible to therapeutic agents.


