**2.7 Identification of IgE-binding epitopes**

We used the AlgPred server (www.imtech.res.in/raghava/algpred/submission.html), which creates arrays using sequences from known allergens, to identify IgE-binding epitopes and to determine the potential allergenicity of proteins based on their amino acid and dipeptide composition.
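As a rough illustration of what amino acid and dipeptide composition mean as sequence features, the following sketch computes both for a peptide. It is not AlgPred's implementation: the function names and the choice to express values as percentages are assumptions made for this example, and the peptide is one of the T-cell epitope strings listed in Table 7.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq: str) -> dict:
    """Percentage of each standard residue in the sequence (20 features)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: 100.0 * counts[aa] / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq: str) -> dict:
    """Percentage of each ordered residue pair among overlapping pairs (400 features)."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = len(seq) - 1  # number of overlapping dipeptides
    return {a + b: 100.0 * pairs[a + b] / total
            for a, b in product(AMINO_ACIDS, repeat=2)}


# Illustrative input only: an epitope string from Table 7.
peptide = "FPILGWLGL"
aac = amino_acid_composition(peptide)
dpc = dipeptide_composition(peptide)
```

A classifier trained on known allergens would take vectors like `aac` and `dpc` as input; the training step itself is outside the scope of this sketch.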

*Comparative Analysis of Molecular Allergy Features of Seed Proteins from Soybean… DOI: http://dx.doi.org/10.5772/intechopen.106971*

#### **2.8 Identification of T cell binding epitopes**

We used the ProPred program (Singh et al., 2011) (http://webs.iiitd.edu.in/raghava/propred/) to analyze the legume protein sequences in the study. The analysis was performed with a 2% threshold for the most common human HLA-DR alleles in the Caucasian population: DRB1\*0101 (DR1), DRB1\*0301 (DR3), DRB1\*0401 (DR4), DRB1\*0701 (DR7), DRB1\*0801 (DR8), DRB1\*1101 (DR5), and DRB1\*1501 (DR2).

### **3. Results and discussion**

#### **3.1 Sequences obtained from the Allergome database**

We used the Allergome database to retrieve the available sequences of complete proteins of legumes, following the link to UniProt. The legumes included in this study are lentil, lupin, pea, chickpea, and peanut. Only two major allergens (*Gly m 5* and *Gly m 8*) with their available isoforms were extracted from soybean and used as reference to carry out the alignments and further analyses.

The reference proteins, soybean major allergens *Gly m 5* and *Gly m 8* with their isoforms, correspond to the profilin, 7S globulin, and 2S albumin protein families. The allergen *Gly m 8* is considered to have the highest sensitivity [19], specificity, and reproducibility [20] for clinical reactions to soybean in atopic patients. The combination of *Gly m 5* and *Gly m 8* has been suggested as one of the best ways to estimate the sensitization level and to improve the diagnosis of soybean allergy in children [21]. Thus, high similarity between the sequences of these soy allergens and the allergens of the other legumes included in this study could facilitate the diagnosis of possible cross-reactions between them.

#### **3.2 Alignment of allergen protein sequences**

Sequence alignments were performed to identify common and differential features between the soybean reference allergens and the proteins of the other legumes. According to the Codex Alimentarius Commission (2003), only proteins with a percentage of identity greater than 50% by local alignment (BLAST) are considered at risk of allergy or cross-reactivity [22]. By this criterion, the protein–protein alignments do not yield values high enough to predict cross-reactivity between soybean proteins and those of the other legumes (**Table 2**).

The highest percentages of identity resulted from the alignment of *Gly m 5* and its isoform *Gly m 5.0301* (**Table 3**) with the *Lup a 1* protein, with values of 48.41% and 48.72%, respectively (**Table 2D**). These percentages do not exceed the minimum recommended as guidance. Nevertheless, cross-reactivity has been reported between proteins whose identity is below both this minimum and the values found here, as in the case of *Gly m 8* and *Ara h 2* [23], with an identity percentage of 31.46% (**Table 2F**).

The multiple alignment analysis between *Gly m 5* and the isoform *Gly m 5.0301* with the *Lup a 1* protein obtained a percentage of common identity of 35.80% with 207 identical positions (Image 1).
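The identity percentages discussed above can be illustrated with a minimal sketch that counts identical positions in an already aligned pair of sequences and compares the result with the 50% guidance value. The denominator used here (all alignment columns that are not gaps in both sequences) is an assumption, since alignment tools differ on this convention, and the function names and toy aligned strings are hypothetical.

```python
CROSS_REACTIVITY_THRESHOLD = 50.0  # guidance value cited in the text, in percent


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-"):
    """Return (identity in percent, number of identical positions).

    Assumption: columns where both sequences carry a gap are ignored;
    every other column counts toward the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared, identical


def exceeds_threshold(identity_pct: float,
                      threshold: float = CROSS_REACTIVITY_THRESHOLD) -> bool:
    """True when the identity is strictly greater than the guidance value."""
    return identity_pct > threshold


# Toy aligned pair (hypothetical, not taken from the study).
pct, n_identical = percent_identity("AC-DE", "ACKD-")

# Values quoted in the text: Lup a 1 versus Gly m 5.0301, and the two gamma conglutins.
lup_a_1_flag = exceeds_threshold(48.72)
gamma_conglutin_flag = exceeds_threshold(84.21)
```

Under these assumptions, the 48.72% value falls below the guidance threshold while the 84.21% value between the lupin gamma conglutins exceeds it, matching the reading given in the text.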





*Percentages of Amino Acid Sequence Identity by Alignment of Peanut Species against Reference Soybean Sequences*


*Degree of identity resulting from the alignment of amino acid sequences. These have been obtained by alignment between soybean proteins, used as reference, against different legume species (lentil, chickpea, pea, lupine, and peanut) including major allergens and isoforms.*

#### **Table 2.**

*Percentages of amino acid sequence identity by alignment of different legume species against reference soybean sequences.*


#### **Table 3.**

*Summary of the largest (greater than 3%) and smallest differences as a result of legume–soy protein alignment.*

These data show that the percentage of identity is a useful but insufficient criterion for comparing allergens and predicting potential allergenicity and cross-reactivity: besides sequential epitopes, the 3D structure and specific conformations of each allergen protein must also be considered.

Based on the alignment results, some of the proteins compared with soybean could be of interest at the molecular allergy level, such as *Lup a delta conglutin* and *Lup an delta conglutin*, whose percentages of identity with *Gly m 8* and its isoform *Gly m 8.0101* range from 35 to 36%: 35.63% and 36.25%, respectively, for *Lup a delta conglutin* (**Table 2D**), and 35.62% and 36.25%, respectively, for *Lup an delta conglutin* (**Table 2B**). They also show notable differences in alignment percentage with *Gly m 5* and *Gly m 5.0301* (**Table 2B, D**), with approximately 8% being the most notable difference in identity with respect to the other conglutins. These identity values are lower than the minimum considered to establish cross-reactivity with soybean. However, with such similar percentages among conglutin sequences, a deeper analysis is worthwhile.

Multiple alignment shows a high rate of conservation between lupin proteins from the species *L. albus* and *L. angustifolius*. The gamma conglutin sequences of both species showed low identity with soybean: 13–15% compared to *Gly m 5* and 4–5% compared to *Gly m 8* (**Table 2B, D**). Alignment between the two gamma conglutins showed an identity of 84.21%, with 128 identical positions and 12 similar positions (**Figure 1**), a value high enough to consider cross-reactivity between them. The three-dimensional structure of these conglutins is analyzed in later sections (**Figure 2**).

Between the reference points indicated above (the 31% identity of *Ara h 2* with *Gly m 8*, for which cross-reactivity has been demonstrated, and the 48% identity of *Lup a 1* with soybean), we found further proteins with intermediate values. Such is the case of *Pis s 2*, with an identity of 41.638% with *Gly m 5* and its isoform (**Table 2E**), and *Cic a 1*, with 36.76% and 37.58% identity with *Gly m 5* and its isoform, respectively (**Table 2C**). Likewise, *Ara h 1*, for which cross-reactivity between soybean and peanut has been demonstrated, showed 36.59% and 36.75% identity with *Gly m 5* and its isoform *Gly m 5.0301*, respectively [24]. The remaining alignments show percentages below this range and may be excluded from in-depth cross-reactivity (CR) study (**Table 2**).

Interestingly, the soybean isoforms behaved almost identically in the alignments, with differences in identity percentage below 1%. When the major allergen *Gly m 5* and its isoform *Gly m 5.0301* were each compared to the rest of the legume proteins considered in this study, their identity percentages differed by 0.6%; for *Gly m 8* and *Gly m 8.0101*, the difference was 0.47% (**Table 3**). The largest differences between soybean isoforms were found against the following legume proteins: for *Gly m 5*/*Gly m 5.0301*, 5.60% against the chickpea protein *Cic a 6* (**Table 2C**) and 3.65% against the pea protein *Pis s albumin* (**Table 2E**); for *Gly m 8*/*Gly m 8.0101*, 3.07% against the peanut (*A. hypogaea*) protein *Ara h 5.0101* (**Table 2G**). **Table 3** summarizes these data.

Differences between isoforms of the same allergen protein family in other legume species could open the way for new studies, since they may translate into significant differences in cross-reactivity candidacy. For example, *Lup an 1* and *Lup an 1.0101* differ by more than 13% in identity when aligned with *Gly m 5*, with values of 24.46% and 39.74%, respectively (**Table 2B**). These differences make *Lup an 1* an unsuitable candidate for cross-reactivity, whereas its isoform *Lup an 1.0101* could be a candidate for cross-reactivity with soybean.
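The isoform comparison described above amounts to taking the absolute difference between the identity percentages that two isoforms obtain against the same target protein. The sketch below applies this to values quoted in the text; the 3% cut-off follows the "greater than 3%" criterion in the Table 3 caption, while the function names and the record layout are assumptions made for illustration.

```python
def identity_gap(identity_a: float, identity_b: float) -> float:
    """Absolute difference between two identity percentages, rounded to 2 decimals."""
    return round(abs(identity_a - identity_b), 2)


def divergent_isoforms(records, min_gap: float = 3.0):
    """Keep isoform pairs whose identity gap against a shared target exceeds min_gap."""
    flagged = []
    for iso_a, iso_b, target, pct_a, pct_b in records:
        gap = identity_gap(pct_a, pct_b)
        if gap > min_gap:
            flagged.append((iso_a, iso_b, target, gap))
    return flagged


# (isoform A, isoform B, protein aligned against, identity of A, identity of B)
# All percentages are the ones quoted in the text.
RECORDS = [
    ("Lup an 1", "Lup an 1.0101", "Gly m 5", 24.46, 39.74),
    ("Gly m 5", "Gly m 5.0301", "Cic a 1", 36.76, 37.58),
    ("Gly m 5", "Gly m 5.0301", "Ara h 1", 36.59, 36.75),
]

flagged = divergent_isoforms(RECORDS)
```

With these inputs only the *Lup an 1*/*Lup an 1.0101* pair is flagged, with a gap of 15.28 percentage points, consistent with the statement that the difference exceeds 13%.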

#### **3.3 Post-translational modification analysis**

Post-translational modifications affecting the allergen protein sequences have been described, involving processes such as alcohol or thiol addition (glycosylations) and the addition of methyl groups (methylations), phosphates (phosphorylations), carboxyl groups (carboxylations), nitro groups (T-nitrations), or nitrosyl groups (S-nitrosylations).

#### **Figure 1.**

*2D structure of allergen proteins. Multiple alignment of the major Lup a gamma conglutin (*Lupinus albus*) against Lup an gamma conglutin (*Lupinus angustifolius*), with the secondary structure represented in yellow for coil zones and in red for helix zones. Also shown are the percentage of joint identity, the number of identical amino acid positions, and the number of amino acids of similar physicochemical nature.*

**Figure 2.**

*Three-dimensional structural analysis of seed allergen proteins. The first row corresponds to the 3D structures of the Lup a gamma conglutin protein; the second row shows different views of Lup an gamma conglutin; and the third (last) row shows the consensus structure, with matching regions depicted in pink. Red highlights the alpha-helix and yellow the beta-strand.*

These types of modifications may induce rearrangements in structure, which could indirectly affect the influence of linear and/or conformational epitopes on molecular allergy, limiting or favoring immunological recognition as well as generating antigenic diversity [25]. It is therefore of interest to analyze where these modifications may occur, their type, and their influence on the 2D structural elements.

Phosphorylation is considered a factor of change in molecular pH dynamics [26], generating important alterations in the biophysics of the protein [27]. Phosphorylation sites were observed in most of the proteins examined: *Gly m 5*, *Gly m 8*, and their isoforms; *Lup a 1* and the *Lup a* alpha and delta conglutins (*L. albus*); *Lup an 1* and its isoform *Lup an 1.0101*, *Lup an alpha*, *Lup an delta*, and *Lup an gamma* (*L. angustifolius*). In the sequences of *Lup l 4* (*L. luteus*) and *Cic a 6* (*C. arietinum*), modifications such as glycosylations are also abundant, with potential importance for the allergenic behavior of these proteins. In this regard, increased immunogenicity has been demonstrated in some cases [28] for *Gly m 5* and *Gly m 8*; *Lup a 1*, *Lup a 4*, and the *Lup a* alpha, delta, and gamma conglutins; *Lup an 1* and its isoform *Lup an 1.0101*, and the *Lup an* alpha and gamma conglutins; *Lup l 4* and *Cic a 6* (**Table 4**).

Methylations are considerably less abundant. Their deficiency has been observed to cause serious alterations in protein function, with important implications for three-dimensional structure, as is also the case for carboxylation [29]. Only two methylation sites were found: one on *Lup a alpha conglutin* and one on *Lup an alpha conglutin* (**Table 4B**). Carboxylations were found on the *Gly m 8.0101* isoform; *Lup a alpha, delta,* and *gamma conglutins*; *Lup an 1* and its isoform *Lup an 1.0101*; and *Lup an 3* and *Lup an alpha conglutin* (**Table 4A, B**).

Nitrosylation and nitrations generate strong covalent bonds in the protein structure [30, 31]. Nitrations were found on *Lup a 1*, *Lup a 4*, and *Lup a alpha conglutin*; *Lup a gamma conglutin*, *Lup an 1,* and *Lup an 1.0101*; *Lup an 3.0101*, *Lup an alpha* and *gamma conglutin*; *Lup l 4*; *Cic a 6* and *Ara h 5.0101*. Nitrosylations in comparison were less abundant, found in *Lup a alpha conglutin*; *Lup an 3* and its isoform *Lup an 3.0101*, and *Lup an alpha, delta,* and *gamma conglutins* (**Table 4**).

Post-translational modifications located on T-cell epitopes were found in several proteins: the *Gly m 5.0301* isoform, with a glycosylation at position 351 and a nitration at position 172; *Lup a alpha conglutin*, with three methylation sites at positions 199, 448, and 497; *Lup a delta conglutin*, with a glycosylation site at position 76; *Lup an 3*, with a nitrosylation site at position 13, and its isoform, with a nitration at position 104 and a nitrosylation at position 112; *Lup an delta conglutin*, with a candidate phosphorylation site at position 76; and *Cic a 6*, with a nitrosylation at position 107. IgE epitopes are affected in only one case, *Lup a alpha conglutin*, with a methylation site at position 102. **Table 5** summarizes these data.
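Deciding whether a modification affects an epitope reduces to checking whether the modified position lies inside the epitope's amino acid range. The following sketch shows that check. The two modification positions are those reported in the text for *Gly m 5.0301*; the epitope ranges are hypothetical placeholders, because the actual ranges are given only in the tables, and the function name is an assumption.

```python
def ptm_sites_in_epitopes(epitopes, ptm_sites):
    """List (epitope start, epitope end, site position, modification type)
    for every modification site that lies inside an epitope range.

    epitopes  : iterable of (start, end) tuples, 1-based and inclusive
    ptm_sites : dict mapping residue position -> modification type
    """
    hits = []
    for start, end in epitopes:
        for position, kind in sorted(ptm_sites.items()):
            if start <= position <= end:
                hits.append((start, end, position, kind))
    return hits


# Positions reported in the text for the Gly m 5.0301 isoform.
gly_m_5_0301_ptms = {351: "glycosylation", 172: "nitration"}

# HYPOTHETICAL epitope ranges, used only to exercise the function.
hypothetical_epitopes = [(168, 176), (345, 353), (400, 408)]

hits = ptm_sites_in_epitopes(hypothetical_epitopes, gly_m_5_0301_ptms)
```

The same check could be run against domain ranges instead of epitope ranges to see which functional domains carry predicted modifications.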

These post-translational modifications may alter the structure of the regions in which they occur, leading to differential epitope recognition and, consequently, a different allergenic response.

Analyzing the location and type of modifications could help to elucidate the relationship between protein structure, epitope distribution, and the allergenic potential of the protein. However, whether the different modifications accentuate or lessen the allergenic impact cannot be confirmed until a clinical review of the process is carried out. The possibility of inducing post-translational modifications on plant proteins as a therapeutic tool is being examined [27].

#### **3.4 Secondary structure analysis**

The combined analysis of secondary structure and multiple alignments allows a direct sequence–structure–function comparison between different allergen proteins. This analysis was used to identify areas of allergens that share structural domains, with important implications for cross-reactivity potential.

The secondary structures of *Gly m 5*, *Gly m 5.0301*, and *Lup a 1* were compared because, among the sequences of these proteins (**Table 2A**), the percentage of identity with *Lup a 1* was the highest of all the alignments performed (**Table 3**), although it was not high enough to predict cross-reactivity. Comparative analysis of the secondary structure predictions of these proteins shows strong similarities in the distribution of α-helices and β-strands over their middle regions (amino acids 20–430) (**Figure 3**), giving an additional perspective on the regions with potential cross-reactivity beyond the information provided by the alignments.

The three allergen proteins contain Cupin superfamily domains. This superfamily comprises a wide variety of enzymes but notably also contains the non-enzymatic seed storage proteins [32]. In *Lup a 1*, a functional domain that is a candidate to undergo post-translational modification is one of the two barrel domains with antiparallel β-sheets: the first one, Cupin\_1.1 (**Table 6A**), is a candidate for glycosylation (**Table 4B**). Similarly, *Gly m 5* and its isoform *Gly m 5.0301* both present these modifications in their globular domain (antiparallel β-barrels) (**Table 6A**), which is a candidate to undergo glycosylation (**Table 4A**). In all three cases, glycosylation of one of the functional domains is a shared functional and allergenic feature.

*Specific amino acids affected by each type of post-translational modification on the different legume proteins: phosphorylation, glycosylation, carboxylation (pyrrolidone carboxylic acid), methylation, nitrosylation, and nitration sites. The () symbol means no results.*

#### **Table 4.**

*Post-translational modifications predicted over legumes.*

*This table summarizes the T-cell and IgE epitopes directly affected by the main post-translational modifications, indicating the amino acid number affected.*

#### **Table 5.**

*T-cell and IgE epitopes from allergens affected by post-translational modifications.*

*Lup a gamma conglutin* and *Lup an gamma conglutin* were also analyzed. Although they belong to different lupin species, they showed few differences in their alignment with each other and in their comparison with the soybean reference proteins (**Table 2B, D**). Their mutual identity percentage is greater than 50%. These allergen proteins could therefore be expected to exhibit cross-reactivity (CR), owing both to sequence identity and to the similarity of their secondary structures (**Figure 1**).

Regarding the predicted post-translational modifications of these proteins relevant to 2D structural domains, *Lup a gamma conglutin* can be modified by a potential glycosylation (**Table 4B**). This modification is located in the region of the C-terminal xylanase inhibitor domain (**Table 6B**). *Lup an gamma conglutin* has two domains possibly affected by post-translational modifications: a phosphorylation and two nitrosylations (**Table 4B**) affect the C-terminal xylanase inhibitor domain (**Table 6B**), and a glycosylation (**Table 4B**) affects the N-terminal xylanase inhibitor domain (**Table 6B**).

#### **Figure 3.**

*2D structure of allergen proteins. Multiple alignment of the major allergen Gly m 5 and its isoform Gly m 5.0301 (*Glycine max*) with Lup a 1 (*Lupinus albus*), with the secondary structure represented in yellow for coiled-coil zones and in red for helix zones. Also shown are the percentage of joint identity, the number of identical amino acid positions, and the number of amino acids of similar physicochemical nature.*

*This table summarizes the protein domains of the different proteins by type, specifying the range of amino acids that each occupies in the alignment.*

#### **Table 6.**

*Functional domains predicted over legumes allergens.*

#### **3.5 Three-dimensional structure analysis**

Analysis of three-dimensional structure of proteins (**Figure 4**) provides insight into their sequence conformation and epitope arrangement. It also helps to determine the consequences of possible structural changes occurring between protein isoforms with minimal or large number of changes (**Table 2**) in their sequences [33].

Post-translational modifications over protein domains may also change their three-dimensional structure, affecting epitope exposure and increasing or decreasing their allergenic potential.

Some candidates to examine the three-dimensional structure are *Gly m 5*, *Gly m 5.0101*, and *Lup a 1* that share common barrel domains with alternating folds between the α-helix and β-strand. These domains are in a special conformation, forming a solenoid in which the β-strand is arranged on the inside of the toroid and the α-helix is arranged on the outside in the same domain (**Figures 2** and **5**).

The structural differences observed in the consensus of the three structures indicate that *Gly m 5.0301* contains a 2D structural element, corresponding to a β-strand structural connection, that is not present in *Gly m 5*. Neither is it present in *Lup a 1*. This specific and important structural feature could form a specific conformational epitope (**Figures 4** and **5**). The structural change does not contain any epitope sequence. However, it is located in the *Cupin\_1* domain of *Gly m 5* and its isoform, whereas in *Lup a 1* it is located in the Cupin\_1.1 domain (**Table 6A**).

**Figure 4.**

*3D structural analysis of seed allergen proteins. Three-dimensional structures of the* Gly m 5.0301 *proteins are described, followed by* Gly m 5.0301 *and the change points between the two proteins marked in soft pink color in consensus figure (last row). Red denotes the alpha-helix and yellow denotes the beta-strand. T-epitope location is marked by a blue circle.*

Comparison of the three-dimensional structures of *Lup a gamma conglutin* and *Lup an gamma conglutin* revealed differences between both conglutins, the principal one being an α-helix in the gamma conglutin of *L. albus* that is not present in *L. angustifolius* (**Figure 2**). No post-translational modifications are predicted in this region, which encompasses the N-terminal xylanase inhibitor domain (**Table 6B**).

The 3D analysis was also useful to examine other cases of interest mentioned previously, such as *Pis s 2* and *Cic a 1*, which showed considerable identity with *Gly m 5* and its isoform (**Table 2C, E**). *Lup an 1* and *Lup an 1.0101* differed considerably from each other in identity, and even more so when compared to *Gly m 5*, which is to some extent reflected in their 3D structures.

#### **3.6 Identification and analysis of T-cell binding epitopes**

An epitope is the portion of a macromolecule that is recognized by the immune system, specifically the sequence to which antibodies, B-cell receptors, or T-cell receptors can bind to initiate an immune response. Analysis of the epitopes shared by specific allergen proteins could help to identify potential cross-reactivity.

#### **Figure 5.**

*3D structural analysis of seed allergen proteins. Three-dimensional structures of the* Gly m 5 *proteins followed by* Lup a 1 *and representative changes between these two proteins marked in pink in the consensus figure (last row). Red denotes the alpha-helix and yellow denotes the beta-strand. The three-dimensional structures of the proteins* Gly m 5, Gly m 5.0301 *(*Glycine max*), and* Lup a 1 *(*L. albus*) showed a large number of similarities, which is also reflected in the previous analysis of their secondary structure (Figure 3), with two barrel domains common to all of them.*

The presence of common T-cell epitopes among different legume species may support cross-reactivity processes; the larger the number of common epitopes, the greater the probability of cross-reactivity.

The data obtained from the analysis of T-cell epitopes show which epitopes are shared among allergen proteins in the different legume species and allow possible cases of cross-reactivity to be examined. Thus, for soybean (*G. max*), the epitopes shared with peanut (*A. hypogaea*) and chickpea (*C. arietinum*) are described in **Table 7A**. Remarkably, the soybean isoform *Gly m 5.0301* has an epitope in common with *Ara h 9.0101*, while the major allergen *Gly m 5* does not contain this epitope (**Table 7A**). This feature may be related to cross-reactivity between specific sequences and the legume cultivars containing these specific proteins.

Among the lupin species, up to 18 T-cell epitopes are shared between *L. albus* and *L. angustifolius* (**Table 7B** parts 1, 2, 3, and 4). *L. albus* also shares epitopes with *A. hypogaea* (four epitopes) (**Table 7B** parts 1, 2, and 4), *A. duranensis* (one epitope), and *C. arietinum* (also one epitope) (**Table 7** part 1). *L. angustifolius* shares three epitopes with *A. hypogaea* (**Table 7B** parts 2, 3, and 4) and one epitope with *C. arietinum* and *L. culinaris* (**Table 7B** part 3).


Among these allergen proteins, some epitopes are shared by several proteins across species. The same epitope is shared by *Lup a 4*, *Ara h 8.0101*, and *Cic a 4* (**Table 7B** part 1), and another by *Lup an alpha conglutin*, *Lup an 3.0101*, *Ara h 3*, and *Ara h 3.0201* (**Table 7B** part 4). The most widely shared epitope was found in *Lup an 3*, *Lup an 3.0101*, *Ara h 9.0101*, *Ara h 17*, *Cic a 3*, *Len c 3*, and *Len c 3.0101* (**Table 7B** part 3).
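The grouping described above can be sketched as a mapping from each shared epitope to the allergens that carry it, from which the species involved follow from the allergen name prefix. The allergen lists are those given in the text; the epitope keys are placeholder labels because the text does not state the sequences for these groups, and the prefix-to-species table and function names are assumptions for this example.

```python
SPECIES_BY_PREFIX = {
    "Lup a": "L. albus",
    "Lup an": "L. angustifolius",
    "Ara h": "A. hypogaea",
    "Cic a": "C. arietinum",
    "Len c": "L. culinaris",
}


def species_of(allergen: str) -> str:
    """Derive the species from the first two tokens of the allergen name."""
    prefix = " ".join(allergen.split()[:2])
    return SPECIES_BY_PREFIX[prefix]


def species_per_epitope(groups: dict) -> dict:
    """Map each epitope label to the sorted list of species sharing it."""
    return {label: sorted({species_of(a) for a in allergens})
            for label, allergens in groups.items()}


# Placeholder labels; allergen lists as given in the text (Table 7B parts 1, 4, 3).
SHARED_EPITOPES = {
    "epitope_part1": ["Lup a 4", "Ara h 8.0101", "Cic a 4"],
    "epitope_part4": ["Lup an alpha conglutin", "Lup an 3.0101",
                      "Ara h 3", "Ara h 3.0201"],
    "epitope_part3": ["Lup an 3", "Lup an 3.0101", "Ara h 9.0101", "Ara h 17",
                      "Cic a 3", "Len c 3", "Len c 3.0101"],
}

by_species = species_per_epitope(SHARED_EPITOPES)
# Epitope carried by the largest number of allergens.
most_shared = max(SHARED_EPITOPES, key=lambda label: len(SHARED_EPITOPES[label]))
```

Counting species rather than allergens avoids overstating sharing when several isoforms of one protein carry the same epitope.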

Prediction of secondary and tertiary structures allowed us to determine the spatial location of epitopes in the proteins of interest and to assess whether their spatial arrangement may be affected by post-translational modifications in protein domains.

Analysis of *Gly m 5*, *Gly m 5.0301*, and *Lup a 1* also showed that the T-cell epitope regions found in these proteins form part of their functional barrel domains. In *Gly m 5*, a single T-cell epitope (**Table 6A**) is located in the region of the structural domain between β-strands (**Figure 5**). This region lies within the *Gly m 5* barrel domain (Cupin\_1) (**Table 6A**), close to the glycosylation site (**Table 5A**). This structural epitope is of special interest because of its specificity, location, and the potential specific allergenicity induced by this protein.

The T-cell epitope analysis of the *Lupinus* gamma conglutins identified two epitopes in the C-terminal xylanase inhibitor domain and one in the N-terminal xylanase inhibitor domain of *L. albus* (**Table 6B**, **Table 7B** parts 1 and 2), and one in the N-terminal xylanase inhibitor domain of *L. angustifolius* (**Table 6B** and **7** part 1). These epitopes are not directly or proximally affected by post-translational modifications, although modifications do affect the domains in which they are located.

The epitope regions shared between the *L. albus* and *L. angustifolius* conglutins are thus the most abundant compared to other shared epitopes (**Table 7B**). This supports the idea of conservation of protein structures and corroborates the data obtained by simple comparative alignment.

#### **3.7 Identification and analysis of IgE-binding epitopes**

IgE antibodies are produced by B cells, which in turn are stimulated by the T cells responsible for recognizing the epitope in a sensitization step. To trigger the allergic inflammatory process, IgE antibodies stimulate the release of histamine. Thus, identifying these sequences allows for predicting the recognition capacity of IgE antibodies and whether they will potentially trigger the allergic response (**Figure 6**).

The allergenic nature of the above proteins was assessed based on their amino acid and dipeptide composition. It is noticeable that the 30 cases with clinically confirmed allergenic epitopes are predicted by their sequence to have an allergenic nature, as is the case of *Gly m 8* (**Table 8B**), *Ara h 13.0102*, and *Ara h 15.0101* (**Table 8D**). Other potential allergens are *Lup a 4* (**Table 8A**), *Lup an 3* and *Lup an 3.0101* (**Table 8A**), and *Lup an delta conglutin*; *Pis s 3, Pis s 3.0101, Pis s 6, Pis s agglutin,* and *Pis s albumin* (**Table 8B**); *Ara h 5, Ara h 5.0101* (**Table 8C**), *Ara h 8, Ara h 8.0101, Ara h 8.0102* (**Table 8D**); as both: 43 *Lup l 4* (**Table 8A**); *Ara h 17* (**Table 8D**) and *Cic a 3* (**Table 8C**).

Other proteins assessed as ambiguous or non-allergenic even though they present bibliographic and clinical antecedents of being allergenic include *Lup a gamma* conglutin [34] and *Lup an gamma conglutin* [35] (**Table 8A**); *Ara h 10.0101* [36], *Ara h 11.0101*, and *Ara h 11.0102* [37]; and *Ara i 2.0101* and *Ara i 6.0101* [38] (**Table 8C**).


**T-cell epitopes:** FQRLNALEP, LRCAGVALS, IRVLERFDQ, FGPLRRCN, VVLNGRATITI, IVRNIKGKN

| Allergen name | Amino acid ranges (left to right) |
| --- | --- |
| Lup an 1.0101 | 80% (IRVLERFNQ) 204–213; 248–260 |
| Lup an alpha conglutin | 86–94; 115–123; 286–294 |
| Lup an delta conglutin | 191–198 |
| Ara h 1 | 80% (IRVLQRFDQ) 204–212 |
| Ara h 1.0101 | 80% (IRVLQRFDQ) 193–201 |

**B part 3. T-cell epitopes:** IVRVSREQI, IRVNKHM, VRRVRRPH, WRISDEN

| Allergen name | Amino acid ranges (left to right) |
| --- | --- |
| Lup a 1 | 302–310 |
| Lup a alpha conglutin | 355–363 |
| Lup a gamma conglutin | 318–326; 412–420 |
| Lup an 1 | 77% (IVRVSKKQI) 373–381 |
| Lup an 1.0101 | 77% (IVRVSKKQI) 373–381 |
| Lup an 3.0101 | 360–367 |
| Lup an delta conglutin | 88% (IRVNKHL) 324–332; 88% (WRISSEN) 421–429 |

**B part 4. T-cell epitopes:** FPILGWLGL, FVIPAGYPI, FVPYYNVNA, YVLNGSAWF, YVAFKTNDI, YKFLVPPPQ

| Allergen name | Amino acid ranges (left to right) |
| --- | --- |
| Lup a 1 | 433–442 |
| Lup a 4 | |
| Lup a alpha conglutin | 411–418; 432–444; 445–452; 493–501; 542–550 |
| Lup an 3.0101 | 88.88% (FPILRWLGL) 413–421; 434–442; 447–455; 495–503; 544–552 |
| Ara h 3 | 77% (FVPHYNTNA) 404–412 |
| Ara h 3.0201 | 77% (FVPHYNTNA) 454–465 |



*This table lists the T-cell epitopes shared on at least two occasions by different species, giving the range of amino acids in which they are located and, where the match is not exact, the percentage of identity with the epitope.*

#### **Table 7.**

*Range of amino acids occupied by T-cell epitopes shared among legumes.*
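The two quantities reported in Table 7, the amino acid range of an epitope and its percentage of identity when the match is not exact, can be illustrated with a sliding-window comparison. This is a simplified sketch and not the procedure used by the prediction servers: the function name is an assumption and the protein strings are toy sequences built around the epitope variants listed in the table.

```python
def best_epitope_match(protein: str, epitope: str):
    """Slide the epitope along the protein and return the best ungapped match
    as (start, end, identity in percent), with 1-based inclusive positions.
    Ties are resolved in favor of the first window found."""
    k = len(epitope)
    if k == 0 or len(protein) < k:
        raise ValueError("protein must be at least as long as a non-empty epitope")
    best = (0, 0, -1.0)
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        matches = sum(a == b for a, b in zip(window, epitope))
        pct = 100.0 * matches / k
        if pct > best[2]:
            best = (start + 1, start + k, pct)
    return best


# Toy proteins (hypothetical flanking residues around epitope variants from Table 7).
exact = best_epitope_match("AAFPILGWLGLKK", "FPILGWLGL")
variant = best_epitope_match("MKTFPILRWLGLAAG", "FPILGWLGL")
```

For the variant FPILRWLGL, eight of nine residues match the reference epitope FPILGWLGL, which corresponds to the 88.88% listed in the table.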

Analysis of *Gly m 5*, *Gly m 5.0301*, and *Lup a 1* showed that the IgE epitopes found on these proteins are part of their functional barrel domains. In *Lup a 1*, two epitopes are located in the Cupin\_1.1 domain, which is not affected by post-translational modifications. The soybean protein *Gly m 5* contains an IgE epitope inside the *Cupin\_1* domain; *Gly m 5.0301* contains the same epitope in the same region, at different positions and without modifications. However, *Gly m 5.0301* does contain an epitope directly affected by glycosylation within the structural *Cupin\_1* domain, at position 351 (**Tables 5A**, **6A**, and **9A**).

The clinically proven epitopes found in the sequence analysis allowed us to observe how many IgE epitopes are shared between proteins of different species, and to what extent, and to assess potential cross-reactivity. According to the results, some of the candidate species for cross-reactivity with soybean (*G. max*) are peanut (*A. hypogaea*), with three shared IgE epitopes, and lupin (*L. albus*), with one epitope in common (**Table 9A**). These findings are supported by bibliographic reports [38]. *L. albus* also shares four epitopes with *A. hypogaea* and *L. angustifolius* (**Table 9A**), and two others with *A. hypogaea*. Among closely related species, the peanut species *A. duranensis* and *A. hypogaea* shared ten common epitopes (**Table 9B, C**), and the *Lupinus* species shared four (**Table 9A, B**).

**Figure 6.** *Summary of the epitope recognition process.*

*The table summarizes the predictions about the allergenic potential of proteins based on the amino acid and peptide composition. The symbol () means that the protein has clinically proven epitopes.*

#### **Table 8.**

*Allergenic legume character prediction.*

*This table summarizes the IgE epitopes clinically confirmed in different species, and the accuracy percentage of these epitopes found according to the protein sequence.*

#### **Table 9.**

*IgE epitopes shared between different legume species.*

*This table summarizes the IgE epitopes clinically confirmed in different species, and the accuracy percentage of these epitopes found according to the protein sequence.*

#### **Table 10.**

*IgE epitopes shared only by same legume species.*

In addition, shared T-cell epitopes have been found among species other than soybean: *L. albus* and *L. angustifolius* (**Table 9A, B**), though not *L. luteus*; *A. hypogaea* (**Table 9A**-**D**) and *A. duranensis* (**Table 9B**-**D**); *C. arietinum* (**Table 9A, B**); and *P. sativum* (**Table 9A**). Previous studies have identified these epitopes as relevant to sensitization between allergens of different species with similar structure and sequence, which leads to the development of allergic cross-reactions [38, 39].
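Counting the epitopes shared by each pair of species, as discussed above, amounts to a pairwise set intersection. The sketch below assumes that epitopes are compared as exact peptide strings, which is a simplification (the tables report a similarity percentage rather than exact identity). The function name, species labels, and peptide sequences are hypothetical placeholders, not data from this study.

```python
from itertools import combinations


def shared_epitopes(epitopes_by_species: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Return, for every pair of species, the set of epitopes they have in common.

    Pairs with no shared epitope are omitted from the result.
    """
    shared = {}
    for sp_a, sp_b in combinations(sorted(epitopes_by_species), 2):
        common = epitopes_by_species[sp_a] & epitopes_by_species[sp_b]
        if common:
            shared[(sp_a, sp_b)] = common
    return shared


# Placeholder peptides, for illustration only (not real epitopes).
example = {
    "species_A": {"AAAAKL", "GGSTPE", "QQRRLL"},
    "species_B": {"AAAAKL", "QQRRLL"},
    "species_C": {"MMNNPP"},
}

result = shared_epitopes(example)
for pair, epitopes in result.items():
    print(pair, len(epitopes))
```

A similarity-based comparison could replace the exact intersection by scoring each pair of peptides and keeping those above a chosen threshold.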

Notably, different isoforms of the same protein may or may not present the same IgE epitope and, when they do, the epitope does not necessarily show the same degree of similarity. Relating this to the information obtained from the alignments, we conclude that the small sequence differences observed between isoforms of the same protein can be key to conformation and to epitope presence (**Table 10**).
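To make the idea of "small sequence differences between isoforms" concrete, the following sketch computes the percent identity of two pre-aligned sequences and lists the alignment columns where they differ. It assumes the sequences are already aligned to equal length with `-` as the gap character. The function names and the two short sequences are hypothetical and serve only as an illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over alignment columns that are not gap-only."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" and res_b == "-":
            continue  # skip columns where both sequences have a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared


def differing_positions(aligned_a: str, aligned_b: str) -> list[int]:
    """1-based alignment columns at which the two sequences differ."""
    return [i for i, (res_a, res_b) in enumerate(zip(aligned_a, aligned_b), start=1)
            if res_a != res_b]


# Placeholder isoform fragments, for illustration only.
isoform_1 = "MKTLLVAAGSRQEPLK"
isoform_2 = "MKTLLVAAGSKQEPLK"

identity = percent_identity(isoform_1, isoform_2)
diffs = differing_positions(isoform_1, isoform_2)

print(f"{identity:.2f}% identity; differing columns: {diffs}")
```

Checking whether any column in `diffs` falls inside an epitope's coordinates would then indicate whether a substitution could alter that epitope in one isoform but not the other.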

### **4. Conclusions**

This chapter presented a study of functional and allergenic features of legume seed proteins.

Analysis of allergenic legume proteins and all their available isoforms allowed us to extract shared epitopes that can be linked to cross-reactivity among the eight studied species (*G. max*, *A. hypogaea*, *L. albus*, *L. angustifolius*, *A. duranensis*, *C. arietinum*, *P. sativum*, and *L. culinaris*). Shared epitopes were not found with soybean or with the rest of the legume allergens examined from *A. duranensis*.

Small differences (less than 1%) in the amino acid sequences of isoforms of the same allergen implied important changes in epitope conformation and in the sequences of epitopes recognizable by T cells and IgE. These small differences also implied changes in 2D and 3D structural conformation that may affect functional protein domains. Analysis of post-translational modifications identified possible phosphorylation, glycosylation, carboxylation, methylation, nitrosylation, and nitration sites in functional protein domains, near or directly within different types of epitopes, with potential influence on the allergenic response.

Primary sequence alignments together with three-dimensional protein modeling allowed us to study the conservation of proteins such as conglutin gamma among different *Lupinus* species, and to assess their potential allergenicity.

The changes described close to the sequence of the epitopes, or related to their spatial distribution, may involve potential alterations in protein allergenicity.

Obtaining reliable clinical data on legume allergies in developing countries could be helpful in clarifying whether the increase in food allergies is actually due to poor dietary habits and increasing industrialization processes.

Further studies on the characterization of more allergenic proteins, including isoforms of major allergens already described, and of not only sequential but also three-dimensional conformational epitopes, could be a great advance for the prevention of cross-reactivity and for improving knowledge of the allergies produced by legumes. This in turn could promote the introduction of legumes as a substitute for other foods of lower nutritional quality and greater environmental impact.
