Multiple-Allele MHC Class II Epitope Engineering by a Molecular Dynamics-Based Evolution Protocol
Abstract
Epitopes that bind simultaneously to all human alleles of Major Histocompatibility Complex class II (MHC II) are considered one of the key factors for the development of improved vaccines and cancer immunotherapies. To engineer MHC II multiple-allele binders, we developed a protocol called PanMHC-PARCE, based on the unsupervised optimization of the epitope sequence by single-point mutations, parallel explicit-solvent molecular dynamics simulations and scoring of the MHC II-epitope complexes. The key idea is to accept mutations that not only improve the affinity but also reduce the affinity gap between the alleles. We applied this methodology to enhance a Plasmodium vivax epitope for multiple-allele binding. In vitro rate-binding assays showed that four engineered peptides bound with improved affinity to multiple human MHC II alleles. Moreover, we demonstrated that mice immunized with the peptides exhibited an interferon-gamma cellular immune response. Overall, the method enables the engineering of peptides with improved binding properties that can be used for the generation of new immunotherapies.
1 Introduction
Peptides have been used in vaccine formulations to trigger specific immune responses toward a particular disease (1). These peptides, acting as epitopes, are able to bind receptors such as the Major Histocompatibility Complex class II (MHC II) (e.g., Figure 1A), which displays them on cell surfaces for T-cell recognition and immune-response activation (2). The use of artificial peptides capable of mimicking natural epitopes has also been proposed (3). For example, designed peptides have been used for immunological therapies (4), and as neoepitopes targeting tumor-specific mutations, with the potential for becoming cancer vaccines (5, 6).
Peptides have also been successfully used as adjuvants, namely components of a vaccine which increase the immune response (7–9). A primary example of this is PADRE (Pan DR T-helper Epitope), which was developed as a universal T-helper epitope (10). This peptide helps to trigger the immunological machinery, and complements the specific response generated by the antigenic sub-units (11). PADRE was designed through multiple and expensive experimental trials. The targets of PADRE, and of many other adjuvant peptides, are MHC II receptors.
In humans, MHC II exists in several different alleles, some of which differ by mutations localized in the peptide-binding pocket. Consequently, peptides targeting this receptor typically have different binding affinities for different alleles. Multiple-allele T-helper epitopes could be incorporated into vaccines and immunological agents, leading to improved activity across human populations (2, 12). For example, PADRE binds with high affinity to multiple alleles, and has been evaluated in vitro and in vivo with positive results (13, 14). However, PADRE has no pathogen specificity (i.e., no information about pathogenic epitopes was used to create its consensus sequence), which limits its capability to target a specific disease. Moreover, designing peptides through trial-and-error experiments is extremely challenging, given the large number of possible mutations and the costs and time required to test potential candidates experimentally (15).
Machine learning models trained on peptide-sequence datasets (16) aid the prediction of binding affinities of epitopes toward different MHC II alleles (17, 18). The IEDB consensus tool (19) combines three methods [CombLib (20), SMM-align (21) and Sturniolo (22)] to score epitopes toward specific MHC II alleles by comparing them against five million random peptide sequences. NetMHCIIPan (23) predicts peptide binding to any MHC II molecule using artificial neural networks. Machine learning strategies can also predict the likelihood of antigen presentation in the context of specific HLA class I alleles (24), or of both class I and II alleles, when trained with additional data such as mass spectrometry measurements (25, 26). Multiple-allele scoring prediction of MHC II epitopes has also been developed using machine learning methods trained with a variety of data (27, 28). However, extending these methods to engineer new epitope sequences with improved binding is challenging, because single-point mutations can trigger large affinity changes that are difficult to predict (29).
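To make the percentile-rank idea concrete, the following minimal Python sketch scores a query peptide relative to a background of random peptides and combines several methods. It is not the IEDB implementation: we assume that lower predicted scores mean stronger binding, that the caller supplies the background scores, and that the consensus is the median percentile rank across methods. The method names in the example are hypothetical.

```python
from bisect import bisect_right
from statistics import median
from typing import Dict, Sequence


def percentile_rank(query_score: float, background_scores: Sequence[float]) -> float:
    """Percentage of background (random) peptides predicted to bind at least as
    strongly as the query. Assumes lower scores mean stronger binding, so a
    small percentile rank indicates a strong predicted binder."""
    ordered = sorted(background_scores)
    n_as_good = bisect_right(ordered, query_score)  # background scores <= query
    return 100.0 * n_as_good / len(ordered)


def consensus_rank(query_scores: Dict[str, float],
                   backgrounds: Dict[str, Sequence[float]]) -> float:
    """Median of the per-method percentile ranks (assumed combination rule)."""
    ranks = [percentile_rank(query_scores[m], backgrounds[m]) for m in query_scores]
    return median(ranks)


if __name__ == "__main__":
    # Toy background: 100 evenly spaced hypothetical scores shared by all methods.
    background = [float(i) for i in range(1, 101)]
    query = {"method_a": 10.0, "method_b": 20.0, "method_c": 5.0}
    print(consensus_rank(query, {m: background for m in query}))
```

With three toy methods giving percentile ranks of 10, 20 and 5, the consensus is 10. A small consensus rank flags a likely binder, but such a ranking says little about how a single mutation will shift the affinity.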
To address this problem, one can resort to de novo design, in which peptides are engineered based on the physico-chemical properties of their interaction with the targets. This route has been successful in the design of antimicrobial and membrane pore-forming peptides (30, 31), of peptides capable of binding with high affinity to organic molecules (32), and of antibody fragments (33). The wide availability of MHC I and II structures (34, 35) has greatly facilitated the targeted design of peptide binders. Recently, it has been shown that structural modelling toward MHC I, in combination with NetMHC predictions, brings valuable insights into neoepitope immunogenicity (36). De novo design of MHC I binders targeting multiple alleles has also been attempted (37). Rosetta applications (38) have been used to design epitope scaffolds for neutralizing antibodies (39), and to reduce the immunogenicity of a target protein by searching for its potential epitopes and designing less immunogenic sequences (40). Recently, an energy term for the same software has been customized to deimmunize biological entities toward MHC II (41). However, engineering peptides with higher affinities toward multiple MHC II alleles is still a major challenge, owing to the highly dynamic nature of MHC II-peptide complexes.
In this work, we developed an in silico design approach to address these challenges and to engineer peptides with improved experimental binding affinity toward many MHC II alleles simultaneously. Based on extensive molecular dynamics (MD) simulations, our methodology runs multiple designs (42) in parallel, one for each allele, and automatically selects epitope mutations that increase the affinity for the majority of the alleles, rejecting those that increase it unevenly. Using this approach, we engineered modifications to a Plasmodium vivax (P. vivax) epitope to simultaneously improve its affinity toward several human MHC II (HLA) alleles. A small subset of sequences was tested in vitro, and four modified peptides showed better affinities. To confirm their ability to induce specific immune responses ex vivo, each peptide was mixed with the adjuvant AddaVax and used to immunize C57BL/6 mice. All peptides induced specific interferon gamma (IFNγ)-producing cells, showing that the engineered peptides remained immunogenic. This work opens the route to in silico multiple-allele epitope engineering.
2 Results
2.1 Multiple-Allele Peptide Engineering Protocol
The multiple-allele engineering protocol enables the design of peptides capable of binding with high affinity to multiple targets (Figure 1). Its key innovation is to run multiple PARCE (42) simulations in parallel and to accept mutations that increase the affinity to the majority of targets. The protocol starts with a given epitope sequence and requires an atomistic model of the peptide bound to each allele. Figure 1A shows an example of a starting structure for a single MHC II allele. A sequence of single-point mutations is then attempted (see the Methods). For each mutated peptide, MD simulations are run in parallel, one for each allele (Figure 1B). Each MD trajectory is then scored using several scoring functions (see the Methods). The mutation is considered favorable for an allele if at least half of the scoring functions (three of six; see the Methods) predict an improved score. The new peptide sequence is then accepted if it is favorable for at least 75% of the alleles; otherwise, it is rejected (Figure 1C). The protocol therefore accepts mutations that not only improve the affinity but, importantly, also reduce the affinity gap between the alleles. The process is iterated until the estimated affinity toward the targets no longer improves significantly. Figure 1D shows an example of the evolution of the scoring functions, which collectively shift toward better scores for the multiple alleles. An animation of the design process for an Influenza A virus peptide is shown in Supplementary Video 1.
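The accept/reject loop can be sketched in a few lines of Python. This is a hypothetical illustration, not the PanMHC-PARCE code: the per-allele MD simulation and consensus scoring are replaced by a caller-supplied `evaluate` function, the mutation is drawn uniformly over all positions, and the names `propose_single_point_mutation` and `multi_allele_design` are our own.

```python
import random
from typing import Callable, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical evaluator: returns True if a mutation is judged favorable for one
# allele. In the protocol this step is an MD simulation plus consensus scoring.
Evaluator = Callable[[str, str, str], bool]


def propose_single_point_mutation(seq: str, rng: random.Random) -> str:
    """Replace one randomly chosen residue with a different amino acid."""
    pos = rng.randrange(len(seq))
    new_aa = rng.choice([aa for aa in AMINO_ACIDS if aa != seq[pos]])
    return seq[:pos] + new_aa + seq[pos + 1:]


def multi_allele_design(seq: str, alleles: Sequence[str], evaluate: Evaluator,
                        n_attempts: int, rng: random.Random,
                        min_allele_fraction: float = 0.75) -> Tuple[str, List[str]]:
    """Accept a mutation only if it is favorable for at least
    `min_allele_fraction` of the alleles; otherwise keep the current sequence."""
    accepted = [seq]
    for _ in range(n_attempts):
        candidate = propose_single_point_mutation(seq, rng)
        # Each allele is evaluated independently (run in parallel in practice).
        n_favorable = sum(evaluate(allele, seq, candidate) for allele in alleles)
        if n_favorable >= min_allele_fraction * len(alleles):
            seq = candidate
            accepted.append(seq)
    return seq, accepted


if __name__ == "__main__":
    demo_rng = random.Random(7)

    def toy_evaluate(allele: str, old: str, new: str) -> bool:
        # Placeholder for MD + scoring: a random verdict per allele.
        return demo_rng.random() < 0.8

    final, history = multi_allele_design(
        "DYDVVYLKPLAGMYK", ["DRB1*01:01", "DRB1*03:01", "DRB1*04:01", "DRB1*15:01"],
        toy_evaluate, n_attempts=20, rng=demo_rng)
    print(final, len(history) - 1, "accepted mutations")
```

With four alleles, the 75% threshold means that three favorable alleles suffice, while a mutation favorable for only two alleles is rejected even if it improves those two strongly.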
2.2 Optimization Design of an Influenza Epitope Toward a Single MHC II Allele
We first benchmarked the protocol on the design of an epitope for a single MHC II allele. As the starting sequence, we chose an Influenza A virus peptide from the Hemagglutinin antigen, with sequence YPKYVKQNTLKLAT. For this epitope, structural information and binding data toward the allele DRB1*01:01 (IC50 = 130 nM) are available (20). We tested different mutation strategies (Figure 2A and Supplementary Note 1), with 100 attempted mutations per strategy, in order to observe convergence of the scoring functions (see Supplementary Figure 1). Uniformly distributed random mutations in the peptide sequence favour sequences with many hydrophobic amino acids, which are prone to aggregation. This motivated us to exclude sequences that had many hydrophobic amino acids or that violated many empirical peptide-synthesis rules (see Supplementary Note 2). We also explored enhancing the mutation probability in the peptide core region using bioinformatics-derived information (Figure 2A).
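A simple way to screen out aggregation-prone designs is a hydrophobicity cutoff. The study's own hydrophobicity scale and synthesis rules are given in its Supplementary Note 2 and are not reproduced here, so this Python sketch swaps in the Kyte-Doolittle GRAVY score with a hypothetical cutoff of 1.0; both choices are assumptions for illustration only.

```python
# Kyte-Doolittle hydropathy values, used here as a stand-in scale (the study's
# own scale is described in its Supplementary Note 2 and is not reproduced).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def passes_hydrophobicity_filter(seq: str, max_gravy: float = 1.0) -> bool:
    """Reject overly hydrophobic, aggregation-prone sequences.
    max_gravy = 1.0 is a hypothetical cutoff, not a value from the study."""
    return gravy(seq) <= max_gravy


if __name__ == "__main__":
    # The last sequence is a hypothetical, deliberately hydrophobic variant.
    for pep in ("YPKYVKQNTLKLAT", "YSKYKKQNTLKAAT", "YIVIVLQNTLKLAV"):
        print(pep, round(gravy(pep), 2), passes_hydrophobicity_filter(pep))
```

In a design loop, such a check would be applied to each accepted sequence before it is promoted to longer MD runs.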
The different runs yielded a total of 105 peptide candidates that are potentially better binders than the original peptide (see Supplementary Tables 1–5). Figure 2B shows the average rank of each peptide (using the six scoring functions and calculated over 5 ns of MD) as a function of the number of violated empirical synthesis rules and a hydrophobicity scale. We used the average rank together with physicochemical and similarity filters (see Methods and Supplementary Note 2) to select a small set of 22 candidates for longer (200 ns) MD simulations. By monitoring several structural observables (e.g., the Cα RMSD shown in Figure 2C), we selected the 17 most stable complexes. The average rank, calculated from the average scores over the last 100 ns, was also used to select the peptide candidates for experimental testing. Two controls with reported positive binding to this MHC II allele (the PADRE epitope AKFVAAWTLKAAA and the influenza immunogenic epitope PKYVKQNTLKLAT) were also scored with the same procedure (see Supplementary Table 6 for details).
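The average rank combines scoring functions whose raw values are on different scales. A minimal Python sketch of the idea follows, assuming every function has already been oriented so that a lower time-averaged score means better predicted binding, and breaking ties by input order; the study's exact tie handling is not stated here.

```python
from typing import Dict


def average_rank(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """scores[function][peptide] -> time-averaged score of that peptide.
    Each function is assumed to be oriented so that lower = better; ties are
    broken by insertion order. Returns each peptide's mean rank (1 = best)."""
    functions = list(scores)
    peptides = list(scores[functions[0]])
    totals = {p: 0 for p in peptides}
    for fn in functions:
        # Rank peptides by this function's score, best first.
        ordered = sorted(peptides, key=lambda p: scores[fn][p])
        for rank, peptide in enumerate(ordered, start=1):
            totals[peptide] += rank
    return {p: totals[p] / len(functions) for p in peptides}


if __name__ == "__main__":
    toy = {  # hypothetical scores for three peptides and two functions
        "function_1": {"pepA": 1.0, "pepB": 2.0, "pepC": 3.0},
        "function_2": {"pepA": 3.0, "pepB": 1.0, "pepC": 2.0},
    }
    print(average_rank(toy))
```

Ranking before averaging makes the consensus insensitive to the different units and ranges of the individual scoring functions.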
The selected candidates, the original peptide and the two controls were synthesized and analysed using the ProImmune REVEAL® MHC-peptide binding assay, which assessed their level of binding to the MHC II (HLA) allele DRB1*01:01 (Figure 2D). The ProImmune REVEAL® binding score for each MHC-peptide complex is calculated at 0 and 24 hours relative to the binding of the positive controls (PADRE and the Influenza peptide) at 0 hours. The results are shown in Figure 2E. Interestingly, one peptide (YSKYKKQNTLKAAT, pep8) performed much better than the original peptide sequence, and even better than the PADRE control used in the experiment. This peptide carries only three substitutions with respect to the original sequence. Another peptide, YHVYKKVNTLLCKT (pep9), performed similarly to the original peptide, despite being highly modified, with substitutions at seven positions in both the core and flanking regions. We highlight that these two peptides were identified within a relatively small pool of 17 peptides. In addition, the binding of these peptides remains stable after 24 hours (Figure 2E), which is crucial for MHC presentation (43). From a virtual screening perspective, this is a highly positive result, which motivated us to design epitopes for multiple-allele binding.
2.3 Multiple-Allele Binding Engineering of a P.vivax Epitope
We selected the P. vivax epitope with sequence DYDVVYLKPLAGMYK, which has been assayed against multiple MHC II alleles and elicited positive immunological responses in an animal model (44). An advantage of this epitope is that it belongs to Merozoite Surface Protein 1 (MSP-1), which is also used as a source of epitopes for P. falciparum (45). Because of the available experimental data and the differences in binding affinities between the MHC II alleles, DRB1*01:01 (IC50 = 1 nM), DRB1*15:01 (IC50 = 792.9 nM), DRB1*04:01 (IC50 = 1636.1 nM) and DRB1*03:01 (IC50 = 17807.9 nM), the sequence is a suitable starting point for the design of a better multiple-allele epitope. Note that this epitope is highly active toward allele DRB1*01:01, but its affinity is several orders of magnitude lower for the other alleles, and lowest for allele DRB1*03:01.
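The "affinity gap" that the protocol tries to reduce can be quantified from these IC50 values as log10 fold differences relative to the best allele. The Python sketch below does this arithmetic; expressing the gap this way is our illustrative choice, since the protocol itself compares MD-based scores rather than IC50 values.

```python
import math
from typing import Dict

# Experimental IC50 values (nM) of DYDVVYLKPLAGMYK, as quoted in the text.
IC50_NM = {
    "DRB1*01:01": 1.0,
    "DRB1*15:01": 792.9,
    "DRB1*04:01": 1636.1,
    "DRB1*03:01": 17807.9,
}


def log10_gaps(ic50: Dict[str, float]) -> Dict[str, float]:
    """Orders of magnitude by which each allele's IC50 exceeds the best one."""
    best = min(ic50.values())
    return {allele: math.log10(value / best) for allele, value in ic50.items()}


gaps = log10_gaps(IC50_NM)
for allele, gap in gaps.items():
    print(f"{allele}: {gap:.2f} orders of magnitude above the best allele")
```

This gives gaps of about 2.9, 3.2 and 4.3 orders of magnitude for DRB1*15:01, DRB1*04:01 and DRB1*03:01, respectively, so a mutation that mainly helps DRB1*03:01 narrows the spread the most.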
Two alternative mutation strategies were attempted (Figure 3A) using the multiple-allele engineering protocol presented in section 2.1 (Figure 1B). For the first strategy (strategy 6 in Figure 3A), we selected peptide positions that were in contact with non-conserved amino acids of the alleles, as shown by a multiple sequence alignment of orthologs of the MSP-1 antigen in Plasmodium species (positions 7, 9, 11 and 13; see Figure 3C). The motivation for this strategy is that mutations in contact with polymorphic residues of the target might impact binding without dramatically disrupting the immune response. The second strategy (strategy 7 in Figure 3A) consisted of modifying amino acids in the flanking regions. This conserves the core interactions and avoids drastically affecting immunogenicity. In this strategy, any of the six flanking amino acids could be mutated at random.
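Both strategies amount to restricting which positions a single-point mutation may touch. In this Python sketch, the strategy 6 positions (7, 9, 11, 13) come from the text, whereas the flanking set (three residues at each end of the 15-mer) is a hypothetical choice, since the exact flanking positions are not listed here; `restricted_mutation` is likewise our own name.

```python
import random
from typing import Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# 1-based positions. Strategy 6 positions are taken from the text; the flanking
# set is a hypothetical choice of three residues at each end of the 15-mer.
STRATEGY_6_POSITIONS = (7, 9, 11, 13)
FLANKING_POSITIONS = (1, 2, 3, 13, 14, 15)


def restricted_mutation(seq: str, positions: Sequence[int], rng: random.Random) -> str:
    """Mutate one residue, chosen only among the allowed 1-based positions,
    to a different amino acid."""
    pos = rng.choice(positions) - 1
    new_aa = rng.choice([aa for aa in AMINO_ACIDS if aa != seq[pos]])
    return seq[:pos] + new_aa + seq[pos + 1:]


if __name__ == "__main__":
    rng = random.Random(42)
    reference = "DYDVVYLKPLAGMYK"
    print(restricted_mutation(reference, STRATEGY_6_POSITIONS, rng))
    print(restricted_mutation(reference, FLANKING_POSITIONS, rng))
```

Keeping the allowed positions as data makes it easy to swap strategies without touching the acceptance logic.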
The design for multiple-allele binding was run with both strategies, yielding 42 designed sequences. Most of the substitutions are found in the C-terminal region, and the sequences carry at most four substitutions with respect to the original peptide. Figure 3D shows the number of alleles with improved scores as a function of the hydrophobicity scale and the number of empirical synthesis rules violated per peptide. To select a small set of candidates, we also took into account the number of empirical solubility rules violated (see Methods). From these results, a total of 18 designed sequences (yellow circles in Figure 3D) were prioritized and subjected to 100 ns MD simulations (see Supplementary Tables 7, 8). The long MD was used to calculate an average rank per allele from the individual rank of each scoring function evaluated over the last half of the trajectory (Figure 3E). We note that, for most sequences, better affinities were predicted for the alleles with the worst experimental IC50 values (i.e., DRB1*04:01 and DRB1*03:01), which is ideal for balancing binding among the four alleles. We also subjected four known epitopes to the same scoring protocol as controls: the PADRE epitope (AKFVAAWTLKAAA), an influenza immunogenic epitope (PKYVKQNTLKLAT), the Vimentin peptide (SAVRLRSSVPGVR) and the natural CLIP substrate (PVSKMRMATPLLMQA). According to the average scores, the controls ranked better than the reference epitope in most cases (see Supplementary Figure 2). Relative to the designed peptides, the controls ranked in intermediate positions.
The 18 designed candidates and the additional controls were synthesized and analysed using the ProImmune REVEAL® MHC-peptide binding assay to determine their level of binding to the MHC II alleles DRB1*01:01, DRB1*03:01, DRB1*04:01 and DRB1*15:01. A multiple-allele score was measured to evaluate whether the peptides improved toward multiple alleles (Figure 4A). Of the 18 peptides, DYCVVYLKPLAGYDN could not be synthesized with the Prospector PEPscreen® technology. This peptide has the highest value on the hydrophobicity scale (highest yellow circle in Figure 3D), indicating that the pre-selection filters are important. From the remaining set, we found that four peptides bound better than the reference epitope to the majority of the alleles. None of the peptides, including the controls, produced a signal for allele DRB1*03:01 (see Figure 4A). This is probably because the affinity of the original reference toward this allele is too low (four orders of magnitude lower than for DRB1*01:01), so the peptides' signal cannot be resolved with respect to the internal control of the ProImmune experiment. Therefore, we cannot conclude whether the designed peptides are better or worse than the reference for this allele.
Three of the four peptides with better binding affinities were obtained with the strategy in which only flanking amino acids were modified. These results suggest that the flanking region is amenable to mutations that successfully increase the multiple-allele affinity. The ProImmune REVEAL® scores were also internally weighted according to the potential impact of the peptides in the population (see Supplementary Figure 3). Remarkably, one of the peptides (DYTVVYLKPLAGYDH) outperformed the PADRE epitope toward the studied alleles, which is a major achievement of this study.
Using long MD simulations, we investigated the physicochemical interactions that stabilize the designed peptides. In Figure 4B, we compare the interactions of the reference epitope with those of the best-performing designed peptide bound to allele DRB1*01:01. These peptides differ by four mutations: D3T in the N-terminal region, and M13Y, Y14D and K15H in the C-terminal region. Note that the charge distributions are quite different: the reference carries a -2 charge at the N-terminal and +1 at the C-terminal, compared with -1 and 0, respectively, for the best-designed peptide. These differences have a significant impact on binding. The D3T mutation avoids unfavorable negative-negative charge interactions with residue GLU:55 of chain A (left side of Figure 4B). The M13Y, Y14D and K15H mutations closely complement the charge and hydrophobic distribution of the receptor (right side of Figure 4B). We highlight the Y14D replacement, in which the aspartic acid forms tight salt bridges with residues LYS:75 or ARG:76 (see Supplementary Note 3 for an analysis of the other designed peptides).
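The charge bookkeeping in this comparison can be checked with a short Python sketch that sums side-chain charges over the first three and the last three residues of each 15-mer. The three-residue windows and the +1 charge assigned to histidine are our assumptions, chosen because they reproduce the quoted values (-2/+1 for the reference, -1/0 for the designed peptide); terminal amine and carboxylate groups are ignored.

```python
# Approximate side-chain charges. Counting His as +1 is an assumption: at
# neutral pH histidine is often only partially protonated.
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1, "H": 1}


def window_charge(seq: str, start: int, end: int) -> int:
    """Net side-chain charge of residues start..end (1-based, inclusive)."""
    return sum(SIDE_CHAIN_CHARGE.get(aa, 0) for aa in seq[start - 1:end])


if __name__ == "__main__":
    peptides = {"reference": "DYDVVYLKPLAGMYK", "designed": "DYTVVYLKPLAGYDH"}
    for name, seq in peptides.items():
        n_term = window_charge(seq, 1, 3)
        c_term = window_charge(seq, len(seq) - 2, len(seq))
        print(f"{name}: residues 1-3 -> {n_term:+d}, last three residues -> {c_term:+d}")
```

The same helper can be applied to the other designed peptides to see how each substitution redistributes charge along the binding groove.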
We note that, to our knowledge, our method is the first to engineer modifications on MHC II epitopes to design neoepitopes for multiple-allele binding. So far, state-of-the-art machine learning methods have focused on scoring pre-existing sequences (18, 46). We investigated how the ranks predicted with these methods compare with the experimental rank (extracted from the multiple-allele REVEAL® score) of the 18 P. vivax designed peptides. In Supplementary Figure 4A, we present the ranks predicted by NetMHCIIPan 4.0 (23), the IEDB consensus tool (19) and MARIA (25) for the alleles with a resolved experimental signal. Although some of these methods rank a couple of the peptides favorably, their scores are in general poorly correlated with the experiments, and some of the best peptides are not recognized as such. Considering the four best-performing designed peptides as hits, we calculated an enrichment plot for each method, i.e., the number of hits as a function of the number of best-ranked peptides considered (see Supplementary Figure 4B). Our method, PanMHC-PARCE, has the highest enrichment.
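An enrichment curve of this kind counts how many hits appear among the top-k peptides of a method's ranking, for k = 1, 2, ..., N. A minimal Python sketch follows; the peptide identifiers in the example are hypothetical.

```python
from typing import Iterable, List, Set


def enrichment_curve(ranked_ids: Iterable[str], hits: Set[str]) -> List[int]:
    """Cumulative number of hits among the top-k ranked peptides, k = 1..N."""
    curve: List[int] = []
    found = 0
    for peptide in ranked_ids:
        found += peptide in hits  # bool counts as 0 or 1
        curve.append(found)
    return curve


if __name__ == "__main__":
    # Hypothetical ranking (best first) and hypothetical set of hits.
    ranking = ["p1", "p2", "p3", "p4"]
    print(enrichment_curve(ranking, {"p2", "p4"}))
```

A method whose curve rises faster ranks the experimentally confirmed peptides nearer the top, which is the comparison summarized in the enrichment plot.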
2.4 Ex vivo Experiments
To verify whether the peptides are able to induce a cellular immune response ex vivo, C57BL/6 mice were immunized with individual peptides in the presence of the AddaVax adjuvant, while the control group received only the adjuvant (see Supplementary Figure 5A). Fifteen days after the last dose, splenocytes from immunized mice were incubated with each peptide to evaluate the number of specific IFNγ-producing cells by ELISpot. For the P. vivax engineered peptides (see Supplementary Figure 5B), splenocytes from mice immunized with DYDVVYWKPLAGIYK and with its reference epitope DYDVVYLKPLAGMYK presented the highest numbers of specific IFNγ-producing cells (228 and 295 SFU/10⁶ cells, respectively), compared with groups immunized with peptides DYTVVYLKPLAGYDH, DYTVVYLKPLAGYDN or VYDVVYLKPLAGCYK (127, 54 and 145 SFU/10⁶ cells, respectively). Remarkably, the P. vivax designed peptides also showed cross-reactivity, suggesting that the stimulation is transferable across this set of peptides (see Supplementary Figure 6).
The engineered Influenza peptide also retained its ability to induce a specific immune response (see Supplementary Figure 5C): YSKYKKQNTLKAAT induced a number of specific IFNγ-producing cells similar to that of the reference epitope YPKYVKQNTLKLAT (473 and 463 SFU/10⁶ cells, respectively). Furthermore, the Influenza peptides induced an IFNγ response similar to that of the previously described PADRE epitope (AKFVAAWTLKAAA). In contrast, the control group showed a negligible specific response. As negative (unrelated-peptide) controls, we used the Influenza peptide (YPKYVKQNTLKLAT) for the groups immunized with the different Plasmodium peptides, and the PADRE peptide for the groups immunized with the Influenza peptides. No response was detected with these controls. Thus, the production of interferon-gamma was specific, since no significant interferon-gamma production was observed with unrelated peptides (see Supplementary Table 9).
We note that the designed peptides were not optimized for the mouse H-2-IAb allele, which differs by at least 10 mutations in the binding pocket from the human alleles assessed above (see Supplementary Figure 7). Even though the peptides were not designed for that allele, their quality can be assessed by following the same procedure described in Methods. In particular, we ran long MD simulations of the best P. vivax and Influenza peptides bound to the mouse MHC allele and ranked them using the same sampling/scoring approach. Our computational predictions of the rank of these peptides for the H-2-IAb allele correlate relatively well with the ex vivo experimental rank (see Supplementary Table 10 for P. vivax and Supplementary Table 11 for the Influenza design). Together, these results demonstrate that improving the binding of peptides to MHC molecules can preserve the induction of cellular immune responses, even for alleles that differ significantly from the human alleles.
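One standard way to put a number on how well a computational ranking agrees with an experimental one is the Spearman rank correlation. The Python sketch below implements the no-ties formula; using this statistic is our illustrative choice and not necessarily how the comparison in the supplementary tables was made.

```python
from typing import Sequence


def spearman_rho(ranks_a: Sequence[int], ranks_b: Sequence[int]) -> float:
    """Spearman rank correlation for two rankings of the same items, without
    ties: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two rankings of equal length >= 2")
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    # Hypothetical predicted vs. experimental ranks for five peptides.
    print(spearman_rho([1, 2, 3, 4, 5], [2, 1, 3, 4, 5]))
```

A value near 1 indicates that the predicted order closely follows the experimental order; with only a handful of peptides, however, even a single swap changes the coefficient noticeably.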
3 Discussion
In this work, we have introduced a computational design protocol to improve the affinity of epitopes toward human MHC II alleles. The key novelty is that the method enables the engineering of peptides with immunological properties for many alleles simultaneously. A first validation performed on a single allele indicated which and how many positions of the peptide can be modified to improve peptide binding. These findings were used to guide a multiple-allele binding design of a reported P. vivax T-helper epitope. The in vitro experiments showed that four designed peptides had a multiple-allele performance superior to that of the reference P. vivax epitope. Of the mutation strategies explored, and for the particular case of multiple-allele optimization, the most promising results were obtained with the strategy that modified the flanking regions (strategy 7 in Figure 3A), which yielded 3 of the 4 best-performing peptides, including the best. An additional advantage of this strategy is that modifying the flanking regions reduces the risk of drastically changing the interaction with the T-cell receptor and the immune response pathway.
It was previously demonstrated that binding prediction algorithms allow the identification of T-cell epitopes that could be used for epitope-driven vaccine design (47). Our contribution may lead to a better choice of epitopes capable of inducing immune responses as well as binding to multiple alleles, covering the majority of the target population (48). Here, we observed cross-species recognition of T-cell epitopes, since they were able to bind in vitro to human MHC II alleles (DRB1*01:01, DRB1*15:01 and DRB1*04:01; no signal could be resolved for DRB1*03:01) and also to elicit ex vivo T-cell responses in H-2b mice. Indeed, this phenomenon has previously been described for natural T-cell epitopes from several pathogen-derived antigens (49–51). For example, the P. vivax epitope DYDVVYLKPLAGMYK was able to induce a strong immune response in mice and non-human primates (44). Here, we have shown that C57BL/6 mice immunized with two doses of each of four engineered peptides based on the DYDVVYLKPLAGMYK epitope presented a potent and specific cellular immune response. In addition, the designed DYDVVYWKPLAGIYK peptide induced an IFNγ response similar to that induced by the reference peptide. T-cell immunity is also considered vital for containing the spread of influenza infection and for minimizing the period of illness, since antibodies generated after vaccination are not able to keep up with the frequent antigenic drift (52). The modifications introduced in the YPKYVKQNTLKLAT peptide improved its binding to human MHC II while preserving its ability to induce IFNγ production similar to that of the reference peptide. Therefore, the epitopes were able not only to bind to MHC molecules, but also to induce strong T-cell immunity.
Regarding the protocol, combining a consensus strategy based on six scoring functions with molecular dynamics sampling is an efficient alternative to exhaustive computational methods that calculate more accurate but time-consuming observables, such as binding free energies (53). This efficiency is necessary given the large number of mutations that must be explored to find optimal candidates and to guarantee the convergence of the protocol. In this sense, the better the sampling of the peptide sequence space, the higher the probability of detecting peptides with good experimental properties. The protocol has been optimized for the MHC II system, which is an experimentally well-characterized receptor bound to many substrates (54). The code is fully open source and publicly available for the design of peptide binders for different targets (not only MHC II) (42). The protocol is sufficiently flexible that additional criteria, such as peptide stability [e.g., for increasing TCR response (55)], could be included. Regarding computational time, the cost is directly proportional to the total MD simulation time, which scales linearly with the number of alleles. The multiple-allele design (4 alleles) for the P. vivax epitope required around 1 μs of simulation in total. Therefore, for the same number of attempted mutations, the computational cost would be about 250 ns times the number of alleles. We also remark that our approach is “embarrassingly parallel” with respect to the number of alleles, as it is based on independent MD simulations. To include more alleles, users can follow the protocol for modelling peptides bound to MHC II crystal structures described in ref (29).
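The cost estimate can be written as a back-of-the-envelope calculation. In this Python sketch, the 250 ns per allele follows from the quoted ~1 μs for 4 alleles, while the parallel-job model (independent allele runs spread over a fixed number of simultaneous jobs, with no overhead) is a hypothetical simplification.

```python
import math


def total_md_time_ns(per_allele_ns: float, n_alleles: int) -> float:
    """Total MD time grows linearly with the number of alleles."""
    return per_allele_ns * n_alleles


def longest_job_md_ns(per_allele_ns: float, n_alleles: int, n_parallel_jobs: int) -> float:
    """Idealized MD length per job when the independent allele runs are spread
    over `n_parallel_jobs` simultaneous jobs (hypothetical resource model)."""
    return per_allele_ns * math.ceil(n_alleles / n_parallel_jobs)


# ~1 microsecond (1000 ns) for 4 alleles in the P. vivax design -> ~250 ns per allele.
PER_ALLELE_NS = 1000.0 / 4

if __name__ == "__main__":
    for n in (4, 8, 12):
        print(f"{n} alleles: ~{total_md_time_ns(PER_ALLELE_NS, n):.0f} ns in total, "
              f"~{longest_job_md_ns(PER_ALLELE_NS, n, 4):.0f} ns per job with 4 jobs")
```

Because the allele runs do not communicate, adding alleles increases the total compute but not the per-job simulation length, as long as enough parallel resources are available.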
The essence of the design protocol is to exploit structural information and biophysical simulations to explore, in an unsupervised manner, the peptide sequence space and the protein-peptide conformations for multiple alleles simultaneously. Machine learning methods have been very successful in identifying whether a predetermined sequence is a potential epitope. However, these methods have limitations in predicting affinity differences (29), and they are highly dependent on the training set. The results presented in Supplementary Figure 4 indicate that data-based approaches are less accurate than our approach in ranking novel (designed) epitopes. Importantly, our protocol is a genuine design scheme, which allowed us to explore novel sequences that would otherwise not have been available, especially those with random changes in the flanking regions of the peptide. Nevertheless, for efficiency purposes, it might be useful in the future to combine data-driven and ab initio strategies. A possible direction for future research would be to exploit the efficiency of these tools in a first massive-screening stage, and then to use this structural-dynamics protocol to refine a smaller set of candidates selected from the previous stage (prior to experimental validation).
Our strategy enables researchers to engineer epitopes toward MHC II alleles for a pathogen of interest, increasing the actionable range of potential antigenic subunits. In fact, this approach can be generalized to design peptidic adjuvants that increase the expected cellular response by targeting both MHC II and T-cell receptors simultaneously. Of course, the designed peptides must be tested to rule out side effects associated with molecular mimicry, which are commonly linked to autoimmune events (56). Currently, developing a new vaccine is an expensive, time-consuming and non-trivial process (even more so for neglected diseases that lack investment). We consider multiple-allele design to have great potential in the first steps of epitope optimization, toward cheaper and more effective vaccine development.
4 Materials and Methods
4.1 Multiple Allele Engineering Protocol
In the following section, we briefly describe the multiple-allele protocol (shown in Figure 1B). To start from an equilibrated system, the starting complexes are sampled with MD simulations for 100 ns (for details about the MD, see section 4.5). After the initial MD sampling, we generated single-point mutations in the peptide sequence following different strategies for each system. Rotamers for the mutated amino acids were predicted with Scwrl4 (57), a program selected based on a previous assessment of single-point mutation protocols (58). After generating the mutation for all complexes, we performed the following minimization and equilibration procedure in parallel. First, the predicted side chain alone is minimized with Rosetta (38). To relocate overlapping atoms and avoid clashes, a second minimization is run with Gromacs version 5.1.4 (59) on the new amino acid and the water molecules within a 2 Å radius of it. The full system is then minimized, followed by an NVT equilibration of 100 picoseconds (ps). Finally, for each mutation, we performed 5 ns of MD simulation to sample the bound conformations. To compute the score, a snapshot of the trajectory was saved every 100 ps for each complex. This sampling/scoring strategy was previously benchmarked using different MHC II-peptide complexes to define the optimal simulation parameters (54).
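The local clash-relief step acts only on the new residue and nearby waters. The selection rule (any water atom within 2 Å of any atom of the mutated residue) is sketched below as a brute-force Python function; in the protocol this selection is handled within the Gromacs workflow, so the function name, data layout and coordinates here are hypothetical, and a real implementation would use index groups or a neighbor search.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]  # Cartesian coordinates in Å


def waters_near_residue(residue_atoms: Sequence[Coord],
                        waters: Dict[int, Sequence[Coord]],
                        cutoff: float = 2.0) -> List[int]:
    """Return IDs of water molecules with any atom within `cutoff` (Å) of any
    atom of the mutated residue (brute-force O(N*M) distance check)."""
    selected = []
    for water_id, atoms in waters.items():
        if any(math.dist(w, r) <= cutoff for w in atoms for r in residue_atoms):
            selected.append(water_id)
    return selected


if __name__ == "__main__":
    residue = [(0.0, 0.0, 0.0)]  # hypothetical single-atom "residue"
    toy_waters = {
        1: [(1.0, 0.0, 0.0)],
        2: [(3.0, 0.0, 0.0), (3.5, 0.0, 0.0)],
        3: [(5.0, 0.0, 0.0), (1.9, 0.0, 0.0)],
    }
    print(waters_near_residue(residue, toy_waters))
```

Selecting whole molecules (rather than individual atoms) keeps each water's geometry intact during the local minimization.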
The conformations are scored with six scoring functions used for protein-protein and protein-peptide affinity predictions: Haddock (60), Vina (61), a combination of DFIRE and GOAP (DFIRE-GOAP) (62, 63), Pisa (64), FireDock (65), and the BMF-BLUUES scoring combination (66, 67). Details of each scoring function are available in ref (54). For the P. vivax application, we replaced BMF-BLUUES with IRAD (68), which correlated better with the experimental affinities of the P. vivax epitope to the four alleles (see Supplementary Figure 8). As in ref (69), the score averaged over all the conformations was used for each scoring function. If the score difference between the previous and the current mutation is negative, the scoring function predicts a good mutation. For each complex, a consensus-by-vote approach is used (see Supplementary Note 4): a mutation is considered favorable if three or more scoring functions consider it good. Finally, the mutation is accepted if at least three of the four alleles consider it favorable; otherwise, it is rejected. The mutation process is iterated over many attempts.
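The two-level vote can be sketched as follows. With 5 ns of MD sampled every 100 ps, each complex contributes roughly 50 snapshots per scoring function. We assume that every function is oriented so that lower is better and that the difference is taken as current minus previous; the function names and toy data are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple

# scores[function] -> per-snapshot scores for one MHC II-peptide complex.
Scores = Dict[str, Sequence[float]]


def n_improving_functions(previous: Scores, current: Scores) -> int:
    """Count scoring functions whose snapshot-averaged score decreases.
    Assumes every function is oriented so that lower = better."""
    return sum(1 for fn in current if mean(current[fn]) - mean(previous[fn]) < 0)


def mutation_accepted(per_allele: Dict[str, Tuple[Scores, Scores]],
                      min_votes: int = 3, min_alleles: int = 3) -> bool:
    """Consensus by vote: a complex is favorable if at least `min_votes`
    functions improve; the mutation is accepted if at least `min_alleles`
    complexes (alleles) are favorable."""
    favorable = sum(n_improving_functions(prev, curr) >= min_votes
                    for prev, curr in per_allele.values())
    return favorable >= min_alleles


if __name__ == "__main__":
    functions = ["f1", "f2", "f3", "f4", "f5", "f6"]
    prev = {f: [0.0, 0.0] for f in functions}
    # Hypothetical: four of six functions improve for this complex.
    curr = {f: ([-1.0, -1.0] if i < 4 else [1.0, 1.0]) for i, f in enumerate(functions)}
    print(n_improving_functions(prev, curr))
```

Averaging over snapshots before voting damps the frame-to-frame noise of any single scoring function.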
4.2 Influenza Epitope: Optimization Toward a Single MHC II Allele
We first attempted to design peptides with higher affinity toward a single MHC II (HLA) allele. This phase helped identify key factors, such as how to perform the mutations, which and how many positions can be modified, and how to optimize the parameters used in the protocol.
4.2.1 Starting MHC II-Epitope System
The starting complex for this optimization was the human MHC II (HLA) allele DRB1*01:01 bound to a 14-amino-acid peptide from the Influenza A virus Hemagglutinin antigen (YPKYVKQNTLKLAT). This sequence has a reported bioactivity of IC50 = 130 nM in a curated dataset of peptide binders against multiple MHC II alleles (20). The starting crystal structure was 1DLH (70) from the Protein Data Bank (PDB) (71), which contains a similar peptide lacking the tyrosine in the N-terminal flanking region. The missing amino acid was modelled using the Rosetta Remodel functionality (72). This complex was first relaxed with Rosetta, keeping the peptide-protein backbone fixed. It was then minimized and NVT/NPT equilibrated using Gromacs version 5.1.4 (59), and finally subjected to an MD simulation of 100 ns.
4.2.2 Mutation Strategies and Design
To obtain a diverse and optimal set of peptide candidates, we performed five independent peptide design runs using different mutation strategies. These included the random selection of amino acids or the prioritization of residues based on bioinformatic filters and probability matrices (see Figure 2A and Supplementary Note 1). Across all runs, the design protocol yielded 105 accepted sequences. To select the sequences for the long MD, we applied an additional filter based on the similarity between each pair of peptides, to avoid including very similar sequences and to increase the diversity of the library (see Supplementary Note 2). MD simulations of 200 ns were then performed for 22 peptide candidates bound to the single allele. Using the average rank from the long MD, a total of 17 designed peptides were selected for the competitive binding assay experiments, together with the reference peptide and two positive controls: PADRE and the original Influenza epitope. This completed the full batch of 20 peptides required for the assay.
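A greedy version of such a pairwise-similarity filter is sketched below. The identity measure (fraction of identical positions between equal-length peptides, which holds here because designs differ from the parent only by substitutions) and the cutoff value are assumptions for illustration; the criteria actually used are given in Supplementary Note 2.

```python
def sequence_identity(a, b):
    """Fraction of identical positions between two equal-length peptides."""
    if len(a) != len(b):
        raise ValueError("peptides must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def diverse_subset(ranked_peptides, identity_cutoff):
    """Greedily keep peptides (best-ranked first) that are dissimilar to all kept ones.

    A peptide is dropped if its identity to any already-kept peptide is
    greater than or equal to identity_cutoff (a hypothetical parameter).
    """
    kept = []
    for peptide in ranked_peptides:
        if all(sequence_identity(peptide, other) < identity_cutoff for other in kept):
            kept.append(peptide)
    return kept
```

Processing peptides in rank order means that, within a cluster of near-identical designs, the best-ranked representative is the one retained.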
4.3 Design Engineering of P. vivax Epitope Multiple-Allele MHC II Binding
4.3.1 P. vivax Epitope
We searched the IEDB database (19) for P. vivax-derived peptides with reported affinities toward multiple alleles and available immunological assays. A MySQL version of the database was downloaded and queried for 15-mer peptides with reported IC50 values. From that search, we selected an epitope from the merozoite surface protein 1 (MSP-1) antigen (DYDVVYLKPLAGMYK). The peptide has been tested against four human MHC II (HLA) alleles: DRB1*01:01 (IC50 = 1 nM), DRB1*03:01 (IC50 = 17807.9 nM), DRB1*04:01 (IC50 = 1636.1 nM) and DRB1*15:01 (IC50 = 792.9 nM) (44). In the same work, immune assays were conducted in mice with positive results.
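The size of the affinity gap that the design aims to reduce can be made concrete with the reported IC50 values. In the short sketch below, the helper name is hypothetical and expressing the gap as the spread of log10 IC50 is an illustrative choice, not the protocol's own metric; it shows that the epitope binds DRB1*01:01 roughly four orders of magnitude more tightly than DRB1*03:01 (lower IC50 means tighter binding).

```python
import math

# IC50 values (nM) of the MSP-1 epitope DYDVVYLKPLAGMYK, as reported in the text.
IC50_NM = {
    "DRB1*01:01": 1.0,
    "DRB1*03:01": 17807.9,
    "DRB1*04:01": 1636.1,
    "DRB1*15:01": 792.9,
}


def log10_gap(ic50_by_allele):
    """Spread between the weakest and strongest binder, in orders of magnitude."""
    logs = [math.log10(value) for value in ic50_by_allele.values()]
    return max(logs) - min(logs)


if __name__ == "__main__":
    print(f"affinity gap: {log10_gap(IC50_NM):.2f} orders of magnitude")
```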
4.3.2 Natural Variants of the Peptide
Sequences of MSP-1 from different Plasmodium species, including P. vivax, were obtained from the PlasmoDB database (73). These were used to run a multiple sequence alignment with the Clustal Omega program (74) to search for natural variants. The species in the alignment were clustered based on publicly available phylogenetic trees of the protein, given its role as an antigen for malaria vaccine studies (75). In addition, we conducted a search of polymorphisms reported in the region of the P. vivax genome that codes for this protein. The information was obtained from the MalariaGEN project (76). The variants were mapped onto the epitope region, looking for silent or non-synonymous mutations that can provide clues as to which positions of the peptide are more susceptible to modification. A Logo representation of the sequence variants was created using WebLogo3 (77).
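A sequence logo displays per-position variability graphically; the same information can be quantified as the Shannon entropy of each alignment column. The sketch below is an illustrative substitute for visual inspection of the logo (entropy scoring and the cutoff are not described in the text) and assumes the aligned sequences have already been trimmed to the epitope region.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; gap characters '-' are ignored."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in Counter(residues).values())


def variable_positions(aligned_sequences, min_entropy):
    """Return 0-based columns whose entropy is at least min_entropy (less conserved)."""
    length = len(aligned_sequences[0])
    return [
        i for i in range(length)
        if column_entropy([seq[i] for seq in aligned_sequences]) >= min_entropy
    ]
```

A fully conserved column scores 0 bits, while a column split evenly between two residues scores 1 bit, so higher values flag positions that tolerate natural variation.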
4.3.3 Starting Structures and Simulations
The selected P. vivax epitope was modelled in complex with the four MHC II (HLA) allele structures: DRB1*01:01 (PDB id 1DLH), DRB1*03:01 (PDB id 1A6A), DRB1*04:01 (PDB id 1J8H) and DRB1*15:01 (PDB id 1BX2). Because these structures contain different peptides bound to the receptor, we modelled the new epitope by aligning the core regions and mutating position by position on each template. We used the NetMHCIIpan-3.1 tool (23) to predict the 9-mer core region of the epitope to be modelled. The mutations were performed with the fixbb package from Rosetta (78). Additional amino acids in the flanking region were modelled with the Remodel package from RosettaCommons (72), with side-chain relaxation. Each modelled complex was subjected to 100 ns of MD simulations as described in section 4.5.
4.3.4 Mutation Strategies
We aimed to perform minimal modifications on the peptide, with the hope of not interfering with the epitope's immunological activity. The single-allele design phase showed that the peptide's activity can be improved with a small number of mutations (e.g., ≤4), and that both the core and the flanking regions provide valuable sites. Therefore, we defined two design strategies for multiple-allele binding enhancement (Figure 3A). The first strategy (strategy 6 in Figure 3A) consists of mutations at natural-variant positions and at peptide positions in contact with MHC II polymorphisms. We used the starting MD trajectories to monitor the contacts formed between the peptide and polymorphic residues of the MHC II β chain. Using a 4 Å threshold, we detected amino acids in the core and flanking regions interacting with highly polymorphic sites. In addition, we analyzed the multiple sequence alignment of the antigenic region of the P. vivax epitope to identify non-conserved residues in the peptide. Based on this analysis, we selected four positions, three in the core and one in the flanking region (stars in Figure 3B), to mutate during the design protocol. The second strategy (strategy 7 in Figure 3A) modifies only amino acids in the flanking regions, leaving the identified core of the peptide unchanged. For both strategies, the new amino acid is drawn with uniform probability (i.e., there are no preferred amino acids).
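The 4 Å contact criterion can be sketched as a plain distance check. In this illustration, coordinates (in Å) are assumed to have been extracted from the MD frames beforehand, the receptor selection is assumed to contain only the polymorphic β-chain residues, and summarizing a trajectory as the fraction of frames with a contact is an illustrative choice rather than the analysis used in the study. The function names are hypothetical.

```python
import math


def residues_in_contact(peptide_atoms, receptor_atoms, cutoff=4.0):
    """Peptide residue ids with any atom within `cutoff` Å of a selected receptor atom.

    peptide_atoms / receptor_atoms: lists of (residue_id, (x, y, z)) tuples.
    """
    contacts = set()
    for pep_res, pep_xyz in peptide_atoms:
        for _, rec_xyz in receptor_atoms:
            if math.dist(pep_xyz, rec_xyz) <= cutoff:
                contacts.add(pep_res)
                break  # this peptide atom is in contact; move on to the next one
    return contacts


def contact_frequency(frames, cutoff=4.0):
    """Fraction of frames in which each peptide residue contacts the receptor selection.

    frames: list of (peptide_atoms, receptor_atoms) pairs, one per MD snapshot.
    """
    counts = {}
    for peptide_atoms, receptor_atoms in frames:
        for res in residues_in_contact(peptide_atoms, receptor_atoms, cutoff):
            counts[res] = counts.get(res, 0) + 1
    return {res: n / len(frames) for res, n in counts.items()}
```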
4.3.5 PanMHC-PARCE Design Details
For the design protocol, we used as starting complexes the last frame of the MD simulations of the original P. vivax epitope bound to each of the four alleles. We ran 100 attempted mutations for each mutation strategy, obtaining a total of 42 new designed sequences across all runs. Using the consensus criteria for each individual allele, we evaluated for how many alleles the affinity is improved in comparison to the reference peptide. We also calculated the bioinformatic properties and filters, as in the single-allele phase (see Supplementary Note 2: Peptide selection criteria for P. vivax). Using these results, we selected 18 new sequences with the desired properties and a better predicted rank for multiple alleles. This small set was subjected to long MD simulations of 100 ns for the four alleles. We also ran MD simulations of four additional controls with reported positive binding data toward MHC II: the PADRE epitope (AKFVAAWTLKAAA), an influenza immunogenic epitope (PKYVKQNTLKLAT), the Vimentin peptide (SAVRLRSSVPGVR) and the natural CLIP substrate (PVSKMRMATPLLMQA). The peptides were modelled onto the MHC II binding sites following the methodology explained in ref (54). The designed peptides used in the long MD and the controls were experimentally tested. The code used is publicly available and explained in Supplementary Note 5.
4.3.6 Prediction of Selected Peptides Toward the Mouse MHC Allele
The designed sequences from P. vivax and Influenza that were included in the ex vivo experiment were subjected to binding predictions toward the mouse MHC II allele H-2-IAb. We used the same MD/scoring approach applied to the human alleles in the long MD simulations (see above), but with the mouse MHC II crystal structure (PDB id 1R5V) as the reference.
4.4 Peptide-Candidate Selection
For each mutation strategy, we performed 100 attempted mutations. We monitored the evolution of the scores to verify whether these attempts were optimizing (i.e., lowering) their values (as in Figure 1D). To combine the results from the different design runs, we used the scores calculated from the 5 ns MD simulations to obtain an average rank for each accepted peptide. Specifically, all the accepted peptides were ranked with each scoring function, and the average rank over the six functions was used to prioritize the peptides with potentially higher affinities.
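The average-rank prioritization can be sketched as follows, assuming each accepted peptide already has one averaged score per scoring function and that lower scores are better. The function name and data layout are hypothetical, and ties are simply broken by input order here.

```python
from statistics import mean


def average_rank(scores):
    """Mean rank of each peptide over all scoring functions.

    scores: {peptide: {function_name: average score}}; lower score = better.
    Rank 1 is the best peptide for a given function. Python's sort is stable,
    so tied peptides keep their input order in this sketch.
    """
    peptides = list(scores)
    functions = list(next(iter(scores.values())))
    ranks = {p: [] for p in peptides}
    for fn in functions:
        ordered = sorted(peptides, key=lambda p: scores[p][fn])
        for position, p in enumerate(ordered, start=1):
            ranks[p].append(position)
    return {p: mean(r) for p, r in ranks.items()}


if __name__ == "__main__":
    demo = {  # placeholder scores, not study data
        "P1": {"f1": -10.0, "f2": -5.0},
        "P2": {"f1": -8.0, "f2": -9.0},
        "P3": {"f1": -1.0, "f2": -1.0},
    }
    avg = average_rank(demo)
    print(sorted(avg, key=avg.get))  # peptides from best to worst average rank
```

Averaging ranks rather than raw scores sidesteps the fact that the six functions report values on very different scales.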
This rank was used together with three bioinformatics filters to select the candidates for long MD simulations. Two filters consisted of empirical rules accounting for solubility and synthesis issues associated with the peptides. These rules count violations raised by certain amino acid patterns in the peptide sequence (https://bioserv.rpbs.univ-paris-diderot.fr/services/SolyPep/index.html) (79); the higher the number of violations, the lower the probability of successfully synthesizing the peptide. The third filter used a hydrophobicity score of the peptide based on the Eisenberg hydrophobicity scale for proteinogenic amino acids (80). The thresholds for each filter and details of the empirical rules are described in Supplementary Note 2. After applying the selection criteria, a small set of peptide candidates was subjected to longer MD simulations (200 ns in the single-allele phase and 100 ns in the multiple-allele phase; see sections 4.2.2 and 4.3.5). The last half of each trajectory was used to calculate the average score for the same six scoring functions used in the design. From these averages, an average rank was calculated and used to re-rank the candidates. This long-MD re-ranking was used to select the set of designed peptides for the experiments.
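A hydrophobicity filter of this type can be sketched as the mean per-residue value on the Eisenberg scale. The values below are the normalized consensus scale as it is commonly tabulated and should be checked against ref (80); the filter direction (rejecting overly hydrophobic peptides) and the threshold parameter are assumptions, since the actual thresholds are given in Supplementary Note 2.

```python
# Eisenberg normalized consensus hydrophobicity scale (one-letter codes),
# as commonly tabulated; verify against the original reference before use.
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def mean_hydrophobicity(peptide):
    """Average Eisenberg hydrophobicity per residue of a peptide sequence."""
    return sum(EISENBERG[aa] for aa in peptide.upper()) / len(peptide)


def passes_hydrophobicity_filter(peptide, max_mean):
    """Hypothetical filter: keep peptides whose mean hydrophobicity is at most max_mean."""
    return mean_hydrophobicity(peptide) <= max_mean


if __name__ == "__main__":
    print(round(mean_hydrophobicity("DYDVVYLKPLAGMYK"), 3))
```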
4.5 MD Simulations
Each protein-peptide complex was subjected to MD simulations preceded by minimization and NVT/NPT equilibration phases. The system was minimized using the steepest descent algorithm, with 50000 steps and a maximum force threshold of 10 kJ/mol/nm. NVT and NPT equilibrations were performed for 100 ps with position restraints on the heavy atoms of the protein to allow the solvent to equilibrate. Gromacs version 5.1.4 (59) was used to perform the MD simulations, with the Amber99SB-ILDN protein force field (81) and the TIP3P water model (82). The protein was solvated in a cubic water box with a distance of 8 Å between the outermost protein atoms and the box edge. After solvation, Na+ and Cl− counterions were added to neutralize the box. Electrostatic interactions were calculated using the Particle Mesh Ewald (PME) method (83), with 1.0 nm short-range electrostatic and van der Waals cutoffs. The equations of motion were integrated with the leap-frog algorithm (84) using a time step of 2 femtoseconds (fs). The simulations were run using a modified Berendsen thermostat (85) with temperature coupling at 350 K, and the Parrinello-Rahman barostat (86). The elevated temperature was chosen to allow a fast exploration of the conformational space. To keep the complex stable at this temperature, all receptor atoms located more than 12 Å from any peptide atom were restrained, while the receptor atoms within this threshold, as well as the peptide, remained flexible.
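For readers reproducing this setup, the production settings above can be collected into a GROMACS .mdp file; the Python sketch below generates one. The option names are standard GROMACS .mdp keys, and V-rescale is the GROMACS name of the velocity-rescaling ("modified Berendsen") thermostat. Entries commented as assumed (tc-grps, tau_t, tau_p, ref_p, compressibility, constraints) are typical values not stated in the text, the minimization stage (integrator = steep, nsteps = 50000, emtol = 10) is not shown, and the restraints on receptor atoms farther than 12 Å from the peptide would require a separate position-restraint file. The step counts follow from the 2 fs time step: a 5 ns run with a snapshot every 100 ps needs 2,500,000 steps and an output interval of 50,000 steps.

```python
def md_steps(duration_ps, dt_ps=0.002):
    """Integration steps needed to cover duration_ps picoseconds with time step dt_ps."""
    return round(duration_ps / dt_ps)


def production_mdp(duration_ps, save_every_ps=100.0, dt_ps=0.002):
    """Return GROMACS .mdp text for a production run with the settings in the text.

    Entries commented 'assumed' are NOT given in the text; they are typical values.
    """
    options = {
        "integrator": "md",                      # leap-frog integrator
        "dt": dt_ps,                             # 2 fs time step (in ps)
        "nsteps": md_steps(duration_ps, dt_ps),
        "nstxout-compressed": md_steps(save_every_ps, dt_ps),  # snapshot interval
        "coulombtype": "PME",
        "rcoulomb": 1.0,                         # nm
        "rvdw": 1.0,                             # nm
        "tcoupl": "V-rescale",                   # velocity-rescaling thermostat
        "tc-grps": "System",                     # assumed
        "tau_t": 0.1,                            # assumed (ps)
        "ref_t": 350,                            # K
        "pcoupl": "Parrinello-Rahman",
        "tau_p": 2.0,                            # assumed (ps)
        "ref_p": 1.0,                            # assumed (bar)
        "compressibility": 4.5e-5,               # assumed (water, 1/bar)
        "constraints": "h-bonds",                # assumed, common with a 2 fs step
    }
    return "\n".join(f"{key} = {value}" for key, value in options.items())


if __name__ == "__main__":
    # Per-mutation sampling run: 5 ns with one snapshot every 100 ps.
    print(production_mdp(duration_ps=5000))
```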
4.6 Experiments
4.6.1 Rate-Binding Assays
Rate-binding experiments against the MHC II alleles were performed with the Proimmune REVEAL® binding assay, a gold-standard method. The method uses antibody labelling to emit a signal when complexes in their native conformation are detected. Consequently, we can verify whether a peptide binds to a particular MHC II allele and whether the complex remains stable. An illustration of the molecular complex and the emitted signal is shown in Figure 2D. Using a control baseline provided by Proimmune, a score between 0 and 100 is assigned as a proxy for the affinity toward the MHC II allele at two time points: at 0 hours and after 24 hours. The peptides were synthesized using the Prospector PEPscreen® technology, with high purity standards based on quality controls obtained by MALDI-TOF mass spectrometry (87).
For the single-allele binding optimization phase, the 17 selected peptides were assayed together with the controls against the DRB1*01:01 allele. For the multiple-allele engineering phase, the 18 engineered P. vivax epitopes were evaluated for the MHC II alleles DRB1*01:01, DRB1*03:01, DRB1*04:01 and DRB1*15:01. Multiple-allele scores were calculated by averaging the scores of each allele, and by weighting each allele by its frequency in the world population. These calculations were provided by Proimmune. The resulting measures were used to evaluate the binding performance of each peptide toward the four alleles simultaneously.
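The two summary measures can be sketched as below. The exact formula used by Proimmune is not given in the text; this illustration assumes a population-frequency-weighted arithmetic mean with weights renormalized over the assayed alleles, and the function name and the demo numbers are placeholders, not values from the study.

```python
def multi_allele_scores(allele_scores, allele_frequencies):
    """Return (mean score, population-weighted mean score) across alleles.

    allele_scores: {allele: assay score on the 0-100 scale}
    allele_frequencies: {allele: population frequency}; the weights are
    renormalized over the assayed alleles so that they sum to 1.
    """
    alleles = list(allele_scores)
    mean_score = sum(allele_scores[a] for a in alleles) / len(alleles)
    total_frequency = sum(allele_frequencies[a] for a in alleles)
    weighted_score = sum(
        allele_scores[a] * allele_frequencies[a] for a in alleles
    ) / total_frequency
    return mean_score, weighted_score


if __name__ == "__main__":
    # Placeholder scores and frequencies, NOT the study's data.
    scores = {"DRB1*01:01": 80.0, "DRB1*03:01": 20.0}
    freqs = {"DRB1*01:01": 0.3, "DRB1*03:01": 0.1}
    print(multi_allele_scores(scores, freqs))
```

The weighted measure rewards peptides that bind the more common alleles well, while the plain mean treats all four alleles equally.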
4.6.2 Peptide Synthesis for Ex vivo Experiment
The following peptides were synthesized by GenScript USA Inc. at more than 75% purity for ex vivo testing: the P. vivax reference (DYDVVYLKPLAGMYK) and four predicted peptides (DYTVVYLKPLAGYDH, DYTVVYLKPLAGYDN, VYDVVYLKPLAGCYK and DYDVVYWKPLAGIYK); the PADRE epitope (AKFVAAWTLKAAA); and the Influenza reference (PKYVKQNTLKLAT) together with one predicted Influenza sequence (YSKYKKQNTLKAAT). Peptides were resuspended in DMSO (10 mg/mL) and stored at −20 °C.
4.6.3 Mice and Immunization
Six- to eight-week-old female C57Bl/6 mice were bred at the Centro de Desenvolvimento de Modelos Experimentais para Medicina e Biologia (CEDEME), UNIFESP. All mice were housed in a temperature-controlled, light-cycled facility at the Division of Immunology, UNIFESP. All experiments using mice in this study were approved by the UNIFESP Institutional Animal Care and Use Committee (IACUC) under protocol number 4615161120, and were in accordance with the recommendations of the Federal Law 11.794 (2008) and the Guide for the Care and Use of Laboratory Animals of the Brazilian National Council of Animal Experimentation (CONCEA). For immunization, mice received two doses, at a 2-week interval, of 50 μg of each peptide in the presence of AddaVax adjuvant (1:1 v/v; InvivoGen), in a total volume of 100 μL delivered subcutaneously at the base of the tail.
4.6.4 Splenocyte Isolation
Fifteen days after the last dose, mice were euthanized and spleens were aseptically removed. Single-cell suspensions were obtained after red blood cell lysis with ammonium chloride potassium (ACK) buffer. Cells were then resuspended in R-10 (RPMI supplemented with 10% fetal bovine serum, 2 mM L-glutamine, 1% v/v vitamin solution, 1 mM sodium pyruvate, 1% v/v non-essential amino acids solution, 1% v/v pen strep, 40 μg/mL gentamicin and 5×10⁻⁵ M 2-mercaptoethanol; all from Gibco). Cell viability and concentration were estimated using a cell counter (Countess™ Automated Cell Counter, Invitrogen).
4.6.5 T Cell ELISpot Assay
IFNγ-producing cells were assessed using the Mouse IFNγ ELISPOT kit (BD Biosciences), according to the manufacturer's instructions. Briefly, 96-well plates (MAIPS 4510, Millipore) were coated with IFNγ capture antibody and incubated overnight at 4°C. The plates were washed twice and blocked for 2 hours with R10 at room temperature. Splenocytes were incubated for 18 hours at 37°C in 5% CO2 in the presence of each peptide (10 μg/mL), Concanavalin A (ConA, 2.5 μg/mL; positive control) or R10 (negative control). The plates were washed and incubated with biotinylated anti-mouse IFNγ for 2 hours at room temperature. The plates were then washed and incubated with avidin-HRP for 45 minutes at room temperature in the dark. After extensive washes, the spots were developed with 3-amino-9-ethylcarbazole (AEC) (BD Biosciences) and counted using the AID ELISpot Reader System (Autoimmun Diagnostika GmbH, Germany). The number of IFNγ-producing cells per 10⁶ splenocytes was calculated by subtracting unstimulated values from stimulated ones. Statistical significance (p-values) was calculated by one-way ANOVA followed by Tukey's honestly significant difference (HSD) post hoc test. Statistical analysis and graphical representation were performed using GraphPad Prism version 7.0 software.
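The background-subtraction and normalization step can be sketched as follows. The number of splenocytes plated per well is not stated in the text, so it is a parameter here; averaging replicate wells before subtraction is an assumption of this sketch, the function name is hypothetical, and the ANOVA/Tukey analysis is not reproduced.

```python
from statistics import mean


def ifng_spots_per_million(stimulated_wells, unstimulated_wells, cells_per_well):
    """Background-subtracted IFN-gamma spot-forming cells per 10^6 splenocytes.

    stimulated_wells / unstimulated_wells: raw spot counts of replicate wells.
    cells_per_well: splenocytes plated per well (not stated in the text).
    Negative results are returned as-is (no clipping at zero).
    """
    background = mean(unstimulated_wells)
    signal = mean(stimulated_wells)
    return (signal - background) * 1_000_000 / cells_per_well
```

For example, replicate counts of 120 and 130 spots over a background of 10 and 20 spots, with 500,000 cells per well, correspond to 220 IFNγ-producing cells per 10⁶ splenocytes.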
Data Availability Statement
The datasets presented in this study can be found in online repositories and the Supplementary Information. The code repository can be found at: https://hub.docker.com/r/rochoa85/panmhc-parce.
Ethics Statement
The animal study was reviewed and approved by UNIFESP Institutional Animal Care and Use Committee (IACUC).
Author Contributions
RO developed the code, implemented the bioinformatics tools, ran the analysis and wrote the manuscript. VASL and DSR performed the ex vivo experimental studies and reviewed the manuscript. AL and PC assisted in computational analysis, supported the rate binding experiments and wrote the manuscript. All authors contributed to the article and approved the submitted version.
Funding
This work, RO and PC have been supported by MinCiencias, the University of Antioquia and Ruta N, Colombia, and the Max Planck Society, Germany. This research was also supported by the Sao Paulo Research Foundation [FAPESP, grant number 2017/17471-7]. VSL received a fellowship from FAPESP, and DSR from CNPq.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Acknowledgments
We would like to thank Dr. Robinson Ramirez for the advice regarding immunological concepts and experimental design. The simulations were run on the CINECA supercomputer (https://www.hpc.cineca.it/) and on a local server with an NVIDIA Titan X GPU. P.C. gratefully acknowledges the support of the NVIDIA Corporation for the donation of this GPU.
Supplementary Material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2022.862851/full#supplementary-material
References
Articles from Frontiers in Immunology are provided here courtesy of Frontiers Media SA
Read article at publisher's site: https://doi.org/10.3389/fimmu.2022.862851