Ranking of peptides bound to MHC class II using scoring functions and molecular simulations
Rodrigo Ochoa1, Alessandro Laio2,3, Pilar Cossio1,4
1Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia
2International School for Advanced Studies (SISSA), Trieste, Italy
3The Abdus Salam International Centre for Theoretical Physics (ICTP), Trieste, Italy
4Department of Theoretical Biophysics, Max Planck Institute of Biophysics, Frankfurt am Main, Germany
Predict the binding of peptides and proteins is a task of particular interest in areas such as drug discovery and vaccine development. The majority of approaches to predict peptide-protein affinities is through sequence-based models trained with experimental binding data. In this study we focused in the interaction between peptides and MHC class II receptors, a challenging protein system due to the lack of binding information for peptides of variable sizes used to train the predictive models. Here we propose a structure and dynamic-based approach to model peptides bound to a MHC class II structure in order to rank them based on their reported activities.
A dataset of binding experiments between different MHC class II alleles and peptides of different sizes was obtained. From the dataset, we looked for structures in the Protein Data Bank (PDB) of MHC class II receptors in complex with peptides reporting experimental activity. The crystal with PDB code 1BX2 was selected, and used as template to model seven additional peptides through iterative single point mutations using the Rosetta functionalities. Each model was subjected to finite-temperature molecular dynamics simulations of 100 nanoseconds with the Gromacs package. The last part of the converged trajectories was used to obtain equilibrium distributions of the complex conformations in order to test 18 scoring functions created for protein-protein and protein-peptide docking. Among the scoring functions, Haddock, Pisa, Goap and BMF-BLUUES were the functions with better results after averaging the scores of the trajectory snapshots. In addition, the results were improved after applying a threshold consensus approach with a set of scoring functions. At the same time, a rank by pairs between the peptides was tested through a linear and logistic regression models that were fit with numerical values and signs of the activity differences between peptides respectively. Finally, spearman correlations were calculated between the scores and the experimental data.
Given the variability of the scores during the selected trajectories, we split the converged regions into four blocks, and run a bootstrapping approach with 2000 replica to correlate randomly any combination of the scores obtained per block. We found for six scoring functions average spearman correlations above 0.5. This integration of molecular dynamics sampling with different scoring functions allowed us to configure a prediction tool that in consensus has the potential to generate high ranking correlation against available experimental data. Different consensus were tested, but the fitting of a generalized logistic regression model reported a maximum accuracy of 82%, becoming the most promising result for further applications. One advantage of our method is the implementation of a rational protocol able to predict binding ranks, no matter if the sequence space was previously assayed in binding experiments. We also found that depending on the number of amino acid substitutions between the peptides, the required sampling time can increase. In our case, we sampled peptides differing in approximately 10 amino acids during 100 nanoseconds.
The methodology is useful for the rational modelling and prediction of high affinity peptides for MHC class II receptors, with potential application for novel sequences that are not included in the chemical space of available binding datasets.