AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction
- URL: http://arxiv.org/abs/2407.18184v2
- Date: Thu, 07 Nov 2024 22:33:50 GMT
- Title: AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction
- Authors: Chunan Liu, Lilian Denzler, Yihong Chen, Andrew Martin, Brooks Paige,
- Abstract summary: We introduce a filtered antibody-antigen complex structure dataset, AsEP.
AsEP is the largest of its kind and provides clustered groups.
We propose a novel method, WALLE, which leverages both protein language models and structural modeling from graph neural networks.
- Score: 12.433560411515575
- License:
- Abstract: Epitope identification is vital for antibody design yet challenging due to the inherent variability in antibodies. While many deep learning methods have been developed for general protein binding site prediction tasks, whether they work for epitope prediction remains an understudied research question. The challenge is also heightened by the lack of a consistent evaluation pipeline with sufficient dataset size and epitope diversity. We introduce a filtered antibody-antigen complex structure dataset, AsEP (Antibody-specific Epitope Prediction). AsEP is the largest of its kind and provides clustered epitope groups, allowing the community to develop and test novel epitope prediction methods and evaluate their generalisability. AsEP comes with an easy-to-use interface in Python and pre-built graph representations of each antibody-antigen complex while also supporting customizable embedding methods. Using this new dataset, we benchmark several representative general protein-binding site prediction methods and find that their performances fall short of expectations for epitope prediction. To address this, we propose a novel method, WALLE, which leverages both unstructured modeling from protein language models and structural modeling from graph neural networks. WALLE demonstrate up to 3-10X performance improvement over the baseline methods. Our empirical findings suggest that epitope prediction benefits from combining sequential features provided by language models with geometrical information from graph representations. This provides a guideline for future epitope prediction method design. In addition, we reformulate the task as bipartite link prediction, allowing convenient model performance attribution and interpretability. We open source our data and code at https://github.com/biochunan/AsEP-dataset.
Related papers
- BeeTLe: A Framework for Linear B-Cell Epitope Prediction and
Classification [0.43512163406551996]
This paper presents a new deep learning-based framework for linear B-cell prediction as well as antibody type-specific classification.
We propose an amino acid encoding method based on eigen decomposition to help the model learn the representations of antibodies.
Experimental results on data curated from the largest public database demonstrate the validity of the proposed methods.
arXiv Detail & Related papers (2023-09-05T09:18:29Z) - Efficient Prediction of Peptide Self-assembly through Sequential and
Graphical Encoding [57.89530563948755]
This work provides a benchmark analysis of peptide encoding with advanced deep learning models.
It serves as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
arXiv Detail & Related papers (2023-07-17T00:43:33Z) - Geometric Deep Learning for Structure-Based Drug Design: A Survey [83.87489798671155]
Structure-based drug design (SBDD) leverages the three-dimensional geometry of proteins to identify potential drug candidates.
Recent advancements in geometric deep learning, which effectively integrate and process 3D geometric data, have significantly propelled the field forward.
arXiv Detail & Related papers (2023-06-20T14:21:58Z) - ASPEST: Bridging the Gap Between Active Learning and Selective
Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z) - PheME: A deep ensemble framework for improving phenotype prediction from
multi-modal data [42.56953523499849]
We present PheME, an Ensemble framework using Multi-modality data of structured EHRs and unstructured clinical notes for accurate Phenotype prediction.
We leverage ensemble learning to combine outputs from single-modal models and multi-modal models to improve phenotype predictions.
arXiv Detail & Related papers (2023-03-19T23:41:04Z) - Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
Main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z) - Reprogramming Pretrained Language Models for Protein Sequence
Representation Learning [68.75392232599654]
We propose Representation Learning via Dictionary Learning (R2DL), an end-to-end representation learning framework.
R2DL reprograms a pretrained English language model to learn the embeddings of protein sequences.
Our model can attain better accuracy and significantly improve the data efficiency by up to $105$ times over the baselines set by pretrained and standard supervised methods.
arXiv Detail & Related papers (2023-01-05T15:55:18Z) - Predicting the Binding of SARS-CoV-2 Peptides to the Major
Histocompatibility Complex with Recurrent Neural Networks [0.40040974874482094]
We adapt and extend USMPep, a proposed, conceptually simple prediction algorithm based on recurrent neural networks.
We evaluate the performance on a recently released SARS-CoV-2 dataset with binding stability measurements.
arXiv Detail & Related papers (2021-04-16T17:16:35Z) - Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z) - Bayesian neural network with pretrained protein embedding enhances
prediction accuracy of drug-protein interaction [3.499870393443268]
Deep learning approaches can predict drug-protein interactions without trial-and-error by humans.
We propose two methods to construct a deep learning framework that exhibits superior performance with a small labeled dataset.
arXiv Detail & Related papers (2020-12-15T10:24:34Z) - Interpretable Structured Learning with Sparse Gated Sequence Encoder for
Protein-Protein Interaction Prediction [2.9488233765621295]
Predicting protein-protein interactions (PPIs) by learning informative representations from amino acid sequences is a challenging yet important problem in biology.
We present a novel deep framework to model and predict PPIs from sequence alone.
Our model incorporates a bidirectional gated recurrent unit to learn sequence representations by leveraging contextualized and sequential information from sequences.
arXiv Detail & Related papers (2020-10-16T17:13:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.