Attention-aware contrastive learning for predicting T cell
receptor-antigen binding specificity
- URL: http://arxiv.org/abs/2206.11255v1
- Date: Tue, 17 May 2022 10:53:32 GMT
- Title: Attention-aware contrastive learning for predicting T cell
receptor-antigen binding specificity
- Authors: Yiming Fang, Xuejun Liu, and Hui Liu
- Abstract summary: It has been verified that only a small fraction of the neoantigens presented by MHC class I molecules on the cell surface can elicit T cells.
We propose an attentive-mask contrastive learning model, ATMTCR, for inferring TCR-antigen binding specificity.
- Score: 7.365824008999903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It has been verified that only a small fraction of the neoantigens presented
by MHC class I molecules on the cell surface can elicit T cells. The limitation
can be attributed to the binding specificity of T cell receptor (TCR) to
peptide-MHC complex (pMHC). Computational prediction of T cell binding to
neoantigens is a challenging and unresolved task. In this paper, we propose an
attentive-mask contrastive learning model, ATMTCR, for inferring TCR-antigen
binding specificity. For each input TCR sequence, we used a Transformer encoder
to transform it into a latent representation, and then masked a proportion of
residues, guided by attention weights, to generate its contrastive view.
By pretraining on large-scale TCR CDR3 sequences, we verified that contrastive
learning significantly improved the prediction performance of TCR binding to
peptide-MHC complex (pMHC). Beyond the detection of important amino acids and
their locations in the TCR sequence, our model can also extract high-order
semantic information underlying TCR-antigen binding specificity. In comparison
experiments on two independent datasets, our method achieved better performance
than other existing algorithms. Moreover, we effectively
identified important amino acids and their positional preferences through
attention weights, which indicated the interpretability of our proposed model.
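The attention-guided masking step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `attentive_mask`, the toy CDR3 sequence, and the linear ramp of attention weights are all assumptions for demonstration.

```python
import numpy as np

def attentive_mask(sequence, attn_weights, mask_ratio=0.3, mask_token="X"):
    # Mask the residues that receive the highest attention weights,
    # producing a contrastive view of the input sequence.
    seq = list(sequence)
    n_mask = max(1, int(len(seq) * mask_ratio))
    # Indices of the n_mask highest-attention positions.
    top_idx = np.argsort(attn_weights)[::-1][:n_mask]
    for i in top_idx:
        seq[i] = mask_token
    return "".join(seq)

# Toy CDR3 sequence with a linear ramp of attention weights:
cdr3 = "CASSLAPGATNEKLFF"
attn = np.linspace(0.1, 1.0, len(cdr3))
view = attentive_mask(cdr3, attn, mask_ratio=0.25)
print(view)  # the four highest-attention (rightmost) residues are masked
```

In practice the attention weights would come from the Transformer encoder itself rather than a fixed ramp, and masking high-attention residues forces the contrastive objective to rely on the remaining context.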
Related papers
- A large language model for predicting T cell receptor-antigen binding specificity [4.120928123714289]
We propose a Masked Language Model (MLM) to overcome limitations in model generalization.
Specifically, we randomly mask sequence segments and train tcrLM to infer the masked segments, thereby extracting expressive features from TCR sequences.
Our extensive experimental results demonstrate that tcrLM achieved AUC values of 0.937 and 0.933 on independent test sets and external validation sets.
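The segment-masking objective described above can be illustrated with a minimal sketch; `mask_segment` and the toy CDR3 sequence are assumptions for illustration, not the tcrLM implementation.

```python
import random

def mask_segment(sequence, seg_len=3, mask_token="X", rng=None):
    # Mask one random contiguous segment; a masked language model is
    # trained to recover the target residues from the masked input.
    rng = rng or random.Random()
    start = rng.randrange(len(sequence) - seg_len + 1)
    target = sequence[start:start + seg_len]
    masked = sequence[:start] + mask_token * seg_len + sequence[start + seg_len:]
    return masked, target

masked, target = mask_segment("CASSIRSSYEQYF", seg_len=3, rng=random.Random(0))
print(masked, target)
```

Training pairs of (masked input, target segment) generated this way give the model a self-supervised signal without any binding labels.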
arXiv Detail & Related papers (2024-06-24T08:36:40Z)
- Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding [57.89530563948755]
This work provides a benchmark analysis of peptide encoding with advanced deep learning models.
It serves as a guide for a wide range of peptide-related predictions such as isoelectric points, hydration free energy, etc.
arXiv Detail & Related papers (2023-07-17T00:43:33Z)
- T Cell Receptor Protein Sequences and Sparse Coding: A Novel Approach to Cancer Classification [4.824821328103934]
T cell receptors (TCRs) are essential proteins for the adaptive immune system.
Recent advancements in sequencing technologies have enabled the comprehensive profiling of TCR repertoires.
This has led to the discovery of TCRs with potent anti-cancer activity and the development of TCR-based immunotherapies.
arXiv Detail & Related papers (2023-04-25T20:43:41Z)
- T-Cell Receptor Optimization with Reinforcement Learning and Mutation Policies for Precision Immunotherapy [21.004878412411053]
T-cell receptors (TCRs) are protein complexes found on the surface of T cells and can bind to peptides.
This process is known as TCR recognition and constitutes a key step for immune response.
In this paper, we formulated the search for optimized TCRs as a reinforcement learning problem and presented TCRPPO, a framework with a mutation policy.
arXiv Detail & Related papers (2023-03-02T20:25:14Z)
- Pre-training Co-evolutionary Protein Representation via A Pairwise Masked Language Model [93.9943278892735]
A key problem in protein sequence representation learning is to capture the co-evolutionary information reflected by the inter-residue co-variation in the sequences.
We propose a novel method to capture this information directly by pre-training via a dedicated language model, i.e., Pairwise Masked Language Model (PMLM)
Our results show that the proposed method can effectively capture the inter-residue correlations and improve the performance of contact prediction by up to 9% compared to the baseline.
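The pairwise masking idea can be sketched minimally as follows; `mask_residue_pair` and the toy sequence are illustrative assumptions, not the paper's implementation.

```python
import random

def mask_residue_pair(sequence, mask_token="X", rng=None):
    # Jointly mask two positions; a pairwise masked language model
    # predicts both residues together, so it can learn their
    # co-variation instead of treating each position independently.
    rng = rng or random.Random()
    i, j = sorted(rng.sample(range(len(sequence)), 2))
    pair = (sequence[i], sequence[j])
    seq = list(sequence)
    seq[i] = seq[j] = mask_token
    return "".join(seq), (i, j), pair

masked, (i, j), pair = mask_residue_pair("MKTAYIAKQR", rng=random.Random(1))
print(masked, (i, j), pair)
```

Predicting the joint distribution over the masked pair, rather than two independent marginals, is what lets the objective capture inter-residue correlations.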
arXiv Detail & Related papers (2021-10-29T04:01:32Z)
- TITAN: T Cell Receptor Specificity Prediction with Bimodal Attention Networks [0.5371337604556311]
We propose a bimodal neural network that encodes both TCR sequences and epitopes to enable independent study of generalization to unseen TCRs and/or epitopes.
The network exhibits competitive performance on unseen TCRs.
arXiv Detail & Related papers (2021-04-21T09:25:14Z)
- Ranking-based Convolutional Neural Network Models for Peptide-MHC Binding Prediction [15.932922003001034]
Identifying peptides that can bind to MHC class-I molecules plays a vital role in the design of peptide vaccines.
We develop two allele-specific CNN-based methods named ConvM and SpConvM to tackle the binding prediction problem.
arXiv Detail & Related papers (2020-12-04T20:40:36Z)
- Confidence-guided Lesion Mask-based Simultaneous Synthesis of Anatomic and Molecular MR Images in Patients with Post-treatment Malignant Gliomas [65.64363834322333]
Confidence Guided SAMR (CG-SAMR) synthesizes multi-modal anatomic sequences from lesion information.
A confidence-guided module steers the synthesis based on a confidence measure of the intermediate results.
Experiments on real clinical data demonstrate that the proposed model performs better than state-of-the-art synthesis methods.
arXiv Detail & Related papers (2020-08-06T20:20:22Z)
- Lesion Mask-based Simultaneous Synthesis of Anatomic and Molecular MR Images using a GAN [59.60954255038335]
The proposed framework consists of a stretch-out up-sampling module, a brain atlas encoder, a segmentation consistency module, and multi-scale label-wise discriminators.
Experiments on real clinical data demonstrate that the proposed model can perform significantly better than the state-of-the-art synthesis methods.
arXiv Detail & Related papers (2020-06-26T02:50:09Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found RNA-seq features to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z) - Supervised Learning for Non-Sequential Data: A Canonical Polyadic
Decomposition Approach [85.12934750565971]
Efficient modelling of feature interactions underpins supervised learning for non-sequential tasks.
To alleviate this issue, it has been proposed to implicitly represent the model parameters as a tensor.
For enhanced expressiveness, we generalize the framework to allow feature mapping to arbitrarily high-dimensional feature vectors.
arXiv Detail & Related papers (2020-01-27T22:38:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.