tcrLM: a lightweight protein language model for predicting T cell receptor and epitope binding specificity
- URL: http://arxiv.org/abs/2406.16995v2
- Date: Wed, 04 Dec 2024 14:33:44 GMT
- Title: tcrLM: a lightweight protein language model for predicting T cell receptor and epitope binding specificity
- Authors: Xing Fang, Chenpeng Yu, Shiye Tian, Hui Liu
- Abstract summary: The anti-cancer immune response relies on binding between T-cell receptors (TCRs) and antigens, which elicits adaptive immunity to eliminate tumor cells.
In this study, we introduce a lightweight masked language model, termed tcrLM, to address this challenge.
We construct the largest TCR CDR3 sequence set with more than 100 million distinct sequences, and pretrain tcrLM on these sequences.
The results demonstrate that tcrLM not only surpasses existing TCR-antigen binding prediction methods, but also outperforms other mainstream protein language models.
- Score: 4.120928123714289
- License:
- Abstract: The anti-cancer immune response relies on binding between T-cell receptors (TCRs) and antigens, which elicits adaptive immunity to eliminate tumor cells. The immune system's ability to respond to a wide variety of novel neoantigens arises from the immense diversity of the TCR repertoire. However, this diversity makes accurate prediction of antigen-TCR binding a significant challenge. In this study, we introduce a lightweight masked language model, termed tcrLM, to address this challenge. Our approach randomly masks segments of TCR sequences and trains tcrLM to infer the masked segments, thereby enabling the extraction of expressive features from TCR sequences. To further enhance robustness, we incorporate virtual adversarial training into tcrLM. We construct the largest TCR CDR3 sequence set, with more than 100 million distinct sequences, and pretrain tcrLM on these sequences. The pretrained encoder is subsequently applied to predict TCR-antigen binding specificity. We evaluate model performance on three test datasets: an independent, an external, and a COVID-19 test set. The results demonstrate that tcrLM not only surpasses existing TCR-antigen binding prediction methods but also outperforms other mainstream protein language models. More interestingly, tcrLM effectively captures the biochemical properties and positional preferences of amino acids within TCR sequences. Additionally, the predicted TCR-neoantigen binding scores are indicative of immunotherapy responses and clinical outcomes in a melanoma cohort. These findings demonstrate the potential of tcrLM for predicting TCR-antigen binding specificity, with significant implications for advancing immunotherapy and personalized medicine.
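The abstract describes pretraining by randomly masking segments of TCR sequences and training tcrLM to recover them. The sketch below illustrates only that masking step on a single CDR3 string. The 15% masking ratio, the maximum span length of 3, the `#` mask character, and the function name `mask_segments` are hypothetical choices for illustration, not values reported for tcrLM.

```python
import random

# Hypothetical settings: the tcrLM abstract does not report its masking
# ratio, span lengths, or mask token, so these values are illustrative only.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MASK = "#"  # hypothetical single-character mask token


def mask_segments(seq, mask_ratio=0.15, max_span=3, rng=None):
    """Mask contiguous segments of a CDR3 sequence for masked-LM pretraining.

    Returns (masked_seq, targets), where targets maps each masked position
    to the original residue that the model would be trained to recover.
    """
    if not seq or any(aa not in AMINO_ACIDS for aa in seq):
        raise ValueError("seq must be a non-empty string of standard amino acids")
    if not 0 < mask_ratio <= 1:
        raise ValueError("mask_ratio must be in (0, 1]")
    rng = rng or random.Random()
    n = len(seq)
    budget = max(1, round(n * mask_ratio))  # total residues to mask
    masked = set()
    while len(masked) < budget:
        # Pick a random start and span length; stop adding once the budget is met.
        start = rng.randrange(n)
        span = rng.randint(1, max_span)
        for i in range(start, min(start + span, n)):
            if len(masked) < budget:
                masked.add(i)
    tokens = list(seq)
    targets = {}
    for i in sorted(masked):
        targets[i] = tokens[i]
        tokens[i] = MASK
    return "".join(tokens), targets


masked_seq, targets = mask_segments("CASSLGQAYEQYF", rng=random.Random(0))
print(masked_seq, targets)
```

Overlapping spans simply merge, so the number of masked residues always equals the budget; a masked-LM loss would then be computed only at the positions listed in `targets`.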
Related papers
- DapPep: Domain Adaptive Peptide-agnostic Learning for Universal T-cell Receptor-antigen Binding Affinity Prediction [38.358558338444624]
We introduce a domain-adaptive peptide-agnostic learning framework DapPep for universal TCR-antigen binding affinity prediction.
DapPep consistently outperforms existing tools, showcasing robust generalization capability.
It proves effective in challenging clinical tasks such as sorting reactive T cells in tumor neoantigen therapy and identifying key positions in 3D structures.
arXiv Detail & Related papers (2024-11-26T18:06:42Z)
- TopoTxR: A topology-guided deep convolutional network for breast parenchyma learning on DCE-MRIs [49.69047720285225]
We propose a novel topological approach that explicitly extracts multi-scale topological structures to better approximate breast parenchymal structures.
We empirically validate TopoTxR using the VICTRE phantom breast dataset.
Our qualitative and quantitative analyses suggest differential topological behavior of breast tissue in treatment-naïve imaging.
arXiv Detail & Related papers (2024-11-05T19:35:10Z)
- TCR-GPT: Integrating Autoregressive Model and Reinforcement Learning for T-Cell Receptor Repertoires Generation [6.920411338236452]
T-cell receptors (TCRs) play a crucial role in the immune system by recognizing and binding to specific antigens presented by infected or cancerous cells.
Language models, such as auto-regressive transformers, offer a powerful solution by learning the probability distributions of TCR repertoires.
We introduce TCR-GPT, a probabilistic model built on a decoder-only transformer architecture, designed to uncover and replicate sequence patterns in TCR repertoires.
arXiv Detail & Related papers (2024-08-02T10:16:28Z)
- Using Multiparametric MRI with Optimized Synthetic Correlated Diffusion Imaging to Enhance Breast Cancer Pathologic Complete Response Prediction [71.91773485443125]
Neoadjuvant chemotherapy has recently gained popularity as a promising treatment strategy for breast cancer.
The current process to recommend neoadjuvant chemotherapy relies on the subjective evaluation of medical experts.
This research investigates the application of optimized CDI$^s$ to enhance breast cancer pathologic complete response prediction.
arXiv Detail & Related papers (2024-05-13T15:40:56Z)
- T Cell Receptor Protein Sequences and Sparse Coding: A Novel Approach to Cancer Classification [4.824821328103934]
T cell receptors (TCRs) are essential proteins for the adaptive immune system.
Recent advancements in sequencing technologies have enabled the comprehensive profiling of TCR repertoires.
This has led to the discovery of TCRs with potent anti-cancer activity and the development of TCR-based immunotherapies.
arXiv Detail & Related papers (2023-04-25T20:43:41Z)
- T-Cell Receptor Optimization with Reinforcement Learning and Mutation Policies for Precision Immunotherapy [21.004878412411053]
T-cell receptors (TCRs) are protein complexes found on the surface of T cells and can bind to peptides.
This process is known as TCR recognition and constitutes a key step for immune response.
In this paper, we formulated the search for optimized TCRs as a reinforcement learning problem and presented a framework TCRPPO with a mutation policy.
arXiv Detail & Related papers (2023-03-02T20:25:14Z)
- Exploiting segmentation labels and representation learning to forecast therapy response of PDAC patients [60.78505216352878]
We propose a hybrid deep neural network pipeline to predict tumour response to initial chemotherapy.
We leverage a combination of representation transfer from segmentation to classification, as well as localisation and representation learning.
Our approach yields a remarkably data-efficient method able to predict treatment response with a ROC-AUC of 63.7% using only 477 datasets in total.
arXiv Detail & Related papers (2022-11-08T11:50:31Z)
- Reprogramming Pretrained Language Models for Antibody Sequence Infilling [72.13295049594585]
Computational design of antibodies involves generating novel and diverse sequences, while maintaining structural consistency.
Recent deep learning models have shown impressive results; however, the limited number of known antibody sequence/structure pairs frequently leads to degraded performance.
In our work we address this challenge by leveraging Model Reprogramming (MR), which repurposes pretrained models on a source language to adapt to the tasks that are in a different language and have scarce data.
arXiv Detail & Related papers (2022-10-05T20:44:55Z)
- Attention-aware contrastive learning for predicting T cell receptor-antigen binding specificity [7.365824008999903]
It has been verified that only a small fraction of the neoantigens presented by MHC class I molecules on the cell surface can elicit T cell responses.
We propose an attentive-mask contrastive learning model, ATMTCR, for inferring TCR-antigen binding specificity.
arXiv Detail & Related papers (2022-05-17T10:53:32Z)
- MIA-Prognosis: A Deep Learning Framework to Predict Therapy Response [58.0291320452122]
This paper aims at a unified deep learning approach to predict patient prognosis and therapy response.
We formalize the prognosis modeling as a multi-modal asynchronous time series classification task.
Our predictive model could further stratify low-risk and high-risk patients in terms of long-term survival.
arXiv Detail & Related papers (2020-10-08T15:30:17Z)
- Confidence-guided Lesion Mask-based Simultaneous Synthesis of Anatomic and Molecular MR Images in Patients with Post-treatment Malignant Gliomas [65.64363834322333]
Confidence Guided SAMR (CG-SAMR) synthesizes data from lesion information to multi-modal anatomic sequences.
The module guides the synthesis based on a confidence measure of the intermediate results.
Experiments on real clinical data demonstrate that the proposed model can outperform state-of-the-art synthesis methods.
arXiv Detail & Related papers (2020-08-06T20:20:22Z)
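The TCR-GPT summary above describes learning the probability distribution of TCR repertoires with an autoregressive model, i.e. factorizing P(x_1, ..., x_T) as the product over t of P(x_t | x_1, ..., x_{t-1}). As a toy illustration of that factorization only, the sketch below replaces the decoder-only transformer with a smoothed bigram count model, a deliberate simplification and not TCR-GPT's method. The `^`/`$` markers, the smoothing constant, the training strings, and the function names `fit_bigram`, `log_prob`, and `sample` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical start/end markers; TCR-GPT itself uses a transformer, not bigrams.
BOS, EOS = "^", "$"
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def fit_bigram(seqs, alpha=1.0):
    """Estimate P(next | prev) from CDR3 strings with add-alpha smoothing."""
    counts = defaultdict(Counter)
    for s in seqs:
        tokens = [BOS] + list(s) + [EOS]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    vocab = list(ALPHABET) + [EOS]
    probs = {}
    for prev in [BOS] + list(ALPHABET):
        total = sum(counts[prev][t] for t in vocab) + alpha * len(vocab)
        probs[prev] = {t: (counts[prev][t] + alpha) / total for t in vocab}
    return probs


def log_prob(probs, seq):
    """Chain rule: log P(seq) = sum over t of log P(x_t | x_{t-1}), ending with EOS."""
    tokens = [BOS] + list(seq) + [EOS]
    return sum(math.log(probs[p][n]) for p, n in zip(tokens, tokens[1:]))


def sample(probs, rng=None, max_len=30):
    """Generate one sequence token by token until EOS or max_len."""
    rng = rng or random.Random()
    out, prev = [], BOS
    while len(out) < max_len:
        vocab = list(probs[prev])
        weights = [probs[prev][t] for t in vocab]
        nxt = rng.choices(vocab, weights=weights)[0]
        if nxt == EOS:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


train = ["CASSLGQAYEQYF", "CASSPGTGELFF", "CASRDRGNTEAFF"]
model = fit_bigram(train)
print(log_prob(model, "CASS"), sample(model, random.Random(1)))
```

A transformer conditions each step on the whole prefix rather than only the previous residue, but scoring and generation follow the same left-to-right chain-rule pattern.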