Enhancing TCR-Peptide Interaction Prediction with Pretrained Language Models and Molecular Representations
- URL: http://arxiv.org/abs/2505.01433v1
- Date: Tue, 22 Apr 2025 20:22:34 GMT
- Title: Enhancing TCR-Peptide Interaction Prediction with Pretrained Language Models and Molecular Representations
- Authors: Cong Qi, Hanzhang Fang, Siqi Jiang, Tianxing Hu, Wei Zhi
- Abstract summary: We present LANTERN, a deep learning framework that combines large-scale protein language models with chemical representations of peptides. Our model demonstrates superior performance, particularly in zero-shot and few-shot learning scenarios. These results highlight the potential of LANTERN to advance TCR-pMHC binding prediction and support the development of personalized immunotherapies.
- Score: 0.39945675027960637
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding the binding specificity between T-cell receptors (TCRs) and peptide-major histocompatibility complexes (pMHCs) is central to immunotherapy and vaccine development. However, current predictive models struggle with generalization, especially in data-scarce settings and when faced with novel epitopes. We present LANTERN (Large lAnguage model-powered TCR-Enhanced Recognition Network), a deep learning framework that combines large-scale protein language models with chemical representations of peptides. By encoding TCR β-chain sequences using ESM-1b and transforming peptide sequences into SMILES strings processed by MolFormer, LANTERN captures rich biological and chemical features critical for TCR-peptide recognition. Through extensive benchmarking against existing models such as ChemBERTa, TITAN, and NetTCR, LANTERN demonstrates superior performance, particularly in zero-shot and few-shot learning scenarios. Our model also benefits from a robust negative sampling strategy and shows significant clustering improvements via embedding analysis. These results highlight the potential of LANTERN to advance TCR-pMHC binding prediction and support the development of personalized immunotherapies.
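The two-branch encoding described in the abstract can be sketched in a few lines of Python. The snippet below is a minimal, hypothetical illustration, not the authors' implementation: it assumes the public fair-esm package for ESM-1b, RDKit's peptide-to-SMILES conversion, the ibm/MoLFormer-XL-both-10pct Hugging Face checkpoint for MolFormer, and mean pooling plus an untrained MLP head, all of which are assumptions.

```python
# Minimal sketch of a LANTERN-style encoding pipeline (not the authors' code).
# Assumes: fair-esm for ESM-1b, RDKit for peptide -> SMILES, and the
# ibm/MoLFormer-XL-both-10pct Hugging Face checkpoint for MolFormer.
import torch
import esm                              # pip install fair-esm
from rdkit import Chem                  # pip install rdkit
from transformers import AutoModel, AutoTokenizer

# 1) TCR beta-chain encoder: ESM-1b, mean-pooled over residue representations.
esm_model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
batch_converter = alphabet.get_batch_converter()
esm_model.eval()

def embed_tcr(beta_seq):
    _, _, tokens = batch_converter([("tcr", beta_seq)])
    with torch.no_grad():
        out = esm_model(tokens, repr_layers=[33])
    reps = out["representations"][33]               # (1, L+2, 1280) incl. BOS/EOS
    return reps[0, 1:len(beta_seq) + 1].mean(dim=0)

# 2) Peptide encoder: sequence -> SMILES via RDKit, then MolFormer embeddings.
mf_name = "ibm/MoLFormer-XL-both-10pct"
mf_tok = AutoTokenizer.from_pretrained(mf_name, trust_remote_code=True)
mf_model = AutoModel.from_pretrained(mf_name, trust_remote_code=True).eval()

def embed_peptide(pep_seq):
    smiles = Chem.MolToSmiles(Chem.MolFromSequence(pep_seq))
    inputs = mf_tok(smiles, return_tensors="pt")
    with torch.no_grad():
        out = mf_model(**inputs)
    return out.last_hidden_state[0].mean(dim=0)

# 3) Fuse both views and score binding with a small (untrained) MLP head.
tcr_emb = embed_tcr("CASSIRSSYEQYF")                # example CDR3-beta sequence
pep_emb = embed_peptide("GILGFVFTL")                # example epitope
head = torch.nn.Sequential(
    torch.nn.Linear(tcr_emb.numel() + pep_emb.numel(), 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 1))
logit = head(torch.cat([tcr_emb, pep_emb]).unsqueeze(0))
print(torch.sigmoid(logit).item())                  # predicted binding probability
```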
Related papers
- T-cell receptor specificity landscape revealed through de novo peptide design [2.37499051649312]
An effective binding between T-cell receptors (TCRs) and pathogen-derived peptides presented on Major Histocompatibility Complexes (MHCs) mediates an immune response. Here, we introduce a computational approach to predict TCR interactions with peptides presented on MHC class I alleles, and to design novel immunogenic peptides for specified TCR-MHC complexes. Our approach provides a platform for immunogenic peptide and neoantigen design, opening new computational paths for T-cell vaccine development against viruses and cancer.
arXiv Detail & Related papers (2025-03-01T22:45:19Z)
- GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of DNA, GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
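To make "prompt-responsive generation" concrete, the sketch below prompts a causal genomic language model with a 5' DNA context and samples a continuation. The checkpoint identifier is an assumed placeholder, not confirmed by the summary; substitute the officially released GENERator weights.

```python
# Hypothetical sketch: prompt a causal genomic LM with 5' DNA context and
# sample a continuation. The repo id below is a placeholder assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "GenerTeam/GENERator-eukaryote-1.2b-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(ckpt, trust_remote_code=True).eval()

prompt = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA"  # 5' DNA prompt
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.95)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```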
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
- DapPep: Domain Adaptive Peptide-agnostic Learning for Universal T-cell Receptor-antigen Binding Affinity Prediction [38.358558338444624]
We introduce DapPep, a domain-adaptive, peptide-agnostic learning framework for universal TCR-antigen binding affinity prediction. DapPep consistently outperforms existing tools, showcasing robust generalization capability. It also proves effective in challenging clinical tasks such as sorting reactive T cells in tumor neoantigen therapy and identifying key positions in 3D structures.
arXiv Detail & Related papers (2024-11-26T18:06:42Z)
- MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction [65.33218256339151]
Post-translational modifications (PTMs) profoundly expand the complexity and functionality of the proteome.
Existing computational approaches predominantly focus on protein sequences to predict PTM sites, driven by the recognition of sequence-dependent motifs.
We introduce the MeToken model, which tokenizes the micro-environment of each amino acid, integrating both sequence and structural information into unified discrete tokens.
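The summary leaves the tokenization mechanism abstract; one common way to turn continuous per-residue features into unified discrete tokens is vector quantization against a learned codebook, sketched below. The codebook size, feature dimensions, and nearest-neighbour assignment are illustrative assumptions, not MeToken's exact design.

```python
# Illustrative vector-quantization step: map each residue's continuous
# micro-environment feature (sequence + structure) to a discrete codebook id.
# Codebook size and feature dimensions are assumptions, not MeToken's design.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))        # 512 learnable code vectors, dim 64

def tokenize(seq_feats, struct_feats):
    """seq_feats: (L, 32), struct_feats: (L, 32) -> (L,) discrete token ids."""
    micro_env = np.concatenate([seq_feats, struct_feats], axis=-1)     # (L, 64)
    # Nearest codebook entry per residue (squared Euclidean distance).
    d = ((micro_env[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (L, 512)
    return d.argmin(axis=1)

L = 10
tokens = tokenize(rng.normal(size=(L, 32)), rng.normal(size=(L, 32)))
print(tokens)  # one discrete micro-environment token per residue
```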
arXiv Detail & Related papers (2024-11-04T07:14:28Z)
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model, which randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
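To make the masking idea concrete, here is a small, self-contained sketch: it tokenizes a SMILES string with a regex pattern commonly used by molecular language models and masks a random contiguous subsequence. The tokenizer regex, span length, and [MASK] token are common conventions, not necessarily this paper's exact procedure.

```python
# Illustrative SMILES subsequence masking for masked-language-model pretraining.
# The regex is a simplified, widely used SMILES tokenization pattern.
import re
import random

SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|Si|@@|@|[BCNOPSFIbcnops]|[0-9]|\(|\)|=|#|\+|-|/|\\|%[0-9]{2})")

def mask_smiles(smiles, span=3, seed=None):
    """Replace a random contiguous run of `span` tokens with [MASK] tokens."""
    rnd = random.Random(seed)
    tokens = SMILES_TOKEN.findall(smiles)
    if len(tokens) <= span:
        return smiles
    start = rnd.randrange(len(tokens) - span)
    masked = tokens[:start] + ["[MASK]"] * span + tokens[start + span:]
    return "".join(masked)

print(mask_smiles("CC(=O)Oc1ccccc1C(=O)O", seed=42))  # aspirin, 3 tokens masked
```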
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
- TCR-GPT: Integrating Autoregressive Model and Reinforcement Learning for T-Cell Receptor Repertoires Generation [6.920411338236452]
T-cell receptors (TCRs) play a crucial role in the immune system by recognizing and binding to specific antigens presented by infected or cancerous cells.
Language models, such as auto-regressive transformers, offer a powerful solution by learning the probability distributions of TCR repertoires.
We introduce TCR-GPT, a probabilistic model built on a decoder-only transformer architecture, designed to uncover and replicate sequence patterns in TCR repertoires.
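As a toy illustration of what learning the probability distribution of a repertoire means for a decoder-only model, the sketch below scores a CDR3 sequence by summing per-token conditional log-probabilities under a causal transformer. The tiny untrained model is a stand-in, not TCR-GPT's released architecture or weights.

```python
# Toy illustration: score a TCR CDR3 sequence autoregressively with a causal
# transformer. The tiny untrained model is a stand-in for TCR-GPT's weights.
import torch
from torch import nn

AA = "ACDEFGHIKLMNPQRSTVWY"
stoi = {a: i for i, a in enumerate(AA)}

class TinyDecoder(nn.Module):
    def __init__(self, d=64, heads=4, layers=2):
        super().__init__()
        self.emb = nn.Embedding(len(AA), d)
        block = nn.TransformerEncoderLayer(d, heads, 4 * d, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, layers)
        self.lm_head = nn.Linear(d, len(AA))

    def forward(self, ids):                      # ids: (1, L)
        causal = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        h = self.blocks(self.emb(ids), mask=causal)
        return self.lm_head(h)                   # (1, L, vocab)

def log_prob(model, seq):
    ids = torch.tensor([[stoi[a] for a in seq]])
    logp = model(ids).log_softmax(-1)
    # P(seq) = prod_t P(x_t | x_<t): shift targets left by one position.
    return logp[0, :-1].gather(1, ids[0, 1:, None]).sum().item()

model = TinyDecoder().eval()
print(log_prob(model, "CASSIRSSYEQYF"))          # higher = more repertoire-like
```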
arXiv Detail & Related papers (2024-08-02T10:16:28Z)
- Contrastive learning of T cell receptor representations [11.053778245621544]
We introduce a TCR language model called SCEPTR, capable of data-efficient transfer learning.
We introduce a novel pre-training strategy combining autocontrastive learning and masked-language modelling.
We anticipate that contrastive learning will be a useful paradigm to decode the rules of TCR specificity.
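The combination of autocontrastive and masked-language objectives can be summarized as a joint loss. The sketch below shows one plausible formulation (InfoNCE over two augmented views of the same TCR plus a standard MLM term); this is an assumption about the general recipe, not SCEPTR's exact losses or weighting.

```python
# Plausible joint pretraining objective: InfoNCE over two views of each TCR
# plus a masked-language-modelling term. Weights and details are assumptions.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.05):
    """z1, z2: (B, D) embeddings of two augmented views of the same B TCRs."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / tau                  # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))        # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

def joint_loss(z1, z2, mlm_logits, mlm_labels, alpha=1.0):
    """mlm_labels uses -100 at unmasked positions (ignored by cross-entropy)."""
    mlm = F.cross_entropy(mlm_logits.view(-1, mlm_logits.size(-1)),
                          mlm_labels.view(-1), ignore_index=-100)
    return info_nce(z1, z2) + alpha * mlm

# Shape check with random stand-ins for encoder outputs:
B, L, V, D = 8, 20, 21, 64
mlm_labels = torch.full((B, L), -100)
mask_pos = torch.randint(0, L, (B, 3))        # 3 masked positions per sequence
mlm_labels.scatter_(1, mask_pos, torch.randint(0, V, (B, 3)))
print(joint_loss(torch.randn(B, D), torch.randn(B, D),
                 torch.randn(B, L, V), mlm_labels))
```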
arXiv Detail & Related papers (2024-06-10T15:50:45Z)
- Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding [57.89530563948755]
This work provides a benchmark analysis of peptide encoding with advanced deep learning models.
It serves as a guide for a wide range of peptide-related predictions, such as isoelectric point and hydration free energy.
arXiv Detail & Related papers (2023-07-17T00:43:33Z)
- Reprogramming Pretrained Language Models for Antibody Sequence Infilling [72.13295049594585]
Computational design of antibodies involves generating novel and diverse sequences, while maintaining structural consistency.
Recent deep learning models have shown impressive results; however, the limited number of known antibody sequence/structure pairs frequently leads to degraded performance.
In our work we address this challenge by leveraging Model Reprogramming (MR), which repurposes a model pretrained on a source language for tasks that are in a different language and have scarce data.
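In outline, reprogramming keeps the pretrained model frozen and learns only small input and output mappings around it. The sketch below shows that general pattern with a frozen Hugging Face encoder and trainable vocabulary mappings; the specific mapping layers and target vocabulary size are illustrative assumptions, not the paper's exact setup.

```python
# Illustrative model reprogramming: freeze a pretrained source-language LM and
# learn only input/output maps for the target (amino-acid) vocabulary.
import torch
from torch import nn
from transformers import AutoModel

class ReprogrammedEncoder(nn.Module):
    def __init__(self, source_ckpt="bert-base-uncased", target_vocab=25):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(source_ckpt)
        for p in self.backbone.parameters():
            p.requires_grad = False               # source model stays frozen
        d = self.backbone.config.hidden_size
        self.in_map = nn.Embedding(target_vocab, d)   # trainable input map
        self.out_map = nn.Linear(d, target_vocab)     # trainable output map

    def forward(self, aa_ids):                    # aa_ids: (B, L) residue ids
        h = self.backbone(inputs_embeds=self.in_map(aa_ids)).last_hidden_state
        return self.out_map(h)                    # per-position infilling logits

model = ReprogrammedEncoder()
logits = model(torch.randint(0, 25, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 25])
```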
arXiv Detail & Related papers (2022-10-05T20:44:55Z)
- Attention-aware contrastive learning for predicting T cell receptor-antigen binding specificity [7.365824008999903]
It has been verified that only a small fraction of the neoantigens presented by MHC class I molecules on the cell surface can elicit a T-cell response.
We propose an attentive-mask contrastive learning model, ATMTCR, for inferring TCR-antigen binding specificity.
arXiv Detail & Related papers (2022-05-17T10:53:32Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We find the RNA-seq features to be highly redundant and informative, even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.