TCR-GPT: Integrating Autoregressive Model and Reinforcement Learning for T-Cell Receptor Repertoires Generation
- URL: http://arxiv.org/abs/2408.01156v1
- Date: Fri, 2 Aug 2024 10:16:28 GMT
- Title: TCR-GPT: Integrating Autoregressive Model and Reinforcement Learning for T-Cell Receptor Repertoires Generation
- Authors: Yicheng Lin, Dandan Zhang, Yun Liu
- Abstract summary: T-cell receptors (TCRs) play a crucial role in the immune system by recognizing and binding to specific antigens presented by infected or cancerous cells.
Language models, such as auto-regressive transformers, offer a powerful solution by learning the probability distributions of TCR repertoires.
We introduce TCR-GPT, a probabilistic model built on a decoder-only transformer architecture, designed to uncover and replicate sequence patterns in TCR repertoires.
- Score: 6.920411338236452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: T-cell receptors (TCRs) play a crucial role in the immune system by recognizing and binding to specific antigens presented by infected or cancerous cells. Understanding the sequence patterns of TCRs is essential for developing targeted immune therapies and designing effective vaccines. Language models, such as auto-regressive transformers, offer a powerful solution to this problem by learning the probability distributions of TCR repertoires, enabling the generation of new TCR sequences that inherit the underlying patterns of the repertoire. We introduce TCR-GPT, a probabilistic model built on a decoder-only transformer architecture, designed to uncover and replicate sequence patterns in TCR repertoires. TCR-GPT infers sequence probability distributions with an accuracy of 0.953, as measured by the Pearson correlation coefficient. Furthermore, by leveraging reinforcement learning (RL), we adapted the distribution of TCR sequences to generate TCRs capable of recognizing specific peptides, offering significant potential for advancing targeted immune therapies and vaccine development. With RL fine-tuning, the pretrained TCR-GPT models produced TCR repertoires likely to bind specific peptides, illustrating RL's efficiency in adapting the model to the probability distributions of biologically relevant TCR sequences.
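The abstract describes two pieces: a decoder-only autoregressive model over TCR amino-acid sequences, and an RL step that shifts the learned distribution toward peptide-binding TCRs. The sketch below is a minimal PyTorch illustration of both ideas, not the authors' implementation; the tokenization, model sizes, and the external reward_fn (standing in for a peptide-binding predictor) are assumptions.

```python
# Minimal sketch of a decoder-only autoregressive model over CDR3 amino-acid
# sequences plus a REINFORCE-style fine-tuning step. Illustrative only: model
# sizes, tokenization, and reward_fn are assumptions, not the paper's setup.
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD, BOS = 0, 1
STOI = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}
VOCAB = len(STOI) + 2

class TinyTCRLM(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_layers=4, max_len=32):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, x):
        # x: (batch, seq) token ids; a causal mask enforces left-to-right context.
        positions = torch.arange(x.size(1), device=x.device)
        h = self.tok(x) + self.pos(positions)
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        return self.head(self.blocks(h, mask=mask))  # next-token logits

def pretrain_loss(model, seqs):
    # Standard next-token cross-entropy over a batch of tokenized sequences.
    logits = model(seqs[:, :-1])
    return nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB), seqs[:, 1:].reshape(-1), ignore_index=PAD)

def reinforce_step(model, optimizer, reward_fn, batch_size=32, gen_len=16):
    # Sample sequences from the current model, score them with an external
    # peptide-binding reward, and push up the likelihood of high-reward samples.
    # (EOS handling and padding of generated sequences are omitted for brevity.)
    x = torch.full((batch_size, 1), BOS, dtype=torch.long)
    log_probs = []
    for _ in range(gen_len):
        dist = torch.distributions.Categorical(logits=model(x)[:, -1])
        nxt = dist.sample()
        log_probs.append(dist.log_prob(nxt))
        x = torch.cat([x, nxt.unsqueeze(1)], dim=1)
    rewards = reward_fn(x)                      # (batch,) e.g. binding scores
    advantage = rewards - rewards.mean()        # mean baseline reduces variance
    loss = -(advantage * torch.stack(log_probs, dim=1).sum(dim=1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

A REINFORCE-style update with a mean-reward baseline is used here only for brevity; the paper's RL procedure, reward model, and handling of variable-length sequences may differ.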
Related papers
- Predicting T-Cell Receptor Specificity [7.258321140371502]
We established a TCR generative specificity detection framework consisting of an antigen selector and a TCR classifier based on the Random Forest algorithm.
We used the k-fold validation method to compare the performance of our model with ordinary deep learning methods.
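As a rough illustration of the classifier half of such a framework, the sketch below trains a Random Forest on k-mer count features of CDR3 sequences and scores it with k-fold cross-validation; the feature choice and toy data are assumptions rather than the authors' pipeline.

```python
# Illustrative only: Random Forest over 3-mer counts with k-fold cross-validation.
from collections import Counter
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
KMERS = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
INDEX = {k: i for i, k in enumerate(KMERS)}

def kmer_features(seq, k=3):
    # Count occurrences of each 3-mer in a CDR3 amino-acid sequence.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vec = np.zeros(len(KMERS))
    for kmer, c in counts.items():
        if kmer in INDEX:
            vec[INDEX[kmer]] = c
    return vec

# Placeholder sequences and binding labels; a real study would use a large
# annotated TCR dataset.
seqs = ["CASSLGQAYEQYF", "CASSPGTGGTDTQYF", "CASSLAPGATNEKLFF", "CASSQDRGYEQYF"]
labels = [1, 0, 1, 0]
X = np.stack([kmer_features(s) for s in seqs])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, labels, cv=2).mean())  # k-fold CV (k=2 for toy data)
```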
arXiv Detail & Related papers (2024-07-27T23:21:07Z)
- A large language model for predicting T cell receptor-antigen binding specificity [4.120928123714289]
We propose a Masked Language Model (MLM) to overcome limitations in model generalization.
Specifically, we randomly mask sequence segments and train tcrLM to infer the masked segments, thereby extracting expressive features from TCR sequences.
Our extensive experimental results demonstrate that tcrLM achieved AUC values of 0.937 and 0.933 on independent test sets and external validation sets.
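The segment-masking objective described above can be illustrated with a small helper that hides a contiguous stretch of a sequence and returns the hidden target; the mask fraction and mask token below are assumptions, not values from the paper.

```python
# Illustrative masking of a random contiguous segment of a TCR sequence.
import random

MASK_TOKEN = "X"  # placeholder symbol standing in for a [MASK] token

def mask_segment(seq, mask_frac=0.25, rng=random):
    """Return (masked_seq, target_segment, start) for one MLM training example."""
    seg_len = max(1, int(len(seq) * mask_frac))
    start = rng.randrange(len(seq) - seg_len + 1)
    target = seq[start:start + seg_len]
    masked = seq[:start] + MASK_TOKEN * seg_len + seq[start + seg_len:]
    return masked, target, start

print(mask_segment("CASSLGQAYEQYF"))  # e.g. ('CASSXXXAYEQYF', 'LGQ', 4)
```

During training, the model is asked to reconstruct the target from the masked sequence, so the learned representations must encode the surrounding sequence context.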
arXiv Detail & Related papers (2024-06-24T08:36:40Z)
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER.
arXiv Detail & Related papers (2024-06-11T09:37:52Z)
- AIRIVA: A Deep Generative Model of Adaptive Immune Repertoires [6.918664738267051]
We present an Adaptive Immune Repertoire-Invariant Variational Autoencoder (AIRIVA) that learns a low-dimensional, interpretable, and compositional representation of TCR repertoires to disentangle systematic effects in repertoires.
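As a point of comparison with the autoregressive approach of TCR-GPT, a repertoire-level variational autoencoder can be sketched in a few lines. The version below is a generic VAE over binary clonotype-occurrence vectors, not the AIRIVA architecture itself; the input representation and dimensions are assumptions.

```python
# Generic VAE sketch over binary repertoire vectors (not the AIRIVA model).
import torch
import torch.nn as nn

class RepertoireVAE(nn.Module):
    def __init__(self, n_clonotypes=5000, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_clonotypes, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, n_clonotypes))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.decoder(z), mu, logvar

def elbo_loss(logits, x, mu, logvar):
    # Bernoulli reconstruction term plus KL divergence to a standard normal prior.
    recon = nn.functional.binary_cross_entropy_with_logits(logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```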
arXiv Detail & Related papers (2023-04-26T14:40:35Z)
- T Cell Receptor Protein Sequences and Sparse Coding: A Novel Approach to Cancer Classification [4.824821328103934]
T cell receptors (TCRs) are essential proteins for the adaptive immune system.
Recent advancements in sequencing technologies have enabled the comprehensive profiling of TCR repertoires.
This has led to the discovery of TCRs with potent anti-cancer activity and the development of TCR-based immunotherapies.
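The sparse-coding idea named in the title above can be illustrated with scikit-learn's dictionary learning: sequences are first turned into numeric features (here, placeholder k-mer-style counts), and their sparse codes are then used as inputs to a downstream cancer classifier. The feature construction and dimensions below are assumptions, not the authors' setup.

```python
# Illustrative sparse coding of sequence-derived features via dictionary learning.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Placeholder feature matrix: (n_sequences, n_features) k-mer-style counts.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(100, 400)).astype(float)

coder = MiniBatchDictionaryLearning(n_components=50, alpha=1.0, random_state=0)
codes = coder.fit_transform(X)  # sparse codes, one row per sequence
print(codes.shape)              # (100, 50); these would feed a classifier
```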
arXiv Detail & Related papers (2023-04-25T20:43:41Z)
- T-Cell Receptor Optimization with Reinforcement Learning and Mutation Policies for Precision Immunotherapy [21.004878412411053]
T-cell receptors (TCRs) are protein complexes found on the surface of T cells and can bind to peptides.
This process is known as TCR recognition and constitutes a key step for immune response.
In this paper, we formulated the search for optimized TCRs as a reinforcement learning problem and presented a framework TCRPPO with a mutation policy.
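TCRPPO itself learns a mutation policy with reinforcement learning (PPO); as a much-simplified stand-in, the loop below proposes random point mutations and keeps those that improve a hypothetical reward, such as a predicted peptide-binding score. The reward function here is a toy placeholder.

```python
# Greedy mutate-and-score loop: a simplified illustration of mutation-based
# TCR optimization, not the PPO-trained policy described in the paper.
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def optimize_tcr(seq, reward_fn, n_steps=200, rng=random):
    best, best_r = seq, reward_fn(seq)
    for _ in range(n_steps):
        pos = rng.randrange(len(best))
        cand = best[:pos] + rng.choice(AMINO_ACIDS) + best[pos + 1:]
        r = reward_fn(cand)
        if r > best_r:  # accept only improving point mutations
            best, best_r = cand, r
    return best, best_r

# Toy reward: similarity to a target motif, standing in for a learned
# peptide-binding predictor.
target = "CASSLGQAYEQYF"
reward = lambda s: sum(a == b for a, b in zip(s, target)) / len(target)
print(optimize_tcr("CASSQDRGYEQYF", reward))
```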
arXiv Detail & Related papers (2023-03-02T20:25:14Z)
- From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader [130.45769668885487]
Pre-trained Machine Reader (PMR) is a novel method for retrofitting masked language models (MLMs) to pre-trained machine reading comprehension (MRC) models without acquiring labeled data.
To build the proposed PMR, we constructed a large volume of general-purpose and high-quality MRC-style training data.
PMR has the potential to serve as a unified model for tackling various extraction and classification tasks in the MRC formulation.
arXiv Detail & Related papers (2022-12-09T10:21:56Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Sequence Transduction with Graph-based Supervision [96.04967815520193]
We present a new transducer objective function that generalizes the RNN-T loss to accept a graph representation of the labels.
We demonstrate that transducer-based ASR with a CTC-like lattice achieves better results than standard RNN-T training.
arXiv Detail & Related papers (2021-11-01T21:51:42Z)
- Probabilistic Generating Circuits [50.98473654244851]
We propose probabilistic generating circuits (PGCs) as an efficient representation of probability generating polynomials.
PGCs are not just a theoretical framework that unifies vastly different existing models; they also show strong potential for modeling realistic data.
We exhibit a simple class of PGCs that are not trivially subsumed by simple combinations of PCs and DPPs, and obtain competitive performance on a suite of density estimation benchmarks.
arXiv Detail & Related papers (2021-02-19T07:06:53Z)
- Pretraining Techniques for Sequence-to-Sequence Voice Conversion [57.65753150356411]
Sequence-to-sequence (seq2seq) voice conversion (VC) models are attractive owing to their ability to convert prosody.
We propose to transfer knowledge from other speech processing tasks where large-scale corpora are easily available, typically text-to-speech (TTS) and automatic speech recognition (ASR).
We argue that VC models with such pretrained ASR or TTS model parameters can generate effective hidden representations for high-fidelity, highly intelligible converted speech.
arXiv Detail & Related papers (2020-08-07T11:02:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.