Active Learning Framework for Cost-Effective TCR-Epitope Binding
Affinity Prediction
- URL: http://arxiv.org/abs/2310.10893v2
- Date: Mon, 30 Oct 2023 17:06:32 GMT
- Title: Active Learning Framework for Cost-Effective TCR-Epitope Binding
Affinity Prediction
- Authors: Pengfei Zhang, Seojin Bang and Heewook Lee
- Abstract summary: ActiveTCR is a framework that incorporates active learning and TCR-epitope binding affinity prediction models.
It aims to maximize performance gains while minimizing the cost of annotation.
Our work is the first systematic investigation of data optimization for TCR-epitope binding affinity prediction.
- Score: 6.3044887592852845
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: T cell receptors (TCRs) are critical components of adaptive immune systems,
responsible for responding to threats by recognizing epitope sequences
presented on the host cell surface. Computational prediction of binding affinity
between TCRs and epitope sequences using machine/deep learning has attracted
intense attention recently. However, its success is hindered by the lack of
large collections of annotated TCR-epitope pairs. Annotating their binding
affinity requires expensive and time-consuming wet-lab evaluation. To reduce
annotation cost, we present ActiveTCR, a framework that incorporates active
learning and TCR-epitope binding affinity prediction models. Starting with a
small set of labeled training pairs, ActiveTCR iteratively searches for
unlabeled TCR-epitope pairs that are worth annotating. It aims to
maximize performance gains while minimizing the cost of annotation. We compared
four query strategies with a random sampling baseline and demonstrated that
ActiveTCR reduces annotation costs by approximately 40%. Furthermore, we showed
that providing ground truth labels of TCR-epitope pairs to query strategies can
help identify and reduce more than 40% redundancy among already annotated pairs
without compromising model performance, enabling users to train equally
powerful prediction models with less training data. Our work is the first
systematic investigation of data optimization for TCR-epitope binding affinity
prediction.
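To make the loop concrete, below is a minimal sketch of one uncertainty-based active learning round in the spirit of ActiveTCR. It is not the authors' implementation: the logistic-regression surrogate model, the synthetic 64-dimensional pair embeddings, the batch size, and the entropy query strategy are illustrative stand-ins for the paper's prediction models and its four query strategies.

```python
# Hedged sketch of an active learning loop for TCR-epitope binding prediction.
# All components here are assumptions for illustration, not the ActiveTCR code.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-in for embedded TCR-epitope pairs: X are pair embeddings, y are binding
# labels (1 = binder, 0 = non-binder). A real pipeline would embed CDR3/epitope
# sequences with a learned encoder instead of sampling random features.
X = rng.normal(size=(5000, 64))
y = (X[:, :8].sum(axis=1) + 0.5 * rng.normal(size=5000) > 0).astype(int)

labeled = list(rng.choice(len(X), size=100, replace=False))  # small seed set
unlabeled = list(set(range(len(X))) - set(labeled))
BATCH, ROUNDS = 100, 10

def query_entropy(model, pool_idx, k):
    """Pick the k pool pairs whose predicted binding probability is most uncertain."""
    p = model.predict_proba(X[pool_idx])[:, 1]
    entropy = -(p * np.log(p + 1e-12) + (1 - p) * np.log(1 - p + 1e-12))
    order = np.argsort(-entropy)[:k]
    return [pool_idx[i] for i in order]

for r in range(ROUNDS):
    model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    picked = query_entropy(model, unlabeled, BATCH)   # pairs "worth annotating"
    labeled += picked                                  # simulate wet-lab annotation
    unlabeled = [i for i in unlabeled if i not in set(picked)]
    acc = model.score(X, y)                            # full-pool accuracy, for brevity
    print(f"round {r}: labeled={len(labeled)}, accuracy={acc:.3f}")
```

Replacing `query_entropy` with random selection from the pool gives the random-sampling baseline the paper compares against; the abstract further notes that giving the query strategies access to ground-truth labels lets them flag redundant pairs among data that has already been annotated.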
Related papers
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER.
arXiv Detail & Related papers (2024-06-11T09:37:52Z)
- Contrastive learning of T cell receptor representations [11.053778245621544]
We introduce a TCR language model called SCEPTR, capable of data-efficient transfer learning.
We introduce a novel pre-training strategy combining autocontrastive learning and masked-language modelling.
We anticipate that contrastive learning will be a useful paradigm to decode the rules of TCR specificity.
arXiv Detail & Related papers (2024-06-10T15:50:45Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Learning Repeatable Speech Embeddings Using An Intra-class Correlation Regularizer [16.716653844774374]
We evaluate the repeatability of embeddings using the intra-class correlation coefficient (ICC)
We propose a novel regularizer, the ICC regularizer, as a complementary component for contrastive losses to guide deep neural networks to produce embeddings with higher repeatability.
We implement the ICC regularizer and apply it to three speech tasks: speaker verification, voice style conversion, and a clinical application for detecting dysphonic voice.
arXiv Detail & Related papers (2023-10-25T23:21:46Z)
- An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition [51.232523987916636]
Differential privacy (DP) is one data protection avenue to safeguard user information used for training deep models by imposing noisy distortion on privacy data.
In this work, we extend PATE learning to work with dynamic patterns, namely speech, and conduct a first experimental study on ASR to avoid acoustic data leakage.
arXiv Detail & Related papers (2022-10-11T16:55:54Z)
- Reducing Predictive Feature Suppression in Resource-Constrained Contrastive Image-Caption Retrieval [65.33981533521207]
We introduce an approach to reduce predictive feature suppression for resource-constrained ICR methods: latent target decoding (LTD).
LTD reconstructs the input caption in a latent space of a general-purpose sentence encoder, which prevents the image and caption encoder from suppressing predictive features.
Our experiments show that, unlike reconstructing the input caption in the input space, LTD reduces predictive feature suppression, measured by obtaining higher recall@k, r-precision, and nDCG scores.
arXiv Detail & Related papers (2022-04-28T09:55:28Z)
- Adversarial Attacks and Defense for Non-Parametric Two-Sample Tests [73.32304304788838]
This paper systematically uncovers the failure mode of non-parametric TSTs through adversarial attacks.
To enable TST-agnostic attacks, we propose an ensemble attack framework that jointly minimizes the different types of test criteria.
To robustify TSTs, we propose a max-min optimization that iteratively generates adversarial pairs to train the deep kernels.
arXiv Detail & Related papers (2022-02-07T11:18:04Z)
- TITAN: T Cell Receptor Specificity Prediction with Bimodal Attention Networks [0.5371337604556311]
We propose TITAN, a bimodal neural network that encodes both TCR and epitope sequences to enable independent assessment of generalization to unseen TCRs and/or epitopes.
TITAN exhibits competitive performance on unseen TCRs.
arXiv Detail & Related papers (2021-04-21T09:25:14Z)
- SISE-PC: Semi-supervised Image Subsampling for Explainable Pathology [0.7226144684379189]
We propose a novel active learning framework that identifies a minimal sub-sampled dataset containing the most uncertain OCT image samples.
The proposed method can be extended to other medical images to minimize prediction costs.
arXiv Detail & Related papers (2021-02-23T09:00:15Z)
- Deep Learning for Virtual Screening: Five Reasons to Use ROC Cost Functions [80.12620331438052]
Deep learning has become an important tool for rapid screening of billions of molecules in silico for potential hits containing desired chemical features.
Despite its importance, substantial challenges persist in training these models, such as severe class imbalance, high decision thresholds, and lack of ground truth labels in some datasets.
We argue in favor of directly optimizing the receiver operating characteristic (ROC) in such cases, due to its robustness to class imbalance.
arXiv Detail & Related papers (2020-06-25T08:46:37Z)
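As a side note on the last entry above (ROC cost functions), the case for optimizing ROC-AUC directly under class imbalance can be illustrated with a differentiable pairwise surrogate. This is a hedged sketch with synthetic scores and labels, not code from that paper; the pairwise logistic loss shown is one common AUC surrogate among several.

```python
# Hedged sketch: empirical ROC-AUC and a differentiable pairwise surrogate.
# Data, prevalence, and the score shift are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
scores = rng.normal(size=1000)
labels = (rng.random(1000) < 0.05).astype(int)   # 5% positives: imbalanced
scores[labels == 1] += 1.0                        # positives score higher on average

pos, neg = scores[labels == 1], scores[labels == 0]
diff = pos[:, None] - neg[None, :]                # all positive-negative score gaps

auc = (diff > 0).mean()                           # empirical ROC-AUC (ties ignored)
surrogate = np.logaddexp(0.0, -diff).mean()       # pairwise logistic (softplus) loss

print(f"AUC={auc:.3f}, pairwise surrogate loss={surrogate:.3f}")
```

Because the surrogate is a smooth function of the score differences, it can be minimized by gradient descent as a training loss, whereas the empirical AUC itself is a step function and cannot.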