A Noisy-Label-Learning Formulation for Immune Repertoire Classification and Disease-Associated Immune Receptor Sequence Identification
- URL: http://arxiv.org/abs/2307.15934v1
- Date: Sat, 29 Jul 2023 09:19:27 GMT
- Title: A Noisy-Label-Learning Formulation for Immune Repertoire Classification and Disease-Associated Immune Receptor Sequence Identification
- Authors: Mingcai Chen, Yu Zhao, Zhonghuang Wang, Bing He and Jianhua Yao
- Abstract summary: Immune repertoire classification is a frontier research topic in computational biology.
We propose a noisy-label-learning formulation to solve the immune repertoire classification task.
Experiments on the Cytomegalovirus (CMV) and Cancer datasets demonstrate our method's effectiveness and superior performance.
- Score: 7.619591696318021
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Immune repertoire classification, a typical multiple instance learning (MIL)
problem, is a frontier research topic in computational biology that makes
transformative contributions to new vaccines and immune therapies. However, the
traditional instance-space MIL, which directly assigns bag-level labels to
instances, suffers from a massive amount of noisy labels and an extremely low
witness rate. In this work, we propose a noisy-label-learning formulation to
solve the immune repertoire classification task. To remedy the inaccurate
supervision of repertoire-level labels for a sequence-level classifier, we
design a robust training strategy: The initial labels are smoothed to be
asymmetric and are progressively corrected using the model's predictions
throughout the training process. Furthermore, two models with the same
architecture but different parameter initializations are co-trained
simultaneously to mitigate the well-known "confirmation bias" problem of the
self-training-like scheme. As a result, we obtain accurate sequence-level
classification and, subsequently, repertoire-level classification. Experiments
on the Cytomegalovirus (CMV) and Cancer datasets demonstrate our method's
effectiveness and superior performance on sequence-level and repertoire-level
tasks.
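
To make the training strategy described in the abstract concrete, below is a minimal sketch in PyTorch: repertoire-level (bag) labels are copied to sequences (instances) as noisy soft targets, smoothed asymmetrically (labels inherited from positive repertoires are trusted less than those from negative ones), progressively corrected with a peer model's predictions, and two identically structured models with different random initializations are co-trained. The encoder, smoothing constants, correction schedule, and top-k pooling are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the noisy-label-learning strategy from the abstract.
# All concrete choices below are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SequenceClassifier(nn.Module):
    """Toy sequence-level classifier over pre-extracted sequence embeddings."""

    def __init__(self, embed_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # logits, shape (num_sequences, 2)


def asymmetric_smooth(bag_label: int, n_seq: int, pos_eps: float = 0.4,
                      neg_eps: float = 0.0) -> torch.Tensor:
    """Copy the bag label to every sequence, smoothing positives more than
    negatives: positive repertoires contain mostly irrelevant sequences
    (low witness rate), so their inherited labels are far less trustworthy."""
    pos = (1.0 - pos_eps) if bag_label == 1 else neg_eps
    return torch.full((n_seq,), pos)  # soft target P(y = 1) per sequence


def correct_labels(soft_targets: torch.Tensor, peer_probs: torch.Tensor,
                   epoch: int, total_epochs: int) -> torch.Tensor:
    """Progressively replace the noisy soft targets with the peer model's
    predictions; the mixing weight grows linearly over training."""
    alpha = min(1.0, epoch / (0.5 * total_epochs))
    return (1 - alpha) * soft_targets + alpha * peer_probs


def train_cotraining(bags, total_epochs: int = 20, lr: float = 1e-3):
    """`bags` is a list of (embeddings, bag_label) pairs, where embeddings
    has shape (num_sequences, embed_dim)."""
    model_a, model_b = SequenceClassifier(), SequenceClassifier()
    opt_a = torch.optim.Adam(model_a.parameters(), lr=lr)
    opt_b = torch.optim.Adam(model_b.parameters(), lr=lr)

    for epoch in range(total_epochs):
        for emb, bag_label in bags:
            base = asymmetric_smooth(bag_label, emb.shape[0])
            # Each model's targets are corrected by the *other* model's
            # predictions, reducing confirmation bias in self-training.
            for model, opt, peer in ((model_a, opt_a, model_b),
                                     (model_b, opt_b, model_a)):
                with torch.no_grad():
                    peer_probs = F.softmax(peer(emb), dim=1)[:, 1]
                targets = correct_labels(base, peer_probs, epoch, total_epochs)
                probs = F.softmax(model(emb), dim=1)[:, 1]
                loss = F.binary_cross_entropy(probs, targets)
                opt.zero_grad()
                loss.backward()
                opt.step()
    return model_a, model_b


def repertoire_score(model: SequenceClassifier, emb: torch.Tensor,
                     k: int = 5) -> torch.Tensor:
    """Aggregate sequence-level probabilities into a repertoire-level score,
    here with a simple top-k mean (one of several plausible MIL poolings)."""
    with torch.no_grad():
        probs = F.softmax(model(emb), dim=1)[:, 1]
    return probs.topk(min(k, probs.numel())).values.mean()
```

At inference, the sequence-level probabilities serve as disease-association scores for individual receptors, and the pooled score gives the repertoire-level prediction, mirroring the two-level output described in the abstract.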
Related papers
- LayerMatch: Do Pseudo-labels Benefit All Layers? [77.59625180366115]
Semi-supervised learning offers a promising solution to mitigate the dependency on labeled data.
We develop two layer-specific pseudo-label strategies, termed Grad-ReLU and Avg-Clustering.
Our approach consistently demonstrates exceptional performance on standard semi-supervised learning benchmarks.
arXiv Detail & Related papers (2024-06-20T11:25:50Z)
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Semi-supervised Predictive Clustering Trees for (Hierarchical) Multi-label Classification [2.706328351174805]
We propose a hierarchical multi-label classification method based on semi-supervised learning of predictive clustering trees.
We also extend the method to ensemble learning with a variant based on the random forest approach.
arXiv Detail & Related papers (2022-07-19T12:49:00Z)
- Test-time Adaptation with Calibration of Medical Image Classification Nets for Label Distribution Shift [24.988087560120366]
We propose the first method to tackle label shift for medical image classification.
Our method effectively adapts a model learned from a single training label distribution to an arbitrary, unknown test label distribution.
We validate our method on two important medical image classification tasks including liver fibrosis staging and COVID-19 severity prediction.
arXiv Detail & Related papers (2022-07-02T07:55:23Z)
- ACPL: Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification [22.5935068122522]
We propose a new SSL algorithm, called anti-curriculum pseudo-labelling (ACPL).
ACPL introduces novel techniques to select informative unlabelled samples, improving training balance and allowing the model to work for both multi-label and multi-class problems.
Our method outperforms previous SOTA SSL methods on both datasets.
arXiv Detail & Related papers (2021-11-25T05:31:52Z)
- Rethinking Pseudo Labels for Semi-Supervised Object Detection [84.697097472401]
We introduce certainty-aware pseudo labels tailored for object detection.
We dynamically adjust the thresholds used to generate pseudo labels and reweight the loss functions for each category to alleviate the class imbalance problem (a generic sketch of this thresholding idea follows this list).
Our approach improves supervised baselines by up to 10% AP using only 1-10% labeled data from COCO.
arXiv Detail & Related papers (2021-06-01T01:32:03Z)
- Binary Classification: Counterbalancing Class Imbalance by Applying Regression Models in Combination with One-Sided Label Shifts [0.4970364068620607]
We introduce a novel method that addresses the issue of class imbalance.
We generate a set of negative and positive target labels, such that the corresponding regression task becomes balanced.
We evaluate our approach on a number of publicly available data sets and compare our proposed method to one of the most popular oversampling techniques.
arXiv Detail & Related papers (2020-11-30T13:24:47Z)
- Unsupervised Person Re-identification via Multi-label Classification [55.65870468861157]
This paper formulates unsupervised person ReID as a multi-label classification task to progressively seek true labels.
Our method starts by assigning each person image with a single-class label, then evolves to multi-label classification by leveraging the updated ReID model for label prediction.
To boost the ReID model training efficiency in multi-label classification, we propose the memory-based multi-label classification loss (MMCL).
arXiv Detail & Related papers (2020-04-20T12:13:43Z)
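
The "Rethinking Pseudo Labels for Semi-Supervised Object Detection" entry above mentions dynamically adjusted, per-category pseudo-label thresholds with loss reweighting. The snippet below is a generic, hedged illustration of that idea for a plain classifier; the quantile rule, confidence floor, and inverse-frequency weights are assumptions, not that paper's detection pipeline.

```python
# Generic sketch of per-class dynamic thresholds for pseudo-labelling,
# plus inverse-frequency weights to counter class imbalance (illustrative
# assumptions only, not the cited paper's exact procedure).
import torch


def per_class_thresholds(confidences: torch.Tensor, labels: torch.Tensor,
                         num_classes: int, quantile: float = 0.8,
                         floor: float = 0.5) -> torch.Tensor:
    """One possible dynamic rule: a per-class confidence quantile,
    clamped from below so uncertain classes still get a sane cutoff."""
    thresholds = torch.full((num_classes,), floor)
    for c in range(num_classes):
        conf_c = confidences[labels == c]
        if conf_c.numel() > 0:
            thresholds[c] = torch.clamp(conf_c.quantile(quantile), min=floor)
    return thresholds


def select_pseudo_labels(probs: torch.Tensor, thresholds: torch.Tensor):
    """Keep unlabeled samples whose max class probability clears the dynamic
    threshold of the predicted class; return kept indices, pseudo labels,
    and per-sample inverse-frequency weights for the unsupervised loss."""
    confidences, labels = probs.max(dim=1)
    keep = confidences >= thresholds[labels]
    kept_labels = labels[keep]
    counts = torch.bincount(kept_labels, minlength=thresholds.numel()).clamp(min=1)
    weights = counts.sum() / (counts.numel() * counts.float())
    return keep.nonzero(as_tuple=True)[0], kept_labels, weights[kept_labels]
```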