Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection
- URL: http://arxiv.org/abs/2501.11063v1
- Date: Sun, 19 Jan 2025 14:41:55 GMT
- Title: Enhancing Sample Utilization in Noise-Robust Deep Metric Learning With Subgroup-Based Positive-Pair Selection
- Authors: Zhipeng Yu, Qianqian Xu, Yangbangyan Jiang, Yingfei Sun, Qingming Huang,
- Abstract summary: The existence of noisy labels in real-world data negatively impacts the performance of deep learning models.
We propose a noise-robust DML framework with SubGroup-based Positive-pair Selection (SGPS)
SGPS constructs reliable positive pairs for noisy samples to enhance the sample utilization.
- Score: 84.78475642696137
- License:
- Abstract: The existence of noisy labels in real-world data negatively impacts the performance of deep learning models. Although much research effort has been devoted to improving the robustness towards noisy labels in classification tasks, the problem of noisy labels in deep metric learning (DML) remains under-explored. Existing noisy label learning methods designed for DML mainly discard suspicious noisy samples, resulting in a waste of the training data. To address this issue, we propose a noise-robust DML framework with SubGroup-based Positive-pair Selection (SGPS), which constructs reliable positive pairs for noisy samples to enhance the sample utilization. Specifically, SGPS first effectively identifies clean and noisy samples by a probability-based clean sample selectionstrategy. To further utilize the remaining noisy samples, we discover their potential similar samples based on the subgroup information given by a subgroup generation module and then aggregate them into informative positive prototypes for each noisy sample via a positive prototype generation module. Afterward, a new contrastive loss is tailored for the noisy samples with their selected positive pairs. SGPS can be easily integrated into the training process of existing pair-wise DML tasks, like image retrieval and face recognition. Extensive experiments on multiple synthetic and real-world large-scale label noise datasets demonstrate the effectiveness of our proposed method. Without any bells and whistles, our SGPS framework outperforms the state-of-the-art noisy label DML methods. Code is available at \url{https://github.com/smuelpeng/SGPS-NoiseFreeDML}.
Related papers
- ANNE: Adaptive Nearest Neighbors and Eigenvector-based Sample Selection for Robust Learning with Noisy Labels [7.897299759691143]
This paper introduces the Adaptive Nearest Neighbors and Eigenvector-based (ANNE) sample selection methodology.
ANNE integrates loss-based sampling with the feature-based sampling methods FINE and Adaptive KNN to optimize performance across a wide range of noise rate scenarios.
arXiv Detail & Related papers (2024-11-03T15:51:38Z) - Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually-specific probability measure, we can reduce the side-effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z) - Regroup Median Loss for Combating Label Noise [19.51996047333779]
Deep model training requires large-scale datasets of annotated data.
Due to the difficulty of annotating a large number of samples, label noise caused by incorrect annotations is inevitable.
We propose Regroup Median Loss (RML) to reduce the probability of selecting noisy samples and correct losses of noisy samples.
arXiv Detail & Related papers (2023-12-11T10:19:55Z) - Combating Label Noise With A General Surrogate Model For Sample Selection [77.45468386115306]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z) - PASS: Peer-Agreement based Sample Selection for training with Noisy Labels [16.283722126438125]
The prevalence of noisy-label samples poses a significant challenge in deep learning, inducing overfitting effects.
Current methodologies often rely on the small-loss hypothesis or feature-based selection to separate noisy- and clean-label samples.
We propose a new noisy-label detection method, termed Peer-Agreement based Sample Selection (PASS), to address this problem.
arXiv Detail & Related papers (2023-03-20T00:35:33Z) - Learning from Training Dynamics: Identifying Mislabeled Data Beyond
Manually Designed Features [43.41573458276422]
We introduce a novel learning-based solution, leveraging a noise detector, instanced by an LSTM network.
The proposed method trains the noise detector in a supervised manner using the dataset with synthesized label noises.
Results show that the proposed method precisely detects mislabeled samples on various datasets without further adaptation.
arXiv Detail & Related papers (2022-12-19T09:39:30Z) - Neighborhood Collective Estimation for Noisy Label Identification and
Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
arXiv Detail & Related papers (2022-08-05T14:47:22Z) - Robust Meta-learning with Sampling Noise and Label Noise via
Eigen-Reptile [78.1212767880785]
meta-learner is prone to overfitting since there are only a few available samples.
When handling the data with noisy labels, the meta-learner could be extremely sensitive to label noise.
We present Eigen-Reptile (ER) that updates the meta- parameters with the main direction of historical task-specific parameters.
arXiv Detail & Related papers (2022-06-04T08:48:02Z) - Noise-resistant Deep Metric Learning with Ranking-based Instance
Selection [59.286567680389766]
We propose a noise-resistant training technique for DML, which we name Probabilistic Ranking-based Instance Selection with Memory (PRISM)
PRISM identifies noisy data in a minibatch using average similarity against image features extracted from several previous versions of the neural network.
To alleviate the high computational cost brought by the memory bank, we introduce an acceleration method that replaces individual data points with the class centers.
arXiv Detail & Related papers (2021-03-30T03:22:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.