NGC: A Unified Framework for Learning with Open-World Noisy Data
- URL: http://arxiv.org/abs/2108.11035v1
- Date: Wed, 25 Aug 2021 04:04:46 GMT
- Title: NGC: A Unified Framework for Learning with Open-World Noisy Data
- Authors: Zhi-Fan Wu, Tong Wei, Jianwen Jiang, Chaojie Mao, Mingqian Tang,
Yu-Feng Li
- Abstract summary: We propose a new graph-based framework, namely Noisy Graph Cleaning (NGC), which collects clean samples by leveraging the geometric structure of the data and the model's predictive confidence.
We conduct experiments on multiple benchmarks with different types of noise, and the results demonstrate the superior performance of our method against state-of-the-art approaches.
- Score: 36.96188289965334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Noisy data is prevalent in both the training and testing phases of
machine learning systems and inevitably degrades model performance. Over the
last decade, plenty of work has concentrated on learning with in-distribution
(IND) noisy labels, i.e., some training samples are assigned incorrect labels
that do not correspond to their true classes. Nonetheless, in real application
scenarios it is also necessary to consider the influence of out-of-distribution
(OOD) samples, i.e., samples that do not belong to any known class, which has
not been sufficiently explored yet. To remedy this, we study a new problem
setup, namely Learning with Open-world Noisy Data (LOND). The goal of LOND is
to simultaneously learn a classifier and an OOD detector from datasets with
mixed IND and OOD noise. In this paper, we propose a new graph-based framework,
namely Noisy Graph Cleaning (NGC), which collects clean samples by leveraging
the geometric structure of the data and the model's predictive confidence.
Without any additional training effort, NGC can detect and reject OOD samples
directly at test time based on the learned class prototypes. We conduct
experiments on multiple benchmarks with different types of noise, and the
results demonstrate the superior performance of our method against
state-of-the-art approaches.
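As a rough illustration of the test-time rejection described in the abstract, the sketch below scores a test embedding against class prototypes and rejects the input as OOD when its best similarity is low. The mean-embedding prototypes, the cosine-similarity score, and the threshold value are assumptions made here for illustration; the abstract does not specify how NGC constructs or compares its prototypes.

```python
import numpy as np

def build_prototypes(features, labels, num_classes):
    """Assumed prototype construction: the L2-normalized mean embedding of
    each known class (the abstract only states that prototypes are learned)."""
    protos = np.stack([features[labels == c].mean(axis=0)
                       for c in range(num_classes)])
    return protos / np.linalg.norm(protos, axis=1, keepdims=True)

def classify_or_reject(feature, prototypes, threshold=0.5):
    """Predict the class of the nearest prototype, or reject the sample as
    OOD when the best cosine similarity falls below a hypothetical threshold."""
    f = feature / np.linalg.norm(feature)
    sims = prototypes @ f                  # cosine similarity to each prototype
    best = int(np.argmax(sims))
    if sims[best] < threshold:
        return -1, float(sims[best])       # -1 marks an OOD rejection
    return best, float(sims[best])

# Toy usage: 3 known classes in a 4-dimensional embedding space.
rng = np.random.default_rng(0)
feats = rng.normal(size=(30, 4))
labels = rng.integers(0, 3, size=30)
protos = build_prototypes(feats, labels, num_classes=3)
print(classify_or_reject(rng.normal(size=4), protos))
```

Because the rejection rule only compares embeddings to prototypes, it adds no extra training step, which is consistent with the "without any additional training effort" claim above.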
Related papers
- An accurate detection is not all you need to combat label noise in web-noisy datasets [23.020126612431746]
We show that direct estimation of the separating hyperplane can indeed offer an accurate detection of OOD samples.
We propose a hybrid solution that alternates between noise detection using linear separation and a state-of-the-art (SOTA) small-loss approach.
arXiv Detail & Related papers (2024-07-08T00:21:42Z)
- A noisy elephant in the room: Is your out-of-distribution detector robust to label noise? [49.88894124047644]
We take a closer look at 20 state-of-the-art OOD detection methods.
We show that poor separation between incorrectly classified ID samples vs. OOD samples is an overlooked yet important limitation of existing methods.
arXiv Detail & Related papers (2024-04-02T09:40:22Z)
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Combating Label Noise With A General Surrogate Model For Sample Selection [84.61367781175984]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- EARLIN: Early Out-of-Distribution Detection for Resource-efficient Collaborative Inference [4.826988182025783]
Collaborative inference enables resource-constrained edge devices to make inferences by uploading inputs to a server.
While this setup works cost-effectively for successful inferences, it severely underperforms when the model faces input samples on which the model was not trained.
We propose a novel lightweight OOD detection approach that mines important features from the shallow layers of a pretrained CNN model.
arXiv Detail & Related papers (2021-06-25T18:43:23Z)
- On The Consistency Training for Open-Set Semi-Supervised Learning [44.046578996049654]
We study how OOD samples affect training in both low- and high-dimensional spaces.
Our method makes better use of OOD samples and achieves state-of-the-art results.
arXiv Detail & Related papers (2021-01-19T12:38:17Z)
- Multi-Task Curriculum Framework for Open-Set Semi-Supervised Learning [54.85397562961903]
Semi-supervised learning (SSL) has been proposed to leverage unlabeled data for training powerful models when only limited labeled data is available.
We address a more complex novel scenario named open-set SSL, where out-of-distribution (OOD) samples are contained in unlabeled data.
Our method achieves state-of-the-art results by successfully eliminating the effect of OOD samples.
arXiv Detail & Related papers (2020-07-22T10:33:55Z)
- Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)