Addressing out-of-distribution label noise in webly-labelled data
- URL: http://arxiv.org/abs/2110.13699v1
- Date: Tue, 26 Oct 2021 13:38:50 GMT
- Title: Addressing out-of-distribution label noise in webly-labelled data
- Authors: Paul Albert, Diego Ortego, Eric Arazo, Noel O'Connor, Kevin McGuinness
- Abstract summary: Data gathering and annotation using a search engine is a simple alternative to generating a fully human-annotated dataset.
Although web crawling is very time efficient, some of the retrieved images are unavoidably noisy.
Designing robust algorithms for training on noisy data gathered from the web is an important research perspective.
- Score: 8.625286650577134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A recurring focus of the deep learning community is reducing the
labeling effort. Data gathering and annotation using a search engine is a
simple alternative to generating a fully human-annotated and human-gathered
dataset. Although web crawling is very time efficient, some of the retrieved
images are unavoidably noisy, i.e. incorrectly labeled. Designing robust
algorithms for training on noisy data gathered from the web is an important
research perspective that would render the building of datasets easier. In this
paper we conduct a study to understand the type of label noise to expect when
building a dataset using a search engine. We review the current limitations of
state-of-the-art methods for dealing with noisy labels for image classification
tasks in the case of web noise distribution. We propose a simple solution to
bridge the gap with a fully clean dataset using Dynamic Softening of
Out-of-distribution Samples (DSOS), which we design on corrupted versions of
the CIFAR-100 dataset, and compare against state-of-the-art algorithms on the
web noise perturbed MiniImageNet and Stanford datasets and on real label
noise datasets: WebVision 1.0 and Clothing1M. Our work is fully reproducible:
https://git.io/JKGcj
Related papers
- AlleNoise: large-scale text classification benchmark dataset with real-world label noise [40.11095094521714]
We present AlleNoise, a new curated text classification benchmark dataset with real-world instance-dependent label noise.
The noise distribution comes from actual users of a major e-commerce marketplace, so it realistically reflects the semantics of human mistakes.
We demonstrate that a representative selection of established methods for learning with noisy labels is inadequate to handle such real-world noise.
arXiv Detail & Related papers (2024-06-24T09:29:14Z)
- Learning Confident Classifiers in the Presence of Label Noise [5.829762367794509]
This paper proposes a probabilistic model for noisy observations that allows us to build confident classification and segmentation models.
Our experiments show that our algorithm outperforms state-of-the-art solutions for the considered classification and segmentation problems.
arXiv Detail & Related papers (2023-01-02T04:27:25Z)
- Embedding contrastive unsupervised features to cluster in- and out-of-distribution noise in corrupted image datasets [18.19216557948184]
Using search engines for web image retrieval is a tempting alternative to manual curation when creating an image dataset.
Their main drawback remains the proportion of incorrect (noisy) samples retrieved.
We propose a two-stage algorithm starting with a detection step where we use unsupervised contrastive feature learning.
We find that the alignment and uniformity principles of contrastive learning allow OOD samples to be linearly separated from ID samples on the unit hypersphere.
arXiv Detail & Related papers (2022-07-04T16:51:56Z)
- Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space.
We evaluate our method on datasets exhibiting both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) noise.
arXiv Detail & Related papers (2022-02-04T15:46:27Z)
- Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations [54.400167806154535]
Existing research on learning with noisy labels mainly focuses on synthetic label noise.
This work presents two new benchmark datasets (CIFAR-10N, CIFAR-100N).
We show that real-world noisy labels follow an instance-dependent pattern rather than the classically adopted class-dependent ones.
arXiv Detail & Related papers (2021-10-22T22:42:11Z)
- Learning to Aggregate and Refine Noisy Labels for Visual Sentiment Analysis [69.48582264712854]
We propose a robust learning method for visual sentiment analysis.
Our method relies on an external memory to aggregate and filter noisy labels during training.
We establish a benchmark for visual sentiment analysis with label noise using publicly available datasets.
arXiv Detail & Related papers (2021-09-15T18:18:28Z)
- Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets [90.61266099147053]
We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images.
We propose modifications and best practices aimed at minimizing human labeling effort.
Simulated experiments on a 125k-image subset of ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
arXiv Detail & Related papers (2021-04-26T16:29:32Z)
- Attention-Aware Noisy Label Learning for Image Classification [97.26664962498887]
Deep convolutional neural networks (CNNs) learned on large-scale labeled samples have achieved remarkable progress in computer vision.
The cheapest way to obtain a large body of labeled visual data is to crawl from websites with user-supplied labels, such as Flickr.
This paper proposes the attention-aware noisy label learning approach to improve the discriminative capability of the network trained on datasets with potential label noise.
arXiv Detail & Related papers (2020-09-30T15:45:36Z)
- Data-driven Meta-set Based Fine-Grained Visual Classification [61.083706396575295]
We propose a data-driven meta-set based approach to deal with noisy web images for fine-grained recognition.
Specifically, guided by a small clean meta-set, we train a selection net in a meta-learning manner to distinguish in- and out-of-distribution noisy images.
arXiv Detail & Related papers (2020-08-06T03:04:16Z)
- Audio Tagging by Cross Filtering Noisy Labels [26.14064793686316]
We present a novel framework, named CrossFilter, to combat the noisy labels problem for audio tagging.
Our method achieves state-of-the-art performance and even surpasses the ensemble models.
arXiv Detail & Related papers (2020-07-16T07:55:04Z)