Related papers: Addressing out-of-distribution label noise in webly-labelled data

URL: http://arxiv.org/abs/2110.13699v1
Date: Tue, 26 Oct 2021 13:38:50 GMT
Title: Addressing out-of-distribution label noise in webly-labelled data
Authors: Paul Albert and Diego Ortego and Eric Arazo and Noel O'Connor and Kevin McGuinness
Abstract summary: Data gathering and annotation using a search engine is a simple alternative to generating a fully human-annotated dataset. Although web crawling is very time efficient, some of the retrieved images are unavoidably noisy. Designing robust algorithms for training on noisy data gathered from the web is an important research perspective.
Score: 8.625286650577134
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A recurring focus of the deep learning community is reducing the labeling effort. Data gathering and annotation using a search engine is a simple alternative to generating a fully human-annotated and human-gathered dataset. Although web crawling is very time efficient, some of the retrieved images are unavoidably noisy, i.e. incorrectly labeled. Designing robust algorithms for training on noisy data gathered from the web is an important research perspective that would render the building of datasets easier. In this paper we conduct a study to understand the type of label noise to expect when building a dataset using a search engine. We review the current limitations of state-of-the-art methods for dealing with noisy labels for image classification tasks in the case of web noise distribution. We propose a simple solution to bridge the gap with a fully clean dataset using Dynamic Softening of Out-of-distribution Samples (DSOS), which we design on corrupted versions of the CIFAR-100 dataset, and compare against state-of-the-art algorithms on the web noise perturbed MiniImageNet and Stanford datasets and on real label noise datasets: WebVision 1.0 and Clothing1M. Our work is fully reproducible: https://git.io/JKGcj

Related papers

AlleNoise: large-scale text classification benchmark dataset with real-world label noise [40.11095094521714]
We present AlleNoise, a new curated text classification benchmark dataset with real-world instance-dependent label noise. The noise distribution comes from actual users of a major e-commerce marketplace, so it realistically reflects the semantics of human mistakes. We demonstrate that a representative selection of established methods for learning with noisy labels is inadequate to handle such real-world noise.
arXiv Detail & Related papers (2024-06-24T09:29:14Z)
Learning Confident Classifiers in the Presence of Label Noise [5.829762367794509]
This paper proposes a probabilistic model for noisy observations that allows us to build confident classification and segmentation models. Our experiments show that our algorithm outperforms state-of-the-art solutions for the considered classification and segmentation problems.
arXiv Detail & Related papers (2023-01-02T04:27:25Z)
Embedding contrastive unsupervised features to cluster in- and out-of-distribution noise in corrupted image datasets [18.19216557948184]
Using search engines for web image retrieval is a tempting alternative to manual curation when creating an image dataset. Their main drawback remains the proportion of incorrect (noisy) samples retrieved. We propose a two-stage algorithm starting with a detection step where we use unsupervised contrastive feature learning. We find that the alignment and uniformity principles of contrastive learning allow OOD samples to be linearly separated from ID samples on the unit hypersphere.
arXiv Detail & Related papers (2022-07-04T16:51:56Z)
Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space. We evaluate our method on datasets with both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) noise.
arXiv Detail & Related papers (2022-02-04T15:46:27Z)
Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations [54.400167806154535]
Existing research on learning with noisy labels mainly focuses on synthetic label noise. This work presents two new benchmark datasets (CIFAR-10N, CIFAR-100N). We show that real-world noisy labels follow an instance-dependent pattern rather than the classically adopted class-dependent ones.
arXiv Detail & Related papers (2021-10-22T22:42:11Z)
Learning to Aggregate and Refine Noisy Labels for Visual Sentiment Analysis [69.48582264712854]
We propose a robust learning method for visual sentiment analysis. Our method relies on an external memory to aggregate and filter noisy labels during training. We establish a benchmark for visual sentiment analysis with label noise using publicly available datasets.
arXiv Detail & Related papers (2021-09-15T18:18:28Z)
Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets [90.61266099147053]
We investigate efficient annotation strategies for collecting multi-class classification labels for a large collection of images. We propose modifications and best practices aimed at minimizing human labeling effort. Simulated experiments on a 125k image subset of the ImageNet100 show that it can be annotated to 80% top-1 accuracy with 0.35 annotations per image on average.
arXiv Detail & Related papers (2021-04-26T16:29:32Z)
Attention-Aware Noisy Label Learning for Image Classification [97.26664962498887]
Deep convolutional neural networks (CNNs) learned on large-scale labeled samples have achieved remarkable progress in computer vision. The cheapest way to obtain a large body of labeled visual data is to crawl from websites with user-supplied labels, such as Flickr. This paper proposes the attention-aware noisy label learning approach to improve the discriminative capability of the network trained on datasets with potential label noise.
arXiv Detail & Related papers (2020-09-30T15:45:36Z)
Data-driven Meta-set Based Fine-Grained Visual Classification [61.083706396575295]
We propose a data-driven meta-set based approach to deal with noisy web images for fine-grained recognition. Specifically, guided by a small clean meta-set, we train a selection net in a meta-learning manner to distinguish in- and out-of-distribution noisy images.
arXiv Detail & Related papers (2020-08-06T03:04:16Z)
Audio Tagging by Cross Filtering Noisy Labels [26.14064793686316]
We present a novel framework, named CrossFilter, to combat the noisy labels problem for audio tagging. Our method achieves state-of-the-art performance and even surpasses the ensemble models.
arXiv Detail & Related papers (2020-07-16T07:55:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.