Embedding contrastive unsupervised features to cluster in- and
out-of-distribution noise in corrupted image datasets
- URL: http://arxiv.org/abs/2207.01573v1
- Date: Mon, 4 Jul 2022 16:51:56 GMT
- Title: Embedding contrastive unsupervised features to cluster in- and
out-of-distribution noise in corrupted image datasets
- Authors: Paul Albert, Eric Arazo, Noel E. O'Connor and Kevin McGuinness
- Abstract summary: Using search engines for web image retrieval is a tempting alternative to manual curation when creating an image dataset.
Their main drawback remains the proportion of incorrect (noisy) samples retrieved.
We propose a two-stage algorithm starting with a detection step where we use unsupervised contrastive feature learning.
We find that the alignment and uniformity principles of contrastive learning allow OOD samples to be linearly separated from ID samples on the unit hypersphere.
- Score: 18.19216557948184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using search engines for web image retrieval is a tempting alternative to
manual curation when creating an image dataset, but their main drawback remains
the proportion of incorrect (noisy) samples retrieved. Previous works have shown
these noisy samples to be a mixture of in-distribution (ID)
samples, assigned to the incorrect category but presenting similar visual
semantics to other classes in the dataset, and out-of-distribution (OOD)
images, which share no semantic correlation with any category from the dataset.
The latter are, in practice, the dominant type of noisy images retrieved. To
tackle this noise duality, we propose a two-stage algorithm starting with a
detection step where we use unsupervised contrastive feature learning to
represent images in a feature space. We find that the alignment and uniformity
principles of contrastive learning allow OOD samples to be linearly separated
from ID samples on the unit hypersphere. We then spectrally embed the
unsupervised representations using a fixed neighborhood size and apply an
outlier-sensitive clustering at the class level to detect the clean and OOD
clusters as well as ID noisy outliers. We finally train a noise-robust neural
network that corrects ID noise to the correct category and utilizes OOD samples
in a guided contrastive objective, clustering them to improve low-level
features. Our algorithm improves the state-of-the-art results on synthetic
noise image datasets as well as real-world web-crawled data. Our work is fully
reproducible [github].
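Below is a minimal sketch of the detection stage described in the abstract, assuming unsupervised contrastive features have already been extracted for every image. The use of scikit-learn's SpectralEmbedding and OPTICS, the hyperparameters, and the "largest cluster is clean" heuristic are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.manifold import SpectralEmbedding

def detect_noise(features, labels, n_classes, n_neighbors=20):
    """Flag each sample as clean, OOD, or ID noise from its contrastive
    features (illustrative sketch, not the authors' exact pipeline)."""
    # Contrastive features live on the unit hypersphere after normalization.
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    status = np.empty(len(feats), dtype=object)
    for c in range(n_classes):
        idx = np.where(labels == c)[0]
        # Spectral embedding of the class features with a fixed neighborhood size.
        emb = SpectralEmbedding(n_components=2,
                                affinity="nearest_neighbors",
                                n_neighbors=n_neighbors).fit_transform(feats[idx])
        # Outlier-sensitive clustering: OPTICS labels isolated points as -1.
        assignments = OPTICS(min_samples=10).fit_predict(emb)
        if assignments.max() < 0:
            status[idx] = "id_noise"  # degenerate case: no cluster was formed
            continue
        sizes = np.bincount(assignments[assignments >= 0])
        clean_cluster = int(np.argmax(sizes))  # assumption: largest cluster is clean
        for i, k in zip(idx, assignments):
            if k == clean_cluster:
                status[i] = "clean"
            elif k >= 0:
                status[i] = "ood"       # secondary clusters treated as OOD noise
            else:
                status[i] = "id_noise"  # isolated outliers treated as ID noise
    return status
```

In the full method, samples flagged as OOD would feed the guided contrastive objective of the second stage, while ID outliers are relabelled before training the noise-robust network.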
Related papers
- An accurate detection is not all you need to combat label noise in web-noisy datasets [23.020126612431746]
We show that direct estimation of the separating hyperplane can indeed offer an accurate detection of OOD samples.
We propose a hybrid solution that alternates between noise detection using linear separation and a state-of-the-art (SOTA) small-loss approach.
arXiv Detail & Related papers (2024-07-08T00:21:42Z)
- AdapNet: Adaptive Noise-Based Network for Low-Quality Image Retrieval [41.9012882549912]
We propose an Adaptive Noise-Based Network (AdapNet) to learn robust abstract representations.
AdapNet surpasses state-of-the-art methods on the Noise Revisited Oxford and Noise Revisited Paris benchmarks.
arXiv Detail & Related papers (2024-05-28T00:25:41Z)
- SeNM-VAE: Semi-Supervised Noise Modeling with Hierarchical Variational Autoencoder [13.453138169497903]
SeNM-VAE is a semi-supervised noise modeling method that leverages both paired and unpaired datasets to generate realistic degraded data.
We employ our method to generate paired training samples for real-world image denoising and super-resolution tasks.
Our approach surpasses other unpaired and paired noise modeling methods in the quality of the synthetic degraded images it generates.
arXiv Detail & Related papers (2024-03-26T09:03:40Z)
- Deep Semantic Statistics Matching (D2SM) Denoising Network [70.01091467628068]
We introduce the Deep Semantic Statistics Matching (D2SM) Denoising Network.
It exploits semantic features of pretrained classification networks, then implicitly matches the probabilistic distribution of clear images in the semantic feature space.
By learning to preserve the semantic distribution of denoised images, we empirically find our method significantly improves the denoising capabilities of networks.
arXiv Detail & Related papers (2022-07-19T14:35:42Z)
- Treatment Learning Causal Transformer for Noisy Image Classification [62.639851972495094]
In this work, we incorporate this binary information of "existence of noise" as treatment into image classification tasks to improve prediction accuracy.
Motivated by causal variational inference, we propose a transformer-based architecture that uses a latent generative model to estimate robust feature representations for noisy image classification.
We also create new noisy image datasets incorporating a wide range of noise factors for performance benchmarking.
arXiv Detail & Related papers (2022-03-29T13:07:53Z)
- Neighbor2Neighbor: Self-Supervised Denoising from Single Noisy Images [98.82804259905478]
We present Neighbor2Neighbor to train an effective image denoising model with only noisy images.
In detail, the input and the target used to train the network are images sub-sampled from the same noisy image (see the sketch after this list).
A denoising network is trained on the sub-sampled pairs generated in the first stage, with a proposed regularizer as an additional loss for better performance.
arXiv Detail & Related papers (2021-01-08T02:03:25Z)
- Attention-Aware Noisy Label Learning for Image Classification [97.26664962498887]
Deep convolutional neural networks (CNNs) learned on large-scale labeled samples have achieved remarkable progress in computer vision.
The cheapest way to obtain a large body of labeled visual data is to crawl from websites with user-supplied labels, such as Flickr.
This paper proposes the attention-aware noisy label learning approach to improve the discriminative capability of the network trained on datasets with potential label noise.
arXiv Detail & Related papers (2020-09-30T15:45:36Z)
- Data-driven Meta-set Based Fine-Grained Visual Classification [61.083706396575295]
We propose a data-driven meta-set based approach to deal with noisy web images for fine-grained recognition.
Specifically, guided by a small amount of clean meta-set, we train a selection net in a meta-learning manner to distinguish in- and out-of-distribution noisy images.
arXiv Detail & Related papers (2020-08-06T03:04:16Z)
- High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state-of-the-art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)
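As referenced in the Neighbor2Neighbor entry above, here is a minimal sketch of how an (input, target) pair can be sub-sampled from a single noisy image. The 2x2 cell size and the random pixel-pair sampler are assumptions for illustration, not the paper's exact sub-sampling scheme.

```python
import numpy as np

def neighbor_subsample(noisy, cell=2, rng=None):
    """Split one noisy image (H, W, C) into two sub-sampled images by picking
    two different pixels from every non-overlapping cell x cell block.
    The pair can serve as (input, target) for training a denoiser."""
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = noisy.shape
    h, w = h - h % cell, w - w % cell  # crop to a multiple of the cell size
    blocks = noisy[:h, :w].reshape(h // cell, cell, w // cell, cell, c)
    blocks = blocks.transpose(0, 2, 1, 3, 4).reshape(h // cell, w // cell, cell * cell, c)
    # Draw two distinct pixel positions per block so the pair shares content
    # but carries independent noise realisations.
    first = rng.integers(0, cell * cell, size=(h // cell, w // cell))
    shift = rng.integers(1, cell * cell, size=(h // cell, w // cell))
    second = (first + shift) % (cell * cell)
    rows = np.arange(h // cell)[:, None]
    cols = np.arange(w // cell)[None, :]
    return blocks[rows, cols, first], blocks[rows, cols, second]
```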