Re-assessing ImageNet: How aligned is its single-label assumption with its multi-label nature?
- URL: http://arxiv.org/abs/2412.18409v1
- Date: Tue, 24 Dec 2024 12:55:31 GMT
- Title: Re-assessing ImageNet: How aligned is its single-label assumption with its multi-label nature?
- Authors: Esla Timothy Anzaku, Seyed Amir Mousavi, Arnout Van Messem, Wesley De Neve
- Abstract summary: We analyze the effectiveness of pre-trained state-of-the-art deep neural network (DNN) models on ImageNet and one of its variants, ImageNetV2. Our findings show that these reported declines are largely attributable to a characteristic of the dataset that has not received sufficient attention. Our findings highlight the importance of considering the multi-label nature of the ImageNet dataset during benchmarking.
- Score: 1.4828022319975973
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: ImageNet, an influential dataset in computer vision, is traditionally evaluated using single-label classification, which assumes that an image can be adequately described by a single concept or label. However, this approach may not fully capture the complex semantics within the images available in ImageNet, potentially hindering the development of models that effectively learn these intricacies. This study critically examines the prevalent single-label benchmarking approach and advocates for a shift to multi-label benchmarking for ImageNet. This shift would enable a more comprehensive assessment of the capabilities of deep neural network (DNN) models. We analyze the effectiveness of pre-trained state-of-the-art DNNs on ImageNet and one of its variants, ImageNetV2. Studies in the literature have reported unexpected accuracy drops of 11% to 14% on ImageNetV2. Our findings show that these reported declines are largely attributable to a characteristic of the dataset that has not received sufficient attention -- the proportion of images with multiple labels. Taking this characteristic into account, the results of our experiments provide evidence that there is no substantial degradation in effectiveness on ImageNetV2. Furthermore, we acknowledge that ImageNet pre-trained models exhibit some capability at capturing the multi-label nature of the dataset even though they were trained under the single-label assumption. Consequently, we propose a new evaluation approach to augment existing approaches that assess this capability. Our findings highlight the importance of considering the multi-label nature of the ImageNet dataset during benchmarking. Failing to do so could lead to incorrect conclusions regarding the effectiveness of DNNs and divert research efforts from addressing other substantial challenges related to the reliability and robustness of these models.
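To make the contrast in the abstract concrete, the following minimal sketch compares conventional single-label top-1 accuracy with a multi-label-aware variant in which a prediction counts as correct if it matches any of the labels accepted for the image. This is only an illustration under stated assumptions: the function names and the toy data are hypothetical, and the code is not the evaluation approach proposed in the paper.

```python
from typing import Sequence, Set


def top1_single_label_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of images whose top-1 prediction equals the single assigned label."""
    if not preds or len(preds) != len(labels):
        raise ValueError("preds and labels must be non-empty and of equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(preds)


def top1_multi_label_accuracy(preds: Sequence[int], label_sets: Sequence[Set[int]]) -> float:
    """Fraction of images whose top-1 prediction is in the set of accepted labels."""
    if not preds or len(preds) != len(label_sets):
        raise ValueError("preds and label_sets must be non-empty and of equal length")
    return sum(p in s for p, s in zip(preds, label_sets)) / len(preds)


# Hypothetical toy data: class indices are arbitrary and not taken from the paper.
preds = [3, 7, 1, 4]
single_labels = [3, 2, 1, 9]
multi_labels = [{3}, {2, 7}, {1, 5}, {9}]

print(top1_single_label_accuracy(preds, single_labels))  # 0.5
print(top1_multi_label_accuracy(preds, multi_labels))    # 0.75
```

In this toy case the second image contains two valid classes, so the same predictions score lower under the single-label assumption than under the multi-label one. A dataset with a larger share of such images would show a larger gap for the same model.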
Related papers
- When VLMs Meet Image Classification: Test Sets Renovation via Missing Label Identification [11.49089004019603]
We present a comprehensive framework named REVEAL to address both noisy labels and missing labels in image classification test sets. REVEAL detects potential noisy labels and omissions, aggregates predictions from various methods, and refines label accuracy through confidence-informed predictions and consensus-based filtering. Our method effectively reveals missing labels from public datasets and provides soft-labeled results with likelihoods.
arXiv Detail & Related papers (2025-05-22T02:47:36Z) - Adaptive Hierarchical Graph Cut for Multi-granularity Out-of-distribution Detection [10.200872243175183]
This paper focuses on a significant yet challenging task: out-of-distribution (OOD) detection. Previous works have achieved decent success, but they are ineffective for challenging real-world applications. We propose a novel Adaptive Hierarchical Graph Cut network (AHGC) to explore the semantic relationship between different images.
arXiv Detail & Related papers (2024-12-20T08:32:02Z) - ImagiNet: A Multi-Content Benchmark for Synthetic Image Detection [0.0]
We introduce ImagiNet, a dataset of 200K examples spanning four categories: photos, paintings, faces, and miscellaneous.
Synthetic images in ImagiNet are produced with both open-source and proprietary generators, whereas real counterparts for each content type are collected from public datasets.
arXiv Detail & Related papers (2024-07-29T13:57:24Z) - Dynamic Correlation Learning and Regularization for Multi-Label Confidence Calibration [60.95748658638956]
This paper introduces the Multi-Label Confidence task, aiming to provide well-calibrated confidence scores in multi-label scenarios.
Existing single-label calibration methods fail to account for category correlations, which are crucial for addressing semantic confusion.
We propose the Dynamic Correlation Learning and Regularization algorithm, which leverages multi-grained semantic correlations to better model semantic confusion.
arXiv Detail & Related papers (2024-07-09T13:26:21Z) - ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object [78.58860252442045]
We introduce a generative model as a data source for hard images that benchmark deep models' robustness.
We are able to generate images with more diversified backgrounds, textures, and materials than any prior work; we term this benchmark ImageNet-D.
Our work suggests that diffusion models can be an effective source to test vision models.
arXiv Detail & Related papers (2024-03-27T17:23:39Z) - Intrinsic Self-Supervision for Data Quality Audits [35.69673085324971]
Benchmark datasets in computer vision often contain off-topic images, near duplicates, and label errors.
In this paper, we revisit the task of data cleaning and formalize it as either a ranking problem, or a scoring problem.
We find that a specific combination of context-aware self-supervised representation learning and distance-based indicators is effective in finding issues without annotation biases.
arXiv Detail & Related papers (2023-05-26T15:57:04Z) - Semi-Supervised Learning with Pseudo-Negative Labels for Image Classification [14.100569951592417]
We propose a mutual learning framework based on pseudo-negative labels.
By reducing the prediction probability on pseudo-negative labels, the dual model can improve its prediction ability.
Our framework achieves state-of-the-art results on several main benchmarks.
arXiv Detail & Related papers (2023-01-10T14:15:17Z) - Category-Adaptive Label Discovery and Noise Rejection for Multi-label Image Recognition with Partial Positive Labels [78.88007892742438]
Training multi-label models with partial positive labels (MLR-PPL) attracts increasing attention.
Previous works regard unknown labels as negative and adopt traditional MLR algorithms.
We propose to explore semantic correlation among different images to facilitate the MLR-PPL task.
arXiv Detail & Related papers (2022-11-15T02:11:20Z) - Robustifying Deep Vision Models Through Shape Sensitization [19.118696557797957]
We propose a simple, lightweight adversarial augmentation technique that explicitly incentivizes the network to learn holistic shapes.
Our augmentations superpose edgemaps from one image onto another image with shuffled patches, using a randomly determined mixing proportion.
We show that our augmentations significantly improve classification accuracy and robustness measures on a range of datasets and neural architectures.
arXiv Detail & Related papers (2022-11-14T11:17:46Z) - Dual-Perspective Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels [70.36722026729859]
We propose a dual-perspective semantic-aware representation blending (DSRB) that blends multi-granularity category-specific semantic representation across different images.
The proposed DSRB consistently outperforms current state-of-the-art algorithms on all proportion label settings.
arXiv Detail & Related papers (2022-05-26T00:33:44Z) - Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and Semi-Supervised Semantic Segmentation [119.009033745244]
This paper presents a Self-supervised Low-Rank Network (SLRNet) for single-stage weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS).
SLRNet uses cross-view self-supervision, that is, it simultaneously predicts several attentive LR representations from different views of an image to learn precise pseudo-labels.
Experiments on the Pascal VOC 2012, COCO, and L2ID datasets demonstrate that our SLRNet outperforms both state-of-the-art WSSS and SSSS methods with a variety of different settings.
arXiv Detail & Related papers (2022-03-19T09:19:55Z) - Image Quality Assessment using Contrastive Learning [50.265638572116984]
We train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve the auxiliary problem.
We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art NR image quality models.
Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets.
arXiv Detail & Related papers (2021-10-25T21:01:00Z) - Uncertainty-Aware Semi-Supervised Few Shot Segmentation [9.098329723771116]
Few shot segmentation (FSS) aims to learn pixel-level classification of a target object in a query image using only a few annotated support samples.
This is challenging as it requires modeling appearance variations of target objects and the diverse visual cues between query and support images with limited information.
We propose a semi-supervised FSS strategy that leverages additional prototypes from unlabeled images with uncertainty guided pseudo label refinement.
arXiv Detail & Related papers (2021-10-18T00:37:46Z) - To be Critical: Self-Calibrated Weakly Supervised Learning for Salient Object Detection [95.21700830273221]
Weakly-supervised salient object detection (WSOD) aims to develop saliency models using image-level annotations.
We propose a self-calibrated training strategy by explicitly establishing a mutual calibration loop between pseudo labels and network predictions.
We prove that even a much smaller dataset with well-matched annotations can facilitate models to achieve better performance as well as generalizability.
arXiv Detail & Related papers (2021-09-04T02:45:22Z) - A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning [111.05365744744437]
Unsupervised contrastive learning labels crops of the same image as positives, and other image crops as negatives.
In this work, we first prove that for contrastive learning, inaccurate label assignment heavily impairs its generalization for semantic instance discrimination.
Inspired by this theory, we propose a novel self-labeling refinement approach for contrastive learning.
arXiv Detail & Related papers (2021-06-28T14:24:52Z) - If your data distribution shifts, use self-learning [24.23584770840611]
Self-learning techniques like entropy minimization and pseudo-labeling are simple and effective at improving performance of a deployed computer vision model under systematic domain shifts.
We conduct a wide range of large-scale experiments and show consistent improvements irrespective of the model architecture.
arXiv Detail & Related papers (2021-04-27T01:02:15Z) - Automated Cleanup of the ImageNet Dataset by Model Consensus, Explainability and Confident Learning [0.0]
ImageNet was the backbone of various convolutional neural networks (CNNs) trained on ILSVRC12Net.
This paper describes automated applications based on model consensus, explainability and confident learning to correct labeling mistakes.
ImageNet-Clean improves model performance by 2-2.4% for SqueezeNet and EfficientNet-B0 models.
arXiv Detail & Related papers (2021-03-30T13:16:35Z) - W2WNet: a two-module probabilistic Convolutional Neural Network with embedded data cleansing functionality [2.695466667982714]
Wise2WipedNet (W2WNet) is a new two-module Convolutional Neural Network.
A Wise module exploits Bayesian inference to identify and discard spurious images during the training.
A Wiped module takes care of the final classification while broadcasting information on the prediction confidence at inference time.
arXiv Detail & Related papers (2021-03-24T11:28:59Z) - Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels [34.13899937264952]
ImageNet has been arguably the most popular image classification benchmark, but it is also the one with a significant level of label noise.
Recent studies have shown that many samples contain multiple classes, despite being assumed to be a single-label benchmark.
We argue that the mismatch between single-label annotations and effectively multi-label images is equally, if not more, problematic in the training setup, where random crops are applied.
arXiv Detail & Related papers (2021-01-13T11:55:58Z) - One-bit Supervision for Image Classification [121.87598671087494]
One-bit supervision is a novel setting of learning from incomplete annotations.
We propose a multi-stage training paradigm which incorporates negative label suppression into an off-the-shelf semi-supervised learning algorithm.
arXiv Detail & Related papers (2020-09-14T03:06:23Z) - Are we done with ImageNet? [86.01120671361844]
We develop a more robust procedure for collecting human annotations of the ImageNet validation set.
We reassess the accuracy of recently proposed ImageNet classifiers, and find their gains to be substantially smaller than those reported on the original labels.
We find the original ImageNet labels to no longer be the best predictors of this independently collected set, indicating that their usefulness in evaluating vision models may be nearing an end.
arXiv Detail & Related papers (2020-06-12T13:17:25Z) - From ImageNet to Image Classification: Contextualizing Progress on Benchmarks [99.19183528305598]
We study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset.
Our analysis pinpoints how a noisy data collection pipeline can lead to a systematic misalignment between the resulting benchmark and the real-world task it serves as a proxy for.
arXiv Detail & Related papers (2020-05-22T17:39:16Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)