The Impact of the Single-Label Assumption in Image Recognition Benchmarking
- URL: http://arxiv.org/abs/2412.18409v2
- Date: Wed, 28 May 2025 01:15:13 GMT
- Title: The Impact of the Single-Label Assumption in Image Recognition Benchmarking
- Authors: Esla Timothy Anzaku, Seyed Amir Mousavi, Arnout Van Messem, Wesley De Neve
- Abstract summary: Deep neural networks (DNNs) are typically evaluated under the assumption that each image has a single correct label. Many images in benchmarks like ImageNet contain multiple valid labels, creating a mismatch between evaluation protocols and the actual complexity of visual data. We rigorously assess the impact of multi-label characteristics on reported accuracy gaps.
- Score: 1.4828022319975973
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks (DNNs) are typically evaluated under the assumption that each image has a single correct label. However, many images in benchmarks like ImageNet contain multiple valid labels, creating a mismatch between evaluation protocols and the actual complexity of visual data. This mismatch can penalize DNNs for predicting correct but unannotated labels, which may partly explain reported accuracy drops, such as the widely cited 11 to 14 percent top-1 accuracy decline on ImageNetV2, a replication test set for ImageNet. This raises the question: do such drops reflect genuine generalization failures or artifacts of restrictive evaluation metrics? We rigorously assess the impact of multi-label characteristics on reported accuracy gaps. To evaluate the multi-label prediction capability (MLPC) of single-label-trained models, we introduce a variable top-$k$ evaluation, where $k$ matches the number of valid labels per image. Applied to 315 ImageNet-trained models, our analyses demonstrate that conventional top-1 accuracy disproportionately penalizes valid but secondary predictions. We also propose Aggregate Subgroup Model Accuracy (ASMA) to better capture multi-label performance across model subgroups. Our results reveal wide variability in MLPC, with some models consistently ranking multiple correct labels higher. Under this evaluation, the perceived gap between ImageNet and ImageNetV2 narrows substantially. To further isolate multi-label recognition performance from contextual cues, we introduce PatchML, a synthetic dataset containing systematically combined object patches. PatchML demonstrates that many models trained with single-label supervision nonetheless recognize multiple objects. Altogether, these findings highlight limitations in single-label evaluation and reveal that modern DNNs have stronger multi-label capabilities than standard metrics suggest.
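To make the variable top-$k$ evaluation concrete, the sketch below scores a batch of images by setting $k$ per image to the number of valid labels and checking whether the model's top-$k$ predictions recover all of them. This is a minimal illustration under assumed inputs (per-image label sets and raw model scores); the function name, the all-labels-recovered criterion, and the toy data are illustrative and do not reproduce the paper's exact protocol or its ASMA aggregation.

```python
import numpy as np

def variable_top_k_accuracy(logits, label_sets):
    """Illustrative variable top-k evaluation (not the paper's exact metric).

    logits:     (N, C) array of model scores over C classes.
    label_sets: list of N sets, each holding the valid class indices of an image.

    Per image, k equals the number of valid labels; the image counts as
    correct if all valid labels appear among the top-k predictions.
    """
    correct = 0
    for scores, labels in zip(logits, label_sets):
        k = len(labels)
        top_k = np.argsort(scores)[::-1][:k]                 # indices of the k highest scores
        correct += set(int(i) for i in top_k) >= labels      # all valid labels recovered?
    return correct / len(label_sets)

# Toy usage: 2 images, 5 classes; image 0 has two valid labels, image 1 has one.
logits = np.array([[0.1, 2.0, 1.5, -0.3, 0.0],
                   [0.2, 0.1, 3.0, 0.0, -1.0]])
label_sets = [{1, 2}, {2}]
print(variable_top_k_accuracy(logits, label_sets))           # 1.0 on this toy example
```

Under a per-image criterion like this, a model is no longer penalized for ranking a valid but unannotated label above the single annotated one, which is the failure mode the abstract identifies as a possible contributor to the reported ImageNet-to-ImageNetV2 gap.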
Related papers
- When VLMs Meet Image Classification: Test Sets Renovation via Missing Label Identification [11.49089004019603]
We present a comprehensive framework named REVEAL to address both noisy labels and missing labels in image classification test sets. REVEAL detects potential noisy labels and omissions, aggregates predictions from various methods, and refines label accuracy through confidence-informed predictions and consensus-based filtering. Our method effectively reveals missing labels from public datasets and provides soft-labeled results with likelihoods.
arXiv Detail & Related papers (2025-05-22T02:47:36Z) - Adaptive Hierarchical Graph Cut for Multi-granularity Out-of-distribution Detection [10.200872243175183]
This paper focuses on a significant yet challenging task: out-of-distribution (OOD) detection. Previous works have achieved decent success, but they are ineffective for challenging real-world applications. We propose a novel Adaptive Hierarchical Graph Cut network (AHGC) to explore the semantic relationship between different images.
arXiv Detail & Related papers (2024-12-20T08:32:02Z) - ImagiNet: A Multi-Content Benchmark for Synthetic Image Detection [0.0]
We introduce ImagiNet, a dataset of 200K examples spanning four categories: photos, paintings, faces, and miscellaneous.
Synthetic images in ImagiNet are produced with both open-source and proprietary generators, whereas real counterparts for each content type are collected from public datasets.
arXiv Detail & Related papers (2024-07-29T13:57:24Z) - Dynamic Correlation Learning and Regularization for Multi-Label Confidence Calibration [60.95748658638956]
This paper introduces the Multi-Label Confidence task, aiming to provide well-calibrated confidence scores in multi-label scenarios.
Existing single-label calibration methods fail to account for category correlations, which are crucial for addressing semantic confusion.
We propose the Dynamic Correlation Learning and Regularization algorithm, which leverages multi-grained semantic correlations to better model semantic confusion.
arXiv Detail & Related papers (2024-07-09T13:26:21Z) - ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object [78.58860252442045]
We introduce generative model as a data source for hard images that benchmark deep models' robustness.
We are able to generate images with more diversified backgrounds, textures, and materials than any prior work, where we term this benchmark as ImageNet-D.
Our work suggests that diffusion models can be an effective source to test vision models.
arXiv Detail & Related papers (2024-03-27T17:23:39Z) - Intrinsic Self-Supervision for Data Quality Audits [35.69673085324971]
Benchmark datasets in computer vision often contain off-topic images, near duplicates, and label errors.
In this paper, we revisit the task of data cleaning and formalize it as either a ranking problem or a scoring problem.
We find that a specific combination of context-aware self-supervised representation learning and distance-based indicators is effective in finding issues without annotation biases.
arXiv Detail & Related papers (2023-05-26T15:57:04Z) - Semi-Supervised Learning with Pseudo-Negative Labels for Image
Classification [14.100569951592417]
We propose a mutual learning framework based on pseudo-negative labels.
By reducing the prediction probability on pseudo-negative labels, the dual model can improve its prediction ability.
Our framework achieves state-of-the-art results on several main benchmarks.
arXiv Detail & Related papers (2023-01-10T14:15:17Z) - Category-Adaptive Label Discovery and Noise Rejection for Multi-label
Image Recognition with Partial Positive Labels [78.88007892742438]
Training multi-label recognition models with partial positive labels (MLR-PPL) has attracted increasing attention.
Previous works regard unknown labels as negative and adopt traditional MLR algorithms.
We propose to explore semantic correlation among different images to facilitate the MLR-PPL task.
arXiv Detail & Related papers (2022-11-15T02:11:20Z) - Robustifying Deep Vision Models Through Shape Sensitization [19.118696557797957]
We propose a simple, lightweight adversarial augmentation technique that explicitly incentivizes the network to learn holistic shapes.
Our augmentations superpose edgemaps from one image onto another image with shuffled patches, using a randomly determined mixing proportion.
We show that our augmentations significantly improve classification accuracy and robustness measures on a range of datasets and neural architectures.
arXiv Detail & Related papers (2022-11-14T11:17:46Z) - Dual-Perspective Semantic-Aware Representation Blending for Multi-Label
Image Recognition with Partial Labels [70.36722026729859]
We propose a dual-perspective semantic-aware representation blending (DSRB) that blends multi-granularity category-specific semantic representation across different images.
The proposed DSRB consistently outperforms current state-of-the-art algorithms across all label proportion settings.
arXiv Detail & Related papers (2022-05-26T00:33:44Z) - Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and
Semi-Supervised Semantic Segmentation [119.009033745244]
This paper presents a Self-supervised Low-Rank Network (SLRNet) for single-stage weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS).
SLRNet uses cross-view self-supervision, that is, it simultaneously predicts several attentive LR representations from different views of an image to learn precise pseudo-labels.
Experiments on the Pascal VOC 2012, COCO, and L2ID datasets demonstrate that our SLRNet outperforms both state-of-the-art WSSS and SSSS methods with a variety of different settings.
arXiv Detail & Related papers (2022-03-19T09:19:55Z) - Image Quality Assessment using Contrastive Learning [50.265638572116984]
We train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve an auxiliary problem.
We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art NR image quality models.
Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets.
arXiv Detail & Related papers (2021-10-25T21:01:00Z) - Uncertainty-Aware Semi-Supervised Few Shot Segmentation [9.098329723771116]
Few shot segmentation (FSS) aims to learn pixel-level classification of a target object in a query image using only a few annotated support samples.
This is challenging as it requires modeling appearance variations of target objects and the diverse visual cues between query and support images with limited information.
We propose a semi-supervised FSS strategy that leverages additional prototypes from unlabeled images with uncertainty guided pseudo label refinement.
arXiv Detail & Related papers (2021-10-18T00:37:46Z) - To be Critical: Self-Calibrated Weakly Supervised Learning for Salient
Object Detection [95.21700830273221]
Weakly-supervised salient object detection (WSOD) aims to develop saliency models using image-level annotations.
We propose a self-calibrated training strategy by explicitly establishing a mutual calibration loop between pseudo labels and network predictions.
We show that even a much smaller dataset with well-matched annotations can help models achieve better performance and generalizability.
arXiv Detail & Related papers (2021-09-04T02:45:22Z) - A Theory-Driven Self-Labeling Refinement Method for Contrastive
Representation Learning [111.05365744744437]
Unsupervised contrastive learning labels crops of the same image as positives, and other image crops as negatives.
In this work, we first prove that for contrastive learning, inaccurate label assignment heavily impairs its generalization for semantic instance discrimination.
Inspired by this theory, we propose a novel self-labeling refinement approach for contrastive learning.
arXiv Detail & Related papers (2021-06-28T14:24:52Z) - If your data distribution shifts, use self-learning [24.23584770840611]
Self-learning techniques like entropy minimization and pseudo-labeling are simple and effective at improving the performance of a deployed computer vision model under systematic domain shifts.
We conduct a wide range of large-scale experiments and show consistent improvements irrespective of the model architecture.
arXiv Detail & Related papers (2021-04-27T01:02:15Z) - Automated Cleanup of the ImageNet Dataset by Model Consensus,
Explainability and Confident Learning [0.0]
ImageNet has been the backbone of various convolutional neural networks (CNNs) trained on ILSVRC12.
This paper describes automated applications based on model consensus, explainability and confident learning to correct labeling mistakes.
The resulting ImageNet-Clean dataset improves model performance by 2-2.4% for SqueezeNet and EfficientNet-B0 models.
arXiv Detail & Related papers (2021-03-30T13:16:35Z) - W2WNet: a two-module probabilistic Convolutional Neural Network with
embedded data cleansing functionality [2.695466667982714]
Wise2WipedNet (W2WNet) is a new two-module Convolutional Neural Network.
A Wise module exploits Bayesian inference to identify and discard spurious images during the training.
A Wiped module takes care of the final classification while broadcasting information on the prediction confidence at inference time.
arXiv Detail & Related papers (2021-03-24T11:28:59Z) - Re-labeling ImageNet: from Single to Multi-Labels, from Global to
Localized Labels [34.13899937264952]
ImageNet has been arguably the most popular image classification benchmark, but it is also the one with a significant level of label noise.
Recent studies have shown that many samples contain multiple classes, despite being assumed to be a single-label benchmark.
We argue that the mismatch between single-label annotations and effectively multi-label images is equally, if not more, problematic in the training setup, where random crops are applied.
arXiv Detail & Related papers (2021-01-13T11:55:58Z) - One-bit Supervision for Image Classification [121.87598671087494]
One-bit supervision is a novel setting of learning from incomplete annotations.
We propose a multi-stage training paradigm which incorporates negative label suppression into an off-the-shelf semi-supervised learning algorithm.
arXiv Detail & Related papers (2020-09-14T03:06:23Z) - Are we done with ImageNet? [86.01120671361844]
We develop a more robust procedure for collecting human annotations of the ImageNet validation set.
We reassess the accuracy of recently proposed ImageNet classifiers, and find their gains to be substantially smaller than those reported on the original labels.
The original ImageNet labels are no longer the best predictors of this independently-collected set, indicating that their usefulness in evaluating vision models may be nearing an end.
arXiv Detail & Related papers (2020-06-12T13:17:25Z) - From ImageNet to Image Classification: Contextualizing Progress on
Benchmarks [99.19183528305598]
We study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset.
Our analysis pinpoints how a noisy data collection pipeline can lead to a systematic misalignment between the resulting benchmark and the real-world task it serves as a proxy for.
arXiv Detail & Related papers (2020-05-22T17:39:16Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.