SNAP: A Benchmark for Testing the Effects of Capture Conditions on Fundamental Vision Tasks
- URL: http://arxiv.org/abs/2505.15628v1
- Date: Wed, 21 May 2025 15:14:34 GMT
- Title: SNAP: A Benchmark for Testing the Effects of Capture Conditions on Fundamental Vision Tasks
- Authors: Iuliia Kotseruba, John K. Tsotsos
- Abstract summary: We analyze the impact of capture conditions, such as camera parameters and lighting, on deep-learning model performance on 3 vision tasks. We create a new benchmark, SNAP, consisting of images of objects taken under controlled lighting conditions and with densely sampled camera settings. Our results show that computer vision datasets are significantly biased, the models trained on this data do not reach human accuracy even on the well-exposed images, and are susceptible to both major exposure changes and minute variations of camera settings.
- Score: 12.246649738388388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization of deep-learning-based (DL) computer vision algorithms to various image perturbations is hard to establish and remains an active area of research. The majority of past analyses focused on the images already captured, whereas effects of the image formation pipeline and environment are less studied. In this paper, we address this issue by analyzing the impact of capture conditions, such as camera parameters and lighting, on DL model performance on 3 vision tasks -- image classification, object detection, and visual question answering (VQA). To this end, we assess capture bias in common vision datasets and create a new benchmark, SNAP (for $\textbf{S}$hutter speed, ISO se$\textbf{N}$sitivity, and $\textbf{AP}$erture), consisting of images of objects taken under controlled lighting conditions and with densely sampled camera settings. We then evaluate a large number of DL vision models and show the effects of capture conditions on each selected vision task. Lastly, we conduct an experiment to establish a human baseline for the VQA task. Our results show that computer vision datasets are significantly biased, the models trained on this data do not reach human accuracy even on the well-exposed images, and are susceptible to both major exposure changes and minute variations of camera settings. Code and data can be found at https://github.com/ykotseruba/SNAP
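Since SNAP's capture parameters are shutter speed, ISO sensitivity, and aperture, the standard photographic exposure relation (textbook arithmetic, not taken from the paper, whose exact parameter grid is not reproduced here) shows how the three settings trade off; a minimal sketch:

```python
import math

def ev100(aperture_f: float, shutter_s: float, iso: float) -> float:
    """ISO-100-referenced exposure value: EV100 = log2(N^2 / t) - log2(ISO / 100).

    For a fixed scene luminance, settings with equal EV100 give nominally the
    same image brightness; each +1 step halves the light recorded.
    """
    return math.log2(aperture_f ** 2 / shutter_s) - math.log2(iso / 100.0)

# Example: f/4, 1/125 s, ISO 400 -> about 8.97 stops (ISO-100-referenced)
print(ev100(aperture_f=4.0, shutter_s=1 / 125, iso=400))
```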
Related papers
- Adaptive Camera Sensor for Vision Models [4.566795168995489]
Lens is a novel camera sensor control method that enhances model performance by capturing high-quality images from the model's perspective. At its core, Lens utilizes VisiT, a training-free, model-specific quality indicator that evaluates individual unlabeled samples at test time. To validate Lens, we introduce ImageNet-ES Diverse, a new benchmark dataset capturing natural perturbations from varying sensor and lighting conditions.
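The summary does not specify VisiT's scoring rule; as an illustration only, a training-free, model-specific quality indicator can be approximated by the model's own prediction confidence on each candidate capture, assuming a standard PyTorch classifier:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def confidence_score(model: torch.nn.Module, image: torch.Tensor) -> float:
    """Score an unlabeled capture by the classifier's prediction confidence
    (negative entropy); a stand-in for a quality indicator, not VisiT itself."""
    logits = model(image.unsqueeze(0))                      # image: (C, H, W)
    probs = F.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    return -entropy.item()

def pick_best_capture(model, candidate_images):
    """Keep the capture (e.g. taken with different sensor settings) that the
    model itself scores highest."""
    return max(candidate_images, key=lambda img: confidence_score(model, img))
```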
arXiv Detail & Related papers (2025-03-04T01:20:23Z)
- Explorations in Self-Supervised Learning: Dataset Composition Testing for Object Classification [0.0]
We investigate the impact of sampling and pretraining using datasets with different image characteristics on the performance of self-supervised learning (SSL) models for object classification. We find that depth pretrained models are more effective on low resolution images, while RGB pretrained models perform better on higher resolution images.
arXiv Detail & Related papers (2024-12-01T11:21:01Z)
- BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation [57.40024206484446]
We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models.
BVS supports a large number of adjustable parameters at the scene level.
We showcase three example application scenarios.
arXiv Detail & Related papers (2024-05-15T17:57:56Z)
- Visual Context-Aware Person Fall Detection [52.49277799455569]
We present a segmentation pipeline to semi-automatically separate individuals and objects in images.
Background objects such as beds, chairs, or wheelchairs can challenge fall detection systems, leading to false positive alarms.
We demonstrate that object-specific contextual transformations during training effectively mitigate this challenge.
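The paper's exact transformations are not given in the summary; the sketch below illustrates one plausible object-specific contextual transformation, randomly removing a segmented background object during training (the mask source and the mean-fill inpainting are assumptions):

```python
import numpy as np

def drop_background_object(image: np.ndarray, object_mask: np.ndarray,
                           p: float = 0.5, rng=None) -> np.ndarray:
    """With probability p, replace a segmented background object (e.g. a bed
    or wheelchair) with the image mean so the fall detector cannot rely on
    its presence. Illustrative only; not the paper's exact transformation."""
    rng = rng or np.random.default_rng()
    if rng.random() >= p:
        return image
    out = image.copy()                              # image: (H, W, 3)
    out[object_mask.astype(bool)] = image.mean(axis=(0, 1))
    return out
```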
arXiv Detail & Related papers (2024-04-11T19:06:36Z)
- Foveation in the Era of Deep Learning [6.602118206533142]
We introduce an end-to-end differentiable foveated active vision architecture that leverages a graph convolutional network to process foveated images.
Our model learns to iteratively attend to regions of the image relevant for classification.
We find that our model outperforms a state-of-the-art CNN and foveated vision architectures of comparable parameters and a given pixel or computation budget.
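The model itself is graph-convolutional and learned end to end; the toy function below only illustrates the underlying foveation idea (a high-resolution fovea around a fixation point plus a low-resolution periphery) and is not the paper's sampling scheme:

```python
import torch
import torch.nn.functional as F

def two_level_foveation(image: torch.Tensor, fix_y: int, fix_x: int,
                        fovea: int = 64, out: int = 64) -> torch.Tensor:
    """Toy foveation for a float (C, H, W) image: a full-resolution crop
    around the fixation point plus a downsampled view of the whole image,
    stacked along the channel axis. Conceptual sketch only."""
    c, h, w = image.shape
    y0 = max(0, min(h - fovea, fix_y - fovea // 2))
    x0 = max(0, min(w - fovea, fix_x - fovea // 2))
    fovea_patch = image[:, y0:y0 + fovea, x0:x0 + fovea]

    def resize(x: torch.Tensor) -> torch.Tensor:   # bilinear resize to (out, out)
        return F.interpolate(x.unsqueeze(0), size=(out, out),
                             mode="bilinear", align_corners=False)[0]

    return torch.cat([resize(fovea_patch), resize(image)], dim=0)  # (2C, out, out)
```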
arXiv Detail & Related papers (2023-12-03T16:48:09Z)
- An Ensemble Model for Distorted Images in Real Scenarios [0.0]
In this paper, we apply the object detector YOLOv7 to detect distorted images from the CDCOCO dataset.
Through carefully designed optimizations, our model achieves excellent performance on the CDCOCO test set.
Our denoising detection model can denoise and repair distorted images, making the model useful in a variety of real-world scenarios and environments.
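As a rough illustration of the "repair then detect" idea, with classical non-local-means denoising standing in for the learned denoiser and `detector` as a placeholder for a YOLOv7 inference wrapper (the paper's actual components and ensembling are not reproduced here):

```python
import cv2
import numpy as np

def denoise_then_detect(image_bgr: np.ndarray, detector):
    """Repair a distorted image, then run an object detector on the result."""
    restored = cv2.fastNlMeansDenoisingColored(image_bgr, None, 10, 10, 7, 21)
    return restored, detector(restored)
```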
arXiv Detail & Related papers (2023-09-26T15:12:55Z)
- Ambiguous Images With Human Judgments for Robust Visual Event Classification [34.62731821199598]
We create datasets of ambiguous images and use them to produce SQUID-E ("Squidy"), a collection of noisy images extracted from videos.
All images are annotated with ground truth values and a test set is annotated with human uncertainty judgments.
We use this dataset to characterize human uncertainty in vision tasks and evaluate existing visual event classification models.
arXiv Detail & Related papers (2022-10-06T17:52:20Z)
- Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
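A minimal zero-shot probe in this spirit scores an image against an antonym prompt pair with an off-the-shelf CLIP model; the prompt pair and checkpoint below are illustrative choices, not necessarily those used in the paper:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_quality_score(image: Image.Image) -> float:
    """Softmax weight on the positive prompt of an antonym pair, used as a
    zero-shot perceptual quality score."""
    prompts = ["Good photo.", "Bad photo."]
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    logits_per_image = model(**inputs).logits_per_image     # shape (1, 2)
    return logits_per_image.softmax(dim=-1)[0, 0].item()
```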
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
- Validation of object detection in UAV-based images using synthetic data [9.189702268557483]
Machine learning (ML) models for UAV-based detection are often validated using data curated for tasks unrelated to the UAV application.
Errors arise because imaging conditions differ between images captured from UAVs and the images used for training.
Our work is focused on understanding the impact of different UAV-based imaging conditions on detection performance by using synthetic data generated using a game engine.
arXiv Detail & Related papers (2022-01-17T20:56:56Z)
- One-Shot Object Affordance Detection in the Wild [76.46484684007706]
Affordance detection refers to identifying the potential actions afforded by objects in an image.
We devise a One-Shot Affordance Detection Network (OSAD-Net) that estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images.
With complex scenes and rich annotations, our PADv2 dataset can be used as a test bed to benchmark affordance detection methods.
arXiv Detail & Related papers (2021-08-08T14:53:10Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Contemplating real-world object classification [53.10151901863263]
We reanalyze the ObjectNet dataset recently proposed by Barbu et al. containing objects in daily life situations.
We find that applying deep models to the isolated objects, rather than the entire scene as is done in the original paper, results in around 20-30% performance improvement.
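The change amounts to classifying a crop around the annotated object instead of the full scene; a minimal sketch with a standard torchvision classifier (the bounding box source is a placeholder, the paper obtains object locations from annotations):

```python
import torch
from PIL import Image
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
classifier = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()

@torch.no_grad()
def classify_isolated_object(image: Image.Image, box: tuple) -> int:
    """Classify only the region inside box = (left, top, right, bottom)."""
    crop = image.crop(box)
    logits = classifier(preprocess(crop).unsqueeze(0))
    return int(logits.argmax(dim=-1))
```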
arXiv Detail & Related papers (2021-03-08T23:29:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.