Ambiguous Images With Human Judgments for Robust Visual Event Classification
- URL: http://arxiv.org/abs/2210.03102v1
- Date: Thu, 6 Oct 2022 17:52:20 GMT
- Title: Ambiguous Images With Human Judgments for Robust Visual Event Classification
- Authors: Kate Sanders, Reno Kriz, Anqi Liu, Benjamin Van Durme
- Abstract summary: We introduce a procedure for creating datasets of ambiguous images and use it to produce SQUID-E ("Squidy"), a collection of noisy images extracted from videos.
All images are annotated with ground truth values and a test set is annotated with human uncertainty judgments.
We use this dataset to characterize human uncertainty in vision tasks and evaluate existing visual event classification models.
- Score: 34.62731821199598
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contemporary vision benchmarks predominantly consider tasks on which humans
can achieve near-perfect performance. However, humans are frequently presented
with visual data that they cannot classify with 100% certainty, and models
trained on standard vision benchmarks achieve low performance when evaluated on
this data. To address this issue, we introduce a procedure for creating
datasets of ambiguous images and use it to produce SQUID-E ("Squidy"), a
collection of noisy images extracted from videos. All images are annotated with
ground truth values and a test set is annotated with human uncertainty
judgments. We use this dataset to characterize human uncertainty in vision
tasks and evaluate existing visual event classification models. Experimental
results suggest that existing vision models are not sufficiently equipped to
provide meaningful outputs for ambiguous images and that datasets of this
nature can be used to assess and improve such models through model training and
direct evaluation of model calibration. These findings motivate large-scale
ambiguous dataset creation and further research focusing on noisy visual data.
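The abstract describes comparing model outputs against human uncertainty judgments and directly evaluating model calibration. The sketch below shows one way such an evaluation could look. It assumes, hypothetically, that each test image carries per-class annotator votes and that the model outputs one probability per event class; the paper's actual annotation format and metrics are not stated here. It uses three common choices, not necessarily the authors' own: Shannon entropy of the human distribution as a disagreement measure, KL divergence from the human to the model distribution, and standard binned expected calibration error (ECE) against ground-truth labels. All function names and example data are hypothetical.

```python
import math
from typing import List, Sequence


def to_distribution(votes: Sequence[float]) -> List[float]:
    """Normalize hypothetical per-class annotator votes into a probability distribution."""
    total = float(sum(votes))
    if total <= 0:
        raise ValueError("votes must sum to a positive value")
    return [v / total for v in votes]


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; higher values mean annotators disagreed more."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q): how far the model distribution q is from the human distribution p."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def expected_calibration_error(probs: List[List[float]], labels: List[int], n_bins: int = 10) -> float:
    """Binned ECE: weighted gap between top-1 confidence and accuracy per confidence bin."""
    bins: List[list] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)
        # Clamp so that a confidence of exactly 1.0 falls into the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b].append((conf, pred == y))
    n = len(probs)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        acc = sum(1 for _, ok in items if ok) / len(items)
        ece += len(items) / n * abs(avg_conf - acc)
    return ece


# Usage with made-up data: 2 images, 3 event classes.
human_votes = [[8, 2, 0], [4, 3, 3]]  # the second image is genuinely ambiguous
model_probs = [[0.9, 0.05, 0.05], [0.98, 0.01, 0.01]]  # the model is confident on both
labels = [0, 1]  # hypothetical ground-truth classes
for votes, q in zip(human_votes, model_probs):
    h = to_distribution(votes)
    print(f"human entropy={entropy(h):.3f}  KL(human||model)={kl_divergence(h, q):.3f}")
print(f"ECE={expected_calibration_error(model_probs, labels):.3f}")
```

On an ambiguous image, a well-calibrated model should produce a flatter distribution, so a large KL divergence where human entropy is high signals the overconfidence the abstract describes.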
Related papers
- LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content [62.816876067499415]
We propose LiveXiv: a scalable evolving live benchmark based on scientific ArXiv papers.
LiveXiv accesses domain-specific manuscripts at any given timestamp and proposes to automatically generate visual question-answer pairs.
We benchmark multiple open and proprietary Large Multi-modal Models (LMMs) on the first version of our benchmark, showing its challenging nature and exposing the models' true abilities.
Date: 2024-10-14T17:51:23Z
- Evaluating Multiview Object Consistency in Humans and Image Models [68.36073530804296]
We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape.
We collect 35K trials of behavioral data from over 500 participants.
We then evaluate the performance of common vision models.
Date: 2024-09-09T17:59:13Z
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
Date: 2024-06-19T08:07:14Z
- FLORIDA: Fake-looking Real Images Dataset [43.37813040320147]
We curated a dataset of 510 genuine images that exhibit a fake appearance and conducted an assessment using two AI models.
We show that both models performed poorly on our dataset.
Our dataset can serve as a valuable tool for assessing the ability of deep learning models to comprehend complex visual stimuli.
Date: 2023-10-29T23:25:10Z
- Revealing the Underlying Patterns: Investigating Dataset Similarity, Performance, and Generalization [0.0]
Supervised deep learning models require a significant amount of labeled data to achieve acceptable performance on a specific task.
We establish image-image, dataset-dataset, and image-dataset distances to gain insights into the model's behavior.
Date: 2023-08-07T13:35:53Z
- Evaluating Data Attribution for Text-to-Image Models [62.844382063780365]
We evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style.
Our key insight is that this allows us to efficiently create synthetic images that are computationally influenced by the exemplar by construction.
By taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.
Date: 2023-06-15T17:59:51Z
- DASH: Visual Analytics for Debiasing Image Classification via User-Driven Synthetic Data Augmentation [27.780618650580923]
Image classification models often learn to predict a class based on irrelevant co-occurrences between input features and an output class in training data.
We call the unwanted correlations "data biases," and the visual features causing data biases "bias factors."
It is challenging to identify and mitigate biases automatically without human intervention.
Date: 2022-09-14T00:44:41Z
- Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision [38.22842778742829]
Discriminative self-supervised learning allows training models on any random group of internet images.
We train models on billions of random images without any data pre-processing or prior assumptions about what we want the model to learn.
We extensively study and validate our model's performance on over 50 benchmarks, including fairness, robustness to distribution shift, geographical diversity, fine-grained recognition, image copy detection, and many image classification datasets.
Date: 2022-02-16T22:26:47Z
- Benchmarking human visual search computational models in natural scenes: models comparison and reference datasets [0.0]
We select publicly available state-of-the-art visual search models in natural scenes and evaluate them on different datasets.
We propose an improvement to the Ideal Bayesian Searcher through a combination with a neural network-based visual search model.
Date: 2021-12-10T19:56:45Z
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
Date: 2021-05-07T03:49:26Z
- Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
Date: 2021-03-29T06:35:24Z