Mind the Uncertainty in Human Disagreement: Evaluating Discrepancies between Model Predictions and Human Responses in VQA
- URL: http://arxiv.org/abs/2410.02773v1
- Date: Tue, 17 Sep 2024 13:44:25 GMT
- Title: Mind the Uncertainty in Human Disagreement: Evaluating Discrepancies between Model Predictions and Human Responses in VQA
- Authors: Jian Lan, Diego Frassinelli, Barbara Plank
- Abstract summary: This study focuses on the Visual Question Answering (VQA) task and evaluates how well vision-language models correlate with the distribution of human responses.
- Score: 26.968874222330978
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large vision-language models frequently struggle to accurately predict responses provided by multiple human annotators, particularly when those responses exhibit human uncertainty. In this study, we focus on the Visual Question Answering (VQA) task and comprehensively evaluate how well state-of-the-art vision-language models correlate with the distribution of human responses. To do so, we categorize our samples by their level (low, medium, high) of human uncertainty in disagreement (HUD) and employ not only accuracy but also three new human-correlated metrics in VQA to investigate the impact of HUD. To better align models with humans, we also examine the effect of common calibration and human calibration. Our results show that even BEiT3, currently the best model for this task, struggles to capture the multi-label distribution inherent in diverse human responses. Additionally, we observe that the commonly used accuracy-oriented calibration technique adversely affects BEiT3's ability to capture HUD, further widening the gap between model predictions and human distributions. In contrast, we show the benefits of calibrating models towards human distributions for VQA, better aligning model confidence with human uncertainty. Our findings highlight that, for VQA, consistent alignment between human responses and model predictions is understudied and should become the next crucial target of future studies.
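The abstract does not define the three human-correlated metrics, but the core comparison it describes, quantifying human uncertainty in disagreement (HUD) and measuring how far a model's predicted answer distribution sits from the human answer distribution, can be illustrated with a minimal sketch. The Python below is an assumption-laden illustration, not the authors' implementation: it uses normalized entropy over annotator answers as a stand-in for HUD, arbitrary thresholds for the low/medium/high buckets, and total variation distance as one possible distribution-alignment measure.

```python
from collections import Counter
import math

def human_distribution(answers):
    """Empirical answer distribution from multiple annotators."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}

def normalized_entropy(dist):
    """Entropy of the answer distribution, scaled to [0, 1]."""
    if len(dist) <= 1:
        return 0.0
    h = -sum(p * math.log(p) for p in dist.values() if p > 0)
    return h / math.log(len(dist))

def hud_level(dist, low=0.3, high=0.7):
    """Bucket a question into low/medium/high disagreement (illustrative thresholds, not the paper's)."""
    h = normalized_entropy(dist)
    return "low" if h < low else ("medium" if h < high else "high")

def total_variation(p, q):
    """Total variation distance between model and human answer distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)

# Toy example: ten annotators answer the same visual question.
annotators = ["red", "red", "red", "maroon", "red", "dark red",
              "red", "red", "maroon", "red"]
human = human_distribution(annotators)  # {"red": 0.7, "maroon": 0.2, "dark red": 0.1}

# Hypothetical model output, restricted to the answers seen above.
model = {"red": 0.95, "maroon": 0.04, "dark red": 0.01}

print(hud_level(human))                          # "high" for this answer set (entropy ~ 0.73)
print(round(total_variation(model, human), 3))   # 0.25: the model is far more confident than the annotators
```

Calibrating the model towards the human distribution, for instance by fitting a temperature against soft labels such as `human` above instead of the single majority answer, would be the natural next step; its exact form in the paper is not specified in this abstract.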
Related papers
- How Aligned are Generative Models to Humans in High-Stakes Decision-Making? [10.225573060836478]
Large generative models (LMs) are increasingly being considered for high-stakes decision-making.
This work considers how such models compare to humans and predictive AI models on a specific case of recidivism prediction.
arXiv Detail & Related papers (2024-10-20T19:00:59Z)
- Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics [54.08757792080732]
We propose integrating deep features from pre-trained visual models with a statistical analysis model to achieve opinion-unaware BIQA (OU-BIQA).
Our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models.
arXiv Detail & Related papers (2024-05-29T06:09:34Z)
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
- Using LLMs to Model the Beliefs and Preferences of Targeted Populations [4.0849074543032105]
We consider the problem of aligning a large language model (LLM) to model the preferences of a human population.
Modeling the beliefs, preferences, and behaviors of a specific population can be useful for a variety of different applications.
arXiv Detail & Related papers (2024-03-29T15:58:46Z)
- It HAS to be Subjective: Human Annotator Simulation via Zero-shot Density Estimation [15.8765167340819]
Human annotator simulation (HAS) serves as a cost-effective substitute for human evaluation such as data annotation and system assessment.
Human perception and behaviour during human evaluation exhibit inherent variability due to diverse cognitive processes and subjective interpretations.
This paper introduces a novel meta-learning framework that treats HAS as a zero-shot density estimation problem.
arXiv Detail & Related papers (2023-09-30T20:54:59Z)
- Toward Reliable Human Pose Forecasting with Uncertainty [51.628234388046195]
We develop an open-source library for human pose forecasting that includes multiple models and supports several datasets.
We devise two types of uncertainty in the problem to increase performance and convey better trust.
arXiv Detail & Related papers (2023-04-13T17:56:08Z)
- Investigations of Performance and Bias in Human-AI Teamwork in Hiring [30.046502708053097]
In AI-assisted decision-making, effective hybrid (human-AI) teamwork does not depend on AI performance alone.
We investigate how both a model's predictive performance and bias may transfer to humans in a recommendation-aided decision task.
arXiv Detail & Related papers (2022-02-21T17:58:07Z)
- Empirical Estimates on Hand Manipulation are Recoverable: A Step Towards Individualized and Explainable Robotic Support in Everyday Activities [80.37857025201036]
A key challenge for robotic systems is to figure out the behavior of another agent.
Drawing correct inferences is especially challenging when (confounding) factors are not controlled experimentally.
We propose equipping robots with the necessary tools to conduct observational studies on people.
arXiv Detail & Related papers (2022-01-27T22:15:56Z)
- Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z)
- Model-agnostic Fits for Understanding Information Seeking Patterns in Humans [0.0]
In decision making tasks under uncertainty, humans display characteristic biases in seeking, integrating, and acting upon information relevant to the task.
Here, we reexamine data from previous carefully designed experiments, collected at scale, that measured and catalogued these biases in aggregate form.
We design deep learning models that replicate these biases in aggregate, while also capturing individual variation in behavior.
arXiv Detail & Related papers (2020-12-09T04:34:58Z)
- Are Visual Explanations Useful? A Case Study in Model-in-the-Loop Prediction [49.254162397086006]
We study explanations based on visual saliency in an image-based age prediction task.
We find that presenting model predictions improves human accuracy.
However, explanations of various kinds fail to significantly alter human accuracy or trust in the model.
arXiv Detail & Related papers (2020-07-23T20:39:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.