RACQUET: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMs
- URL: http://arxiv.org/abs/2412.13835v1
- Date: Wed, 18 Dec 2024 13:25:11 GMT
- Title: RACQUET: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMs
- Authors: Alberto Testoni, Barbara Plank, Raquel Fernández
- Abstract summary: We introduce RACQUET, a dataset targeting distinct aspects of ambiguity in image-based question answering. We reveal significant limitations and overconfidence in how state-of-the-art large multimodal language models address ambiguity in their responses. Our results underscore the urgency of equipping models with robust strategies to deal with uncertainty without resorting to undesirable stereotypes.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ambiguity resolution is key to effective communication. While humans effortlessly address ambiguity through conversational grounding strategies, the extent to which current language models can emulate these strategies remains unclear. In this work, we examine referential ambiguity in image-based question answering by introducing RACQUET, a carefully curated dataset targeting distinct aspects of ambiguity. Through a series of evaluations, we reveal significant limitations and overconfidence in how state-of-the-art large multimodal language models address ambiguity in their responses. The overconfidence issue becomes particularly relevant for RACQUET-BIAS, a subset designed to analyze a critical yet underexplored problem: failing to address ambiguity leads to stereotypical, socially biased responses. Our results underscore the urgency of equipping models with robust strategies to deal with uncertainty without resorting to undesirable stereotypes.
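To make the failure mode concrete, the following is a minimal sketch of the kind of referential-ambiguity probe RACQUET targets; the `query_model` helper and the keyword heuristic are illustrative assumptions, not the authors' evaluation code.

```python
CLARIFICATION_MARKERS = (
    "which one", "do you mean", "there are two", "more than one",
    "could you clarify", "ambiguous",
)

def query_model(image_path: str, question: str) -> str:
    """Hypothetical stand-in for any multimodal LLM client."""
    return "The person is wearing a red jacket."  # canned overconfident reply

def acknowledges_ambiguity(answer: str) -> bool:
    """Crude surface heuristic; the paper's evaluation is more careful."""
    lowered = answer.lower()
    return any(marker in lowered for marker in CLARIFICATION_MARKERS)

# The image is assumed to show two people, so "the person" is ambiguous.
answer = query_model("two_people.jpg", "What is the person wearing?")
print("acknowledges ambiguity" if acknowledges_ambiguity(answer)
      else "commits to one referent (overconfident)")
```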
Related papers
- Adaptive Elicitation of Latent Information Using Natural Language [6.162198958758635]
We propose an adaptive elicitation framework that actively reduces uncertainty on the latent entity.
Our framework adopts a predictive view of uncertainty, using a meta-learned language model to simulate future observations.
In experiments on the 20 questions game, dynamic opinion polling, and adaptive student assessment, our method consistently outperforms baselines in identifying critical unknowns.
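The elicitation loop described above can be sketched generically as expected-information-gain question selection; the toy probabilities below stand in for the paper's meta-learned predictive model.

```python
import numpy as np

# p_answer[q, h]: probability that hypothesis h yields "yes" to question q.
rng = np.random.default_rng(0)
n_questions, n_hypotheses = 10, 8
p_answer = rng.uniform(0.05, 0.95, size=(n_questions, n_hypotheses))
posterior = np.full(n_hypotheses, 1.0 / n_hypotheses)  # uniform prior

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def expected_entropy(q):
    """Average posterior entropy after simulating both answers to question q."""
    p_yes = float(p_answer[q] @ posterior)
    post_yes = posterior * p_answer[q]
    post_no = posterior * (1.0 - p_answer[q])
    return (p_yes * entropy(post_yes / post_yes.sum())
            + (1.0 - p_yes) * entropy(post_no / post_no.sum()))

# Ask the question that most reduces expected uncertainty over hypotheses.
best_q = min(range(n_questions), key=expected_entropy)
print(f"ask question {best_q} (expected entropy {expected_entropy(best_q):.3f})")
```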
arXiv Detail & Related papers (2025-04-05T15:18:55Z) - Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1) [66.51642638034822]
Reasoning is central to human intelligence, enabling structured problem-solving across diverse tasks.
Recent advances in large language models (LLMs) have greatly enhanced their reasoning abilities in arithmetic, commonsense, and symbolic domains.
This paper offers a concise yet insightful overview of reasoning techniques in both textual and multimodal LLMs.
arXiv Detail & Related papers (2025-04-04T04:04:56Z) - Survey of Adversarial Robustness in Multimodal Large Language Models [17.926240920647892]
Multimodal Large Language Models (MLLMs) have demonstrated exceptional performance in artificial intelligence.
Their deployment in real-world applications raises significant concerns about adversarial vulnerabilities.
This paper reviews the adversarial robustness of MLLMs, covering different modalities.
arXiv Detail & Related papers (2025-03-18T06:54:59Z) - Interactive Agents to Overcome Ambiguity in Software Engineering [61.40183840499932]
AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions.
Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes.
We study the ability of LLM agents to handle ambiguous instructions in interactive code generation settings, evaluating both proprietary and open-weight models.
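As a toy illustration of the interaction pattern under study (not the paper's evaluation harness), an agent either asks a clarifying question or commits to one interpretation before generating code:

```python
def agent_step(instruction: str) -> str:
    """Toy policy: detect an underspecified request and ask before coding."""
    underspecified = "sort" in instruction and " by " not in instruction
    if underspecified:
        return "CLARIFY: sort by which field, and in which order?"
    return f"CODE: # implementation of: {instruction}"

print(agent_step("sort the records"))          # asks a clarifying question
print(agent_step("sort the records by date"))  # commits and generates code
```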
arXiv Detail & Related papers (2025-02-18T17:12:26Z) - Probing the Robustness of Vision-Language Pretrained Models: A Multimodal Adversarial Attack Approach [30.9778838504609]
Vision-language pretraining with transformers has demonstrated exceptional performance across numerous multimodal tasks.
Existing multimodal attack methods have largely overlooked cross-modal interactions between visual and textual modalities.
We propose a novel Joint Multimodal Transformer Feature Attack (JMTFA) that concurrently introduces adversarial perturbations in both visual and textual modalities.
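A hedged sketch of what jointly perturbing both modalities might look like, assuming a model that returns a fused feature vector; this is a generic PGD-style construction, not the authors' JMTFA implementation.

```python
import torch

def joint_attack(model, image, text_emb, steps=10, eps_img=8 / 255, eps_txt=0.05):
    """Perturb pixels and token embeddings together so the fused
    representation drifts away from the clean one."""
    delta_img = torch.zeros_like(image, requires_grad=True)
    delta_txt = torch.zeros_like(text_emb, requires_grad=True)
    with torch.no_grad():
        clean = model(image, text_emb).flatten()
    for _ in range(steps):
        feat = model(image + delta_img, text_emb + delta_txt).flatten()
        # Ascend on dissimilarity: minimize cosine similarity to clean features.
        loss = -torch.nn.functional.cosine_similarity(feat, clean, dim=0)
        loss.backward()
        with torch.no_grad():
            delta_img += (eps_img / steps) * delta_img.grad.sign()
            delta_txt += (eps_txt / steps) * delta_txt.grad.sign()
            delta_img.clamp_(-eps_img, eps_img)  # keep perturbations bounded
            delta_txt.clamp_(-eps_txt, eps_txt)
        delta_img.grad = None
        delta_txt.grad = None
    return (image + delta_img).detach(), (text_emb + delta_txt).detach()

# Tiny stand-in fusion model, for demonstration only.
class ToyFusion(torch.nn.Module):
    def forward(self, img, txt):
        return torch.cat([img.mean(dim=(-1, -2)).flatten(), txt.flatten()])

img, txt = torch.rand(1, 3, 32, 32), torch.randn(1, 4, 16)
adv_img, adv_txt = joint_attack(ToyFusion(), img, txt)
print("max pixel shift:", (adv_img - img).abs().max().item())
```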
arXiv Detail & Related papers (2024-08-24T04:31:37Z) - Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights [50.89022445197919]
We propose a speech-specific risk taxonomy covering 8 risk categories under hostility (malicious sarcasm and threats), malicious imitation (age, gender, ethnicity), and stereotypical biases (age, gender, ethnicity).
Based on the taxonomy, we create a small-scale dataset for evaluating current LMMs' capability in detecting these categories of risk.
arXiv Detail & Related papers (2024-06-25T10:08:45Z) - How to Understand "Support"? An Implicit-enhanced Causal Inference Approach for Weakly-supervised Phrase Grounding [18.97081348819219]
Weakly-supervised Phrase Grounding (WPG) is an emerging task of inferring fine-grained phrase-region matching.
This paper proposes an Implicit-Enhanced Causal Inference approach to address the challenges of modeling the implicit relations.
arXiv Detail & Related papers (2024-02-29T12:49:48Z) - Assessing biomedical knowledge robustness in large language models by query-efficient sampling attacks [0.6282171844772422]
An increasing depth of parametric domain knowledge in large language models (LLMs) is fueling their rapid deployment in real-world applications. The recent discovery of named entities as adversarial examples in natural language processing tasks raises questions about their potential impact on the knowledge robustness of pre-trained and finetuned LLMs. We developed an embedding-space attack based on power-scaled distance-weighted sampling to assess the robustness of their biomedical knowledge.
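One plausible reading of power-scaled distance-weighted sampling, sketched with toy distances; the exponent is assumed to control how far sampled substitutes stray from the original entity, and this is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_substitutes(dists, k=2.0, n=5):
    """Sample candidate entity substitutes with probability proportional to
    distance**k; larger k favors more distant (more disruptive) entities."""
    weights = np.asarray(dists, dtype=float) ** k
    probs = weights / weights.sum()
    return rng.choice(len(dists), size=n, replace=False, p=probs)

# Toy distances from the original entity to 100 candidate substitutes.
dists = rng.uniform(0.1, 1.0, size=100)
print(sample_substitutes(dists, k=2.0))
```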
arXiv Detail & Related papers (2024-02-16T09:29:38Z) - Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty [53.336235704123915]
We investigate how LMs incorporate confidence in responses via natural language and how downstream users behave in response to LM-articulated uncertainties.
We find that LMs are reluctant to express uncertainties when answering questions even when they produce incorrect responses.
We test the risks of LM overconfidence by conducting human experiments and show that users rely heavily on LM generations.
Lastly, we investigate the preference-annotated datasets used in post-training alignment and find that humans are biased against texts with uncertainty.
arXiv Detail & Related papers (2024-01-12T18:03:30Z) - Re-Reading Improves Reasoning in Large Language Models [87.46256176508376]
We introduce a simple, yet general and effective prompting method, Re2, to enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs).
Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process.
We evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality.
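The core of Re2 is just a prompt template that repeats the input; a minimal sketch, with the Chain-of-Thought suffix as one common pairing:

```python
def re2_prompt(question: str) -> str:
    """Present the input twice so the model re-reads it before answering."""
    return (f"Q: {question}\n"
            f"Read the question again: {question}\n"
            "A: Let's think step by step.")

print(re2_prompt("A robe takes 2 bolts of blue fiber and half that much "
                 "white fiber. How many bolts in total does it take?"))
```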
arXiv Detail & Related papers (2023-09-12T14:36:23Z) - Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models [27.491408293411734]
Large Language Models (LLMs) show promising results in language generation and instruction following but frequently "hallucinate".
Our research builds on a simple observation about linguistic redundancy: not all tokens in auto-regressive text equally represent the underlying meaning.
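This observation suggests weighting per-token uncertainty by token relevance; below is a toy sketch in that spirit, with relevance scores assumed to be precomputed (the paper derives them from the semantic change when a token is removed).

```python
import numpy as np

# Toy per-token log-probabilities and relevance scores; keywords score high.
token_logprobs = np.array([-0.1, -2.3, -0.2, -1.7])
relevance = np.array([0.05, 0.90, 0.10, 0.70])

weights = relevance / relevance.sum()
plain = -token_logprobs.mean()                # every token counts equally
weighted = -(weights * token_logprobs).sum()  # attention shifted to relevance
print(f"unweighted entropy {plain:.3f}  relevance-weighted {weighted:.3f}")
```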
arXiv Detail & Related papers (2023-07-03T22:17:16Z) - We're Afraid Language Models Aren't Modeling Ambiguity [136.8068419824318]
Managing ambiguity is a key part of human language understanding.
We characterize ambiguity in a sentence by its effect on entailment relations with another sentence.
We show that a multilabel NLI model can flag political claims in the wild that are misleading due to ambiguity.
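A hedged sketch of how multilabel NLI outputs could flag ambiguity: when incompatible labels are both probable under different readings, the premise is treated as ambiguous. The thresholding logic and toy logits are illustrative assumptions, not the paper's model.

```python
import numpy as np

LABELS = ["entailment", "neutral", "contradiction"]

def is_ambiguous(logits, threshold=0.5):
    """Flag a premise-hypothesis pair whose readings disagree: both
    entailment and contradiction are probable under independent sigmoids."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    active = {l for l, p in zip(LABELS, probs) if p > threshold}
    return {"entailment", "contradiction"} <= active

print(is_ambiguous([1.2, -0.4, 0.9]))   # True: two readings disagree
print(is_ambiguous([2.0, -1.0, -2.0]))  # False: a single clear reading
```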
arXiv Detail & Related papers (2023-04-27T17:57:58Z) - Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
arXiv Detail & Related papers (2021-11-04T12:59:55Z) - Dive into Ambiguity: Latent Distribution Mining and Pairwise Uncertainty Estimation for Facial Expression Recognition [59.52434325897716]
We propose a solution, named DMUE, to address the problem of annotation ambiguity from two perspectives.
For the former, an auxiliary multi-branch learning framework is introduced to better mine and describe the latent distribution in the label space.
For the latter, the pairwise relationships of semantic features between instances are fully exploited to estimate the ambiguity extent in the instance space.
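The pairwise idea admits a compact sketch: an instance whose semantic feature agrees poorly with its same-class neighbors is treated as more ambiguous. The cosine-agreement proxy below is an illustrative assumption, not the DMUE implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # unit-length features
labels = np.array([0, 0, 0, 1, 1, 1])

def ambiguity(i):
    """Low cosine agreement with same-class neighbors => higher ambiguity."""
    same = [j for j in range(len(feats)) if labels[j] == labels[i] and j != i]
    return 1.0 - float((feats[same] @ feats[i]).mean())

print([round(ambiguity(i), 3) for i in range(len(feats))])
```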
arXiv Detail & Related papers (2021-04-01T03:21:57Z)