Diagnosing Causal Reasoning in Vision-Language Models via Structured Relevance Graphs
- URL: http://arxiv.org/abs/2602.20878v1
- Date: Tue, 24 Feb 2026 13:20:07 GMT
- Title: Diagnosing Causal Reasoning in Vision-Language Models via Structured Relevance Graphs
- Authors: Dhita Putri Pratama, Soyeon Caren Han, Yihao Ding
- Abstract summary: We introduce Vision-Language Causal Graphs (VLCGs), a structured, query-conditioned representation that explicitly encodes causally relevant objects, attributes, relations, and scene-grounded assumptions. We present ViLCaR, a diagnostic benchmark comprising tasks for Causal Attribution, Causal Inference, and Question Answering, along with graph-aligned evaluation metrics. Experiments on state-of-the-art LVLMs show that injecting structured relevance information significantly improves attribution and inference compared to zero-shot and standard in-context learning.
- Score: 18.83755844366017
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Vision-Language Models (LVLMs) achieve strong performance on visual question answering benchmarks, yet often rely on spurious correlations rather than genuine causal reasoning. Existing evaluations primarily assess the correctness of the answers, making it unclear whether failures arise from limited reasoning capability or from misidentifying causally relevant information. We introduce Vision-Language Causal Graphs (VLCGs), a structured, query-conditioned representation that explicitly encodes causally relevant objects, attributes, relations, and scene-grounded assumptions. Building on this representation, we present ViLCaR, a diagnostic benchmark comprising tasks for Causal Attribution, Causal Inference, and Question Answering, along with graph-aligned evaluation metrics that assess relevance identification beyond final answer accuracy. Experiments on state-of-the-art LVLMs show that injecting structured relevance information significantly improves attribution and inference consistency compared to zero-shot and standard in-context learning. These findings suggest that current limitations in LVLM causal reasoning stem primarily from insufficient structural guidance rather than a lack of reasoning capacity.
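The abstract describes a VLCG as a query-conditioned record of causally relevant objects, attributes, relations, and scene-grounded assumptions. A minimal sketch of what such a record might look like as a data structure is shown below; the field names and the example scene are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class VLCG:
    """Hypothetical sketch of a query-conditioned Vision-Language Causal Graph."""
    query: str
    objects: list = field(default_factory=list)      # causally relevant objects in the scene
    attributes: dict = field(default_factory=dict)   # object -> list of attributes
    relations: list = field(default_factory=list)    # (subject, predicate, object) triples
    assumptions: list = field(default_factory=list)  # scene-grounded assumptions

# Example instance for a toy causal query.
g = VLCG(
    query="Why is the road wet?",
    objects=["road", "rain cloud"],
    attributes={"road": ["wet"], "rain cloud": ["dark"]},
    relations=[("rain cloud", "causes", "road wet")],
    assumptions=["the scene is outdoors"],
)
```

Conditioning the graph on the query is the key design point: only the objects and relations relevant to answering that specific question are encoded, rather than the full scene graph.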
Related papers
- Can Unified Generation and Understanding Models Maintain Semantic Equivalence Across Different Output Modalities? [61.533560295383786]
Unified Multimodal Large Language Models (U-MLLMs) integrate understanding and generation within a single architecture. We observe that U-MLLMs fail to maintain semantic equivalence when required to render the same results in the image modality. We introduce VGUBench, a framework to decouple reasoning logic from generation fidelity.
arXiv Detail & Related papers (2026-02-27T06:23:56Z) - CausalFlip: A Benchmark for LLM Causal Judgment Beyond Semantic Matching [50.65932158912512]
We propose a new causal reasoning benchmark, CausalFlip, to encourage the development of new large language models. CausalFlip consists of causal judgment questions built over event triples that could form different confounder, chain, and collider relations. We evaluate LLMs under multiple training paradigms, including answer-only training, explicit Chain-of-Thought supervision, and a proposed internalized causal reasoning approach.
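The CausalFlip abstract mentions that the same event triple can be wired into confounder, chain, or collider structures. A small sketch of that three-way distinction, assuming a two-edge graph over nodes A, B, C with B as the middle event (the classification rule here is my own illustration, not the benchmark's code):

```python
def structure(edges):
    """Classify a two-edge causal graph over nodes A, B, C by B's role.

    edges: list of (cause, effect) pairs.
    """
    parents_of_b = {a for a, b in edges if b == "B"}
    children_of_b = {b for a, b in edges if a == "B"}
    if len(children_of_b) == 2:
        return "confounder"  # A <- B -> C: B is a common cause
    if len(parents_of_b) == 2:
        return "collider"    # A -> B <- C: B is a common effect
    return "chain"           # A -> B -> C: B mediates

# The same three events, three different causal wirings:
print(structure([("B", "A"), ("B", "C")]))  # confounder
print(structure([("A", "B"), ("C", "B")]))  # collider
print(structure([("A", "B"), ("B", "C")]))  # chain
```

Distinguishing these three structures matters because they imply different conditional-independence patterns, which is exactly what semantic matching alone cannot test.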
arXiv Detail & Related papers (2026-02-23T18:06:15Z) - Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification [56.51953062869371]
DoVerifier is a symbolic verifier that checks whether causal expressions are derivable from a given causal graph using rules from do-calculus and probability theory. Our evaluations on synthetic data and causal QA benchmarks show that DoVerifier more accurately captures semantic correctness of causal reasoning traces.
arXiv Detail & Related papers (2026-01-29T03:22:58Z) - Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning [62.23671919314693]
Large language models (LLMs) have demonstrated significant improvements in contextual understanding. However, their ability to attend to truly critical information during long-context reasoning and generation still lags behind. We introduce a two-stage framework called Learning to Focus (LeaF) to mitigate confounding factors.
arXiv Detail & Related papers (2025-06-09T15:16:39Z) - What's Missing in Vision-Language Models? Probing Their Struggles with Causal Order Reasoning [26.671128120554457]
Causal reasoning is fundamental to solving complex high-level reasoning tasks. Existing benchmarks often include a mixture of reasoning questions. We introduce VQA-Causal and VCR-Causal to isolate and rigorously evaluate causal reasoning abilities.
arXiv Detail & Related papers (2025-06-01T07:17:46Z) - The Third Pillar of Causal Analysis? A Measurement Perspective on Causal Representations [23.129188507631284]
Causal reasoning and discovery often face challenges due to the complexity, noisiness, and high-dimensionality of real-world data. What makes learned representations useful for causal downstream tasks and how to evaluate them are still not well understood.
arXiv Detail & Related papers (2025-05-23T10:25:17Z) - Hallucination Detection in LLMs with Topological Divergence on Attention Graphs [60.83579255387347]
Hallucination, i.e., generating factually incorrect content, remains a critical challenge for large language models. We introduce TOHA, a TOpology-based HAllucination detector in the RAG setting.
arXiv Detail & Related papers (2025-04-14T10:06:27Z) - A Critical Review of Causal Reasoning Benchmarks for Large Language Models [2.1311710788645617]
We present a comprehensive overview of LLM benchmarks for causality.
We derive a set of criteria that a useful benchmark or set of benchmarks should aim to satisfy.
arXiv Detail & Related papers (2024-07-10T20:11:51Z) - Cause and Effect: Can Large Language Models Truly Understand Causality? [1.2334534968968969]
This research proposes a novel architecture called the Context Aware Reasoning Enhancement with Counterfactual Analysis (CARE-CA) framework.
The proposed framework incorporates an explicit causal detection module with ConceptNet and counterfactual statements, as well as implicit causal detection through Large Language Models.
The knowledge from ConceptNet enhances the performance of multiple causal reasoning tasks such as causal discovery, causal identification and counterfactual reasoning.
arXiv Detail & Related papers (2024-02-28T08:02:14Z) - Everything Has a Cause: Leveraging Causal Inference in Legal Text Analysis [62.44432226563088]
Causal inference is the process of capturing cause-effect relationship among variables.
We propose a novel Graph-based Causal Inference framework, which builds causal graphs from fact descriptions without much human involvement.
We observe that the causal knowledge contained in GCI can be effectively injected into powerful neural networks for better performance and interpretability.
arXiv Detail & Related papers (2021-04-19T16:13:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.