Revealing Multimodal Causality with Large Language Models
- URL: http://arxiv.org/abs/2509.17784v2
- Date: Thu, 30 Oct 2025 01:43:48 GMT
- Title: Revealing Multimodal Causality with Large Language Models
- Authors: Jin Li, Shoujin Wang, Qi Zhang, Feng Liu, Tongliang Liu, Longbing Cao, Shui Yu, Fang Chen
- Abstract summary: We propose MLLM-CD, a novel framework for multimodal causal discovery from unstructured data. It consists of three key components: (1) a novel contrastive factor discovery module to identify genuine multimodal factors; (2) a statistical causal structure discovery module to infer causal relationships among discovered factors; and (3) an iterative multimodal counterfactual reasoning module to refine the discovery outcomes. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed MLLM-CD.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Uncovering cause-and-effect mechanisms from data is fundamental to scientific progress. While large language models (LLMs) show promise for enhancing causal discovery (CD) from unstructured data, their application to the increasingly prevalent multimodal setting remains a critical challenge. Even with the advent of multimodal LLMs (MLLMs), their efficacy in multimodal CD is hindered by two primary limitations: (1) difficulty in exploring intra- and inter-modal interactions for comprehensive causal variable identification; and (2) an inability to resolve structural ambiguities from purely observational data. To address these challenges, we propose MLLM-CD, a novel framework for multimodal causal discovery from unstructured data. It consists of three key components: (1) a contrastive factor discovery module that identifies genuine multimodal factors based on the interactions explored from contrastive sample pairs; (2) a statistical causal structure discovery module that infers causal relationships among the discovered factors; and (3) a multimodal counterfactual reasoning module that iteratively refines the discovery outcomes by incorporating the world knowledge and reasoning capabilities of MLLMs. Extensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of MLLM-CD in revealing genuine factors and the causal relationships among them from multimodal unstructured data.
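As a rough illustration of the three-stage pipeline the abstract describes, here is a toy, self-contained sketch. All function names and the simplistic statistics are placeholders of mine, not the authors' method; in particular, the MLLM-driven counterfactual reasoning step is reduced to a plain veto callable:

```python
import itertools

def discover_factors(pairs):
    """Contrastive factor discovery (toy): keep features that differ
    within most contrastive sample pairs."""
    counts = {}
    for pos, neg in pairs:
        for k in pos:
            if pos[k] != neg[k]:
                counts[k] = counts.get(k, 0) + 1
    return [k for k, c in counts.items() if c > len(pairs) / 2]

def learn_structure(samples, factors):
    """Toy stand-in for statistical structure discovery: propose an
    edge a -> b when a and b co-occur more often than independence
    would predict (real methods use proper CI tests or score search)."""
    n = len(samples)
    edges = set()
    for a, b in itertools.combinations(factors, 2):
        pa = sum(s[a] for s in samples) / n
        pb = sum(s[b] for s in samples) / n
        pab = sum(1 for s in samples if s[a] and s[b]) / n
        if pab > pa * pb + 0.1:
            edges.add((a, b))
    return edges

def counterfactual_refine(edges, veto):
    """Stand-in for MLLM counterfactual reasoning: an oracle callable
    vetoes edges it deems implausible."""
    return {e for e in edges if not veto(e)}

# Toy data: "fire" drives "smoke"; "noise" is constant and uninformative.
samples = ([{"fire": 1, "smoke": 1, "noise": 0}] * 5
           + [{"fire": 0, "smoke": 0, "noise": 0}] * 5)
pairs = list(zip(samples[:5], samples[5:]))        # contrastive pairs
factors = discover_factors(pairs)                  # ["fire", "smoke"]
edges = counterfactual_refine(learn_structure(samples, factors),
                              veto=lambda e: False)
```

In the real framework the veto (and the factor annotations themselves) would come from iterated MLLM queries rather than hand-coded rules.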
Related papers
- Structured and Abstractive Reasoning on Multi-modal Relational Knowledge Images [58.553448128258566]
This paper bridges two gaps: the shortage of large-scale high-quality data and of capability-enhancement methodologies. We introduce STAR-64K, a dataset comprising 64K high-quality multi-modal instruction samples, and conduct experiments across 5 open-source MLLMs.
arXiv Detail & Related papers (2025-10-22T02:23:40Z) - Causal MAS: A Survey of Large Language Model Architectures for Discovery and Effect Estimation [5.062951330356307]
Large Language Models (LLMs) have demonstrated remarkable capabilities in various reasoning and generation tasks. Their proficiency in complex causal reasoning, discovery, and estimation remains an area of active development. Multi-agent systems, leveraging the collaborative or specialized abilities of multiple LLM-based agents, are emerging as a powerful paradigm to address these limitations.
arXiv Detail & Related papers (2025-08-31T20:48:31Z) - Multimodal Fine-grained Reasoning for Post Quality Evaluation [1.806315356676339]
We propose the Multimodal Fine-grained Topic-post Reasoning (MFTRR) framework, which mimics human cognitive processes. MFTRR reframes post-quality assessment as a ranking task and incorporates multimodal data to better capture quality variations.
arXiv Detail & Related papers (2025-07-21T04:30:50Z) - Deconfounded Reasoning for Multimodal Fake News Detection via Causal Intervention [16.607714608483164]
The rapid growth of social media has led to the widespread dissemination of fake news across multiple content forms. Traditional unimodal detection methods fall short in addressing complex cross-modal manipulations. We propose the Causal Intervention-based Multimodal Deconfounded Detection framework.
arXiv Detail & Related papers (2025-04-12T09:57:43Z) - Exploring Multi-Modal Data with Tool-Augmented LLM Agents for Precise Causal Discovery [45.777770849667775]
We introduce MATMCD, a multi-agent system powered by tool-augmented LLMs. Our empirical study suggests the significant potential of multi-modality enhanced causal discovery.
arXiv Detail & Related papers (2024-12-18T09:50:00Z) - Cross-Modal Few-Shot Learning: a Generative Transfer Learning Framework [58.362064122489166]
This paper introduces the Cross-modal Few-Shot Learning task, which aims to recognize instances across multiple modalities while relying on scarce labeled data. We propose a Generative Transfer Learning (GTL) framework that simulates how humans abstract and generalize concepts. We show that GTL achieves state-of-the-art performance on seven multi-modal datasets spanning RGB-Sketch, RGB-Infrared, and RGB-Depth.
arXiv Detail & Related papers (2024-10-14T16:09:38Z) - Multi-Agent Causal Discovery Using Large Language Models [10.020595983728482]
Causal discovery is a critical research area in machine learning. We introduce the Multi-Agent Causal Discovery Framework (MAC). It consists of two key modules: the Debate-Coding Module (DCM) and the Meta-Debate Module (MDM).
arXiv Detail & Related papers (2024-07-21T06:21:47Z) - Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning [80.44084021062105]
We propose a novel latent partial causal model for multimodal data, featuring two latent coupled variables, connected by an undirected edge, to represent the transfer of knowledge across modalities. Under specific statistical assumptions, we establish an identifiability result, demonstrating that representations learned by multimodal contrastive learning correspond to the latent coupled variables up to a trivial transformation. Experiments show that a pre-trained CLIP model embodies disentangled representations, enabling few-shot learning and improving domain generalization across diverse real-world datasets.
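Under one plausible reading of this setup (the notation below is mine, not taken from the paper), the model and its identifiability claim can be sketched as:

```latex
% Hypothetical notation: two coupled latent variables z_x, z_y joined
% by an undirected edge, each generating one observed modality.
\[
  z_x \;\text{---}\; z_y, \qquad x = g_x(z_x), \qquad y = g_y(z_y)
\]
% Identifiability up to a trivial transformation: the contrastively
% learned representation \hat{z}_x recovers z_x through some simple
% (e.g.\ permutation-and-scaling) map h.
\[
  \hat{z}_x = h(z_x)
\]
```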
arXiv Detail & Related papers (2024-02-09T07:18:06Z) - Discovery of the Hidden World with Large Language Models [95.58823685009727]
This paper presents Causal representatiOn AssistanT (COAT) that introduces large language models (LLMs) to bridge the gap.
LLMs are trained on massive observations of the world and have demonstrated great capability in extracting key information from unstructured data.
COAT also adopts CDs to find causal relations among the identified variables as well as to provide feedback to LLMs to iteratively refine the proposed factors.
arXiv Detail & Related papers (2024-02-06T12:18:54Z) - Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework [89.8609061423685]
We propose an information-theoretic approach based on partial information decomposition (PID) to quantify the degree of redundancy, uniqueness, and synergy relating input modalities to an output task.
To validate PID estimation, we conduct extensive experiments on both synthetic datasets where the PID is known and on large-scale multimodal benchmarks.
We demonstrate their usefulness in (1) quantifying interactions within multimodal datasets, (2) quantifying interactions captured by multimodal models, (3) principled approaches for model selection, and (4) three real-world case studies.
arXiv Detail & Related papers (2023-02-23T18:59:05Z)
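To make the redundancy/uniqueness/synergy split above concrete, here is a minimal sketch using the crude minimum-mutual-information redundancy R = min(I(X1;Y), I(X2;Y)), a simple Williams–Beer-style heuristic; it is not the PID estimator this paper proposes:

```python
import math
from collections import Counter

def mutual_info(xy_pairs):
    """I(X;Y) in bits, estimated from a list of (x, y) samples."""
    n = len(xy_pairs)
    pxy = Counter(xy_pairs)
    px = Counter(x for x, _ in xy_pairs)
    py = Counter(y for _, y in xy_pairs)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def pid_mmi(samples):
    """Split I((X1,X2);Y) into redundancy, unique, and synergy terms
    using R = min(I(X1;Y), I(X2;Y)); a heuristic, not the paper's method."""
    i1 = mutual_info([(x1, y) for x1, _, y in samples])
    i2 = mutual_info([(x2, y) for _, x2, y in samples])
    i12 = mutual_info([((x1, x2), y) for x1, x2, y in samples])
    r = min(i1, i2)
    return {"redundancy": r, "unique1": i1 - r, "unique2": i2 - r,
            "synergy": i12 - i1 - i2 + r}

# XOR: neither modality alone predicts Y, but together they determine it,
# so all information is synergistic (1 bit).
xor = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]
pid = pid_mmi(xor)
```

Replacing XOR with an AND gate yields nonzero redundancy under this heuristic, since each input alone then carries some information about the output.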