MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning
- URL: http://arxiv.org/abs/2409.17647v3
- Date: Sun, 27 Oct 2024 06:48:08 GMT
- Title: MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning
- Authors: Tieyuan Chen, Huabin Liu, Tianyao He, Yihang Chen, Chaofan Gan, Xiao Ma, Cheng Zhong, Yang Zhang, Yingxue Wang, Hui Lin, Weiyao Lin
- Abstract summary: We introduce a new task and dataset, Multi-Event Causal Discovery (MECD)
It aims to uncover the causal relationships between events distributed chronologically across long videos.
We devise a novel framework inspired by the Granger Causality method, using an efficient mask-based event prediction model.
- Score: 23.928977574352796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video causal reasoning aims to achieve a high-level understanding of video content from a causal perspective. However, current video reasoning tasks are limited in scope, primarily executed in a question-answering paradigm and focusing on short videos containing only a single event and simple causal relationships, lacking comprehensive and structured causality analysis for videos with multiple events. To fill this gap, we introduce a new task and dataset, Multi-Event Causal Discovery (MECD). It aims to uncover the causal relationships between events distributed chronologically across long videos. Given visual segments and textual descriptions of events, MECD requires identifying the causal associations between these events to derive a comprehensive, structured event-level video causal diagram explaining why and how the final result event occurred. To address MECD, we devise a novel framework inspired by the Granger Causality method, using an efficient mask-based event prediction model to perform an Event Granger Test, which estimates causality by comparing the predicted result event when premise events are masked versus unmasked. Furthermore, we integrate causal inference techniques such as front-door adjustment and counterfactual inference to address challenges in MECD like causality confounding and illusory causality. Experiments validate the effectiveness of our framework in providing causal relationships in multi-event videos, outperforming GPT-4o and VideoLLaVA by 5.7% and 4.1%, respectively.
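The Event Granger Test described in the abstract can be sketched in a few lines: predict the result event with each premise event visible, then with it masked, and mark the event as causal when masking it degrades the prediction. This is a minimal illustrative sketch only; the paper's actual predictor is a trained mask-based video event prediction model, whereas `predict_result` below is a hypothetical token-overlap stand-in invented for this example.

```python
# Toy stand-in for the paper's mask-based event prediction model:
# score how well the result event is "predicted" by measuring what
# fraction of its tokens appear in the visible premise events.
def predict_result(premise_events, result_event):
    visible = set()
    for event in premise_events:
        visible.update(event.split())
    tokens = result_event.split()
    return sum(t in visible for t in tokens) / len(tokens)

def event_granger_test(events, result_event, threshold=0.0):
    """For each premise event, compare prediction quality with the
    event unmasked vs. masked; a drop above `threshold` marks it
    as causally relevant to the result event."""
    causal = {}
    unmasked = predict_result(events, result_event)
    for i in range(len(events)):
        masked = predict_result(events[:i] + events[i + 1:], result_event)
        causal[i] = (unmasked - masked) > threshold
    return causal

events = ["a chef chops onions", "the chef heats a pan", "a phone rings"]
result = "the chef fries the chopped onions in the pan"
print(event_granger_test(events, result))  # → {0: True, 1: True, 2: False}
```

Under this toy scoring, masking either cooking event lowers the overlap with the result description, so both are flagged causal, while the irrelevant phone event is not; the paper additionally applies front-door adjustment and counterfactual inference on top of this test to handle confounding and illusory causality.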
Related papers
- Grounding Partially-Defined Events in Multimodal Data [61.0063273919745]
We introduce a multimodal formulation for partially-defined events and cast the extraction of these events as a three-stage span retrieval task.
We propose a benchmark for this task, MultiVENT-G, that consists of 14.5 hours of densely annotated current event videos and 1,168 text documents, containing 22.8K labeled event-centric entities.
Results illustrate the challenges that abstract event understanding poses and demonstrate the promise of event-centric video-language systems.
arXiv Detail & Related papers (2024-10-07T17:59:48Z)
- EventHallusion: Diagnosing Event Hallucinations in Video LLMs [80.00303150568696]
We first propose EventHallusion, a novel benchmark that assesses the hallucination phenomenon of VideoLLMs on video event comprehension.
Based on the observation that existing VideoLLMs are entangled with priors inherited from their foundation models, EventHallusion is curated by meticulously collecting videos and annotating questions.
We also propose a simple yet effective method, called Temporal Contrastive Decoding (TCD), to tackle the hallucination problems of VideoLLMs.
arXiv Detail & Related papers (2024-09-25T03:49:46Z)
- Generating Event-oriented Attribution for Movies via Two-Stage Prefix-Enhanced Multimodal LLM [47.786978666537436]
We propose a Two-Stage Prefix-Enhanced MLLM (TSPE) approach for event attribution in movie videos.
In the local stage, we introduce an interaction-aware prefix that guides the model to focus on the relevant multimodal information within a single clip.
In the global stage, we strengthen the connections between associated events using an inferential knowledge graph.
arXiv Detail & Related papers (2024-09-14T08:30:59Z) - Enhancing Event Causality Identification with Rationale and Structure-Aware Causal Question Answering [30.000134835133522]
Document-level Event Causality Identification (DECI) aims to identify causal relations between two events in a document.
Recent research tends to use pre-trained language models to generate the event causal relations.
We propose a multi-task learning framework to enhance event causality identification with rationale and structure-aware causal question answering.
arXiv Detail & Related papers (2024-03-17T07:41:58Z) - Glance and Focus: Memory Prompting for Multi-Event Video Question
Answering [36.00733800536469]
VideoQA has emerged as a vital tool to evaluate agents' ability to understand human daily behaviors.
Humans can easily tackle it by using a series of episode memories as anchors to quickly locate question-related key moments for reasoning.
We propose the Glance-Focus model to mimic this effective reasoning strategy.
arXiv Detail & Related papers (2024-01-03T03:51:16Z) - Event Causality Extraction with Event Argument Correlations [13.403222002600558]
Event Causality Extraction (ECE) aims to extract cause-effect event causality pairs from plain texts.
We propose a method with a dual grid tagging scheme to capture the intra- and inter-event argument correlations for ECE.
arXiv Detail & Related papers (2023-01-27T09:48:31Z) - Unifying Event Detection and Captioning as Sequence Generation via
Pre-Training [53.613265415703815]
We propose a unified pre-training and fine-tuning framework to enhance the inter-task association between event detection and captioning.
Our model outperforms the state-of-the-art methods, and can be further boosted when pre-trained on extra large-scale video-text data.
arXiv Detail & Related papers (2022-07-18T14:18:13Z) - EA$^2$E: Improving Consistency with Event Awareness for Document-Level
Argument Extraction [52.43978926985928]
We introduce the Event-Aware Argument Extraction (EA$^2$E) model with augmented context for training and inference.
Experiment results on the WIKIEVENTS and ACE2005 datasets demonstrate the effectiveness of EA$^2$E.
arXiv Detail & Related papers (2022-05-30T04:33:51Z) - ClarET: Pre-training a Correlation-Aware Context-To-Event Transformer
for Event-Centric Generation and Classification [74.6318379374801]
We propose to pre-train a general Correlation-aware context-to-Event Transformer (ClarET) for event-centric reasoning.
The proposed ClarET is applicable to a wide range of event-centric reasoning scenarios.
arXiv Detail & Related papers (2022-03-04T10:11:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.