Finding the Trigger: Causal Abductive Reasoning on Video Events
- URL: http://arxiv.org/abs/2501.09304v1
- Date: Thu, 16 Jan 2025 05:39:28 GMT
- Title: Finding the Trigger: Causal Abductive Reasoning on Video Events
- Authors: Thao Minh Le, Vuong Le, Kien Do, Sunil Gupta, Svetha Venkatesh, Truyen Tran
- Abstract summary: Causal Abductive Reasoning on Video Events (CARVE) involves identifying causal relationships between events in a video.
We present a Causal Event Relation Network (CERN) that examines the relationships between video events in temporal and semantic spaces.
- Score: 59.188208873301015
- Abstract: This paper introduces a new problem, Causal Abductive Reasoning on Video Events (CARVE), which involves identifying causal relationships between events in a video and generating hypotheses about causal chains that account for the occurrence of a target event. To facilitate research in this direction, we create two new benchmark datasets with both synthetic and realistic videos, accompanied by trigger-target labels generated through a novel counterfactual synthesis approach. To explore the challenge of solving CARVE, we present a Causal Event Relation Network (CERN) that examines the relationships between video events in temporal and semantic spaces to efficiently determine the root-cause trigger events. Through extensive experiments, we demonstrate the critical roles of event relational representation learning and interaction modeling in solving video causal reasoning challenges. The introduction of the CARVE task, along with the accompanying datasets and the CERN framework, will advance future research on video causal reasoning and significantly facilitate various applications, including video surveillance, root-cause analysis and movie content management.
Related papers
- MECD+: Unlocking Event-Level Causal Graph Discovery for Video Reasoning [16.209265930309854]
We introduce a new task and dataset, Multi-Event Causal Discovery (MECD)
It aims to uncover the causal relations between events distributed chronologically across long videos.
We devise a novel framework inspired by the Granger Causality method, incorporating an efficient mask-based event prediction model.
arXiv Detail & Related papers (2025-01-13T11:28:49Z) - STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training [87.58996020705258]
Video Large Language Models (Video-LLMs) have recently shown strong performance on basic video understanding tasks.
However, Video-LLMs struggle with compositional reasoning that requires multi-step, explicit spatio-temporal inference across object relations, interactions and events.
We propose STEP, a novel graph-guided self-training method that enables Video-LLMs to generate reasoning-rich fine-tuning data from raw videos and thereby improve themselves.
arXiv Detail & Related papers (2024-11-29T11:54:55Z) - MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning [23.928977574352796]
We introduce a new task and dataset, Multi-Event Causal Discovery (MECD)
It aims to uncover the causal relationships between events distributed chronologically across long videos.
We devise a novel framework inspired by the Granger Causality method, using an efficient mask-based event prediction model.
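Both MECD papers draw on Granger causality: one event series "causes" another if its past improves prediction of the other. The following is a minimal linear sketch of that idea only; the papers use a neural, mask-based event prediction model, and the function and variable names here are my own illustration, not theirs.

```python
import numpy as np

def granger_gain(target, candidate, lag=2):
    """Granger-style score: relative reduction in prediction error when the
    candidate series' past is added to an autoregression on the target."""
    T = len(target)
    rows = range(lag, T)
    # Restricted design: target history only; full design: target + candidate history.
    X_r = np.array([target[t - lag:t] for t in rows])
    X_f = np.array([np.concatenate([target[t - lag:t], candidate[t - lag:t]])
                    for t in rows])
    y = target[lag:]

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return resid @ resid

    # > 0 means the candidate's past helps predict the target ("Granger-causes").
    return 1.0 - rss(X_f) / rss(X_r)

rng = np.random.default_rng(0)
cause = rng.normal(size=500)
effect = np.roll(cause, 1) + 0.1 * rng.normal(size=500)  # effect lags cause by one step
```

Here `granger_gain(effect, cause)` is large while `granger_gain(cause, effect)` stays near zero, recovering the direction of the causal link; the mask-based event model in MECD plays the role of the restricted regression.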
arXiv Detail & Related papers (2024-09-26T08:51:29Z) - Harnessing Temporal Causality for Advanced Temporal Action Detection [53.654457142657236]
We introduce CausalTAD, which combines causal attention and causal Mamba to achieve state-of-the-art performance on benchmarks.
We ranked 1st in the Action Recognition, Action Detection, and Audio-Based Interaction Detection tracks at the EPIC-Kitchens Challenge 2024, and 1st in the Moment Queries track at the Ego4D Challenge 2024.
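The causal attention that CausalTAD pairs with causal Mamba restricts each time step to attend only to itself and earlier steps. A minimal single-head NumPy sketch of that masking idea follows; it is an illustration under my own simplifications (queries and keys taken directly from the features), not the paper's implementation.

```python
import numpy as np

def causal_attention(x):
    """Single-head self-attention with a lower-triangular (causal) mask:
    each time step attends only to itself and earlier time steps."""
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)                   # pairwise similarity scores
    mask = np.tril(np.ones((T, T), dtype=bool))     # allow only past and present
    scores = np.where(mask, scores, -np.inf)        # block attention to the future
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)              # row-wise softmax
    return w @ x, w

feats = np.random.default_rng(1).normal(size=(5, 8))
out, w = causal_attention(feats)
```

Every entry of `w` above the diagonal is zero, so a detection at time t can never depend on future frames, which is what makes the model usable for online temporal action detection.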
arXiv Detail & Related papers (2024-07-25T06:03:02Z) - Enhancing Event Causality Identification with Rationale and Structure-Aware Causal Question Answering [30.000134835133522]
Document-level Event Causality Identification (DECI) aims to identify causal relations between two events in a document.
Recent research tends to use pre-trained language models to generate the event causal relations.
We propose a multi-task learning framework to enhance event causality identification with rationale and structure-aware causal question answering.
arXiv Detail & Related papers (2024-03-17T07:41:58Z) - Cross-Modal Reasoning with Event Correlation for Video Question Answering [32.332251488360185]
We introduce the dense caption modality as a new auxiliary and distill event-correlated information from it to infer the correct answer.
We employ cross-modal reasoning modules for explicitly modeling inter-modal relationships and aggregating relevant information across different modalities.
We propose a question-guided self-adaptive multi-modal fusion module to collect the question-oriented and event-correlated evidence through multi-step reasoning.
arXiv Detail & Related papers (2023-12-20T02:30:39Z) - Visual Causal Scene Refinement for Video Question Answering [117.08431221482638]
We present a causal analysis of VideoQA and propose a framework for cross-modal causal reasoning, named Visual Causal Scene Refinement (VCSR)
Our VCSR involves two essential modules that refine consecutive video frames guided by the question semantics, yielding more representative segment features for causal front-door intervention.
Experiments on the NExT-QA, Causal-VidQA, and MSRVTT-QA datasets demonstrate the superiority of our VCSR in discovering visual causal scenes and achieving robust video question answering.
arXiv Detail & Related papers (2023-05-07T09:05:19Z) - Causalainer: Causal Explainer for Automatic Video Summarization [77.36225634727221]
In many application scenarios, improper video summarization can have a large impact.
Modeling explainability is a key concern.
A Causal Explainer, dubbed Causalainer, is proposed to address this issue.
arXiv Detail & Related papers (2023-04-30T11:42:06Z) - Event Causality Extraction with Event Argument Correlations [13.403222002600558]
Event Causality Extraction (ECE) aims to extract cause-effect event pairs from plain text.
We propose a method with a dual grid tagging scheme to capture the intra- and inter-event argument correlations for ECE.
arXiv Detail & Related papers (2023-01-27T09:48:31Z) - Weakly-Supervised Video Object Grounding via Causal Intervention [82.68192973503119]
We target at the task of weakly-supervised video object grounding (WSVOG), where only video-sentence annotations are available during model learning.
It aims to localize objects described in the sentence to visual regions in the video, which is a fundamental capability needed in pattern analysis and machine learning.
arXiv Detail & Related papers (2021-12-01T13:13:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.