Causality Model for Semantic Understanding on Videos
- URL: http://arxiv.org/abs/2503.12447v1
- Date: Sun, 16 Mar 2025 10:44:11 GMT
- Title: Causality Model for Semantic Understanding on Videos
- Authors: Li Yicong
- Abstract summary: This thesis focuses on the domain of semantic video understanding. It explores the potential of causal modeling to advance two fundamental tasks: Video Relation Detection (VidVRD) and Video Question Answering (VideoQA).
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: After a decade of prosperity, the development of video understanding has reached a critical juncture, where the sole reliance on massive data and complex architectures is no longer a one-size-fits-all solution to all situations. The presence of ubiquitous data imbalance hampers DNNs from effectively learning the underlying causal mechanisms, leading to significant performance drops when encountering distribution shifts, such as long-tail imbalances and perturbed imbalances. This realization has prompted researchers to seek alternative methodologies to capture causal patterns in video data. To tackle these challenges and increase the robustness of DNNs, causal modeling emerged as a principle to discover the true causal patterns behind the observed correlations. This thesis focuses on the domain of semantic video understanding and explores the potential of causal modeling to advance two fundamental tasks: Video Relation Detection (VidVRD) and Video Question Answering (VideoQA).
Related papers
- A Causal Adjustment Module for Debiasing Scene Graph Generation [28.44150555570101]
We employ causal inference techniques to model the causality among skewed distributions.
Our method enables the composition of zero-shot relationships, thereby enhancing the model's ability to recognize such relationships.
arXiv Detail & Related papers (2025-03-22T20:44:01Z)
- VACT: A Video Automatic Causal Testing System and a Benchmark [55.53300306960048]
VACT is an automated framework for modeling, evaluating, and measuring the causal understanding of VGMs in real-world scenarios. We introduce multi-level causal evaluation metrics to provide a detailed analysis of the causal performance of VGMs.
arXiv Detail & Related papers (2025-03-08T10:54:42Z)
- Finding the Trigger: Causal Abductive Reasoning on Video Events [59.188208873301015]
Causal Abductive Reasoning on Video Events (CARVE) involves identifying causal relationships between events in a video. We present a Causal Event Relation Network (CERN) that examines the relationships between video events in temporal and semantic spaces.
arXiv Detail & Related papers (2025-01-16T05:39:28Z)
- Admitting Ignorance Helps the Video Question Answering Models to Answer [82.22149677979189]
We argue that models often establish shortcuts, resulting in spurious correlations between questions and answers. We propose a novel training framework in which the model is compelled to acknowledge its ignorance when presented with an intervened question. In practice, we integrate a state-of-the-art model into our framework to validate its effectiveness.
arXiv Detail & Related papers (2025-01-15T12:44:52Z)
- DIVD: Deblurring with Improved Video Diffusion Model [8.816046910904488]
Diffusion models and video diffusion models have excelled in the fields of image and video generation. We introduce a video diffusion model specifically for the task of video deblurring. Our model outperforms existing models and achieves state-of-the-art results on a range of perceptual metrics.
arXiv Detail & Related papers (2024-12-01T11:39:02Z)
- STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training [87.58996020705258]
Video Large Language Models (Video-LLMs) have recently shown strong performance in basic video understanding tasks. Video-LLMs struggle with compositional reasoning that requires multi-step explicit-temporal inference across object relations, interactions and events. We propose STEP, a novel graph-guided self-training method that enables Video-LLMs to generate reasoning-rich finetuning data from any raw videos to improve themselves.
arXiv Detail & Related papers (2024-11-29T11:54:55Z)
- Towards Causal Foundation Model: on Duality between Causal Inference and Attention [18.046388712804042]
We take a first step towards building causally-aware foundation models for treatment effect estimations.
We propose a novel, theoretically justified method called Causal Inference with Attention (CInA).
arXiv Detail & Related papers (2023-10-01T22:28:34Z)
- Modeling Causal Mechanisms with Diffusion Models for Interventional and Counterfactual Queries [10.818661865303518]
We consider the problem of answering observational, interventional, and counterfactual queries in a causally sufficient setting.
We introduce diffusion-based causal models (DCM) to learn causal mechanisms that generate unique latent encodings.
Our empirical evaluations demonstrate significant improvements over existing state-of-the-art methods for answering causal queries.
arXiv Detail & Related papers (2023-02-02T04:08:08Z)
- iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability [0.0]
Causality knowledge is vital to building robust AI systems.
We propose iReason, a framework that infers visual-semantic commonsense knowledge using both videos and natural language captions.
arXiv Detail & Related papers (2021-06-25T02:56:34Z)
- Deconfounded Video Moment Retrieval with Causal Intervention [80.90604360072831]
We tackle the task of video moment retrieval (VMR), which aims to localize a specific moment in a video according to a textual query.
Existing methods primarily model the matching relationship between query and moment by complex cross-modal interactions.
We propose a causality-inspired VMR framework that builds a structural causal model to capture the true effect of query and video content on the prediction.
arXiv Detail & Related papers (2021-06-03T01:33:26Z)
- Learning Causal Models Online [103.87959747047158]
Predictive models can rely on spurious correlations in the data for making predictions.
One solution for achieving strong generalization is to incorporate causal structures in the models.
We propose an online algorithm that continually detects and removes spurious features.
arXiv Detail & Related papers (2020-06-12T20:49:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.