Causalainer: Causal Explainer for Automatic Video Summarization
- URL: http://arxiv.org/abs/2305.00455v1
- Date: Sun, 30 Apr 2023 11:42:06 GMT
- Title: Causalainer: Causal Explainer for Automatic Video Summarization
- Authors: Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Min-Hung Chen, Marcel Worring
- Abstract summary: In many application scenarios, improper video summarization can have a large impact.
Modeling explainability is a key concern.
A Causal Explainer, dubbed Causalainer, is proposed to address this issue.
- Score: 77.36225634727221
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The goal of video summarization is to automatically shorten videos such that they convey the overall story without losing relevant information. In many
application scenarios, improper video summarization can have a large impact.
For example, in forensics the quality of the generated video summary will
affect an investigator's judgment, while in journalism it might yield undesired
bias. Because of this, modeling explainability is a key concern. One of the
best ways to address the explainability challenge is to uncover the causal
relations that steer the process and lead to the result. Current machine
learning-based video summarization algorithms learn optimal parameters but do
not uncover causal relationships. Hence, they suffer from a relative lack of
explainability. In this work, a Causal Explainer, dubbed Causalainer, is
proposed to address this issue. Multiple meaningful random variables and their
joint distributions are introduced to characterize the behaviors of key
components in the problem of video summarization. In addition, helper
distributions are introduced to enhance the effectiveness of model training. In
visual-textual input scenarios, the extra textual input can degrade model
performance. A causal semantics extractor is designed to tackle this issue by
effectively distilling the mutual information from the visual and textual
inputs. Experimental results on commonly used benchmarks demonstrate that the
proposed method achieves state-of-the-art performance while being more
explainable.
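As a rough illustration of the kind of component the abstract describes, the minimal sketch below fuses per-frame visual features with textual features via cross-attention and scores frame importance. It is a stand-in written from the abstract alone; the module name, feature dimensions, and attention-based fusion are all assumptions, not the paper's actual Causal Semantics Extractor.

    import torch
    import torch.nn as nn

    class CrossModalExtractorSketch(nn.Module):
        # Hypothetical sketch, NOT the paper's design: project both modalities
        # into a shared space and let video frames attend to text tokens.
        def __init__(self, vis_dim=1024, txt_dim=768, dim=512, heads=8):
            super().__init__()
            self.vis_proj = nn.Linear(vis_dim, dim)
            self.txt_proj = nn.Linear(txt_dim, dim)
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.score = nn.Linear(dim, 1)  # per-frame importance logit

        def forward(self, vis_feats, txt_feats):
            # vis_feats: (B, n_frames, vis_dim); txt_feats: (B, n_tokens, txt_dim)
            q = self.vis_proj(vis_feats)
            kv = self.txt_proj(txt_feats)
            fused, _ = self.attn(q, kv, kv)       # frames attend to text
            return self.score(fused).squeeze(-1)  # (B, n_frames) frame scores

A summarizer could rank or threshold these scores to pick keyshots; the intent here is only to make the visual-textual fusion step concrete.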
Related papers
- Conditional Modeling Based Automatic Video Summarization [70.96973928590958]
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.
Existing video summarization methods rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully capture the content of the video.
A new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries.
arXiv Detail & Related papers (2023-11-20T20:24:45Z)
- Inducing Causal Structure for Abstractive Text Summarization [76.1000380429553]
We introduce a Structural Causal Model (SCM) to induce the underlying causal structure of the summarization data (a toy SCM sketch follows this entry).
We propose a Causality Inspired Sequence-to-Sequence model (CI-Seq2Seq) to learn the causal representations that can mimic the causal factors.
Experimental results on two widely used text summarization datasets demonstrate the advantages of our approach.
arXiv Detail & Related papers (2023-08-24T16:06:36Z)
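For readers unfamiliar with structural causal models, the toy example below shows the general idea behind the entry above: a latent causal factor generates both the observed input and the target, while a spurious factor contaminates only the input. All variable names, mechanisms, and noise scales are illustrative assumptions, not the CI-Seq2Seq model.

    import numpy as np

    def sample_toy_scm(n=1000, seed=0):
        # Toy SCM, for illustration only: latent cause C drives both the
        # observed input X and the target Y; spurious S leaks into X alone.
        # A model that recovers C rather than shortcuts in X generalizes
        # across shifts in the input distribution.
        rng = np.random.default_rng(seed)
        C = rng.normal(size=n)                      # latent causal factor
        S = rng.normal(size=n)                      # spurious factor
        X = 2.0 * C + S + 0.1 * rng.normal(size=n)  # observation mixes both
        Y = 3.0 * C + 0.1 * rng.normal(size=n)      # target depends on C only
        return X, Y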
- Causal Video Summarizer for Video Exploration [74.27487067877047]
Causal Video Summarizer (CVS) is proposed to capture the interactive information between the video and query.
Experimental results on an existing multi-modal video summarization dataset show that the proposed approach is effective.
arXiv Detail & Related papers (2023-07-04T22:52:16Z)
- Program Generation from Diverse Video Demonstrations [49.202289347899836]
Generalising over multiple observations has historically been difficult for machines.
We propose a model that can extract general rules from video demonstrations by simultaneously performing summarisation and translation.
arXiv Detail & Related papers (2023-02-01T01:51:45Z)
- Invariant Grounding for Video Question Answering [72.87173324555846]
Video Question Answering (VideoQA) is the task of answering questions about a video.
In leading VideoQA models, the typical learning objective, empirical risk minimization (ERM), latches onto superficial correlations between video-question pairs and answers.
We propose a new learning framework, Invariant Grounding for VideoQA (IGV), to ground the question-critical scene.
arXiv Detail & Related papers (2022-06-06T04:37:52Z)
- iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability [0.0]
Causality knowledge is vital to building robust AI systems.
We propose iReason, a framework that infers visual-semantic commonsense knowledge using both videos and natural language captions.
arXiv Detail & Related papers (2021-06-25T02:56:34Z)
- How Good is a Video Summary? A New Benchmarking Dataset and Evaluation Framework Towards Realistic Video Summarization [11.320914099324492]
We introduce a new benchmarking video dataset called VISIOCITY, which comprises longer videos across six different categories.
We show strategies to automatically generate multiple reference summaries from indirect ground truth present in VISIOCITY.
We propose an evaluation framework for better quantitative assessment of summary quality that is closer to human judgment (a sketch of one standard scoring protocol follows this entry).
arXiv Detail & Related papers (2021-01-26T01:42:55Z)
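The sketch below shows one standard scoring protocol for video summaries when several human references exist, as in VISIOCITY: per-frame keyshot F1, reported as the maximum over references. The binary frame-selection encoding and the max convention are assumptions borrowed from common practice, not VISIOCITY's own framework.

    import numpy as np

    def keyshot_f1(pred, ref):
        # F1 between two binary frame-selection vectors of equal length.
        overlap = np.logical_and(pred, ref).sum()
        if overlap == 0:
            return 0.0
        precision = overlap / pred.sum()
        recall = overlap / ref.sum()
        return 2 * precision * recall / (precision + recall)

    def score_against_references(pred, refs):
        # Common convention: take the best match over all human references.
        return max(keyshot_f1(pred, r) for r in refs)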
- Dependency Decomposition and a Reject Option for Explainable Models [4.94950858749529]
Recent deep learning models perform extremely well in various inference tasks.
Recent advances offer methods to visualize features and describe the attribution of the input.
We present the first analysis of dependencies regarding the probability distribution over the desired image classification outputs (a minimal reject-option sketch follows this entry).
arXiv Detail & Related papers (2020-12-11T17:39:33Z)
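In its simplest textbook form, a reject option lets a classifier abstain when its confidence is too low instead of forcing a prediction. The sketch below shows only that generic mechanism; the threshold rule is an assumption and is far simpler than the dependency-based analysis the paper itself develops.

    import numpy as np

    def predict_with_reject(probs, threshold=0.9):
        # Generic reject option: return the argmax class only when the top
        # softmax probability clears the threshold; otherwise abstain (None).
        top = int(np.argmax(probs))
        return top if probs[top] >= threshold else None

    # Example: a confident prediction is returned, an uncertain one rejected.
    print(predict_with_reject(np.array([0.95, 0.03, 0.02])))  # -> 0
    print(predict_with_reject(np.array([0.40, 0.35, 0.25])))  # -> None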