Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering
- URL: http://arxiv.org/abs/2401.01510v1
- Date: Wed, 3 Jan 2024 02:29:34 GMT
- Authors: Haopeng Li, Qiuhong Ke, Mingming Gong, and Tom Drummond
- Abstract summary: We introduce the concept of uncertainty-aware curriculum learning (CL) for VideoQA.
Here, uncertainty serves as the guiding principle for dynamically adjusting the difficulty.
In practice, we seamlessly integrate the VideoQA model into our framework and conduct comprehensive experiments.
- Score: 63.12469700986452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While significant advancements have been made in video question answering
(VideoQA), the potential benefits of enhancing model generalization through
tailored difficulty scheduling have been largely overlooked in existing
research. This paper seeks to bridge that gap by incorporating VideoQA into a
curriculum learning (CL) framework that progressively trains models from
simpler to more complex data. Recognizing that conventional self-paced CL
methods rely on training loss for difficulty measurement, which might not
accurately reflect the intricacies of video-question pairs, we introduce the
concept of uncertainty-aware CL. Here, uncertainty serves as the guiding
principle for dynamically adjusting the difficulty. Furthermore, we address the
challenge posed by uncertainty by presenting a probabilistic modeling approach
for VideoQA. Specifically, we conceptualize VideoQA as a stochastic computation
graph, where the hidden representations are treated as stochastic variables.
This yields two distinct types of uncertainty: one related to the inherent
uncertainty in the data and another pertaining to the model's confidence. In
practice, we seamlessly integrate the VideoQA model into our framework and
conduct comprehensive experiments. The findings affirm that our approach not
only achieves enhanced performance but also effectively quantifies uncertainty
in the context of VideoQA.
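The abstract describes the model only at a high level, so the following minimal PyTorch sketch illustrates the two ingredients it names: a stochastic hidden representation that yields separate data and model uncertainties, and a self-paced schedule driven by that uncertainty. All names (StochasticVideoQA, curriculum_weights) and the entropy-based decomposition are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of uncertainty-aware curriculum learning for VideoQA.
# Hypothetical names; not the authors' implementation.
import torch
import torch.nn as nn

class StochasticVideoQA(nn.Module):
    """Treats the hidden representation as a Gaussian stochastic variable."""
    def __init__(self, feat_dim: int, hid_dim: int, n_answers: int):
        super().__init__()
        self.mu_head = nn.Linear(feat_dim, hid_dim)      # mean of the latent
        self.logvar_head = nn.Linear(feat_dim, hid_dim)  # log-variance of the latent
        self.classifier = nn.Linear(hid_dim, n_answers)

    def forward(self, fused_feat: torch.Tensor, n_samples: int = 8):
        mu = self.mu_head(fused_feat)
        std = torch.exp(0.5 * self.logvar_head(fused_feat))
        # Reparameterized samples of the hidden representation: (S, B, H).
        eps = torch.randn(n_samples, *mu.shape, device=mu.device)
        z = mu + eps * std
        probs = self.classifier(z).softmax(-1)           # (S, B, A)
        mean_probs = probs.mean(0)
        # Data (aleatoric) uncertainty: expected entropy of per-sample predictions.
        data_unc = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean(0)
        # Model uncertainty: total entropy minus expected entropy (mutual information).
        total_ent = -(mean_probs * mean_probs.clamp_min(1e-8).log()).sum(-1)
        model_unc = total_ent - data_unc
        return mean_probs, data_unc, model_unc

def curriculum_weights(uncertainty: torch.Tensor, threshold: float) -> torch.Tensor:
    """Self-paced weighting: keep samples whose uncertainty is below the
    current threshold; raising the threshold admits harder examples."""
    return (uncertainty <= threshold).float()
```

A training loop would weight each sample's loss by curriculum_weights(model_unc, threshold) and anneal the threshold upward across epochs, so the model learns from "sure" video-question pairs first and "uncertain" ones later, mirroring the paper's title.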
Related papers
- Admitting Ignorance Helps the Video Question Answering Models to Answer [82.22149677979189] (arXiv, 2025-01-15)
We argue that models often establish shortcuts, resulting in spurious correlations between questions and answers.
We propose a novel training framework in which the model is compelled to acknowledge its ignorance when presented with an intervened question.
In practice, we integrate a state-of-the-art model into our framework to validate its effectiveness.
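As a rough sketch of the "admitting ignorance" idea (the paper's actual intervention and loss design may differ), one could reserve an extra answer class for intervened, unanswerable questions:

```python
import torch
import torch.nn.functional as F

# Illustrative only: one way to compel a VideoQA model to admit ignorance.
def ignorance_loss(logits: torch.Tensor, labels: torch.Tensor,
                   is_intervened: torch.Tensor, unknown_idx: int) -> torch.Tensor:
    """logits: (B, A+1) with an extra 'unknown' class at unknown_idx.
    is_intervened: (B,) bool tensor marking questions whose video-question
    pairing was intervened (made unanswerable); their target is 'unknown'."""
    targets = torch.where(is_intervened,
                          torch.full_like(labels, unknown_idx), labels)
    return F.cross_entropy(logits, targets)
```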
- Are You Doubtful? Oh, It Might Be Difficult Then! Exploring the Use of Model Uncertainty for Question Difficulty Estimation [12.638577140117702] (arXiv, 2024-12-16)
We show that uncertainty features contribute substantially to difficulty prediction, where difficulty is inversely proportional to the number of students who can correctly answer a question.
In addition to showing the value of our approach, we also observe that our model achieves state-of-the-art results on the publicly available BEA dataset.
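A toy illustration of that idea, assuming uncertainty statistics (e.g. predictive entropy) have already been extracted from some QA model; this is not the paper's actual feature set or regressor:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical per-question features: [predictive entropy, max answer probability].
X = np.array([[0.21, 0.93], [1.35, 0.41], [0.88, 0.62]])
# Difficulty target: 1 - (fraction of students answering correctly).
y = np.array([1 - 0.9, 1 - 0.3, 1 - 0.55])

reg = Ridge(alpha=1.0).fit(X, y)
print(reg.predict(np.array([[1.0, 0.5]])))  # estimated difficulty of a new question
```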
- Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting [15.161997580529075] (arXiv, 2024-10-01)
This paper explores the novel challenge of VideoQA within a continual learning framework.
We propose Collaborative Prompting (ColPro), which integrates specific question constraint prompting, knowledge acquisition prompting, and visual temporal awareness prompting.
Experimental results on the NExT-QA and DramaQA datasets show that ColPro achieves superior performance compared to existing approaches.
- Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding [49.973156959947346] (arXiv, 2024-08-29)
Existing Video Temporal Grounding (VTG) models excel in accuracy but often overlook open-world challenges posed by open-vocabulary queries and untrimmed videos.
We introduce a robust network module that benefits from a two-stage cross-modal alignment task.
It integrates Deep Evidential Regression (DER) to explicitly and thoroughly quantify uncertainty during training.
We also develop a simple yet effective Geom-regularizer that enhances the uncertainty learning framework from the ground up.
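Deep Evidential Regression itself is a published technique (Amini et al., 2020), so a minimal sketch of an evidential head and its Normal-Inverse-Gamma (NIG) negative log-likelihood can be given; the Geom-regularizer is specific to this paper and is not reproduced here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvidentialHead(nn.Module):
    """Sketch of a DER head: the network outputs NIG parameters
    (gamma, nu, alpha, beta) instead of a point estimate."""
    def __init__(self, in_dim: int):
        super().__init__()
        self.out = nn.Linear(in_dim, 4)

    def forward(self, h: torch.Tensor):
        gamma, v, alpha, beta = self.out(h).chunk(4, dim=-1)
        v = F.softplus(v)                # nu > 0
        alpha = F.softplus(alpha) + 1.0  # alpha > 1
        beta = F.softplus(beta)          # beta > 0
        return gamma, v, alpha, beta

def nig_nll(y, gamma, v, alpha, beta):
    """Negative log-likelihood of the NIG evidential distribution."""
    omega = 2.0 * beta * (1.0 + v)
    return (0.5 * torch.log(torch.pi / v)
            - alpha * torch.log(omega)
            + (alpha + 0.5) * torch.log(v * (y - gamma) ** 2 + omega)
            + torch.lgamma(alpha) - torch.lgamma(alpha + 0.5)).mean()

# Uncertainties fall out in closed form:
# aleatoric = beta / (alpha - 1);  epistemic = beta / (v * (alpha - 1)).
```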
- Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning [76.98542249776257] (arXiv, 2023-10-07)
Large-scale language models often face the challenge of "hallucination".
We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
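A minimal sketch of uncertainty-gated answering, assuming disagreement across sampled generations as the uncertainty proxy (the paper's in-context learning framework is more elaborate):

```python
from collections import Counter

# Toy sketch: sample several generations from a language model, use answer
# disagreement as an uncertainty proxy, and abstain when agreement is low.
def answer_or_reject(samples: list[str], min_agreement: float = 0.6) -> str:
    top_answer, count = Counter(samples).most_common(1)[0]
    if count / len(samples) < min_agreement:
        return "[REJECTED: model is too uncertain]"
    return top_answer

print(answer_or_reject(["Paris", "Paris", "Paris", "Lyon", "Paris"]))  # -> Paris
print(answer_or_reject(["Paris", "Lyon", "Nice", "Paris", "Rome"]))    # -> rejected
```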
- CONVIQT: Contrastive Video Quality Estimator [63.749184706461826] (arXiv, 2022-06-29)
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms.
Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner.
Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
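A generic InfoNCE-style contrastive loss sketches the self-supervised ingredient; CONVIQT's actual recurrent video encoder and distortion-based augmentations are not reproduced here:

```python
import torch
import torch.nn.functional as F

# Generic contrastive objective, not CONVIQT's exact formulation.
def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """z1, z2: (B, D) embeddings of two augmented views of the same clips."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau               # pairwise cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)  # match each clip to its own view
```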
- HySTER: A Hybrid Spatio-Temporal Event Reasoner [75.41988728376081] (arXiv, 2021-01-17)
We present HySTER, a Hybrid Spatio-Temporal Event Reasoner for reasoning over physical events in videos.
We define a method based on general temporal, causal and physics rules which can be transferred across tasks.
This work sets the foundations for the incorporation of inductive logic programming in the field of VideoQA.