Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning
for Video Question Answering
- URL: http://arxiv.org/abs/2401.01510v1
- Date: Wed, 3 Jan 2024 02:29:34 GMT
- Title: Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning
for Video Question Answering
- Authors: Haopeng Li, Qiuhong Ke, Mingming Gong, and Tom Drummond
- Abstract summary: We introduce the concept of uncertainty-aware curriculum learning (CL).
Here, uncertainty serves as the guiding principle for dynamically adjusting the difficulty.
In practice, we seamlessly integrate the VideoQA model into our framework and conduct comprehensive experiments.
- Score: 63.12469700986452
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While significant advancements have been made in video question answering
(VideoQA), the potential benefits of enhancing model generalization through
tailored difficulty scheduling have been largely overlooked in existing
research. This paper seeks to bridge that gap by incorporating VideoQA into a
curriculum learning (CL) framework that progressively trains models from
simpler to more complex data. Recognizing that conventional self-paced CL
methods rely on training loss for difficulty measurement, which might not
accurately reflect the intricacies of video-question pairs, we introduce the
concept of uncertainty-aware CL. Here, uncertainty serves as the guiding
principle for dynamically adjusting the difficulty. Furthermore, we address the
challenge posed by uncertainty by presenting a probabilistic modeling approach
for VideoQA. Specifically, we conceptualize VideoQA as a stochastic computation
graph, where the hidden representations are treated as stochastic variables.
This yields two distinct types of uncertainty: one related to the inherent
uncertainty in the data and another pertaining to the model's confidence. In
practice, we seamlessly integrate the VideoQA model into our framework and
conduct comprehensive experiments. The findings affirm that our approach not
only achieves enhanced performance but also effectively quantifies uncertainty
in the context of VideoQA.
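The abstract names two concrete mechanisms: hidden representations treated as stochastic variables (yielding data uncertainty and model confidence) and a curriculum that schedules training from sure to uncertain examples. The sketch below illustrates both ideas under stated assumptions; it is not the authors' released code, and all module and variable names are illustrative. Data uncertainty is read off a predicted variance, model uncertainty from disagreement across sampled forward passes, and a quantile threshold that grows over epochs admits increasingly uncertain examples.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticVideoQAHead(nn.Module):
    """Treat the fused video-question representation as a stochastic variable
    (diagonal Gaussian), mirroring the stochastic-computation-graph view."""
    def __init__(self, feat_dim: int, num_answers: int):
        super().__init__()
        self.mu = nn.Linear(feat_dim, feat_dim)      # mean of the hidden state
        self.logvar = nn.Linear(feat_dim, feat_dim)  # log-variance -> data uncertainty
        self.classifier = nn.Linear(feat_dim, num_answers)

    def forward(self, fused_feat, n_samples: int = 5):
        mu, logvar = self.mu(fused_feat), self.logvar(fused_feat)
        probs = []
        for _ in range(n_samples):  # reparameterization trick: z = mu + sigma * eps
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
            probs.append(F.softmax(self.classifier(z), dim=-1))
        probs = torch.stack(probs)                  # (S, B, num_answers)
        data_unc = logvar.exp().mean(dim=-1)        # uncertainty inherent in the data
        model_unc = probs.var(dim=0).sum(dim=-1)    # disagreement across sampled passes
        return probs.mean(dim=0), data_unc, model_unc

def curriculum_mask(uncertainty, epoch: int, total_epochs: int):
    """From sure to uncertain: keep the lowest-uncertainty fraction of the batch
    early on and gradually admit more uncertain examples."""
    keep_frac = min(1.0, 0.5 + 0.5 * epoch / total_epochs)
    threshold = torch.quantile(uncertainty, keep_frac)
    return (uncertainty <= threshold).float()       # 0/1 per-sample weights

# Illustrative training step:
# mean_probs, data_unc, model_unc = head(fused_feat)
# w = curriculum_mask(data_unc + model_unc, epoch, total_epochs)
# loss = (w * F.nll_loss(mean_probs.log(), answers, reduction="none")).mean()
```

The quantile schedule keeps the admitted fraction explicit, analogous to the pace parameter in self-paced CL, while substituting uncertainty for training loss as the difficulty measure.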
Related papers
- LoGU: Long-form Generation with Uncertainty Expressions [49.76417603761989] (arXiv, 2024-10-18)
We introduce the task of Long-form Generation with Uncertainty (LoGU).
We identify two key challenges: Uncertainty Suppression and Uncertainty Misalignment.
Our framework adopts a divide-and-conquer strategy, refining uncertainty based on atomic claims.
Experiments on three long-form instruction following datasets show that our method significantly improves accuracy, reduces hallucinations, and maintains the comprehensiveness of responses.
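A minimal sketch of the divide-and-conquer idea as summarized above: uncertainty is expressed per atomic claim rather than suppressed or attached wholesale to the response. The claim decomposition and the per-claim confidence estimate are assumed inputs here; none of this is the authors' pipeline.

```python
from dataclasses import dataclass

@dataclass
class AtomicClaim:
    text: str
    confidence: float  # assumed, e.g., agreement rate across sampled generations

def hedge_uncertain_claims(claims: list[AtomicClaim], threshold: float = 0.7) -> str:
    """Express uncertainty claim-by-claim: confident claims pass through,
    uncertain ones are prefixed with an explicit hedge."""
    parts = []
    for c in claims:
        if c.confidence >= threshold:
            parts.append(c.text)
        else:
            parts.append(f"I am not certain, but {c.text[0].lower()}{c.text[1:]}")
    return " ".join(parts)
```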
- Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting [15.161997580529075] (arXiv, 2024-10-01)
This paper explores the novel challenge of VideoQA within a continual learning framework.
We propose Collaborative Prompting (ColPro), which integrates specific question constraint prompting, knowledge acquisition prompting, and visual temporal awareness prompting.
Experimental results on the NExT-QA and DramaQA datasets show that ColPro achieves superior performance compared to existing approaches.
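A hedged sketch of what prompt integration of this kind could look like: three learnable prompt sets, one per role named in the summary, prepended to a frozen backbone's input embeddings. Names and dimensions are illustrative; the actual ColPro design may differ.

```python
import torch
import torch.nn as nn

class ColProPrompts(nn.Module):
    """Illustrative collaborative prompting: question-constraint, knowledge-
    acquisition, and visual-temporal prompt sets are concatenated and
    prepended to the frozen model's input embeddings."""
    def __init__(self, n_tokens: int, dim: int):
        super().__init__()
        self.question_prompt = nn.Parameter(torch.randn(n_tokens, dim) * 0.02)
        self.knowledge_prompt = nn.Parameter(torch.randn(n_tokens, dim) * 0.02)
        self.temporal_prompt = nn.Parameter(torch.randn(n_tokens, dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        b = input_embeds.size(0)
        prompts = torch.cat(
            [self.question_prompt, self.knowledge_prompt, self.temporal_prompt], dim=0
        ).unsqueeze(0).expand(b, -1, -1)
        return torch.cat([prompts, input_embeds], dim=1)  # (B, 3*n_tokens + T, D)
```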
- Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding [49.973156959947346] (arXiv, 2024-08-29)
Existing Video Temporal Grounding (VTG) models excel in accuracy but often overlook open-world challenges posed by open-vocabulary queries and untrimmed videos.
We introduce a robust network module that benefits from a two-stage cross-modal alignment task.
It integrates Deep Evidential Regression (DER) to explicitly and thoroughly quantify uncertainty during training.
To address the limitations of DER, we develop a simple yet effective Geom-regularizer that enhances the uncertainty learning framework from the ground up.
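Deep Evidential Regression itself is well documented (Amini et al., 2020): a head predicts Normal-Inverse-Gamma parameters, from which both uncertainty types follow in closed form. The sketch below shows a standard DER head for a scalar target such as a temporal boundary; it is not this paper's architecture and omits the Geom-regularizer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EvidentialRegressionHead(nn.Module):
    """Standard DER head: predicts Normal-Inverse-Gamma parameters
    (gamma, nu, alpha, beta) for a scalar regression target."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.out = nn.Linear(feat_dim, 4)

    def forward(self, feat: torch.Tensor):
        gamma, nu, alpha, beta = self.out(feat).unbind(dim=-1)
        nu = F.softplus(nu)                # nu > 0
        alpha = F.softplus(alpha) + 1.0    # alpha > 1
        beta = F.softplus(beta)            # beta > 0
        aleatoric = beta / (alpha - 1.0)           # E[sigma^2]: data noise
        epistemic = beta / (nu * (alpha - 1.0))    # Var[mu]: model uncertainty
        return gamma, aleatoric, epistemic
```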
- Perception Matters: Enhancing Embodied AI with Uncertainty-Aware Semantic Segmentation [24.32551050538683] (arXiv, 2024-08-05)
Embodied AI has made significant progress acting in unexplored environments.
We find that existing systems rely on dated perception models, neglect temporal aggregation, and transfer directly from ground-truth perception during training to noisy perception at test time.
We address these problems through calibrated perception probabilities and by propagating uncertainty through aggregation and decision-making.
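A minimal sketch of the two named ingredients, calibrated perception probabilities and temporal aggregation, assuming temperature scaling for calibration and independent per-frame observations for fusion; the paper's actual formulation may differ.

```python
import torch

def calibrate(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    """Temperature scaling: a common post-hoc calibration of class probabilities."""
    return torch.softmax(logits / temperature, dim=-1)

def aggregate_over_time(prob_frames: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Fuse per-frame class probabilities (T, C) for the same map cell by
    summing log-probabilities (independent-observation assumption), then
    renormalizing -- i.e., a normalized product with a uniform prior."""
    log_p = torch.log(prob_frames.clamp_min(eps)).sum(dim=0)  # (C,)
    return torch.softmax(log_p, dim=-1)
```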
- Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning [76.98542249776257] (arXiv, 2023-10-07)
Large-scale language models often face the challenge of "hallucination".
We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
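One common way to realize "enhance or reject based on uncertainty" is a self-consistency proxy: sample several answers, take the majority vote's frequency as confidence, and abstain below a threshold. The sketch below assumes that proxy and a user-supplied sample_fn; the paper's exact mechanism may differ.

```python
from collections import Counter

def answer_or_abstain(sample_fn, prompt: str, k: int = 8, min_agreement: float = 0.6):
    """Sample k answers; treat majority-vote frequency as confidence and
    reject the output when agreement is too low."""
    answers = [sample_fn(prompt) for _ in range(k)]
    best, count = Counter(answers).most_common(1)[0]
    confidence = count / k
    if confidence < min_agreement:
        return None, confidence  # abstain instead of risking a hallucination
    return best, confidence
```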
- CONVIQT: Contrastive Video Quality Estimator [63.749184706461826] (arXiv, 2022-06-29)
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms.
Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner.
Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
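A minimal sketch of self-supervised contrastive representation learning as applied to video clips: an NT-Xent loss over two augmented views of each clip in a batch. This is the generic objective, not CONVIQT's specific architecture or augmentation pipeline.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """NT-Xent loss: matching views of the same clip are positives,
    all other clips in the batch serve as negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                         # (B, B) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```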
- HySTER: A Hybrid Spatio-Temporal Event Reasoner [75.41988728376081] (arXiv, 2021-01-17)
We present HySTER, a Hybrid Spatio-Temporal Event Reasoner for reasoning over physical events in videos.
We define a method based on general temporal, causal and physics rules which can be transferred across tasks.
This work sets the foundations for the incorporation of inductive logic programming in the field of VideoQA.
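A toy illustration of transferable temporal and causal rules over symbolic video events, in the spirit of the hybrid reasoning described above; the predicates and thresholds are invented for illustration and are far simpler than an inductive-logic-programming system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str       # e.g., "collision"
    objects: tuple  # object ids involved
    frame: int

def before(e1: Event, e2: Event) -> bool:
    """General temporal rule, reusable across tasks."""
    return e1.frame < e2.frame

def may_cause(e1: Event, e2: Event, window: int = 15) -> bool:
    """Toy causal rule: e1 may cause e2 if they share an object and e1
    happens shortly before e2."""
    return (before(e1, e2) and e2.frame - e1.frame <= window
            and bool(set(e1.objects) & set(e2.objects)))
```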
- Self-supervised pre-training and contrastive representation learning for multiple-choice video QA [39.78914328623504] (arXiv, 2020-09-17)
Video Question Answering (Video QA) requires fine-grained understanding of both video and language modalities to answer the given questions.
We propose novel training schemes for multiple-choice video question answering, with a self-supervised pre-training stage and supervised contrastive learning as an auxiliary objective in the main training stage.
We evaluate our proposed model on highly competitive benchmark datasets related to multiple-choice video QA: TVQA, TVQA+, and DramaQA.
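A sketch of the auxiliary supervised contrastive idea for multiple-choice QA: the fused video-question context embedding is pulled toward the correct candidate answer and pushed away from the others. Shapes and names are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def choice_contrastive_loss(ctx: torch.Tensor, cand: torch.Tensor,
                            correct_idx: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Contrastive auxiliary objective for multiple-choice QA.
    ctx: (B, D) fused video+question embeddings
    cand: (B, K, D) candidate-answer embeddings
    correct_idx: (B,) index of the correct choice"""
    ctx = F.normalize(ctx, dim=-1)
    cand = F.normalize(cand, dim=-1)
    logits = torch.einsum("bd,bkd->bk", ctx, cand) / tau  # similarity to each choice
    return F.cross_entropy(logits, correct_idx)
```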
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated summaries (or any other information) and is not responsible for any consequences of their use.