Related papers: Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering

Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering

URL: http://arxiv.org/abs/2401.01510v1
Date: Wed, 3 Jan 2024 02:29:34 GMT
Title: Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering
Authors: Haopeng Li, Qiuhong Ke, Mingming Gong, and Tom Drummond
Abstract summary: We introduce the concept of uncertainty-aware curriculum learning (CL) Here, uncertainty serves as the guiding principle for dynamically adjusting the difficulty. In practice, we seamlessly integrate the VideoQA model into our framework and conduct comprehensive experiments.
Score: 63.12469700986452
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While significant advancements have been made in video question answering (VideoQA), the potential benefits of enhancing model generalization through tailored difficulty scheduling have been largely overlooked in existing research. This paper seeks to bridge that gap by incorporating VideoQA into a curriculum learning (CL) framework that progressively trains models from simpler to more complex data. Recognizing that conventional self-paced CL methods rely on training loss for difficulty measurement, which might not accurately reflect the intricacies of video-question pairs, we introduce the concept of uncertainty-aware CL. Here, uncertainty serves as the guiding principle for dynamically adjusting the difficulty. Furthermore, we address the challenge posed by uncertainty by presenting a probabilistic modeling approach for VideoQA. Specifically, we conceptualize VideoQA as a stochastic computation graph, where the hidden representations are treated as stochastic variables. This yields two distinct types of uncertainty: one related to the inherent uncertainty in the data and another pertaining to the model's confidence. In practice, we seamlessly integrate the VideoQA model into our framework and conduct comprehensive experiments. The findings affirm that our approach not only achieves enhanced performance but also effectively quantifies uncertainty in the context of VideoQA.

Related papers

Uncertainty Quantification in Retrieval Augmented Question Answering [57.05827081638329]
We propose to quantify the uncertainty of a QA model via estimating the utility of the passages it is provided with. We train a lightweight neural model to predict passage utility for a target QA model and show that while simple information theoretic metrics can predict answer correctness up to a certain extent, our approach efficiently approximates or outperforms more expensive sampling-based methods.
arXiv Detail & Related papers (2025-02-25T11:24:52Z)
Admitting Ignorance Helps the Video Question Answering Models to Answer [82.22149677979189]
We argue that models often establish shortcuts, resulting in spurious correlations between questions and answers. We propose a novel training framework in which the model is compelled to acknowledge its ignorance when presented with an intervened question. In practice, we integrate a state-of-the-art model into our framework to validate its effectiveness.
arXiv Detail & Related papers (2025-01-15T12:44:52Z)
Are You Doubtful? Oh, It Might Be Difficult Then! Exploring the Use of Model Uncertainty for Question Difficulty Estimation [12.638577140117702]
We show that uncertainty features contribute substantially to difficulty prediction, where difficulty is inversely proportional to the number of students who can correctly answer a question. In addition to showing the value of our approach, we also observe that our model achieves state-of-the-art results on the USMLE and CMCQRD publicly available datasets.
arXiv Detail & Related papers (2024-12-16T14:55:09Z)
LoGU: Long-form Generation with Uncertainty Expressions [49.76417603761989]
We introduce the task of Long-form Generation with Uncertainty(LoGU) We identify two key challenges: Uncertainty Suppression and Uncertainty Misalignment. Our framework adopts a divide-and-conquer strategy, refining uncertainty based on atomic claims. Experiments on three long-form instruction following datasets show that our method significantly improves accuracy, reduces hallucinations, and maintains the comprehensiveness of responses.
arXiv Detail & Related papers (2024-10-18T09:15:35Z)
Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting [15.161997580529075]
This paper explores the novel challenge of VideoQA within a continual learning framework. We propose Collaborative Prompting (ColPro), which integrates specific question constraint prompting, knowledge acquisition prompting, and visual temporal awareness prompting. Experimental results on the NExT-QA and DramaQA datasets show that ColPro achieves superior performance compared to existing approaches.
arXiv Detail & Related papers (2024-10-01T15:07:07Z)
Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding [49.973156959947346]
Existing Video Temporal Grounding (VTG) models excel in accuracy but often overlook open-world challenges posed by open-vocabulary queries and untrimmed videos. We introduce a robust network module that benefits from a two-stage cross-modal alignment task. It integrates Deep Evidential Regression (DER) to explicitly and thoroughly quantify uncertainty during training. In response, we develop a simple yet effective Geom-regularizer that enhances the uncertainty learning framework from the ground up.
arXiv Detail & Related papers (2024-08-29T05:32:03Z)
Perception Matters: Enhancing Embodied AI with Uncertainty-Aware Semantic Segmentation [24.32551050538683]
Embodied AI has made significant progress acting in unexplored environments. We focus on dated perception models, neglect temporal aggregation, and transfer from ground truth directly to noisy perception at test time. We address the identified problems through calibrated perception probabilities and uncertainty across aggregation and found decisions.
arXiv Detail & Related papers (2024-08-05T08:14:28Z)
Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination" We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
arXiv Detail & Related papers (2023-10-07T12:06:53Z)
CONVIQT: Contrastive Video Quality Estimator [63.749184706461826]
Perceptual video quality assessment (VQA) is an integral component of many streaming and video sharing platforms. Here we consider the problem of learning perceptually relevant video quality representations in a self-supervised manner. Our results indicate that compelling representations with perceptual bearing can be obtained using self-supervised learning.
arXiv Detail & Related papers (2022-06-29T15:22:01Z)
HySTER: A Hybrid Spatio-Temporal Event Reasoner [75.41988728376081]
We present the HySTER: a Hybrid Spatio-Temporal Event Reasoner to reason over physical events in videos. We define a method based on general temporal, causal and physics rules which can be transferred across tasks. This work sets the foundations for the incorporation of inductive logic programming in the field of VideoQA.
arXiv Detail & Related papers (2021-01-17T11:07:17Z)
Self-supervised pre-training and contrastive representation learning for multiple-choice video QA [39.78914328623504]
Video Question Answering (Video QA) requires fine-grained understanding of both video and language modalities to answer the given questions. We propose novel training schemes for multiple-choice video question answering with a self-supervised pre-training stage and a supervised contrastive learning in the main stage as an auxiliary learning. We evaluate our proposed model on highly competitive benchmark datasets related to multiple-choice video QA: TVQA, TVQA+, and DramaQA.
arXiv Detail & Related papers (2020-09-17T03:37:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.