No Images, No Problem: Retaining Knowledge in Continual VQA with Questions-Only Memory
- URL: http://arxiv.org/abs/2502.04469v1
- Date: Thu, 06 Feb 2025 19:37:43 GMT
- Title: No Images, No Problem: Retaining Knowledge in Continual VQA with Questions-Only Memory
- Authors: Imad Eddine Marouf, Enzo Tartaglione, Stephane Lathuiliere, Joost van de Weijer
- Abstract summary: Continual Learning in Visual Question Answering (VQACL) requires models to learn new visual-linguistic tasks (plasticity) while retaining knowledge from previous tasks (stability).
Existing methods, predominantly designed for unimodal tasks, often struggle to balance these demands effectively.
We introduce QUestion-only replay with Attention Distillation (QUAD), a novel approach for VQACL that leverages only past task questions for regularisation.
- Score: 17.369734751262126
- Abstract: Continual Learning in Visual Question Answering (VQACL) requires models to learn new visual-linguistic tasks (plasticity) while retaining knowledge from previous tasks (stability). The multimodal nature of VQACL presents unique challenges, requiring models to balance stability across visual and textual domains while maintaining plasticity to adapt to novel objects and reasoning tasks. Existing methods, predominantly designed for unimodal tasks, often struggle to balance these demands effectively. In this work, we introduce QUestion-only replay with Attention Distillation (QUAD), a novel approach for VQACL that leverages only past task questions for regularisation, eliminating the need to store visual data and addressing both memory and privacy concerns. QUAD achieves stability by introducing a question-only replay mechanism that selectively uses questions from previous tasks to prevent overfitting to the current task's answer space, thereby mitigating the out-of-answer-set problem. Complementing this, we propose attention consistency distillation, which uniquely enforces both intra-modal and inter-modal attention consistency across tasks, preserving essential visual-linguistic associations. Extensive experiments on VQAv2 and NExT-QA demonstrate that QUAD significantly outperforms state-of-the-art methods, achieving robust performance in continual VQA.
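The abstract names two mechanisms: question-only replay and attention consistency distillation. Below is a minimal PyTorch sketch of how two such losses could be combined; it assumes a transformer-style VQA model that returns `(logits, attention)` over concatenated question and visual tokens, and all names (`quad_losses`, `replay_questions`, `lambda_attn`) are illustrative rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def quad_losses(model, prev_model, images, questions, answers,
                replay_questions, lambda_attn=1.0):
    # Standard supervised VQA loss on the current task.
    logits, attn = model(images, questions)
    loss_task = F.cross_entropy(logits, answers)

    # Question-only replay: no past images are stored, so questions from
    # earlier tasks are paired with current images. Distilling the frozen
    # previous model's answer distribution on them keeps the answer head
    # from collapsing onto the current task's answer space.
    with torch.no_grad():
        prev_logits, prev_attn = prev_model(images, replay_questions)
    replay_logits, replay_attn = model(images, replay_questions)
    loss_replay = F.kl_div(F.log_softmax(replay_logits, dim=-1),
                           F.softmax(prev_logits, dim=-1),
                           reduction="batchmean")

    # Attention consistency distillation: the joint attention map over
    # [question; visual] tokens contains intra-modal (Q-Q, V-V) and
    # inter-modal (Q-V, V-Q) blocks, so matching the whole map against
    # the frozen model constrains both kinds of association.
    loss_attn = F.mse_loss(replay_attn, prev_attn)

    return loss_task + loss_replay + lambda_attn * loss_attn
```

Because only text questions are kept in memory, this style of replay avoids storing user images entirely, which is what the abstract's memory and privacy claim rests on.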
Related papers
- QuIIL at T3 challenge: Towards Automation in Life-Saving Intervention Procedures from First-Person View [2.3982875575861677]
We present our solutions for a spectrum of automation tasks in life-saving intervention procedures within the Trauma THOMPSON (T3) Challenge.
For action recognition and anticipation, we propose a pre-processing strategy that samples and stitches multiple inputs into a single image.
For training, we present an action dictionary-guided design, which consistently yields the most favorable results.
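A small sketch of the sampling-and-stitching pre-processing described above, assuming frames sampled uniformly from a clip and tiled into a 2x2 mosaic; the challenge entry's actual sampling scheme and grid size may differ.

```python
import numpy as np

def stitch_frames(frames, grid=(2, 2)):
    """Uniformly sample frames from a clip and tile them into one image.
    frames: (T, H, W, C) uint8 array."""
    rows, cols = grid
    n = rows * cols
    idx = np.linspace(0, len(frames) - 1, n).astype(int)  # uniform sampling
    picked = frames[idx]
    # Arrange the sampled frames into a rows x cols mosaic.
    row_imgs = [np.concatenate(picked[r * cols:(r + 1) * cols], axis=1)
                for r in range(rows)]
    return np.concatenate(row_imgs, axis=0)

# e.g. a 16-frame clip of 224x224 RGB frames -> one 448x448 image
clip = np.random.randint(0, 255, (16, 224, 224, 3), dtype=np.uint8)
mosaic = stitch_frames(clip)  # shape (448, 448, 3)
```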
arXiv Detail & Related papers (2024-07-18T06:55:26Z)
- Continual Learning for Temporal-Sensitive Question Answering [12.76582814745124]
In real-world applications, it's crucial for models to continually acquire knowledge over time, rather than relying on a static, complete dataset.
Our paper investigates strategies that enable models to adapt to the ever-evolving information landscape.
We propose a training framework for continual learning in temporal-sensitive question answering (CLTSQA) that integrates temporal memory replay and temporal contrastive learning.
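As a rough illustration of the temporal contrastive component, an InfoNCE-style loss over time-indexed embeddings is sketched below; the shapes and the choice of positives and negatives are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def temporal_contrastive_loss(anchor, positive, negatives, tau=0.1):
    """Pull the embedding grounded in the correct time period (positive)
    toward the anchor and push embeddings from other time periods
    (negatives) away.
    anchor, positive: (B, D); negatives: (B, K, D)."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)
    pos = (anchor * positive).sum(-1, keepdim=True) / tau      # (B, 1)
    neg = torch.einsum("bd,bkd->bk", anchor, negatives) / tau  # (B, K)
    logits = torch.cat([pos, neg], dim=1)                      # (B, 1+K)
    labels = torch.zeros(anchor.size(0), dtype=torch.long,
                         device=anchor.device)                 # positive at index 0
    return F.cross_entropy(logits, labels)
```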
arXiv Detail & Related papers (2024-07-17T10:47:43Z)
- Exploring Question Decomposition for Zero-Shot VQA [99.32466439254821]
We investigate a question decomposition strategy for visual question answering.
We show that naive application of model-written decompositions can hurt performance.
We introduce a model-driven selective decomposition approach for second-guessing predictions and correcting errors.
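One way such selective decomposition can look in code is sketched below: answer directly, and fall back to sub-questions only when the model's confidence is low. The confidence threshold, the `decompose` helper, and the prompt format are hypothetical, not the paper's recipe.

```python
import torch

def answer_with_selective_decomposition(vqa_model, image, question,
                                        decompose, conf_threshold=0.5):
    """vqa_model(image, [question]) is assumed to return answer logits
    of shape (1, num_answers); decompose(question) is assumed to return
    a list of model-written sub-questions."""
    logits = vqa_model(image, [question])
    probs = torch.softmax(logits, dim=-1)
    conf, pred = probs.max(dim=-1)
    if conf.item() >= conf_threshold:
        return pred.item()                   # confident: keep the direct answer
    # Second-guess: answer the sub-questions, then re-ask with their
    # answers provided as context.
    context = []
    for sq in decompose(question):
        sub_pred = vqa_model(image, [sq]).argmax(dim=-1)
        context.append(f"{sq} -> {sub_pred.item()}")
    augmented = question + " Context: " + "; ".join(context)
    return vqa_model(image, [augmented]).argmax(dim=-1).item()
```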
arXiv Detail & Related papers (2023-10-25T23:23:57Z)
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
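A toy version of the perturbation idea: replacing either the image or the question with an unrelated one yields a sample whose correct behaviour is abstention. The paper's perturbations are more deliberate than this random swap; the field names here are illustrative.

```python
import random

def make_unanswerable(sample, image_pool, question_pool):
    """Perturb either the image or the question so the original answer
    no longer applies; the target becomes abstention."""
    perturbed = dict(sample)
    if random.random() < 0.5:
        perturbed["image"] = random.choice(image_pool)        # image-side perturbation
    else:
        perturbed["question"] = random.choice(question_pool)  # question-side perturbation
    perturbed["answer"] = "<unanswerable>"                    # abstention target
    return perturbed
```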
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- Prioritized Soft Q-Decomposition for Lexicographic Reinforcement Learning [1.8399318639816038]
We propose prioritized soft Q-decomposition (PSQD) for learning and adapting subtask solutions under lexicographic priorities.
PSQD offers the ability to reuse previously learned subtask solutions in a zero-shot composition, followed by an adaptation step.
We demonstrate the efficacy of our approach by presenting successful learning, reuse, and adaptation results for both low- and high-dimensional simulated robot control tasks.
arXiv Detail & Related papers (2023-10-03T18:36:21Z)
- Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce BAdam, a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
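The common core of such prior-based methods is a quadratic pull toward the previous task's parameters, weighted by an estimated per-parameter importance; a generic sketch follows. BAdam itself folds this idea into adaptive-moment updates, which this sketch does not reproduce.

```python
import torch

def prior_penalty(model, prior_mean, prior_precision, lam=1.0):
    """Quadratic prior penalty shared by this family of methods.
    prior_mean / prior_precision: dicts of tensors keyed by parameter
    name, recorded after the previous task."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (prior_precision[name]
                             * (p - prior_mean[name]) ** 2).sum()
    return lam * penalty  # add to the task loss before backprop
```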
arXiv Detail & Related papers (2023-09-15T17:10:51Z)
- Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions [70.70725223310401]
This work proposes a new method that utilizes semantically related questions, referred to as basic questions, acting as noise to evaluate the robustness of VQA models.
The experimental results demonstrate that the proposed evaluation method effectively analyzes the robustness of VQA models.
arXiv Detail & Related papers (2023-04-06T15:32:35Z)
- Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules [85.98177341704675]
The problem of grounding VQA tasks has recently seen increased attention in the research community.
We propose a visual capsule module with a query-based selection mechanism of capsule features.
We show that integrating the proposed capsule module in existing VQA systems significantly improves their performance on the weakly supervised grounding task.
arXiv Detail & Related papers (2021-05-11T07:45:32Z)
- Regularizing Attention Networks for Anomaly Detection in Visual Question Answering [10.971443035470488]
We evaluate the robustness of state-of-the-art VQA models to five different anomalies.
We propose an attention-based method that uses the confidence of reasoning between input images and questions.
We show that a maximum entropy regularization of attention networks can significantly improve attention-based anomaly detection.
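A minimal sketch of maximum-entropy attention regularization: penalize low-entropy (over-confident) attention during training, and use attention entropy as one possible anomaly signal at test time. The scoring rule shown is an illustrative choice, not necessarily the paper's.

```python
import torch

def attention_entropy(attn, eps=1e-8):
    """Shannon entropy of attention weights, per example.
    attn: (B, N) with rows summing to 1."""
    return -(attn * (attn + eps).log()).sum(dim=-1)

def regularized_loss(task_loss, attn, lam=0.1):
    # Subtracting entropy pushes attention toward higher entropy,
    # discouraging spuriously confident attention maps.
    return task_loss - lam * attention_entropy(attn).mean()

def anomaly_score(attn):
    # One simple choice: flag inputs on which the model attends very
    # confidently (low entropy) despite the regularization.
    return -attention_entropy(attn)
```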
arXiv Detail & Related papers (2020-09-21T17:47:49Z)
- SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions [66.86887670416193]
We show that state-of-the-art VQA models have comparable performance in answering perception and reasoning questions, but suffer from consistency problems.
To address this shortcoming, we propose an approach called Sub-Question-aware Network Tuning (SQuINT).
We show that SQuINT improves model consistency by 5%, marginally improves performance on the Reasoning questions in VQA, and produces better attention maps.
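A hedged sketch of sub-question-aware tuning: train on the reasoning question and its perception sub-question jointly, while encouraging the two attention maps to agree. The loss weighting and the MSE consistency term are assumptions, not the published recipe; the model is assumed to return `(logits, attention)`.

```python
import torch
import torch.nn.functional as F

def squint_style_loss(model, image, main_q, sub_q,
                      main_ans, sub_ans, lam=1.0):
    main_logits, main_attn = model(image, main_q)
    sub_logits, sub_attn = model(image, sub_q)
    # Supervise both the reasoning question and its sub-question.
    loss_ans = (F.cross_entropy(main_logits, main_ans)
                + F.cross_entropy(sub_logits, sub_ans))
    # Consistency: attend to the same image regions for both questions.
    loss_consistency = F.mse_loss(main_attn, sub_attn)
    return loss_ans + lam * loss_consistency
```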
arXiv Detail & Related papers (2020-01-20T01:02:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.