CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual
Question Answering
- URL: http://arxiv.org/abs/2211.10567v1
- Date: Sat, 19 Nov 2022 02:43:30 GMT
- Title: CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual
Question Answering
- Authors: Yao Zhang, Haokun Chen, Ahmed Frikha, Yezi Yang, Denis Krompass,
Gengyuan Zhang, Jindong Gu, Volker Tresp
- Abstract summary: We introduce CL-CrossVQA, a rigorous Continual Learning benchmark for Cross-domain Visual Question Answering.
We conduct extensive experiments on 4 VLPMs, 4 CL approaches, and 5 VQA datasets from different domains.
- Score: 31.983067109848342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Question Answering (VQA) is a multidisciplinary research task.
Producing the right answer requires an understanding of the visual content of
images and the natural language questions, as well as commonsense reasoning over
the information contained in the image and world knowledge. Recently,
large-scale Vision-and-Language Pre-trained Models (VLPMs) have been the
mainstream approach to VQA tasks due to their superior performance. The
standard practice is to fine-tune large-scale VLPMs pre-trained on huge
general-domain datasets using the domain-specific VQA datasets. However, in
reality, the application domain can change over time, necessitating VLPMs to
continually learn and adapt to new domains without forgetting previously
acquired knowledge. Most existing continual learning (CL) research concentrates
on unimodal tasks, whereas a more practical application scenario, i.e., CL on
cross-domain VQA, has not been studied. Motivated by this, we introduce
CL-CrossVQA, a rigorous Continual Learning benchmark for Cross-domain Visual
Question Answering, through which we conduct extensive experiments on 4 VLPMs,
4 CL approaches, and 5 VQA datasets from different domains. In addition, by
probing the forgetting phenomenon of the intermediate layers, we provide
insights into how model architecture affects CL performance, why CL approaches
can help mitigate forgetting in VLPMs to some extent, and how to design CL
approaches suitable for VLPMs in this challenging continual learning
environment. To facilitate future work on CL for cross-domain VQA, we will
release our datasets and code.
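To make the benchmark protocol concrete, below is a minimal sketch of a cross-domain continual-learning evaluation loop with the standard average-accuracy and forgetting metrics. The function names (`finetune`, `evaluate`) and the loop structure are illustrative assumptions, not the paper's released code; the actual CL-CrossVQA protocol, models, and metrics may differ.

```python
# Minimal sketch of a cross-domain continual-learning (CL) evaluation loop.
# `finetune` and `evaluate` are hypothetical placeholders (assumptions), not
# the CL-CrossVQA release; the bookkeeping follows common CL metrics.
from typing import Callable, List


def continual_eval(model,
                   domains: List[str],
                   finetune: Callable,   # finetune(model, domain) -> None
                   evaluate: Callable):  # evaluate(model, domain) -> accuracy
    n = len(domains)
    # acc[t][d]: accuracy on domain d after fine-tuning on domains[0..t]
    acc = [[0.0] * n for _ in range(n)]

    for t, domain in enumerate(domains):
        finetune(model, domain)            # adapt to the newest domain
        for d in range(t + 1):             # re-test every domain seen so far
            acc[t][d] = evaluate(model, domains[d])

    # Average accuracy over all domains after the final training step.
    avg_acc = sum(acc[n - 1]) / n
    # Forgetting: best past accuracy on a domain minus its final accuracy,
    # averaged over all domains except the last one trained.
    forgetting = sum(
        max(acc[t][d] for t in range(d, n - 1)) - acc[n - 1][d]
        for d in range(n - 1)
    ) / max(n - 1, 1)
    return avg_acc, forgetting
```

The same accuracy matrix also supports the layer-wise probing mentioned in the abstract: a common (assumed) recipe is to freeze the backbone after each training step and fit a linear probe on each intermediate layer's features, tracking how probe accuracy on earlier domains degrades over the sequence.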
Related papers
- Task Progressive Curriculum Learning for Robust Visual Question Answering [6.2175732887853545]
We show for the first time that robust Visual Question Answering is attainable by simply enhancing the training strategy.
Our proposed approach, Task Progressive Curriculum Learning (TPCL), breaks the main VQA problem into smaller, easier tasks.
We demonstrate TPCL effectiveness through a comprehensive evaluation on standard datasets.
arXiv Detail & Related papers (2024-11-26T10:29:47Z)
- Exploring Language Model Generalization in Low-Resource Extractive QA [57.14068405860034]
We investigate Extractive Question Answering (EQA) with Large Language Models (LLMs) under domain drift.
We devise a series of experiments to empirically explain the performance gap.
arXiv Detail & Related papers (2024-09-27T05:06:43Z)
- Advancing Cross-domain Discriminability in Continual Learning of Vision-Language Models [24.22859657019636]
RAIL is a regression-based adapter that learns from a sequence of domains in a non-forgetting manner.
It preserves the VLM's zero-shot ability on unseen domains without any reference data.
Experiment results affirm RAIL's state-of-the-art performance in both X-TAIL and existing Multi-domain Task-Incremental Learning settings.
arXiv Detail & Related papers (2024-06-27T03:48:57Z)
- VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning [12.450293825734313]
Large language models (LLMs) famously exhibit emergent in-context learning (ICL).
This study introduces VL-ICL Bench, a benchmark for multimodal in-context learning.
We evaluate the abilities of state-of-the-art VLLMs against this benchmark suite.
arXiv Detail & Related papers (2024-03-19T21:31:56Z)
- CoLeCLIP: Open-Domain Continual Learning via Joint Task Prompt and Vocabulary Learning [38.063942750061585]
We introduce a novel approach, CoLeCLIP, that learns an open-domain CL model based on CLIP.
CoLeCLIP outperforms state-of-the-art methods for open-domain CL under both task- and class-incremental learning settings.
arXiv Detail & Related papers (2024-03-15T12:28:21Z)
- POP: Prompt Of Prompts for Continual Learning [59.15888651733645]
Continual learning (CL) aims to mimic the human ability to learn new concepts without catastrophic forgetting.
We show that a foundation model equipped with POP learning is able to outperform classic CL methods by a significant margin.
arXiv Detail & Related papers (2023-06-14T02:09:26Z)
- Generalized Few-Shot Continual Learning with Contrastive Mixture of Adapters [59.82088750033897]
We set up a Generalized FSCL (GFSCL) protocol involving both class- and domain-incremental situations.
We find that common continual learning methods have poor generalization ability on unseen domains.
To this end, we propose a rehearsal-free framework based on the Vision Transformer (ViT), named Contrastive Mixture of Adapters (CMoA).
arXiv Detail & Related papers (2023-02-12T15:18:14Z)
- ConStruct-VL: Data-Free Continual Structured VL Concepts Learning [57.86651057895222]
We introduce the first Continual Data-Free Structured VL Concepts Learning (ConStruct-VL) benchmark.
We propose a data-free method built on a new approach, Adversarial Pseudo-Replay (APR), which generates adversarial reminders of past tasks from past task models.
We show this approach outperforms all data-free methods by as much as 7% while even matching some levels of experience-replay.
arXiv Detail & Related papers (2022-11-17T18:57:03Z)
- Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task [12.74065821307626]
VQA is an ambitious task aiming to answer any image-related question.
It is hard to build such a system once and for all, since the needs of users are continuously updated.
We propose a real-data-free replay-based method tailored for CL on VQA, named Scene Graph as Prompt for Replay.
arXiv Detail & Related papers (2022-08-24T12:00:02Z)
- The CLEAR Benchmark: Continual LEArning on Real-World Imagery [77.98377088698984]
Continual learning (CL) is widely regarded as a crucial challenge for lifelong AI.
We introduce CLEAR, the first continual image classification benchmark dataset with a natural temporal evolution of visual concepts.
We find that a simple unsupervised pre-training step can already boost state-of-the-art CL algorithms.
arXiv Detail & Related papers (2022-01-17T09:09:09Z)
- Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules [85.98177341704675]
The problem of grounding VQA tasks has recently seen increased attention in the research community.
We propose a visual capsule module with a query-based selection mechanism of capsule features.
We show that integrating the proposed capsule module in existing VQA systems significantly improves their performance on the weakly supervised grounding task.
arXiv Detail & Related papers (2021-05-11T07:45:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.