Progressive Homeostatic and Plastic Prompt Tuning for Audio-Visual Multi-Task Incremental Learning
- URL: http://arxiv.org/abs/2507.21588v1
- Date: Tue, 29 Jul 2025 08:42:36 GMT
- Title: Progressive Homeostatic and Plastic Prompt Tuning for Audio-Visual Multi-Task Incremental Learning
- Authors: Jiong Yin, Liang Li, Jiehua Zhang, Yuhan Gao, Chenggang Yan, Xichun Sheng,
- Abstract summary: We introduce a three-stage Progressive Homeostatic and Plastic audio-visual prompt (PHP) method. In the shallow phase, we design the task-shared modality aggregating adapter to foster cross-task and cross-modal audio-visual representation learning. In the middle phase, we propose the task-specific modality-shared dynamic generating adapter, which constructs prompts that are tailored to individual tasks. In the deep phase, we introduce the task-specific modality-independent prompts to further refine the understanding ability.
- Score: 23.22385310060951
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Audio-visual multi-task incremental learning aims to continuously learn from multiple audio-visual tasks without the need for joint training on all tasks. The core challenge is preserving old task knowledge while using previous experience to facilitate the learning of new tasks. To address this challenge, we introduce a three-stage Progressive Homeostatic and Plastic audio-visual prompt (PHP) method. In the shallow phase, we design the task-shared modality aggregating adapter to foster cross-task and cross-modal audio-visual representation learning and enhance shared understanding between tasks. In the middle phase, we propose the task-specific modality-shared dynamic generating adapter, which constructs prompts that are tailored to individual tasks yet general across modalities, balancing the model's ability to retain knowledge against forgetting with its capacity for versatile multi-task transfer. In the deep phase, we introduce the task-specific modality-independent prompts to further refine understanding by targeting individual information for each task and modality. By combining these three phases, PHP retains task-specific prompts while adapting shared parameters for new tasks, effectively balancing knowledge sharing and specificity. Our method achieves SOTA performance across different orderings of four tasks (AVE, AVVP, AVS and AVQA). Our code is available at https://github.com/ENJOY-Yin-jiong/PHP.
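The three phases map naturally onto increasing depth in a frozen audio-visual transformer. As a rough illustration only (the class, module names, shapes, and wiring below are assumptions for exposition, not the authors' released implementation; see the linked repository for the real code), a minimal PyTorch sketch of the three prompt phases might look like:

```python
# Hypothetical sketch of PHP's three prompt phases (illustrative only;
# names, shapes, and wiring are assumptions, not the released code).
import torch
import torch.nn as nn

class PHPPrompts(nn.Module):
    def __init__(self, dim=768, n_prompts=8,
                 tasks=("AVE", "AVVP", "AVS", "AVQA")):
        super().__init__()
        # Shallow phase: one task-shared adapter aggregates both modalities,
        # encouraging cross-task and cross-modal representation learning.
        self.shared_adapter = nn.Sequential(
            nn.Linear(2 * dim, dim // 4), nn.GELU(), nn.Linear(dim // 4, dim))
        # Middle phase: per-task generators that emit one dynamic prompt set
        # shared by the audio and visual streams.
        self.generators = nn.ModuleDict(
            {t: nn.Linear(dim, n_prompts * dim) for t in tasks})
        # Deep phase: per-task, per-modality learnable prompts.
        self.deep_prompts = nn.ParameterDict({
            f"{t}_{m}": nn.Parameter(torch.zeros(n_prompts, dim))
            for t in tasks for m in ("audio", "visual")})
        self.n_prompts, self.dim = n_prompts, dim

    def forward(self, audio_tokens, visual_tokens, task):
        # Token inputs: (batch, seq_len, dim) from a frozen backbone.
        a, v = audio_tokens.mean(dim=1), visual_tokens.mean(dim=1)
        shared = self.shared_adapter(torch.cat([a, v], dim=-1))  # shallow
        dyn = self.generators[task](shared)                      # middle
        dyn = dyn.view(-1, self.n_prompts, self.dim)
        b = a.size(0)
        deep_a = self.deep_prompts[f"{task}_audio"].unsqueeze(0).expand(b, -1, -1)
        deep_v = self.deep_prompts[f"{task}_visual"].unsqueeze(0).expand(b, -1, -1)
        # In the full model these would be prepended to shallow, middle, and
        # deep transformer layers, respectively.
        return dyn, deep_a, deep_v
```

Under this reading, adding a new task instantiates only a new generator and new deep prompts while the shallow adapter continues adapting, which is one plausible way to realize the homeostasis/plasticity split the abstract describes.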
Related papers
- Is Visual in-Context Learning for Compositional Medical Tasks within Reach? [68.56630652862293]
In this paper, we explore the potential of visual in-context learning to enable a single model to handle multiple tasks. We introduce a novel method for training in-context learners using a synthetic compositional task generation engine.
arXiv Detail & Related papers (2025-07-01T15:32:23Z) - TransPrompt v2: A Transferable Prompting Framework for Cross-task Text Classification [37.824031151922604]
We propose TransPrompt v2, a novel transferable prompting framework for few-shot learning across similar or distant text classification tasks.
For learning across similar tasks, we employ a multi-task meta-knowledge acquisition (MMA) procedure to train a meta-learner.
For learning across distant tasks, we inject the task type descriptions into the prompt, and capture the intra-type and inter-type prompt embeddings.
arXiv Detail & Related papers (2023-08-29T04:16:57Z) - Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation [18.345183818638475]
Continual learning (CL) can serve as a remedy by enabling knowledge transfer across sequentially arriving tasks.
We develop a transformer-based CL architecture for learning bimodal vision-and-language tasks.
Our approach scales to a large number of tasks because it requires little memory and time overhead.
arXiv Detail & Related papers (2023-03-25T10:16:53Z) - Multitask Vision-Language Prompt Tuning [103.5967011236282]
We propose multitask vision-language prompt tuning (MVLPT).
MVLPT incorporates cross-task knowledge into prompt tuning for vision-language models.
Results on 20 vision tasks demonstrate that the proposed approach outperforms all single-task baseline prompt tuning methods.
arXiv Detail & Related papers (2022-11-21T18:41:44Z) - Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering [43.07139534653485]
We present Answer-Me, a task-aware multi-task framework.
We pre-train a vision-language joint model, which is itself multi-task.
Results show state-of-the-art performance, zero-shot generalization, robustness to forgetting, and competitive single-task results.
arXiv Detail & Related papers (2022-05-02T14:53:13Z) - Modular Adaptive Policy Selection for Multi-Task Imitation Learning through Task Division [60.232542918414985]
Multi-task learning often suffers from negative transfer, sharing information that should be task-specific.
The proposed method mitigates this by using proto-policies as modules to divide the tasks into simple sub-behaviours that can be shared.
We also demonstrate its ability to autonomously divide the tasks into both shared and task-specific sub-behaviours.
arXiv Detail & Related papers (2022-03-28T15:53:17Z) - Variational Multi-Task Learning with Gumbel-Softmax Priors [105.22406384964144]
Multi-task learning aims to explore task relatedness to improve individual tasks.
We propose variational multi-task learning (VMTL), a general probabilistic inference framework for learning multiple related tasks.
arXiv Detail & Related papers (2021-11-09T18:49:45Z) - Towards More Generalizable One-shot Visual Imitation Learning [81.09074706236858]
A general-purpose robot should be able to master a wide range of tasks and quickly learn a novel one by leveraging past experiences.
One-shot imitation learning (OSIL) approaches this goal by training an agent with (pairs of) expert demonstrations.
We push for a higher level of generalization ability by investigating a more ambitious multi-task setup.
arXiv Detail & Related papers (2021-10-26T05:49:46Z) - Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
First, we propose MultiRavens, a new benchmark suite aimed at compositional tasks, which allows defining custom task combinations.
Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.