Decoupled Prompt-Adapter Tuning for Continual Activity Recognition
- URL: http://arxiv.org/abs/2407.14811v1
- Date: Sat, 20 Jul 2024 08:56:04 GMT
- Title: Decoupled Prompt-Adapter Tuning for Continual Activity Recognition
- Authors: Di Fu, Thanh Vinh Vo, Haozhe Ma, Tze-Yun Leong,
- Abstract summary: Action recognition technology plays a vital role in enhancing security through surveillance systems, enabling better patient monitoring in healthcare, and facilitating seamless human-AI collaboration in domains such as manufacturing and assistive technologies.
We propose Decoupled Prompt-Adapter Tuning (DPAT), a novel framework that integrates adapters for capturing spatial-temporal information and learnable prompts for mitigating catastrophic forgetting through a decoupled training strategy.
DPAT consistently achieves state-of-the-art performance across several challenging action recognition benchmarks, thus demonstrating the effectiveness of our model in the domain of continual action recognition.
- Score: 6.224769485481242
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Action recognition technology plays a vital role in enhancing security through surveillance systems, enabling better patient monitoring in healthcare, providing in-depth performance analysis in sports, and facilitating seamless human-AI collaboration in domains such as manufacturing and assistive technologies. The dynamic nature of data in these areas underscores the need for models that can continuously adapt to new video data without losing previously acquired knowledge, highlighting the critical role of advanced continual action recognition. To address these challenges, we propose Decoupled Prompt-Adapter Tuning (DPAT), a novel framework that integrates adapters for capturing spatial-temporal information and learnable prompts for mitigating catastrophic forgetting through a decoupled training strategy. DPAT uniquely balances the generalization benefits of prompt tuning with the plasticity provided by adapters in pretrained vision models, effectively addressing the challenge of maintaining model performance amidst continuous data evolution without necessitating extensive finetuning. DPAT consistently achieves state-of-the-art performance across several challenging action recognition benchmarks, thus demonstrating the effectiveness of our model in the domain of continual action recognition.
Related papers
- Evaluating the Effectiveness of Video Anomaly Detection in the Wild: Online Learning and Inference for Real-world Deployment [2.1374208474242815]
Video Anomaly Detection (VAD) identifies unusual activities in video streams, a key technology with broad applications ranging from surveillance to healthcare.
Tackling VAD in real-life settings poses significant challenges due to the dynamic nature of human actions, environmental variations, and domain shifts.
Online learning is a potential strategy to mitigate this issue by allowing models to adapt to new information continuously.
arXiv Detail & Related papers (2024-04-29T14:47:32Z) - MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications.
Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders.
We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z) - Enhancing Network Intrusion Detection Performance using Generative Adversarial Networks [0.25163931116642785]
We propose a novel approach for enhancing the performance of an NIDS through the integration of Generative Adversarial Networks (GANs)
GANs generate synthetic network traffic data that closely mimics real-world network behavior.
Our findings show that the integration of GANs into NIDS can lead to enhancements in intrusion detection performance for attacks with limited training data.
arXiv Detail & Related papers (2024-04-11T04:01:15Z) - Adaptive Affinity-Based Generalization For MRI Imaging Segmentation Across Resource-Limited Settings [1.5703963908242198]
This paper introduces a novel relation-based knowledge framework by seamlessly combining adaptive affinity-based and kernel-based distillation.
To validate our innovative approach, we conducted experiments on publicly available multi-source prostate MRI data.
arXiv Detail & Related papers (2024-04-03T13:35:51Z) - DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z) - Data Quality Aware Approaches for Addressing Model Drift of Semantic
Segmentation Models [1.6385815610837167]
This study investigates two prominent quality aware strategies to combat model drift.
The former leverages image quality assessment metrics to meticulously select high-quality training data, improving the model robustness.
The latter makes use of learned vectors feature from existing models to guide the selection of future data, aligning it with the model's prior knowledge.
arXiv Detail & Related papers (2024-02-11T18:01:52Z) - ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP)
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z) - WiFi-TCN: Temporal Convolution for Human Interaction Recognition based
on WiFi signal [4.0773490083614075]
Wi-Fi based human activity recognition has gained considerable interest in recent times.
A challenge associated with Wi-Fi-based HAR is the significant decline in performance when the scene or subject changes.
We propose a novel approach that leverages a temporal convolution network with augmentations and attention, referred to as TCN-AA.
arXiv Detail & Related papers (2023-05-21T08:37:32Z) - Confidence Attention and Generalization Enhanced Distillation for
Continuous Video Domain Adaptation [62.458968086881555]
Continuous Video Domain Adaptation (CVDA) is a scenario where a source model is required to adapt to a series of individually available changing target domains.
We propose a Confidence-Attentive network with geneRalization enhanced self-knowledge disTillation (CART) to address the challenge in CVDA.
arXiv Detail & Related papers (2023-03-18T16:40:10Z) - Improved Speech Emotion Recognition using Transfer Learning and
Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z) - Modality Compensation Network: Cross-Modal Adaptation for Action
Recognition [77.24983234113957]
We propose a Modality Compensation Network (MCN) to explore the relationships of different modalities.
Our model bridges data from source and auxiliary modalities by a modality adaptation block to achieve adaptive representation learning.
Experimental results reveal that MCN outperforms state-of-the-art approaches on four widely-used action recognition benchmarks.
arXiv Detail & Related papers (2020-01-31T04:51:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.