Related papers: Decoupled Prompt-Adapter Tuning for Continual Activity Recognition

URL: http://arxiv.org/abs/2407.14811v1
Date: Sat, 20 Jul 2024 08:56:04 GMT
Title: Decoupled Prompt-Adapter Tuning for Continual Activity Recognition
Authors: Di Fu, Thanh Vinh Vo, Haozhe Ma, Tze-Yun Leong
Abstract summary: Action recognition technology plays a vital role in enhancing security through surveillance systems, enabling better patient monitoring in healthcare, and facilitating seamless human-AI collaboration in domains such as manufacturing and assistive technologies. We propose Decoupled Prompt-Adapter Tuning (DPAT), a novel framework that integrates adapters for capturing spatial-temporal information and learnable prompts for mitigating catastrophic forgetting through a decoupled training strategy. DPAT consistently achieves state-of-the-art performance across several challenging action recognition benchmarks, thus demonstrating the effectiveness of our model in the domain of continual action recognition.
Score: 6.224769485481242
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Action recognition technology plays a vital role in enhancing security through surveillance systems, enabling better patient monitoring in healthcare, providing in-depth performance analysis in sports, and facilitating seamless human-AI collaboration in domains such as manufacturing and assistive technologies. The dynamic nature of data in these areas underscores the need for models that can continuously adapt to new video data without losing previously acquired knowledge, highlighting the critical role of advanced continual action recognition. To address these challenges, we propose Decoupled Prompt-Adapter Tuning (DPAT), a novel framework that integrates adapters for capturing spatial-temporal information and learnable prompts for mitigating catastrophic forgetting through a decoupled training strategy. DPAT uniquely balances the generalization benefits of prompt tuning with the plasticity provided by adapters in pretrained vision models, effectively addressing the challenge of maintaining model performance amidst continuous data evolution without necessitating extensive finetuning. DPAT consistently achieves state-of-the-art performance across several challenging action recognition benchmarks, thus demonstrating the effectiveness of our model in the domain of continual action recognition.

Related papers

SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations [68.9300049150948]
We address a fundamental challenge in Reinforcement Learning from Interaction Demonstration (RLID). Existing data collection approaches yield sparse, disconnected, and noisy trajectories that fail to capture the full spectrum of possible skill variations and transitions. We present two data augmentation techniques: a Stitched Trajectory Graph (STG) that discovers potential transitions between demonstration skills, and a State Transition Field (STF) that establishes unique connections for arbitrary states within the demonstration neighborhood.
arXiv Detail & Related papers (2025-05-04T13:00:29Z)
Ancestral Mamba: Enhancing Selective Discriminant Space Model with Online Visual Prototype Learning for Efficient and Robust Discriminant Approach [5.755715236558973]
Ancestral Mamba is a novel approach that integrates online prototype learning into a selective discriminant space model. APA enables the model to continuously adapt its prototypes, building upon ancestral knowledge to tackle new challenges. MF acts as a targeted feedback mechanism, focusing on challenging classes and refining their representations.
arXiv Detail & Related papers (2025-03-26T08:36:05Z)
Emotion Recognition with CLIP and Sequential Learning [5.66758879852618]
We present our innovative methodology for tackling the Valence-Arousal (VA) Estimation Challenge, the Expression Recognition Challenge, and the Action Unit (AU) Detection Challenge. Our approach introduces a novel framework aimed at enhancing continuous emotion recognition.
arXiv Detail & Related papers (2025-03-13T01:02:06Z)
A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance. We propose a simple yet effective data augmentation approach by leveraging advancements in generative models. Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z)
Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence [60.37934652213881]
Domain Adaptation (DA) facilitates knowledge transfer from a source domain to a related target domain. This paper investigates a practical DA paradigm, namely Source data-Free Active Domain Adaptation (SFADA), where source data becomes inaccessible during adaptation. We present learn from the learnt (LFTL), a novel paradigm for SFADA to leverage the learnt knowledge from the source pretrained model and actively iterated models without extra overhead.
arXiv Detail & Related papers (2024-07-26T17:51:58Z)
Evaluating the Effectiveness of Video Anomaly Detection in the Wild: Online Learning and Inference for Real-world Deployment [2.1374208474242815]
Video Anomaly Detection (VAD) identifies unusual activities in video streams, a key technology with broad applications ranging from surveillance to healthcare. Tackling VAD in real-life settings poses significant challenges due to the dynamic nature of human actions, environmental variations, and domain shifts. Online learning is a potential strategy to mitigate this issue by allowing models to adapt to new information continuously.
arXiv Detail & Related papers (2024-04-29T14:47:32Z)
MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild [81.32127423981426]
Multimodal emotion recognition based on audio and video data is important for real-world applications. Recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders. We propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders.
arXiv Detail & Related papers (2024-04-13T13:39:26Z)
Enhancing Network Intrusion Detection Performance using Generative Adversarial Networks [0.25163931116642785]
We propose a novel approach for enhancing the performance of an NIDS through the integration of Generative Adversarial Networks (GANs). GANs generate synthetic network traffic data that closely mimics real-world network behavior. Our findings show that the integration of GANs into NIDS can lead to enhancements in intrusion detection performance for attacks with limited training data.
arXiv Detail & Related papers (2024-04-11T04:01:15Z)
Adaptive Affinity-Based Generalization For MRI Imaging Segmentation Across Resource-Limited Settings [1.5703963908242198]
This paper introduces a novel relation-based knowledge framework by seamlessly combining adaptive affinity-based and kernel-based distillation. To validate our innovative approach, we conducted experiments on publicly available multi-source prostate MRI data.
arXiv Detail & Related papers (2024-04-03T13:35:51Z)
DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets. We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability. Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
Data Quality Aware Approaches for Addressing Model Drift of Semantic Segmentation Models [1.6385815610837167]
This study investigates two prominent quality-aware strategies to combat model drift. The former leverages image quality assessment metrics to meticulously select high-quality training data, improving the model's robustness. The latter makes use of learned feature vectors from existing models to guide the selection of future data, aligning it with the model's prior knowledge.
arXiv Detail & Related papers (2024-02-11T18:01:52Z)
WiFi-TCN: Temporal Convolution for Human Interaction Recognition based on WiFi signal [4.0773490083614075]
Wi-Fi based human activity recognition has gained considerable interest in recent times. A challenge associated with Wi-Fi-based HAR is the significant decline in performance when the scene or subject changes. We propose a novel approach that leverages a temporal convolution network with augmentations and attention, referred to as TCN-AA.
arXiv Detail & Related papers (2023-05-21T08:37:32Z)
Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction. One of the main challenges in SER is data scarcity. We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
Modality Compensation Network: Cross-Modal Adaptation for Action Recognition [77.24983234113957]
We propose a Modality Compensation Network (MCN) to explore the relationships of different modalities. Our model bridges data from source and auxiliary modalities by a modality adaptation block to achieve adaptive representation learning. Experimental results reveal that MCN outperforms state-of-the-art approaches on four widely-used action recognition benchmarks.
arXiv Detail & Related papers (2020-01-31T04:51:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.