POET: Prompt Offset Tuning for Continual Human Action Adaptation
- URL: http://arxiv.org/abs/2504.18059v1
- Date: Fri, 25 Apr 2025 04:11:24 GMT
- Title: POET: Prompt Offset Tuning for Continual Human Action Adaptation
- Authors: Prachi Garg, Joseph K J, Vineeth N Balasubramanian, Necati Cihan Camgoz, Chengde Wan, Kenrick Kin, Weiguang Si, Shugao Ma, Fernando De La Torre
- Abstract summary: We aim to provide users and developers with the capability to personalize their experience by adding new action classes to their device models continually. We formalize this as privacy-aware few-shot continual action recognition. We propose a novel spatio-temporal learnable prompt offset tuning approach, and are the first to apply such prompt tuning to Graph Neural Networks.
- Score: 61.63831623094721
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As extended reality (XR) is redefining how users interact with computing devices, research in human action recognition is gaining prominence. Typically, models deployed on immersive computing devices are static and limited to their default set of classes. The goal of our research is to provide users and developers with the capability to personalize their experience by adding new action classes to their device models continually. Importantly, a user should be able to add new classes in a low-shot and efficient manner, while this process should not require storing or replaying any of the user's sensitive training data. We formalize this problem as privacy-aware few-shot continual action recognition. Towards this end, we propose POET: Prompt-Offset Tuning. While existing prompt tuning approaches have shown great promise for continual learning of image, text, and video modalities, they demand access to extensively pretrained transformers. Breaking away from this assumption, POET demonstrates the efficacy of prompt tuning a significantly lightweight backbone, pretrained exclusively on the base class data. We propose a novel spatio-temporal learnable prompt offset tuning approach, and are the first to apply such prompt tuning to Graph Neural Networks. We contribute two new benchmarks for our new problem setting in human action recognition: (i) NTU RGB+D dataset for activity recognition, and (ii) SHREC-2017 dataset for hand gesture recognition. We find that POET consistently outperforms a comprehensive set of baselines. Source code at https://github.com/humansensinglab/POET-continual-action-recognition.
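The prompt-offset idea described in the abstract can be illustrated with a minimal sketch: learnable offsets are added to the input joint features of a frozen graph-convolution backbone, and only those offsets (plus new classifier rows, omitted here) would be updated when new action classes arrive. This is a toy numpy illustration under stated assumptions, not the paper's implementation; the self-loop-only adjacency, the `gcn_forward` helper, and the hand-coded "update" are all stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen backbone: a single graph-convolution layer, pretrained on the
# base classes. A self-loop-only adjacency keeps the toy example small.
num_joints, feat_dim, hid_dim = 5, 8, 16
A = np.eye(num_joints)                               # joint adjacency
W = rng.standard_normal((feat_dim, hid_dim)) * 0.1   # frozen weights

def gcn_forward(x, prompt_offset):
    """Add prompt offsets to the joint features, then apply the frozen
    graph convolution: ReLU(A @ (x + p) @ W)."""
    return np.maximum(A @ (x + prompt_offset) @ W, 0.0)

# Continual step: only the prompt offsets train; the backbone stays fixed.
prompt = np.zeros((num_joints, feat_dim))            # learnable offsets
x = rng.standard_normal((num_joints, feat_dim))      # one skeleton frame

h_base = gcn_forward(x, np.zeros_like(prompt))
prompt += 0.05        # stand-in for a gradient update on the prompt alone
h_new = gcn_forward(x, prompt)
```

Because the backbone weights never change, no user training data needs to be stored or replayed; only the small prompt tensor is updated per class increment.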
Related papers
- PEARL: Input-Agnostic Prompt Enhancement with Negative Feedback Regulation for Class-Incremental Learning [17.819582979803286]
Class-incremental learning (CIL) aims to continuously introduce novel categories into a classification system without forgetting previously learned ones. Prompt learning has been adopted in CIL for its ability to adjust data distribution to better align with pre-trained knowledge. This paper critically examines the limitations of existing methods from the perspective of prompt learning.
arXiv Detail & Related papers (2024-12-14T17:13:30Z) - Adaptive Retention & Correction: Test-Time Training for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task. We name our approach Adaptive Retention & Correction (ARC). ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets.
arXiv Detail & Related papers (2024-05-23T08:43:09Z) - PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [65.57123249246358]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT. On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt. On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z) - Efficient Adaptive Human-Object Interaction Detection with Concept-guided Memory [64.11870454160614]
We propose an efficient Adaptive HOI Detector with Concept-guided Memory (ADA-CM).
ADA-CM has two operating modes. The first adapts the model in a training-free paradigm, without learning any new parameters.
Our proposed method achieves competitive results with state-of-the-art on the HICO-DET and V-COCO datasets with much less training time.
arXiv Detail & Related papers (2023-09-07T13:10:06Z) - Remind of the Past: Incremental Learning with Analogical Prompts [30.333352182303038]
We design an analogy-making mechanism to remap the new data into the old class by prompt tuning.
It mimics the feature distribution of the target old class on the old model using only samples of new classes.
The learnt prompts are further used to estimate and counteract the representation shift caused by fine-tuning for the historical prototypes.
arXiv Detail & Related papers (2023-03-24T10:18:28Z) - PIVOT: Prompting for Video Continual Learning [50.80141083993668]
We introduce PIVOT, a novel method that leverages extensive knowledge in pre-trained models from the image domain.
Our experiments show that PIVOT improves state-of-the-art methods by a significant 27% on the 20-task ActivityNet setup.
arXiv Detail & Related papers (2022-12-09T13:22:27Z) - CODA-Prompt: COntinual Decomposed Attention-based Prompting for Rehearsal-Free Continual Learning [30.676509834338884]
Computer vision models suffer from a phenomenon known as catastrophic forgetting when learning novel concepts from continuously shifting training data.
We propose prompting approaches as an alternative to data-rehearsal.
We show that we outperform the current SOTA method DualPrompt on established benchmarks by as much as 4.5% in average final accuracy.
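The decomposed-attention idea behind CODA-Prompt can be sketched roughly as follows: the prompt fed to a frozen backbone is an attention-weighted sum of learnable prompt components, with weights computed from a query feature, per-component keys, and attention vectors. This is a hedged numpy sketch of that assembly step only, not the authors' code; the names `P`, `K`, `V`, `assemble_prompt`, and the dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
num_components, prompt_len, dim = 4, 3, 8

# Decomposed prompt set: components P, per-component keys K and
# attention vectors V (all learnable in the real method; random here).
P = rng.standard_normal((num_components, prompt_len, dim))
K = rng.standard_normal((num_components, dim))
V = rng.standard_normal((num_components, dim))

def assemble_prompt(query):
    """Weight each prompt component by the cosine similarity between its
    key and the attention-modulated query, then sum the components."""
    q = query[None, :] * V                                    # (C, dim)
    sim = (q * K).sum(axis=1) / (
        np.linalg.norm(q, axis=1) * np.linalg.norm(K, axis=1) + 1e-8)
    return np.tensordot(sim, P, axes=1)                       # (prompt_len, dim)

query = rng.standard_normal(dim)   # e.g. a [CLS] feature of the input image
prompt = assemble_prompt(query)
```

Because the whole prompt is a differentiable weighted sum rather than a hard selection from a pool, the components can be optimized end-to-end, which is the contrast the abstract draws with earlier prompt-pool methods such as DualPrompt.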
arXiv Detail & Related papers (2022-11-23T18:57:11Z) - Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z) - ActionCLIP: A New Paradigm for Video Action Recognition [14.961103794667341]
We provide a new perspective on action recognition by attaching importance to the semantic information of label texts.
We propose a new paradigm based on this multimodal learning framework for action recognition, which we dub "pre-train, prompt and fine-tune".
arXiv Detail & Related papers (2021-09-17T11:21:34Z) - Incremental Real-Time Personalization in Human Activity Recognition Using Domain Adaptive Batch Normalization [1.160208922584163]
Human Activity Recognition (HAR) from devices like smartphone accelerometers is a fundamental problem in ubiquitous computing.
Previous work has addressed this challenge by personalizing general recognition models to the unique motion pattern of a new user in a static batch setting.
Our work addresses all of these challenges by proposing an unsupervised online domain adaptation algorithm.
arXiv Detail & Related papers (2020-05-25T15:49:10Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To keep the resulting dataset tractable, we apply a dataset distillation strategy that compresses it into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.