PLAR: Prompt Learning for Action Recognition
- URL: http://arxiv.org/abs/2305.12437v2
- Date: Wed, 15 Nov 2023 02:59:32 GMT
- Title: PLAR: Prompt Learning for Action Recognition
- Authors: Xijun Wang, Ruiqi Xian, Tianrui Guan, Dinesh Manocha
- Abstract summary: We present a new general learning approach, Prompt Learning for Action Recognition (PLAR)
Our approach is designed to predict the action label by helping the models focus on the descriptions or instructions associated with actions in the input videos.
We observe a 3.17-10.2% accuracy improvement on the aerial multi-agent dataset Okutama and a 1.0-3.6% improvement on the ground camera single-agent dataset Something Something V2.
- Score: 56.57236976757388
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a new general learning approach, Prompt Learning for Action
Recognition (PLAR), which leverages the strengths of prompt learning to guide
the learning process. Our approach is designed to predict the action label by
helping the models focus on the descriptions or instructions associated with
actions in the input videos. Our formulation uses various prompts, including
learnable prompts, auxiliary visual information, and large vision models to
improve the recognition performance. In particular, we design a learnable
prompt method that learns to dynamically generate prompts from a pool of prompt
experts under different inputs. By sharing the same objective with the task,
our proposed PLAR can optimize prompts that guide the model's predictions while
explicitly learning input-invariant (prompt experts pool) and input-specific
(data-dependent) prompt knowledge. We evaluate our approach on datasets
consisting of both ground camera videos and aerial videos, and scenes with
single-agent and multi-agent actions. In practice, we observe a 3.17-10.2%
accuracy improvement on the aerial multi-agent dataset Okutama and a 1.0-3.6%
improvement on the ground camera single-agent dataset Something Something V2.
We plan to release our code on the WWW.
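To make the prompt-experts idea concrete, below is a minimal, hypothetical PyTorch sketch of a learnable prompt pool with data-dependent gating. The class name, dimensions, gating mechanism (a linear layer with softmax), and the way the pooled (input-invariant) and gated (input-specific) prompt knowledge are combined are all assumptions for illustration; the paper's actual architecture may differ.
```python
import torch
import torch.nn as nn

class PromptExpertPool(nn.Module):
    """Illustrative sketch (not the authors' code): a pool of learnable prompt
    experts with data-dependent gating. The shared expert pool carries
    input-invariant prompt knowledge; the per-sample gate adds input-specific
    (data-dependent) knowledge."""

    def __init__(self, num_experts=8, prompt_len=4, dim=768):
        super().__init__()
        # Input-invariant prompt experts, shared across all inputs.
        self.experts = nn.Parameter(torch.randn(num_experts, prompt_len, dim) * 0.02)
        # Gate that scores each expert from a per-sample feature vector.
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, feat):
        # feat: (batch, dim) pooled video feature, e.g. a CLS token.
        weights = self.gate(feat).softmax(dim=-1)          # (batch, num_experts)
        # Data-dependent mixture of experts -> input-specific prompts.
        prompts = torch.einsum("be,eld->bld", weights, self.experts)
        return prompts                                     # (batch, prompt_len, dim)

# Usage sketch: prepend the generated prompts to the backbone's token sequence
# and train the pool jointly with the action-recognition loss.
pool = PromptExpertPool()
video_feat = torch.randn(2, 768)      # placeholder pooled features
tokens = torch.randn(2, 16, 768)      # placeholder patch/frame tokens
prompted = torch.cat([pool(video_feat), tokens], dim=1)
print(prompted.shape)                 # torch.Size([2, 20, 768])
```
Because the pool and the gate would be optimized with the same task objective as the recognition head, the experts can capture prompt knowledge shared across inputs while the gate injects per-sample variation, mirroring the input-invariant/input-specific split described in the abstract.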
Related papers
- Understanding the Multi-modal Prompts of the Pre-trained Vision-Language Model [15.828023370166411]
We conduct a direct analysis of the multi-modal prompts by asking the following questions.
(i) How do the learned multi-modal prompts improve the recognition performance?
(ii) What do the multi-modal prompts learn?
arXiv Detail & Related papers (2023-12-18T04:49:03Z)
- APoLLo: Unified Adapter and Prompt Learning for Vision Language Models [58.9772868980283]
We present APoLLo, a unified multi-modal approach that combines Adapter and Prompt learning for Vision-Language models.
APoLLo achieves a relative gain of up to 6.03% over MaPLe (SOTA) on novel classes across 10 diverse image recognition datasets.
arXiv Detail & Related papers (2023-12-04T01:42:09Z)
- Learning Transferable Pedestrian Representation from Multimodal Information Supervision [174.5150760804929]
VAL-PAT is a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information.
We first perform pre-training on LUPerson-TA dataset, where each image contains text and attribute annotations.
We then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition and text-based person search.
arXiv Detail & Related papers (2023-04-12T01:20:58Z)
- Exploring Effective Factors for Improving Visual In-Context Learning [56.14208975380607]
In-Context Learning (ICL) aims to understand a new task from a few demonstrations (i.e., prompts) and make predictions on new inputs without tuning the model.
This paper shows that prompt selection and prompt fusion are two major factors that directly affect the inference performance of visual in-context learning.
We propose prompt-SelF, a simple framework for visual in-context learning.
arXiv Detail & Related papers (2023-04-10T17:59:04Z)
- Dynamic Prompting: A Unified Framework for Prompt Tuning [33.175097465669374]
We present a unified dynamic prompt (DP) tuning strategy that dynamically determines different factors of prompts based on specific tasks and instances.
Experimental results underscore the significant performance improvement achieved by dynamic prompt tuning across a wide range of tasks.
We establish the universal applicability of our approach under full-data, few-shot, and multitask scenarios.
arXiv Detail & Related papers (2023-03-06T06:04:46Z)
- Prompt-Learning for Fine-Grained Entity Typing [40.983849729537795]
We investigate the application of prompt-learning on fine-grained entity typing in fully supervised, few-shot and zero-shot scenarios.
We propose a self-supervised strategy that carries out distribution-level optimization in prompt-learning to automatically summarize the information of entity types.
arXiv Detail & Related papers (2021-08-24T09:39:35Z)
- ALICE: Active Learning with Contrastive Natural Language Explanations [69.03658685761538]
We propose Active Learning with Contrastive Explanations (ALICE) to improve data efficiency in learning.
ALICE learns to first use active learning to select the most informative pairs of label classes to elicit contrastive natural language explanations.
It then extracts knowledge from these explanations via semantic parsing.
arXiv Detail & Related papers (2020-09-22T01:02:07Z)
- Memory-augmented Dense Predictive Coding for Video Representation Learning [103.69904379356413]
We propose a new architecture and learning framework Memory-augmented Predictive Coding (MemDPC) for the task.
We investigate visual-only self-supervised video representation learning from RGB frames, or from unsupervised optical flow, or both.
In all cases, we demonstrate state-of-the-art or comparable performance relative to other approaches while using orders of magnitude less training data.
arXiv Detail & Related papers (2020-08-03T17:57:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.