Hierarchical Relation-augmented Representation Generalization for Few-shot Action Recognition
- URL: http://arxiv.org/abs/2504.10079v1
- Date: Mon, 14 Apr 2025 10:23:22 GMT
- Title: Hierarchical Relation-augmented Representation Generalization for Few-shot Action Recognition
- Authors: Hongyu Qu, Ling Xing, Rui Yan, Yazhou Yao, Guo-Sen Xie, Xiangbo Shu,
- Abstract summary: Few-shot action recognition (FSAR) aims to recognize novel action categories with few exemplars. We propose HR2G-shot, a Hierarchical Relation-augmented Representation Generalization framework for FSAR. It unifies three types of relation modeling (inter-frame, inter-video, and inter-task) to learn task-specific temporal patterns from a holistic view.
- Score: 53.02634128715853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot action recognition (FSAR) aims to recognize novel action categories with few exemplars. Existing methods typically learn frame-level representations independently for each video by designing various inter-frame temporal modeling strategies. However, they neglect explicit relation modeling between videos and tasks, thus failing to capture shared temporal patterns across videos and reuse temporal knowledge from historical tasks. In light of this, we propose HR2G-shot, a Hierarchical Relation-augmented Representation Generalization framework for FSAR, which unifies three types of relation modeling (inter-frame, inter-video, and inter-task) to learn task-specific temporal patterns from a holistic view. In addition to conducting inter-frame temporal interactions, we further devise two components to respectively explore inter-video and inter-task relationships: i) Inter-video Semantic Correlation (ISC) performs cross-video frame-level interactions in a fine-grained manner, thereby capturing task-specific query features and learning intra- and inter-class temporal correlations among support features; ii) Inter-task Knowledge Transfer (IKT) retrieves and aggregates relevant temporal knowledge from a knowledge bank, which stores diverse temporal patterns from historical tasks. Extensive experiments on five benchmarks show that HR2G-shot outperforms current leading FSAR methods.
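The abstract names two relation-modeling components (ISC and IKT) but gives no implementation detail. Below is a minimal sketch of what cross-video frame-level attention and memory-bank retrieval of historical temporal patterns could look like; the module names, tensor shapes, attention layout, and bank contents are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of the inter-video (ISC) and inter-task (IKT) ideas
# described in the abstract. Shapes and layers below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class InterVideoSemanticCorrelation(nn.Module):
    """Cross-video frame-level attention over all frames in a task (assumed form)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, query_frames: torch.Tensor, support_frames: torch.Tensor):
        # query_frames:   (Nq, T, D) frame features of query videos
        # support_frames: (Ns, T, D) frame features of support videos
        ns, t, d = support_frames.shape
        support_pool = support_frames.reshape(1, ns * t, d)        # all support frames as one sequence
        pooled = support_pool.expand(query_frames.size(0), -1, -1)  # shared across query videos
        # Query frames attend to every support frame -> task-specific query features.
        refined_q, _ = self.attn(query_frames, pooled, pooled)
        # Support frames attend to each other across videos -> intra-/inter-class correlations.
        refined_s, _ = self.attn(support_pool, support_pool, support_pool)
        return refined_q + query_frames, refined_s.reshape(ns, t, d) + support_frames


class InterTaskKnowledgeTransfer(nn.Module):
    """Retrieve and aggregate temporal patterns from a bank of past tasks (assumed form)."""

    def __init__(self, dim: int, bank_size: int = 256):
        super().__init__()
        # Placeholder bank; in practice it would be filled with patterns from historical tasks.
        self.register_buffer("bank", torch.randn(bank_size, dim))

    def forward(self, video_feat: torch.Tensor, top_k: int = 8):
        # video_feat: (N, D) video-level temporal features
        sims = F.normalize(video_feat, dim=-1) @ F.normalize(self.bank, dim=-1).T  # (N, bank_size)
        weights, idx = sims.topk(top_k, dim=-1)                     # retrieve most relevant patterns
        retrieved = (weights.softmax(-1).unsqueeze(-1) * self.bank[idx]).sum(1)
        return video_feat + retrieved                               # inject reused temporal knowledge


if __name__ == "__main__":
    isc = InterVideoSemanticCorrelation(dim=64)
    ikt = InterTaskKnowledgeTransfer(dim=64)
    q = torch.randn(2, 8, 64)   # 2 query videos, 8 frames each
    s = torch.randn(5, 8, 64)   # 5 support videos, e.g. a 5-way 1-shot task
    q_ref, s_ref = isc(q, s)
    video_level = ikt(q_ref.mean(dim=1))  # pool frames, then reuse historical patterns
    print(q_ref.shape, s_ref.shape, video_level.shape)
```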
Related papers
- AsyReC: A Multimodal Graph-based Framework for Spatio-Temporal Asymmetric Dyadic Relationship Classification [8.516886985159928]
Dyadic social relationships are shaped by shared spatial and temporal experiences. Current computational methods for modeling these relationships face three major challenges. We propose AsyReC, a multimodal graph-based framework for asymmetric dyadic relationship classification.
arXiv Detail & Related papers (2025-04-07T12:52:23Z) - DreamRelation: Relation-Centric Video Customization [33.65405972817795]
Video customization refers to the creation of personalized videos that depict user-specified relations between two subjects. While existing methods can personalize subject appearances and motions, they still struggle with complex video customization. We propose DreamRelation, a novel approach that learns the target relation from a small set of exemplar videos, leveraging two key components: Decoupling Learning and Dynamics Enhancement.
arXiv Detail & Related papers (2025-03-10T17:58:03Z) - Relational Temporal Graph Reasoning for Dual-task Dialogue Language Understanding [39.76268402567324]
Dual-task dialogue language understanding aims to tackle two correlated dialogue language understanding tasks simultaneously by exploiting their inherent correlations.
We put forward a new framework, whose core is relational temporal graph reasoning.
Our models outperform state-of-the-art models by a large margin.
arXiv Detail & Related papers (2023-06-15T13:19:08Z) - Weakly-Supervised Spatio-Temporal Anomaly Detection in Surveillance Video [128.41392860714635]
We introduce Weakly-Supervised Spatio-Temporal Anomaly Detection (WSSTAD) in surveillance video.
WSSTAD aims to localize a spatio-temporal tube (i.e., a sequence of bounding boxes at consecutive times) that encloses an abnormal event.
We propose a dual-branch network that takes as input proposals at multiple granularities in both the spatial and temporal domains.
arXiv Detail & Related papers (2021-08-09T06:11:14Z) - Modeling long-term interactions to enhance action recognition [81.09859029964323]
We propose a new approach to understand actions in egocentric videos that exploits the semantics of object interactions at both frame and temporal levels.
We use a region-based approach that takes as input a primary region roughly corresponding to the user hands and a set of secondary regions potentially corresponding to the interacting objects.
The proposed approach outperforms the state-of-the-art in terms of action recognition on standard benchmarks.
arXiv Detail & Related papers (2021-04-23T10:08:15Z) - Unified Graph Structured Models for Video Understanding [93.72081456202672]
We propose a message passing graph neural network that explicitly models spatio-temporal relations.
We show how our method is able to more effectively model relationships between relevant entities in the scene.
arXiv Detail & Related papers (2021-03-29T14:37:35Z) - Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos [76.21297023629589]
We propose a novel method for learning pairwise modality interactions in order to better exploit complementary information for each pair of modalities in videos.
Our method achieves state-of-the-art performance on four standard benchmark datasets.
arXiv Detail & Related papers (2020-07-28T12:40:59Z) - Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully-designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.