VicKAM: Visual Conceptual Knowledge Guided Action Map for Weakly Supervised Group Activity Recognition
- URL: http://arxiv.org/abs/2502.09967v1
- Date: Fri, 14 Feb 2025 07:49:06 GMT
- Title: VicKAM: Visual Conceptual Knowledge Guided Action Map for Weakly Supervised Group Activity Recognition
- Authors: Zhuming Wang, Yihao Zheng, Jiarui Li, Yaofei Wu, Yan Huang, Zun Li, Lifang Wu, Liang Wang
- Abstract summary: Existing weakly supervised group activity recognition methods rely on object detectors or attention mechanisms to capture key areas automatically.
We propose a novel framework named Visual Conceptual Knowledge Guided Action Map (VicKAM).
VicKAM effectively captures the locations of individual actions and integrates them with action semantics for weakly supervised group activity recognition.
- Score: 14.701516591822358
- License:
- Abstract: Existing weakly supervised group activity recognition methods rely on object detectors or attention mechanisms to capture key areas automatically. However, they overlook the semantic information associated with the captured areas, which may adversely affect recognition performance. In this paper, we propose a novel framework named Visual Conceptual Knowledge Guided Action Map (VicKAM), which effectively captures the locations of individual actions and integrates them with action semantics for weakly supervised group activity recognition. It generates individual action prototypes from the training set as visual conceptual knowledge to bridge action semantics and visual representations. Guided by this knowledge, VicKAM produces action maps that indicate the likelihood of each action occurring at various locations, based on the image correlation theorem. It further augments the individual action maps with group-activity-related statistical information, which represents the distribution of individual actions under different group activities, to establish connections between action maps and specific group activities. The augmented action map is combined with action semantic representations for group activity recognition. Extensive experiments on two public benchmarks, the Volleyball and NBA datasets, demonstrate the effectiveness of our proposed method, even in cases of limited training data. The code will be released later.
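The correlation-theorem step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released implementation: the function name `action_map`, the per-channel loop, and the zero-padding scheme are all assumptions. The idea is that cross-correlating an action prototype with a feature map, computed in the frequency domain via the correlation theorem, yields a score map whose peaks indicate where that action likely occurs.

```python
import numpy as np

def action_map(feature_map, prototype):
    """Correlate an action prototype with a feature map using the
    correlation theorem: corr(f, g) = IFFT(FFT(f) * conj(FFT(g))).

    feature_map: (H, W, C) array of visual features.
    prototype:   (h, w, C) action prototype, h <= H and w <= W.
    Returns an (H, W) score map; high values mark likely locations.
    """
    H, W, C = feature_map.shape
    ph, pw, _ = prototype.shape
    # Zero-pad the prototype to the feature-map size so the FFTs match.
    padded = np.zeros((H, W, C))
    padded[:ph, :pw, :] = prototype
    score = np.zeros((H, W))
    for c in range(C):
        F = np.fft.fft2(feature_map[:, :, c])
        G = np.fft.fft2(padded[:, :, c])
        # Elementwise product with the conjugate = circular cross-correlation.
        score += np.real(np.fft.ifft2(F * np.conj(G)))
    return score
```

Planting a copy of the prototype somewhere in an otherwise empty feature map and taking the argmax of the returned score recovers that location, which is the behavior the action maps rely on.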
Related papers
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z) - AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition [51.24321348668037]
Panoramic Activity Recognition (PAR) aims to identify multi-granularity behaviors performed by multiple persons in panoramic scenes.
Previous methods rely on manually annotated detection boxes in training and inference, hindering further practical deployment.
We propose a novel Adapt-Focused bi-Propagating Prototype learning (AdaFPP) framework to jointly recognize individual, group, and global activities in panoramic activity scenes.
arXiv Detail & Related papers (2024-05-04T01:53:22Z) - Group Activity Recognition using Unreliable Tracked Pose [8.592249538742527]
Group activity recognition in video is a complex task due to the need for a model to recognise the actions of all individuals in the video.
We introduce an innovative deep learning-based group activity recognition approach called Rendered Pose based Group Activity Recognition System (RePGARS)
arXiv Detail & Related papers (2024-01-06T17:36:13Z) - Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes [23.284478293459856]
Action-slot is a slot attention-based approach that learns visual action-centric representations.
Our key idea is to design action slots that are capable of paying attention to regions where atomic activities occur.
To address the limitation, we collect a synthetic dataset called TACO, which is four times larger than OATS.
arXiv Detail & Related papers (2023-11-29T05:28:05Z) - Automatic Interaction and Activity Recognition from Videos of Human Manual Demonstrations with Application to Anomaly Detection [0.0]
This paper exploits Scene Graphs to extract key interaction features from image sequences while simultaneously capturing motion patterns and context.
The method introduces event-based automatic video segmentation and clustering, which group similar events and detect whether a monitored activity is executed correctly.
arXiv Detail & Related papers (2023-04-19T16:15:23Z) - Skeleton-Based Mutually Assisted Interacted Object Localization and Human Action Recognition [111.87412719773889]
We propose a joint learning framework for "interacted object localization" and "human action recognition" based on skeleton data.
Our method achieves the best or competitive performance with the state-of-the-art methods for human action recognition.
arXiv Detail & Related papers (2021-10-28T10:09:34Z) - Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning [51.03781020616402]
Fine-grained action recognition is attracting increasing attention due to the emerging demand of specific action understanding in real-world applications.
We propose a few-shot fine-grained action recognition problem, aiming to recognize novel fine-grained actions with only few samples given for each class.
Although progress has been made in coarse-grained actions, existing few-shot recognition methods encounter two issues handling fine-grained actions.
arXiv Detail & Related papers (2021-08-15T02:21:01Z) - Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation [88.49669148290306]
We propose a novel weakly supervised multi-task framework called AuxSegNet to leverage saliency detection and multi-label image classification as auxiliary tasks.
Inspired by their similar structured semantics, we also propose to learn a cross-task global pixel-level affinity map from the saliency and segmentation representations.
The learned cross-task affinity can be used to refine saliency predictions and propagate CAM maps to provide improved pseudo labels for both tasks.
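The affinity-guided propagation of CAM maps mentioned above can be illustrated as a random walk over pixels. This AffinityNet-style sketch is an assumption about the general technique, not AuxSegNet's exact procedure; the function `refine_cam` and its parameters are hypothetical.

```python
import numpy as np

def refine_cam(cam, affinity, t=2):
    """Refine a flattened CAM (HW, K) with a pixel affinity matrix (HW, HW)
    by t steps of a random walk: similar pixels exchange activation mass.
    """
    # Row-normalize so each pixel distributes its score over similar pixels.
    T = affinity / (affinity.sum(axis=1, keepdims=True) + 1e-8)
    out = cam
    for _ in range(t):
        out = T @ out  # one propagation step
    return out
```

With an identity affinity the CAM is unchanged, while a dense affinity smooths activations across semantically similar pixels, which is how the propagated maps yield cleaner pseudo labels.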
arXiv Detail & Related papers (2021-07-25T11:39:58Z) - Home Action Genome: Cooperative Compositional Action Understanding [33.69990813932372]
Existing research on action recognition treats activities as monolithic events occurring in videos.
Cooperative Compositional Action Understanding (CCAU) is a cooperative learning framework for hierarchical action recognition.
We demonstrate the utility of co-learning compositions in few-shot action recognition by achieving 28.6% mAP with just a single sample.
arXiv Detail & Related papers (2021-05-11T17:42:47Z) - Intra- and Inter-Action Understanding via Temporal Action Parsing [118.32912239230272]
We construct a new dataset developed on sport videos with manual annotations of sub-actions, and conduct a study on temporal action parsing on top.
Our study shows that a sport activity usually consists of multiple sub-actions and that the awareness of such temporal structures is beneficial to action recognition.
We also investigate a number of temporal parsing methods, and thereon devise an improved method that is capable of mining sub-actions from training data without knowing the labels of them.
arXiv Detail & Related papers (2020-05-20T17:45:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.