SGA-INTERACT: A 3D Skeleton-based Benchmark for Group Activity Understanding in Modern Basketball Tactic
- URL: http://arxiv.org/abs/2503.06522v1
- Date: Sun, 09 Mar 2025 08:53:32 GMT
- Authors: Yuchen Yang, Wei Wang, Yifei Liu, Linfeng Dong, Hao Wu, Mingxin Zhang, Zhihang Zhong, Xiao Sun
- Abstract summary: Group Activity Understanding is predominantly studied as the Group Activity Recognition task. SGA-INTERACT is the first 3D skeleton-based benchmark for group activity understanding. One2Many is a novel framework that employs a pretrained 3D skeleton backbone for unified individual feature extraction.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Group Activity Understanding is predominantly studied as the Group Activity Recognition (GAR) task. However, existing GAR benchmarks suffer from coarse-grained activity vocabularies and data restricted to a single view, which hinders the evaluation of state-of-the-art algorithms. To address these limitations, we introduce SGA-INTERACT, the first 3D skeleton-based benchmark for group activity understanding. It features complex activities inspired by basketball tactics, emphasizing rich spatial interactions and long-term dependencies. SGA-INTERACT introduces the Temporal Group Activity Localization (TGAL) task, extending group activity understanding to untrimmed sequences and filling the gap left by GAR as a standalone task. In addition to the benchmark, we propose One2Many, a novel framework that employs a pretrained 3D skeleton backbone for unified individual feature extraction. This framework aligns with the feature extraction paradigm in RGB-based methods, enabling direct evaluation of RGB-based models on skeleton-based benchmarks. We conduct extensive evaluations on SGA-INTERACT using two skeleton-based methods, three RGB-based methods, and a proposed baseline within the One2Many framework. The generally low performance of these baselines highlights the benchmark's challenges and motivates advancements in group activity understanding.
Related papers
- Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph [4.075741925017479]
Group Activity Recognition aims to understand collective activities from videos. Existing solutions rely on the RGB modality, which encounters challenges such as background variations. We design a panoramic graph that incorporates multi-person skeletons and objects to encapsulate group activity.
arXiv Detail & Related papers (2024-07-28T13:57:03Z)
- Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition [57.97930719585095]
We introduce Part-aware Unified Representation between Language and Skeleton (PURLS) to explore visual-semantic alignment at both local and global scales.
Our approach is evaluated on various skeleton/language backbones and three large-scale datasets.
The results showcase the universality and superior performance of PURLS, surpassing prior skeleton-based solutions and standard baselines from other domains.
arXiv Detail & Related papers (2024-06-19T08:22:32Z)
- Towards More Practical Group Activity Detection: A New Benchmark and Model [61.39427407758131]
Group activity detection (GAD) is the task of identifying members of each group and classifying the activity of the group at the same time in a video.
We present a new dataset, dubbed Café, which presents more practical scenarios and metrics.
We also propose a new GAD model that deals with an unknown number of groups and latent group members efficiently and effectively.
arXiv Detail & Related papers (2023-12-05T16:48:17Z)
- SoGAR: Self-supervised Spatiotemporal Attention-based Social Group Activity Recognition [45.419756454791674]
This paper introduces a novel approach to Social Group Activity Recognition (SoGAR) using self-supervised Transformers.
Our objective ensures that features extracted from contrasting views are consistent across spatio-temporal domains.
Our proposed SoGAR method achieved state-of-the-art results on three group activity recognition benchmarks.
arXiv Detail & Related papers (2023-04-27T03:41:15Z)
- DECOMPL: Decompositional Learning with Attention Pooling for Group Activity Recognition from a Single Volleyball Image [3.6144103736375857]
Group Activity Recognition (GAR) aims to detect the activity performed by multiple actors in a scene.
We propose a novel GAR technique for volleyball videos, DECOMPL, which consists of two complementary branches.
In the visual branch, it extracts the features using attention pooling in a selective way.
In the coordinate branch, it considers the current configuration of the actors and extracts spatial information from the box coordinates.
arXiv Detail & Related papers (2023-03-11T16:30:51Z)
- Learning Rational Subgoals from Demonstrations and Instructions [71.86713748450363]
We present a framework for learning useful subgoals that support efficient long-term planning to achieve novel goals.
At the core of our framework is a collection of rational subgoals (RSGs), which are essentially binary classifiers over the environmental states.
Given a goal description, the learned subgoals and the derived dependencies facilitate off-the-shelf planning algorithms, such as A* and RRT.
arXiv Detail & Related papers (2023-03-09T18:39:22Z)
- Attentive Pooling for Group Activity Recognition [23.241686027269928]
In group activity recognition, a hierarchical framework is widely adopted to represent the relationships between individuals and their corresponding group.
We propose a new contextual pooling scheme, named attentive pooling, which enables the weighted information transition from individual actions to group activity.
arXiv Detail & Related papers (2022-08-31T13:26:39Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only on a source dataset, but are unavailable on a target dataset during the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- COMPOSER: Compositional Learning of Group Activity in Videos [33.526331969279106]
Group Activity Recognition (GAR) detects the activity performed by a group of actors in a short video clip.
We propose COMPOSER, a Multiscale Transformer-based architecture that performs attention-based reasoning over tokens at each scale.
COMPOSER achieves a new state-of-the-art 94.5% accuracy with the keypoint-only modality.
arXiv Detail & Related papers (2021-12-11T01:25:46Z) - Spatio-temporal Relation Modeling for Few-shot Action Recognition [100.3999454780478]
We propose a few-shot action recognition framework, STRM, which enhances class-specific feature discriminability while simultaneously learning higher-order temporal representations.
Our approach achieves an absolute gain of 3.5% in classification accuracy, as compared to the best existing method in the literature.
arXiv Detail & Related papers (2021-12-09T18:59:14Z) - Social Adaptive Module for Weakly-supervised Group Activity Recognition [143.68241396839062]
This paper presents a new task named weakly-supervised group activity recognition (GAR).
It differs from conventional GAR tasks in that only video-level labels are available, and the important persons within each frame are not provided even in the training data.
This allows us to collect and annotate a large-scale NBA dataset, which in turn raises new challenges for GAR.
arXiv Detail & Related papers (2020-07-18T16:40:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.