SGA-INTERACT: A 3D Skeleton-based Benchmark for Group Activity Understanding in Modern Basketball Tactic
- URL: http://arxiv.org/abs/2503.06522v1
- Date: Sun, 09 Mar 2025 08:53:32 GMT
- Authors: Yuchen Yang, Wei Wang, Yifei Liu, Linfeng Dong, Hao Wu, Mingxin Zhang, Zhihang Zhong, Xiao Sun
- Abstract summary: Group Activity Understanding is predominantly studied as the Group Activity Recognition task. SGA-INTERACT is the first 3D skeleton-based benchmark for group activity understanding. One2Many is a novel framework that employs a pretrained 3D skeleton backbone for unified individual feature extraction.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Group Activity Understanding is predominantly studied as the Group Activity Recognition (GAR) task. However, existing GAR benchmarks suffer from coarse-grained activity vocabularies and data restricted to a single view, which hinders the evaluation of state-of-the-art algorithms. To address these limitations, we introduce SGA-INTERACT, the first 3D skeleton-based benchmark for group activity understanding. It features complex activities inspired by basketball tactics, emphasizing rich spatial interactions and long-term dependencies. SGA-INTERACT introduces the Temporal Group Activity Localization (TGAL) task, extending group activity understanding to untrimmed sequences and filling the gap left by GAR as a standalone task. In addition to the benchmark, we propose One2Many, a novel framework that employs a pretrained 3D skeleton backbone for unified individual feature extraction. This framework aligns with the feature extraction paradigm in RGB-based methods, enabling direct evaluation of RGB-based models on skeleton-based benchmarks. We conduct extensive evaluations on SGA-INTERACT using two skeleton-based methods, three RGB-based methods, and a proposed baseline within the One2Many framework. The generally low performance of these baselines highlights the benchmark's challenges and motivates advancements in group activity understanding.
Related papers
- Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph [4.075741925017479]
Group Activity Recognition aims to understand collective activities from videos. Existing solutions rely on the RGB modality, which encounters challenges such as background variations. We design a panoramic graph that incorporates multi-person skeletons and objects to encapsulate group activity.
arXiv Detail & Related papers (2024-07-28T13:57:03Z)
- Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition [57.97930719585095]
We introduce Part-aware Unified Representation between Language and Skeleton (PURLS) to explore visual-semantic alignment at both local and global scales.
Our approach is evaluated on various skeleton/language backbones and three large-scale datasets.
The results showcase the universality and superior performance of PURLS, surpassing prior skeleton-based solutions and standard baselines from other domains.
arXiv Detail & Related papers (2024-06-19T08:22:32Z)
- Towards More Practical Group Activity Detection: A New Benchmark and Model [61.39427407758131]
Group activity detection (GAD) is the task of identifying members of each group and classifying the activity of the group at the same time in a video.
We present a new dataset, dubbed Café, which presents more practical scenarios and metrics.
We also propose a new GAD model that deals with an unknown number of groups and latent group members efficiently and effectively.
arXiv Detail & Related papers (2023-12-05T16:48:17Z)
- SoGAR: Self-supervised Spatiotemporal Attention-based Social Group Activity Recognition [45.419756454791674]
This paper introduces a novel approach to Social Group Activity Recognition (SoGAR) using self-supervised Transformers.
Our objective ensures that features extracted from contrasting views are consistent across spatio-temporal domains.
Our proposed SoGAR method achieved state-of-the-art results on three group activity recognition benchmarks.
arXiv Detail & Related papers (2023-04-27T03:41:15Z)
- DECOMPL: Decompositional Learning with Attention Pooling for Group Activity Recognition from a Single Volleyball Image [3.6144103736375857]
Group Activity Recognition (GAR) aims to detect the activity performed by multiple actors in a scene.
We propose a novel GAR technique for volleyball videos, DECOMPL, which consists of two complementary branches.
In the visual branch, it extracts the features using attention pooling in a selective way.
In the coordinate branch, it considers the current configuration of the actors and extracts spatial information from the box coordinates.
arXiv Detail & Related papers (2023-03-11T16:30:51Z)
- Learning Rational Subgoals from Demonstrations and Instructions [71.86713748450363]
We present a framework for learning useful subgoals that support efficient long-term planning to achieve novel goals.
At the core of our framework is a collection of rational subgoals (RSGs), which are essentially binary classifiers over the environmental states.
Given a goal description, the learned subgoals and the derived dependencies facilitate off-the-shelf planning algorithms, such as A* and RRT.
arXiv Detail & Related papers (2023-03-09T18:39:22Z)
- Attentive Pooling for Group Activity Recognition [23.241686027269928]
In group activity recognition, a hierarchical framework is widely adopted to represent the relationships between individuals and their corresponding group.
We propose a new contextual pooling scheme, named attentive pooling, which enables the weighted information transition from individual actions to group activity.
arXiv Detail & Related papers (2022-08-31T13:26:39Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only on a source dataset, but are unavailable on a target dataset during the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- COMPOSER: Compositional Learning of Group Activity in Videos [33.526331969279106]
Group Activity Recognition (GAR) detects the activity performed by a group of actors in a short video clip.
We propose COMPOSER, a Multiscale Transformer-based architecture that performs attention-based reasoning over tokens at each scale.
COMPOSER achieves a new state-of-the-art 94.5% accuracy with the keypoint-only modality.
arXiv Detail & Related papers (2021-12-11T01:25:46Z) - Spatio-temporal Relation Modeling for Few-shot Action Recognition [100.3999454780478]
We propose a few-shot action recognition framework, STRM, which enhances class-specific feature discriminability while simultaneously learning higher-order temporal representations.
Our approach achieves an absolute gain of 3.5% in classification accuracy, as compared to the best existing method in the literature.
arXiv Detail & Related papers (2021-12-09T18:59:14Z) - Social Adaptive Module for Weakly-supervised Group Activity Recognition [143.68241396839062]
This paper presents a new task named weakly-supervised group activity recognition (GAR).
It differs from conventional GAR tasks in that only video-level labels are available, and the important persons within each frame are not provided even in the training data.
This allows us to collect and annotate a large-scale NBA dataset, which in turn raises new challenges for GAR.
arXiv Detail & Related papers (2020-07-18T16:40:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.