3D Pose-Based Temporal Action Segmentation for Figure Skating: A Fine-Grained and Jump Procedure-Aware Annotation Approach
- URL: http://arxiv.org/abs/2408.16638v1
- Date: Thu, 29 Aug 2024 15:42:06 GMT
- Title: 3D Pose-Based Temporal Action Segmentation for Figure Skating: A Fine-Grained and Jump Procedure-Aware Annotation Approach
- Authors: Ryota Tanaka, Tomohiro Suzuki, Keisuke Fujii
- Abstract summary: In figure skating, technical judgments are made by watching skaters' 3D movements, and part of the judging procedure can be regarded as a Temporal Action Segmentation (TAS) task.
There is a lack of datasets and effective methods for TAS tasks requiring 3D pose data.
In this study, we first created the FS-Jump3D dataset of complex and dynamic figure skating jumps using optical markerless motion capture.
We also propose a new fine-grained annotation method for figure skating jump TAS datasets, with which TAS models can learn jump procedures.
- Score: 5.453385501324681
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Understanding human actions from videos is essential in many domains, including sports. In figure skating, technical judgments are performed by watching skaters' 3D movements, and part of the judging procedure can be regarded as a Temporal Action Segmentation (TAS) task. TAS tasks in figure skating, which automatically assign temporal semantics to video, are actively researched. However, there is a lack of datasets and effective methods for TAS tasks requiring 3D pose data. In this study, we first created the FS-Jump3D dataset of complex and dynamic figure skating jumps using optical markerless motion capture. We also propose a new fine-grained annotation method for figure skating jump TAS datasets, with which TAS models can learn jump procedures. In the experimental results, we validated the usefulness of 3D pose features as input and of the fine-grained dataset for the TAS model in figure skating. The FS-Jump3D dataset is available at https://github.com/ryota-skating/FS-Jump3D.
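To make the setup concrete, here is a minimal sketch, assuming PyTorch, of a frame-wise TAS model that consumes 3D pose sequences. The joint count, label names, and MS-TCN-style dilated temporal convolutions are illustrative assumptions, not the authors' architecture or the actual FS-Jump3D format.

```python
# Minimal frame-wise TAS sketch over 3D poses (assumptions, not the paper's model).
import torch
import torch.nn as nn

class PoseTAS(nn.Module):
    """Single-stage dilated temporal ConvNet (MS-TCN-style) over 3D joints."""
    def __init__(self, n_joints=25, n_classes=5, hidden=64, n_layers=8):
        super().__init__()
        self.inp = nn.Conv1d(n_joints * 3, hidden, kernel_size=1)
        self.blocks = nn.ModuleList(
            nn.Conv1d(hidden, hidden, kernel_size=3,
                      padding=2 ** i, dilation=2 ** i)
            for i in range(n_layers)
        )
        self.out = nn.Conv1d(hidden, n_classes, kernel_size=1)

    def forward(self, pose):                 # pose: (B, T, n_joints, 3)
        x = pose.flatten(2).transpose(1, 2)  # -> (B, n_joints*3, T)
        x = torch.relu(self.inp(x))
        for conv in self.blocks:
            x = x + torch.relu(conv(x))      # residual dilated temporal conv
        return self.out(x)                   # per-frame logits: (B, n_classes, T)

# e.g., 300 frames of 25 3D joints, classified per frame into hypothetical
# jump-procedure labels such as entry/takeoff/rotation/landing/none.
logits = PoseTAS()(torch.randn(1, 300, 25, 3))
```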
Related papers
- YourSkatingCoach: A Figure Skating Video Benchmark for Fine-Grained Element Analysis [10.444961818248624]
The dataset contains 454 videos of jump elements and the detected skater skeletons in each video, along with gold labels for the start and end frames of each jump, together forming a video benchmark for figure skating.
We propose air time detection, a novel motion analysis task whose goal is to accurately detect the duration of a jump's air time.
To verify the generalizability of the fine-grained labels, we apply the same process to other sports as cross-sports tasks, but for coarse-grained action classification.
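Once takeoff and landing frames are labeled or detected, the air time itself is a one-line computation; here is a minimal sketch with hypothetical names, not the benchmark's actual schema.

```python
# Air time from start/end frame indices (function and field names are hypothetical).
def air_time_seconds(takeoff_frame: int, landing_frame: int, fps: float) -> float:
    """Duration the skater is airborne, in seconds."""
    assert landing_frame >= takeoff_frame, "landing must not precede takeoff"
    return (landing_frame - takeoff_frame) / fps

# e.g., a jump spanning frames 120..162 at 60 fps is airborne for 0.7 s
print(air_time_seconds(120, 162, 60.0))  # 0.7
```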
arXiv Detail & Related papers (2024-10-27T12:52:28Z)
- S2O: Static to Openable Enhancement for Articulated 3D Objects [20.310491257189422]
We introduce the static to openable (S2O) task which creates interactive articulated 3D objects from static counterparts.
We formulate a unified framework to tackle this task, and curate a challenging dataset of openable 3D objects.
arXiv Detail & Related papers (2024-09-27T16:34:13Z)
- TAPVid-3D: A Benchmark for Tracking Any Point in 3D [63.060421798990845]
We introduce a new benchmark, TAPVid-3D, for evaluating the task of Tracking Any Point in 3D.
This benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video.
arXiv Detail & Related papers (2024-07-08T13:28:47Z)
- Reconstructing Animatable Categories from Videos [65.14948977749269]
Building animatable 3D models is challenging due to the need for 3D scans, laborious registration, and manual rigging.
We present RAC that builds category 3D models from monocular videos while disentangling variations over instances and motion over time.
We show that 3D models of humans, cats, and dogs can be learned from 50-100 internet videos.
arXiv Detail & Related papers (2023-05-10T17:56:21Z)
- Generating Continual Human Motion in Diverse 3D Scenes [56.70255926954609]
We introduce a method to synthesize animator guided human motion across 3D scenes.
We decompose the continual motion synthesis problem into walking along paths and transitioning in and out of the actions specified by the keypoints.
Our model can generate long sequences of diverse actions such as grabbing, sitting and leaning chained together.
arXiv Detail & Related papers (2023-04-04T18:24:22Z)
- Unsupervised Volumetric Animation [54.52012366520807]
We propose a novel approach for unsupervised 3D animation of non-rigid deformable objects.
Our method learns the 3D structure and dynamics of objects solely from single-view RGB videos.
We show our model can obtain animatable 3D objects from a single volume or few images.
arXiv Detail & Related papers (2023-01-26T18:58:54Z)
- Accidental Turntables: Learning 3D Pose by Watching Objects Turn [36.71437141704823]
We propose a technique for learning single-view 3D object pose estimation models by utilizing a new source of data -- in-the-wild videos where objects turn.
We show that classical structure-from-motion algorithms, coupled with recent advances in instance detection and feature matching, provide surprisingly accurate relative 3D pose estimation on such videos.
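As a rough illustration of that classical recipe, here is a sketch of two-view relative pose estimation with standard OpenCV calls. It is not the authors' pipeline: ORB matching stands in for their feature matcher, and the pinhole intrinsics matrix `K` is assumed known.

```python
# Two-view relative pose via essential-matrix estimation (illustrative only).
import cv2
import numpy as np

def relative_pose(img1, img2, K):
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, inliers = cv2.findEssentialMat(pts1, pts2, K,
                                      method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # rotation matrix and unit-scale translation between the views
```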
arXiv Detail & Related papers (2022-12-13T00:46:26Z)
- Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding [61.57847727651068]
Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query.
Most previous works focus on learning frame-level features of each whole frame in the entire video, and directly match them with the textual information.
We propose a novel Motion- and Appearance-guided 3D Semantic Reasoning Network (MA3SRN), which incorporates optical-flow-guided motion-aware, detection-based appearance-aware, and 3D-aware object-level features.
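A minimal sketch of that multi-stream idea, assuming each stream has already been encoded into per-frame features; the dimensions and the concatenate-then-project fusion are placeholders, not MA3SRN's actual architecture.

```python
# Fusing motion, appearance, and 3D-aware object features (placeholder design).
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    def __init__(self, d_motion=256, d_app=256, d_3d=128, d_out=512):
        super().__init__()
        self.proj = nn.Linear(d_motion + d_app + d_3d, d_out)

    def forward(self, motion, appearance, obj3d):
        # each input: (B, T, d_*) per-frame features from its own encoder
        fused = torch.cat([motion, appearance, obj3d], dim=-1)
        return torch.relu(self.proj(fused))  # (B, T, d_out) for later grounding
```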
arXiv Detail & Related papers (2022-03-06T13:57:09Z)
- SportsCap: Monocular 3D Human Motion Capture and Fine-grained Understanding in Challenging Sports Videos [40.19723456533343]
We propose SportsCap -- the first approach for simultaneously capturing 3D human motions and understanding fine-grained actions from challenging monocular sports video input.
Our approach utilizes the semantic and temporally structured sub-motion prior in the embedding space for motion capture and understanding.
Based on such hybrid motion information, we introduce a multi-stream spatial-temporal Graph Convolutional Network (ST-GCN) to predict the fine-grained semantic action attributes.
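For intuition, here is a minimal sketch of one spatial-temporal graph convolution layer in the ST-GCN spirit: a per-node feature transform, aggregation over a skeleton adjacency matrix, then a temporal convolution. The adjacency, channel sizes, and normalization are placeholders, not SportsCap's configuration.

```python
# One ST-GCN-style layer over a skeleton graph (placeholder configuration).
import torch
import torch.nn as nn

class STGCNLayer(nn.Module):
    def __init__(self, adjacency: torch.Tensor, c_in=3, c_out=64, t_kernel=9):
        super().__init__()
        a = adjacency + torch.eye(adjacency.size(0))                # self-loops
        self.register_buffer("A", a / a.sum(dim=1, keepdim=True))   # row-normalize
        self.spatial = nn.Conv2d(c_in, c_out, kernel_size=1)
        self.temporal = nn.Conv2d(c_out, c_out, kernel_size=(t_kernel, 1),
                                  padding=(t_kernel // 2, 0))

    def forward(self, x):                 # x: (B, C, T, V), V = skeleton joints
        x = self.spatial(x)               # per-node feature transform
        x = torch.einsum("bctv,vw->bctw", x, self.A)  # neighbor aggregation
        return torch.relu(self.temporal(x))

# e.g., a toy 5-joint chain skeleton over 30 frames
A = torch.zeros(5, 5)
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
out = STGCNLayer(A)(torch.randn(2, 3, 30, 5))  # -> (2, 64, 30, 5)
```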
arXiv Detail & Related papers (2021-04-23T07:52:03Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
- Temporal-Aware Self-Supervised Learning for 3D Hand Pose and Mesh Estimation in Videos [32.12879364117658]
Estimating 3D hand pose directly from RGB images is challenging but has gained steady progress recently by training deep models with annotated 3D poses.
We propose a new framework for training 3D pose estimation models from RGB images without using explicit 3D annotations.
arXiv Detail & Related papers (2020-12-06T07:54:18Z)