Unsupervised 3D Pose Estimation for Hierarchical Dance Video Recognition
- URL: http://arxiv.org/abs/2109.09166v1
- Date: Sun, 19 Sep 2021 16:59:37 GMT
- Title: Unsupervised 3D Pose Estimation for Hierarchical Dance Video Recognition
- Authors: Xiaodan Hu, Narendra Ahuja
- Abstract summary: We propose a Hierarchical Dance Video Recognition framework (HDVR)
HDVR estimates 2D pose sequences, tracks dancers, and then simultaneously estimates corresponding 3D poses and 3D-to-2D imaging parameters.
From the estimated 3D pose sequence, HDVR extracts body part movements and, from them, the dance genre.
- Score: 13.289339907084424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dance experts often view dance as a hierarchy of information, spanning
low-level (raw images, image sequences), mid-level (human poses and body-part
movements), and high-level (dance genre). We propose a Hierarchical Dance Video
Recognition framework (HDVR). HDVR estimates 2D pose sequences, tracks dancers,
and then simultaneously estimates corresponding 3D poses and 3D-to-2D imaging
parameters, without requiring ground truth for 3D poses. Unlike most methods
that work on a single person, our tracking works on multiple dancers, under
occlusions. From the estimated 3D pose sequence, HDVR extracts body part
movements and, from them, the dance genre. The resulting hierarchical dance
representation is explainable to experts. To overcome noise and interframe
correspondence ambiguities, we enforce spatial and temporal motion smoothness
and photometric continuity over time. We use an LSTM network to extract 3D
movement subsequences from which we recognize the dance genre. For experiments,
we have identified 154 movement types across 16 body parts, and assembled a new
University of Illinois Dance (UID) Dataset, containing 1143 video clips of 9
genres covering 30 hours, annotated with movement and genre labels. Our
experimental results demonstrate that our algorithms outperform the
state-of-the-art 3D pose estimation methods, which in turn enhances our dance
recognition performance.
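The pipeline described in the abstract lends itself to a compact sketch. Below is a minimal, hypothetical PyTorch illustration of the two core ideas: lifting tracked 2D poses to 3D without 3D ground truth (weak-perspective reprojection plus a temporal-smoothness term standing in for the paper's smoothness and photometric constraints), then classifying genre from the 3D sequence with an LSTM. All module names, dimensions, and loss weights are assumptions, not the authors' implementation.
```python
# Minimal sketch of the HDVR idea: unsupervised 2D-to-3D lifting + LSTM genre
# classification. Illustrative only; not the authors' code.
import torch
import torch.nn as nn

J = 17  # number of joints (assumed)

class Lifter(nn.Module):
    """Predicts per-frame 3D joints and a weak-perspective scale from 2D joints."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(J * 2, 256), nn.ReLU(),
                                 nn.Linear(256, J * 3 + 1))

    def forward(self, pose2d):                        # (T, J*2)
        out = self.net(pose2d)
        pose3d = out[:, :J * 3].reshape(-1, J, 3)     # (T, J, 3)
        scale = out[:, -1:].unsqueeze(-1)             # (T, 1, 1) imaging parameter
        return pose3d, scale

def unsupervised_loss(pose3d, scale, pose2d):
    """Reprojection + temporal smoothness; no 3D ground truth needed."""
    reproj = scale * pose3d[..., :2]                  # weak-perspective projection
    l_reproj = (reproj - pose2d.reshape(-1, J, 2)).abs().mean()
    l_smooth = (pose3d[1:] - pose3d[:-1]).pow(2).mean()  # temporal smoothness
    return l_reproj + 0.1 * l_smooth                  # weight is an assumption

class GenreLSTM(nn.Module):
    """Classifies dance genre from a 3D pose (or movement-feature) sequence."""
    def __init__(self, n_genres=9):                   # 9 genres, as in the UID dataset
        super().__init__()
        self.lstm = nn.LSTM(J * 3, 128, batch_first=True)
        self.head = nn.Linear(128, n_genres)

    def forward(self, seq):                           # (B, T, J*3)
        _, (h, _) = self.lstm(seq)
        return self.head(h[-1])

# Toy usage on random "tracked 2D poses" for one dancer over T frames.
T = 60
pose2d = torch.randn(T, J * 2)
lifter = Lifter()
pose3d, scale = lifter(pose2d)
loss = unsupervised_loss(pose3d, scale, pose2d)
loss.backward()                                       # optimize the lifter per sequence
logits = GenreLSTM()(pose3d.detach().reshape(1, T, J * 3))
print(loss.item(), logits.shape)                      # torch.Size([1, 9])
```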
Related papers
- VioPose: Violin Performance 4D Pose Estimation by Hierarchical Audiovisual Inference [7.5565058831496055]
Current state-of-the-art visual pose estimation algorithms struggle to produce accurate monocular 4D poses.
We propose VioPose: a novel multimodal network that hierarchically estimates dynamics.
Our architecture is shown to produce accurate pose sequences, facilitating precise motion analysis, and outperforms SoTA.
arXiv Detail & Related papers (2024-11-19T20:57:15Z)
- DanceCamAnimator: Keyframe-Based Controllable 3D Dance Camera Synthesis [49.614150163184064]
Dance camera movements involve both continuous sequences of variable lengths and sudden changes to simulate the switching of multiple cameras.
We propose to integrate cinematography knowledge by formulating this task as a three-stage process: keyframe detection, keyframe synthesis, and tween function prediction.
Following this formulation, we design a novel end-to-end dance camera synthesis framework, DanceCamAnimator, which imitates human animation procedures and shows powerful keyframe-based controllability with variable lengths.
arXiv Detail & Related papers (2024-09-23T11:20:44Z)
- EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams [59.77837807004765]
This paper introduces a new problem, i.e., 3D human motion capture from an egocentric monocular event camera with a fisheye lens.
Event streams have high temporal resolution and provide reliable cues for 3D human motion capture under high-speed human motions and rapidly changing illumination.
Our EE3D demonstrates robustness and superior 3D accuracy compared to existing solutions while supporting real-time 3D pose update rates of 140Hz.
arXiv Detail & Related papers (2024-04-12T17:59:47Z)
- DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance [50.01162760878841]
We present DCM, a new multi-modal 3D dataset that combines camera movement with dance motion and music audio.
This dataset encompasses 108 dance sequences (3.2 hours) of paired dance-camera-music data from the anime community.
We propose DanceCamera3D, a transformer-based diffusion model that incorporates a novel body attention loss and a condition separation strategy.
arXiv Detail & Related papers (2024-03-20T15:24:57Z)
- TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration [75.37311932218773]
We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities.
Our approach can generate realistic and coherent dance movements conditioned on both text and music, while maintaining performance comparable to models conditioned on a single modality.
arXiv Detail & Related papers (2023-04-05T12:58:33Z)
- BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis [123.73677487809418]
We introduce a new dataset aiming to challenge common assumptions in dance motion synthesis.
We focus on breakdancing which features acrobatic moves and tangled postures.
Our efforts produced the BRACE dataset, which contains over 3 hours and 30 minutes of densely annotated poses.
arXiv Detail & Related papers (2022-07-20T18:03:54Z)
- Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory [92.81383016482813]
We propose a novel music-to-dance framework, Bailando, for driving 3D characters to dance following a piece of music.
We introduce an actor-critic Generative Pre-trained Transformer (GPT) that composes these units into a fluent dance coherent with the music.
Our proposed framework achieves state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-03-24T13:06:43Z)
- DanceFormer: Music Conditioned 3D Dance Generation with Parametric Motion Transformer [23.51701359698245]
In this paper, we reformulate dance generation as a two-stage process, i.e., key pose generation followed by in-between parametric motion curve prediction (a generic sketch of such in-betweening appears after this list).
We propose a large-scale music conditioned 3D dance dataset, called PhantomDance, that is accurately labeled by experienced animators.
Experiments demonstrate that the proposed method, even when trained on existing datasets, can generate fluent, performative, and music-matched 3D dances.
arXiv Detail & Related papers (2021-03-18T12:17:38Z)
- Learn to Dance with AIST++: Music Conditioned 3D Dance Generation [28.623222697548456]
We present a transformer-based learning framework for 3D dance generation conditioned on music.
We also propose a new dataset of paired 3D motion and music called AIST++, which we reconstruct from the AIST multi-view dance videos.
arXiv Detail & Related papers (2021-01-21T18:59:22Z)
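The DanceFormer entry above describes a two-stage pipeline whose second stage fills in motion between key poses with parametric curves. As a generic, hypothetical illustration of that in-betweening idea (not DanceFormer's actual model), the sketch below interpolates between two key poses with a cubic Hermite curve; the joint count, tangents, and frame count are arbitrary assumptions.
```python
# Generic in-betweening of key poses via a cubic Hermite curve; illustrative only.
import numpy as np

def hermite_inbetween(p0, p1, m0, m1, n_frames):
    """Cubic Hermite interpolation between key poses p0 and p1.

    p0, p1: (J, 3) key poses; m0, m1: (J, 3) tangents (velocities).
    Returns (n_frames, J, 3) including both endpoints.
    """
    t = np.linspace(0.0, 1.0, n_frames)[:, None, None]  # (n_frames, 1, 1)
    h00 = 2 * t**3 - 3 * t**2 + 1                        # standard Hermite basis
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1

# Toy usage: in-between two random 17-joint key poses over 8 frames.
J = 17
p0, p1 = np.random.randn(J, 3), np.random.randn(J, 3)
m0 = m1 = np.zeros((J, 3))            # zero tangents -> ease-in/ease-out motion
motion = hermite_inbetween(p0, p1, m0, m1, n_frames=8)
print(motion.shape)                   # (8, 17, 3)
```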