Unsupervised 3D Pose Estimation for Hierarchical Dance Video Recognition
- URL: http://arxiv.org/abs/2109.09166v1
- Date: Sun, 19 Sep 2021 16:59:37 GMT
- Title: Unsupervised 3D Pose Estimation for Hierarchical Dance Video Recognition
- Authors: Xiaodan Hu, Narendra Ahuja
- Abstract summary: We propose a Hierarchical Dance Video Recognition framework (HDVR)
HDVR estimates 2D pose sequences, tracks dancers, and then simultaneously estimates corresponding 3D poses and 3D-to-2D imaging parameters.
From the estimated 3D pose sequence, HDVR extracts body part movements and, from them, the dance genre.
- Score: 13.289339907084424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dance experts often view dance as a hierarchy of information, spanning
low-level (raw images, image sequences), mid-level (human poses and body-part
movements), and high-level (dance genre). We propose a Hierarchical Dance Video
Recognition framework (HDVR). HDVR estimates 2D pose sequences, tracks dancers,
and then simultaneously estimates corresponding 3D poses and 3D-to-2D imaging
parameters, without requiring ground truth for 3D poses. Unlike most methods
that work on a single person, our tracking works on multiple dancers, under
occlusions. From the estimated 3D pose sequence, HDVR extracts body part
movements and, from them, the dance genre. The resulting hierarchical dance
representation is explainable to experts. To overcome noise and interframe
correspondence ambiguities, we enforce spatial and temporal motion smoothness
and photometric continuity over time. We use an LSTM network to extract 3D
movement subsequences from which we recognize the dance genre. For experiments,
we have identified 154 movement types across 16 body parts, and assembled a new
University of Illinois Dance (UID) Dataset, containing 1143 video clips of 9
genres covering 30 hours, annotated with movement and genre labels. Our
experimental results demonstrate that our algorithms outperform the
state-of-the-art 3D pose estimation methods, which in turn enhances our dance
recognition performance.
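The pipeline described in the abstract lends itself to a compact sketch. Below is a minimal, hypothetical PyTorch illustration of the two core ideas: lifting tracked 2D poses to 3D without 3D ground truth (weak-perspective reprojection plus a temporal-smoothness term standing in for the paper's smoothness and photometric constraints), then classifying genre from the 3D sequence with an LSTM. All module names, dimensions, and loss weights are assumptions, not the authors' implementation.
```python
# Minimal sketch of the HDVR idea: unsupervised 2D-to-3D lifting + LSTM genre
# classification. Illustrative only; not the authors' code.
import torch
import torch.nn as nn

J = 17  # number of joints (assumed)

class Lifter(nn.Module):
    """Predicts per-frame 3D joints and a weak-perspective scale from 2D joints."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(J * 2, 256), nn.ReLU(),
                                 nn.Linear(256, J * 3 + 1))

    def forward(self, pose2d):                        # (T, J*2)
        out = self.net(pose2d)
        pose3d = out[:, :J * 3].reshape(-1, J, 3)     # (T, J, 3)
        scale = out[:, -1:].unsqueeze(-1)             # (T, 1, 1) imaging parameter
        return pose3d, scale

def unsupervised_loss(pose3d, scale, pose2d):
    """Reprojection + temporal smoothness; no 3D ground truth needed."""
    reproj = scale * pose3d[..., :2]                  # weak-perspective projection
    l_reproj = (reproj - pose2d.reshape(-1, J, 2)).abs().mean()
    l_smooth = (pose3d[1:] - pose3d[:-1]).pow(2).mean()  # temporal smoothness
    return l_reproj + 0.1 * l_smooth                  # weight is an assumption

class GenreLSTM(nn.Module):
    """Classifies dance genre from a 3D pose (or movement-feature) sequence."""
    def __init__(self, n_genres=9):                   # 9 genres, as in the UID dataset
        super().__init__()
        self.lstm = nn.LSTM(J * 3, 128, batch_first=True)
        self.head = nn.Linear(128, n_genres)

    def forward(self, seq):                           # (B, T, J*3)
        _, (h, _) = self.lstm(seq)
        return self.head(h[-1])

# Toy usage on random "tracked 2D poses" for one dancer over T frames.
T = 60
pose2d = torch.randn(T, J * 2)
lifter = Lifter()
pose3d, scale = lifter(pose2d)
loss = unsupervised_loss(pose3d, scale, pose2d)
loss.backward()                                       # optimize the lifter per sequence
logits = GenreLSTM()(pose3d.detach().reshape(1, T, J * 3))
print(loss.item(), logits.shape)                      # torch.Size([1, 9])
```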
Related papers
- VioPose: Violin Performance 4D Pose Estimation by Hierarchical Audiovisual Inference [7.5565058831496055]
Current state-of-the-art visual pose estimation algorithms struggle to produce accurate monocular 4D poses.
We propose VioPose: a novel multimodal network that hierarchically estimates dynamics.
Our architecture is shown to produce accurate pose sequences, facilitating precise motion analysis, and outperforms SoTA.
arXiv Detail & Related papers (2024-11-19T20:57:15Z)
- DanceCamAnimator: Keyframe-Based Controllable 3D Dance Camera Synthesis [49.614150163184064]
Dance camera movements involve both continuous sequences of variable lengths and sudden changes to simulate the switching of multiple cameras.
We propose to integrate cinematography knowledge by formulating this task as a three-stage process: keyframe detection, keyframe synthesis, and tween function prediction.
Following this formulation, we design a novel end-to-end dance camera synthesis framework, DanceCamAnimator, which imitates human animation procedures and shows powerful keyframe-based controllability with variable lengths.
arXiv Detail & Related papers (2024-09-23T11:20:44Z)
- EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams [59.77837807004765]
This paper introduces a new problem, i.e., 3D human motion capture from an egocentric monocular event camera with a fisheye lens.
Event streams have high temporal resolution and provide reliable cues for 3D human motion capture under high-speed human motions and rapidly changing illumination.
Our EE3D demonstrates robustness and superior 3D accuracy compared to existing solutions while supporting real-time 3D pose update rates of 140Hz.
arXiv Detail & Related papers (2024-04-12T17:59:47Z)
- DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance [50.01162760878841]
We present DCM, a new multi-modal 3D dataset that combines camera movement with dance motion and music audio.
This dataset encompasses 108 dance sequences (3.2 hours) of paired dance-camera-music data from the anime community.
We propose DanceCamera3D, a transformer-based diffusion model that incorporates a novel body attention loss and a condition separation strategy.
arXiv Detail & Related papers (2024-03-20T15:24:57Z)
- TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration [75.37311932218773]
We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities.
Our approach can generate realistic and coherent dance movements conditioned on both text and music, while maintaining performance comparable to models conditioned on a single modality.
arXiv Detail & Related papers (2023-04-05T12:58:33Z)
- BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis [123.73677487809418]
We introduce a new dataset aiming to challenge common assumptions in dance motion synthesis.
We focus on breakdancing which features acrobatic moves and tangled postures.
Our efforts produced the BRACE dataset, which contains over 3 hours and 30 minutes of densely annotated poses.
arXiv Detail & Related papers (2022-07-20T18:03:54Z)
- Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory [92.81383016482813]
We propose a novel music-to-dance framework, Bailando, for driving 3D characters to dance following a piece of music.
We introduce an actor-critic Generative Pre-trained Transformer (GPT) that composes these units into a fluent dance coherent with the music.
Our proposed framework achieves state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-03-24T13:06:43Z)
- DanceFormer: Music Conditioned 3D Dance Generation with Parametric Motion Transformer [23.51701359698245]
In this paper, we reformulate dance generation as a two-stage process, i.e., key pose generation followed by in-between parametric motion curve prediction (a generic sketch of such in-betweening appears after this list).
We propose a large-scale music conditioned 3D dance dataset, called PhantomDance, that is accurately labeled by experienced animators.
Experiments demonstrate that the proposed method, even when trained on existing datasets, can generate fluent, performative, and music-matched 3D dances.
arXiv Detail & Related papers (2021-03-18T12:17:38Z)
- Learn to Dance with AIST++: Music Conditioned 3D Dance Generation [28.623222697548456]
We present a transformer-based learning framework for 3D dance generation conditioned on music.
We also propose a new dataset of paired 3D motion and music called AIST++, which we reconstruct from the AIST multi-view dance videos.
arXiv Detail & Related papers (2021-01-21T18:59:22Z)
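The DanceFormer entry above describes a two-stage pipeline whose second stage fills in motion between key poses with parametric curves. As a generic, hypothetical illustration of that in-betweening idea (not DanceFormer's actual model), the sketch below interpolates between two key poses with a cubic Hermite curve; the joint count, tangents, and frame count are arbitrary assumptions.
```python
# Generic in-betweening of key poses via a cubic Hermite curve; illustrative only.
import numpy as np

def hermite_inbetween(p0, p1, m0, m1, n_frames):
    """Cubic Hermite interpolation between key poses p0 and p1.

    p0, p1: (J, 3) key poses; m0, m1: (J, 3) tangents (velocities).
    Returns (n_frames, J, 3) including both endpoints.
    """
    t = np.linspace(0.0, 1.0, n_frames)[:, None, None]  # (n_frames, 1, 1)
    h00 = 2 * t**3 - 3 * t**2 + 1                        # standard Hermite basis
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1

# Toy usage: in-between two random 17-joint key poses over 8 frames.
J = 17
p0, p1 = np.random.randn(J, 3), np.random.randn(J, 3)
m0 = m1 = np.zeros((J, 3))            # zero tangents -> ease-in/ease-out motion
motion = hermite_inbetween(p0, p1, m0, m1, n_frames=8)
print(motion.shape)                   # (8, 17, 3)
```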