Component attention network for multimodal dance improvisation
recognition
- URL: http://arxiv.org/abs/2310.05938v1
- Date: Thu, 24 Aug 2023 15:04:30 GMT
- Title: Component attention network for multimodal dance improvisation
recognition
- Authors: Jia Fu, Jiarui Tan, Wenjie Yin, Sepideh Pashami, Mårten Björkman
- Abstract summary: This paper explores the application and performance of multimodal fusion methods for human motion recognition in the context of dance improvisation.
We propose an attention-based model, component attention network (CANet), for multimodal fusion on three levels: 1) feature fusion with CANet, 2) model fusion with CANet and graph convolutional network (GCN), and 3) late fusion with a voting strategy.
- Score: 4.706373333495905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dance improvisation is an active research topic in the arts. Motion analysis
of improvised dance can be challenging due to its unique dynamics. Data-driven
dance motion analysis, including recognition and generation, is often limited
to skeletal data. However, data of other modalities, such as audio, can be
recorded and benefit downstream tasks. This paper explores the application and
performance of multimodal fusion methods for human motion recognition in the
context of dance improvisation. We propose an attention-based model, component
attention network (CANet), for multimodal fusion on three levels: 1) feature
fusion with CANet, 2) model fusion with CANet and graph convolutional network
(GCN), and 3) late fusion with a voting strategy. We conduct thorough
experiments to analyze the impact of each modality in different fusion methods
and distinguish critical temporal or component features. We show that our
proposed model outperforms the two baseline methods, demonstrating its
potential for analyzing improvisation in dance.
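
The abstract names three fusion levels but gives no implementation details here; the sketch below is a minimal PyTorch illustration of level 1 (attention-based feature fusion) and level 3 (late fusion by voting). All dimensions, module names, and the attention design are assumptions for illustration, not the authors' released CANet.

```python
# Minimal sketch of the fusion levels named in the abstract. All dimensions,
# module names, and the attention design are illustrative assumptions, not
# the authors' released CANet implementation.
import torch
import torch.nn as nn


class ComponentAttention(nn.Module):
    """Weigh per-component features (e.g., body parts, audio bands)."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, components, dim) -> attention-weighted sum over components
        w = torch.softmax(self.score(x), dim=1)  # (batch, components, 1)
        return (w * x).sum(dim=1)                # (batch, dim)


class FeatureFusionCANet(nn.Module):
    """Level 1: concatenate motion and audio components, fuse with attention."""

    def __init__(self, dim: int, n_classes: int):
        super().__init__()
        self.attn = ComponentAttention(dim)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, motion: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        return self.head(self.attn(torch.cat([motion, audio], dim=1)))


def late_fusion_vote(logits_list: list[torch.Tensor]) -> torch.Tensor:
    """Level 3: majority vote over the class predictions of several models."""
    preds = torch.stack([l.argmax(dim=-1) for l in logits_list])  # (models, batch)
    return preds.mode(dim=0).values                               # (batch,)


if __name__ == "__main__":
    motion = torch.randn(4, 10, 64)  # 4 clips, 10 skeleton components, dim 64
    audio = torch.randn(4, 5, 64)    # 4 clips, 5 audio components, dim 64
    model = FeatureFusionCANet(dim=64, n_classes=8)
    logits = model(motion, audio)
    # Level 2 (model fusion with a GCN) would add a skeleton-graph branch;
    # here two copies of one model stand in to show the voting interface.
    print(late_fusion_vote([logits, logits]).shape)  # torch.Size([4])
```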
Related papers
- Listen to Rhythm, Choose Movements: Autoregressive Multimodal Dance Generation via Diffusion and Mamba with Decoupled Dance Dataset [8.721362823189077]
Listen to Rhythm, Choose Movements (LRCM) is a multimodal-guided diffusion framework supporting both diverse input modalities and autoregressive dance motion generation. We will release the full dataset and pretrained models publicly upon acceptance.
arXiv Detail & Related papers (2026-01-06T14:59:22Z)
- UniMo: Unifying 2D Video and 3D Human Motion with an Autoregressive Framework [54.337290937468175]
We propose UniMo, an autoregressive model for joint modeling of 2D human videos and 3D human motions within a unified framework. We show that our method simultaneously generates corresponding videos and motions while performing accurate motion capture.
arXiv Detail & Related papers (2025-12-03T16:03:18Z)
- DanceChat: Large Language Model-Guided Music-to-Dance Generation [8.455652926559427]
Music-to-dance generation aims to synthesize human dance motion conditioned on musical input. We introduce DanceChat, a Large Language Model (LLM)-guided music-to-dance generation approach.
arXiv Detail & Related papers (2025-06-12T11:03:47Z)
- Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment [87.20240797625648]
We introduce a novel task within the field of 3D dance generation, termed dance accompaniment.
It requires the generation of responsive movements from a dance partner, the "follower", synchronized with the lead dancer's movements and the underlying musical rhythm.
We propose a GPT-based model, Duolando, which autoregressively predicts the subsequent tokenized motion conditioned on the coordinated information of the music, the leader's movements, and the follower's movements (a toy sketch of this autoregressive scheme follows the entry).
arXiv Detail & Related papers (2024-03-27T17:57:02Z)
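
A hedged sketch of this GPT-style interface: the follower's next motion token is predicted from music, leader, and past follower tokens. The prefix-style conditioning, vocabulary size, and transformer configuration below are illustrative assumptions, not Duolando's actual design.

```python
# Toy follower-token predictor: GPT-style next-token prediction conditioned
# on music and leader tokens. All sizes are illustrative assumptions.
import torch
import torch.nn as nn


def causal_mask(n: int) -> torch.Tensor:
    # Additive mask: position i may not attend to positions > i.
    return torch.triu(torch.full((n, n), float("-inf")), diagonal=1)


class FollowerGPT(nn.Module):
    def __init__(self, vocab: int = 512, dim: int = 256):
        super().__init__()
        # One shared embedding table; each stream gets its own index offset.
        self.embed = nn.Embedding(3 * vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, vocab)
        self.vocab = vocab

    def forward(self, music, leader, follower):
        # Conditioning prefix (music, leader), then the causal follower stream.
        x = torch.cat(
            [music, leader + self.vocab, follower + 2 * self.vocab], dim=1
        )
        h = self.blocks(self.embed(x), mask=causal_mask(x.size(1)))
        t = follower.size(1)
        # Logits at the position *before* each follower token predict it.
        return self.head(h[:, -t - 1 : -1])


if __name__ == "__main__":
    b, t = 2, 16
    model = FollowerGPT()
    music = torch.randint(0, 512, (b, t))
    leader = torch.randint(0, 512, (b, t))
    follower = torch.randint(0, 512, (b, t))
    print(model(music, leader, follower).shape)  # (2, 16, 512)
```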
- LM2D: Lyrics- and Music-Driven Dance Synthesis [28.884929875333846]
LM2D is designed to create dance conditioned on both music and lyrics in one diffusion generation step.
We introduce the first 3D dance-motion dataset that encompasses both music and lyrics, obtained with pose estimation technologies.
The results demonstrate that LM2D can produce realistic and diverse dance matching both lyrics and music.
arXiv Detail & Related papers (2024-03-14T13:59:04Z)
- TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration [75.37311932218773]
We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities.
Our approach can generate realistic and coherent dance movements conditioned on both text and music while maintaining performance comparable to that of either single modality.
arXiv Detail & Related papers (2023-04-05T12:58:33Z)
- BRACE: The Breakdancing Competition Dataset for Dance Motion Synthesis [123.73677487809418]
We introduce a new dataset aiming to challenge common assumptions in dance motion synthesis.
We focus on breakdancing which features acrobatic moves and tangled postures.
Our efforts produced the BRACE dataset, which contains over 3 hours and 30 minutes of densely annotated poses.
arXiv Detail & Related papers (2022-07-20T18:03:54Z)
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- Music-to-Dance Generation with Optimal Transport [48.92483627635586]
We propose a Music-to-Dance with Optimal Transport Network (MDOT-Net) for learning to generate 3D dance choreographies from music.
We introduce an optimal transport distance for evaluating the authenticity of the generated dance distribution and a Gromov-Wasserstein distance to measure the correspondence between the dance distribution and the input music (a toy example of both distances follows the entry).
arXiv Detail & Related papers (2021-12-03T09:37:26Z)
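
The two distances named above can be illustrated on toy data with the POT library (`pip install pot`); the point clouds below merely stand in for dance and music feature distributions and are not MDOT-Net's actual feature spaces.

```python
# Toy illustration of the two distances the summary mentions, using the POT
# library. The random point clouds stand in for dance and music features.
import numpy as np
import ot

rng = np.random.default_rng(0)
dance = rng.normal(size=(30, 8))   # 30 generated dance features, dim 8
real = rng.normal(size=(30, 8))    # 30 real dance features, dim 8
music = rng.normal(size=(30, 16))  # 30 music features, different dim

a = np.full(30, 1 / 30)            # uniform weight on each sample

# Optimal transport (Wasserstein) cost between generated and real dance
# features: both clouds live in the same space, so a pairwise cost works.
M = ot.dist(dance, real)
w_cost = ot.emd2(a, a, M)

# Gromov-Wasserstein compares dance and music through their *intra*-domain
# distance matrices, so the two spaces may have different dimensions.
C1 = ot.dist(dance, dance)
C2 = ot.dist(music, music)
gw_cost = ot.gromov.gromov_wasserstein2(C1, C2, a, a, loss_fun="square_loss")

print(f"OT cost: {w_cost:.3f}, GW cost: {gw_cost:.3f}")
```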
- Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition [49.163326827954656]
We propose a novel multi-granular spatio-temporal graph network for skeleton-based action classification.
We develop a dual-head graph network consisting of two interleaved branches, which enables us to extract features at two spatio-temporal resolutions.
We conduct extensive experiments on three large-scale datasets. A minimal skeleton graph convolution, the basic building block of such networks, is sketched after this entry.
arXiv Detail & Related papers (2021-08-10T09:25:07Z)
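
As a rough illustration of that building block, the sketch below applies one normalized graph convolution over a toy kinematic chain. The joint count and adjacency are assumptions, and the paper's multi-granular, dual-head design is not reproduced here.

```python
# Minimal skeleton graph convolution, the building block behind GCN-style
# action recognizers. Joint count and adjacency are illustrative only.
import torch
import torch.nn as nn


class SkeletonGCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, adj: torch.Tensor):
        super().__init__()
        # Symmetrically normalized adjacency with self-loops:
        # D^{-1/2} (A + I) D^{-1/2}
        a = adj + torch.eye(adj.size(0))
        d = a.sum(dim=1).rsqrt().diag()
        self.register_buffer("a_norm", d @ a @ d)
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, in_dim) -> aggregate neighboring joints
        return torch.relu(self.proj(torch.einsum("ij,btjc->btic", self.a_norm, x)))


if __name__ == "__main__":
    joints = 5
    adj = torch.zeros(joints, joints)
    for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:  # a toy kinematic chain
        adj[i, j] = adj[j, i] = 1.0
    layer = SkeletonGCNLayer(3, 16, adj)
    print(layer(torch.randn(2, 50, joints, 3)).shape)  # (2, 50, 5, 16)
```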
- Transflower: probabilistic autoregressive dance generation with multimodal attention [31.308435764603658]
We present a novel probabilistic autoregressive architecture that models the distribution over future poses with a normalizing flow conditioned on previous poses as well as music context (a one-layer sketch of such a conditional flow follows the entry).
We also introduce the largest 3D dance-motion dataset to date, obtained with a variety of motion-capture technologies and including both professional and casual dancers.
arXiv Detail & Related papers (2021-06-25T20:14:28Z)
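
A context-conditioned normalizing flow can be illustrated with a single affine coupling step. The generic layer below is a hedged sketch, not Transflower's actual architecture; the context vector stands in for encoded previous poses and music.

```python
# One conditional affine coupling step: the invertible building block of a
# normalizing flow over poses, conditioned on context. Generic sketch only.
import torch
import torch.nn as nn


class ConditionalCoupling(nn.Module):
    def __init__(self, pose_dim: int, ctx_dim: int):
        super().__init__()
        self.half = pose_dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half + ctx_dim, 64), nn.ReLU(),
            nn.Linear(64, 2 * (pose_dim - self.half)),
        )

    def forward(self, x, ctx):
        # Transform one half of the pose conditioned on the other half and
        # the context; return z and the log-determinant of the Jacobian.
        x1, x2 = x[:, : self.half], x[:, self.half :]
        s, t = self.net(torch.cat([x1, ctx], dim=1)).chunk(2, dim=1)
        z2 = x2 * torch.exp(s) + t
        return torch.cat([x1, z2], dim=1), s.sum(dim=1)

    def inverse(self, z, ctx):
        z1, z2 = z[:, : self.half], z[:, self.half :]
        s, t = self.net(torch.cat([z1, ctx], dim=1)).chunk(2, dim=1)
        return torch.cat([z1, (z2 - t) * torch.exp(-s)], dim=1)


if __name__ == "__main__":
    flow = ConditionalCoupling(pose_dim=6, ctx_dim=10)
    x, ctx = torch.randn(4, 6), torch.randn(4, 10)
    z, logdet = flow(x, ctx)
    x_rec = flow.inverse(z, ctx)
    print(torch.allclose(x, x_rec, atol=1e-5))  # True: the map is invertible
```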
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.