PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance
- URL: http://arxiv.org/abs/2406.09326v1
- Date: Thu, 13 Jun 2024 17:05:23 GMT
- Authors: Qijun Gan, Song Wang, Shengtao Wu, Jianke Zhu
- Abstract summary: We construct a piano-hand motion generation benchmark to guide hand movements and fingerings for piano playing.
To this end, we collect an annotated dataset, PianoMotion10M, consisting of 116 hours of piano playing videos from a bird's-eye view with 10 million annotated hand poses.
- Score: 15.21347897534943
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently, artificial intelligence techniques for education have received increasing attention, yet designing effective musical instrument instruction systems remains an open problem. Although key presses can be directly derived from sheet music, the transitional movements between key presses require more extensive guidance in piano performance. In this work, we construct a piano-hand motion generation benchmark to guide hand movements and fingerings for piano playing. To this end, we collect an annotated dataset, PianoMotion10M, consisting of 116 hours of piano playing videos from a bird's-eye view with 10 million annotated hand poses. We also introduce a powerful baseline model that generates hand motions from piano audio through a position predictor and a position-guided gesture generator. Furthermore, a series of evaluation metrics is designed to assess the performance of the baseline model, including motion similarity, smoothness, positional accuracy of the left and right hands, and overall fidelity of the movement distribution. Although piano key presses can already be derived from music scores or audio, PianoMotion10M aims to provide guidance on piano fingering for instructional purposes. The dataset and source code can be accessed at https://agnjason.github.io/PianoMotion-page.
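As a rough illustration of the two-stage baseline described above, the sketch below wires a position predictor to a position-guided gesture generator. It is not the authors' released implementation; the GRU backbones, feature sizes, and pose dimensionality are assumptions made for the example.

```python
# A minimal sketch (not the authors' code) of the two-stage baseline:
# a position predictor maps audio features to per-hand keyboard positions,
# and a position-guided gesture generator produces per-frame hand poses.
import torch
import torch.nn as nn

class PositionPredictor(nn.Module):
    """Predicts left/right hand positions along the keyboard from audio features."""
    def __init__(self, audio_dim=128, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(audio_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # one scalar position per hand

    def forward(self, audio_feats):       # audio_feats: (B, T, audio_dim)
        h, _ = self.rnn(audio_feats)
        return self.head(h)               # (B, T, 2)

class GestureGenerator(nn.Module):
    """Generates hand pose parameters conditioned on audio and predicted positions."""
    def __init__(self, audio_dim=128, pose_dim=96, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(audio_dim + 2, hidden, batch_first=True)
        self.head = nn.Linear(hidden, pose_dim)  # pose parameters for both hands

    def forward(self, audio_feats, positions):
        x = torch.cat([audio_feats, positions], dim=-1)
        h, _ = self.rnn(x)
        return self.head(h)               # (B, T, pose_dim)

audio = torch.randn(1, 300, 128)          # ~10 s of audio features at 30 fps
positions = PositionPredictor()(audio)
poses = GestureGenerator()(audio, positions)
print(positions.shape, poses.shape)       # (1, 300, 2) and (1, 300, 96)
```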
Related papers
- FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance [15.909113091360206]
Hand motion models with the sophistication to accurately recreate piano playing have a wide range of applications in character animation, embodied AI, biomechanics, and VR/AR.
In this paper, we construct a first-of-its-kind large-scale dataset that contains approximately 10 hours of 3D hand motion and audio from 15 elite-level pianists playing 153 pieces of classical music.
arXiv Detail & Related papers (2024-10-08T08:21:05Z)
- RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands [57.64308229980045]
We introduce the Robot Piano 1 Million dataset, containing bi-manual robot piano playing motion data of more than one million trajectories.
We formulate finger placements as an optimal transport problem, thus enabling automatic annotation of vast amounts of unlabeled songs.
Benchmarking shows that existing imitation learning approaches reach state-of-the-art robot piano playing performance by leveraging RP1M.
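As a toy illustration of the optimal-transport formulation mentioned above, the snippet below matches fingers to target keys by minimizing total travel cost with a linear assignment solver. The 1D coordinates and the distance cost are invented for the example; this is not RP1M's actual annotation pipeline.

```python
# Toy version of casting finger placement as an assignment/optimal-transport
# problem: each key to be pressed is matched to the finger that minimizes
# total travel cost. All coordinates below are made up.
import numpy as np
from scipy.optimize import linear_sum_assignment

finger_x = np.array([10.0, 12.3, 14.6, 16.9, 19.2])  # current finger x-positions (cm)
key_x = np.array([11.0, 15.5, 18.0])                  # x-positions of keys to press (cm)

cost = np.abs(finger_x[:, None] - key_x[None, :])     # (5, 3) travel-distance matrix
rows, cols = linear_sum_assignment(cost)              # min-cost matching
for f, k in zip(rows, cols):
    print(f"finger {f} -> key at {key_x[k]:.1f} cm (moves {cost[f, k]:.1f} cm)")
```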
arXiv Detail & Related papers (2024-08-20T17:56:52Z)
- Modeling Bends in Popular Music Guitar Tablatures [49.64902130083662]
Tablature notation is widely used in popular music to transcribe and share guitar musical content.
This paper focuses on bends, which enable the player to progressively shift the pitch of a note, thereby circumventing the physical limitations of the discrete fretted fingerboard.
Experiments are performed on a corpus of 932 lead guitar tablatures of popular music and show that a decision tree successfully predicts bend occurrences with an F1 score of 0.71 and a limited number of false positive predictions.
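For intuition, here is a hedged sketch of the kind of experiment described: a decision tree classifying whether a note is bent, scored with F1. The feature set and the synthetic labels are placeholders, not the paper's 932-tablature corpus.

```python
# Sketch of a bend-occurrence classifier; features and labels are synthetic
# stand-ins (pitch, duration, string, fret), not the paper's data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.integers(0, 24, size=(2000, 4)).astype(float)  # pitch, duration, string, fret
noise = rng.random(2000) < 0.15
y = ((X[:, 3] >= 12) ^ noise).astype(int)              # pretend bends cluster on high frets

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
print("F1:", round(f1_score(y_te, clf.predict(X_te)), 2))
```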
arXiv Detail & Related papers (2023-08-22T07:50:58Z)
- MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standardized assessment of the representations of all open-source pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z)
- At Your Fingertips: Extracting Piano Fingering Instructions from Videos [45.643494669796866]
We consider the AI task of automating the extraction of fingering information from videos.
We show how to perform this task with high accuracy using a combination of deep-learning modules.
We run the resulting system on 90 videos, yielding high-quality piano fingering information for 150K notes.
arXiv Detail & Related papers (2023-03-07T09:09:13Z)
- Visual motion analysis of the player's finger [3.299672391663527]
This work addresses the extraction of a keyboard player's finger motion, across all three finger articulations, from a video sequence.
The problem is relevant in several respects; for instance, the extracted finger movements can be used to compute keystroke efficiency and individual joint contributions.
arXiv Detail & Related papers (2023-02-24T10:14:13Z)
- Towards Learning to Play Piano with Dexterous Hands and Touch [79.48656721563795]
We demonstrate how an agent can learn directly from machine-readable music scores to play a simulated piano with dexterous hands.
We achieve this by using a touch-augmented reward and a novel curriculum of tasks.
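A minimal sketch of what a touch-augmented reward could look like, assuming a binary key-press state and a count of spurious fingertip contacts; the weights and state fields are illustrative assumptions, not the paper's reward definition.

```python
# Illustrative touch-augmented reward: reward correct key presses, penalize
# wrong presses and spurious fingertip contacts. All weights are assumptions.
import numpy as np

def touch_augmented_reward(pressed, target, spurious_contacts, w_key=1.0, w_touch=0.1):
    correct = np.sum(pressed * target)        # target keys actually pressed
    wrong = np.sum(pressed * (1 - target))    # keys pressed that should stay up
    return w_key * (correct - wrong) - w_touch * spurious_contacts

target = np.zeros(88); target[[40, 44, 47]] = 1   # say, a C major triad
pressed = np.zeros(88); pressed[[40, 44]] = 1     # agent hit two of the three keys
print(touch_augmented_reward(pressed, target, spurious_contacts=1))  # 1.9
```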
arXiv Detail & Related papers (2021-06-03T17:59:31Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
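As a rough sketch of the audio-visual fusion step described above, the module below combines pooled keypoint features with spectrogram-frame audio features to predict a separation mask for one player. The feature dimensions and the mask-based head are assumptions for illustration, not the paper's architecture.

```python
# Illustrative audio-visual fusion for source separation: project visual and
# audio features, concatenate, and predict a spectrogram mask.
import torch
import torch.nn as nn

class AudioVisualFusion(nn.Module):
    def __init__(self, vis_dim=64, aud_dim=128, hidden=128):
        super().__init__()
        self.proj_v = nn.Linear(vis_dim, hidden)
        self.proj_a = nn.Linear(aud_dim, hidden)
        self.mask_head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, aud_dim), nn.Sigmoid())

    def forward(self, keypoint_feats, audio_feats):
        # keypoint_feats: (B, T, vis_dim) pooled body/finger keypoint features
        # audio_feats:    (B, T, aud_dim) mixture spectrogram frames
        z = torch.cat([self.proj_v(keypoint_feats), self.proj_a(audio_feats)], dim=-1)
        return self.mask_head(z) * audio_feats   # masked spectrogram for one player

v, a = torch.randn(2, 50, 64), torch.randn(2, 50, 128).abs()
print(AudioVisualFusion()(v, a).shape)           # torch.Size([2, 50, 128])
```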
arXiv Detail & Related papers (2020-04-20T17:53:46Z)