Score Transformer: Generating Musical Score from Note-level Representation
- URL: http://arxiv.org/abs/2112.00355v1
- Date: Wed, 1 Dec 2021 09:08:01 GMT
- Title: Score Transformer: Generating Musical Score from Note-level Representation
- Authors: Masahiro Suzuki
- Abstract summary: We train the Transformer model to transcribe note-level representation into appropriate music notation.
We also explore an effective notation-level token representation to work with the model.
- Score: 2.3554584457413483
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we explore the tokenized representation of musical scores
using the Transformer model to automatically generate musical scores. Thus far,
sequence models have yielded fruitful results with note-level (MIDI-equivalent)
symbolic representations of music. Although note-level representations contain
sufficient information to reproduce music aurally, they lack the information
needed to represent music visually in terms of notation. Musical
scores contain various musical symbols (e.g., clef, key signature, and notes)
and attributes (e.g., stem direction, beam, and tie) that enable us to visually
comprehend musical content. However, automated estimation of these elements has
yet to be comprehensively addressed. In this paper, we first design a score
token representation corresponding to the various musical elements. We then
train a Transformer model to transcribe the note-level representation into
appropriate music notation. Evaluations on popular piano scores show that the
proposed method significantly outperforms existing methods on all 12 musical
aspects that were investigated. We also explore effective notation-level token
representations to work with the model and find that our proposed
representation produces the most stable results.
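To make the score-token idea concrete, here is a minimal sketch of what a notation-level token sequence might look like for one measure, alongside the note-level input it would be transcribed from. The token names and vocabulary below are illustrative assumptions, not the paper's exact representation.
```python
# Hypothetical notation-level (score) tokens for one 4/4 measure in G major,
# covering the symbol/attribute kinds the paper lists: clef, key signature,
# notes, stem direction, beam, and tie. Token names are assumptions.
score_tokens = [
    "clef_treble", "key_sharp_1", "time_4/4", "bar",
    "note_E5", "len_8th", "stem_up", "beam_start",
    "note_F#5", "len_8th", "stem_up", "beam_end",
    "note_G5", "len_quarter", "stem_up",
    "note_G5", "len_half", "stem_up", "tie_start",
    "bar",
    "note_G5", "len_half", "stem_up", "tie_end",
]

# Note-level (MIDI-equivalent) input: pitch and duration only, with none of
# the visual attributes above. Durations are in ticks (480 per quarter note).
note_tokens = [
    "pitch_76", "dur_240",   # E5, eighth
    "pitch_78", "dur_240",   # F#5, eighth
    "pitch_79", "dur_480",   # G5, quarter
    "pitch_79", "dur_1920",  # G5, two tied halves (sounds as one note)
]
# The model's task is the sequence-to-sequence mapping:
#   note_tokens -> score_tokens
```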
Related papers
- End-to-end Piano Performance-MIDI to Score Conversion with Transformers [26.900974153235456]
We present an end-to-end deep learning approach that constructs detailed musical scores directly from real-world piano performance-MIDI files.
We introduce a modern transformer-based architecture with a novel tokenized representation for symbolic music data.
Our method is also the first to directly predict notational details like trill marks or stem direction from performance data.
arXiv Detail & Related papers (2024-09-30T20:11:37Z)
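For the performance-MIDI-to-score entry above, the input side of such a system can be sketched as flattening a recorded MIDI file into note-level tokens. This is a hedged illustration using the pretty_midi library; the file name and token scheme are assumptions, not the paper's.
```python
import pretty_midi

midi = pretty_midi.PrettyMIDI("performance.mid")  # hypothetical file name
notes = sorted(
    (note for inst in midi.instruments for note in inst.notes),
    key=lambda n: (n.start, n.pitch),
)

# One illustrative scheme: time shift from the previous onset, pitch, and
# played duration, all as string tokens (quantization omitted for brevity).
tokens, prev_onset = [], 0.0
for n in notes:
    tokens.append(f"shift_{n.start - prev_onset:.3f}")
    tokens.append(f"pitch_{n.pitch}")
    tokens.append(f"dur_{n.end - n.start:.3f}")
    prev_onset = n.start
```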
- Toward a More Complete OMR Solution [49.74172035862698]
Optical music recognition (OMR) aims to convert music notation into digital formats.
One approach to tackle OMR is through a multi-stage pipeline, where the system first detects visual music notation elements in the image.
First, we introduce a music object detector based on YOLOv8, which improves detection performance.
Second, we introduce a supervised training pipeline that completes the notation assembly stage from the detector's output.
arXiv Detail & Related papers (2024-08-31T01:09:12Z)
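For the OMR entry above, the detection stage can be sketched with the ultralytics YOLO API. The weights and image file names are hypothetical; this shows the general detect-then-assemble idea, not the paper's actual training setup.
```python
from ultralytics import YOLO

model = YOLO("music_notation.pt")   # hypothetical custom-trained weights
results = model("score_page.png")   # hypothetical score-page image

for box in results[0].boxes:
    cls_name = model.names[int(box.cls)]     # e.g. "notehead", "clef", "rest"
    x1, y1, x2, y2 = box.xyxy[0].tolist()    # bounding box in pixels
    print(cls_name, float(box.conf), (x1, y1, x2, y2))
```
The assembly stage would then consume these detections to reconstruct the notation graph.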
- Cluster and Separate: a GNN Approach to Voice and Staff Prediction for Score Engraving [5.572472212662453]
This paper approaches the problem of separating the notes from a quantized symbolic music piece (e.g., a MIDI file) into multiple voices and staves.
We propose an end-to-end system based on graph neural networks that clusters notes that belong to the same chord and connects them with edges if they are part of the same voice.
arXiv Detail & Related papers (2024-07-15T14:36:13Z)
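For the voice/staff-separation entry above, here is a minimal sketch of the note-graph formulation: notes become nodes and candidate edges are scored for same-voice membership. A plain MLP edge scorer stands in for the paper's graph neural network, and the note data is a toy example.
```python
import torch
import torch.nn as nn

# Toy note list: (onset_beats, midi_pitch, duration_beats).
notes = torch.tensor([
    [0.0, 72.0, 1.0],   # soprano
    [0.0, 60.0, 1.0],   # bass
    [1.0, 74.0, 1.0],   # candidate soprano successor
    [1.0, 59.0, 1.0],   # candidate bass successor
])

# Candidate edges: note i -> note j when j starts exactly where i ends.
edges = [(i, j) for i in range(len(notes)) for j in range(len(notes))
         if notes[i, 0] + notes[i, 2] == notes[j, 0]]

# Edge scorer over concatenated endpoint features (a real GNN would also
# propagate information between neighboring notes before scoring).
scorer = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 1))
feats = torch.stack([torch.cat([notes[i], notes[j]]) for i, j in edges])
same_voice_logits = scorer(feats).squeeze(-1)  # train with BCE against labels
```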
- Impact of time and note duration tokenizations on deep learning symbolic music modeling [0.0]
We analyze the common tokenization methods and experiment with time and note duration representations.
We demonstrate that explicit information leads to better results depending on the task.
arXiv Detail & Related papers (2023-10-12T16:56:37Z)
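For the tokenization study above, here is a minimal sketch contrasting two common ways to encode time and duration for the same notes: explicit TimeShift/Duration tokens versus implicit NoteOn/NoteOff events. The token names are illustrative, not tied to a specific library.
```python
# Same three notes, two time encodings. Ticks: 480 per quarter note.
notes = [(0, 60, 480), (480, 64, 240), (720, 67, 240)]  # (onset, pitch, dur)

# 1) Explicit time: TimeShift advances the cursor; Duration is its own token.
shift_tokens, cursor = [], 0
for onset, pitch, dur in notes:
    if onset > cursor:
        shift_tokens.append(f"TimeShift_{onset - cursor}")
        cursor = onset
    shift_tokens += [f"Pitch_{pitch}", f"Duration_{dur}"]

# 2) Implicit duration: MIDI-style NoteOn/NoteOff events sorted by time.
events = []
for onset, pitch, dur in notes:
    events += [(onset, f"NoteOn_{pitch}"), (onset + dur, f"NoteOff_{pitch}")]
onoff_tokens = [tok for _, tok in sorted(events, key=lambda e: e[0])]
```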
- RMSSinger: Realistic-Music-Score based Singing Voice Synthesis [56.51475521778443]
RMS-SVS aims to generate high-quality singing voices given realistic music scores with different note types.
We propose RMSSinger, the first RMS-SVS method, which takes realistic music scores as input.
In RMSSinger, we introduce word-level modeling to avoid the time-consuming phoneme duration annotation and the complicated phoneme-level mel-note alignment.
arXiv Detail & Related papers (2023-05-18T03:57:51Z)
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- Signal-domain representation of symbolic music for learning embedding spaces [2.28438857884398]
We introduce a novel representation of symbolic music data, which transforms a polyphonic score into a continuous signal.
We show that our signal-like representation leads to better reconstruction and disentangled features.
arXiv Detail & Related papers (2021-09-08T06:36:02Z)
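For the signal-domain entry above, one way to turn a polyphonic score into a continuous signal is to render it as a smoothed time-by-pitch activation map. This is an assumption-laden sketch of the general idea, not the paper's exact transform.
```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def score_to_signal(notes, fs=100, n_pitches=128, sigma=2.0):
    """notes: list of (onset_sec, duration_sec, midi_pitch) tuples."""
    end = max(onset + dur for onset, dur, _ in notes)
    roll = np.zeros((int(end * fs) + 1, n_pitches))
    for onset, dur, pitch in notes:
        a, b = int(onset * fs), int((onset + dur) * fs)
        roll[a:b + 1, pitch] = 1.0
    # Smooth along time so the representation is continuous rather than binary.
    return gaussian_filter1d(roll, sigma=sigma, axis=0)

signal = score_to_signal([(0.0, 0.5, 60), (0.5, 0.5, 64), (0.0, 1.0, 48)])
```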
- MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training [97.91071692716406]
Symbolic music understanding refers to the understanding of music from symbolic data.
MusicBERT is a large-scale pre-trained model for music understanding.
arXiv Detail & Related papers (2021-06-10T10:13:05Z)
- Embeddings as representation for symbolic music [0.0]
A representation technique that encodes music in a way that preserves musical meaning would improve the results of any model trained for computer music tasks.
In this paper, we experiment with embeddings to represent musical notes from three variations of a dataset and analyze whether the model can capture useful musical patterns.
arXiv Detail & Related papers (2020-05-19T13:04:02Z)
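For the note-embedding entry above, here is a minimal sketch of how such embeddings are typically learned: an embedding table trained with a next-note prediction objective. Vocabulary size, dimensionality, and the objective are illustrative choices, not the paper's configuration.
```python
import torch
import torch.nn as nn

vocab_size, dim = 128, 64           # one token per MIDI pitch, 64-d vectors
embed = nn.Embedding(vocab_size, dim)
head = nn.Linear(dim, vocab_size)   # predicts the following pitch token

seq = torch.tensor([60, 64, 67, 72, 67, 64])  # toy pitch sequence
logits = head(embed(seq[:-1]))                # context -> next-note logits
loss = nn.functional.cross_entropy(logits, seq[1:])
loss.backward()
# After training, rows of embed.weight can be probed for musical structure,
# e.g. nearest neighbors of a pitch or octave/fifth relationships.
```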
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
- Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification, and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.