Signal-domain representation of symbolic music for learning embedding spaces
- URL: http://arxiv.org/abs/2109.03454v1
- Date: Wed, 8 Sep 2021 06:36:02 GMT
- Title: Signal-domain representation of symbolic music for learning embedding spaces
- Authors: Mathieu Prang (IRCAM), Philippe Esling
- Abstract summary: We introduce a novel representation of symbolic music data, which transforms a polyphonic score into a continuous signal.
We show that our signal-like representation leads to better reconstruction and disentangled features.
- Score: 2.28438857884398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key aspect of machine learning models lies in their ability to learn
efficient intermediate features. However, the input representation plays a
crucial role in this process, and polyphonic musical scores remain a
particularly complex type of information. In this paper, we introduce a novel
representation of symbolic music data, which transforms a polyphonic score into
a continuous signal. We evaluate, from a musical point of view, the ability to
learn meaningful features from this representation. To this end, we introduce an
evaluation method relying on principled generation of synthetic data. Finally,
to test our proposed representation, we conduct an extensive benchmark against
recent polyphonic symbolic representations. We show that our signal-like
representation leads to better reconstruction and disentangled features. This
improvement is reflected in the metric properties and in the generation ability
of the space learned from our signal-like representation, with respect to music
theory properties.
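The abstract does not spell out the transform itself. As a purely hypothetical illustration of what turning a polyphonic score into a continuous signal can look like, the sketch renders each note as a sine wave at its equal-tempered fundamental frequency and sums the notes that sound at each instant. The function names, the `(pitch, onset, length)` note format and the 8 kHz sampling rate are assumptions, not the authors' method.

```python
import numpy as np


def midi_to_hz(pitch: float) -> float:
    """Equal-tempered frequency of a MIDI pitch (MIDI 69 = A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)


def score_to_signal(notes, duration, sample_rate=8000):
    """Render a polyphonic score as one continuous signal (hypothetical sketch).

    notes: iterable of (midi_pitch, onset_s, length_s) tuples.
    Each note adds a unit-amplitude sine at its fundamental frequency while it
    sounds; overlapping notes simply sum. This is an illustrative assumption,
    not the transform used in the paper. Keep fundamentals below sample_rate / 2.
    """
    n_samples = int(round(duration * sample_rate))
    t = np.arange(n_samples) / sample_rate  # sample times in seconds
    signal = np.zeros(n_samples)
    for pitch, onset, length in notes:
        active = (t >= onset) & (t < onset + length)  # samples where the note sounds
        signal[active] += np.sin(2.0 * np.pi * midi_to_hz(pitch) * t[active])
    return signal


# Usage: a C major triad (C4, E4, G4) held for one second.
triad = [(60, 0.0, 1.0), (64, 0.0, 1.0), (67, 0.0, 1.0)]
sig = score_to_signal(triad, duration=1.0)
spectrum = np.abs(np.fft.rfft(sig))
freqs = np.fft.rfftfreq(sig.size, d=1.0 / 8000)
peaks = sorted(round(float(f)) for f in freqs[np.argsort(spectrum)[-3:]])
print(peaks)  # [262, 330, 392]
```

Because all notes are superposed in a single waveform, the chord appears as three spectral peaks whose frequency ratios encode its intervals. Whether the paper's representation exposes musical structure in exactly this way is not stated in the abstract.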
Related papers
- Knowledge Discovery in Optical Music Recognition: Enhancing Information Retrieval with Instance Segmentation [0.0]
Optical Music Recognition (OMR) automates the transcription of musical notation from images into machine-readable formats like MusicXML, MEI, or MIDI.
This study explores knowledge discovery in OMR by applying instance segmentation using Mask R-CNN to enhance the detection and delineation of musical symbols in sheet music.
(2024-08-27T12:34:41Z)
- Impact of time and note duration tokenizations on deep learning symbolic music modeling [0.0]
We analyze the common tokenization methods and experiment with time and note duration representations.
We demonstrate that explicit information leads to better results depending on the task.
(2023-10-12T16:56:37Z)
- Cadence Detection in Symbolic Classical Music using Graph Neural Networks [7.817685358710508]
We present a graph representation of symbolic scores as an intermediate means to solve the cadence detection task.
We approach cadence detection as an imbalanced node classification problem using a Graph Convolutional Network.
Our experiments suggest that graph convolution can learn non-local features that assist in cadence detection, freeing us from having to devise specialized features that encode non-local context.
(2022-08-31T12:39:57Z)
- Towards Disentangled Speech Representations [65.7834494783044]
We construct a representation learning task based on joint modeling of ASR and TTS.
We seek to learn a representation of audio that disentangles that part of the speech signal that is relevant to transcription from that part which is not.
We show that enforcing these properties during training improves WER by 24.5% relative on average for our joint modeling task.
(2022-08-28T10:03:55Z)
- Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition [48.56414496900755]
This work uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores.
Phoneme recognition experiments were additionally performed to show that gestural scores indeed code phonological information successfully.
(2022-04-01T14:25:19Z)
- Score Transformer: Generating Musical Score from Note-level Representation [2.3554584457413483]
We train the Transformer model to transcribe note-level representation into appropriate music notation.
We also explore an effective notation-level token representation to work with the model.
(2021-12-01T09:08:01Z)
- Sequence Generation using Deep Recurrent Networks and Embeddings: A study case in music [69.2737664640826]
This paper evaluates different types of memory mechanisms (memory cells) and analyses their performance in the field of music composition.
A set of quantitative metrics is presented to evaluate the performance of the proposed architecture automatically.
(2020-12-02T14:19:19Z)
- Vector-Quantized Timbre Representation [53.828476137089325]
This paper targets a more flexible synthesis of an individual timbre by learning an approximate decomposition of its spectral properties with a set of generative features.
We introduce an auto-encoder with a discrete latent space that is disentangled from loudness in order to learn a quantized representation of a given timbre distribution.
We detail results for translating audio between orchestral instruments and singing voice, as well as transfers from vocal imitations to instruments.
(2020-07-13T12:35:45Z)
- Embeddings as representation for symbolic music [0.0]
A representation technique that allows encoding music in a way that contains musical meaning would improve the results of any model trained for computer music tasks.
In this paper, we experiment with embeddings to represent musical notes from 3 different variations of a dataset and analyze if the model can capture useful musical patterns.
(2020-05-19T13:04:02Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
(2020-04-20T17:53:46Z)
- Audio Impairment Recognition Using a Correlation-Based Feature Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
(2020-03-22T13:34:37Z)
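The tokenization entry in the list (Impact of time and note duration tokenizations on deep learning symbolic music modeling) contrasts implicit and explicit encodings of note duration. A minimal sketch of the two styles, assuming integer tick times and made-up token names (`NoteOn_`, `NoteOff_`, `Pitch_`, `Duration_`, `TimeShift_`) rather than the vocabulary of that paper or of any specific library:

```python
from typing import List, Tuple

# A note is (pitch, onset, duration), with times in integer ticks (an assumption).
Note = Tuple[int, int, int]


def tokenize_note_off(notes: List[Note]) -> List[str]:
    """Implicit durations: a note's length is only recoverable by pairing NoteOn/NoteOff."""
    events = []
    for pitch, onset, dur in notes:
        events.append((onset, 1, pitch, f"NoteOn_{pitch}"))
        events.append((onset + dur, 0, pitch, f"NoteOff_{pitch}"))
    events.sort()  # by time; at equal times NoteOff (0) precedes NoteOn (1)
    tokens, now = [], 0
    for time, _, _, tok in events:
        if time > now:  # advance the clock before emitting the event
            tokens.append(f"TimeShift_{time - now}")
            now = time
        tokens.append(tok)
    return tokens


def tokenize_duration(notes: List[Note]) -> List[str]:
    """Explicit durations: each Pitch token is immediately followed by its Duration."""
    tokens, now = [], 0
    for pitch, onset, dur in sorted(notes, key=lambda n: (n[1], n[0])):
        if onset > now:  # advance the clock to this note's onset
            tokens.append(f"TimeShift_{onset - now}")
            now = onset
        tokens += [f"Pitch_{pitch}", f"Duration_{dur}"]
    return tokens


notes = [(60, 0, 4), (64, 0, 2), (67, 2, 2)]
print(tokenize_note_off(notes))
# ['NoteOn_60', 'NoteOn_64', 'TimeShift_2', 'NoteOff_64', 'NoteOn_67',
#  'TimeShift_2', 'NoteOff_60', 'NoteOff_67']
print(tokenize_duration(notes))
# ['Pitch_60', 'Duration_4', 'Pitch_64', 'Duration_2', 'TimeShift_2',
#  'Pitch_67', 'Duration_2']
```

In the duration form, the length of C4 (4 ticks) sits right next to its pitch token; in the NoteOn/NoteOff form, a model must match `NoteOff_60` to a `NoteOn_60` six tokens earlier. Which encoding helps more depends on the task, as that entry reports.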
This list is automatically generated from the titles and abstracts of the papers on this site.