Self-Supervised Hierarchical Metrical Structure Modeling
- URL: http://arxiv.org/abs/2210.17183v1
- Date: Mon, 31 Oct 2022 10:05:19 GMT
- Title: Self-Supervised Hierarchical Metrical Structure Modeling
- Authors: Junyan Jiang and Gus Xia
- Abstract summary: We propose a novel method to model hierarchical metrical structures for both symbolic music and audio signals.
The model trains and performs inference on beat-aligned music signals and predicts an 8-layer hierarchical metrical tree spanning the beat, measure, and section levels.
All demos, source code and pre-trained models are publicly available on GitHub.
- Score: 3.167685495996986
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel method to model hierarchical metrical structures for both
symbolic music and audio signals in a self-supervised manner with minimal
domain knowledge. The model trains and performs inference on beat-aligned music
signals and predicts an 8-layer hierarchical metrical tree spanning the beat,
measure, and section levels. The training procedure does not require any
hierarchical metrical labeling except for beats, relying purely on metrical
regularity and inter-voice consistency as inductive biases. We show in
experiments that the method achieves performance comparable to supervised
baselines on multiple metrical structure analysis tasks for both symbolic music
and audio signals. All demos, source code, and pre-trained models are publicly
available on GitHub.
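For readers unfamiliar with the representation, a metrical tree of this kind is often encoded as one level label per beat (the highest layer at which that beat starts a group). The sketch below is a minimal illustration of that convention with made-up labels; it is not the authors' released code.

```python
# Minimal sketch: a hierarchical metrical tree encoded as per-beat levels,
# where each beat is labeled with the highest metrical layer it initiates
# (e.g. level 8 = section ... level 1 = beat). Hypothetical example only.

def levels_to_groups(beat_levels, level):
    """Group beat indices into spans that start whenever a beat's
    metrical level is >= `level` (a boundary at that layer)."""
    groups, current = [], []
    for i, lv in enumerate(beat_levels):
        if lv >= level and current:
            groups.append(current)
            current = []
        current.append(i)
    if current:
        groups.append(current)
    return groups

# A toy 8-beat excerpt in 4/4: a section boundary at beat 0, downbeats
# every 4 beats, and a half-measure accent every 2 beats.
beat_levels = [8, 1, 2, 1, 3, 1, 2, 1]

for level in (3, 2, 1):
    print(f"level {level}:", levels_to_groups(beat_levels, level))
# level 3: [[0, 1, 2, 3], [4, 5, 6, 7]]      -> measures
# level 2: [[0, 1], [2, 3], [4, 5], [6, 7]]  -> half measures
# level 1: [[0], [1], [2], ...]              -> beats
```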
Related papers
- Whole-Song Hierarchical Generation of Symbolic Music Using Cascaded Diffusion Models [5.736540322759929]
We make the first attempt to model a full music piece by realizing its compositional hierarchy.
High-level languages capture whole-song form, phrases, and cadences, whereas low-level languages focus on notes, chords, and their local patterns.
Experiments and analysis show that our model is capable of generating full-piece music with recognizable global verse-chorus structure and cadences.
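As a rough illustration of the cascaded idea, each stage generates one level of the hierarchy conditioned on the coarser levels above it. The stage functions below are toy stand-ins, not the paper's diffusion models:

```python
# Toy sketch of cascaded hierarchical generation: each stage produces a
# finer level conditioned on all coarser levels already generated. The
# stage functions are stand-ins for the paper's cascaded diffusion models.

def generate_form():                     # coarsest level: section layout
    return ["intro", "verse", "chorus", "verse", "chorus"]

def generate_phrases(form):              # phrase plan, conditioned on form
    return [(section, 2) for section in form]   # e.g. two phrases each

def generate_notes(phrases):             # surface level, conditioned above
    return [f"{sec}-phrase{i}" for sec, n in phrases for i in range(n)]

form = generate_form()
phrases = generate_phrases(form)
notes = generate_notes(phrases)
print(form)
print(notes)
```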
arXiv Detail & Related papers (2024-05-16T08:48:23Z) - Structure-informed Positional Encoding for Music Generation [0.0]
We propose a structure-informed positional encoding framework for music generation with Transformers.
We test these encodings on two symbolic music generation tasks: next-timestep prediction and accompaniment generation.
Our methods improve the melodic and structural consistency of the generated pieces.
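One plausible reading of "structure-informed" positional encoding, sketched below under assumed design choices (the `beats_per_bar` split is illustrative, not necessarily the paper's formulation), is to encode a token's bar index and within-bar offset separately:

```python
import numpy as np

def sinusoidal(pos, dim):
    """Standard sinusoidal encoding of a scalar position into `dim` dims."""
    i = np.arange(dim // 2)
    angles = pos / (10000 ** (2 * i / dim))
    return np.concatenate([np.sin(angles), np.cos(angles)])

def structured_pe(abs_pos, beats_per_bar=16, dim=64):
    """Encode (bar index, within-bar offset) separately and concatenate,
    so tokens at the same metrical position share part of their encoding."""
    bar, offset = divmod(abs_pos, beats_per_bar)
    return np.concatenate([sinusoidal(bar, dim // 2),
                           sinusoidal(offset, dim // 2)])

pe = np.stack([structured_pe(t) for t in range(128)])
print(pe.shape)  # (128, 64)
```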
arXiv Detail & Related papers (2024-02-20T13:41:35Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen consists of a single-stage transformer LM together with efficient token interleaving patterns.
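A well-known interleaving scheme in this family is the "delay" pattern, where codebook k is shifted by k steps so that a single transformer step can emit all codebooks in parallel while respecting their dependency order. A minimal sketch (the `PAD` sentinel and stream values are illustrative):

```python
# Sketch of a "delay" token-interleaving pattern over K codebook streams.
PAD = -1

def delay_interleave(streams):
    """streams: K lists of equal length T -> K x (T + K - 1) padded grid."""
    K, T = len(streams), len(streams[0])
    out = [[PAD] * (T + K - 1) for _ in range(K)]
    for k, stream in enumerate(streams):
        for t, tok in enumerate(stream):
            out[k][t + k] = tok
    return out

streams = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]  # K=3 codebooks, T=3
for row in delay_interleave(streams):
    print(row)
# [10, 11, 12, -1, -1]
# [-1, 20, 21, 22, -1]
# [-1, -1, 30, 31, 32]
```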
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - Unsupervised Melody-to-Lyric Generation [91.29447272400826]
We propose a method for generating high-quality lyrics without training on any aligned melody-lyric data.
We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints.
Our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines.
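As a toy illustration of melody-derived decoding constraints, the sketch below enforces a per-phrase syllable budget during generation; the vocabulary, syllable counts, and `generate_line` helper are all hypothetical:

```python
# Toy sketch: each melody phrase fixes a syllable budget, and candidate
# words are filtered so the lyric line matches it exactly.
import random

VOCAB = {"love": 1, "night": 1, "forever": 3, "shining": 2, "star": 1,
         "beautiful": 3, "dream": 1, "away": 2}

def generate_line(syllable_budget, rng):
    line, remaining = [], syllable_budget
    while remaining > 0:
        options = [w for w, s in VOCAB.items() if s <= remaining]
        word = rng.choice(options)
        line.append(word)
        remaining -= VOCAB[word]
    return " ".join(line)

rng = random.Random(0)
melody_phrases = [4, 6, 4]  # syllable counts implied by each phrase's notes
for budget in melody_phrases:
    print(generate_line(budget, rng))
```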
arXiv Detail & Related papers (2023-05-30T17:20:25Z) - Learning Hierarchical Metrical Structure Beyond Measures [3.7294116330265394]
Hierarchical structure annotations are helpful for music information retrieval and computer musicology.
We propose a data-driven approach to automatically extract hierarchical metrical structures from scores.
We show by experiments that the proposed method performs better than the rule-based approach under different orchestration settings.
arXiv Detail & Related papers (2022-09-21T11:08:52Z) - Learning with Neighbor Consistency for Noisy Labels [69.83857578836769]
We present a method for learning from noisy labels that leverages similarities between training examples in feature space.
We evaluate our method on datasets with both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, mini-ImageNet-Red) label noise.
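A minimal sketch of a neighbor-consistency regularizer in this spirit: each example's prediction is pulled toward a feature-similarity-weighted average of its nearest neighbors' predictions. This is an assumed form, not necessarily the paper's exact loss:

```python
import torch
import torch.nn.functional as F

def neighbor_consistency_loss(features, logits, k=3):
    """KL between each example's prediction and a similarity-weighted
    mixture of its k nearest neighbors' predictions in feature space."""
    f = F.normalize(features, dim=1)
    sim = f @ f.t()
    sim.fill_diagonal_(float("-inf"))           # exclude self-matches
    topk_vals, topk_idx = sim.topk(k, dim=1)
    weights = F.softmax(topk_vals, dim=1)       # (N, k) neighbor weights
    probs = F.softmax(logits, dim=1)            # (N, C) predictions
    neighbor_probs = (weights.unsqueeze(-1) * probs[topk_idx]).sum(dim=1)
    log_p = F.log_softmax(logits, dim=1)
    return F.kl_div(log_p, neighbor_probs, reduction="batchmean")

feats, logits = torch.randn(32, 16), torch.randn(32, 10)
print(neighbor_consistency_loss(feats, logits).item())
```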
arXiv Detail & Related papers (2022-02-04T15:46:27Z) - Noisy Labels Can Induce Good Representations [53.47668632785373]
We study how architecture affects learning with noisy labels.
We show that training with noisy labels can induce useful hidden representations, even when the model generalizes poorly.
This finding leads to a simple method to improve models trained on noisy labels.
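One simple way to exploit this finding, sketched below with synthetic data and hypothetical module names: freeze the encoder trained on noisy labels and fit a fresh linear head on a small clean subset:

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
noisy_head = nn.Linear(16, 10)   # head used during noisy-label training

# ... assume encoder + noisy_head were trained on noisy labels here ...

for p in encoder.parameters():   # freeze the learned representation
    p.requires_grad_(False)

clean_x, clean_y = torch.randn(100, 32), torch.randint(0, 10, (100,))
probe = nn.Linear(16, 10)        # fresh head, fit on a small clean subset
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(50):
    loss = nn.functional.cross_entropy(probe(encoder(clean_x)), clean_y)
    opt.zero_grad(); loss.backward(); opt.step()
print("probe loss:", loss.item())
```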
arXiv Detail & Related papers (2020-12-23T18:58:05Z) - Sequence Generation using Deep Recurrent Networks and Embeddings: A study case in music [69.2737664640826]
This paper evaluates different types of memory mechanisms (memory cells) and analyses their performance in the field of music composition.
A set of quantitative metrics is presented to evaluate the performance of the proposed architecture automatically.
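The comparison boils down to swapping the memory cell while holding everything else fixed; a minimal sketch with toy data (the `MelodyModel` wrapper is illustrative):

```python
import torch
import torch.nn as nn

class MelodyModel(nn.Module):
    """Next-token model over notes; only the memory cell varies."""
    def __init__(self, cell, vocab=128, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = cell(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

x = torch.randint(0, 128, (8, 32))        # batch of toy note sequences
for cell in (nn.LSTM, nn.GRU):
    model = MelodyModel(cell)
    logits = model(x[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, 128), x[:, 1:].reshape(-1))
    print(cell.__name__,
          "params:", sum(p.numel() for p in model.parameters()),
          "loss:", round(loss.item(), 3))
```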
arXiv Detail & Related papers (2020-12-02T14:19:19Z) - A Framework for Generative and Contrastive Learning of Audio Representations [2.8935588665357077]
We present a framework for contrastive learning of audio representations in a self-supervised setting without access to ground-truth labels.
We also explore generative models based on state-of-the-art transformer architectures for learning latent spaces for audio signals.
Our system achieves considerable performance compared to a fully supervised method that has access to ground-truth labels for training.
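A standard instantiation of such contrastive learning is the InfoNCE objective over two augmented views of each clip; the sketch below assumes random embeddings in place of a real audio encoder:

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE: matching views of the same clip are positives (diagonal),
    all other pairs in the batch serve as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # (N, N) similarity matrix
    targets = torch.arange(z1.size(0))      # positives on the diagonal
    return F.cross_entropy(logits, targets)

# Stand-ins for embeddings of two augmentations of the same 16 clips.
z1, z2 = torch.randn(16, 128), torch.randn(16, 128)
print(info_nce(z1, z2).item())
```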
arXiv Detail & Related papers (2020-10-22T05:52:32Z) - Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z) - Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions [14.786601824794369]
We present a model for composing melodies given a user-specified symbolic scenario combined with a preceding music context.
Our model generates long melodies by treating 8-beat note sequences as basic units, and shares a consistent rhythm-pattern structure with a given reference song.
Results show that the music generated by our model tends to have salient repetition structures, rich motives, and stable rhythm patterns.
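As a toy illustration of unit-based generation, the sketch below assembles a melody from 8-beat units, each conditioned on the previous unit and a rhythm template from a reference song; `sample_unit` and the template are hypothetical stand-ins for the learned model:

```python
import random

def sample_unit(prev_unit, rhythm_template, rng):
    """Reuse the reference rhythm; pick pitches near the previous unit."""
    anchor = prev_unit[-1] if prev_unit else 60   # MIDI middle C default
    return [anchor + rng.randint(-2, 2) for _ in rhythm_template]

rng = random.Random(7)
reference_rhythm = [1, 1, 0.5, 0.5, 1, 1, 1, 2]   # durations summing to 8 beats
melody, unit = [], []
for _ in range(4):                                 # four 8-beat units
    unit = sample_unit(unit, reference_rhythm, rng)
    melody.extend(unit)
print(melody)
```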
arXiv Detail & Related papers (2020-02-05T06:23:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.