Generating Music with a Self-Correcting Non-Chronological Autoregressive Model
- URL: http://arxiv.org/abs/2008.08927v1
- Date: Tue, 18 Aug 2020 20:36:47 GMT
- Title: Generating Music with a Self-Correcting Non-Chronological Autoregressive Model
- Authors: Wayne Chi, Prachi Kumar, Suri Yaddanapudi, Rahul Suresh, Umut Isik
- Abstract summary: We describe a novel approach for generating music using a self-correcting, non-chronological, autoregressive model.
We represent music as a sequence of edit events, each of which denotes either the addition or removal of a note.
During inference, we generate one edit event at a time using direct ancestral sampling.
- Score: 6.289267097017553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe a novel approach for generating music using a self-correcting,
non-chronological, autoregressive model. We represent music as a sequence of
edit events, each of which denotes either the addition or removal of a
note---even a note previously generated by the model. During inference, we
generate one edit event at a time using direct ancestral sampling. Our approach
allows the model to fix previous mistakes such as incorrectly sampled notes and
prevent accumulation of errors which autoregressive models are prone to have.
Another benefit is a finer, note-by-note control during human and AI
collaborative composition. We show through quantitative metrics and human
survey evaluation that our approach generates better results than orderless
NADE and Gibbs sampling approaches.
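The edit-event representation and ancestral-sampling loop described in the abstract lend themselves to a compact illustration. The sketch below is a hypothetical reading of that description, not the authors' implementation: the EditEvent fields and the `model` interface (a callable returning candidate edits with probabilities, conditioned on the edit history) are assumptions made for illustration only.

```python
# Minimal sketch of an edit-event representation and a direct ancestral
# sampling loop, following the abstract's description. The EditEvent fields
# and the `model` callable are hypothetical stand-ins, not the paper's code.
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EditEvent:
    kind: str      # "add" or "remove"
    pitch: int     # MIDI pitch of the note being added or removed
    onset: int     # onset time, e.g. in ticks
    duration: int  # note length, e.g. in ticks


def sample_edit_sequence(model, max_events: int = 64):
    """Generate a piece one edit event at a time via direct ancestral sampling.

    `model(history)` is assumed to return a list of (EditEvent, probability)
    pairs conditioned on the edits generated so far. Because "remove" events
    may target notes the model itself added earlier, mistakes can be undone.
    """
    history: list[EditEvent] = []
    notes: set[tuple[int, int, int]] = set()  # current score as (pitch, onset, duration)
    for _ in range(max_events):
        candidates = model(history)                       # assumed model interface
        events, probs = zip(*candidates)
        event = random.choices(events, weights=probs, k=1)[0]
        history.append(event)
        key = (event.pitch, event.onset, event.duration)
        if event.kind == "add":
            notes.add(key)
        elif event.kind == "remove":
            notes.discard(key)  # self-correction: drop a previously added note
    return notes
```

Because a "remove" event may target any note currently in the score, including one the model added earlier, an incorrectly sampled note can be deleted later in the sequence; this is the self-correction mechanism the abstract describes.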
Related papers
- Autoregressive Speech Synthesis without Vector Quantization [135.4776759536272]
We present MELLE, a novel continuous-valued token-based language modeling approach for text-to-speech synthesis (TTS).
MELLE autoregressively generates continuous mel-spectrogram frames directly from the text condition.
arXiv Detail & Related papers (2024-07-11T14:36:53Z)
- Serenade: A Model for Human-in-the-loop Automatic Chord Estimation [1.6385815610837167]
We evaluate our model on a dataset of popular music and show that, with this human-in-the-loop approach, harmonic analysis performance improves over a model-only approach.
arXiv Detail & Related papers (2023-10-17T11:31:29Z)
- Incomplete Utterance Rewriting as Sequential Greedy Tagging [0.0]
We introduce speaker-aware embedding to model speaker variation.
Our model achieves optimal results on all nine restoration scores while having other metric scores comparable to previous state-of-the-art models.
arXiv Detail & Related papers (2023-07-08T04:05:04Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Deep Graph Reprogramming [112.34663053130073]
"Deep graph reprogramming" is a model reusing task tailored for graph neural networks (GNNs)
We propose an innovative Data Reprogramming paradigm alongside a Model Reprogramming paradigm.
arXiv Detail & Related papers (2023-04-28T02:04:29Z)
- DiffusER: Discrete Diffusion via Edit-based Reconstruction [88.62707047517914]
DiffusER is an edit-based generative model for text based on denoising diffusion models.
It can rival autoregressive models on several tasks spanning machine translation, summarization, and style transfer.
It can also perform other varieties of generation that standard autoregressive models are not well-suited for.
arXiv Detail & Related papers (2022-10-30T16:55:23Z)
- Variable-Length Music Score Infilling via XLNet and Musically Specialized Positional Encoding [37.725607373307646]
This paper proposes a new self-attention based model for music score infilling.
It generates a polyphonic music sequence that fills in the gap between given past and future contexts.
arXiv Detail & Related papers (2021-08-11T07:07:21Z)
- Anomaly Detection of Time Series with Smoothness-Inducing Sequential Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series.
Our model parameterizes mean and variance for each time-stamp with flexible neural networks.
We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
arXiv Detail & Related papers (2021-02-02T06:15:15Z)
- Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization [48.55126268721948]
We present a generative adversarial network (GAN)-based model for unconditional generation of the mel-spectrograms of singing voices.
We employ a hierarchical architecture in the generator to induce some structure in the temporal dimension.
We evaluate the performance of the new model not only for generating singing voices, but also for generating speech voices.
arXiv Detail & Related papers (2020-05-18T08:35:16Z)
- Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions [14.786601824794369]
We present a model for composing melodies given a user-specified symbolic scenario combined with a previous music context.
Our model is capable of generating long melodies by regarding 8-beat note sequences as basic units, and it shares a consistent rhythm-pattern structure with another specific song.
Results show that the music generated by our model tends to have salient repetition structures, rich motives, and stable rhythm patterns.
arXiv Detail & Related papers (2020-02-05T06:23:44Z)