Expressive Music Data Processing and Generation
- URL: http://arxiv.org/abs/2503.11896v1
- Date: Fri, 14 Mar 2025 21:56:07 GMT
- Title: Expressive Music Data Processing and Generation
- Authors: Jingwei Liu
- Abstract summary: Musical expressivity and coherence are indispensable in music composition and performance. We introduce a listening-based data-processing technique that captures the expressivity in musical performance.
- Score: 4.549093083765949
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Musical expressivity and coherence are indispensable in music composition and performance, while often neglected in modern AI generative models. In this work, we introduce a listening-based data-processing technique that captures the expressivity in musical performance. This technique derived from Weber's law reflects the human perceptual truth of listening and preserves musical subtlety and expressivity in the training input. To facilitate musical coherence, we model the output interdependencies among multiple arguments in the music data such as pitch, duration, velocity, etc. in the neural networks based on the probabilistic chain rule. In practice, we decompose the multi-output sequential model into single-output submodels and condition previously sampled outputs on the subsequent submodels to induce conditional distributions. Finally, to select eligible sequences from all generations, a tentative measure based on the output entropy was proposed. The entropy sequence is set as a criterion to select predictable and stable generations, which is further studied under the context of informational aesthetic measures to quantify musical pleasure and information gain along the music tendency.
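The chain-rule decomposition and entropy criterion described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the two toy submodels (`pitch_model`, `duration_model`), their probability tables, and the mean-entropy threshold in `select_stable` are all hypothetical stand-ins chosen for illustration. The sketch only shows the general idea of sampling one output, conditioning the next submodel on it, and recording an entropy value per step.

```python
import math
import random


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def sample(probs, rng):
    """Draw an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical submodels standing in for trained networks.
def pitch_model(history):
    """Distribution over 3 toy pitch classes (ignores history here)."""
    return [0.7, 0.2, 0.1]


def duration_model(history, pitch):
    """Distribution over 2 toy durations, conditioned on the sampled pitch."""
    table = {0: [0.8, 0.2], 1: [0.5, 0.5], 2: [0.3, 0.7]}
    return table[pitch]


def generate(n_steps, rng):
    """Sample (pitch, duration) pairs via p(pitch) * p(duration | pitch).

    Also returns, per step, the sum of the entropies of the two
    distributions that were actually sampled from.
    """
    history, entropies = [], []
    for _ in range(n_steps):
        p_pitch = pitch_model(history)
        pitch = sample(p_pitch, rng)
        # The previously sampled output conditions the next submodel.
        p_dur = duration_model(history, pitch)
        duration = sample(p_dur, rng)
        history.append((pitch, duration))
        entropies.append(entropy(p_pitch) + entropy(p_dur))
    return history, entropies


def select_stable(candidates, max_mean_entropy):
    """Keep sequences whose mean step entropy is at or below a threshold.

    The threshold rule is an assumption made for this sketch.
    """
    return [
        notes
        for notes, ents in candidates
        if sum(ents) / len(ents) <= max_mean_entropy
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    candidates = [generate(8, rng) for _ in range(5)]
    kept = select_stable(candidates, max_mean_entropy=2.0)
    print(f"kept {len(kept)} of {len(candidates)} generations")
```

In a real system each submodel would be a neural network over the note history, and further arguments such as velocity would extend the chain in the same way.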
Related papers
- Naturalistic Music Decoding from EEG Data via Latent Diffusion Models [14.882764251306094]
This study represents an initial foray into achieving high-quality general music reconstruction using non-invasive EEG data. We train our models on the public NMED-T dataset and perform quantitative evaluation proposing neural embedding-based metrics.
arXiv Detail & Related papers (2024-05-15T03:26:01Z)
- A Survey of Music Generation in the Context of Interaction [3.6522809408725223]
Machine learning has been successfully used to compose and generate music, both melodies and polyphonic pieces.
Most of these models are not suitable for human-machine co-creation through live interaction.
arXiv Detail & Related papers (2024-02-23T12:41:44Z)
- Generative Pre-training for Speech with Flow Matching [81.59952572752248]
We pre-trained a generative model, named SpeechFlow, on 60k hours of untranscribed speech with Flow Matching and masked conditions.
Experiment results show the pre-trained generative model can be fine-tuned with task-specific data to match or surpass existing expert models on speech enhancement, separation, and synthesis.
arXiv Detail & Related papers (2023-10-25T03:40:50Z)
- Deep Generative Models of Music Expectation [2.900810893770134]
We propose to use modern deep probabilistic generative models in the form of a Diffusion Model to compute an approximate likelihood of a musical input sequence.
Unlike prior work, such a generative model parameterized by deep neural networks is able to learn complex non-linear features directly from a training set itself.
We show that pre-trained diffusion models indeed yield musical surprisal values which exhibit a negative quadratic relationship with measured subject 'liking' ratings.
arXiv Detail & Related papers (2023-10-05T12:25:39Z)
- ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models [67.66825818489406]
This paper introduces a text-to-waveform music generation model, underpinned by the utilization of diffusion models.
Our methodology hinges on the innovative incorporation of free-form textual prompts as conditional factors to guide the waveform generation process.
We demonstrate that our generated music in the waveform domain outperforms previous works by a large margin in terms of diversity, quality, and text-music relevance.
arXiv Detail & Related papers (2023-02-09T06:27:09Z)
- SeqDiffuSeq: Text Diffusion with Encoder-Decoder Transformers [50.90457644954857]
In this work, we apply diffusion models to approach sequence-to-sequence text generation.
We propose SeqDiffuSeq, a text diffusion model for sequence-to-sequence generation.
Experiment results illustrate the good performance on sequence-to-sequence generation in terms of text quality and inference time.
arXiv Detail & Related papers (2022-12-20T15:16:24Z)
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
- Comparision Of Adversarial And Non-Adversarial LSTM Music Generative Models [2.569647910019739]
This work implements and compares adversarial and non-adversarial training of recurrent neural network music composers on MIDI data.
The evaluation indicates that adversarial training produces more aesthetically pleasing music.
arXiv Detail & Related papers (2022-11-01T20:23:49Z)
- Deep Music Information Dynamics [1.6143012623830792]
We introduce a novel framework that combines two parallel streams - a low rate latent representation stream and a higher rate information dynamics derived from the musical data itself.
Motivated by rate-distortion theories of human cognition we propose a framework for exploring possible relations between imaginary anticipations existing in the listener's mind and information dynamics of the musical surface itself.
arXiv Detail & Related papers (2021-02-01T19:59:59Z)
- Sequence Generation using Deep Recurrent Networks and Embeddings: A study case in music [69.2737664640826]
This paper evaluates different types of memory mechanisms (memory cells) and analyses their performance in the field of music composition.
A set of quantitative metrics is presented to evaluate the performance of the proposed architecture automatically.
arXiv Detail & Related papers (2020-12-02T14:19:19Z)
- Melody-Conditioned Lyrics Generation with SeqGANs [81.2302502902865]
We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN).
We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
arXiv Detail & Related papers (2020-10-28T02:35:40Z)
- Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions [14.786601824794369]
We present a model for composing melodies given a user specified symbolic scenario combined with a previous music context.
Our model is capable of generating long melodies by regarding 8-beat note sequences as basic units, and shares consistent rhythm pattern structure with another specific song.
Results show that the music generated by our model tends to have salient repetition structures, rich motives, and stable rhythm patterns.
arXiv Detail & Related papers (2020-02-05T06:23:44Z)