Investigating Personalization Methods in Text to Music Generation
- URL: http://arxiv.org/abs/2309.11140v1
- Date: Wed, 20 Sep 2023 08:36:34 GMT
- Title: Investigating Personalization Methods in Text to Music Generation
- Authors: Manos Plitsis, Theodoros Kouzelis, Georgios Paraskevopoulos, Vassilis
Katsouros, Yannis Panagakis
- Abstract summary: Motivated by recent advances in the computer vision domain, we are the first to explore the combination of pre-trained text-to-audio diffusers with two established personalization methods.
For evaluation, we construct a novel dataset with prompts and music clips.
Our analysis shows that similarity metrics are in accordance with user preferences and that current personalization approaches tend to learn rhythmic music constructs more easily than melody.
- Score: 21.71190700761388
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we investigate the personalization of text-to-music diffusion
models in a few-shot setting. Motivated by recent advances in the computer
vision domain, we are the first to explore the combination of pre-trained
text-to-audio diffusers with two established personalization methods. We
experiment with the effect of audio-specific data augmentation on the overall
system performance and assess different training strategies. For evaluation, we
construct a novel dataset with prompts and music clips. We consider both
embedding-based and music-specific metrics for quantitative evaluation, as well
as a user study for qualitative evaluation. Our analysis shows that similarity
metrics are in accordance with user preferences and that current
personalization approaches tend to learn rhythmic music constructs more easily
than melody. The code, dataset, and example material of this study are open to
the research community.
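The abstract does not name the two personalization methods; the established few-shot approaches from the computer vision domain are Textual Inversion and DreamBooth, so the sketch below assumes a Textual Inversion-style setup. All modules are toy stand-ins rather than the authors' code or any real text-to-audio library: the pre-trained diffuser stays frozen and only a single pseudo-token embedding is optimized on the few reference clips.

```python
# Minimal, self-contained sketch of Textual Inversion-style personalization
# of a frozen text-to-audio diffusion model. All classes are illustrative
# stand-ins, not a real library API.
import torch
import torch.nn.functional as F

class ToyTextEncoder(torch.nn.Module):
    """Stand-in for the frozen text encoder of a text-to-audio diffuser."""
    def __init__(self, vocab=100, dim=32):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab, dim)

    def forward(self, token_ids):
        # Crude pooling over the prompt; a real encoder returns a sequence.
        return self.embed(token_ids).mean(dim=1)

class ToyDenoiser(torch.nn.Module):
    """Stand-in for the diffusion denoiser over (latent) audio clips."""
    def __init__(self, audio_dim=64, cond_dim=32):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(audio_dim + cond_dim + 1, 128),
            torch.nn.SiLU(),
            torch.nn.Linear(128, audio_dim),
        )

    def forward(self, noisy, t, cond):
        return self.net(torch.cat([noisy, cond, t], dim=-1))

text_encoder, denoiser = ToyTextEncoder(), ToyDenoiser()
for p in list(text_encoder.parameters()) + list(denoiser.parameters()):
    p.requires_grad_(False)  # the pre-trained weights stay frozen

# Textual Inversion optimizes one new pseudo-token embedding ("<concept>")
# against the few-shot reference clips.
concept = torch.nn.Parameter(torch.randn(1, 32) * 0.01)
optimizer = torch.optim.AdamW([concept], lr=1e-3)

reference_clips = torch.randn(4, 64)        # fake few-shot audio latents
prompt_ids = torch.randint(0, 100, (4, 7))  # tokenized "a recording of <concept>"

for step in range(200):
    # The abstract mentions audio-specific data augmentation; a real
    # pipeline would augment the reference clips here (omitted in this toy).
    cond = text_encoder(prompt_ids) + concept  # inject the learned pseudo-token
    t = torch.rand(4, 1)                       # random diffusion timesteps
    noise = torch.randn_like(reference_clips)
    noisy = (1.0 - t) * reference_clips + t * noise  # toy forward process
    pred = denoiser(noisy, t, cond)
    loss = F.mse_loss(pred, noise)             # standard noise-prediction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

DreamBooth, the other usual candidate, would instead fine-tune the denoiser's weights on the same few-shot objective while binding the concept to a rare token, rather than learning a new embedding.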
Related papers
- Audio-to-Score Conversion Model Based on Whisper methodology [0.0]
This thesis introduces the "Orpheus' Score", a custom notation system that converts music information into tokens.
Experiments show that, compared to traditional algorithms, the model achieves significantly improved accuracy and performance.
arXiv Detail & Related papers (2024-10-22T17:31:37Z)
- Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation [3.8570045844185237]
We present Stem-JEPA, a novel Joint-Embedding Predictive Architecture (JEPA) trained on a multi-track dataset.
Our model comprises two networks: an encoder and a predictor, which are jointly trained to predict the embeddings of compatible stems.
We evaluate our model's performance on a retrieval task on the MUSDB18 dataset, testing its ability to find the missing stem from a mix.
arXiv Detail & Related papers (2024-08-05T14:34:40Z)
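A minimal sketch of the joint-embedding predictive idea summarized above: an encoder embeds the partial mix, a predictor regresses toward the embedding of the compatible stem, and retrieval ranks candidates by embedding similarity. Shapes, feature extraction, and the stop-gradient target are illustrative assumptions, not the Stem-JEPA implementation.

```python
import torch
import torch.nn.functional as F

dim = 128
encoder = torch.nn.Sequential(
    torch.nn.Linear(512, dim), torch.nn.ReLU(), torch.nn.Linear(dim, dim))
predictor = torch.nn.Sequential(
    torch.nn.Linear(dim, dim), torch.nn.ReLU(), torch.nn.Linear(dim, dim))
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(predictor.parameters()), lr=3e-4)

mix_feats = torch.randn(8, 512)   # fake features of partial mixes
stem_feats = torch.randn(8, 512)  # fake features of the held-out stems

for step in range(100):
    # Stop-gradient on the target branch (JEPAs typically use an EMA
    # target encoder; detach is a simplification).
    target = encoder(stem_feats).detach()
    pred = predictor(encoder(mix_feats))  # predict the missing stem's embedding
    loss = F.mse_loss(pred, target)       # regression in embedding space
    opt.zero_grad()
    loss.backward()
    opt.step()

# Retrieval as in the MUSDB18 evaluation: rank candidate stems by
# similarity between predicted and actual stem embeddings.
scores = F.cosine_similarity(pred.unsqueeze(1), target.unsqueeze(0), dim=-1)
```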
- Evaluating Co-Creativity using Total Information Flow [6.3289703660543495]
Co-creativity in music refers to two or more musicians or musical agents interacting with one another by composing or improvising music.
We propose a method to compute the information flow using pre-trained generative models as entropy estimators.
arXiv Detail & Related papers (2024-02-09T22:15:39Z)
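A hedged sketch of the entropy-estimation idea above: the average negative log-likelihood assigned by a pre-trained generative model upper-bounds the conditional entropy rate, so information flow between two musicians' streams can be approximated as a difference of conditional cross-entropies. The toy models and conditioning scheme below are placeholders, not the paper's formulation.

```python
import torch

def avg_nll(model, context, next_token):
    """Average negative log-likelihood in nats: a cross-entropy estimate
    of the conditional entropy H(next | context)."""
    with torch.no_grad():
        logp = torch.log_softmax(model(context), dim=-1)
        return -logp.gather(1, next_token.unsqueeze(1)).mean().item()

vocab = 16
model_y = torch.nn.Linear(8, vocab)    # toy estimator conditioned on Y's history
model_xy = torch.nn.Linear(16, vocab)  # toy estimator conditioned on X and Y

y_past = torch.randn(32, 8)    # fake encodings of musician Y's recent history
xy_past = torch.randn(32, 16)  # fake joint encodings of both musicians
y_next = torch.randint(0, vocab, (32,))

# flow(X -> Y) ~ H(Y_t | Y_past) - H(Y_t | Y_past, X_past): how much
# knowing X's stream reduces uncertainty about Y's next event.
flow = avg_nll(model_y, y_past, y_next) - avg_nll(model_xy, xy_past, y_next)
print(f"estimated information flow: {flow:.3f} nats")
```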
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
- Unsupervised Neural Stylistic Text Generation using Transfer learning and Adapters [66.17039929803933]
We propose a novel transfer learning framework which updates only 0.3% of model parameters to learn style-specific attributes for response generation.
We learn style-specific attributes from the PERSONALITY-CAPTIONS dataset.
arXiv Detail & Related papers (2022-10-07T00:09:22Z)
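Updating only a small fraction of parameters, as described above, is what adapter modules achieve: small bottleneck layers inserted into a frozen backbone. A minimal sketch, with illustrative sizes that do not reproduce the paper's 0.3% figure:

```python
import torch

class Adapter(torch.nn.Module):
    """Residual bottleneck adapter inserted into a frozen backbone."""
    def __init__(self, dim=768, bottleneck=16):
        super().__init__()
        self.down = torch.nn.Linear(dim, bottleneck)
        self.up = torch.nn.Linear(bottleneck, dim)

    def forward(self, h):
        return h + self.up(torch.relu(self.down(h)))  # residual bottleneck

backbone = torch.nn.Linear(768, 768)  # stand-in for one frozen backbone layer
for p in backbone.parameters():
    p.requires_grad_(False)
adapter = Adapter()

trainable = sum(p.numel() for p in adapter.parameters())
total = trainable + sum(p.numel() for p in backbone.parameters())
print(f"trainable fraction: {trainable / total:.1%}")  # only the adapter trains
```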
- Actions Speak Louder than Listening: Evaluating Music Style Transfer based on Editing Experience [4.986422167919228]
We propose an editing test to evaluate users' editing experience of music generation models in a systematic way.
Results on two target styles indicate that the improvement over the baseline model can be reflected by the editing test quantitatively.
arXiv Detail & Related papers (2021-10-25T12:20:30Z)
- Structure-Aware Audio-to-Score Alignment using Progressively Dilated Convolutional Neural Networks [8.669338893753885]
The identification of structural differences between a music performance and the score is a challenging yet integral step of audio-to-score alignment.
We present a novel method to detect such differences using progressively dilated convolutional neural networks.
arXiv Detail & Related papers (2021-01-31T05:14:58Z)
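A minimal sketch of the "progressively dilated" convolution idea named in the title above: stacked 1-D convolutions whose dilation grows layer by layer, widening the receptive field without pooling. The channel counts and dilation schedule are assumptions:

```python
import torch

layers, in_ch = [], 1
for dil in [1, 2, 4, 8]:  # progressively growing dilation
    layers += [
        torch.nn.Conv1d(in_ch, 16, kernel_size=3, dilation=dil, padding=dil),
        torch.nn.ReLU(),
    ]
    in_ch = 16
net = torch.nn.Sequential(*layers)

frames = torch.randn(1, 1, 1024)  # (batch, channels, time frames)
out = net(frames)                 # same temporal length, wider receptive field
print(out.shape)                  # torch.Size([1, 16, 1024])
```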
- A framework to compare music generative models using automatic evaluation metrics extended to rhythm [69.2737664640826]
This paper builds on a framework proposed in previous research that did not consider rhythm, makes a series of design decisions, and then adds rhythm support to evaluate the performance of two RNN memory cells in the creation of monophonic music.
The model handles music transposition, and the framework evaluates the quality of the generated pieces using automatic quantitative metrics based on geometry, extended with rhythm support as well.
arXiv Detail & Related papers (2021-01-19T15:04:46Z)
- Sequence Generation using Deep Recurrent Networks and Embeddings: A study case in music [69.2737664640826]
This paper evaluates different types of memory mechanisms (memory cells) and analyses their performance in the field of music composition.
A set of quantitative metrics is presented to evaluate the performance of the proposed architecture automatically.
arXiv Detail & Related papers (2020-12-02T14:19:19Z)
- dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)
- Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification, and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)