Subjective Evaluation of Deep Learning Models for Symbolic Music
Composition
- URL: http://arxiv.org/abs/2203.14641v1
- Date: Mon, 28 Mar 2022 10:56:55 GMT
- Title: Subjective Evaluation of Deep Learning Models for Symbolic Music
Composition
- Authors: Carlos Hernandez-Olivan, Jorge Abadias Puyuelo and Jose R. Beltran
- Abstract summary: We propose a subjective method to evaluate AI-based music composition systems.
We ask users, grouped into levels by their musical experience and knowledge, questions related to basic music principles.
- Score: 1.1677169430445211
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models are typically evaluated to measure and compare their
performance on a given task. The metrics commonly used to evaluate these models
are standard metrics borrowed from other tasks; in the field of music
composition or generation, such metrics have no clear meaning in terms of
music theory. In this paper, we propose a subjective method to evaluate
AI-based music composition systems by asking users, grouped into levels by
their musical experience and knowledge, questions related to basic music
principles. We use this method to compare state-of-the-art deep learning
models for music composition. We present the results of this evaluation and
compare the responses of each user level for each evaluated model.
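The abstract describes the method only at a high level. A minimal sketch of the per-level aggregation it implies could look like the following; the user levels, model names, question topics, and 1-5 rating scale are all illustrative assumptions, not the paper's actual questionnaire:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical responses: (user_level, model, question_topic, rating 1-5).
responses = [
    ("novice", "MusicVAE", "melody", 4), ("novice", "MusicVAE", "harmony", 3),
    ("expert", "MusicVAE", "melody", 2), ("expert", "MusicVAE", "harmony", 3),
    ("novice", "Music Transformer", "melody", 5),
    ("expert", "Music Transformer", "melody", 4),
]

# Average rating per (user level, model), mirroring the per-level
# comparison the abstract describes.
scores = defaultdict(list)
for level, model, _topic, rating in responses:
    scores[(level, model)].append(rating)

for (level, model), ratings in sorted(scores.items()):
    print(f"{level:>7} | {model:<17} | mean rating = {mean(ratings):.2f}")
```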
Related papers
- Benchmarking Music Generation Models and Metrics via Human Preference Studies [18.95453617434051]
We generate 6k songs using 12 state-of-the-art models and conduct a survey of 15k pairwise audio comparisons with 2.5k human participants.
To the best of our knowledge, this work is the first to rank current state-of-the-art music generation models and metrics based on human preference.
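The abstract does not say how pairwise preferences are turned into a ranking. A standard choice for such data is a Bradley-Terry model; the sketch below (an assumption, not the paper's method) fits one to made-up win counts with a simple MM fixed-point iteration:

```python
import numpy as np

# Hypothetical pairwise outcomes: wins[i, j] = times model i beat model j.
models = ["A", "B", "C"]
wins = np.array([[0, 12, 8],
                 [6,  0, 10],
                 [4,  5, 0]], dtype=float)

# Bradley-Terry strengths via MM updates: p_i <- W_i / sum_j n_ij/(p_i+p_j).
p = np.ones(len(models))
for _ in range(200):
    total = wins + wins.T              # n_ij: comparisons between i and j
    W = wins.sum(axis=1)               # total wins of each model
    denom = total / (p[:, None] + p[None, :])
    np.fill_diagonal(denom, 0.0)
    p = W / denom.sum(axis=1)
    p /= p.sum()                       # normalize each step

for m, s in sorted(zip(models, p), key=lambda t: -t[1]):
    print(f"{m}: strength {s:.3f}")
```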
arXiv Detail & Related papers (2025-06-23T20:01:29Z)
- Stem-JEPA: A Joint-Embedding Predictive Architecture for Musical Stem Compatibility Estimation [3.8570045844185237]
We present Stem-JEPA, a novel Joint-Embedding Predictive Architecture (JEPA) trained on a multi-track dataset.
Our model comprises two networks: an encoder and a predictor, which are jointly trained to predict the embeddings of compatible stems.
We evaluate our model's performance on a retrieval task on the MUSDB18 dataset, testing its ability to find the missing stem from a mix.
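The architecture is not detailed in this summary; the sketch below shows only the general encoder-plus-predictor JEPA shape with hypothetical dimensions, trained to regress the embedding of the missing stem and queried by cosine similarity for retrieval:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical dimensions; real stems would be spectrogram or latent features.
DIM_IN, DIM_EMB = 128, 64
encoder = nn.Sequential(nn.Linear(DIM_IN, 256), nn.ReLU(), nn.Linear(256, DIM_EMB))
predictor = nn.Sequential(nn.Linear(DIM_EMB, 256), nn.ReLU(), nn.Linear(256, DIM_EMB))

def training_step(context, target_stem):
    """One step: predict the compatible stem's embedding from the context."""
    z_ctx = encoder(context)               # embed the mix / context
    z_tgt = encoder(target_stem).detach()  # target embedding (stop-gradient)
    return F.mse_loss(predictor(z_ctx), z_tgt)

# Retrieval: rank candidate stems by similarity to the predicted embedding.
context = torch.randn(1, DIM_IN)
candidates = torch.randn(10, DIM_IN)
with torch.no_grad():
    query = predictor(encoder(context))
    sims = F.cosine_similarity(query, encoder(candidates))
    print("best candidate stem:", sims.argmax().item())
```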
arXiv Detail & Related papers (2024-08-05T14:34:40Z)
- Towards Explainable and Interpretable Musical Difficulty Estimation: A Parameter-efficient Approach [49.2787113554916]
Estimating music piece difficulty is important for organizing educational music collections.
Our work employs explainable descriptors for difficulty estimation in symbolic music representations.
Our approach, evaluated on piano repertoire categorized into 9 classes, achieves 41.4% accuracy on its own, with a mean squared error (MSE) of 1.7.
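As an illustration of descriptor-based difficulty estimation, the toy sketch below fits a classifier on invented descriptors and reports both metrics; the descriptors, data, and model are stand-ins, not the paper's parameter-efficient approach:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, mean_squared_error

# Hypothetical explainable descriptors per piece (e.g., note density, pitch
# range, rhythmic entropy) and difficulty labels on a 9-class ordinal scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.clip((2.5 * X[:, 0] + X[:, 1] + rng.normal(size=200)).round().astype(int) + 4, 0, 8)

clf = LogisticRegression(max_iter=1000).fit(X[:150], y[:150])
pred = clf.predict(X[150:])

# Because the classes are ordinal, MSE complements plain accuracy: it
# penalizes predictions by how far they miss the true difficulty level.
print("accuracy:", accuracy_score(y[150:], pred))
print("MSE:", mean_squared_error(y[150:], pred))
```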
arXiv Detail & Related papers (2024-08-01T11:23:42Z)
- Investigating Personalization Methods in Text to Music Generation [21.71190700761388]
Motivated by recent advances in the computer vision domain, we are the first to explore the combination of pre-trained text-to-audio diffusers with two established personalization methods.
For evaluation, we construct a novel dataset with prompts and music clips.
Our analysis shows that similarity metrics are in accordance with user preferences and that current personalization approaches tend to learn rhythmic music constructs more easily than melody.
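A minimal sketch of the kind of embedding-similarity metric the analysis refers to, with random vectors standing in for real audio-encoder outputs (the encoder and clip names are hypothetical):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings of the target concept and of generated clips.
rng = np.random.default_rng(1)
reference = rng.normal(size=512)
generations = {f"clip_{i}": rng.normal(size=512) for i in range(5)}

# Rank generated clips by similarity to the reference: the kind of metric
# the paper reports as agreeing with user preferences.
ranked = sorted(generations.items(), key=lambda kv: -cosine(kv[1], reference))
for name, emb in ranked:
    print(name, round(cosine(emb, reference), 3))
```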
arXiv Detail & Related papers (2023-09-20T08:36:34Z)
- A Comprehensive Survey for Evaluation Methodologies of AI-Generated Music [14.453416870193072]
This study aims to comprehensively evaluate the subjective, objective, and combined methodologies for assessing AI-generated music.
Ultimately, this study provides a valuable reference for unifying generative AI in the field of music evaluation.
arXiv Detail & Related papers (2023-08-26T02:44:33Z)
- Learning Evaluation Models from Large Language Models for Sequence Generation [61.8421748792555]
We propose CSEM, a three-stage evaluation model training method that utilizes large language models to generate labeled data for model-based metric development.
Experimental results on the SummEval benchmark demonstrate that CSEM can effectively train an evaluation model without human-labeled data.
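The three stages are not detailed in this summary; a heavily stubbed skeleton of the general shape (LLM pseudo-labels, dataset construction, scorer training) might look like this, with every function a hypothetical toy rather than CSEM's actual procedure:

```python
# Hypothetical skeleton only: stage names and all logic are assumptions.

def llm_label(candidate: str) -> float:
    """Stage 1 (stub): an LLM would score the candidate; faked with a toy proxy."""
    return min(1.0, len(set(candidate.split())) / 10)

def build_dataset(candidates):
    """Stage 2: pair each candidate with its LLM-produced pseudo-label."""
    return [(c, llm_label(c)) for c in candidates]

def train_eval_model(dataset):
    """Stage 3 (stub): fit a scorer on pseudo-labels; a trivial baseline here."""
    avg = sum(score for _, score in dataset) / len(dataset)
    return lambda text: 0.5 * llm_label(text) + 0.5 * avg

candidates = ["a short summary", "a rather longer and more detailed summary text"]
scorer = train_eval_model(build_dataset(candidates))
print(scorer("another unseen candidate summary"))
```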
arXiv Detail & Related papers (2023-08-08T16:41:16Z)
- FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets [69.91340332545094]
We introduce FLASK, a fine-grained evaluation protocol for both human-based and model-based evaluation.
We experimentally observe that fine-grained evaluation is crucial for attaining a holistic view of model performance.
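A toy illustration of why skill-level scores matter: two models with similar overall averages can differ sharply per skill. The skill names below are illustrative, not FLASK's exact set:

```python
from statistics import mean

# Hypothetical per-skill scores (1-5) for two models.
scores = {
    "model_a": {"logical_correctness": 4, "factuality": 3, "conciseness": 5},
    "model_b": {"logical_correctness": 5, "factuality": 4, "conciseness": 2},
}

# A single average hides skill-level differences; reporting both views shows
# what fine-grained evaluation adds.
for model, per_skill in scores.items():
    print(model, "overall:", round(mean(per_skill.values()), 2),
          "| per skill:", per_skill)
```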
arXiv Detail & Related papers (2023-07-20T14:56:35Z)
- Model-Based Deep Learning: On the Intersection of Deep Learning and Optimization [101.32332941117271]
Decision making algorithms are used in a multitude of different applications.
Deep learning approaches that use highly parametric architectures tuned from data without relying on mathematical models are becoming increasingly popular.
Model-based optimization and data-centric deep learning are often considered to be distinct disciplines.
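One canonical instance of this intersection (our choice of example, not necessarily the survey's) is deep unrolling: iterations of a model-based optimizer become network layers whose parameters are tuned from data. A minimal sketch for a least-squares objective ||Ax - y||^2:

```python
import torch
import torch.nn as nn

class UnrolledGD(nn.Module):
    """Unroll gradient-descent steps of ||Ax - y||^2 with learned step sizes."""
    def __init__(self, A, n_steps=5):
        super().__init__()
        self.A = A
        # One learnable step size per unrolled iteration.
        self.step = nn.Parameter(torch.full((n_steps,), 0.1))

    def forward(self, y):
        x = torch.zeros(self.A.shape[1])
        for t in range(len(self.step)):
            grad = self.A.T @ (self.A @ x - y)  # model-based gradient
            x = x - self.step[t] * grad         # data-tuned update
        return x

A = torch.randn(8, 4)
net = UnrolledGD(A)
print(net(torch.randn(8)))
```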
arXiv Detail & Related papers (2022-05-05T13:40:08Z)
- Boosting the Learning for Ranking Patterns [6.142272540492935]
This paper formulates the problem of learning pattern ranking functions as a multi-criteria decision making problem.
Our approach aggregates different interestingness measures into a single weighted linear ranking function, using an interactive learning procedure.
Experiments conducted on well-known datasets show that our approach significantly reduces running time and returns precise pattern rankings.
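The weighted linear ranking function itself is straightforward; a sketch with invented measures and fixed weights (the paper learns the weights through an interactive procedure):

```python
import numpy as np

# Hypothetical interestingness measures for mined patterns (rows) across
# three criteria (columns), e.g. support, confidence, lift, scaled to [0, 1].
measures = np.array([[0.9, 0.2, 0.5],
                     [0.4, 0.8, 0.7],
                     [0.6, 0.6, 0.6]])

# Weights would be learned from user feedback; fixed values stand in here.
weights = np.array([0.5, 0.3, 0.2])

# Single weighted linear ranking function: score = measures @ weights.
scores = measures @ weights
for idx in np.argsort(-scores):
    print(f"pattern {idx}: score {scores[idx]:.2f}")
```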
arXiv Detail & Related papers (2022-03-05T10:22:44Z)
- Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
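A minimal sketch of attribute-bucketed evaluation in the spirit of the paper, with hypothetical per-entity results grouped by entity length (the bucket boundaries and data are invented):

```python
from collections import defaultdict

# Hypothetical per-entity results: (entity_length, predicted_correctly).
results = [(1, True), (1, True), (2, True), (2, False), (3, False), (4, False)]

# Bucketing by an attribute exposes where a model fails, instead of a
# single global score.
buckets = defaultdict(list)
for length, correct in results:
    buckets["short" if length <= 2 else "long"].append(correct)

for bucket, outcomes in buckets.items():
    print(f"{bucket} entities: accuracy {sum(outcomes) / len(outcomes):.2f}")
```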
arXiv Detail & Related papers (2020-11-13T10:53:27Z)
- A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)
- Deep Learning Based Text Classification: A Comprehensive Review [75.8403533775179]
We provide a review of more than 150 deep learning based models for text classification developed in recent years.
We also provide a summary of more than 40 popular datasets widely used for text classification.
arXiv Detail & Related papers (2020-04-06T02:00:30Z)