Do Music Generation Models Encode Music Theory?
- URL: http://arxiv.org/abs/2410.00872v1
- Date: Tue, 1 Oct 2024 17:06:30 GMT
- Title: Do Music Generation Models Encode Music Theory?
- Authors: Megan Wei, Michael Freeman, Chris Donahue, Chen Sun
- Abstract summary: We introduce SynTheory, a synthetic MIDI and audio music theory dataset covering tempo, time signature, note, interval, scale, chord, and chord progression concepts.
We then propose a framework to probe for these music theory concepts in music foundation models and assess how strongly they encode these concepts within their internal representations.
Our findings suggest that music theory concepts are discernible within foundation models and that the degree to which they are detectable varies by model size and layer.
- Score: 10.987131058422742
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Music foundation models possess impressive music generation capabilities. When people compose music, they may infuse their understanding of music into their work by using notes and intervals to craft melodies, chords to build progressions, and tempo to create a rhythmic feel. To what extent is this true of music generation models? More specifically, are fundamental Western music theory concepts observable within the "inner workings" of these models? Recent work proposed leveraging latent audio representations from music generation models for music information retrieval tasks (e.g. genre classification, emotion recognition), which suggests that high-level musical characteristics are encoded within these models. However, probing individual music theory concepts (e.g. tempo, pitch class, chord quality) remains under-explored. Thus, we introduce SynTheory, a synthetic MIDI and audio music theory dataset covering tempo, time signature, note, interval, scale, chord, and chord progression concepts. We then propose a framework to probe for these music theory concepts in music foundation models (Jukebox and MusicGen) and assess how strongly they encode these concepts within their internal representations. Our findings suggest that music theory concepts are discernible within foundation models and that the degree to which they are detectable varies by model size and layer.
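As a concrete illustration of this probing setup, the sketch below fits a linear probe on each layer's representations and reports where a given concept is most decodable. It is a minimal sketch under stated assumptions: the embedding shapes, layer names, stand-in data, and the choice of logistic regression are illustrative, not details taken from the paper.

```python
# Minimal linear-probe sketch for one music theory concept (e.g., chord quality).
# Assumes embeddings[layer] is an (n_clips, d) array of a model's hidden states
# and labels is an (n_clips,) array of concept classes; both are stand-ins here.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def probe_layer(features: np.ndarray, labels: np.ndarray, seed: int = 0) -> float:
    """Fit a linear probe on one layer's representations; return test accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.2, random_state=seed, stratify=labels
    )
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))

rng = np.random.default_rng(0)
embeddings = {f"layer_{i}": rng.normal(size=(500, 64)) for i in range(4)}  # stand-in data
labels = rng.integers(0, 4, size=500)  # e.g., 4 chord qualities
scores = {name: probe_layer(feats, labels) for name, feats in embeddings.items()}
print(max(scores, key=scores.get), scores)  # layer where the concept is most decodable
```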
Related papers
- MusicFlow: Cascaded Flow Matching for Text Guided Music Generation [53.63948108922333]
MusicFlow is a cascaded text-to-music generation model based on flow matching.
We leverage masked prediction as the training objective, enabling the model to generalize to other tasks such as music infilling and continuation.
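As an illustration of a masked-prediction objective over music tokens, here is a minimal sketch; MusicFlow applies the idea inside a cascaded flow-matching model, so the discrete-token setup, masking rate, and mask_id below are assumptions made purely for exposition.

```python
# Generic masked-prediction setup over discrete music tokens (illustrative only;
# not MusicFlow's actual formulation, which operates within flow matching).
import torch

def mask_tokens(tokens: torch.Tensor, mask_id: int, rate: float = 0.3):
    """Corrupt a random subset of positions; return the corrupted sequence and
    the boolean mask marking which positions the model must reconstruct."""
    mask = torch.rand(tokens.shape) < rate
    return tokens.masked_fill(mask, mask_id), mask

tokens = torch.randint(0, 100, (2, 16))           # two sequences of 16 music tokens
corrupted, mask = mask_tokens(tokens, mask_id=100)
# Training reconstructs tokens[mask] from the corrupted input; masking a middle
# span gives infilling, and masking the tail gives continuation.
print(corrupted[0].tolist(), int(mask[0].sum()), "positions to predict")
```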
arXiv Detail & Related papers (2024-10-27T15:35:41Z)
- Foundation Models for Music: A Survey [77.77088584651268]
Foundation models (FMs) have profoundly impacted diverse sectors, including music.
This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music.
arXiv Detail & Related papers (2024-08-26T15:13:14Z)
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- Graph-based Polyphonic Multitrack Music Generation [9.701208207491879]
This paper introduces a novel graph representation for music and a deep Variational Autoencoder that generates the structure and the content of musical graphs separately.
By separating the structure and content of musical graphs, it is possible to condition generation by specifying which instruments are played at certain times.
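The separation can be pictured with a toy schema (purely illustrative, not the paper's actual graph definition): structure records which instruments sound at which time steps, and content attaches pitches to those structural nodes.

```python
# Toy structure/content split for a musical graph (hypothetical schema).
from itertools import combinations
import networkx as nx

def build_structure(active: dict[int, list[str]]) -> nx.Graph:
    """Structure: one node per (time step, instrument) that is playing, with
    edges linking simultaneous notes so chords form connected components."""
    g = nx.Graph()
    for t, instruments in active.items():
        g.add_nodes_from((t, inst) for inst in instruments)
        g.add_edges_from(((t, a), (t, b)) for a, b in combinations(instruments, 2))
    return g

def fill_content(g: nx.Graph, pitches: dict[tuple[int, str], int]) -> nx.Graph:
    """Content: attach a MIDI pitch to each structural node."""
    nx.set_node_attributes(g, pitches, name="pitch")
    return g

# Conditioning on structure: bass and piano sound at step 0, piano alone at step 1.
structure = build_structure({0: ["bass", "piano"], 1: ["piano"]})
graph = fill_content(structure, {(0, "bass"): 36, (0, "piano"): 60, (1, "piano"): 64})
print(list(graph.nodes(data=True)))
```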
arXiv Detail & Related papers (2023-07-27T15:18:50Z)
- A Dataset for Greek Traditional and Folk Music: Lyra [69.07390994897443]
This paper presents a dataset of Greek traditional and folk music that includes 1570 pieces, totaling around 80 hours of audio.
The dataset provides timestamped YouTube links for retrieving audio and video, along with rich metadata on instrumentation, geography, and genre.
arXiv Detail & Related papers (2022-11-21T14:15:43Z)
- Models of Music Cognition and Composition [0.0]
We first motivate why music is relevant to cognitive scientists and give an overview of the approaches to computational modelling of music cognition.
We then review literature on the various models of music perception, including non-computational models, computational non-cognitive models and computational cognitive models.
arXiv Detail & Related papers (2022-08-14T16:27:59Z)
- MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training [97.91071692716406]
Symbolic music understanding refers to understanding music from symbolic data (e.g., MIDI or musical scores).
MusicBERT is a large-scale pre-trained model for music understanding.
arXiv Detail & Related papers (2021-06-10T10:13:05Z)
- Music Embedding: A Tool for Incorporating Music Theory into Computational Music Applications [0.3553493344868413]
It is important to represent music digitally in a concise, music-theoretic manner.
Existing approaches to representing music make ineffective use of music theory; one simple theory-aware encoding is sketched below for illustration.
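A minimal sketch of one such encoding, assuming nothing about the paper's actual method: a melody is represented by its semitone intervals rather than absolute pitches, which makes the representation compact and transposition-invariant.

```python
# Interval-based melody encoding (an illustrative theory-aware representation,
# not necessarily the one proposed in the paper).
def to_intervals(midi_pitches: list[int]) -> list[int]:
    """Encode consecutive semitone steps instead of absolute pitches."""
    return [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]

# The "Twinkle Twinkle" opening in C major and in G major encode identically,
# since the interval sequence is invariant under transposition.
c_major = [60, 60, 67, 67, 69, 69, 67]
g_major = [67, 67, 74, 74, 76, 76, 74]
assert to_intervals(c_major) == to_intervals(g_major) == [0, 7, 0, 2, 0, -2]
print(to_intervals(c_major))
```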
arXiv Detail & Related papers (2021-04-24T04:32:45Z)
- Learning to Generate Music With Sentiment [1.8275108630751844]
This paper presents a generative deep learning model that can be directed to compose music with a given sentiment.
Besides music generation, the same model can be used for sentiment analysis of symbolic music.
arXiv Detail & Related papers (2021-03-09T03:16:52Z)
- Using a Bi-directional LSTM Model with Attention Mechanism trained on MIDI Data for Generating Unique Music [0.25559196081940677]
This paper proposes a bi-directional LSTM model with an attention mechanism, trained on MIDI data, that can generate music of a similar type.
The music generated by the model follows the theme/style of the music the model is trained on.
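A hedged sketch of this kind of architecture follows; the vocabulary size, layer widths, and the additive attention form are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a bidirectional LSTM with additive attention over MIDI event tokens
# (sizes and vocabulary are illustrative, not the paper's actual setup).
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    def __init__(self, vocab_size: int = 128, embed: int = 64, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)          # additive attention scores
        self.out = nn.Linear(2 * hidden, vocab_size)  # next-token logits

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h, _ = self.lstm(self.embed(tokens))           # (batch, seq, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)   # attend over time steps
        context = (weights * h).sum(dim=1)             # (batch, 2*hidden)
        return self.out(context)                       # predict the next MIDI token

model = BiLSTMAttention()
batch = torch.randint(0, 128, (4, 32))                 # 4 sequences of 32 MIDI tokens
print(model(batch).shape)                              # torch.Size([4, 128])
```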
arXiv Detail & Related papers (2020-11-02T06:43:28Z)