Exploring Musical Roots: Applying Audio Embeddings to Empower Influence
Attribution for a Generative Music Model
- URL: http://arxiv.org/abs/2401.14542v1
- Date: Thu, 25 Jan 2024 22:20:42 GMT
- Title: Exploring Musical Roots: Applying Audio Embeddings to Empower Influence
Attribution for a Generative Music Model
- Authors: Julia Barnett, Hugo Flores Garcia, Bryan Pardo
- Abstract summary: We develop a methodology to identify similar pieces of music audio in a manner that is useful for understanding training data attribution.
We compare the effect of applying CLMR and CLAP embeddings to similarity measurement in a set of 5 million audio clips used to train VampNet.
This work is to incorporate automated influence attribution into generative modeling, which promises to let model creators and users move from ignorant appropriation to informed creation.
- Score: 6.476298483207895
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Every artist has a creative process that draws inspiration from previous
artists and their works. Today, "inspiration" has been automated by generative
music models. The black box nature of these models obscures the identity of the
works that influence their creative output. As a result, users may
inadvertently appropriate, misuse, or copy existing artists' works. We
establish a replicable methodology to systematically identify similar pieces of
music audio in a manner that is useful for understanding training data
attribution. A key aspect of our approach is to harness an effective music
audio similarity measure. We compare the effect of applying CLMR and CLAP
embeddings to similarity measurement in a set of 5 million audio clips used to
train VampNet, a recent open source generative music model. We validate this
approach with a human listening study. We also explore the effect that
modifications of an audio example (e.g., pitch shifting, time stretching,
background noise) have on similarity measurements. This work is foundational to
incorporating automated influence attribution into generative modeling, which
promises to let model creators and users move from ignorant appropriation to
informed creation. Audio samples that accompany this paper are available at
https://tinyurl.com/exploring-musical-roots.
Related papers
- Combining audio control and style transfer using latent diffusion [1.705371629600151]
In this paper, we aim to unify explicit control and style transfer within a single model.
Our model can generate audio matching a timbre target, while specifying structure either with explicit controls or through another audio example.
We show that our method can generate cover versions of complete musical pieces by transferring rhythmic and melodic content to the style of a target audio in a different genre.
arXiv Detail & Related papers (2024-07-31T23:27:27Z) - MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z) - MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation)
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z) - MusicRL: Aligning Music Generation to Human Preferences [62.44903326718772]
MusicRL is the first music generation system finetuned from human feedback.
We deploy MusicLM to users and collect a substantial dataset comprising 300,000 pairwise preferences.
We train MusicRL-U, the first text-to-music model that incorporates human feedback at scale.
arXiv Detail & Related papers (2024-02-06T18:36:52Z) - StemGen: A music generation model that listens [9.489938613869864]
We present an alternative paradigm for producing music generation models that can listen and respond to musical context.
We describe how such a model can be constructed using a non-autoregressive, transformer-based model architecture.
The resulting model reaches the audio quality of state-of-the-art text-conditioned models, as well as exhibiting strong musical coherence with its context.
arXiv Detail & Related papers (2023-12-14T08:09:20Z) - Video2Music: Suitable Music Generation from Videos using an Affective
Multimodal Transformer model [32.801213106782335]
We develop a generative music AI framework, Video2Music, that can match a provided video.
In a thorough experiment, we show that our proposed framework can generate music that matches the video content in terms of emotion.
arXiv Detail & Related papers (2023-11-02T03:33:00Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - LyricJam Sonic: A Generative System for Real-Time Composition and
Musical Improvisation [13.269034230828032]
LyricJam Sonic is a novel tool for musicians to rediscover previous recordings, re-contextualize them with other recordings, and create original live music compositions in real-time.
A bi-modal AI-driven approach uses generated lyric lines to find matching audio clips from the artist's past studio recordings.
The intent is to keep the artists in a state of creative flow to music creation rather than taking them into an analytical/critical state of deliberately searching for past audio segments.
arXiv Detail & Related papers (2022-10-27T17:27:58Z) - Contrastive Learning with Positive-Negative Frame Mask for Music
Representation [91.44187939465948]
This paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR.
We devise a novel contrastive learning objective to accommodate both self-augmented positives/negatives sampled from the same music.
arXiv Detail & Related papers (2022-03-17T07:11:42Z) - Learning to Generate Music With Sentiment [1.8275108630751844]
This paper presents a generative Deep Learning model that can be directed to compose music with a given sentiment.
Besides music generation, the same model can be used for sentiment analysis of symbolic music.
arXiv Detail & Related papers (2021-03-09T03:16:52Z) - Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.
Our method utilizes both an acoustic model, trained for the task of automatic speech recognition, together with melody extracted features to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.