Related papers: The Name-Free Gap: Policy-Aware Stylistic Control in Music Generation

URL: http://arxiv.org/abs/2509.00654v1
Date: Sun, 31 Aug 2025 01:27:16 GMT
Title: The Name-Free Gap: Policy-Aware Stylistic Control in Music Generation
Authors: Ashwin Nagarajan, Hao-Wen Dong
Abstract summary: We study whether lightweight, human-readable descriptors can provide a policy-robust alternative for stylistic control. We evaluate two artists: Billie Eilish (vocal pop) and Ludovico Einaudi (instrumental piano). Results show that artist names are the strongest control signal across both artists, while name-free descriptors recover much of this effect.
Score: 4.654067937895813
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text-to-music models capture broad attributes such as instrumentation or mood, but fine-grained stylistic control remains an open challenge. Existing stylization methods typically require retraining or specialized conditioning, which complicates reproducibility and limits policy compliance when artist names are restricted. We study whether lightweight, human-readable modifiers sampled from a large language model can provide a policy-robust alternative for stylistic control. Using MusicGen-small, we evaluate two artists: Billie Eilish (vocal pop) and Ludovico Einaudi (instrumental piano). For each artist, we use fifteen reference excerpts and evaluate matched seeds under three conditions: baseline prompts, artist-name prompts, and five descriptor sets. All prompts are generated using a large language model. Evaluation uses both VGGish and CLAP embeddings with distributional and per-clip similarity measures, including a new min-distance attribution metric. Results show that artist names are the strongest control signal across both artists, while name-free descriptors recover much of this effect. This highlights that existing safeguards such as the restriction of artist names in music generation prompts may not fully prevent style imitation. Cross-artist transfers reduce alignment, showing that descriptors encode targeted stylistic cues. We also present a descriptor table across ten contemporary artists to illustrate the breadth of the tokens. Together these findings define the name-free gap, the controllability difference between artist-name prompts and policy-compliant descriptors, shown through a reproducible evaluation protocol for prompt-level controllability.
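
The abstract mentions per-clip similarity measures and a min-distance attribution metric but does not define them. As one possible reading (an assumption, not the authors' definition), each generated clip can be scored by its smallest cosine distance to any reference excerpt in embedding space, and those scores averaged per prompt condition. The sketch below uses toy 2-D vectors in place of VGGish or CLAP embeddings; all function names and values are hypothetical.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def min_distance_to_references(clip, references):
    """Distance from one generated clip to its nearest reference excerpt."""
    return min(cosine_distance(clip, ref) for ref in references)

def mean_min_distance(clips, references):
    """Average nearest-reference distance over all clips in one condition."""
    total = sum(min_distance_to_references(c, references) for c in clips)
    return total / len(clips)

# Toy embeddings (illustrative only; real ones would come from an audio encoder).
references = [[1.0, 0.0], [0.8, 0.6]]   # stand-ins for reference excerpts
baseline = [[0.0, 1.0], [-1.0, 0.0]]    # stand-ins for baseline-prompt clips
descriptor = [[0.9, 0.1], [0.6, 0.8]]   # stand-ins for descriptor-prompt clips

baseline_score = mean_min_distance(baseline, references)
descriptor_score = mean_min_distance(descriptor, references)
print(f"baseline: {baseline_score:.3f}, descriptor: {descriptor_score:.3f}")
```

Under this reading, a lower score for one condition than for the baseline would indicate that its clips sit closer to the reference excerpts.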

Related papers

Towards Effective Negation Modeling in Joint Audio-Text Models for Music [3.7723788828505125]
Joint audio-text models struggle with semantic phenomena such as negation. We introduce negation through text augmentation and a dissimilarity-based contrastive loss. We propose two protocols that frame negation modeling as retrieval and binary classification tasks.
arXiv Detail & Related papers (2026-01-20T13:06:48Z)
Identifying Prompted Artist Names from Generated Images [59.34482128911978]
A common and controversial use of text-to-image models is to generate pictures by explicitly naming artists. We introduce a benchmark for prompted-artist recognition. The dataset contains 1.95M images covering 110 artists.
arXiv Detail & Related papers (2025-07-24T17:59:44Z)
ArtistAuditor: Auditing Artist Style Pirate in Text-to-Image Generation Models [61.55816738318699]
We propose a novel method for data-use auditing in the text-to-image generation model. ArtistAuditor employs a style extractor to obtain the multi-granularity style representations and treats artworks as samplings of an artist's style. The experimental results on six combinations of models and datasets show that ArtistAuditor can achieve high AUC values.
arXiv Detail & Related papers (2025-04-17T16:15:38Z)
Towards Estimating Personal Values in Song Lyrics [5.170818712089796]
Most music widely consumed in Western countries contains song lyrics, with U.S. samples reporting that almost all of their song libraries contain lyrics. In this project, we take a perspectivist approach, guided by social science theory, to gathering annotations, estimating their quality, and aggregating them. We then compare aggregated ratings to estimates based on pre-trained sentence/word embedding models by employing a validated value dictionary.
arXiv Detail & Related papers (2024-08-22T19:22:55Z)
Synthetic Lyrics Detection Across Languages and Genres [4.987546582439803]
The use of large language models (LLMs) to generate music content, particularly lyrics, has gained in popularity. Previous research has explored content detection in various domains, but no work has focused on the text modality, lyrics, in music. We curated a diverse dataset of real and synthetic lyrics from multiple languages, music genres, and artists. We performed a thorough evaluation of existing synthetic text detection approaches on lyrics, a previously unexplored data type. Following both music and industrial constraints, we examined how well these approaches generalize across languages, scale with data availability, handle multilingual language content, and perform on novel genres in few-shot settings.
arXiv Detail & Related papers (2024-06-21T15:19:21Z)
Rethinking Artistic Copyright Infringements in the Era of Text-to-Image Generative Models [47.19481598385283]
ArtSavant is a tool to determine the unique style of an artist by comparing it to a reference dataset of works from WikiArt. We then perform a large-scale empirical study to provide quantitative insight on the prevalence of artistic style copying across 3 popular text-to-image generative models.
arXiv Detail & Related papers (2024-04-11T17:59:43Z)
RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization [57.86083349873154]
Text-to-image customization aims to synthesize text-driven images for the given subjects. Existing works follow the pseudo-word paradigm, i.e., represent the given subjects as pseudo-words and then compose them with the given text. We present RealCustom that, for the first time, disentangles similarity from controllability by precisely limiting subject influence to relevant parts only.
arXiv Detail & Related papers (2024-03-01T12:12:09Z)
GATSY: Graph Attention Network for Music Artist Similarity [4.84315398254578]
This paper introduces GATSY, a novel recommendation system built upon graph attention networks and driven by a clusterized embedding of artists.
arXiv Detail & Related papers (2023-11-01T16:36:19Z)
Unsupervised Melody-to-Lyric Generation [91.29447272400826]
We propose a method for generating high-quality lyrics without training on any aligned melody-lyric data. We leverage the segmentation and rhythm alignment between melody and lyrics to compile the given melody into decoding constraints. Our model can generate high-quality lyrics that are more on-topic, singable, intelligible, and coherent than strong baselines.
arXiv Detail & Related papers (2023-05-30T17:20:25Z)
Unsupervised Melody-Guided Lyrics Generation [84.22469652275714]
We propose to generate pleasantly listenable lyrics without training on melody-lyric aligned data. We leverage the crucial alignments between melody and lyrics and compile the given melody into constraints to guide the generation process.
arXiv Detail & Related papers (2023-05-12T20:57:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
