Steering Autoregressive Music Generation with Recursive Feature Machines
- URL: http://arxiv.org/abs/2510.19127v1
- Date: Tue, 21 Oct 2025 23:23:14 GMT
- Title: Steering Autoregressive Music Generation with Recursive Feature Machines
- Authors: Daniel Zhao, Daniel Beaglehole, Taylor Berg-Kirkpatrick, Julian McAuley, Zachary Novack
- Abstract summary: MusicRFM is a framework that adapts Recursive Feature Machines (RFMs) to enable fine-grained, interpretable control over frozen, pre-trained music models. RFMs analyze a model's internal gradients to produce interpretable "concept directions". We present advanced mechanisms for this control, including dynamic, time-varying schedules and methods for the simultaneous enforcement of multiple musical properties.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Controllable music generation remains a significant challenge, with existing methods often requiring model retraining or introducing audible artifacts. We introduce MusicRFM, a framework that adapts Recursive Feature Machines (RFMs) to enable fine-grained, interpretable control over frozen, pre-trained music models by directly steering their internal activations. RFMs analyze a model's internal gradients to produce interpretable "concept directions", or specific axes in the activation space that correspond to musical attributes like notes or chords. We first train lightweight RFM probes to discover these directions within MusicGen's hidden states; then, during inference, we inject them back into the model to guide the generation process in real time without per-step optimization. We present advanced mechanisms for this control, including dynamic, time-varying schedules and methods for the simultaneous enforcement of multiple musical properties. Our method successfully navigates the trade-off between control and generation quality: we can increase the accuracy of generating a target musical note from 0.23 to 0.82, while text prompt adherence remains within approximately 0.02 of the unsteered baseline, demonstrating effective control with minimal impact on prompt fidelity. We release code to encourage further exploration of RFMs in the music domain.
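The abstract describes a two-stage recipe: train a lightweight probe on hidden states, derive a concept direction from the probe's gradients (RFMs use the average gradient outer product, AGOP), and add that direction to the frozen model's activations while decoding. The sketch below illustrates that recipe under simplifying assumptions; the helper names, the plain linear probe, and the hook-based injection are ours, not the released MusicRFM code.

```python
import torch

def concept_direction(hidden_states, labels, steps=200, lr=1e-2):
    """hidden_states: (n, d) activations collected from the frozen model;
    labels: (n,) binary indicators for a musical concept (e.g., a note)."""
    n, d = hidden_states.shape
    w = torch.zeros(d, requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):  # lightweight linear probe on the hidden states
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            hidden_states @ w, labels.float())
        opt.zero_grad()
        loss.backward()
        opt.step()
    # AGOP of the probe's input gradients; for a linear probe every input
    # gradient is parallel to w, so the AGOP is rank-1 and its top eigenvector
    # recovers w's direction (RFMs generalize this to nonlinear predictors).
    grads = w.detach().expand(n, d)
    agop = grads.T @ grads / n
    return torch.linalg.eigh(agop).eigenvectors[:, -1]  # top eigenvector

def steer(layer, direction, strength=4.0):
    """Inject the concept direction into a frozen layer's output at inference;
    assumes the hooked module returns a plain tensor."""
    def hook(_module, _inputs, output):
        return output + strength * direction.to(output.dtype)
    return layer.register_forward_hook(hook)
```

A time-varying schedule of the kind the abstract mentions would amount to making `strength` a function of the decoding step, and steering several concepts at once to summing several scaled directions in the hook.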
Related papers
- EditGen: Harnessing Cross-Attention Control for Instruction-Based Auto-Regressive Audio Editing [54.10773655199149]
We investigate leveraging cross-attention control for efficient audio editing within auto-regressive models. Inspired by image editing methodologies, we develop a Prompt-to-Prompt-like approach that guides edits through cross- and self-attention mechanisms.
arXiv Detail & Related papers (2025-07-15T08:44:11Z)
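As a rough illustration of this Prompt-to-Prompt-style control (our reading of the summary, not EditGen's implementation), a controller can cache attention maps from a pass over the source prompt and blend them back in when decoding with the edited prompt; the class and the fixed blend factor are assumptions, and each attention module would route its post-softmax weights through the controller.

```python
import torch

class AttentionController:
    """Cache attention maps from a source-prompt pass, then replay them
    during the edited-prompt pass so unchanged tokens keep their structure."""
    def __init__(self, blend=0.8):
        self.cache = {}      # layer name -> attention weights (source pass)
        self.replay = False  # flip to True for the edited-prompt pass
        self.blend = blend

    def __call__(self, name, attn_weights):
        if self.replay and name in self.cache:
            # keep the source attention structure, let a little of the new in
            return self.blend * self.cache[name] + (1 - self.blend) * attn_weights
        self.cache[name] = attn_weights.detach()
        return attn_weights
```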
- Fine-Grained Control over Music Generation with Activation Steering [0.0]
We present a method for fine-grained control over music generation through inference-time interventions on MusicGen, an autoregressive generative music transformer. Our approach enables timbre transfer, style transfer, and genre fusion by steering the residual stream using the weights of linear probes trained on it.
arXiv Detail & Related papers (2025-06-11T23:02:39Z)
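A minimal sketch of probe-weight steering as the summary describes it (a generic reconstruction, not the paper's code): the linear probe's weight vector becomes the steering direction added to the residual stream, and blending two probes' directions gives a simple handle for genre fusion.

```python
import torch

def add_probe_steering(block, probe_weight, alpha=3.0):
    """block: a transformer block whose forward output is the residual stream;
    probe_weight: (d,) weights of a linear probe trained on that stream."""
    direction = probe_weight / probe_weight.norm()
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * direction.to(hidden.dtype)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return block.register_forward_hook(hook)

def fused_direction(probe_weights, mix):
    """Blend several probes' directions (e.g., two genres) for genre fusion;
    `mix` holds the blend coefficients."""
    dirs = torch.stack([w / w.norm() for w in probe_weights])
    blended = (torch.tensor(mix).unsqueeze(1) * dirs).sum(0)
    return blended / blended.norm()
```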
- MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
arXiv Detail & Related papers (2024-07-05T08:08:22Z)
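One way to read the counterfactual idea (a hedged sketch of our interpretation, not the paper's exact formulation, with a hypothetical `model(tokens, control)` interface): the matched control signal should explain the target music better than a mismatched one, enforced with a margin on the two negative log-likelihoods.

```python
import torch
import torch.nn.functional as F

def counterfactual_loss(model, tokens, control, control_cf, margin=1.0):
    """tokens: (B, T) target music tokens; control / control_cf: matched and
    mismatched ("counterfactual") control signals."""
    logits_true = model(tokens, control)     # (B, T, V) next-token logits
    logits_cf = model(tokens, control_cf)
    nll_true = F.cross_entropy(logits_true[:, :-1].transpose(1, 2), tokens[:, 1:])
    nll_cf = F.cross_entropy(logits_cf[:, :-1].transpose(1, 2), tokens[:, 1:])
    # the matched control should explain the music better than the mismatch
    return F.relu(margin + nll_true - nll_cf)
```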
- DITTO: Diffusion Inference-Time T-Optimization for Music Generation [49.90109850026932]
Diffusion Inference-Time T-Optimization (DITTO) is a framework for controlling pre-trained text-to-music diffusion models at inference time.
We demonstrate a surprisingly wide range of applications for music generation including inpainting, outpainting, and looping as well as intensity, melody, and musical structure control.
arXiv Detail & Related papers (2024-01-22T18:10:10Z)
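The core loop is an inference-time optimization of the initial diffusion noise; below is a simplified sketch (the `sample_fn`/`feature_fn` interface is a hypothetical stand-in, and the actual method backpropagates through sampling with gradient checkpointing to keep memory bounded).

```python
import torch

def ditto_optimize(sample_fn, feature_fn, target, shape, steps=50, lr=0.05):
    """sample_fn: differentiable sampler, noise latent -> audio (frozen model);
    feature_fn: extracts the controlled attribute (e.g., an intensity curve)."""
    noise = torch.randn(shape, requires_grad=True)  # only the noise is trained
    opt = torch.optim.Adam([noise], lr=lr)
    for _ in range(steps):
        audio = sample_fn(noise)                    # run the sampling chain
        loss = torch.nn.functional.mse_loss(feature_fn(audio), target)
        opt.zero_grad()
        loss.backward()                             # grads flow through sampling
        opt.step()
    return noise.detach()
```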
- Polyffusion: A Diffusion Model for Polyphonic Score Generation with Internal and External Controls [5.597394612661976]
Polyffusion is a diffusion model that generates polyphonic music scores by treating music as an image-like piano-roll representation.
We show that by using internal and external controls, Polyffusion unifies a wide range of music creation tasks.
arXiv Detail & Related papers (2023-07-19T06:36:31Z)
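A small sketch of the image-like piano-roll encoding referenced above (a generic encoding under our assumptions, not the paper's preprocessing); keeping part of the roll fixed and regenerating the rest is what inpainting-style internal control amounts to.

```python
import numpy as np

def to_piano_roll(notes, steps_per_beat=4, n_pitches=128, total_beats=32):
    """notes: list of (onset_beat, duration_beats, midi_pitch) tuples."""
    T = total_beats * steps_per_beat
    roll = np.zeros((n_pitches, T), dtype=np.float32)  # pitch x time "image"
    for onset, dur, pitch in notes:
        t0 = int(round(onset * steps_per_beat))
        t1 = int(round((onset + dur) * steps_per_beat))
        roll[pitch, t0:max(t1, t0 + 1)] = 1.0
    return roll

# inpainting-style internal control: fix the given bars, regenerate the rest
mask = np.zeros((128, 32 * 4), dtype=np.float32)
mask[:, : 8 * 4] = 1.0  # keep the first 8 beats, let diffusion fill the rest
```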
- Anticipatory Music Transformer [60.15347393822849]
We introduce anticipation: a method for constructing a controllable generative model of a temporal point process.
We focus on infilling control tasks, whereby the controls are a subset of the events themselves.
We train anticipatory infilling models using the large and diverse Lakh MIDI music dataset.
arXiv Detail & Related papers (2023-06-14T16:27:53Z)
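A loose illustration of anticipation as we read the summary (our paraphrase; the `delta` lead time and token layout are assumptions): control events, themselves a subset of the notes, are spliced into the training stream slightly before the time at which they occur, so an ordinary autoregressive model can condition on them while infilling the rest.

```python
def anticipate(events, controls, delta=1.0):
    """events/controls: lists of (time, token) pairs sorted by time; each
    control is inserted at (time - delta) so the model sees it early."""
    stream = [(t, tok, "event") for t, tok in events]
    stream += [(t - delta, tok, "anticipated") for t, tok in controls]
    stream.sort(key=lambda item: item[0])      # merge into one ordered stream
    return [(tok, kind) for _, tok, kind in stream]
```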
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
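A compact sketch of a delay-style interleaving pattern of the kind MusicGen uses over multi-codebook tokens (a generic reconstruction, not Audiocraft's implementation): codebook k is shifted k steps to the right, so all K codebooks are predicted in one autoregressive pass of length T + K - 1 rather than a flattened pass of length K * T.

```python
import numpy as np

def delay_interleave(codes, pad=-1):
    """codes: (K, T) integer tokens from K residual codebooks."""
    K, T = codes.shape
    out = np.full((K, T + K - 1), pad, dtype=codes.dtype)
    for k in range(K):
        out[k, k:k + T] = codes[k]  # codebook k lags k steps behind codebook 0
    return out

codes = np.arange(12).reshape(4, 3)  # 4 codebooks, 3 frames
print(delay_interleave(codes))       # staggered diagonal layout, width 6
```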
- Learning Style-Aware Symbolic Music Representations by Adversarial Autoencoders [9.923470453197657]
We focus on leveraging adversarial regularization as a flexible and natural means to imbue variational autoencoders with context information.
We introduce the first Music Adversarial Autoencoder (MusAE).
Our model has a higher reconstruction accuracy than state-of-the-art models based on standard variational autoencoders.
arXiv Detail & Related papers (2020-01-15T18:07:20Z)
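For context, the standard adversarial-autoencoder recipe this regularization builds on looks roughly as follows (a generic sketch, not the MusAE code; the layer sizes are arbitrary): a discriminator pushes the encoder's latent distribution toward a chosen prior, taking the place of a VAE's KL term, and conditioning that prior on style labels is what makes the representation style-aware.

```python
import torch
import torch.nn as nn

d_model, z_dim = 256, 32
encoder = nn.Sequential(nn.Linear(d_model, 128), nn.ReLU(), nn.Linear(128, z_dim))
disc = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

x = torch.randn(16, d_model)                 # stand-in for encoded musical bars
z_fake = encoder(x)                          # posterior samples from the encoder
z_real = torch.randn(16, z_dim)              # samples from the prior p(z)
d_loss = (bce(disc(z_real), torch.ones(16, 1))
          + bce(disc(z_fake.detach()), torch.zeros(16, 1)))
g_loss = bce(disc(z_fake), torch.ones(16, 1))  # encoder tries to fool the critic
```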