Explicit Tonal Tension Conditioning via Dual-Level Beam Search for Symbolic Music Generation
- URL: http://arxiv.org/abs/2511.19342v1
- Date: Mon, 24 Nov 2025 17:41:04 GMT
- Title: Explicit Tonal Tension Conditioning via Dual-Level Beam Search for Symbolic Music Generation
- Authors: Maral Ebrahimzadeh, Gilberto Bernardes, Sebastian Stober
- Abstract summary: State-of-the-art symbolic music generation models have recently achieved remarkable output quality. We propose a novel approach that integrates a computational tonal tension model into a Transformer framework.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art symbolic music generation models have recently achieved remarkable output quality, yet explicit control over compositional features, such as tonal tension, remains challenging. We propose a novel approach that integrates a computational tonal tension model, based on tonal interval vector analysis, into a Transformer framework. Our method employs a two-level beam search strategy during inference. At the token level, generated candidates are re-ranked using model probability and diversity metrics to maintain overall quality. At the bar level, a tension-based re-ranking is applied to ensure that the generated music aligns with a desired tension curve. Objective evaluations indicate that our approach effectively modulates tonal tension, and subjective listening tests confirm that the system produces outputs that align with the target tension. These results demonstrate that explicit tension conditioning through a dual-level beam search provides a powerful and intuitive tool to guide AI-generated music. Furthermore, our experiments demonstrate that our method can generate multiple distinct musical interpretations under the same tension condition.
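The dual-level procedure can be made concrete with a short sketch. In the code below, `next_logprobs` (top-k next-token log-probabilities from the Transformer), `is_bar_end` (a bar-boundary test on the current bar's tokens), and `tension` (the tonal-interval-vector tension estimate for a bar) are hypothetical placeholder callables, and the diversity penalty is one plausible reading of the paper's diversity metric rather than the authors' implementation:

```python
def dual_level_beam_search(next_logprobs, is_bar_end, tension, target_curve,
                           token_beam=8, bar_beam=4,
                           diversity_weight=0.1, max_tokens_per_bar=64):
    """Sketch of tension-conditioned dual-level beam search.
    target_curve holds one desired tension value per bar."""
    beams = [([], 0.0)]  # (token sequence so far, cumulative log-prob)
    for target in target_curve:
        # Track each beam as (finished prefix, current bar, log-prob).
        bars = [(seq, [], logp) for seq, logp in beams]
        for _ in range(max_tokens_per_bar):
            if all(is_bar_end(bar) for _, bar, _ in bars):
                break
            candidates = []
            for prefix, bar, logp in bars:
                if is_bar_end(bar):
                    candidates.append((prefix, bar, logp))
                    continue
                for tok, tok_logp in next_logprobs(prefix + bar, token_beam):
                    candidates.append((prefix, bar + [tok], logp + tok_logp))
            # Token level: re-rank by log-prob minus a diversity penalty on
            # candidates that end in the same token as a higher-ranked one.
            counts, scored = {}, []
            for prefix, bar, logp in candidates:
                last = bar[-1] if bar else None
                scored.append((logp - diversity_weight * counts.get(last, 0),
                               prefix, bar, logp))
                counts[last] = counts.get(last, 0) + 1
            scored.sort(key=lambda s: s[0], reverse=True)
            bars = [(p, b, lp) for _, p, b, lp in scored[:token_beam]]
        # Bar level: keep the candidates whose estimated tension for the
        # finished bar lies closest to this bar's target value.
        bars.sort(key=lambda c: abs(tension(c[1]) - target))
        beams = [(p + b, lp) for p, b, lp in bars[:bar_beam]]
    return beams[0][0]  # best sequence under the tension curve
```

Keeping several beams alive at the bar level (bar_beam > 1) is also what allows the same tension curve to yield multiple distinct musical interpretations, as the abstract notes.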
Related papers
- SONA: Learning Conditional, Unconditional, and Mismatching-Aware Discriminator [54.562217603802075]
We introduce Sum of Naturalness and Alignment (SONA), which employs separate projections for naturalness (authenticity) and alignment in the final layer with an inductive bias. Experiments on class-conditional generation tasks show that SONA achieves superior sample quality and conditional alignment compared to state-of-the-art methods.
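A minimal sketch of what such a final layer could look like, assuming a projection-discriminator style alignment head; the layer sizes, class embedding, and score combination are illustrative guesses, not SONA's actual architecture:

```python
import torch
import torch.nn as nn

class DualHeadDiscriminator(nn.Module):
    """One backbone, two final-layer projections: an unconditional
    naturalness score plus a class-alignment score (a guess at the
    'sum of naturalness and alignment' idea; dimensions are made up)."""
    def __init__(self, in_dim=784, feat_dim=256, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        )
        self.naturalness = nn.Linear(feat_dim, 1)               # authenticity
        self.class_embed = nn.Embedding(num_classes, feat_dim)  # alignment

    def forward(self, x, y):
        h = self.backbone(x.flatten(1))
        nat = self.naturalness(h).squeeze(1)
        # Projection-style alignment: inner product with the class embedding.
        align = (h * self.class_embed(y)).sum(dim=1)
        return nat + align
```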
arXiv Detail & Related papers (2025-10-06T08:26:06Z) - Pitch-Conditioned Instrument Sound Synthesis From an Interactive Timbre Latent Space [6.12877670327196]
This paper presents a novel approach to neural instrument sound synthesis using a two-stage semi-supervised learning framework. We train a pitch-timbre disentangled 2D representation of audio samples using a Variational Autoencoder. We use this representation as conditioning input for a Transformer-based generative model.
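A rough sketch of the second stage under stated assumptions: a Transformer decoder that attends to a pitch embedding and the 2D timbre latent produced by the stage-one VAE. The vocabulary, widths, and module layout are invented for illustration:

```python
import torch
import torch.nn as nn

class PitchTimbreDecoder(nn.Module):
    """Decoder conditioned on a pitch id plus a 2D timbre latent
    (all dimensions are illustrative assumptions)."""
    def __init__(self, vocab=1024, d_model=256, n_pitches=128):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)
        self.pitch = nn.Embedding(n_pitches, d_model)
        self.timbre = nn.Linear(2, d_model)  # 2D timbre latent -> model width
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens, pitch_id, z_timbre):
        # Conditioning memory: [pitch embedding, timbre embedding]
        cond = torch.stack([self.pitch(pitch_id), self.timbre(z_timbre)], dim=1)
        T = tokens.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.decoder(self.tok(tokens), cond, tgt_mask=causal)
        return self.head(h)  # next-token logits

model = PitchTimbreDecoder()
logits = model(torch.randint(0, 1024, (1, 32)), torch.tensor([60]),
               torch.zeros(1, 2))
```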
arXiv Detail & Related papers (2025-10-05T20:03:30Z) - DualReward: A Dynamic Reinforcement Learning Framework for Cloze Tests Distractor Generation [0.4660328753262075]
DualReward is a novel reinforcement learning framework for automatic distractor generation in cloze tests. We evaluate our approach on both passage-level (CLOTH-F) and sentence-level (MCQ) cloze test datasets.
arXiv Detail & Related papers (2025-07-16T03:39:36Z) - Scaling Self-Supervised Representation Learning for Symbolic Piano Performance [52.661197827466886]
We study the capabilities of generative autoregressive transformer models trained on large amounts of symbolic solo-piano transcriptions. We use a comparatively smaller, high-quality subset to finetune models to produce musical continuations, perform symbolic classification tasks, and produce general-purpose contrastive MIDI embeddings.
arXiv Detail & Related papers (2025-06-30T14:00:14Z) - SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation [75.86473375730392]
SongGen is a fully open-source, single-stage auto-regressive transformer for controllable song generation. It supports two output modes: mixed mode, which generates a mixture of vocals and accompaniment directly, and dual-track mode, which synthesizes them separately. To foster community engagement and future research, we will release our model weights, training code, annotated data, and preprocessing pipeline.
arXiv Detail & Related papers (2025-02-18T18:52:21Z) - MuseBarControl: Enhancing Fine-Grained Control in Symbolic Music Generation through Pre-Training and Counterfactual Loss [51.85076222868963]
We introduce a pre-training task designed to link control signals directly with corresponding musical tokens.
We then implement a novel counterfactual loss that promotes better alignment between the generated music and the control prompts.
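The blurb does not give the loss itself, but one plausible shape is a margin between the model's fit under the true control signal and under a mismatched (counterfactual) one. The sketch below is that guess, not the paper's formulation:

```python
import torch
import torch.nn.functional as F

def counterfactual_control_loss(logits_true, logits_cf, targets, margin=1.0):
    """logits_true: (B, T, V) logits given the matching control signal;
    logits_cf: logits for the same targets under a shuffled control;
    targets: (B, T) ground-truth token ids. (Hypothetical form.)"""
    nll_true = F.cross_entropy(logits_true.transpose(1, 2), targets)
    nll_cf = F.cross_entropy(logits_cf.transpose(1, 2), targets)
    # Standard NLL on the true pairing, plus a hinge pushing the
    # counterfactual pairing to fit at least `margin` worse.
    return nll_true + F.relu(margin - (nll_cf - nll_true))
```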
arXiv Detail & Related papers (2024-07-05T08:08:22Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen comprises a single-stage transformer LM together with efficient token interleaving patterns.
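One well-known such scheme is the delay pattern, in which codebook k is shifted right by k steps so that all codebooks can be predicted in parallel at each step. The sketch below illustrates the pattern; the padding convention is an assumption:

```python
def delay_interleave(codes, pad=-1):
    """Delay-pattern interleaving: codebook k lags by k steps.
    codes is a list of K equal-length token lists; pad marks
    positions where a codebook has no token yet (or anymore)."""
    K, T = len(codes), len(codes[0])
    steps = []
    for t in range(T + K - 1):
        frame = [codes[k][t - k] if 0 <= t - k < T else pad for k in range(K)]
        steps.append(frame)
    return steps  # invert by reading codebook k back at step t + k

# Example: two codebooks of four tokens each
print(delay_interleave([[1, 2, 3, 4], [5, 6, 7, 8]]))
# [[1, -1], [2, 5], [3, 6], [4, 7], [-1, 8]]
```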
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation [56.514462874501675]
We propose a dynamic sparse attention based Transformer model to achieve fine-level matching with favorable efficiency.
The heart of our approach is a novel dynamic-attention unit, dedicated to modeling the variation in the optimal number of tokens each position should attend to.
Experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details.
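As a rough illustration, fixed top-k sparse attention keeps only the strongest keys per query; DynaST's dynamic-attention unit goes further by adapting how many tokens each position attends to, which this sketch deliberately does not model:

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=8):
    """Keep only the top_k highest-scoring keys per query; everything
    else is masked out before the softmax (fixed k is a stand-in for
    DynaST's learned, per-position budget)."""
    scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5   # (B, Tq, Tk)
    kth = scores.topk(top_k, dim=-1).values[..., -1:]      # k-th largest
    scores = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 16, 32)
out = topk_sparse_attention(q, k, v)
```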
arXiv Detail & Related papers (2022-07-13T11:12:03Z) - Generating Lead Sheets with Affect: A Novel Conditional seq2seq Framework [3.029434408969759]
We present a novel approach for calculating the positivity or negativity of a chord progression within a lead sheet.
We frame the task as a Neural Machine Translation (NMT) problem, including high-level conditions in the encoder part of the sequence-to-sequence architecture.
The proposed strategy is able to generate lead sheets in a controllable manner, resulting in distributions of musical attributes similar to those of the training dataset.
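One common way to realize encoder-side conditioning is to prepend a discretized condition value to the source sequence as a special token; the sketch below assumes hypothetical token ids and bins:

```python
def prepend_condition(encoder_input, valence_bin, n_bins=5, cond_base=10_000):
    """Prepend a discretized affect condition as a special source token
    so the decoder can attend to it (ids and binning are hypothetical)."""
    return [cond_base + min(max(valence_bin, 0), n_bins - 1)] + encoder_input

print(prepend_condition([12, 87, 44], valence_bin=3))  # [10003, 12, 87, 44]
```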
arXiv Detail & Related papers (2021-04-27T09:04:21Z) - Music FaderNets: Controllable Music Generation Based On High-Level Features via Low-Level Feature Modelling [5.88864611435337]
We present a framework that can learn high-level feature representations with a limited amount of data.
We refer to our proposed framework as Music FaderNets, a name inspired by the fact that low-level attributes can be continuously manipulated, like faders on a mixing console.
We demonstrate that the model successfully learns the intrinsic relationship between arousal and its corresponding low-level attributes.
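A minimal sketch of the fader idea, assuming a latent code in which one dimension has been regularized to track a low-level attribute; the decoder and the latent layout are hypothetical:

```python
import torch

def slide_fader(decoder, z, fader_dim, amount):
    """Nudge one latent 'fader' (e.g. a dimension tied to note density)
    and decode; a slider-like, continuous manipulation."""
    z = z.clone()
    z[:, fader_dim] += amount
    return decoder(z)

# Toy usage with a stand-in decoder
out = slide_fader(torch.tanh, torch.zeros(1, 8), fader_dim=2, amount=0.5)
```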
arXiv Detail & Related papers (2020-07-29T16:01:45Z)