Related papers: A Training-Free Approach for Music Style Transfer with Latent Diffusion Models

Related papers

CleanStyle: Plug-and-Play Style Conditioning Purification for Text-to-Image Stylization [5.300721419484575]
CleanStyle is a plug-and-play framework that filters out content-related noise from the style embedding without retraining. CleanStyleSVD dynamically suppresses tail components using a time-aware exponential schedule. SS-CFG reuses the tail components to construct style-aware unconditional inputs.
arXiv Detail & Related papers (2026-02-24T09:33:05Z)
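The CleanStyleSVD idea summarized above (keep the dominant singular components of a style embedding and attenuate the tail over the diffusion trajectory) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the cutoff `k`, and the schedule w(t) = exp(-lam * t / num_steps) are hypothetical choices, because the abstract does not state the exact form of the time-aware exponential schedule.

```python
import numpy as np

def time_aware_weight(t: int, num_steps: int, lam: float) -> float:
    # Hypothetical schedule: w(t) = exp(-lam * t / num_steps), so w = 1 at t = 0
    # and the tail is attenuated more strongly as t grows.
    return float(np.exp(-lam * t / num_steps))

def suppress_tail(style_emb: np.ndarray, k: int, t: int, num_steps: int,
                  lam: float = 3.0) -> np.ndarray:
    """Keep the top-k singular components of a (tokens x dim) style embedding
    unchanged and scale the remaining "tail" components by w(t)."""
    u, s, vt = np.linalg.svd(style_emb, full_matrices=False)
    scale = np.ones_like(s)
    scale[k:] = time_aware_weight(t, num_steps, lam)
    # Rebuild the matrix from the rescaled singular values: U diag(s * scale) V^T
    return (u * (s * scale)) @ vt
```

With `lam=0.0` the weight is 1 and the embedding is reconstructed unchanged; larger `lam` or later timesteps push the tail toward zero while leaving the top-k directions intact.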
TeleStyle: Content-Preserving Style Transfer in Images and Videos [52.76027947278353]
We present TeleStyle, a lightweight model for both image and video stylization. We curated a high-quality dataset of distinct, specific styles and synthesized triplets using thousands of diverse, in-the-wild style categories. TeleStyle achieves state-of-the-art performance across three core evaluation metrics: style similarity, content consistency, and aesthetic quality.
arXiv Detail & Related papers (2026-01-28T02:16:03Z)
Domain Generalizable Portrait Style Transfer [37.85739992959271]
We propose to establish dense semantic correspondence between the given input and reference portraits. We obtain a warped reference semantically aligned with the input. A style adapter is also designed to provide style guidance from the warped reference.
arXiv Detail & Related papers (2025-07-06T04:56:25Z)
Adaptive Accompaniment with ReaLchords [60.690020661819055]
We propose ReaLchords, an online generative model for improvising chord accompaniment to user melody. We start with an online model pretrained by maximum likelihood, and use reinforcement learning to finetune the model for online use.
arXiv Detail & Related papers (2025-06-17T16:59:05Z)
Balanced Image Stylization with Style Matching Score [36.542802101359705]
Style Matching Score (SMS) is a novel optimization method for image stylization with diffusion models. SMS balances style alignment and content preservation, outperforming state-of-the-art approaches.
arXiv Detail & Related papers (2025-03-10T17:58:02Z)
ImprovNet -- Generating Controllable Musical Improvisations with Iterative Corruption Refinement [6.873190001575463]
ImprovNet is a transformer-based architecture that generates expressive and controllable musical improvisations. It can perform cross-genre and intra-genre improvisations, harmonize melodies with genre-specific styles, and execute short prompt continuation and infilling tasks.
arXiv Detail & Related papers (2025-02-06T21:45:38Z)
UniVST: A Unified Framework for Training-free Localized Video Style Transfer [102.52552893495475]
This paper presents UniVST, a unified framework for localized video style transfer based on diffusion models. It operates without the need for training, offering a distinct advantage over existing diffusion methods that transfer style across entire videos.
arXiv Detail & Related papers (2024-10-26T05:28:02Z)
Foundation Models for Music: A Survey [77.77088584651268]
Foundation models (FMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music.
arXiv Detail & Related papers (2024-08-26T15:13:14Z)
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer [87.32518573172631]
ReSyncer fuses motion and appearance with unified training. It supports fast personalized fine-tuning, video-driven lip-syncing, the transfer of speaking styles, and even face swapping.
arXiv Detail & Related papers (2024-08-06T16:31:45Z)
Ada-adapter:Fast Few-shot Style Personlization of Diffusion Model with Pre-trained Image Encoder [57.574544285878794]
Ada-Adapter is a novel framework for few-shot style personalization of diffusion models. Our method enables efficient zero-shot style transfer utilizing a single reference image. We demonstrate the effectiveness of our approach on various artistic styles, including flat art, 3D rendering, and logo design.
arXiv Detail & Related papers (2024-07-08T02:00:17Z)
MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music. To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation). Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models [24.582948932985726]
This paper introduces a novel approach to the editing of music generated by text-to-music models. Our method transforms text editing to latent space manipulation while adding an extra constraint to enforce consistency. Experimental results demonstrate superior performance over both zero-shot and certain supervised baselines in style and timbre transfer evaluations.
arXiv Detail & Related papers (2024-02-09T04:34:08Z)
ParaGuide: Guided Diffusion Paraphrasers for Plug-and-Play Textual Style Transfer [57.6482608202409]
Textual style transfer is the task of transforming stylistic properties of text while preserving meaning. We introduce a novel diffusion-based framework for general-purpose style transfer that can be flexibly adapted to arbitrary target styles. We validate the method on the Enron Email Corpus, with both human and automatic evaluations, and find that it outperforms strong baselines on formality, sentiment, and even authorship style transfer.
arXiv Detail & Related papers (2023-08-29T17:36:02Z)
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies [32.482588500419006]
We build a state-of-the-art text-to-music model, MusicLDM, that adapts Stable Diffusion and AudioLDM architectures to the music domain. We propose two different mixup strategies for data augmentation: beat-synchronous audio mixup and beat-synchronous latent mixup. In addition to popular evaluation metrics, we design several new evaluation metrics based on CLAP score to demonstrate that our proposed MusicLDM and beat-synchronous mixup strategies improve both the quality and novelty of generated music.
arXiv Detail & Related papers (2023-08-03T05:35:37Z)
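The beat-synchronous audio mixup mentioned in the MusicLDM abstract can be pictured as aligning two clips on a shared downbeat before taking a convex combination. The minimal NumPy sketch below assumes the downbeat sample indices are already known (for example from a beat tracker) and that the pair was pre-selected for compatible tempo; the function name and arguments are hypothetical, and it covers only the audio-domain variant, not the latent mixup.

```python
import numpy as np

def beat_synchronous_mixup(x1: np.ndarray, x2: np.ndarray,
                           downbeat1: int, downbeat2: int, lam: float) -> np.ndarray:
    """Shift x2 so that its downbeat (a sample index) lands on x1's downbeat,
    zero-pad or crop it to len(x1), and return lam * x1 + (1 - lam) * x2_aligned."""
    shift = downbeat1 - downbeat2          # how far x2 must move to the right
    x2_aligned = np.zeros_like(x1)
    src_start = max(0, -shift)             # first sample of x2 that is kept
    dst_start = max(0, shift)              # where that sample lands in x1's frame
    length = min(len(x2) - src_start, len(x1) - dst_start)
    if length > 0:
        x2_aligned[dst_start:dst_start + length] = x2[src_start:src_start + length]
    return lam * x1 + (1.0 - lam) * x2_aligned
```

In standard mixup the weight `lam` is drawn from a Beta distribution, e.g. `np.random.default_rng(0).beta(0.4, 0.4)`; the concentration value here is illustrative only.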
Transfer Learning for Underrepresented Music Generation [0.9645196221785693]
We identify Iranian folk music as an example of an out-of-distribution (OOD) genre for MusicVAE, a large generative music model. We find that a combinational creativity transfer learning approach can efficiently adapt MusicVAE to an Iranian folk music dataset, indicating potential for generating underrepresented music genres in the future.
arXiv Detail & Related papers (2023-06-01T01:53:10Z)
GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music. We introduce a framework known as GETMusic, with "GET" standing for "GEnerate music Tracks". GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time. Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with any arbitrary source-target track combinations.
arXiv Detail & Related papers (2023-05-18T09:53:23Z)
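The 2D layout described for GETScore (tracks stacked vertically, time running horizontally) can be pictured as an integer grid. The sketch below is a simplification under stated assumptions: one token id per cell, hypothetical `PAD`/`MASK` ids, and a helper that masks the target tracks so that a non-autoregressive model could fill them in conditioned on the source tracks. The real GETScore token design may differ.

```python
import numpy as np

PAD, MASK = 0, 1  # hypothetical special-token ids

def build_score(tracks: list[list[tuple[int, int]]], num_steps: int) -> np.ndarray:
    """tracks: one list of (time_step, token_id) events per track.
    Returns an int grid of shape (num_tracks, num_steps): one row per track,
    one column per time step, empty cells filled with PAD."""
    grid = np.full((len(tracks), num_steps), PAD, dtype=np.int64)
    for row, events in enumerate(tracks):
        for step, token in events:
            grid[row, step] = token
    return grid

def mask_targets(grid: np.ndarray, target_rows: list[int]) -> np.ndarray:
    """Replace every cell of the target tracks with MASK; the remaining
    (source) tracks stay visible as conditioning context."""
    masked = grid.copy()
    masked[target_rows, :] = MASK
    return masked
```

Because any subset of rows can be masked, the same grid supports arbitrary source-target track combinations, which is the flexibility the abstract highlights.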
StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator [85.40502725367506]
We propose StyleSync, an effective framework that enables high-fidelity lip synchronization. Specifically, we design a mask-guided spatial information encoding module that preserves the details of the given face. Our design also enables personalized lip-sync by introducing style space and generator refinement on only limited frames.
arXiv Detail & Related papers (2023-05-09T13:38:13Z)
ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models [67.66825818489406]
This paper introduces a text-to-waveform music generation model, underpinned by the utilization of diffusion models. Our methodology hinges on the innovative incorporation of free-form textual prompts as conditional factors to guide the waveform generation process. We demonstrate that our generated music in the waveform domain outperforms previous works by a large margin in terms of diversity, quality, and text-music relevance.
arXiv Detail & Related papers (2023-02-09T06:27:09Z)
Personalized Popular Music Generation Using Imitation and Structure [1.971709238332434]
We propose a statistical machine learning model that is able to capture and imitate the structure, melody, chord, and bass style from a given example seed song. An evaluation using 10 pop songs shows that our new representations and methods are able to create high-quality stylistic music.
arXiv Detail & Related papers (2021-05-10T23:43:00Z)
Self-Supervised VQ-VAE For One-Shot Music Style Transfer [2.6381163133447836]
We present a novel method for one-shot timbre transfer based on an extension of the vector-quantized variational autoencoder (VQ-VAE). We evaluate the method using a set of objective metrics and show that it is able to outperform selected baselines.
arXiv Detail & Related papers (2021-02-10T21:42:49Z)
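The central step of any VQ-VAE is quantization: each encoder output is replaced by its nearest codebook vector. The NumPy sketch below shows only that nearest-neighbour lookup (by squared Euclidean distance); it omits training details such as the straight-through gradient estimator and the commitment loss, and it does not reproduce the self-supervised extension introduced in the paper. Function and variable names are hypothetical.

```python
import numpy as np

def vector_quantize(z: np.ndarray, codebook: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """z: (n, d) encoder outputs; codebook: (K, d) embedding vectors.
    Returns the nearest-codebook index for each row of z and the quantized vectors."""
    # Squared distances ||z_i - e_k||^2 = ||z_i||^2 - 2 z_i . e_k + ||e_k||^2, shape (n, K)
    d2 = (np.sum(z ** 2, axis=1, keepdims=True)
          - 2.0 * z @ codebook.T
          + np.sum(codebook ** 2, axis=1))
    idx = np.argmin(d2, axis=1)
    return idx, codebook[idx]
```

Expanding the squared distance avoids materializing an (n, K, d) difference tensor, which matters when the codebook is large.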
Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity. Our method utilizes both an acoustic model, trained for the task of automatic speech recognition, together with melody extracted features to drive a waveform-based generator.
arXiv Detail & Related papers (2020-08-06T18:29:11Z)
Incorporating Music Knowledge in Continual Dataset Augmentation for Music Generation [69.06413031969674]
Aug-Gen is a method of dataset augmentation for any music generation system trained on a resource-constrained domain. We apply Aug-Gen to Transformer-based chorale generation in the style of J.S. Bach, and show that this allows for longer training and results in better generative output.
arXiv Detail & Related papers (2020-06-23T21:06:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.