Self-Supervised VQ-VAE for One-Shot Music Style Transfer
- URL: http://arxiv.org/abs/2102.05749v1
- Date: Wed, 10 Feb 2021 21:42:49 GMT
- Title: Self-Supervised VQ-VAE for One-Shot Music Style Transfer
- Authors: Ondřej Cífka, Alexey Ozerov, Umut Şimşekli, Gaël Richard
- Abstract summary: We present a novel method for one-shot timbre transfer based on an extension of the vector-quantized variational autoencoder (VQ-VAE).
We evaluate the method using a set of objective metrics and show that it is able to outperform selected baselines.
- Score: 2.6381163133447836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural style transfer, which allows applying the artistic style of one
image to another, became one of the most widely showcased computer vision
applications shortly after its introduction. In contrast, related tasks in the
music audio domain remained, until recently, largely untackled. While several
style conversion methods tailored to musical signals have been proposed, most
lack the 'one-shot' capability of classical image style transfer algorithms. On
the other hand, the results of existing one-shot audio style transfer methods
on musical inputs are not as compelling. In this work, we are specifically
interested in the problem of one-shot timbre transfer. We present a novel
method for this task, based on an extension of the vector-quantized variational
autoencoder (VQ-VAE), along with a simple self-supervised learning strategy
designed to obtain disentangled representations of timbre and pitch. We
evaluate the method using a set of objective metrics and show that it is able
to outperform selected baselines.
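The architecture the abstract describes can be sketched compactly. The following is a minimal illustration, assuming mel-spectrogram inputs; the GRU encoders, the layer sizes, and the single global timbre vector are illustrative choices, not the authors' actual model.

```python
# Minimal sketch of a VQ-VAE with separate content (pitch) and style (timbre)
# paths, in the spirit of the abstract. All shapes and layer choices are
# illustrative assumptions, not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                               # z: (batch, time, dim)
        dist = (z.pow(2).sum(-1, keepdim=True)
                - 2 * z @ self.codebook.weight.t()
                + self.codebook.weight.pow(2).sum(-1))  # squared distance to each code
        idx = dist.argmin(dim=-1)                       # nearest code per frame
        q = self.codebook(idx)
        commit = F.mse_loss(z, q.detach())              # commitment loss term
        q = z + (q - z).detach()                        # straight-through estimator
        return q, idx, commit

class OneShotTimbreTransfer(nn.Module):
    def __init__(self, n_mels=80, dim=64):
        super().__init__()
        self.content_enc = nn.GRU(n_mels, dim, batch_first=True)  # pitch/content path
        self.style_enc = nn.GRU(n_mels, dim, batch_first=True)    # timbre path
        self.vq = VectorQuantizer(dim=dim)
        self.decoder = nn.GRU(2 * dim, n_mels, batch_first=True)

    def forward(self, content_spec, style_spec):
        c, _ = self.content_enc(content_spec)
        q, _, commit = self.vq(c)                # discrete bottleneck squeezes out timbre
        s, _ = self.style_enc(style_spec)
        s_global = s.mean(dim=1, keepdim=True)   # one global timbre vector per clip
        s_rep = s_global.expand(-1, q.shape[1], -1)
        out, _ = self.decoder(torch.cat([q, s_rep], dim=-1))
        return out, commit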
Related papers
- Combining audio control and style transfer using latent diffusion [1.705371629600151]
In this paper, we aim to unify explicit control and style transfer within a single model.
Our model can generate audio matching a timbre target, while specifying structure either with explicit controls or through another audio example.
We show that our method can generate cover versions of complete musical pieces by transferring rhythmic and melodic content to the style of a target audio in a different genre.
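The conditioning pattern this summary describes can be illustrated with a hypothetical denoiser that takes a noisy latent, a timestep, a global timbre embedding, and a time-aligned structure track; every name and shape below is an assumption, not the paper's code.

```python
# Hypothetical latent-diffusion denoiser conditioned on timbre + structure.
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    def __init__(self, latent_dim=64, timbre_dim=128):
        super().__init__()
        self.timbre_proj = nn.Linear(timbre_dim, latent_dim)
        self.net = nn.Sequential(
            nn.Conv1d(3 * latent_dim + 1, 256, 3, padding=1), nn.GELU(),
            nn.Conv1d(256, latent_dim, 3, padding=1),
        )

    def forward(self, z_t, t, structure, timbre):
        # z_t, structure: (B, latent_dim, T); t: (B,); timbre: (B, timbre_dim)
        B, D, T = z_t.shape
        timbre_map = self.timbre_proj(timbre).unsqueeze(-1).expand(B, D, T)
        t_map = t.view(B, 1, 1).expand(B, 1, T).float()
        x = torch.cat([z_t, structure, timbre_map, t_map], dim=1)
        return self.net(x)   # predicted noise for the current diffusion step
```

Swapping the source of `structure` (a control encoder versus an audio encoder) is what would let a single model cover both explicit control and style transfer.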
arXiv Detail & Related papers (2024-07-31T23:27:27Z)
- Music Style Transfer With Diffusion Model [11.336043499372792]
This study proposes a music style transfer framework based on diffusion models (DM) and uses spectrogram-based methods to achieve many-to-many music style transfer.
The GuideDiff method is used to restore spectrograms to high-fidelity audio, accelerating audio generation speed and reducing noise in the generated audio.
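GuideDiff itself is not a public API, so the sketch below only shows the generic last step that spectrogram-based pipelines share, using torchaudio's Griffin-Lim inversion as a simple stand-in for the paper's high-fidelity reconstruction.

```python
# Generic spectrogram -> waveform step; Griffin-Lim stands in for GuideDiff.
import torchaudio

n_fft = 1024
spec = torchaudio.transforms.Spectrogram(n_fft=n_fft, power=1.0)
inverse = torchaudio.transforms.GriffinLim(n_fft=n_fft, power=1.0)

waveform, sr = torchaudio.load("input.wav")
magnitude = spec(waveform)            # (channels, freq, time) magnitude "image"
# ... a style-transfer model would modify `magnitude` here ...
reconstructed = inverse(magnitude)    # phase estimated iteratively by Griffin-Lim
```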
arXiv Detail & Related papers (2024-04-23T06:22:19Z)
- A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework.
We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature.
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
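The input-dependent temperature is concrete enough to sketch: below is a minimal InfoNCE-style loss whose temperature is predicted per sample by a small head. The head design and the bounds are assumptions, not UCAST's exact scheme.

```python
# InfoNCE-style contrastive loss with a learned, input-dependent temperature.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveTemperature(nn.Module):
    def __init__(self, feat_dim, t_min=0.05, t_max=1.0):
        super().__init__()
        self.head = nn.Linear(feat_dim, 1)
        self.t_min, self.t_max = t_min, t_max

    def forward(self, feats):
        # Squash to (t_min, t_max) so the loss stays well behaved.
        return self.t_min + (self.t_max - self.t_min) * torch.sigmoid(self.head(feats))

def contrastive_loss(anchor, positive, temp_module):
    a = F.normalize(anchor, dim=-1)      # (B, D)
    p = F.normalize(positive, dim=-1)    # (B, D)
    logits = a @ p.t()                   # similarity of every anchor/positive pair
    tau = temp_module(anchor)            # (B, 1) per-sample temperature
    logits = logits / tau
    targets = torch.arange(a.shape[0], device=a.device)
    return F.cross_entropy(logits, targets)  # diagonal pairs are the positives
```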
arXiv Detail & Related papers (2023-03-09T04:35:00Z)
- Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style representation learning and style transfer method via contrastive learning.
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
arXiv Detail & Related papers (2022-05-19T13:11:24Z)
- StyTr^2: Unbiased Image Style Transfer with Transformers [59.34108877969477]
The goal of image style transfer is to render an image with artistic features guided by a style reference while maintaining the original content.
Traditional neural style transfer methods are usually biased, and content leakage can be observed by running the style transfer process several times with the same reference image.
We propose a transformer-based approach, namely StyTr^2, to address this critical issue.
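As a rough illustration of the transformer layout such methods use (generic, not StyTr^2's exact design): separate encoders produce content and style token sequences, and a decoder lets content queries cross-attend to style.

```python
# Generic transformer style-transfer layout: content cross-attends to style.
import torch
import torch.nn as nn

dim, heads = 256, 8
content_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True), num_layers=2)
style_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True), num_layers=2)
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=dim, nhead=heads, batch_first=True), num_layers=2)

content_tokens = torch.randn(1, 196, dim)   # e.g. 14x14 image patch embeddings
style_tokens = torch.randn(1, 196, dim)
c = content_encoder(content_tokens)
s = style_encoder(style_tokens)
stylized = decoder(tgt=c, memory=s)          # content queries attend to style tokens
```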
arXiv Detail & Related papers (2021-05-30T15:57:09Z)
- Single-Layer Vision Transformers for More Accurate Early Exits with Less Overhead [88.17413955380262]
We introduce a novel early-exiting architecture based on vision transformers.
We show that our method works for both classification and regression problems.
We also introduce a novel method for integrating audio and visual modalities within early exits in audiovisual data analysis.
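A minimal sketch of confidence-based early exiting, assuming a plain MLP backbone rather than the paper's vision transformer; the threshold rule is a common generic choice, not necessarily the one used there.

```python
# Early exiting: intermediate heads stop inference once confident enough.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitNet(nn.Module):
    def __init__(self, dim=256, num_classes=10, num_blocks=4, threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_blocks))
        self.exits = nn.ModuleList(
            nn.Linear(dim, num_classes) for _ in range(num_blocks))
        self.threshold = threshold

    def forward(self, x):
        for block, exit_head in zip(self.blocks, self.exits):
            x = block(x)
            logits = exit_head(x)
            conf = F.softmax(logits, dim=-1).max(dim=-1).values
            if conf.min() >= self.threshold:  # whole batch confident: stop early
                return logits
        return logits                          # fall through to the final exit
```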
arXiv Detail & Related papers (2021-05-19T13:30:34Z)
- Unsupervised Cross-Domain Singing Voice Conversion [105.1021715879586]
We present a wav-to-wav generative model for the task of singing voice conversion from any identity.
Our method combines an acoustic model trained for automatic speech recognition with melody features extracted from the input to drive a waveform-based generator.
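That recipe, ASR-derived content features plus melody features driving a generator, can be sketched as follows; the feature dimensions, the convolutional generator, and the upsampling factor are all assumptions for illustration.

```python
# Toy waveform generator driven by ASR content features + F0 melody track.
import torch
import torch.nn as nn

class SVCGenerator(nn.Module):
    def __init__(self, asr_dim=512, melody_dim=1, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(asr_dim + melody_dim, hidden, 5, padding=2), nn.GELU(),
            nn.Conv1d(hidden, hidden, 5, padding=2), nn.GELU(),
            nn.ConvTranspose1d(hidden, 1, 16, stride=8, padding=4),  # up to sample rate
        )

    def forward(self, asr_feats, f0):
        # asr_feats: (B, asr_dim, T) phonetic content; f0: (B, 1, T) melody
        return self.net(torch.cat([asr_feats, f0], dim=1))  # (B, 1, 8*T) waveform
```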
arXiv Detail & Related papers (2020-08-06T18:29:11Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
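The two stages named here can be caricatured in a few lines: message passing over a skeleton graph, then fusing the pooled motion feature with audio features to predict a separation mask. Everything below (shapes, pooling, the mask head) is invented for illustration.

```python
# Toy sketch: keypoint graph aggregation + audio-visual fusion for a mask.
import torch
import torch.nn as nn

class KeypointGraphLayer(nn.Module):
    def __init__(self, dim, adjacency):                # adjacency: (K, K) skeleton
        super().__init__()
        self.register_buffer("A", adjacency)
        self.lin = nn.Linear(dim, dim)

    def forward(self, x):                              # x: (B, K, dim) keypoints
        return torch.relu(self.lin(torch.matmul(self.A, x)))  # skeleton message passing

class AudioVisualMask(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.mask_head = nn.Linear(2 * dim, dim)

    def forward(self, motion, audio):
        # motion: (B, K, dim) keypoint features; audio: (B, T, dim) spectrogram frames
        pooled = motion.mean(dim=1, keepdim=True).expand(-1, audio.shape[1], -1)
        return torch.sigmoid(self.mask_head(torch.cat([audio, pooled], dim=-1)))
```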
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
- TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer [34.02807083910344]
We introduce TimbreTron, a method for musical timbre transfer which applies "image" domain style transfer to a time-frequency representation of the audio signal.
We show that the Constant Q Transform representation is particularly well-suited to convolutional architectures.
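The shape of this pipeline is easy to show with librosa; here the CycleGAN stage is a placeholder and Griffin-Lim inversion stands in for the paper's WaveNet synthesis.

```python
# TimbreTron-shaped pipeline: audio -> CQT "image" -> (style transfer) -> audio.
import librosa
import numpy as np

y, sr = librosa.load("input.wav", sr=16000)
C = np.abs(librosa.cqt(y, sr=sr))       # constant-Q magnitude "image" of the audio

# ... an image-domain style-transfer model (CycleGAN in the paper) would map
# the source-instrument CQT to the target instrument here ...
C_transferred = C

y_out = librosa.griffinlim_cqt(C_transferred, sr=sr)  # approximate inversion
```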
arXiv Detail & Related papers (2018-11-22T17:46:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.