Nonparallel Voice Conversion with Augmented Classifier Star Generative
Adversarial Networks
- URL: http://arxiv.org/abs/2008.12604v7
- Date: Tue, 10 Nov 2020 09:57:32 GMT
- Title: Nonparallel Voice Conversion with Augmented Classifier Star Generative
Adversarial Networks
- Authors: Hirokazu Kameoka, Takuhiro Kaneko, Kou Tanaka, Nobukatsu Hojo
- Abstract summary: We previously proposed a method that allows for nonparallel voice conversion (VC) by using a variant of generative adversarial networks (GANs) called StarGAN.
The main features of our method, called StarGAN-VC, are as follows: First, it requires no parallel utterances, transcriptions, or time alignment procedures for speech generator training.
We describe three formulations of StarGAN, including a newly introduced variant called "Augmented classifier StarGAN (A-StarGAN)", and compare them in a nonparallel VC task.
- Score: 41.87886753817764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We previously proposed a method that allows for nonparallel voice conversion
(VC) by using a variant of generative adversarial networks (GANs) called
StarGAN. The main features of our method, called StarGAN-VC, are as follows:
First, it requires no parallel utterances, transcriptions, or time alignment
procedures for speech generator training. Second, it can simultaneously learn
mappings across multiple domains using a single generator network and thus
fully exploit available training data collected from multiple domains to
capture latent features that are common to all the domains. Third, it can
generate converted speech signals quickly enough to allow real-time
implementations and requires only several minutes of training examples to
generate reasonably realistic-sounding speech. In this paper, we describe three
formulations of StarGAN, including a newly introduced StarGAN variant
called "Augmented classifier StarGAN (A-StarGAN)", and compare them in a
nonparallel VC task. We also compare them with several baseline methods.
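The second feature above, one generator serving every conversion direction, hinges on conditioning the network with a target-domain code. A minimal NumPy sketch of that conditioning idea (the dimensions, the one-hot code, and the single linear map are illustrative assumptions, not the paper's actual convolutional architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

N_DOMAINS = 4   # hypothetical number of target speakers
N_FEATS = 36    # hypothetical acoustic feature size per frame

# One weight matrix shared by all domains: the generator sees the
# acoustic features concatenated with a one-hot target-domain code,
# so a single network learns every source-to-target mapping.
W = rng.standard_normal((N_FEATS + N_DOMAINS, N_FEATS)) * 0.1

def generate(features, target_domain):
    """Map (T, N_FEATS) source features toward `target_domain` (0..N_DOMAINS-1)."""
    T = features.shape[0]
    code = np.zeros((T, N_DOMAINS))
    code[:, target_domain] = 1.0           # broadcast the one-hot code to every frame
    conditioned = np.concatenate([features, code], axis=1)
    return np.tanh(conditioned @ W)        # toy stand-in for the real generator network

src = rng.standard_normal((128, N_FEATS))  # 128 frames of dummy features
out = generate(src, target_domain=2)
print(out.shape)                           # (128, 36)
```

Because the domain code is just another input, the same weights `W` are updated by training data from all domains, which is what lets the model pool data to capture domain-shared latent structure.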
Related papers
- Generalized One-shot Domain Adaption of Generative Adversarial Networks [72.84435077616135]
GAN adaptation aims to transfer a pre-trained Generative Adversarial Network (GAN) to a given domain with limited training data.
We consider that the adaptation from source domain to target domain can be decoupled into two parts: the transfer of global style like texture and color, and the emergence of new entities that do not belong to the source domain.
Our core objective is to constrain the gap between the internal distributions of the reference and syntheses by sliced Wasserstein distance.
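The sliced Wasserstein distance used as the objective above is computed by projecting both sample sets onto random 1-D directions, where the Wasserstein distance reduces to comparing sorted samples. A minimal NumPy sketch (the sample shapes and projection count are illustrative assumptions):

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=256, seed=0):
    """Monte-Carlo sliced 2-Wasserstein distance between two point sets.

    x, y: (n, d) sample arrays with the same n and d. Each random unit
    direction reduces the d-dimensional problem to 1-D, where optimal
    transport is solved exactly by sorting.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    theta = rng.standard_normal((n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # unit directions
    px = np.sort(x @ theta.T, axis=0)  # sorted 1-D projections of x
    py = np.sort(y @ theta.T, axis=0)
    return np.sqrt(np.mean((px - py) ** 2))

rng = np.random.default_rng(1)
a = rng.standard_normal((500, 8))
b = rng.standard_normal((500, 8)) + 3.0  # shifted distribution
print(sliced_wasserstein(a, a))          # ~0: identical samples
print(sliced_wasserstein(a, b))          # clearly positive
```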
arXiv Detail & Related papers (2022-09-08T09:24:44Z)
- The ReprGesture entry to the GENEA Challenge 2022 [8.081712389287903]
This paper describes the ReprGesture entry to the Generation and Evaluation of Non-verbal Behaviour for Embodied Agents (GENEA) challenge 2022.
The GENEA challenge provides the processed datasets and performs crowdsourced evaluations to compare the performance of different gesture generation systems.
arXiv Detail & Related papers (2022-08-25T14:50:50Z)
- Zero-Shot Logit Adjustment [89.68803484284408]
Generalized Zero-Shot Learning (GZSL) is a semantic-descriptor-based learning technique.
In this paper, we propose a new generation-based technique to enhance the generator's effect while neglecting the improvement of the classifier.
Our experiments demonstrate that the proposed technique achieves state-of-the-art when combined with the basic generator, and it can improve various generative zero-shot learning frameworks.
arXiv Detail & Related papers (2022-04-25T17:54:55Z)
- StarGAN-VC+ASR: StarGAN-based Non-Parallel Voice Conversion Regularized by Automatic Speech Recognition [23.75478998795749]
We propose the use of automatic speech recognition to assist model training.
We show that using our proposed method, StarGAN-VC can retain more linguistic information than vanilla StarGAN-VC.
arXiv Detail & Related papers (2021-08-10T01:18:31Z)
- StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators [63.85888518950824]
We present a text-driven method that allows shifting a generative model to new domains.
We show that through natural language prompts and a few minutes of training, our method can adapt a generator across a multitude of domains.
arXiv Detail & Related papers (2021-08-02T14:46:46Z)
- StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion [19.74933410443264]
We present an unsupervised many-to-many voice conversion (VC) method using a generative adversarial network (GAN) called StarGAN v2.
Our model is trained only with 20 English speakers.
It generalizes to a variety of voice conversion tasks, such as any-to-many, cross-lingual, and singing conversion.
arXiv Detail & Related papers (2021-07-21T23:44:17Z)
- DINO: A Conditional Energy-Based GAN for Domain Translation [67.9879720396872]
Domain translation is the process of transforming data from one domain to another while preserving the common semantics.
Some of the most popular domain translation systems are based on conditional generative adversarial networks.
We propose a new framework, where two networks are simultaneously trained, in a supervised manner, to perform domain translation in opposite directions.
arXiv Detail & Related papers (2021-02-18T11:52:45Z)
- On Efficient Training, Controllability and Compositional Generalization of Insertion-based Language Generators [18.98725770517241]
InsNet is an insertion-based sequence model that can be trained as efficiently as transformer decoders.
We evaluate InsNet on story generation and CleVR-CoGENT captioning.
arXiv Detail & Related papers (2021-02-12T11:05:02Z)
- Many-to-Many Voice Transformer Network [55.17770019619078]
This paper proposes a voice conversion (VC) method based on a sequence-to-sequence (S2S) learning framework.
It enables simultaneous conversion of the voice characteristics, pitch contour, and duration of input speech.
arXiv Detail & Related papers (2020-05-18T04:02:08Z)
- Improving GANs for Speech Enhancement [19.836041050328102]
We propose to use multiple generators chained to perform multi-stage enhancement mapping.
We demonstrate that the proposed multi-stage enhancement approach outperforms the one-stage SEGAN baseline.
arXiv Detail & Related papers (2020-01-15T19:57:03Z)
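The multi-stage idea in the last entry, chaining mappings so each stage refines the previous stage's output, can be illustrated with a toy stand-in: here each "generator" is replaced by a simple moving-average denoiser (an assumption for illustration only, not the paper's learned SEGAN-style networks).

```python
import numpy as np

def smooth_stage(x, k=5):
    """One hypothetical enhancement stage: moving-average denoising."""
    kernel = np.ones(k) / k
    return np.convolve(x, kernel, mode="same")

def multi_stage(x, n_stages=3):
    """Chain the stage mapping: stage i refines stage i-1's output."""
    for _ in range(n_stages):
        x = smooth_stage(x)
    return x

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 4 * np.pi, 400))
noisy = clean + 0.5 * rng.standard_normal(400)

err_one = np.mean((smooth_stage(noisy) - clean) ** 2)
err_multi = np.mean((multi_stage(noisy) - clean) ** 2)
print(err_one, err_multi)  # the chained stages reduce the error further here
```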
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.