Voice-preserving Zero-shot Multiple Accent Conversion
- URL: http://arxiv.org/abs/2211.13282v2
- Date: Sat, 14 Oct 2023 06:27:13 GMT
- Title: Voice-preserving Zero-shot Multiple Accent Conversion
- Authors: Mumin Jin, Prashant Serai, Jilong Wu, Andros Tjandra, Vimal Manohar,
Qing He
- Abstract summary: An accent conversion system changes a speaker's accent but preserves that speaker's voice identity.
We use adversarial learning to disentangle accent dependent features while retaining other acoustic characteristics.
Our model generates audio that sounds closer to the target accent and like the original speaker.
- Score: 14.218374374305421
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most people who have tried to learn a foreign language would have experienced
difficulties understanding or speaking with a native speaker's accent. For
native speakers, understanding or speaking a new accent is likewise a difficult
task. An accent conversion system that changes a speaker's accent but preserves
that speaker's voice identity, such as timbre and pitch, has the potential for
a range of applications, such as communication, language learning, and
entertainment. Existing accent conversion models tend to change the speaker
identity and accent at the same time. Here, we use adversarial learning to
disentangle accent dependent features while retaining other acoustic
characteristics. What sets our work apart from existing accent conversion
models is the capability to convert an unseen speaker's utterance to multiple
accents while preserving its original voice identity. Subjective evaluations
show that our model generates audio that sounds closer to the target accent and
like the original speaker.
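The adversarial disentanglement described in the abstract is commonly realized with a gradient-reversal layer: an accent classifier is trained to predict the accent from intermediate features, while the feature encoder receives the negated classifier gradient and so learns accent-independent representations. The following NumPy toy is a minimal sketch of that mechanism only; all shapes, data, and hyperparameters are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: encoder W_enc maps 8-dim "acoustic" frames to 4-dim features;
# an accent classifier W_cls tries to predict one of 3 accents from them.
W_enc = rng.normal(scale=0.1, size=(8, 4))
W_cls = rng.normal(scale=0.1, size=(4, 3))
lam = 1.0   # gradient-reversal strength
lr = 0.02

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

x = rng.normal(size=(16, 8))        # a batch of toy acoustic frames
y = rng.integers(0, 3, size=16)     # toy accent labels

for step in range(100):
    h = x @ W_enc                   # encoder forward
    logits = h @ W_cls              # accent classifier forward
    p = softmax(logits)

    # Cross-entropy gradient with respect to the logits.
    d_logits = p.copy()
    d_logits[np.arange(16), y] -= 1.0
    d_logits /= 16

    # Classifier update: it *minimizes* accent loss (tries to detect accent).
    g_cls = h.T @ d_logits
    d_h = d_logits @ W_cls.T

    # Gradient reversal: the encoder receives the *negated* gradient, so it
    # is pushed toward features the accent classifier cannot exploit.
    g_enc = x.T @ (-lam * d_h)

    W_cls -= lr * g_cls
    W_enc -= lr * g_enc
```

In a real system the encoder and classifier would be deep networks and the retained features would feed a synthesis decoder, but the min-max signal flow is the same.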
Related papers
- Improving Self-supervised Pre-training using Accent-Specific Codebooks [48.409296549372414]
We propose an accent-aware adaptation technique for self-supervised learning.
On the Mozilla Common Voice dataset, our proposed approach outperforms all other accent-adaptation approaches.
arXiv Detail & Related papers (2024-07-04T08:33:52Z)
- Accented Speech Recognition With Accent-specific Codebooks [53.288874858671576]
Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems.
Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR.
We propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks.
arXiv Detail & Related papers (2023-10-24T16:10:58Z)
- Modelling low-resource accents without accent-specific TTS frontend [4.185844990558149]
This work focuses on modelling a speaker's accent that does not have a dedicated text-to-speech (TTS) frontend.
We propose an approach whereby we first augment the target accent data to sound like the donor voice via voice conversion.
We then train a multi-speaker multi-accent TTS model on the combination of recordings and synthetic data, to generate the target accent.
arXiv Detail & Related papers (2023-01-11T18:00:29Z)
- A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units [94.64927912924087]
Existing systems ignore the correlation between prosody and language content, leading to degradation of naturalness in converted speech.
We devise a cascaded modular system leveraging self-supervised discrete speech units as language representation.
Experiments show that our system outperforms previous approaches in naturalness, intelligibility, speaker transferability, and prosody transferability.
arXiv Detail & Related papers (2022-11-12T00:54:09Z)
- Explicit Intensity Control for Accented Text-to-speech [65.35831577398174]
Controlling the intensity of accent during TTS is an interesting research direction.
Recent work designs a speaker-adversarial loss to disentangle speaker and accent information, then adjusts the loss weight to control accent intensity.
This paper proposes a new intuitive and explicit accent intensity control scheme for accented TTS.
arXiv Detail & Related papers (2022-10-27T12:23:41Z)
- Analysis of French Phonetic Idiosyncrasies for Accent Recognition [0.8602553195689513]
Differences in pronunciation, accent, and intonation create some of the most common problems in speech recognition.
We use traditional machine learning techniques and convolutional neural networks, and show that the classical techniques are not sufficiently efficient to solve this problem.
In this paper, we focus our attention on the French accent. We also identify its limitations by analysing the impact of French idiosyncrasies on its spectrograms.
arXiv Detail & Related papers (2021-10-18T10:50:50Z)
- Many-to-Many Voice Conversion based Feature Disentanglement using Variational Autoencoder [2.4975981795360847]
We propose a new method based on feature disentanglement to tackle many-to-many voice conversion.
The method has the capability to disentangle speaker identity and linguistic content from utterances.
It can convert from many source speakers to many target speakers with a single autoencoder network.
arXiv Detail & Related papers (2021-07-11T13:31:16Z)
- Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion [60.808838088376675]
We propose a VC system with explicit prosodic modelling and deep speaker embedding learning.
A prosody corrector takes in phoneme embeddings to infer typical phoneme duration and pitch values.
A conversion model takes phoneme embeddings and typical prosody features as inputs to generate the converted speech.
arXiv Detail & Related papers (2020-11-03T13:08:53Z)
- Defending Your Voice: Adversarial Attack on Voice Conversion [70.19396655909455]
We report the first known attempt to perform adversarial attack on voice conversion.
We introduce human-imperceptible noise into the utterances of a speaker whose voice is to be defended.
It was shown that the speaker characteristics of the converted utterances were made obviously different from those of the defended speaker.
arXiv Detail & Related papers (2020-05-18T14:51:54Z)
- AccentDB: A Database of Non-Native English Accents to Assist Neural Speech Recognition [3.028098724882708]
We first spell out the key requirements for creating a well-curated database of speech samples in non-native accents for training and testing robust ASR systems.
We then introduce AccentDB, one such database that contains samples of 4 Indian-English accents collected by us.
We present several accent classification models and evaluate them thoroughly against human-labelled accent classes.
arXiv Detail & Related papers (2020-05-16T12:38:30Z)
- Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data [15.114637085644057]
We show that a simple transform in speaker space can be used to control the degree of accent of a synthetic voice in a language.
The same transform can be applied even to monolingual speakers.
arXiv Detail & Related papers (2020-04-10T10:01:53Z)
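The speaker-space translation idea in the last entry can be illustrated with a small NumPy sketch. Here the accent direction is estimated as the mean displacement between bilingual speakers' paired embeddings and then applied, scaled, to any speaker's embedding; the embedding dimension, the toy data, and the `shift_accent` helper are hypothetical, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 16-dim speaker embeddings for 5 bilingual speakers; each speaker has
# one embedding per language. Real embeddings would come from a trained
# multilingual TTS model.
emb_lang_a = rng.normal(size=(5, 16))
emb_lang_b = emb_lang_a + 0.5 + 0.1 * rng.normal(size=(5, 16))

# Estimate the accent direction as the mean displacement between the two
# languages across the bilingual speakers.
accent_direction = (emb_lang_b - emb_lang_a).mean(axis=0)

def shift_accent(speaker_emb, alpha):
    """Move a speaker embedding along the estimated accent direction.

    alpha = 0 keeps the original voice; alpha = 1 applies the full shift;
    intermediate values interpolate the degree of accent.
    """
    return speaker_emb + alpha * accent_direction

mono_speaker = rng.normal(size=16)       # a monolingual speaker's embedding
half_accented = shift_accent(mono_speaker, 0.5)
```

Because the transform is a simple vector offset in speaker space, it can be applied to monolingual speakers who were never paired across languages, which is the point the abstract makes.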
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.