Guided Variational Autoencoder for Speech Enhancement With a Supervised
Classifier
- URL: http://arxiv.org/abs/2102.06454v1
- Date: Fri, 12 Feb 2021 11:32:48 GMT
- Title: Guided Variational Autoencoder for Speech Enhancement With a Supervised
Classifier
- Authors: Guillaume Carbajal, Julius Richter, Timo Gerkmann
- Abstract summary: We propose to guide the variational autoencoder with a supervised classifier separately trained on noisy speech.
The estimated label is a high-level categorical variable describing the speech signal.
We evaluate our method with different types of labels on real recordings of different noisy environments.
- Score: 20.28217079480463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, variational autoencoders have been successfully used to learn a
probabilistic prior over speech signals, which is then used to perform speech
enhancement. However, variational autoencoders are trained on clean speech
only, which results in a limited ability of extracting the speech signal from
noisy speech compared to supervised approaches. In this paper, we propose to
guide the variational autoencoder with a supervised classifier separately
trained on noisy speech. The estimated label is a high-level categorical
variable describing the speech signal (e.g., speech activity), allowing for a
more informed latent distribution compared to the standard variational
autoencoder. We evaluate our method with different types of labels on real
recordings of different noisy environments. Provided that the label better
informs the latent distribution and that the classifier achieves good
performance, the proposed approach outperforms the standard variational
autoencoder and a conventional neural network-based supervised approach.
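The guidance mechanism described in the abstract can be sketched as a conditional VAE: a separately trained classifier predicts a high-level label (e.g. speech activity) from the noisy input, and that label is fed to both the encoder and decoder so the latent distribution is informed by it. The following is a minimal illustrative sketch, not the authors' implementation; all dimensions, weights, and function names (`classify_speech_activity`, `encode`, `decode`) are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: F frequency bins per STFT frame, L latent dims.
F, L = 513, 16

# Stand-in for the separately trained supervised classifier: it predicts a
# binary speech-activity label from a noisy frame (thresholding here is
# purely illustrative).
def classify_speech_activity(noisy_frame):
    return float(np.mean(np.abs(noisy_frame)) > 0.5)

# Encoder q(z | x, y): the label y is concatenated to the input so the
# latent distribution is informed by the high-level categorical variable.
W_enc = rng.standard_normal((2 * L, F + 1)) * 0.01

def encode(x, y):
    h = W_enc @ np.concatenate([x, [y]])
    mu, logvar = h[:L], h[L:]
    return mu, logvar

# Decoder p(x | z, y): reconstructs a (positive) clean-speech variance.
W_dec = rng.standard_normal((F, L + 1)) * 0.01

def decode(z, y):
    return np.exp(W_dec @ np.concatenate([z, [y]]))

# One forward pass of the guided VAE on a noisy frame.
x = rng.standard_normal(F)
y = classify_speech_activity(x)           # label estimated from noisy speech
mu, logvar = encode(x, y)
z = mu + np.exp(0.5 * logvar) * rng.standard_normal(L)  # reparameterization
speech_var = decode(z, y)
```

In the paper's setting, the decoded variance would feed a probabilistic enhancement back end; the point of the sketch is only that the classifier's label enters both the inference and generative paths, unlike a standard VAE trained on clean speech alone.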
Related papers
- Towards General-Purpose Text-Instruction-Guided Voice Conversion [84.78206348045428]
This paper introduces a novel voice conversion model, guided by text instructions such as "articulate slowly with a deep tone" or "speak in a cheerful boyish voice".
The proposed VC model is a neural language model which processes a sequence of discrete codes, resulting in the code sequence of converted speech.
arXiv Detail & Related papers (2023-09-25T17:52:09Z) - Improving the Intent Classification accuracy in Noisy Environment [9.447108578893639]
In this paper, we investigate how environmental noise and related noise reduction techniques affect the intent classification task with end-to-end neural models.
For this task, the use of speech enhancement greatly improves the classification accuracy in noisy conditions.
arXiv Detail & Related papers (2023-03-12T06:11:44Z) - SPADE: Self-supervised Pretraining for Acoustic DisEntanglement [2.294014185517203]
We introduce a self-supervised approach to disentangle room acoustics from speech.
Our results demonstrate that our proposed approach significantly improves performance over a baseline when labeled training data is scarce.
arXiv Detail & Related papers (2023-02-03T01:36:38Z) - Introducing Semantics into Speech Encoders [91.37001512418111]
We propose an unsupervised way of incorporating semantic information from large language models into self-supervised speech encoders without labeled audio transcriptions.
Our approach achieves similar performance as supervised methods trained on over 100 hours of labeled audio transcripts.
arXiv Detail & Related papers (2022-11-15T18:44:28Z) - Bootstrapping meaning through listening: Unsupervised learning of spoken
sentence embeddings [4.582129557845177]
This study tackles the unsupervised learning of semantic representations for spoken utterances.
We propose WavEmbed, a sequential autoencoder that predicts hidden units from a dense representation of speech.
We also propose S-HuBERT to induce meaning through knowledge distillation.
arXiv Detail & Related papers (2022-10-23T21:16:09Z) - Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo
Languages [58.43299730989809]
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data.
We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task.
This process stands on its own, or can be applied as low-cost second-stage pre-training.
arXiv Detail & Related papers (2022-05-02T17:59:02Z) - Speaker Embedding-aware Neural Diarization: a Novel Framework for
Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z) - Speaker Embedding-aware Neural Diarization for Flexible Number of
Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels.
Our method achieves lower diarization error rate than the target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z) - VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised
Speech Representation Disentanglement for One-shot Voice Conversion [54.29557210925752]
One-shot voice conversion can be effectively achieved by speech representation disentanglement.
We employ vector quantization (VQ) for content encoding and introduce mutual information (MI) as the correlation metric during training.
Experimental results reflect the superiority of the proposed method in learning effective disentangled speech representations.
arXiv Detail & Related papers (2021-06-18T13:50:38Z) - Disentanglement Learning for Variational Autoencoders Applied to
Audio-Visual Speech Enhancement [20.28217079480463]
We propose an adversarial training scheme for variational autoencoders to disentangle the label from the other latent variables.
We show the benefit of the proposed disentanglement learning when a voice activity label, estimated from visual data, is used for speech enhancement.
arXiv Detail & Related papers (2021-05-19T07:42:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.