Improving Speech Enhancement through Fine-Grained Speech Characteristics
- URL: http://arxiv.org/abs/2207.00237v1
- Date: Fri, 1 Jul 2022 07:04:28 GMT
- Title: Improving Speech Enhancement through Fine-Grained Speech Characteristics
- Authors: Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe,
Bhiksha Raj
- Abstract summary: We propose a novel approach to speech enhancement aimed at improving perceptual quality and naturalness of enhanced signals.
We first identify key acoustic parameters that have been found to correlate well with voice quality.
We then propose objective functions which are aimed at reducing the difference between clean speech and enhanced speech with respect to these features.
- Score: 42.49874064240742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While deep learning based speech enhancement systems have made rapid progress
in improving the quality of speech signals, they can still produce outputs that
contain artifacts and can sound unnatural. We propose a novel approach to
speech enhancement aimed at improving perceptual quality and naturalness of
enhanced signals by optimizing for key characteristics of speech. We first
identify key acoustic parameters that have been found to correlate well with
voice quality (e.g. jitter, shimmer, and spectral flux) and then propose
objective functions which are aimed at reducing the difference between clean
speech and enhanced speech with respect to these features. The full set of
acoustic features is the extended Geneva Acoustic Parameter Set (eGeMAPS),
which includes 25 different attributes associated with perception of speech.
Given the non-differentiable nature of these feature computations, we first
build differentiable estimators of the eGeMAPS features and then use them to fine-tune
existing speech enhancement systems. Our approach is generic and can be applied
to any existing deep learning based enhancement system to further improve the
enhanced speech signals. Experiments conducted on the Deep Noise
Suppression (DNS) Challenge dataset show that our approach can improve
state-of-the-art deep learning based enhancement systems.
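To make the recipe concrete, here is a minimal PyTorch-style sketch of the general idea the abstract describes: a small differentiable estimator stands in for the eGeMAPS extractor, and an L1 penalty between the estimated parameters of enhanced and clean speech is added to an existing enhancer's loss during fine-tuning. The estimator architecture, the loss weight `lam`, and the helper names (`AcousticParamEstimator`, `fine_tune_step`, `base_loss`) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class AcousticParamEstimator(nn.Module):
    """Illustrative differentiable stand-in for an eGeMAPS-style extractor:
    maps a waveform to a fixed-size vector of acoustic parameters."""
    def __init__(self, n_params=25, n_fft=512, hop=160):
        super().__init__()
        self.n_fft, self.hop = n_fft, hop
        self.net = nn.Sequential(
            nn.Linear(n_fft // 2 + 1, 128), nn.ReLU(),
            nn.Linear(128, n_params),
        )

    def forward(self, wav):                                   # wav: (B, samples)
        spec = torch.stft(wav, self.n_fft, self.hop,
                          window=torch.hann_window(self.n_fft, device=wav.device),
                          return_complex=True).abs()          # (B, F, T)
        frames = self.net(spec.transpose(1, 2))               # (B, T, n_params)
        return frames.mean(dim=1)                             # utterance-level parameters

def acoustic_param_loss(estimator, enhanced, clean):
    """L1 distance between predicted acoustic parameters of enhanced and clean
    speech. The estimator is pre-trained and kept fixed (its parameters are
    not in the optimizer); gradients reach the enhancer through `enhanced`."""
    with torch.no_grad():
        target = estimator(clean)
    return nn.functional.l1_loss(estimator(enhanced), target)

def fine_tune_step(enhancer, estimator, optimizer, noisy, clean, base_loss, lam=0.1):
    """One fine-tuning step: add the acoustic-parameter loss to an existing
    enhancer's usual objective. `enhancer` and `base_loss` are placeholders."""
    enhanced = enhancer(noisy)
    loss = base_loss(enhanced, clean) + lam * acoustic_param_loss(estimator, enhanced, clean)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```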
Related papers
- uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models [57.71199494492223]
We propose a Unified Speech Enhancement and Editing (uSee) model with conditional diffusion models to handle various tasks at the same time in a generative manner.
Our experiments show that our proposed uSee model can achieve superior performance in both speech denoising and dereverberation compared to other related generative speech enhancement models.
arXiv Detail & Related papers (2023-10-02T04:36:39Z)
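The uSee entry above describes conditioning a diffusion model on the degraded signal and on the desired task so that one generative model covers both denoising and dereverberation. The sketch below shows a generic conditional-diffusion training step in that spirit; the toy denoiser, the two-way task embedding, the cosine noise schedule, and all dimensions are assumptions for illustration and do not reflect the actual uSee architecture.

```python
import torch
import torch.nn as nn

class ConditionalDenoiser(nn.Module):
    """Toy noise predictor: takes the diffused spectrogram, the degraded speech
    as an acoustic condition, and a task embedding (0: denoise, 1: dereverb),
    and predicts the injected Gaussian noise."""
    def __init__(self, feat_dim=257, cond_dim=16):
        super().__init__()
        self.task_emb = nn.Embedding(2, cond_dim)
        self.net = nn.Sequential(
            nn.Linear(2 * feat_dim + cond_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, feat_dim),
        )

    def forward(self, x_t, cond_spec, task_id, t):            # x_t, cond_spec: (B, T, F)
        B, T, _ = x_t.shape
        task = self.task_emb(task_id)[:, None, :].expand(B, T, -1)
        t_feat = t[:, None, None].expand(B, T, 1)              # diffusion step in [0, 1]
        return self.net(torch.cat([x_t, cond_spec, task, t_feat], dim=-1))

def diffusion_training_step(model, clean_spec, degraded_spec, task_id):
    """One DDPM-style step: diffuse the clean spectrogram and train the
    conditional model to recover the injected noise."""
    B = clean_spec.shape[0]
    t = torch.rand(B, device=clean_spec.device)
    alpha_bar = torch.cos(0.5 * torch.pi * t) ** 2              # simple cosine schedule
    eps = torch.randn_like(clean_spec)
    a = alpha_bar[:, None, None]
    x_t = a.sqrt() * clean_spec + (1 - a).sqrt() * eps
    eps_hat = model(x_t, degraded_spec, task_id, t)
    return nn.functional.mse_loss(eps_hat, eps)
```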
- PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement [41.872384434583466]
We propose a learning objective that formalizes differences in perceptual quality.
We identify temporal acoustic parameters that are non-differentiable.
We develop a neural network estimator that can accurately predict their time-series values.
arXiv Detail & Related papers (2023-02-16T05:17:06Z)
- TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement [41.872384434583466]
We provide a differentiable estimator for four categories of low-level acoustic descriptors involving: frequency-related parameters, energy or amplitude-related parameters, spectral balance parameters, and temporal features.
We show that adding TAP as an auxiliary objective in speech enhancement produces speech with improved perceptual quality and intelligibility.
arXiv Detail & Related papers (2023-02-16T04:57:11Z)
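The PAAPLoss and TAPLoss entries above both rely on a neural estimator that predicts time-series (per-frame) acoustic parameters, which is then frozen and reused as an auxiliary enhancement loss. Below is a minimal sketch of that two-step recipe; the GRU estimator, the dimensions, and the helper names (`train_estimator_step`, `tap_style_loss`) are hypothetical, not the published implementations.

```python
import torch
import torch.nn as nn

class FrameParamEstimator(nn.Module):
    """Small recurrent net mapping log-magnitude spectrogram frames to a time
    series of acoustic parameters (frequency, energy/amplitude, spectral
    balance, and temporal descriptors), one vector per frame."""
    def __init__(self, feat_dim=257, n_params=25, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_params)

    def forward(self, spec):                                  # spec: (B, T, feat_dim)
        h, _ = self.rnn(spec)
        return self.head(h)                                   # (B, T, n_params)

def train_estimator_step(estimator, optimizer, spec, target_params):
    """Supervised step: fit the estimator to parameter trajectories computed
    offline by a conventional (non-differentiable) toolkit on the same frames."""
    loss = nn.functional.l1_loss(estimator(spec), target_params)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def tap_style_loss(estimator, enhanced_spec, clean_spec):
    """Auxiliary objective: match predicted per-frame parameters of enhanced
    speech to those of the clean reference, with the estimator frozen."""
    with torch.no_grad():
        target = estimator(clean_spec)
    return nn.functional.l1_loss(estimator(enhanced_spec), target)
```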
- Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition [25.84784710031567]
We propose an interactive feature fusion network (IFF-Net) for noise-robust speech recognition.
Experimental results show that the proposed method achieves an absolute word error rate (WER) reduction of 4.1% over the best baseline.
Our further analysis indicates that the proposed IFF-Net can complement some missing information in the over-suppressed enhanced feature.
arXiv Detail & Related papers (2021-10-11T13:40:07Z)
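The IFF-Net entry above suggests that enhanced features can be over-suppressed and that information from the noisy features should be merged back in. A toy fusion block along those lines is sketched below; the single-layer merge network and feature size are assumptions, not the published IFF-Net design.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Toy interaction block: looks at the noisy and enhanced feature maps
    jointly, predicts a per-element merge weight, and mixes the noisy feature
    back in where enhancement may have over-suppressed speech content."""
    def __init__(self, feat_dim=80):
        super().__init__()
        self.merge = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.Sigmoid())

    def forward(self, noisy, enhanced):                       # both: (B, T, feat_dim)
        w = self.merge(torch.cat([noisy, enhanced], dim=-1))  # weights in [0, 1]
        return w * enhanced + (1 - w) * noisy                 # fused feature for the ASR front-end
```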
- High Fidelity Speech Regeneration with Application to Speech Enhancement [96.34618212590301]
We propose a wav-to-wav generative model for speech that can generate 24 kHz speech in real time.
Inspired by voice conversion methods, we train to augment the speech characteristics while preserving the identity of the source.
arXiv Detail & Related papers (2021-01-31T10:54:27Z)
- Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition [64.9317368575585]
This paper proposes a gated recurrent fusion (GRF) method with joint training framework for robust end-to-end ASR.
The GRF algorithm is used to dynamically combine the noisy and enhanced features.
The proposed method achieves a relative character error rate (CER) reduction of 10.04% over the conventional joint enhancement and transformer method.
arXiv Detail & Related papers (2020-11-09T08:52:05Z)
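The GRF entry above describes dynamically combining noisy and enhanced features with learned gates inside a joint training framework. Below is a minimal gated recurrent fusion cell in that spirit; the GRUCell-based state, gate layout, and dimensions are illustrative guesses rather than the paper's exact algorithm.

```python
import torch
import torch.nn as nn

class GatedRecurrentFusion(nn.Module):
    """Toy gated recurrent fusion cell: a recurrent state tracks context over
    frames and produces gates that decide, per element, how much of the noisy
    and enhanced features to pass to the recognizer at each step."""
    def __init__(self, feat_dim=80, hidden=128):
        super().__init__()
        self.rnn = nn.GRUCell(2 * feat_dim, hidden)
        self.gate = nn.Linear(hidden, feat_dim)

    def forward(self, noisy, enhanced):                       # both: (B, T, feat_dim)
        B, T, _ = noisy.shape
        h = noisy.new_zeros(B, self.rnn.hidden_size)
        fused = []
        for t in range(T):
            x = torch.cat([noisy[:, t], enhanced[:, t]], dim=-1)
            h = self.rnn(x, h)
            g = torch.sigmoid(self.gate(h))                   # per-element gate, (B, feat_dim)
            fused.append(g * enhanced[:, t] + (1 - g) * noisy[:, t])
        return torch.stack(fused, dim=1)                      # (B, T, feat_dim)
```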
- An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation [57.68765353264689]
Speech enhancement and speech separation are two related tasks.
Traditionally, these tasks have been tackled using signal processing and machine learning techniques.
More recently, deep learning has been exploited to achieve strong performance.
arXiv Detail & Related papers (2020-08-21T17:24:09Z)
- Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech [23.30022534796909]
Accent conversion (AC) transforms a non-native speaker's accent into a native accent while maintaining the speaker's voice timbre.
We propose approaches to improving accent conversion applicability, as well as quality.
arXiv Detail & Related papers (2020-05-19T08:09:58Z)
- Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
arXiv Detail & Related papers (2020-02-14T05:05:36Z)
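The self-adaptation entry above extracts a speaker representation directly from the test utterance and uses it as an auxiliary feature for enhancement. A compact sketch of that conditioning pattern follows; the mean-pooled speaker encoder, the mask-based enhancer, and all layer sizes are hypothetical simplifications (the actual paper additionally uses multi-head self-attention).

```python
import torch
import torch.nn as nn

class SpeakerAdaptiveEnhancer(nn.Module):
    """Toy self-adaptive enhancer: an embedding pooled from the test utterance
    itself is broadcast to every frame and concatenated with the spectral
    input, so the mask estimator can condition on the target speaker."""
    def __init__(self, feat_dim=257, spk_dim=64, hidden=256):
        super().__init__()
        self.spk_encoder = nn.Sequential(
            nn.Linear(feat_dim, spk_dim), nn.ReLU(), nn.Linear(spk_dim, spk_dim),
        )
        self.mask_net = nn.Sequential(
            nn.Linear(feat_dim + spk_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim), nn.Sigmoid(),
        )

    def forward(self, noisy_spec):                            # (B, T, feat_dim) magnitudes
        spk = self.spk_encoder(noisy_spec).mean(dim=1)        # temporal pooling -> (B, spk_dim)
        spk = spk[:, None, :].expand(-1, noisy_spec.shape[1], -1)
        mask = self.mask_net(torch.cat([noisy_spec, spk], dim=-1))
        return mask * noisy_spec                              # enhanced magnitude spectrogram
```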
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.