TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement
- URL: http://arxiv.org/abs/2302.08088v1
- Date: Thu, 16 Feb 2023 04:57:11 GMT
- Title: TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement
- Authors: Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag
Kumar, Shinji Watanabe, Bhiksha Raj
- Abstract summary: We provide a differentiable estimator for four categories of low-level acoustic descriptors: frequency-related parameters, energy- or amplitude-related parameters, spectral balance parameters, and temporal features.
We show that adding TAP as an auxiliary objective in speech enhancement produces speech with improved perceptual quality and intelligibility.
- Score: 41.872384434583466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech enhancement models have progressed greatly in recent years, but
still fall short in the perceptual quality of their outputs. We propose an
objective for perceptual quality based on temporal acoustic parameters. These
are fundamental speech features that play an essential role in various
applications, including speaker recognition and paralinguistic analysis. We
provide a differentiable estimator for four categories of low-level acoustic
descriptors: frequency-related parameters, energy- or amplitude-related
parameters, spectral balance parameters, and temporal features. Unlike prior
work that considers aggregated acoustic parameters or only a few categories of
them, our temporal acoustic parameter (TAP) loss enables auxiliary
optimization and improvement of many fine-grained speech characteristics in
enhancement workflows. We show that adding TAPLoss as an
auxiliary objective in speech enhancement produces speech with improved
perceptual quality and intelligibility. We use data from the Deep Noise
Suppression 2020 Challenge to demonstrate that both time-domain models and
time-frequency domain models can benefit from our method.
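As a rough illustration of the auxiliary-objective setup the abstract describes, the sketch below combines a primary enhancement loss with an L1-style penalty on per-frame acoustic parameter trajectories. The function names, the choice of L1 distance, and the weighting factor `alpha` are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a TAP-style auxiliary objective. In the paper, the
# per-frame acoustic parameters would come from a differentiable estimator;
# here they are plain dicts of time series for illustration.

def l1_distance(a, b):
    """Mean absolute difference between two equal-length sequences."""
    assert len(a) == len(b), "trajectories must be frame-aligned"
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def tap_loss(params_enhanced, params_clean):
    """Average L1 distance over acoustic parameter trajectories.

    Each argument maps a parameter name (e.g. 'f0', 'loudness') to its
    per-frame time series, as a differentiable estimator would produce.
    """
    return sum(
        l1_distance(params_enhanced[k], params_clean[k])
        for k in params_enhanced
    ) / len(params_enhanced)

def total_loss(enhancement_loss, params_enhanced, params_clean, alpha=0.1):
    """Primary enhancement objective plus alpha-weighted TAP auxiliary term."""
    return enhancement_loss + alpha * tap_loss(params_enhanced, params_clean)
```

In a real training loop the two terms would be computed on tensors so gradients flow through the parameter estimator; `alpha` trades off waveform/spectral fidelity against matching the clean speech's acoustic descriptors.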
Related papers
- What does it take to get state of the art in simultaneous speech-to-speech translation? [0.0]
We study the latency characteristics observed in the performance of simultaneous speech-to-speech models.
We propose methods to minimize latency spikes and improve overall performance.
arXiv Detail & Related papers (2024-09-02T06:04:07Z)
- High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models [56.00939852727501]
Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations.
A non-autoregressive framework enhances controllability, and a duration diffusion model enables diversified prosodic expression.
arXiv Detail & Related papers (2023-09-27T09:27:03Z)
- Analysing the Impact of Audio Quality on the Use of Naturalistic Long-Form Recordings for Infant-Directed Speech Research [62.997667081978825]
Modelling of early language acquisition aims to understand how infants bootstrap their language skills.
Recent developments have enabled the use of more naturalistic training data for computational models.
It is currently unclear how the sound quality could affect analyses and modelling experiments conducted on such data.
arXiv Detail & Related papers (2023-05-03T08:25:37Z)
- Blind Acoustic Room Parameter Estimation Using Phase Features [4.473249957074495]
We propose utilizing novel phase-related features to extend recent approaches to blindly estimate the so-called "reverberation fingerprint" parameters.
The addition of these features is shown to outperform existing methods that rely solely on magnitude-based spectral features.
arXiv Detail & Related papers (2023-03-13T20:05:41Z)
- PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement [41.872384434583466]
We propose a learning objective that formalizes differences in perceptual quality.
We identify temporal acoustic parameters that are non-differentiable.
We develop a neural network estimator that can accurately predict their time-series values.
arXiv Detail & Related papers (2023-02-16T05:17:06Z)
- Improve Noise Tolerance of Robust Loss via Noise-Awareness [60.34670515595074]
We propose a meta-learning method that adaptively learns a hyperparameter prediction function, called the Noise-Aware-Robust-Loss-Adjuster (NARL-Adjuster for brevity).
We integrate four SOTA robust loss functions with our algorithm; comprehensive experiments substantiate the general applicability and effectiveness of the proposed method in both noise tolerance and performance.
arXiv Detail & Related papers (2023-01-18T04:54:58Z)
- Improving Speech Enhancement through Fine-Grained Speech Characteristics [42.49874064240742]
We propose a novel approach to speech enhancement aimed at improving perceptual quality and naturalness of enhanced signals.
We first identify key acoustic parameters that have been found to correlate well with voice quality.
We then propose objective functions which are aimed at reducing the difference between clean speech and enhanced speech with respect to these features.
arXiv Detail & Related papers (2022-07-01T07:04:28Z)
- MOSRA: Joint Mean Opinion Score and Room Acoustics Speech Quality Assessment [12.144133923535714]
This paper presents MOSRA: a non-intrusive multi-dimensional speech quality metric.
It can predict room acoustics parameters alongside the overall mean opinion score (MOS) for speech quality.
We also show that this joint training method enhances the blind estimation of room acoustics.
arXiv Detail & Related papers (2022-04-04T09:38:15Z)
- Audio Impairment Recognition Using a Correlation-Based Feature Representation [85.08880949780894]
We propose a new representation of hand-crafted features that is based on the correlation of feature pairs.
We show superior performance in terms of compact feature dimensionality and improved computational speed in the test stage.
arXiv Detail & Related papers (2020-03-22T13:34:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.