Stable Training of DNN for Speech Enhancement based on
Perceptually-Motivated Black-Box Cost Function
- URL: http://arxiv.org/abs/2002.05879v1
- Date: Fri, 14 Feb 2020 05:44:17 GMT
- Title: Stable Training of DNN for Speech Enhancement based on
Perceptually-Motivated Black-Box Cost Function
- Authors: Masaki Kawanaka, Yuma Koizumi, Ryoichi Miyazaki and Kohei Yatabe
- Abstract summary: Several perceptually-motivated objective sound quality assessment (OSQA) methods have been proposed, such as PESQ (perceptual evaluation of speech quality).
Such measures cannot be used directly for training a deep neural network (DNN) in most cases because popular OSQAs are non-differentiable with respect to the DNN parameters.
We propose to use stabilization techniques borrowed from reinforcement learning to increase the PESQ score.
- Score: 39.66350526759246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Improving subjective sound quality of enhanced signals is one of the most
important missions in speech enhancement. For evaluating the subjective
quality, several methods related to perceptually-motivated objective sound
quality assessment (OSQA) have been proposed such as PESQ (perceptual
evaluation of speech quality). However, such measures cannot be used directly
for training a deep neural network (DNN) in most cases because popular OSQAs
are non-differentiable with respect to the DNN parameters. Therefore, a
previous study proposed approximating the OSQA score by an auxiliary DNN so
that its gradient can be used for training the primary DNN. One problem with
this approach is the instability of training caused by the approximation
error of the score. To overcome this problem, we propose to use stabilization
techniques borrowed from reinforcement learning. Experiments aimed at
increasing the PESQ score as an example show that the proposed method (i)
stably trains a DNN to increase PESQ, (ii) achieves the state-of-the-art PESQ
score on a public dataset, and (iii) yields better sound quality than
conventional methods according to subjective evaluation.
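The approximation idea described above can be sketched in a few lines. The following is a toy NumPy illustration under strong simplifying assumptions (linear "networks", a mean-squared-error stand-in for PESQ, and hypothetical names throughout), not the authors' implementation; it also omits the RL-borrowed stabilizers that are the paper's actual contribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a "clean" target and a noisy observation (stand-ins for speech features).
d = 8
clean = rng.normal(size=d)
noisy = clean + 0.5 * rng.normal(size=d)

def black_box_score(enhanced, reference):
    """Opaque quality score (stand-in for PESQ): higher is better.
    Treated as non-differentiable -- only its value is ever observed."""
    return -float(np.mean((enhanced - reference) ** 2))

# Primary "DNN": elementwise gains applied to the noisy input.
w = np.ones(d)

# Auxiliary "DNN": linear regressor approximating the black-box score
# from per-dimension squared errors of the enhanced output.
v = np.zeros(d)
b = 0.0

lr_aux, lr_pri = 0.05, 0.05
score_before = black_box_score(w * noisy, clean)

for step in range(500):
    y = w * noisy                      # primary forward pass
    s = black_box_score(y, clean)      # query the black box (value only)

    # 1) Fit the auxiliary model to the observed score (squared-error regression).
    f = (y - clean) ** 2               # features seen by the auxiliary model
    s_hat = v @ f + b
    err = s_hat - s
    v -= lr_aux * 2 * err * f
    b -= lr_aux * 2 * err

    # 2) Update the primary model by ascending the *auxiliary* model's
    #    predicted score, whose gradient is available analytically:
    #    d s_hat / d w_i = v_i * 2 * (y_i - clean_i) * noisy_i
    w += lr_pri * v * 2 * (y - clean) * noisy

score_after = black_box_score(w * noisy, clean)
print(score_before, score_after)
```

Because the primary model only ever differentiates the auxiliary regressor, the black box is queried for values alone; the instability the paper addresses arises exactly in this loop, when the auxiliary approximation of the score is imperfect.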
Related papers
- HASA-net: A non-intrusive hearing-aid speech assessment network [52.83357278948373]
We propose a DNN-based hearing aid speech assessment network (HASA-Net) to predict speech quality and intelligibility scores simultaneously.
To the best of our knowledge, HASA-Net is the first work to incorporate quality and intelligibility assessments utilizing a unified DNN-based non-intrusive model for hearing aids.
Experimental results show that the predicted speech quality and intelligibility scores of HASA-Net are highly correlated with two well-known intrusive hearing-aid evaluation metrics.
arXiv Detail & Related papers (2021-11-10T14:10:13Z)
- InQSS: a speech intelligibility assessment model using a multi-task learning network [21.037410575414995]
In this study, we propose InQSS, a speech intelligibility assessment model that uses both spectrogram and scattering coefficients as input features.
The resulting model can predict not only the intelligibility scores but also the quality scores of speech.
arXiv Detail & Related papers (2021-11-04T02:01:27Z)
- Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features [30.57631206882462]
The MOSA-Net is designed to estimate speech quality, intelligibility, and distortion assessment scores based on a test speech signal as input.
We show that the MOSA-Net can precisely predict perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and speech distortion index scores when tested on both noisy and enhanced speech utterances.
arXiv Detail & Related papers (2021-11-03T17:30:43Z)
- Improving Character Error Rate Is Not Equal to Having Clean Speech: Speech Enhancement for ASR Systems with Black-box Acoustic Models [1.6328866317851185]
A deep neural network (DNN)-based speech enhancement (SE) method is proposed in this paper.
Our method uses two DNNs: one for speech processing and one for mimicking the character error rates (CERs) produced by an acoustic model (AM).
Experimental results show that our method improved the CER by 7.3% relative through a black-box AM, although a certain amount of noise remains.
arXiv Detail & Related papers (2021-10-12T12:51:53Z)
- Task-Specific Normalization for Continual Learning of Blind Image Quality Models [105.03239956378465]
We present a simple yet effective continual learning method for blind image quality assessment (BIQA).
The key step in our approach is to freeze all convolution filters of a pre-trained deep neural network (DNN) for an explicit promise of stability.
We assign each new IQA dataset (i.e., task) a prediction head, and load the corresponding normalization parameters to produce a quality score.
The final quality estimate is computed by a weighted summation of predictions from all heads with a lightweight $K$-means gating mechanism.
arXiv Detail & Related papers (2021-07-28T15:21:01Z)
- Being a Bit Frequentist Improves Bayesian Neural Networks [76.73339435080446]
We show that OOD-trained BNNs are competitive with, if not better than, recent frequentist baselines.
This work provides strong baselines for future work in both Bayesian and frequentist UQ.
arXiv Detail & Related papers (2021-06-18T11:22:42Z)
- Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value based on a randomly selected network.
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
- MetaIQA: Deep Meta-learning for No-Reference Image Quality Assessment [73.55944459902041]
This paper presents a no-reference IQA metric based on deep meta-learning.
We first collect a number of NR-IQA tasks for different distortions.
Then meta-learning is adopted to learn the prior knowledge shared by diversified distortions.
Extensive experiments demonstrate that the proposed metric outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2020-04-11T23:36:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.