SNR-Based Features and Diverse Training Data for Robust DNN-Based Speech
Enhancement
- URL: http://arxiv.org/abs/2004.03512v2
- Date: Sat, 15 May 2021 14:04:40 GMT
- Authors: Robert Rehr, Timo Gerkmann
- Abstract summary: We analyze the generalization with respect to (1) the size and diversity of the training data, (2) different network architectures, and (3) the chosen features.
We show via experimental results and an analysis using t-distributed stochastic neighbor embedding (t-SNE) that the proposed SNR-NAT features yield robust and level-independent results in unseen noise.
- Score: 21.346342164530967
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we address the generalization of deep neural network (DNN)
based speech enhancement to unseen noise conditions for the case that training
data is limited in size and diversity. To gain more insights, we analyze the
generalization with respect to (1) the size and diversity of the training data,
(2) different network architectures, and (3) the chosen features. To address
(1), we train networks on the Hu noise corpus (limited size), the CHiME 3 noise
corpus (limited diversity) and also propose a large and diverse dataset
collected based on freely available sounds. To address (2), we compare a
fully-connected feed-forward and a long short-term memory (LSTM) architecture.
To address (3), we compare three input features, namely logarithmized noisy
periodograms, noise aware training (NAT) and the proposed signal-to-noise ratio
(SNR) based noise aware training (SNR-NAT). We confirm that rich training data
and improved network architectures help DNNs to generalize. Furthermore, we
show via experimental results and an analysis using t-distributed stochastic
neighbor embedding (t-SNE) that the proposed SNR-NAT features yield robust and
level-independent results in unseen noise, even with simple network
architectures and when trained on only small datasets, which is the key
contribution of this paper.
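The three input features named in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical NumPy rendering, assuming STFT-domain processing and a given noise PSD estimate; the exact SNR-NAT definition and normalization are specified in the paper itself, and the function names here are illustrative only.

```python
import numpy as np

EPS = 1e-12  # numerical floor to keep the logarithms finite

def log_periodogram(noisy_stft):
    """Logarithmized noisy periodogram: log of the squared STFT magnitude."""
    return np.log(np.abs(noisy_stft) ** 2 + EPS)

def nat_features(noisy_stft, noise_psd_est):
    """Noise aware training (NAT), sketched: append a log noise PSD estimate
    to the log periodogram so the network sees an explicit noise cue."""
    return np.concatenate(
        [log_periodogram(noisy_stft), np.log(noise_psd_est + EPS)], axis=-1
    )

def snr_nat_features(noisy_stft, noise_psd_est):
    """SNR-NAT, sketched: normalize the noisy periodogram by the noise
    estimate, yielding an a-posteriori-SNR-like, level-independent feature."""
    periodogram = np.abs(noisy_stft) ** 2
    return np.log(periodogram / (noise_psd_est + EPS) + EPS)
```

Note that scaling the noisy signal and the noise estimate by the same level leaves the SNR-NAT feature unchanged, which is one plausible reading of why the proposed features are level-independent.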
Related papers
- Dual input neural networks for positional sound source localization [19.07039703121673]
We introduce Dual Input Neural Networks (DI-NNs) as a simple and effective way to model these two data types in a neural network.
We train and evaluate our proposed DI-NN on scenarios of varying difficulty and realism and compare it against an alternative architecture.
Our results show that the DI-NN significantly outperforms the baselines, achieving a five times lower localization error than the LS method and two times lower than the CRNN in a test dataset of real recordings.
arXiv Detail & Related papers (2023-08-08T09:59:56Z) - Sequential Learning from Noisy Data: Data-Assimilation Meets Echo-State
Network [0.0]
A sequential training algorithm is developed for an echo-state network (ESN) by incorporating noisy observations using an ensemble Kalman filter.
The resultant Kalman-trained echo-state network (KalT-ESN) outperforms a traditionally trained ESN using the least-squares algorithm while remaining computationally cheap.
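As a rough illustration of the idea, and not the paper's exact KalT-ESN algorithm, a single stochastic ensemble Kalman filter analysis step for a linear ESN readout might look like the following; the names `W_ens`, `h`, and `obs_noise_std` are hypothetical.

```python
import numpy as np

def enkf_update(W_ens, h, y_obs, obs_noise_std, rng):
    """One stochastic EnKF analysis step for a scalar ESN readout (sketch).

    W_ens: (n_ens, n_out) ensemble of readout weight vectors
    h:     (n_out,) current reservoir state (acts as the observation operator)
    y_obs: scalar noisy observation of the target signal
    """
    n_ens = W_ens.shape[0]
    preds = W_ens @ h  # predicted outputs per ensemble member, shape (n_ens,)
    # Perturbed observations, the classic stochastic-EnKF trick
    y_pert = y_obs + obs_noise_std * rng.standard_normal(n_ens)
    # Sample cross-covariance between weights and predictions,
    # and the innovation variance
    W_mean = W_ens.mean(axis=0)
    p_mean = preds.mean()
    cov_wp = ((W_ens - W_mean).T @ (preds - p_mean)) / (n_ens - 1)
    var_p = np.var(preds, ddof=1) + obs_noise_std ** 2
    gain = cov_wp / var_p  # Kalman gain, shape (n_out,)
    # Shift each member toward its perturbed observation
    return W_ens + np.outer(y_pert - preds, gain)
```

Each noisy observation nudges the whole weight ensemble toward consistency with the data, which is what makes the sequential training cheap compared with refitting a full least-squares solution.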
arXiv Detail & Related papers (2023-04-01T02:03:08Z) - Neural Attentive Circuits [93.95502541529115]
We introduce a general-purpose, yet modular neural architecture called Neural Attentive Circuits (NACs).
NACs learn the parameterization and a sparse connectivity of neural modules without using domain knowledge.
NACs achieve an 8x speedup at inference time while losing less than 3% performance.
arXiv Detail & Related papers (2022-10-14T18:00:07Z) - Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z) - Neural Implicit Dictionary via Mixture-of-Expert Training [111.08941206369508]
We present a generic INR framework that achieves both data and training efficiency by learning a Neural Implicit Dictionary (NID).
Our NID assembles a group of coordinate-based networks which are tuned to span the desired function space.
Our experiments show that NID can improve the reconstruction of 2D images or 3D scenes while being 2 orders of magnitude faster and using up to 98% less input data.
arXiv Detail & Related papers (2022-07-08T05:07:19Z) - A Comparative Study on Robust Graph Neural Networks to Structural Noises [12.44737954516764]
Graph neural networks (GNNs) learn node representations by passing and aggregating messages between neighboring nodes.
GNNs could be vulnerable to structural noise because of the message passing mechanism where noise may be propagated through the entire graph.
We conduct a comprehensive and systematic comparative study of different types of robust GNNs under consistent structural-noise settings.
arXiv Detail & Related papers (2021-12-11T21:01:29Z) - Deep Networks for Direction-of-Arrival Estimation in Low SNR [89.45026632977456]
We introduce a Convolutional Neural Network (CNN) that is trained from multi-channel data of the true array manifold matrix.
We train a CNN in the low-SNR regime to predict DoAs across all SNRs.
Our robust solution can be applied in several fields, ranging from wireless array sensors to acoustic microphones or sonars.
arXiv Detail & Related papers (2020-11-17T12:52:18Z) - Deep Time Delay Neural Network for Speech Enhancement with Full Data
Learning [60.20150317299749]
This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning.
To make full use of the training data, we propose a full data learning method for speech enhancement.
arXiv Detail & Related papers (2020-11-11T06:32:37Z) - Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by
Spiking Neural Network [68.43026108936029]
We propose a pure spiking neural network (SNN) based computational model for precise sound localization in the noisy real-world environment.
We implement this algorithm in a real-time robotic system with a microphone array.
The experimental results show a mean azimuth error of 13 degrees, which surpasses the accuracy of other biologically plausible neuromorphic approaches for sound source localization.
arXiv Detail & Related papers (2020-07-07T08:22:56Z) - Depthwise Separable Convolutions Versus Recurrent Neural Networks for
Monaural Singing Voice Separation [17.358040670413505]
We focus on singing voice separation, employing an RNN architecture, and we replace the RNNs with depthwise separable (DWS) convolutions (DWS-CNNs).
We conduct an ablation study and examine the effect of the number of channels and layers of DWS-CNNs on the source separation performance.
Our results show that replacing the RNNs with DWS-CNNs yields an improvement of 1.20, 0.06, and 0.37 dB, respectively, while using only 20.57% of the parameters of the RNN architecture.
arXiv Detail & Related papers (2020-07-06T12:32:34Z)
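The parameter savings behind results like the one above come from factorizing a k×k convolution into a per-channel depthwise filter followed by a 1×1 pointwise convolution. A back-of-the-envelope count illustrates the idea (biases omitted; the exact 20.57% figure reported in the paper depends on its specific layer sizes):

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def dws_conv_params(c_in, c_out, k):
    """Depthwise separable convolution: one k x k depthwise filter per input
    channel, followed by a 1x1 pointwise convolution mixing the channels."""
    return c_in * k * k + c_in * c_out
```

For example, with 64 input and output channels and 3×3 kernels, the depthwise separable variant needs 4,672 parameters versus 36,864 for the standard convolution, roughly 13% of the count.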
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.