Deep Time Delay Neural Network for Speech Enhancement with Full Data
Learning
- URL: http://arxiv.org/abs/2011.05591v1
- Date: Wed, 11 Nov 2020 06:32:37 GMT
- Title: Deep Time Delay Neural Network for Speech Enhancement with Full Data
Learning
- Authors: Cunhang Fan, Bin Liu, Jianhua Tao, Jiangyan Yi, Zhengqi Wen, Leichao
Song
- Abstract summary: This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning.
To make full use of the training data, we propose a full data learning method for speech enhancement.
- Score: 60.20150317299749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent neural networks (RNNs) have shown significant improvements in
recent years for speech enhancement. However, the model complexity and
inference time cost of RNNs are much higher than those of deep feed-forward
neural networks (DNNs), which limits their application to speech enhancement.
This paper proposes a deep time delay neural network (TDNN) for speech
enhancement with full data learning. The TDNN uses a modular and incremental
design, giving it excellent potential for capturing long-range temporal
contexts. Moreover, the TDNN preserves a feed-forward structure, so its
inference cost is comparable to that of a standard DNN. To make full use of the
training data, we propose a full data learning method for speech enhancement.
More specifically, we train the enhancement model not only on noisy-to-clean
(input-to-target) pairs, but also on clean-to-clean and noise-to-silence pairs,
so that all of the training data is used. Our experiments are conducted on the
TIMIT dataset. Experimental results show that the proposed method achieves
better performance than the DNN and comparable or even better performance than
the BLSTM, while drastically reducing inference time compared with the BLSTM.
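The abstract describes two concrete ideas: a feed-forward TDNN that captures long-range temporal context through a modular, incremental layer design, and a full data learning scheme that adds clean-to-clean and noise-to-silence pairs to the usual noisy-to-clean pairs. The PyTorch sketch below illustrates both under stated assumptions; the dilated-convolution layout, feature dimension, layer sizes, and MSE objective are illustrative choices, not the authors' configuration.

```python
# Minimal sketch (not the authors' code): a feed-forward TDNN-style enhancer built
# from dilated 1-D convolutions over spectrogram frames, plus the three kinds of
# training pairs described in the abstract (noisy-to-clean, clean-to-clean,
# noise-to-silence). Layer sizes and context widths are assumptions.
import torch
import torch.nn as nn


class TDNNEnhancer(nn.Module):
    """Feed-forward enhancer: each Conv1d layer splices frames over a widening
    temporal context via dilation, mimicking the modular/incremental TDNN design."""

    def __init__(self, feat_dim=257, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=3, dilation=1, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, dilation=2, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, dilation=4, padding=4),
            nn.ReLU(),
            nn.Conv1d(hidden, feat_dim, kernel_size=1),  # frame-wise output layer
        )

    def forward(self, x):  # x: (batch, feat_dim, frames)
        return self.net(x)


def full_data_pairs(noisy, clean, noise):
    """Build the three (input, target) pair types so every utterance is used:
    noisy -> clean, clean -> clean, and noise -> silence (all-zero target)."""
    silence = torch.zeros_like(noise)
    inputs = torch.cat([noisy, clean, noise], dim=0)
    targets = torch.cat([clean, clean, silence], dim=0)
    return inputs, targets


if __name__ == "__main__":
    model = TDNNEnhancer()
    mse = nn.MSELoss()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Toy batch of spectrogram features: (batch, feat_dim, frames).
    noisy = torch.randn(4, 257, 100)
    clean = torch.randn(4, 257, 100)
    noise = torch.randn(4, 257, 100)

    x, y = full_data_pairs(noisy, clean, noise)
    loss = mse(model(x), y)
    loss.backward()
    opt.step()
    print("training loss:", loss.item())
```

A dilated Conv1d stack is one common way to realize frame splicing over widening contexts while keeping the model purely feed-forward; the key point of full data learning is simply that clean-only and noise-only utterances also contribute (input, target) pairs.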
Related papers
- BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation [20.34272550256856]
Spiking neural networks (SNNs) mimic the biological neural system to convey information via discrete spikes.
Our work achieves state-of-the-art performance for training SNNs on both static and neuromorphic datasets.
arXiv Detail & Related papers (2024-07-12T08:17:24Z) - Direct Training Needs Regularisation: Anytime Optimal Inference Spiking Neural Network [23.434563009813218]
The Spiking Neural Network (SNN) is acknowledged as the next generation of the Artificial Neural Network (ANN).
We introduce a novel regularisation technique, namely Spatial-Temporal Regulariser (STR)
STR regulates the ratio between the strength of spikes and membrane potential at each timestep.
This effectively balances spatial and temporal performance during training, ultimately resulting in an Anytime Optimal Inference (AOI) SNN.
arXiv Detail & Related papers (2024-04-15T15:57:01Z) - Optimising Event-Driven Spiking Neural Network with Regularisation and
Cutoff [33.91830001268308]
Spiking neural network (SNN) offers promising improvements in computational efficiency.
Current SNN training methodologies predominantly employ a fixed timestep approach.
We propose to consider a cutoff in SNNs, which can terminate the SNN at any point during inference to achieve efficient inference.
arXiv Detail & Related papers (2023-01-23T16:14:09Z) - Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z) - Adaptive-SpikeNet: Event-based Optical Flow Estimation using Spiking
Neural Networks with Learnable Neuronal Dynamics [6.309365332210523]
Spiking Neural Networks (SNNs) with their neuro-inspired event-driven processing can efficiently handle asynchronous data.
We propose an adaptive fully-spiking framework with learnable neuronal dynamics to alleviate the spike vanishing problem.
Our experiments on datasets show an average reduction of 13% in average endpoint error (AEE) compared to state-of-the-art ANNs.
arXiv Detail & Related papers (2022-09-21T21:17:56Z) - Recurrent Bilinear Optimization for Binary Neural Networks [58.972212365275595]
BNNs neglect the intrinsic bilinear relationship of real-valued weights and scale factors.
Our work is the first attempt to optimize BNNs from the bilinear perspective.
We obtain robust RBONNs, which show impressive performance over state-of-the-art BNNs on various models and datasets.
arXiv Detail & Related papers (2022-09-04T06:45:33Z) - Training High-Performance Low-Latency Spiking Neural Networks by
Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which could achieve high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z) - Progressive Tandem Learning for Pattern Recognition with Deep Spiking
Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z) - Temporal Calibrated Regularization for Robust Noisy Label Learning [60.90967240168525]
Deep neural networks (DNNs) exhibit great success on many tasks with the help of large-scale well annotated datasets.
However, labeling large-scale data can be very costly and error-prone so that it is difficult to guarantee the annotation quality.
We propose a Temporal Calibrated Regularization (TCR) in which we utilize the original labels and the predictions in the previous epoch together.
arXiv Detail & Related papers (2020-07-01T04:48:49Z) - Exploring Pre-training with Alignments for RNN Transducer based
End-to-End Speech Recognition [39.497407288772386]
The recurrent neural network transducer (RNN-T) architecture has become an emerging trend in end-to-end automatic speech recognition research.
In this work, we leverage external alignments to seed the RNN-T model.
Two different pre-training solutions are explored, referred to as encoder pre-training and whole-network pre-training, respectively.
arXiv Detail & Related papers (2020-05-01T19:00:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences arising from its use.