Utterance Weighted Multi-Dilation Temporal Convolutional Networks for Monaural Speech Dereverberation
- URL: http://arxiv.org/abs/2205.08455v1
- Date: Tue, 17 May 2022 15:56:31 GMT
- Authors: William Ravenscroft and Stefan Goetze and Thomas Hain
- Abstract summary: A weighted multi-dilation depthwise-separable convolution is proposed to replace standard depthwise-separable convolutions in temporal convolutional networks (TCNs).
It is shown that this weighted multi-dilation temporal convolutional network (WD-TCN) consistently outperforms the TCN across various model configurations.
- Score: 26.94528951545861
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speech dereverberation is an important stage in many speech technology applications. Recent work in this area has been dominated by deep neural network models. Temporal convolutional networks (TCNs) are deep learning models that have been proposed for sequence modelling in the task of dereverberating speech. In this work, a weighted multi-dilation depthwise-separable convolution is proposed to replace the standard depthwise-separable convolutions in TCN models. The proposed convolution enables the TCN to dynamically focus on more or less local information in its receptive field at each convolutional block in the network. It is shown that this weighted multi-dilation temporal convolutional network (WD-TCN) consistently outperforms the TCN across various model configurations, and that using the WD-TCN model is a more parameter-efficient way to improve performance than increasing the number of convolutional blocks. The best performance improvement over the baseline TCN is 0.55 dB scale-invariant signal-to-distortion ratio (SISDR), and the best-performing WD-TCN model attains 12.26 dB SISDR on the WHAMR dataset.
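As an illustration of the idea, the sketch below is a minimal PyTorch rendering of a weighted multi-dilation depthwise-separable convolution: several depthwise convolutions with different dilation factors are applied in parallel and combined with input-dependent weights before the pointwise convolution. The two-branch configuration, kernel size, softmax gating over a time-pooled statistic, and (batch, channels, time) layout are assumptions made for this example, not the authors' reference implementation.

```python
# Minimal sketch of a weighted multi-dilation depthwise-separable convolution
# block (assumed configuration; normalisation and activations are omitted).
import torch
import torch.nn as nn


class WeightedMultiDilationDSConv(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3, dilations=(1, 2)):
        super().__init__()
        # One depthwise convolution per dilation factor (groups=channels).
        self.depthwise = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size,
                      dilation=d, padding=d * (kernel_size - 1) // 2,
                      groups=channels)
            for d in dilations
        ])
        # Utterance-level gating: pool over time, then predict one weight
        # per dilation branch (softmax so the weights sum to one).
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),           # (B, C, T) -> (B, C, 1)
            nn.Flatten(1),                     # (B, C)
            nn.Linear(channels, len(dilations)),
            nn.Softmax(dim=-1),
        )
        # Pointwise (1x1) convolution completes the separable convolution.
        self.pointwise = nn.Conv1d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        w = self.gate(x)                                      # (B, n_dilations)
        branches = torch.stack([conv(x) for conv in self.depthwise], dim=1)
        mixed = (w.unsqueeze(-1).unsqueeze(-1) * branches).sum(dim=1)
        return self.pointwise(mixed)


if __name__ == "__main__":
    block = WeightedMultiDilationDSConv(channels=64)
    y = block(torch.randn(2, 64, 16000))    # two utterances of feature frames
    print(y.shape)                          # torch.Size([2, 64, 16000])
```

Because the gating weights are computed once per utterance from a time-averaged statistic, each block can lean towards the lower-dilation (more local) or higher-dilation (less local) branch depending on the input, which is the behaviour described in the abstract.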
Related papers
- An Adaptive Latent Factorization of Tensors Model for Embedding Dynamic Communication Network [15.577058568902272]
The Dynamic Communication Network (DCN) describes the interactions over time among various communication nodes.
This paper proposes an Adaptive Temporal-dependent low-rank representation model (ATT).
The experimental results on four real-world DCNs demonstrate that the proposed ATT model significantly outperforms several state-of-the-art models in both prediction errors and convergence rounds.
arXiv Detail & Related papers (2024-08-29T14:40:32Z)
- DCNv3: Towards Next Generation Deep Cross Network for CTR Prediction [17.19859591493946]
This paper proposes the next generation deep cross network: Deep Cross Network v3 (DCNv3), along with its two sub-networks: Linear Cross Network (LCN) and Exponential Cross Network (ECN) for CTR prediction.
Comprehensive experiments on six datasets demonstrate the effectiveness, efficiency, and interpretability of DCNv3.
arXiv Detail & Related papers (2024-07-18T09:49:13Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
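As a small aside on the "TC" stream, the snippet below shows how a 1D signal can be mapped to a 2D scale-time tensor with the Continuous Wavelet Transform using PyWavelets; the wavelet, scales, sampling rate, and test signal are placeholder assumptions rather than the TCCT-Net settings.

```python
# Sketch: turning a 1D signal into a 2D time-frequency tensor with the CWT.
import numpy as np
import pywt

fs = 128                                    # assumed sampling rate (Hz)
t = np.arange(0, 4, 1 / fs)
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 10 * t)

scales = np.arange(1, 65)                   # 64 scales -> 64 "frequency" rows
coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1 / fs)

print(coeffs.shape)                         # (64, 512): scales x time samples
```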
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation [26.94528951545861]
Speech separation models are used for isolating individual speakers in many speech processing applications.
Deep learning models have been shown to lead to state-of-the-art (SOTA) results on a number of speech separation benchmarks.
One such class of models, known as temporal convolutional networks (TCNs), has shown promising results for speech separation tasks.
Recent research in speech dereverberation has shown that the optimal receptive field (RF) of a TCN varies with the reverberation characteristics of the speech signal.
arXiv Detail & Related papers (2022-10-27T10:29:19Z)
- Receptive Field Analysis of Temporal Convolutional Networks for Monaural Speech Dereverberation [26.94528951545861]
Supervised deep learning (DL) models give state-of-the-art performance for single-channel speech dereverberation.
Temporal convolutional networks (TCNs) are commonly used for sequence modelling in speech enhancement tasks.
This paper analyses dereverberation performance depending on the model size and the receptive field of TCNs.
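For a stack of stride-1 dilated convolutions, the receptive field discussed in this and the previous entry has a simple closed form, RF = 1 + Σ (kernel_size − 1) · dilation over the dilated layers. The helper below evaluates it for an assumed Conv-TasNet-style configuration (dilations doubling within each repeat); the exact configurations analysed in the paper may differ.

```python
# Receptive field of a stack of stride-1 dilated 1D convolutions:
#   RF = 1 + sum over dilated layers of (kernel_size - 1) * dilation
# Assumed layout: n_repeats repeats of n_blocks blocks with dilations
# 1, 2, 4, ..., 2**(n_blocks - 1); pointwise 1x1 convs add nothing to the RF.
def tcn_receptive_field(kernel_size: int, n_blocks: int, n_repeats: int) -> int:
    dilations = [2 ** b for b in range(n_blocks)] * n_repeats
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    # Example values only: kernel size 3, 8 blocks per repeat, 4 repeats.
    rf_frames = tcn_receptive_field(kernel_size=3, n_blocks=8, n_repeats=4)
    print(rf_frames)   # 2041 frames of context (e.g. ~20 s at a 10 ms hop)
```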
arXiv Detail & Related papers (2022-04-13T14:57:59Z)
- TMS: A Temporal Multi-scale Backbone Design for Speaker Embedding [60.292702363839716]
Current SOTA backbone networks for speaker embedding are designed to aggregate multi-scale features from an utterance with multi-branch network architectures for speaker representation.
We propose an effective temporal multi-scale (TMS) model where multi-scale branches could be efficiently designed in a speaker embedding network almost without increasing computational costs.
arXiv Detail & Related papers (2022-03-17T05:49:35Z)
- Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning [60.20150317299749]
This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning.
To make full use of the training data, we propose a full data learning method for speech enhancement.
arXiv Detail & Related papers (2020-11-11T06:32:37Z)
- A Fully Tensorized Recurrent Neural Network [48.50376453324581]
We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell.
This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
arXiv Detail & Related papers (2020-10-08T18:24:12Z)
- Lip-reading with Densely Connected Temporal Convolutional Networks [61.66144695679362]
We present the Densely Connected Temporal Convolutional Network (DC-TCN) for lip-reading of isolated words.
Our method has achieved 88.36% accuracy on the Lip Reading in the Wild dataset and 43.65% on the LRW-1000 dataset.
arXiv Detail & Related papers (2020-09-29T18:08:15Z)
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.