Exploring Deep Hybrid Tensor-to-Vector Network Architectures for
Regression Based Speech Enhancement
- URL: http://arxiv.org/abs/2007.13024v2
- Date: Mon, 3 Aug 2020 00:07:39 GMT
- Title: Exploring Deep Hybrid Tensor-to-Vector Network Architectures for
Regression Based Speech Enhancement
- Authors: Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco
Siniscalchi, Chin-Hui Lee
- Abstract summary: We find that a hybrid architecture, namely CNN-TT, is capable of maintaining a good quality performance with a reduced model parameter size.
CNN-TT is composed of several convolutional layers at the bottom for feature extraction and a tensor-train (TT) output layer on top to reduce model parameters.
- Score: 53.47564132861866
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates different trade-offs between the number of model
parameters and enhanced speech qualities by employing several deep
tensor-to-vector regression models for speech enhancement. We find that a
hybrid architecture, namely CNN-TT, is capable of maintaining a good quality
performance with a reduced model parameter size. CNN-TT is composed of several
convolutional layers at the bottom for feature extraction to improve speech
quality and a tensor-train (TT) output layer on the top to reduce model
parameters. We first derive a new upper bound on the generalization power of
the convolutional neural network (CNN) based vector-to-vector regression
models. Then, we provide experimental evidence on the Edinburgh noisy speech
corpus to demonstrate that, in single-channel speech enhancement, CNN
outperforms DNN at the expense of a small increase in model size. Moreover, CNN-TT
slightly outperforms its CNN counterpart while using only 32% of the CNN model
parameters, and further performance gains can be attained if the number of
CNN-TT parameters is increased to 44% of the CNN model size.
Finally, our multi-channel speech enhancement experiments on a simulated noisy
WSJ0 corpus demonstrate that the proposed hybrid CNN-TT architecture achieves
better results than both DNN and CNN models in terms of both enhanced speech
quality and parameter size.
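The parameter savings behind the TT output layer can be illustrated with a minimal NumPy sketch. The mode sizes and TT-ranks below are illustrative assumptions, not the paper's configuration: a 256-by-64 dense weight matrix is represented by three small TT cores, and the dense matrix is reconstructed from them so the two parameter counts can be compared directly.

```python
import numpy as np

def tt_to_dense(cores, in_modes, out_modes):
    """Reconstruct the dense weight matrix encoded by a tensor-train.

    cores[k] has shape (r_{k-1}, m_k, n_k, r_k) with r_0 = r_d = 1;
    the dense equivalent is a (prod(m), prod(n)) matrix.
    """
    d = len(cores)
    # Fold out core 0's leading rank-1 index: shape (m_0 * n_0, r_1).
    W = cores[0].reshape(in_modes[0] * out_modes[0], -1)
    for k in range(1, d):
        r_prev, m_k, n_k, r_k = cores[k].shape
        # Contract the shared rank index, then expose the next one.
        W = W @ cores[k].reshape(r_prev, m_k * n_k * r_k)
        W = W.reshape(-1, r_k)
    # Indices are interleaved as (m_0, n_0, m_1, n_1, ...); separate them.
    W = W.reshape(*[s for pair in zip(in_modes, out_modes) for s in pair])
    W = W.transpose(list(range(0, 2 * d, 2)) + list(range(1, 2 * d, 2)))
    return W.reshape(int(np.prod(in_modes)), int(np.prod(out_modes)))

# Illustrative factorization of a 256-to-64 dense weight with TT-rank 3.
rng = np.random.default_rng(0)
in_modes, out_modes, ranks = (4, 8, 8), (4, 4, 4), (1, 3, 3, 1)
cores = [rng.standard_normal((ranks[k], in_modes[k], out_modes[k], ranks[k + 1]))
         for k in range(3)]
W = tt_to_dense(cores, in_modes, out_modes)
tt_params = sum(c.size for c in cores)   # 48 + 288 + 96 = 432
dense_params = W.size                    # 256 * 64 = 16384
print(W.shape, tt_params, dense_params)
```

With these (assumed) settings the TT representation stores 432 numbers against 16384 for the dense matrix, which is the kind of trade-off the abstract describes: the ranks control how much expressiveness is kept versus how many parameters are saved.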
Related papers
- OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation [70.17681136234202]
We reexamine the design distinctions and test the limits of what a sparse CNN can achieve.
We propose two key components, i.e., adaptive receptive fields (spatially) and adaptive relation, to bridge the gap.
This exploration led to the creation of Omni-Adaptive 3D CNNs (OA-CNNs), a family of networks that integrates a lightweight module.
arXiv Detail & Related papers (2024-03-21T14:06:38Z) - Patching Weak Convolutional Neural Network Models through Modularization
and Composition [19.986199290508925]
A convolutional neural network (CNN) model for classification tasks often performs unsatisfactorily.
We propose a compressed modularization approach, CNNSplitter, which decomposes a strong CNN model for $N$-class classification into $N$ smaller CNN modules.
We show that CNNSplitter can patch a weak CNN model through modularization and composition, thus providing a new solution for developing robust CNN models.
arXiv Detail & Related papers (2022-09-11T15:26:16Z) - Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on
Riemannian Gradient Descent With Illustrations of Speech Processing [74.31472195046099]
We exploit a low-rank tensor-train deep neural network (TT-DNN) to build an end-to-end deep learning pipeline, namely LR-TT-DNN.
A hybrid model combining LR-TT-DNN with a convolutional neural network (CNN) is set up to boost the performance.
Our empirical evidence demonstrates that the LR-TT-DNN and CNN+(LR-TT-DNN) models with fewer model parameters can outperform the TT-DNN and CNN+(TT-DNN) counterparts.
arXiv Detail & Related papers (2022-03-11T15:55:34Z) - Exploiting Hybrid Models of Tensor-Train Networks for Spoken Command
Recognition [9.262289183808035]
This work aims to design a low complexity spoken command recognition (SCR) system.
We exploit a deep hybrid architecture of a tensor-train (TT) network to build an end-to-end SCR pipeline.
Our proposed CNN+(TT-DNN) model attains a competitive accuracy of 96.31% with 4 times fewer model parameters than the CNN model.
arXiv Detail & Related papers (2022-01-11T05:57:38Z) - Transformed CNNs: recasting pre-trained convolutional layers with
self-attention [17.96659165573821]
Vision Transformers (ViT) have emerged as a powerful alternative to convolutional networks (CNNs).
In this work, we explore the idea of reducing the time spent training self-attention layers by initializing them as convolutional layers.
With only 50 epochs of fine-tuning, the resulting T-CNNs demonstrate significant performance gains.
arXiv Detail & Related papers (2021-06-10T14:56:10Z) - Effects of Number of Filters of Convolutional Layers on Speech
Recognition Model Accuracy [6.2698513174194215]
This paper studies the effect of the number of filters in convolutional layers on the prediction accuracy of CNN+RNN (convolutional networks combined with recurrent networks) models for automatic speech recognition (ASR).
Experimental results show that adding a CNN to an RNN improves the performance of the CNN+RNN speech recognition model only when the number of CNN filters exceeds a certain threshold.
arXiv Detail & Related papers (2021-02-03T23:04:38Z) - ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs with this regularization maintain performance with a dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z) - Multistream CNN for Robust Acoustic Modeling [17.155489701060542]
Multistream CNN is a novel neural network architecture for robust acoustic modeling in speech recognition tasks.
We show consistent improvements against Kaldi's best TDNN-F model across various data sets.
In terms of real-time factor, multistream CNN outperforms the baseline TDNN-F by 15%.
arXiv Detail & Related papers (2020-05-21T05:26:15Z) - Tensor-to-Vector Regression for Multi-channel Speech Enhancement based
on Tensor-Train Network [53.47564132861866]
We propose a tensor-to-vector regression approach to multi-channel speech enhancement.
The key idea is to cast the conventional deep neural network (DNN) based vector-to-vector regression formulation under a tensor-train network (TTN) framework.
In 8-channel conditions, a PESQ of 3.12 is achieved using 20 million parameters for TTN, whereas a DNN with 68 million parameters can only attain a PESQ of 3.06.
arXiv Detail & Related papers (2020-02-03T02:58:00Z) - Approximation and Non-parametric Estimation of ResNet-type Convolutional
Neural Networks [52.972605601174955]
We show that a ResNet-type CNN can attain minimax-optimal error rates in important function classes.
We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes.
arXiv Detail & Related papers (2019-03-24T19:42:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.