Exploring Deep Hybrid Tensor-to-Vector Network Architectures for
Regression Based Speech Enhancement
- URL: http://arxiv.org/abs/2007.13024v2
- Date: Mon, 3 Aug 2020 00:07:39 GMT
- Title: Exploring Deep Hybrid Tensor-to-Vector Network Architectures for
Regression Based Speech Enhancement
- Authors: Jun Qi, Hu Hu, Yannan Wang, Chao-Han Huck Yang, Sabato Marco
Siniscalchi, Chin-Hui Lee
- Abstract summary: We find that a hybrid architecture, namely CNN-TT, is capable of maintaining a good quality performance with a reduced model parameter size.
CNN-TT is composed of several convolutional layers at the bottom for feature extraction and a tensor-train (TT) output layer on top to reduce model parameters.
- Score: 53.47564132861866
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates different trade-offs between the number of model
parameters and enhanced speech qualities by employing several deep
tensor-to-vector regression models for speech enhancement. We find that a
hybrid architecture, namely CNN-TT, is capable of maintaining a good quality
performance with a reduced model parameter size. CNN-TT is composed of several
convolutional layers at the bottom for feature extraction to improve speech
quality and a tensor-train (TT) output layer on the top to reduce model
parameters. We first derive a new upper bound on the generalization power of
the convolutional neural network (CNN) based vector-to-vector regression
models. Then, we provide experimental evidence on the Edinburgh noisy speech
corpus to demonstrate that, in single-channel speech enhancement, CNN
outperforms DNN at the expense of a small increase in model size. Moreover, CNN-TT
slightly outperforms its CNN counterpart while using only 32% of the CNN model
parameters, and further performance gains can be attained if the number of
CNN-TT parameters is increased to 44% of the CNN model size.
Finally, our multi-channel speech enhancement experiments on a simulated noisy
WSJ0 corpus demonstrate that the proposed hybrid CNN-TT architecture achieves
better results than both DNN and CNN models in terms of both enhanced speech
quality and parameter size.
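The parameter savings behind the TT output layer can be illustrated with a minimal NumPy sketch. The mode sizes and TT-ranks below are illustrative assumptions, not the paper's configuration: a 256-by-64 dense weight matrix is represented by three small TT cores, and the dense matrix is reconstructed from them so the two parameter counts can be compared directly.

```python
import numpy as np

def tt_to_dense(cores, in_modes, out_modes):
    """Reconstruct the dense weight matrix encoded by a tensor-train.

    cores[k] has shape (r_{k-1}, m_k, n_k, r_k) with r_0 = r_d = 1;
    the dense equivalent is a (prod(m), prod(n)) matrix.
    """
    d = len(cores)
    # Fold out core 0's leading rank-1 index: shape (m_0 * n_0, r_1).
    W = cores[0].reshape(in_modes[0] * out_modes[0], -1)
    for k in range(1, d):
        r_prev, m_k, n_k, r_k = cores[k].shape
        # Contract the shared rank index, then expose the next one.
        W = W @ cores[k].reshape(r_prev, m_k * n_k * r_k)
        W = W.reshape(-1, r_k)
    # Indices are interleaved as (m_0, n_0, m_1, n_1, ...); separate them.
    W = W.reshape(*[s for pair in zip(in_modes, out_modes) for s in pair])
    W = W.transpose(list(range(0, 2 * d, 2)) + list(range(1, 2 * d, 2)))
    return W.reshape(int(np.prod(in_modes)), int(np.prod(out_modes)))

# Illustrative factorization of a 256-to-64 dense weight with TT-rank 3.
rng = np.random.default_rng(0)
in_modes, out_modes, ranks = (4, 8, 8), (4, 4, 4), (1, 3, 3, 1)
cores = [rng.standard_normal((ranks[k], in_modes[k], out_modes[k], ranks[k + 1]))
         for k in range(3)]
W = tt_to_dense(cores, in_modes, out_modes)
tt_params = sum(c.size for c in cores)   # 48 + 288 + 96 = 432
dense_params = W.size                    # 256 * 64 = 16384
print(W.shape, tt_params, dense_params)
```

With these (assumed) settings the TT representation stores 432 numbers against 16384 for the dense matrix, which is the kind of trade-off the abstract describes: the ranks control how much expressiveness is kept versus how many parameters are saved.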
Related papers
- OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation [70.17681136234202]
We reexamine the design distinctions and test the limits of what a sparse CNN can achieve.
We propose two key components, i.e., adaptive receptive fields (spatially) and adaptive relation, to bridge the gap.
This exploration led to the creation of Omni-Adaptive 3D CNNs (OA-CNNs), a family of networks that integrates a lightweight module.
arXiv Detail & Related papers (2024-03-21T14:06:38Z) - Patching Weak Convolutional Neural Network Models through Modularization
and Composition [19.986199290508925]
A convolutional neural network (CNN) model for classification tasks often performs unsatisfactorily.
We propose a compressed modularization approach, CNNSplitter, which decomposes a strong CNN model for $N$-class classification into $N$ smaller CNN modules.
We show that CNNSplitter can patch a weak CNN model through modularization and composition, thus providing a new solution for developing robust CNN models.
arXiv Detail & Related papers (2022-09-11T15:26:16Z) - Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on
Riemannian Gradient Descent With Illustrations of Speech Processing [74.31472195046099]
We exploit a low-rank tensor-train deep neural network (TT-DNN) to build an end-to-end deep learning pipeline, namely LR-TT-DNN.
A hybrid model combining LR-TT-DNN with a convolutional neural network (CNN) is set up to boost the performance.
Our empirical evidence demonstrates that the LR-TT-DNN and CNN+(LR-TT-DNN) models with fewer model parameters can outperform the TT-DNN and CNN+(TT-DNN) counterparts.
arXiv Detail & Related papers (2022-03-11T15:55:34Z) - Exploiting Hybrid Models of Tensor-Train Networks for Spoken Command
Recognition [9.262289183808035]
This work aims to design a low complexity spoken command recognition (SCR) system.
We exploit a deep hybrid architecture of a tensor-train (TT) network to build an end-to-end SCR pipeline.
Our proposed CNN+(TT-DNN) model attains a competitive accuracy of 96.31% with 4 times fewer model parameters than the CNN model.
arXiv Detail & Related papers (2022-01-11T05:57:38Z) - Transformed CNNs: recasting pre-trained convolutional layers with
self-attention [17.96659165573821]
Vision Transformers (ViT) have emerged as a powerful alternative to convolutional networks (CNNs).
In this work, we explore the idea of reducing the time spent training self-attention layers by initializing them as convolutional layers.
With only 50 epochs of fine-tuning, the resulting T-CNNs demonstrate significant performance gains.
arXiv Detail & Related papers (2021-06-10T14:56:10Z) - Effects of Number of Filters of Convolutional Layers on Speech
Recognition Model Accuracy [6.2698513174194215]
This paper studies the effect of the number of filters in convolutional layers on the prediction accuracy of CNN+RNN (convolutional networks combined with recurrent networks) models for automatic speech recognition (ASR).
Experimental results show that adding a CNN to an RNN improves the performance of the CNN+RNN speech recognition model only when the number of CNN filters exceeds a certain threshold.
arXiv Detail & Related papers (2021-02-03T23:04:38Z) - ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs with this regularization maintain performance with a dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z) - Multistream CNN for Robust Acoustic Modeling [17.155489701060542]
Multistream CNN is a novel neural network architecture for robust acoustic modeling in speech recognition tasks.
We show consistent improvements against Kaldi's best TDNN-F model across various data sets.
In terms of real-time factor, multistream CNN outperforms the baseline TDNN-F by 15%.
arXiv Detail & Related papers (2020-05-21T05:26:15Z) - Tensor-to-Vector Regression for Multi-channel Speech Enhancement based
on Tensor-Train Network [53.47564132861866]
We propose a tensor-to-vector regression approach to multi-channel speech enhancement.
The key idea is to cast the conventional deep neural network (DNN) based vector-to-vector regression formulation under a tensor-train network (TTN) framework.
In 8-channel conditions, a PESQ of 3.12 is achieved using 20 million parameters for TTN, whereas a DNN with 68 million parameters can only attain a PESQ of 3.06.
arXiv Detail & Related papers (2020-02-03T02:58:00Z) - Approximation and Non-parametric Estimation of ResNet-type Convolutional
Neural Networks [52.972605601174955]
We show that a ResNet-type CNN can attain minimax-optimal error rates in important function classes.
We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes.
arXiv Detail & Related papers (2019-03-24T19:42:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.