Comparison of SVD and factorized TDNN approaches for speech to text
- URL: http://arxiv.org/abs/2110.07027v1
- Date: Wed, 13 Oct 2021 20:54:37 GMT
- Title: Comparison of SVD and factorized TDNN approaches for speech to text
- Authors: Jeffrey Josanne Michael, Nagendra Kumar Goel, Navneeth K, Jonas
Robertson, Shravan Mishra
- Abstract summary: This work focuses on reducing the real-time factor and word error rate of a hybrid HMM-DNN system.
We find this architecture particularly useful for lightly reverberated environments.
- Score: 1.4291137439893344
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This work concentrates on reducing the real-time factor (RTF) and word
error rate (WER) of a hybrid HMM-DNN system. Our baseline system uses an
architecture with TDNN and LSTM layers. We find this architecture particularly
useful for lightly reverberated environments. However, these models tend to
demand more computation than is desirable. In this work, we explore alternate
architectures in which singular value decomposition (SVD) is applied to the
TDNN layers, as well as to the affine transforms of every LSTM cell, to reduce
the RTF. We compare this approach with specifying bottleneck layers, similar to
those introduced by SVD, before training. Additionally, we reduce the search
space of the decoding graph to make it a better fit for real-time applications.
We report a 61.57% relative reduction in RTF and an almost 1% relative decrease
in WER for our architecture, trained on Fisher data along with reverberated
versions of this dataset in order to match one of our target test
distributions.
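As a rough illustration of the factorization the abstract describes, the sketch below applies truncated SVD to a trained affine transform and replaces it with a rank-r bottleneck of two smaller linear layers. This is a minimal PyTorch sketch under assumed shapes, not the authors' Kaldi-based implementation: the 1024-dimensional layer, the rank of 256, and the helper factorize_linear are all hypothetical.

```python
# Minimal sketch (not the paper's code): post-training SVD factorization
# of an affine transform y = Wx + b, as described for the TDNN layers
# and the affine transforms of each LSTM cell in the abstract above.
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace y = Wx + b with y = U_r (V_r x) + b via truncated SVD of W."""
    W = layer.weight.data                     # (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]              # fold singular values into U
    V_r = Vh[:rank, :]                        # (rank, in_features)

    bottleneck = nn.Linear(layer.in_features, rank, bias=False)
    expand = nn.Linear(rank, layer.out_features,
                       bias=layer.bias is not None)
    bottleneck.weight.data.copy_(V_r)
    expand.weight.data.copy_(U_r)
    if layer.bias is not None:
        expand.bias.data.copy_(layer.bias.data)
    return nn.Sequential(bottleneck, expand)

# Example: a 1024x1024 affine transform (~1.05M weights) factorized at
# rank 256 needs only 2 * 1024 * 256 (~0.52M) weights, roughly halving
# the multiplies for this layer.
dense = nn.Linear(1024, 1024)
low_rank = factorize_linear(dense, rank=256)
x = torch.randn(8, 1024)
print((dense(x) - low_rank(x)).abs().max())   # small approximation error
```

The alternative the abstract compares against amounts to fixing this same two-layer bottleneck shape before training and learning the factors directly, rather than deriving them by SVD from a trained full-rank model (after which the factored model is typically fine-tuned to recover accuracy).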
Related papers
- CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech
Recognition [8.302549684364195]
We propose a novel model named CIF-Transducer (CIF-T) which incorporates the Continuous Integrate-and-Fire (CIF) mechanism with the RNN-T model to achieve efficient alignment.
CIF-T achieves state-of-the-art results with lower computational overhead compared to RNN-T models.
arXiv Detail & Related papers (2023-07-26T11:59:14Z)
- LD-GAN: Low-Dimensional Generative Adversarial Network for Spectral Image Generation with Variance Regularization [72.4394510913927]
Deep learning methods are state-of-the-art for spectral image (SI) computational tasks.
GANs enable diverse augmentation by learning and sampling from the data distribution.
GAN-based SI generation is challenging since the high-dimensional nature of this kind of data hinders the convergence of GAN training, leading to suboptimal generation.
We propose a statistical regularization to control the low-dimensional representation variance for the autoencoder training and to achieve high diversity of samples generated with the GAN.
arXiv Detail & Related papers (2023-04-29T00:25:02Z)
- Deep Learning Weight Pruning with RMT-SVD: Increasing Accuracy and Reducing Overfitting [0.0]
The spectrum of the weight layers of a deep neural network (DNN) can be studied and understood using techniques from random matrix theory (RMT).
In this work, these RMT techniques are used to determine which and how many singular values should be removed from the weight layers of a DNN during training, via singular value decomposition (SVD).
We show the results on a simple DNN model trained on MNIST.
arXiv Detail & Related papers (2023-03-15T23:19:45Z)
- Partitioned Gradient Matching-based Data Subset Selection for Compute-Efficient Robust ASR Training [32.68124808736473]
Partitioned Gradient Matching (PGM) is suitable for massive datasets like those used to train RNN-T.
We show that PGM achieves between 3x and 6x speedup with only a very small accuracy degradation.
arXiv Detail & Related papers (2022-10-30T17:22:57Z)
- Truncated tensor Schatten p-norm based approach for spatiotemporal traffic data imputation with complicated missing patterns [77.34726150561087]
We introduce four complicated missing patterns, including random missing and three fiber-like missing cases according to the mode-driven fibers.
Despite the nonconvexity of the objective function in our model, we derive the optimal solutions by integrating the alternating direction method of multipliers (ADMM).
arXiv Detail & Related papers (2022-05-19T08:37:56Z)
- SmoothNets: Optimizing CNN architecture design for differentially private deep learning [69.10072367807095]
DP-SGD requires clipping and noising of per-sample gradients.
This reduces model utility compared to non-private training.
We distilled a new model architecture, termed SmoothNet, which is characterised by increased robustness to the challenges of DP-SGD training.
arXiv Detail & Related papers (2022-05-09T07:51:54Z)
- Tied & Reduced RNN-T Decoder [0.0]
We study ways to make the RNN-T decoder (prediction network + joint network) smaller and faster without degradation in recognition performance.
Our prediction network performs a simple weighted averaging of the input embeddings, and shares its embedding matrix weights with the joint network's output layer.
This simple design, when used in conjunction with additional Edit-based Minimum Bayes Risk (EMBR) training, reduces the RNN-T decoder from 23M parameters to just 2M without affecting word error rate (WER); a sketch of this weight sharing appears after this list.
arXiv Detail & Related papers (2021-09-15T18:19:16Z)
- Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling? [59.820507600960745]
We propose a new GCP meta-layer that uses SVD in the forward pass, and Padé approximants in the backward propagation to compute the gradients.
The proposed meta-layer has been integrated into different CNN models and achieves state-of-the-art performances on both large-scale and fine-grained datasets.
arXiv Detail & Related papers (2021-05-06T08:03:45Z)
- Towards Extremely Compact RNNs for Video Recognition with Fully Decomposed Hierarchical Tucker Structure [41.41516453160845]
We propose to develop extremely compact RNN models with fully decomposed hierarchical Tucker (FDHT) structure.
Our experimental results on several popular video recognition datasets show that our proposed fully decomposed hierarchical Tucker-based LSTM is extremely compact and highly efficient.
arXiv Detail & Related papers (2021-04-12T18:40:44Z)
- Neural Architecture Search For LF-MMI Trained Time Delay Neural Networks [61.76338096980383]
A range of neural architecture search (NAS) techniques are used to automatically learn two types of hyper-parameters of state-of-the-art factored time delay neural networks (TDNNs).
These include the DARTS method integrating architecture selection with lattice-free MMI (LF-MMI) TDNN training.
Experiments conducted on a 300-hour Switchboard corpus suggest the auto-configured systems consistently outperform the baseline LF-MMI TDNN systems.
arXiv Detail & Related papers (2020-07-17T08:32:11Z)
- Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve a higher reduction in computation load under the same accuracy.
arXiv Detail & Related papers (2020-04-20T02:40:43Z)
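Following up the forward reference in the Tied & Reduced RNN-T Decoder entry above, the sketch below illustrates the two weight-saving ideas its summary names: a prediction network that is only a learned weighted average of the last few label embeddings, and a joint output layer that reuses the shared embedding matrix. This is a hypothetical PyTorch sketch, not the paper's code; the vocabulary size, embedding dimension, context length, and the class name TiedReducedDecoder are all assumptions.

```python
# Hedged sketch of the tied decoder idea: one shared embedding matrix
# serves both the prediction network and the joint output projection,
# so the only large parameter block is the embedding itself.
import torch
import torch.nn as nn

class TiedReducedDecoder(nn.Module):
    def __init__(self, vocab_size: int = 4096, embed_dim: int = 640,
                 context: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # shared matrix
        # one learned mixing weight per previous-label position
        self.pos_weights = nn.Parameter(torch.ones(context) / context)

    def predict(self, prev_labels: torch.Tensor) -> torch.Tensor:
        """Prediction network: weighted average of the last few label
        embeddings (no LSTM). prev_labels: (batch, context)."""
        emb = self.embed(prev_labels)                      # (B, context, D)
        w = torch.softmax(self.pos_weights, dim=0)
        return (emb * w[None, :, None]).sum(dim=1)         # (B, D)

    def joint_output(self, joint_hidden: torch.Tensor) -> torch.Tensor:
        """Joint output layer: logits computed against the shared
        embedding matrix instead of a separate projection."""
        return joint_hidden @ self.embed.weight.T          # (B, vocab)

decoder = TiedReducedDecoder()
prev = torch.randint(0, 4096, (8, 2))
hidden = decoder.predict(prev)             # stands in for the joint input
print(decoder.joint_output(hidden).shape)  # torch.Size([8, 4096])
```

Because the only large parameter block is the shared embedding matrix, decoder size scales with vocab_size * embed_dim alone, which is what makes the drastic parameter reduction reported in that entry plausible.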
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.