Performance optimizations on deep noise suppression models
- URL: http://arxiv.org/abs/2110.04378v1
- Date: Fri, 8 Oct 2021 21:00:01 GMT
- Title: Performance optimizations on deep noise suppression models
- Authors: Jerry Chee, Sebastian Braun, Vishak Gopal, Ross Cutler
- Abstract summary: We study the role of magnitude structured pruning as an architecture search to speed up the inference time of a deep noise suppression (DNS) model.
We achieve up to a 7.25X inference speedup over the baseline, with a smooth model performance degradation.
- Score: 15.316827344680165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the role of magnitude structured pruning as an architecture search
to speed up the inference time of a deep noise suppression (DNS) model. While
deep learning approaches have been remarkably successful in enhancing audio
quality, their increased complexity inhibits their deployment in real-time
applications. We achieve up to a 7.25X inference speedup over the baseline,
with a smooth model performance degradation. Ablation studies indicate that our
proposed network re-parameterization (i.e., size per layer) is the major driver
of the speedup, and that magnitude structured pruning does comparably to
directly training a model in the smaller size. We report inference speed
because a parameter reduction does not guarantee a speedup, and we measure
model quality using an accurate non-intrusive objective speech quality metric.
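The abstract does not come with code; below is a minimal sketch of magnitude structured pruning in PyTorch. The architecture, layer sizes, and pruning fraction are illustrative assumptions, not the authors' DNS model.

```python
# Minimal sketch of magnitude structured pruning with PyTorch.
# All layer sizes and the pruning fraction are illustrative,
# not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Linear(257, 400),  # e.g. a spectral-feature input layer
    nn.ReLU(),
    nn.Linear(400, 257),
)

# Zero out 30% of the output neurons (rows) of the first layer,
# ranked by the L2 norm of their weight vectors.
prune.ln_structured(model[0], name="weight", amount=0.3, n=2, dim=0)
prune.remove(model[0], "weight")  # make the pruning mask permanent

keep = model[0].weight.abs().sum(dim=1) > 0
print(f"surviving neurons: {int(keep.sum())} / {keep.numel()}")
```

Zeroed weights alone do not make inference faster; the pruned layers must then be physically shrunk to smaller dense matrices, which matches the paper's point that a parameter reduction does not guarantee a speedup.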
Related papers
- Optimization of DNN-based speaker verification model through efficient quantization technique [15.250677730668466]
Quantization of deep models offers a means to reduce both computational and memory expenses.
Our research proposes an optimization framework for the quantization of the speaker verification model.
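As a generic illustration of the idea (not the paper's proposed framework), post-training dynamic quantization in PyTorch converts linear-layer weights to int8; the toy embedding model below is an assumption.

```python
# Sketch: post-training dynamic quantization of a toy
# speaker-embedding model. The architecture is a placeholder.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(80, 512), nn.ReLU(), nn.Linear(512, 192))
model.eval()

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8  # int8 weights, float activations
)

x = torch.randn(1, 80)
print(quantized(x).shape)  # same interface, smaller and faster on CPU
```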
arXiv Detail & Related papers (2024-07-12T05:03:10Z)
- Neural Language Model Pruning for Automatic Speech Recognition [4.10609794373612]
We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition.
We explore three aspects of the pruning framework, namely criterion, method and scheduler, analyzing their contribution in terms of accuracy and inference speed.
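A minimal sketch of those three axes, assuming a weight-magnitude criterion, in-place masking as the method, and a cubic sparsity ramp as the scheduler (all placeholders, not the paper's exact choices):

```python
# Sketch of the three pruning axes: criterion (magnitude),
# method (masking), scheduler (gradually increasing sparsity).
import torch
import torch.nn as nn

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Criterion: keep the largest-|w| entries, mask the rest."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

def cubic_schedule(step: int, total: int, final_sparsity: float) -> float:
    """Scheduler: sparsity ramps from 0 to final_sparsity (cubic ramp)."""
    t = min(step / total, 1.0)
    return final_sparsity * (1 - (1 - t) ** 3)

layer = nn.Linear(512, 512)
for step in range(0, 1001, 250):
    s = cubic_schedule(step, 1000, final_sparsity=0.8)
    mask = magnitude_mask(layer.weight.data, s)
    layer.weight.data *= mask  # method: apply the mask in place
    print(f"step {step}: target sparsity {s:.2f}")
```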
arXiv Detail & Related papers (2023-10-05T10:01:32Z)
- Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity [0.0]
Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency.
We propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications.
Our approach yields a speed improvement of $1.25\times$ with a minimal accuracy drop of $1.1\%$ for the ResNet18 model on the ImageNet dataset.
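One way to induce such a structured pattern in the feature maps, sketched under the assumption of a per-group top-1 activation rule (the paper's exact mechanism may differ):

```python
# Sketch of structured activation sparsity: within each group of G
# consecutive channels, keep only the strongest response and zero
# the rest. Illustrative, not the paper's exact mechanism.
import torch

def group_topk_activations(x: torch.Tensor, group: int = 4) -> torch.Tensor:
    # x: (batch, channels, height, width), channels divisible by group
    b, c, h, w = x.shape
    xg = x.view(b, c // group, group, h, w)
    # keep the max-magnitude activation in each channel group
    idx = xg.abs().argmax(dim=2, keepdim=True)
    mask = torch.zeros_like(xg).scatter_(2, idx, 1.0)
    return (xg * mask).view(b, c, h, w)

x = torch.relu(torch.randn(1, 8, 4, 4))
y = group_topk_activations(x, group=4)
print((y != 0).float().mean())  # at most 1/group of activations survive
```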
arXiv Detail & Related papers (2023-09-12T22:28:53Z)
- E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
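A minimal sketch of the underlying visual prompt tuning recipe: freeze the pretrained backbone and learn only a small set of prompt tokens prepended to the input sequence. Dimensions and the toy backbone are assumptions; E^2VPT itself adds further efficiency techniques on top of this.

```python
# Sketch of visual prompt tuning with a frozen transformer backbone.
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    def __init__(self, backbone: nn.Module, dim: int = 768, n_prompts: int = 8):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False          # backbone stays frozen
        self.prompts = nn.Parameter(torch.randn(1, n_prompts, dim) * 0.02)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq, dim) patch embeddings
        prompts = self.prompts.expand(tokens.size(0), -1, -1)
        return self.backbone(torch.cat([prompts, tokens], dim=1))

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=2,
)
model = PromptedEncoder(backbone)
out = model(torch.randn(2, 16, 768))
print(out.shape)  # (2, 24, 768): 8 prompt tokens + 16 patch tokens
```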
arXiv Detail & Related papers (2023-07-25T19:03:21Z)
- Task-Agnostic Structured Pruning of Speech Representation Models [18.555223754089905]
We propose a fine-grained attention head pruning method to compensate for the performance degradation caused by structured pruning.
Experiments on the SUPERB benchmark show that our model can achieve comparable performance to the dense model in multiple tasks.
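As a rough sketch of head-level pruning, assuming a simple L2-magnitude importance score per head (the paper's fine-grained criterion may differ):

```python
# Sketch: score attention heads by weight magnitude and pick the
# weakest ones to prune. Illustrative criterion only.
import torch
import torch.nn as nn

def head_l2_scores(attn: nn.MultiheadAttention) -> torch.Tensor:
    """L2 norm of each head's slice of the packed in-projection weight."""
    d = attn.embed_dim
    head_dim = d // attn.num_heads
    # in_proj_weight stacks W_q, W_k, W_v: shape (3*d, d)
    w = attn.in_proj_weight.view(3, attn.num_heads, head_dim, d)
    return w.pow(2).sum(dim=(0, 2, 3)).sqrt()  # one score per head

attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
scores = head_l2_scores(attn)
prune_idx = scores.argsort()[:2]  # drop the 2 weakest heads
print("per-head L2 scores:", [round(s, 2) for s in scores.tolist()])
print("pruning heads:", prune_idx.tolist())
```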
arXiv Detail & Related papers (2023-06-02T09:11:06Z)
- On Compressing Sequences for Self-Supervised Speech Models [78.62210521316081]
We study fixed-length and variable-length subsampling along the time axis in self-supervised learning.
We find that variable-length subsampling performs particularly well under low frame rates.
If we have access to phonetic boundaries, we find no degradation in performance for an average frame rate as low as 10 Hz.
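A minimal sketch of variable-length subsampling given segment boundaries, with hypothetical boundary indices standing in for phonetic ones:

```python
# Sketch: average self-supervised features within given segment
# boundaries, so the output frame rate adapts to the segmentation.
import torch

def segment_mean(features: torch.Tensor, boundaries: list[int]) -> torch.Tensor:
    # features: (frames, dim); boundaries: segment end indices, ascending
    segments, start = [], 0
    for end in boundaries:
        segments.append(features[start:end].mean(dim=0))
        start = end
    return torch.stack(segments)

feats = torch.randn(100, 768)          # 100 frames at, say, 50 Hz
bounds = [12, 30, 41, 67, 100]         # hypothetical phone boundaries
pooled = segment_mean(feats, bounds)
print(pooled.shape)                    # (5, 768): one vector per segment
```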
arXiv Detail & Related papers (2022-10-13T17:10:02Z)
- High-dimensional Bayesian Optimization for CNN Auto Pruning with Clustering and Rollback [4.479322015267904]
Pruning has been widely used to slim convolutional neural network (CNN) models to achieve a good trade-off between accuracy and model size.
In this work, we propose an enhanced BO agent to obtain significant acceleration for auto pruning in high-dimensional design spaces.
We validate our proposed method on ResNet, MobileNet, and VGG models, and our experiments show that the proposed method significantly improves the accuracy of BO when pruning very deep CNN models.
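A toy sketch of Bayesian optimization over per-layer pruning ratios, assuming scikit-optimize's gp_minimize and a stand-in objective; the paper's enhanced agent adds clustering and rollback on top of plain BO:

```python
# Sketch: BO over per-layer pruning ratios (requires scikit-optimize).
from skopt import gp_minimize

N_LAYERS = 4  # toy search space; deep CNNs make this high-dimensional

def objective(ratios):
    # Stand-in for: prune each layer by ratios[i], fine-tune briefly,
    # return (accuracy drop + size penalty) to minimize.
    size_left = sum(1.0 - r for r in ratios) / len(ratios)
    acc_drop = sum(r ** 2 for r in ratios) / len(ratios)
    return acc_drop + 0.5 * size_left

result = gp_minimize(
    objective,
    dimensions=[(0.0, 0.9)] * N_LAYERS,  # pruning ratio per layer
    n_calls=30,
    random_state=0,
)
print("best per-layer ratios:", [round(r, 2) for r in result.x])
```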
arXiv Detail & Related papers (2021-09-22T08:39:15Z)
- Time-domain Speech Enhancement with Generative Adversarial Learning [53.74228907273269]
This paper proposes a new framework called Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN).
TSEGAN extends the generative adversarial network (GAN) to the time domain and adds metric evaluation to mitigate the scaling problem.
In addition, we provide a new method based on objective function mapping for the theoretical analysis of the performance of Metric GAN.
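A rough sketch of the MetricGAN-style objective this line of work builds on: the discriminator regresses a normalized quality score for an (enhanced, clean) pair, while the generator is trained to drive that score toward 1. Networks and the metric below are placeholders, not TSEGAN's architecture.

```python
# Sketch of a MetricGAN-style training objective (placeholders only).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv1d(1, 16, 9, padding=4), nn.Tanh(),
                  nn.Conv1d(16, 1, 9, padding=4))            # enhancer
D = nn.Sequential(nn.Conv1d(2, 16, 9, stride=4), nn.ReLU(),
                  nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                  nn.Linear(16, 1), nn.Sigmoid())             # score predictor

def quality_score(enhanced, clean):
    # stand-in for a normalized objective speech metric in [0, 1]
    return torch.exp(-((enhanced - clean) ** 2).mean(dim=(1, 2))).unsqueeze(1)

noisy, clean = torch.randn(4, 1, 16000), torch.randn(4, 1, 16000)
enhanced = G(noisy)

# D regresses the true metric score of the enhanced output ...
d_loss = ((D(torch.cat([enhanced.detach(), clean], 1))
           - quality_score(enhanced.detach(), clean)) ** 2).mean()
# ... while G is rewarded when D assigns its output the top score
g_loss = ((D(torch.cat([enhanced, clean], 1)) - 1.0) ** 2).mean()
print(float(d_loss), float(g_loss))
```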
arXiv Detail & Related papers (2021-03-30T08:09:49Z)
- Real Time Speech Enhancement in the Waveform Domain [99.02180506016721]
We present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU.
The proposed model is based on an encoder-decoder architecture with skip-connections.
It is capable of removing various kinds of background noise including stationary and non-stationary noises.
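A minimal sketch of a causal Conv1d encoder-decoder with a skip connection on the raw waveform; the layer count and sizes are placeholders, not the paper's architecture:

```python
# Sketch: causal waveform encoder-decoder with an input skip connection.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalEnhancer(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.enc = nn.Conv1d(1, hidden, kernel_size=8, stride=4)
        self.dec = nn.ConvTranspose1d(hidden, 1, kernel_size=8, stride=4)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # left-pad so the convolution never sees future samples (causal)
        x = F.pad(wav, (7, 0))
        h = torch.relu(self.enc(x))
        out = self.dec(h)[..., : wav.shape[-1]]
        return out + wav  # skip connection from input to output

model = CausalEnhancer()
wav = torch.randn(1, 1, 16000)
print(model(wav).shape)  # (1, 1, 16000)
```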
arXiv Detail & Related papers (2020-06-23T09:19:13Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
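A minimal sketch of an extragradient-style step of the kind studied here: evaluate the gradient at an extrapolated point, then update the original weights with it. Step sizes are illustrative.

```python
# Sketch of one extragradient-style SGD step.
import torch

def extragradient_step(params, loss_fn, lr=0.1, extrap_lr=0.1):
    # 1) gradient at the current point
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params)
    # 2) extrapolate, then re-evaluate the gradient there
    saved = [p.detach().clone() for p in params]
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= extrap_lr * g
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params)
    # 3) apply the extrapolated gradient from the ORIGINAL point
    with torch.no_grad():
        for p, s, g in zip(params, saved, grads):
            p.copy_(s - lr * g)

w = torch.randn(3, requires_grad=True)
extragradient_step([w], lambda: (w ** 2).sum())
print(w)
```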
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Improving noise robust automatic speech recognition with single-channel time-domain enhancement network [100.1041336974175]
We show that single-channel, time-domain noise reduction can significantly improve ASR performance.
arXiv Detail & Related papers (2020-03-09T09:36:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.