Performance optimizations on deep noise suppression models
- URL: http://arxiv.org/abs/2110.04378v1
- Date: Fri, 8 Oct 2021 21:00:01 GMT
- Title: Performance optimizations on deep noise suppression models
- Authors: Jerry Chee, Sebastian Braun, Vishak Gopal, Ross Cutler
- Abstract summary: We study the role of magnitude structured pruning as an architecture search to speed up the inference time of a deep noise suppression (DNS) model.
We achieve up to a 7.25X inference speedup over the baseline, with a smooth model performance degradation.
- Score: 15.316827344680165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the role of magnitude structured pruning as an architecture search
to speed up the inference time of a deep noise suppression (DNS) model. While
deep learning approaches have been remarkably successful in enhancing audio
quality, their increased complexity inhibits their deployment in real-time
applications. We achieve up to a 7.25X inference speedup over the baseline,
with a smooth model performance degradation. Ablation studies indicate that our
proposed network re-parameterization (i.e., size per layer) is the major driver
of the speedup, and that magnitude structured pruning does comparably to
directly training a model in the smaller size. We report inference speed
because a parameter reduction does not guarantee a speedup, and we measure
model quality using an accurate non-intrusive objective speech quality metric.
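The abstract does not come with code; below is a minimal sketch of magnitude structured pruning in PyTorch. The architecture, layer sizes, and pruning fraction are illustrative assumptions, not the authors' DNS model.

```python
# Minimal sketch of magnitude structured pruning with PyTorch.
# All layer sizes and the pruning fraction are illustrative,
# not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(
    nn.Linear(257, 400),  # e.g. a spectral-feature input layer
    nn.ReLU(),
    nn.Linear(400, 257),
)

# Zero out 30% of the output neurons (rows) of the first layer,
# ranked by the L2 norm of their weight vectors.
prune.ln_structured(model[0], name="weight", amount=0.3, n=2, dim=0)
prune.remove(model[0], "weight")  # make the pruning mask permanent

keep = model[0].weight.abs().sum(dim=1) > 0
print(f"surviving neurons: {int(keep.sum())} / {keep.numel()}")
```

Zeroed weights alone do not make inference faster; the pruned layers must then be physically shrunk to smaller dense matrices, which matches the paper's point that a parameter reduction does not guarantee a speedup.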
Related papers
- Optimization of DNN-based speaker verification model through efficient quantization technique [15.250677730668466]
Quantization of deep models offers a means to reduce both computational and memory expenses.
Our research proposes an optimization framework for the quantization of the speaker verification model.
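As a generic illustration of the idea (not the paper's proposed framework), post-training dynamic quantization in PyTorch converts linear-layer weights to int8; the toy embedding model below is an assumption.

```python
# Sketch: post-training dynamic quantization of a toy
# speaker-embedding model. The architecture is a placeholder.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(80, 512), nn.ReLU(), nn.Linear(512, 192))
model.eval()

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8  # int8 weights, float activations
)

x = torch.randn(1, 80)
print(quantized(x).shape)  # same interface, smaller and faster on CPU
```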
arXiv Detail & Related papers (2024-07-12T05:03:10Z)
- Neural Language Model Pruning for Automatic Speech Recognition [4.10609794373612]
We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition.
We explore three aspects of the pruning framework, namely criterion, method and scheduler, analyzing their contribution in terms of accuracy and inference speed.
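A minimal sketch of those three axes, assuming a weight-magnitude criterion, in-place masking as the method, and a cubic sparsity ramp as the scheduler (all placeholders, not the paper's exact choices):

```python
# Sketch of the three pruning axes: criterion (magnitude),
# method (masking), scheduler (gradually increasing sparsity).
import torch
import torch.nn as nn

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Criterion: keep the largest-|w| entries, mask the rest."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

def cubic_schedule(step: int, total: int, final_sparsity: float) -> float:
    """Scheduler: sparsity ramps from 0 to final_sparsity (cubic ramp)."""
    t = min(step / total, 1.0)
    return final_sparsity * (1 - (1 - t) ** 3)

layer = nn.Linear(512, 512)
for step in range(0, 1001, 250):
    s = cubic_schedule(step, 1000, final_sparsity=0.8)
    mask = magnitude_mask(layer.weight.data, s)
    layer.weight.data *= mask  # method: apply the mask in place
    print(f"step {step}: target sparsity {s:.2f}")
```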
arXiv Detail & Related papers (2023-10-05T10:01:32Z)
- Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity [0.0]
Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency.
We propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications.
Our approach yields a speed improvement of $1.25\times$ with a minimal accuracy drop of $1.1\%$ for the ResNet18 model on the ImageNet dataset.
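One way to induce such a structured pattern in the feature maps, sketched under the assumption of a per-group top-1 activation rule (the paper's exact mechanism may differ):

```python
# Sketch of structured activation sparsity: within each group of G
# consecutive channels, keep only the strongest response and zero
# the rest. Illustrative, not the paper's exact mechanism.
import torch

def group_topk_activations(x: torch.Tensor, group: int = 4) -> torch.Tensor:
    # x: (batch, channels, height, width), channels divisible by group
    b, c, h, w = x.shape
    xg = x.view(b, c // group, group, h, w)
    # keep the max-magnitude activation in each channel group
    idx = xg.abs().argmax(dim=2, keepdim=True)
    mask = torch.zeros_like(xg).scatter_(2, idx, 1.0)
    return (xg * mask).view(b, c, h, w)

x = torch.relu(torch.randn(1, 8, 4, 4))
y = group_topk_activations(x, group=4)
print((y != 0).float().mean())  # at most 1/group of activations survive
```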
arXiv Detail & Related papers (2023-09-12T22:28:53Z)
- E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
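A minimal sketch of the underlying visual prompt tuning recipe: freeze the pretrained backbone and learn only a small set of prompt tokens prepended to the input sequence. Dimensions and the toy backbone are assumptions; E^2VPT itself adds further efficiency techniques on top of this.

```python
# Sketch of visual prompt tuning with a frozen transformer backbone.
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    def __init__(self, backbone: nn.Module, dim: int = 768, n_prompts: int = 8):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False          # backbone stays frozen
        self.prompts = nn.Parameter(torch.randn(1, n_prompts, dim) * 0.02)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq, dim) patch embeddings
        prompts = self.prompts.expand(tokens.size(0), -1, -1)
        return self.backbone(torch.cat([prompts, tokens], dim=1))

backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=2,
)
model = PromptedEncoder(backbone)
out = model(torch.randn(2, 16, 768))
print(out.shape)  # (2, 24, 768): 8 prompt tokens + 16 patch tokens
```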
arXiv Detail & Related papers (2023-07-25T19:03:21Z)
- Task-Agnostic Structured Pruning of Speech Representation Models [18.555223754089905]
We propose a fine-grained attention head pruning method to compensate for the performance degradation caused by structured pruning.
Experiments on the SUPERB benchmark show that our model can achieve comparable performance to the dense model in multiple tasks.
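As a rough sketch of head-level pruning, assuming a simple L2-magnitude importance score per head (the paper's fine-grained criterion may differ):

```python
# Sketch: score attention heads by weight magnitude and pick the
# weakest ones to prune. Illustrative criterion only.
import torch
import torch.nn as nn

def head_l2_scores(attn: nn.MultiheadAttention) -> torch.Tensor:
    """L2 norm of each head's slice of the packed in-projection weight."""
    d = attn.embed_dim
    head_dim = d // attn.num_heads
    # in_proj_weight stacks W_q, W_k, W_v: shape (3*d, d)
    w = attn.in_proj_weight.view(3, attn.num_heads, head_dim, d)
    return w.pow(2).sum(dim=(0, 2, 3)).sqrt()  # one score per head

attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
scores = head_l2_scores(attn)
prune_idx = scores.argsort()[:2]  # drop the 2 weakest heads
print("per-head L2 scores:", [round(s, 2) for s in scores.tolist()])
print("pruning heads:", prune_idx.tolist())
```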
arXiv Detail & Related papers (2023-06-02T09:11:06Z)
- On Compressing Sequences for Self-Supervised Speech Models [78.62210521316081]
We study fixed-length and variable-length subsampling along the time axis in self-supervised learning.
We find that variable-length subsampling performs particularly well under low frame rates.
If we have access to phonetic boundaries, we find no degradation in performance for an average frame rate as low as 10 Hz.
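A minimal sketch of variable-length subsampling given segment boundaries, with hypothetical boundary indices standing in for phonetic ones:

```python
# Sketch: average self-supervised features within given segment
# boundaries, so the output frame rate adapts to the segmentation.
import torch

def segment_mean(features: torch.Tensor, boundaries: list[int]) -> torch.Tensor:
    # features: (frames, dim); boundaries: segment end indices, ascending
    segments, start = [], 0
    for end in boundaries:
        segments.append(features[start:end].mean(dim=0))
        start = end
    return torch.stack(segments)

feats = torch.randn(100, 768)          # 100 frames at, say, 50 Hz
bounds = [12, 30, 41, 67, 100]         # hypothetical phone boundaries
pooled = segment_mean(feats, bounds)
print(pooled.shape)                    # (5, 768): one vector per segment
```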
arXiv Detail & Related papers (2022-10-13T17:10:02Z)
- High-dimensional Bayesian Optimization for CNN Auto Pruning with Clustering and Rollback [4.479322015267904]
Pruning has been widely used to slim convolutional neural network (CNN) models to achieve a good trade-off between accuracy and model size.
In this work, we propose an enhanced BO agent to obtain significant acceleration for auto pruning in high-dimensional design spaces.
We validate our proposed method on ResNet, MobileNet, and VGG models, and our experiments show that the proposed method significantly improves the accuracy of BO when pruning very deep CNN models.
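A toy sketch of Bayesian optimization over per-layer pruning ratios, assuming scikit-optimize's gp_minimize and a stand-in objective; the paper's enhanced agent adds clustering and rollback on top of plain BO:

```python
# Sketch: BO over per-layer pruning ratios (requires scikit-optimize).
from skopt import gp_minimize

N_LAYERS = 4  # toy search space; deep CNNs make this high-dimensional

def objective(ratios):
    # Stand-in for: prune each layer by ratios[i], fine-tune briefly,
    # return (accuracy drop + size penalty) to minimize.
    size_left = sum(1.0 - r for r in ratios) / len(ratios)
    acc_drop = sum(r ** 2 for r in ratios) / len(ratios)
    return acc_drop + 0.5 * size_left

result = gp_minimize(
    objective,
    dimensions=[(0.0, 0.9)] * N_LAYERS,  # pruning ratio per layer
    n_calls=30,
    random_state=0,
)
print("best per-layer ratios:", [round(r, 2) for r in result.x])
```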
arXiv Detail & Related papers (2021-09-22T08:39:15Z)
- Time-domain Speech Enhancement with Generative Adversarial Learning [53.74228907273269]
This paper proposes a new framework called Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN).
TSEGAN extends the generative adversarial network (GAN) to the time domain and adds metric evaluation to mitigate the scaling problem.
In addition, we provide a new method based on objective function mapping for the theoretical analysis of the performance of Metric GAN.
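A rough sketch of the MetricGAN-style objective this line of work builds on: the discriminator regresses a normalized quality score for an (enhanced, clean) pair, while the generator is trained to drive that score toward 1. Networks and the metric below are placeholders, not TSEGAN's architecture.

```python
# Sketch of a MetricGAN-style training objective (placeholders only).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv1d(1, 16, 9, padding=4), nn.Tanh(),
                  nn.Conv1d(16, 1, 9, padding=4))            # enhancer
D = nn.Sequential(nn.Conv1d(2, 16, 9, stride=4), nn.ReLU(),
                  nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                  nn.Linear(16, 1), nn.Sigmoid())             # score predictor

def quality_score(enhanced, clean):
    # stand-in for a normalized objective speech metric in [0, 1]
    return torch.exp(-((enhanced - clean) ** 2).mean(dim=(1, 2))).unsqueeze(1)

noisy, clean = torch.randn(4, 1, 16000), torch.randn(4, 1, 16000)
enhanced = G(noisy)

# D regresses the true metric score of the enhanced output ...
d_loss = ((D(torch.cat([enhanced.detach(), clean], 1))
           - quality_score(enhanced.detach(), clean)) ** 2).mean()
# ... while G is rewarded when D assigns its output the top score
g_loss = ((D(torch.cat([enhanced, clean], 1)) - 1.0) ** 2).mean()
print(float(d_loss), float(g_loss))
```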
arXiv Detail & Related papers (2021-03-30T08:09:49Z)
- Real Time Speech Enhancement in the Waveform Domain [99.02180506016721]
We present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU.
The proposed model is based on an encoder-decoder architecture with skip-connections.
It is capable of removing various kinds of background noise including stationary and non-stationary noises.
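A minimal sketch of a causal Conv1d encoder-decoder with a skip connection on the raw waveform; the layer count and sizes are placeholders, not the paper's architecture:

```python
# Sketch: causal waveform encoder-decoder with an input skip connection.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalEnhancer(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.enc = nn.Conv1d(1, hidden, kernel_size=8, stride=4)
        self.dec = nn.ConvTranspose1d(hidden, 1, kernel_size=8, stride=4)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        # left-pad so the convolution never sees future samples (causal)
        x = F.pad(wav, (7, 0))
        h = torch.relu(self.enc(x))
        out = self.dec(h)[..., : wav.shape[-1]]
        return out + wav  # skip connection from input to output

model = CausalEnhancer()
wav = torch.randn(1, 1, 16000)
print(model(wav).shape)  # (1, 1, 16000)
```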
arXiv Detail & Related papers (2020-06-23T09:19:13Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
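A minimal sketch of an extragradient-style step of the kind studied here: evaluate the gradient at an extrapolated point, then update the original weights with it. Step sizes are illustrative.

```python
# Sketch of one extragradient-style SGD step.
import torch

def extragradient_step(params, loss_fn, lr=0.1, extrap_lr=0.1):
    # 1) gradient at the current point
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params)
    # 2) extrapolate, then re-evaluate the gradient there
    saved = [p.detach().clone() for p in params]
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= extrap_lr * g
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params)
    # 3) apply the extrapolated gradient from the ORIGINAL point
    with torch.no_grad():
        for p, s, g in zip(params, saved, grads):
            p.copy_(s - lr * g)

w = torch.randn(3, requires_grad=True)
extragradient_step([w], lambda: (w ** 2).sum())
print(w)
```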
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Improving noise robust automatic speech recognition with single-channel time-domain enhancement network [100.1041336974175]
We show that single-channel, time-domain noise reduction can significantly improve ASR performance.
arXiv Detail & Related papers (2020-03-09T09:36:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.