Scalable Speech Enhancement with Dynamic Channel Pruning
- URL: http://arxiv.org/abs/2412.17121v1
- Date: Sun, 22 Dec 2024 18:21:08 GMT
- Title: Scalable Speech Enhancement with Dynamic Channel Pruning
- Authors: Riccardo Miccini, Clement Laroche, Tobias Piechowiak, Luca Pezzarossa
- Abstract summary: Speech Enhancement (SE) is essential for improving productivity in remote collaborative environments.
Deep learning models are highly effective at SE, but their computational demands make them impractical for embedded systems.
We introduce Dynamic Channel Pruning to the audio domain for the first time and apply it to a custom convolutional architecture for SE.
- Abstract: Speech Enhancement (SE) is essential for improving productivity in remote collaborative environments. Although deep learning models are highly effective at SE, their computational demands make them impractical for embedded systems. Furthermore, acoustic conditions can change significantly in terms of difficulty, whereas neural networks are usually static with regard to the amount of computation performed. To this end, we introduce Dynamic Channel Pruning (DynCP) to the audio domain for the first time and apply it to a custom convolutional architecture for SE. Our approach works by identifying unnecessary convolutional channels at runtime, saving computational resources by neither computing the activations for those channels nor retrieving their filters. When trained to use only 25% of channels, we save 29.6% of MACs while causing only a 0.75% drop in PESQ. Thus, DynCP offers a promising path toward deploying larger and more powerful SE solutions on resource-constrained devices.
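The paper itself ships no code, so the following is a minimal PyTorch-style sketch of the general idea: a small gating head scores each output channel at runtime, and only the top-k filters are retrieved and applied. The module name, the gating mechanism, and the per-batch channel selection are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConv2d(nn.Module):
    """Conv layer that computes only a runtime-selected subset of output channels.

    Illustrative sketch of dynamic channel pruning: a tiny gating head scores
    each output channel from the pooled input, and only the top-k filters are
    retrieved and applied. Everything here is an assumption, not the paper's code.
    """
    def __init__(self, in_ch, out_ch, kernel_size=3, keep_ratio=0.25):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, padding=kernel_size // 2)
        self.gate = nn.Linear(in_ch, out_ch)   # per-channel relevance scorer
        self.keep = max(1, int(out_ch * keep_ratio))

    def forward(self, x):
        b, c, h, w = x.shape
        scores = self.gate(x.mean(dim=(2, 3)))          # (B, out_ch)
        # For brevity, pick one channel subset per batch (per-sample in practice).
        idx = scores.mean(dim=0).topk(self.keep).indices
        w_sub = self.conv.weight[idx]                   # retrieve only the needed filters
        b_sub = self.conv.bias[idx] if self.conv.bias is not None else None
        y_sub = F.conv2d(x, w_sub, b_sub, padding=self.conv.padding[0])
        # Scatter the computed channels into a zero tensor for the skipped ones.
        y = x.new_zeros(b, self.conv.out_channels, y_sub.shape[2], y_sub.shape[3])
        y[:, idx] = y_sub
        return y
```

In practice, the hard top-k selection would need a differentiable relaxation (e.g., Gumbel-softmax or a sparsity penalty) to be trainable end to end.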
Related papers
- Pruning random resistive memory for optimizing analogue AI [54.21621702814583]
AI models pose unprecedented challenges for energy consumption and environmental sustainability.
One promising solution is to revisit analogue computing, a technique that predates digital computing.
Here, we report a universal solution, software-hardware co-design using structural plasticity-inspired edge pruning.
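As a loose illustration of the edge-pruning idea (not the paper's software-hardware co-design), one can prune most connections of a random weight matrix, a stand-in for a random resistive-memory crossbar, by magnitude; the criterion and sparsity level below are assumptions.

```python
import numpy as np

# Sketch of structural-plasticity-style edge pruning: keep only the
# strongest connections of a random weight matrix and zero the rest.
rng = np.random.default_rng(3)
W = rng.standard_normal((128, 128))        # random "conductances"
k = int(0.9 * W.size)                      # prune 90% of edges (assumed rate)
thresh = np.partition(np.abs(W), k, axis=None)[k]
W_pruned = np.where(np.abs(W) >= thresh, W, 0.0)
```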
arXiv Detail & Related papers (2023-11-13T08:59:01Z)
- Dynamic Sparsity Is Channel-Level Sparsity Learner [91.31071026340746]
Dynamic sparse training (DST) is a leading sparse training approach.
Channel-aware dynamic sparse (Chase) seamlessly translates the promise of unstructured dynamic sparsity into channel-level sparsity.
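A minimal sketch of what translating unstructured sparsity into channel-level sparsity can look like: measure how many weights survive in each output channel's unstructured mask and drop channels that are almost empty. The density threshold is an assumption, not the Chase criterion.

```python
import numpy as np

def channels_to_prune(mask, density_threshold=0.05):
    """Given an unstructured binary weight mask of shape (out_ch, in_ch, kh, kw),
    flag output channels whose surviving-weight density falls below a
    threshold. Illustrative only; the thresholding rule is assumed."""
    per_channel_density = mask.reshape(mask.shape[0], -1).mean(axis=1)
    return np.where(per_channel_density < density_threshold)[0]

rng = np.random.default_rng(0)
mask = (rng.random((64, 32, 3, 3)) > 0.9).astype(np.float32)  # ~10% dense
print(channels_to_prune(mask)[:10])
```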
arXiv Detail & Related papers (2023-05-30T23:33:45Z)
- Channelformer: Attention based Neural Solution for Wireless Channel Estimation and Effective Online Training [1.0499453838486013]
We propose an encoder-decoder neural architecture (called Channelformer) to achieve improved channel estimation.
We employ multi-head attention in the encoder and a residual convolutional neural architecture as the decoder.
We also propose an effective online training method based on the fifth generation (5G) new radio (NR) configuration for modern communication systems.
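A hedged PyTorch sketch of the described architecture: a multi-head self-attention encoder followed by a residual convolutional decoder. The model dimension, head count, and layer depth are illustrative guesses, not the paper's configuration.

```python
import torch
import torch.nn as nn

class Channelformer(nn.Module):
    """Sketch: attention encoder + residual convolutional decoder,
    per the abstract; all sizes are assumptions."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.decoder = nn.Sequential(
            nn.Conv1d(d_model, d_model, 3, padding=1), nn.ReLU(),
            nn.Conv1d(d_model, d_model, 3, padding=1),
        )

    def forward(self, x):                 # x: (batch, seq, d_model)
        h, _ = self.attn(x, x, x)         # self-attention encoder
        h = self.norm(x + h)
        z = h.transpose(1, 2)             # (batch, d_model, seq) for Conv1d
        z = z + self.decoder(z)           # residual convolutional decoder
        return z.transpose(1, 2)

est = Channelformer()(torch.randn(2, 16, 64))   # refined channel estimate
```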
arXiv Detail & Related papers (2023-02-08T23:18:23Z)
- Efficient acoustic feature transformation in mismatched environments using a Guided-GAN [1.495380389108477]
We propose a new framework to improve automatic speech recognition systems in resource-scarce environments.
We use a generative adversarial network (GAN) operating on acoustic input features to enhance the features of mismatched data.
With less than one hour of data, an ASR system trained on good-quality data and evaluated on mismatched audio improves by between 11.5% and 19.7% relative word error rate (WER).
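A minimal sketch of a GAN operating on acoustic feature vectors, where the generator maps mismatched features toward the matched-domain distribution. The feature dimension, network sizes, and losses are assumptions, and the actual Guided-GAN objective includes a guidance term not modeled here.

```python
import torch
import torch.nn as nn

feat_dim = 40  # e.g., filterbank features (assumed)
G = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
D = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()

mismatched = torch.randn(32, feat_dim)   # placeholder feature batches
matched = torch.randn(32, feat_dim)

enhanced = G(mismatched)                 # map toward the matched domain
d_loss = bce(D(matched), torch.ones(32, 1)) + \
         bce(D(enhanced.detach()), torch.zeros(32, 1))
g_loss = bce(D(enhanced), torch.ones(32, 1))  # generator tries to fool D
```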
arXiv Detail & Related papers (2022-10-03T05:33:28Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new deep reinforcement learning (DRL) method, Soft Actor-Critic for discrete actions (SAC-d), which generates the exit point and compressing bits via soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a scheme can adapt well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC (ultra-reliable low-latency communication).
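A toy illustration of a latency- and accuracy-aware reward over a discrete (exit point, compressing bits) action; the weighting and penalty shape are assumptions, not the paper's reward design.

```python
def reward(accuracy, latency_ms, latency_budget_ms=50.0, alpha=1.0, beta=0.02):
    """Illustrative reward for a device-edge co-inference agent:
    reward accuracy, penalize latency beyond a budget (assumed shape)."""
    penalty = max(0.0, latency_ms - latency_budget_ms)
    return alpha * accuracy - beta * penalty

# e.g., exiting early saves latency but loses some accuracy:
print(reward(accuracy=0.92, latency_ms=35.0))   # within budget
print(reward(accuracy=0.96, latency_ms=80.0))   # over budget, penalized
```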
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
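A concrete toy example of the encoding idea: weights quantized to odd-integer levels can be written exactly as a power-of-two-weighted sum of {-1, +1} matrices, each of which can then run as a fast binary branch. The greedy decomposition below is illustrative; the paper's exact scheme may differ.

```python
import numpy as np

def decompose(w_q, num_bits):
    """Decompose weights quantized to odd-integer levels
    {-(2^K - 1), ..., -1, +1, ..., 2^K - 1} into K binary {-1, +1}
    matrices such that w_q = sum_i 2^i * B_i."""
    branches = []
    residual = w_q.astype(np.int64)
    for i in reversed(range(num_bits)):
        b = np.where(residual > 0, 1, -1)   # greedy sign of the residual
        branches.append((2 ** i, b))
        residual = residual - (2 ** i) * b
    assert np.all(residual == 0)            # exact for odd-level weights
    return branches

rng = np.random.default_rng(1)
w_q = rng.choice([-3, -1, 1, 3], size=(4, 4))        # 2-bit odd-level weights
recon = sum(a * b for a, b in decompose(w_q, 2))
assert np.array_equal(recon, w_q)
```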
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
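A minimal sketch of a network with parametrised early exits: an input leaves at the first head once its confidence clears a threshold. Classification heads keep the example short; in MESS the heads would be dense segmentation predictors, and the exit policy here is a simple assumed heuristic.

```python
import torch
import torch.nn as nn

class MultiExitNet(nn.Module):
    """Early-exit CNN sketch: easier inputs stop at the first head
    whose confidence clears a threshold (assumed policy and sizes)."""
    def __init__(self, num_classes=10, threshold=0.9):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.exit1 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(16, num_classes))
        self.exit2 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(32, num_classes))
        self.threshold = threshold

    def forward(self, x):
        h = self.stage1(x)
        logits1 = self.exit1(h)
        conf = logits1.softmax(dim=1).max(dim=1).values
        if bool((conf > self.threshold).all()):   # easy batch: exit early
            return logits1
        return self.exit2(self.stage2(h))         # hard batch: run full depth
```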
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
- Enabling Incremental Training with Forward Pass for Edge Devices [0.0]
We introduce a method using an evolutionary strategy (ES) that can partially retrain the network, enabling it to adapt to changes and recover after an error has occurred.
This technique enables training on inference-only hardware, without the need for backpropagation and with minimal resource overhead.
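A generic evolution-strategy update using only forward passes, illustrating how such training avoids backpropagation; the population size, noise scale, and normalization are assumptions, not the paper's exact variant.

```python
import numpy as np

def es_step(params, loss_fn, sigma=0.05, lr=0.01, pop=16, rng=None):
    """One ES update: sample parameter perturbations, score them with
    forward passes only, and move against the perturbation-weighted
    gradient estimate."""
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal((pop,) + params.shape)
    losses = np.array([loss_fn(params + sigma * e) for e in eps])
    losses = (losses - losses.mean()) / (losses.std() + 1e-8)  # normalize scores
    grad_est = (losses[:, None] * eps.reshape(pop, -1)).mean(axis=0)
    return params - lr * grad_est.reshape(params.shape) / sigma

# toy usage: recover a target vector without backpropagation
target = np.ones(8)
w = np.zeros(8)
for _ in range(200):
    w = es_step(w, lambda p: float(np.sum((p - target) ** 2)))
```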
arXiv Detail & Related papers (2021-03-25T17:43:04Z)
- ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks [63.91384986073851]
We propose the autoencoder-based low-rank filter-sharing technique (ALF).
ALF shows a reduction of 70% in network parameters, 61% in operations and 41% in execution time, with minimal loss in accuracy.
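As a rough stand-in for the autoencoder bottleneck ALF learns, one can approximate a layer's filter bank with a small set of shared basis filters plus per-filter coefficients via truncated SVD; the rank and method below are assumptions.

```python
import numpy as np

def lowrank_filter_basis(weights, rank=8):
    """Approximate a filter bank (out_ch, in_ch, kh, kw) with `rank`
    shared basis filters and per-filter mixing coefficients."""
    W = weights.reshape(weights.shape[0], -1)
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    coeffs = U[:, :rank] * S[:rank]        # (out_ch, rank) per-filter weights
    basis = Vt[:rank]                      # (rank, in_ch*kh*kw) shared filters
    return coeffs, basis

W = np.random.default_rng(2).standard_normal((64, 32, 3, 3))
coeffs, basis = lowrank_filter_basis(W)
approx = (coeffs @ basis).reshape(W.shape)  # reconstructed filter bank
```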
arXiv Detail & Related papers (2020-07-27T09:01:22Z)
- Resource-Efficient Speech Mask Estimation for Multi-Channel Speech Enhancement [15.361841669377776]
We provide a resource-efficient approach for multi-channel speech enhancement based on Deep Neural Networks (DNNs).
In particular, we use reduced-precision DNNs for estimating a speech mask from noisy, multi-channel microphone observations.
In the extreme case of binary weights and reduced precision activations, a significant reduction of execution time and memory footprint is possible.
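A minimal sketch of a reduced-precision mask estimator: linear layers whose weights are binarized to {-1, +1} at inference, with a straight-through estimator for training. The layer sizes, STFT bin count, and binarization scheme are assumptions.

```python
import torch
import torch.nn as nn

class BinaryLinear(nn.Linear):
    """Linear layer with weights binarized to {-1, +1} in the forward
    pass; gradients flow via a straight-through estimator (assumed)."""
    def forward(self, x):
        w_bin = torch.where(self.weight >= 0,
                            torch.ones_like(self.weight),
                            -torch.ones_like(self.weight))
        w_bin = w_bin.detach() + self.weight - self.weight.detach()  # STE
        return nn.functional.linear(x, w_bin, self.bias)

n_bins = 257  # e.g., STFT bins (assumed)
mask_net = nn.Sequential(BinaryLinear(n_bins, 256), nn.ReLU(),
                         BinaryLinear(256, n_bins), nn.Sigmoid())
noisy_mag = torch.rand(4, n_bins)          # placeholder magnitude spectra
speech_mask = mask_net(noisy_mag)          # mask values in [0, 1]
```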
arXiv Detail & Related papers (2020-07-22T14:58:29Z)