Hybrid Neural Networks for On-device Directional Hearing
- URL: http://arxiv.org/abs/2112.05893v1
- Date: Sat, 11 Dec 2021 01:29:12 GMT
- Title: Hybrid Neural Networks for On-device Directional Hearing
- Authors: Anran Wang, Maruchi Kim, Hao Zhang, Shyamnath Gollakota
- Abstract summary: DeepBeam is a hybrid model that combines traditional beamformers with a custom lightweight neural net.
Our real-time hybrid model runs in 8 ms on mobile CPUs designed for low-power wearable devices and achieves an end-to-end latency of 17.5 ms.
- Score: 15.109811993590037
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: On-device directional hearing requires audio source separation from a given
direction while achieving stringent human-imperceptible latency requirements.
While neural nets can achieve significantly better performance than traditional
beamformers, all existing models fall short of supporting low-latency causal
inference on computationally constrained wearables. We present DeepBeam, a
hybrid model that combines traditional beamformers with a custom lightweight
neural net. The former reduces the computational burden of the latter and also
improves its generalizability, while the latter is designed to further reduce
the memory and computational overhead to enable real-time and low-latency
operations. Our evaluation shows comparable performance to state-of-the-art
causal inference models on synthetic data while achieving a 5x reduction in
model size, a 4x reduction in computation per second, and a 5x reduction in
processing time, and generalizing better to real hardware data. Further, our real-time
hybrid model runs in 8 ms on mobile CPUs designed for low-power wearable
devices and achieves an end-to-end latency of 17.5 ms.
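The mechanism in the abstract, a classical beamformer doing the spatial heavy lifting so that only a small causal network remains, can be illustrated with a minimal sketch. The delay-and-sum beamformer, the layer sizes, and all names below are illustrative assumptions, not DeepBeam's actual architecture.
```python
# Minimal sketch of a hybrid pipeline: a classical delay-and-sum beamformer
# steers a mic array toward a target direction, and a small causal conv net
# post-filters the beamformed signal. All sizes are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum(mics, mic_positions, direction, sample_rate):
    """mics: (n_mics, n_samples); mic_positions: (n_mics, 3) in meters;
    direction: unit vector toward the source. Integer-sample delays only."""
    delays = mic_positions @ direction / SPEED_OF_SOUND  # seconds, (n_mics,)
    shifts = np.round((delays - delays.min()) * sample_rate).astype(int)
    out = np.zeros(mics.shape[1])
    for sig, s in zip(mics, shifts):
        out[: len(out) - s] += sig[s:]  # advance each channel, then sum
    return out / len(mics)

class TinyCausalNet(nn.Module):
    """Small causal post-filter: left-padded dilated 1-D convolutions."""
    def __init__(self, channels=32, layers=4, kernel=3):
        super().__init__()
        blocks, in_ch = [], 1
        for i in range(layers):
            d = 2 ** i
            blocks += [nn.ConstantPad1d(((kernel - 1) * d, 0), 0.0),
                       nn.Conv1d(in_ch, channels, kernel, dilation=d),
                       nn.ReLU()]
            in_ch = channels
        blocks.append(nn.Conv1d(channels, 1, 1))  # enhancement head
        self.net = nn.Sequential(*blocks)

    def forward(self, x):  # x: (batch, 1, n_samples)
        return self.net(x)
```
The point of the split is that the beamformer is computationally almost free, while the post-filter uses only left-padded (causal) convolutions, which is what permits low-latency streaming inference on a wearable-class CPU.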
Related papers
- Taming 3DGS: High-Quality Radiance Fields with Limited Resources [50.92437599516609]
3D Gaussian Splatting (3DGS) has transformed novel-view synthesis with its fast, interpretable, and high-fidelity rendering.
We tackle the challenges of training and rendering 3DGS models on a budget.
We derive faster, numerically equivalent solutions for gradient computation and attribute updates.
arXiv Detail & Related papers (2024-06-21T20:44:23Z)
- Resistive Memory-based Neural Differential Equation Solver for Score-based Diffusion Model [55.116403765330084]
Current AIGC methods, such as score-based diffusion, still fall short in speed and efficiency.
We propose a time-continuous and analog in-memory neural differential equation solver for score-based diffusion.
We experimentally validate our solution with 180 nm resistive memory in-memory computing macros.
arXiv Detail & Related papers (2024-04-08T16:34:35Z)
- EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models [21.17675493267517]
Post-training quantization (PTQ) and quantization-aware training (QAT) are two main approaches to compress and accelerate diffusion models.
We introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency.
Our method significantly outperforms previous PTQ-based diffusion models while maintaining similar time and data efficiency.
arXiv Detail & Related papers (2023-10-05T02:51:53Z)
- Model-based Optimization of Superconducting Qubit Readout [59.992881941624965]
We demonstrate model-based readout optimization for superconducting qubits.
We observe 1.5% error per qubit with a 500 ns end-to-end duration and minimal excess reset error from residual resonator photons.
This technique can scale to hundreds of qubits and be used to enhance the performance of error-correcting codes and near-term applications.
arXiv Detail & Related papers (2023-08-03T23:30:56Z)
- Gated Compression Layers for Efficient Always-On Models [1.5612040984769857]
We propose a novel Gated Compression layer that can be applied to transform existing neural network architectures into Gated Neural Networks.
We provide results across five public image and audio datasets demonstrating that the proposed Gated Compression layer stops up to 96% of negative samples and compresses 97% of positive samples, while maintaining or improving model accuracy.
arXiv Detail & Related papers (2023-03-15T22:46:22Z)
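A toy sketch of the gating idea in the entry above: a learned gate decides whether a sample carries signal, so negative samples can stop early, and a linear projection compresses the features that pass. The layer below is a hedged illustration, not the paper's exact design.
```python
# Toy gating-plus-compression layer: a learned gate scores each sample,
# samples below the threshold can exit the network early ("stopped"),
# and a linear projection compresses the features of samples that pass.
import torch
import torch.nn as nn

class GatedCompression(nn.Module):
    def __init__(self, in_features, compressed_features, threshold=0.5):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(in_features, 1), nn.Sigmoid())
        self.compress = nn.Linear(in_features, compressed_features)
        self.threshold = threshold

    def forward(self, x):  # x: (batch, in_features)
        g = self.gate(x)                        # (batch, 1) in [0, 1]
        keep = (g > self.threshold).squeeze(1)  # which samples continue
        z = self.compress(x) * g                # gated, compressed features
        return z, keep

# Usage: downstream (expensive) layers only run on z[keep].
```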
- Q-Diffusion: Quantizing Diffusion Models [52.978047249670276]
Post-training quantization (PTQ) is a go-to compression method for other tasks, but the multi-timestep pipeline of diffusion models poses new challenges.
We propose a novel PTQ method specifically tailored to this unique multi-timestep pipeline and model architecture.
We show that our proposed method is able to quantize full-precision unconditional diffusion models into 4-bit while maintaining comparable performance.
arXiv Detail & Related papers (2023-02-08T19:38:59Z)
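One way to picture a timestep-aware PTQ scheme: activation ranges drift over the denoising trajectory, so calibration pools statistics across timesteps rather than from a single step. A minimal, assumed sketch, not Q-Diffusion's actual algorithm:
```python
# Minimal PTQ calibration sketch: the quantization scale is calibrated from
# activation samples pooled over the whole diffusion trajectory, since ranges
# shift across timesteps. Illustrative only.
import numpy as np

def uniform_quantize(x, scale, bits=4):
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale  # fake-quantized (dequantized) values

def calibrate_scale(activation_samples, bits=4, percentile=99.9):
    # activation_samples: list of arrays gathered at many timesteps t
    pooled = np.concatenate([a.ravel() for a in activation_samples])
    amax = np.percentile(np.abs(pooled), percentile)
    return amax / (2 ** (bits - 1) - 1)

# e.g. samples = [denoiser_hidden(x_t, t) for t in range(T)]  # hypothetical
# scale = calibrate_scale(samples); x_q = uniform_quantize(x, scale)
```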
- Lightweight network towards real-time image denoising on mobile devices [26.130379174715742]
Deep convolutional neural networks have achieved great progress in image denoising tasks.
However, their complicated architectures and heavy computational cost hinder their deployment on mobile devices.
We propose a mobile-friendly denoising network, namely MFDNet.
arXiv Detail & Related papers (2022-11-09T05:19:26Z)
- Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition [65.7040645560855]
We propose Q-ASR, an integer-only, zero-shot quantization scheme for ASR models.
We show negligible WER change compared to the full-precision baseline models.
Q-ASR exhibits a large compression rate of more than 4x with small WER degradation.
arXiv Detail & Related papers (2021-03-31T06:05:40Z)
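Zero-shot here means calibrating without any real speech. A toy version, with made-up names and no claim to match Q-ASR's scheme, estimates ranges from synthetic Gaussian inputs and then applies integer affine quantization:
```python
# Toy zero-shot calibration: with no real speech available, ranges are
# estimated from synthetic Gaussian inputs pushed through the float model,
# then weights/activations use integer affine quantization. The real Q-ASR
# scheme is more sophisticated; names here are assumptions.
import numpy as np

def affine_qparams(xmin, xmax, bits=8):
    qmin, qmax = 0, 2 ** bits - 1
    scale = max((xmax - xmin) / (qmax - qmin), 1e-8)
    zero_point = int(round(qmin - xmin / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, bits=8):
    q = np.round(x / scale) + zero_point
    return np.clip(q, 0, 2 ** bits - 1).astype(np.uint8)

rng = np.random.default_rng(0)
synthetic_batch = rng.standard_normal((16, 80, 100))  # fake log-mel frames
# acts = float_model(synthetic_batch)  # hypothetical forward pass
# scale, zp = affine_qparams(acts.min(), acts.max())
```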
- Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors [5.609098985493794]
We introduce a method for designing optimally heterogeneously quantized versions of deep neural network models for minimum-energy, high-accuracy, nanosecond inference and fully automated deployment on chip.
This is crucial for the event selection procedure in proton-proton collisions at the CERN Large Hadron Collider, where resources are strictly limited and a latency of $\mathcal{O}(1)\,\mu$s is required.
arXiv Detail & Related papers (2020-06-15T15:07:49Z)
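Heterogeneous quantization assigns each layer its own bit width. As a stand-in for the paper's automated search, here is a toy greedy rule: give each layer the smallest bit width whose weight round-trip error stays under a tolerance.
```python
# Toy greedy heterogeneous quantization: each layer independently gets the
# smallest bit width whose weight round-trip error stays under a tolerance,
# so insensitive layers go very low-bit while sensitive ones keep precision.
# A stand-in for the paper's automated search, not its actual algorithm.
import numpy as np

def fake_quant(w, bits):
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax if np.any(w) else 1.0
    return np.round(w / scale).clip(-qmax - 1, qmax) * scale

def choose_bits(weights, candidate_bits=(2, 3, 4, 6, 8), tol=1e-2):
    for bits in candidate_bits:  # try cheapest first
        err = np.mean((weights - fake_quant(weights, bits)) ** 2)
        if err < tol * np.mean(weights ** 2):
            return bits
    return candidate_bits[-1]

# per_layer_bits = {name: choose_bits(w)
#                   for name, w in model_weights.items()}  # hypothetical dict
```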
- Dynamic Model Pruning with Feedback [64.019079257231]
We propose a novel model compression method that generates a sparse trained model without additional overhead.
We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models.
arXiv Detail & Related papers (2020-06-12T15:07:08Z)
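One way to obtain a sparse trained model without extra overhead, sketched below under the assumption of magnitude-based masks: keep a dense master copy, mask it in the forward pass, and pass the full gradient back so pruned weights can later reactivate. A generic illustration, not necessarily the paper's exact method.
```python
# Toy dynamic-pruning step: dense weights are the master copy, a magnitude
# mask sparsifies the forward pass, and a straight-through estimator passes
# the full gradient to the dense copy so pruned weights can reactivate.
import torch

def prune_mask(w, sparsity=0.9):
    k = int(w.numel() * sparsity)
    threshold = w.abs().flatten().kthvalue(k).values
    return (w.abs() > threshold).float()

dense_w = torch.randn(256, 256, requires_grad=True)
opt = torch.optim.SGD([dense_w], lr=0.1)
for step in range(3):
    mask = prune_mask(dense_w.detach())  # recomputed every step
    # Forward uses masked weights; backward sees the identity, so the
    # dense copy receives the full gradient evaluated at the sparse point.
    sparse_w = dense_w + (dense_w * mask - dense_w).detach()
    loss = (sparse_w.sum() - 1.0) ** 2   # stand-in loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```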
- TinyLSTMs: Efficient Neural Speech Enhancement for Hearing Aids [13.369813069254132]
We use model compression techniques to bridge the gap between large neural networks and battery-powered hearing aid hardware.
We are the first to demonstrate their efficacy for RNN speech enhancement, using pruning and integer quantization of weights/activations.
Our model achieves a computational latency of 2.39 ms, well within the 10 ms target and 351$\times$ better than previous work.
arXiv Detail & Related papers (2020-05-20T20:37:47Z)
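A minimal sketch of integer weight quantization for an RNN enhancer using stock PyTorch dynamic quantization; the paper's recipe (structured pruning plus weight/activation quantization tuned for hearing-aid hardware) goes well beyond this, and the model below is an illustrative assumption.
```python
# Minimal sketch: stock PyTorch dynamic quantization stores the LSTM and
# linear weights as int8, one standard way to shrink an RNN speech enhancer
# for low-power hardware.
import torch
import torch.nn as nn

class TinyEnhancer(nn.Module):
    def __init__(self, n_features=128, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_features)  # per-frame spectral mask

    def forward(self, frames):  # frames: (batch, time, n_features)
        h, _ = self.lstm(frames)
        return torch.sigmoid(self.head(h)) * frames

model = TinyEnhancer().eval()
quantized = torch.quantization.quantize_dynamic(
    model, {nn.LSTM, nn.Linear}, dtype=torch.qint8)
out = quantized(torch.randn(1, 100, 128))  # runs with int8 weight kernels
```
Dynamic quantization alone cuts weight storage roughly 4x; activations stay in floating point and are quantized on the fly at each layer.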
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.