Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment
- URL: http://arxiv.org/abs/2405.01004v1
- Date: Thu, 2 May 2024 05:09:07 GMT
- Title: Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment
- Authors: Aditya Chakravarty
- Abstract summary: This study examines the effects of quantization, memory demands, and energy consumption on the inference performance of various ASR models on the NVIDIA Jetson Orin Nano.
We found that changing precision from FP32 to FP16 halves the energy consumed for audio transcription across different models, with minimal performance degradation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent transformer-based ASR models have achieved word-error rates (WER) below 4%, surpassing human annotator accuracy, yet they demand extensive server resources, contributing to significant carbon footprints. The traditional server-based architecture of ASR also presents privacy concerns, alongside reliability and latency issues due to network dependencies. In contrast, on-device (edge) ASR enhances privacy, boosts performance, and promotes sustainability by effectively balancing energy use and accuracy for specific applications. This study examines the effects of quantization, memory demands, and energy consumption on the inference performance of various ASR models on the NVIDIA Jetson Orin Nano. By analyzing WER and transcription speed across models using FP32, FP16, and INT8 quantization on clean and noisy datasets, we highlight the crucial trade-offs between accuracy, speed, quantization, energy efficiency, and memory needs. We found that changing precision from FP32 to FP16 halves the energy consumption for audio transcription across different models, with minimal performance degradation. A larger model size and parameter count neither guarantee better resilience to noise nor predict the energy consumption for a given transcription load. These findings, along with several others, offer novel insights for optimizing ASR systems within energy- and memory-limited environments, crucial for the development of efficient on-device ASR solutions. The code and input data needed to reproduce the results in this article are open-sourced and available at https://github.com/zzadiues3338/ASR-energy-jetson.
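To make the FP32-vs-FP16 comparison concrete, the sketch below loads a Whisper checkpoint at two precisions, transcribes one clip, and scores each hypothesis against a reference with jiwer. This is a minimal illustration under stated assumptions, not the authors' released pipeline: the openai/whisper-base checkpoint, the sample.wav clip, and the reference transcript are placeholders, text normalization is omitted, and on a Jetson Orin Nano the per-transcription energy would be logged alongside (e.g., from the board's power rails via the tegrastats utility), which is not shown here.

```python
# Minimal sketch (not the paper's released code): compare FP32 vs FP16 Whisper
# inference and report WER. Model name, audio path, and reference transcript
# are hypothetical placeholders chosen for illustration.
import torch
import jiwer
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration


def transcribe(model, processor, audio, dtype, device="cuda"):
    # Build log-mel input features and cast them to the requested precision.
    inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
    features = inputs.input_features.to(device=device, dtype=dtype)
    with torch.inference_mode():
        ids = model.generate(features)
    return processor.batch_decode(ids, skip_special_tokens=True)[0]


def main():
    device = "cuda"
    # Hypothetical clean or noisy test clip plus its ground-truth transcript.
    audio, _ = librosa.load("sample.wav", sr=16000)
    reference = "the reference transcript for the sample clip"

    processor = WhisperProcessor.from_pretrained("openai/whisper-base")
    for dtype in (torch.float32, torch.float16):
        model = WhisperForConditionalGeneration.from_pretrained(
            "openai/whisper-base", torch_dtype=dtype
        ).to(device)
        hypothesis = transcribe(model, processor, audio, dtype, device)
        # Note: a fair WER comparison would normalize case and punctuation first.
        print(dtype, "WER:", jiwer.wer(reference, hypothesis))


if __name__ == "__main__":
    main()
```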
Related papers
- sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection
with Spiking Neural Networks [51.516451451719654]
Spiking Neural Networks (SNNs) are known to be biologically plausible and power-efficient.
This paper introduces a novel SNN-based Voice Activity Detection model, referred to as sVAD.
It provides effective auditory feature representation through SincNet and 1D convolution, and improves noise robustness with attention mechanisms.
arXiv Detail & Related papers (2024-03-09T02:55:44Z) - Fine-Tuning Surrogate Gradient Learning for Optimal Hardware Performance
in Spiking Neural Networks [1.52292571922932]
Spiking Neural Networks (SNNs) can provide tremendous energy efficiency benefits when carefully exploited in hardware.
This work reveals novel insights into the impacts of training on hardware performance.
arXiv Detail & Related papers (2024-02-09T06:38:12Z) - Full-Stack Optimization for CAM-Only DNN Inference [2.0837295518447934]
This paper explores the combination of algorithmic optimizations for ternary weight neural networks and associative processors.
We propose a novel compilation flow to optimize convolutions on APs by reducing their arithmetic intensity.
Our solution improves the energy efficiency of ResNet-18 inference on ImageNet by 7.5x compared to crossbar in-memory accelerators.
arXiv Detail & Related papers (2024-01-23T10:27:38Z) - Design Space Exploration of Low-Bit Quantized Neural Networks for Visual
Place Recognition [26.213493552442102]
Visual Place Recognition (VPR) is a critical task for performing global re-localization in visual perception systems.
Recently, new works have focused on the recall@1 metric as a performance measure, with limited attention to resource utilization.
This has resulted in methods that use deep learning models too large to deploy on low-powered edge devices.
We study the impact of compact convolutional network architecture design in combination with full-precision and mixed-precision post-training quantization on VPR performance.
arXiv Detail & Related papers (2023-12-14T15:24:42Z) - Multiagent Reinforcement Learning with an Attention Mechanism for
Improving Energy Efficiency in LoRa Networks [52.96907334080273]
As the network scale increases, the energy efficiency of LoRa networks decreases sharply due to severe packet collisions.
We propose a transmission parameter allocation algorithm based on multiagent reinforcement learning (MALoRa).
Simulation results demonstrate that MALoRa significantly improves the system energy efficiency (EE) compared with baseline algorithms.
arXiv Detail & Related papers (2023-09-16T11:37:23Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - Towards Improved Room Impulse Response Estimation for Speech Recognition [53.04440557465013]
We propose a novel approach for blind room impulse response (RIR) estimation systems in the context of far-field automatic speech recognition (ASR).
We first draw the connection between improved RIR estimation and improved ASR performance, as a means of evaluating neural RIR estimators.
We then propose a generative adversarial network (GAN) based architecture that encodes RIR features from reverberant speech and constructs an RIR from the encoded features.
arXiv Detail & Related papers (2022-11-08T00:40:27Z) - LEAF + AIO: Edge-Assisted Energy-Aware Object Detection for Mobile
Augmented Reality [77.00418462388525]
Mobile augmented reality (MAR) applications are significantly energy-guzzling.
We design an edge-based energy-aware MAR system that enables MAR devices to dynamically change their configurations.
Our proposed dynamic MAR configuration adaptations can minimize the per frame energy consumption of multiple MAR clients.
arXiv Detail & Related papers (2022-05-27T06:11:50Z) - Heterogeneous Reservoir Computing Models for Persian Speech Recognition [0.0]
Reservoir computing (RC) models have been proven inexpensive to train, have vastly fewer parameters, and are compatible with emergent hardware technologies.
We propose heterogeneous single and multi-layer ESNs to create non-linear transformations of the inputs that capture temporal context at different scales.
arXiv Detail & Related papers (2022-05-25T09:15:15Z) - Energy-Efficient Model Compression and Splitting for Collaborative
Inference Over Time-Varying Channels [52.60092598312894]
We propose a technique to reduce the total energy bill at the edge device by utilizing model compression and time-varying model split between the edge and remote nodes.
Our proposed solution results in minimal energy consumption and $CO_2$ emission compared to the considered baselines.
arXiv Detail & Related papers (2021-06-02T07:36:27Z)