Energy Efficiency in AI for 5G and Beyond: A DeepRx Case Study
- URL: http://arxiv.org/abs/2507.10409v2
- Date: Tue, 15 Jul 2025 13:17:11 GMT
- Title: Energy Efficiency in AI for 5G and Beyond: A DeepRx Case Study
- Authors: Amine Lbath, Ibtissam Labriji,
- Abstract summary: We evaluate the energy consumption of DeepRX, a deep learning receiver based on a fully convolutional ResNet architecture.<n>Key contribution is the application of knowledge distillation to train a compact DeepRX student model that emulates the performance of the teacher model but with reduced energy consumption.<n>Performance is measured by comparing the Bit Error Rate (BER) performance versus Signal-to-Interference & Noise Ratio (SINR) values of the distilled model and a model trained from scratch.
- Score: 0.276240219662896
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study addresses the challenge of balancing energy efficiency with performance in AI/ML models, focusing on DeepRX, a deep learning receiver based on a fully convolutional ResNet architecture. We evaluate the energy consumption of DeepRX, considering factors including FLOPs/Watt and FLOPs/clock, and find consistency between estimated and actual energy usage, influenced by memory access patterns. The research extends to comparing energy dynamics during training and inference phases. A key contribution is the application of knowledge distillation (KD) to train a compact DeepRX student model that emulates the performance of the teacher model but with reduced energy consumption. We experiment with different student model sizes, optimal teacher sizes, and KD hyperparameters. Performance is measured by comparing the Bit Error Rate (BER) performance versus Signal-to-Interference & Noise Ratio (SINR) values of the distilled model and a model trained from scratch. The distilled models demonstrate a lower error floor across SINR levels, highlighting the effectiveness of KD in achieving energy-efficient AI solutions.
Related papers
- SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution [55.14432034345353]
We study key design principles for latter cascaded video super-resolution models, which are underexplored currently.<n>First, we propose two strategies to generate training pairs that better mimic the output characteristics of the base model, ensuring alignment between the VSR model and its upstream generator.<n>Second, we provide critical insights into VSR model behavior through systematic analysis of (1) timestep sampling strategies, (2) noise augmentation effects on low-resolution (LR) inputs.
arXiv Detail & Related papers (2025-06-24T17:57:26Z) - Green MLOps to Green GenOps: An Empirical Study of Energy Consumption in Discriminative and Generative AI Operations [2.2765705959685234]
This study investigates the energy consumption of Discriminative and Generative AI models within real-world MLOps pipelines.<n>We employ software-based power measurements to ensure ease of replication across diverse configurations, models, and datasets.
arXiv Detail & Related papers (2025-03-31T10:28:04Z) - DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs [70.91804882618243]
This paper proposes DSMoE, a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks.<n>We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge.<n>Experiments on LLaMA models demonstrate that under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches.
arXiv Detail & Related papers (2025-02-18T02:37:26Z) - Exploring and Enhancing the Transfer of Distribution in Knowledge Distillation for Autoregressive Language Models [62.5501109475725]
Knowledge distillation (KD) is a technique that compresses large teacher models by training smaller student models to mimic them.
This paper introduces Online Knowledge Distillation (OKD), where the teacher network integrates small online modules to concurrently train with the student model.
OKD achieves or exceeds the performance of leading methods in various model architectures and sizes, reducing training time by up to fourfold.
arXiv Detail & Related papers (2024-09-19T07:05:26Z) - Ratio Divergence Learning Using Target Energy in Restricted Boltzmann Machines: Beyond Kullback--Leibler Divergence Learning [0.0]
We propose ratio divergence (RD) learning for discrete energy-based models.
RD learning combines the strength of both forward and reverse Kullback-Leibler divergence (KLD) learning.
Numerical experiments demonstrate that RD learning significantly outperforms other learning methods.
arXiv Detail & Related papers (2024-09-12T01:01:55Z) - DistiLLM: Towards Streamlined Distillation for Large Language Models [53.46759297929675]
DistiLLM is a more effective and efficient KD framework for auto-regressive language models.
DisiLLM comprises two components: (1) a novel skew Kullback-Leibler divergence loss, where we unveil and leverage its theoretical properties, and (2) an adaptive off-policy approach designed to enhance the efficiency in utilizing student-generated outputs.
arXiv Detail & Related papers (2024-02-06T11:10:35Z) - Multiagent Reinforcement Learning with an Attention Mechanism for
Improving Energy Efficiency in LoRa Networks [52.96907334080273]
As the network scale increases, the energy efficiency of LoRa networks decreases sharply due to severe packet collisions.
We propose a transmission parameter allocation algorithm based on multiagent reinforcement learning (MALoRa)
Simulation results demonstrate that MALoRa significantly improves the system EE compared with baseline algorithms.
arXiv Detail & Related papers (2023-09-16T11:37:23Z) - Minimizing Energy Consumption of Deep Learning Models by Energy-Aware
Training [26.438415753870917]
We propose EAT, a gradient-based algorithm that aims to reduce energy consumption during model training.
We demonstrate that our energy-aware training algorithm EAT is able to train networks with a better trade-off between classification performance and energy efficiency.
arXiv Detail & Related papers (2023-07-01T15:44:01Z) - Uncovering Energy-Efficient Practices in Deep Learning Training:
Preliminary Steps Towards Green AI [8.025202812165412]
We consider energy consumption as a metric of equal importance to accuracy and to reduce any irrelevant tasks or energy usage.
We examine the training stage of the deep learning pipeline from a sustainability perspective.
We highlight innovative and promising energy-efficient practices for training deep learning models.
arXiv Detail & Related papers (2023-03-24T12:48:21Z) - Energy Efficiency of Training Neural Network Architectures: An Empirical
Study [11.325530936177493]
The evaluation of Deep Learning models has traditionally focused on criteria such as accuracy, F1 score, and related measures.
The computations needed to train such models entail a large carbon footprint.
We study the relations between DL model architectures and their environmental impact in terms of energy consumed and CO$$ emissions produced during training.
arXiv Detail & Related papers (2023-02-02T09:20:54Z) - Directed Acyclic Graph Factorization Machines for CTR Prediction via
Knowledge Distillation [65.62538699160085]
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance with less than 21.5% FLOPs of the state-of-the-art method on both online and offline experiments.
arXiv Detail & Related papers (2022-11-21T03:09:42Z) - Reducing Capacity Gap in Knowledge Distillation with Review Mechanism
for Crowd Counting [16.65360204274379]
This paper introduces a novel review mechanism following KD models, motivated by the review mechanism of human-beings during the study.
The effectiveness of ReviewKD is demonstrated by a set of experiments over six benchmark datasets.
We also show that the suggested review mechanism can be used as a plug-and-play module to further boost the performance of a kind of heavy crowd counting models.
arXiv Detail & Related papers (2022-06-11T09:11:42Z) - MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.