Energy-Efficient Deep Learning for Traffic Classification on Microcontrollers
- URL: http://arxiv.org/abs/2506.10851v1
- Date: Thu, 12 Jun 2025 16:10:22 GMT
- Title: Energy-Efficient Deep Learning for Traffic Classification on Microcontrollers
- Authors: Adel Chehade, Edoardo Ragusa, Paolo Gastaldo, Rodolfo Zunino
- Abstract summary: We present a practical deep learning (DL) approach for energy-efficient traffic classification on resource-limited microcontrollers. We develop a lightweight 1D-CNN, optimized via hardware-aware neural architecture search (HW-NAS), which achieves 96.59% accuracy on the ISCX VPN-NonVPN dataset. We evaluate real-world inference performance on two microcontrollers.
- Score: 1.3124513975412255
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we present a practical deep learning (DL) approach for energy-efficient traffic classification (TC) on resource-limited microcontrollers, which are widely used in IoT-based smart systems and communication networks. Our objective is to balance accuracy, computational efficiency, and real-world deployability. To that end, we develop a lightweight 1D-CNN, optimized via hardware-aware neural architecture search (HW-NAS), which achieves 96.59% accuracy on the ISCX VPN-NonVPN dataset with only 88.26K parameters, a 20.12K maximum tensor size, and 10.08M floating-point operations (FLOPs). Moreover, it generalizes across various TC tasks, with accuracies ranging from 94% to 99%. To enable deployment, the model is quantized to INT8, suffering only a marginal 1-2% accuracy drop relative to its Float32 counterpart. We evaluate real-world inference performance on two microcontrollers: the high-performance STM32F746G-DISCO and the cost-sensitive Nucleo-F401RE. The deployed model achieves inference latencies of 31.43 ms and 115.40 ms, with energy consumption of 7.86 mJ and 29.10 mJ per inference, respectively. These results demonstrate the feasibility of on-device encrypted traffic analysis, paving the way for scalable, low-power IoT security solutions.
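The abstract describes a three-step pipeline: a compact 1D-CNN selected by HW-NAS, post-training quantization to INT8, and deployment on STM32 boards. As a point of reference, the reported figures imply an average power draw of roughly 250 mW on the STM32F746G-DISCO (7.86 mJ / 31.43 ms) and about 252 mW on the Nucleo-F401RE (29.10 mJ / 115.40 ms). The sketch below is a minimal illustration of such a pipeline in TensorFlow/Keras; the layer configuration, the 1480-byte input length, and the TFLite conversion path are assumptions for illustration (the paper's exact architecture and toolchain are not given in this summary), kept well within the reported budget of 88.26K parameters and 10.08M FLOPs.

```python
# Illustrative sketch only: NOT the paper's architecture. Layer sizes and the
# 1480-byte session-input length are assumptions chosen to stay well under the
# reported budget (~88K parameters, ~10M FLOPs, ~20K max tensor size).
import numpy as np
import tensorflow as tf

def build_compact_1d_cnn(input_len=1480, num_classes=12):
    """A small 1D-CNN for session-level traffic classification (hypothetical)."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_len, 1)),
        tf.keras.layers.Conv1D(16, 7, strides=2, activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(32, 5, strides=2, activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(64, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_compact_1d_cnn()
model.summary()  # verify the parameter count against the HW-NAS budget

# Post-training INT8 quantization with TFLite, analogous to (but not necessarily
# identical with) the Float32 -> INT8 conversion step reported in the abstract.
def representative_data():
    for _ in range(100):
        # Replace with real preprocessed session byte sequences for calibration.
        yield [np.random.rand(1, 1480, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("traffic_cnn_int8.tflite", "wb") as f:
    f.write(converter.convert())
```

The resulting .tflite model could then be compiled for the target MCU with a framework such as TensorFlow Lite Micro or STM32Cube.AI; the choice of deployment toolchain here is an assumption, not something stated in the abstract.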
Related papers
- Efficient Traffic Classification using HW-NAS: Advanced Analysis and Optimization for Cybersecurity on Resource-Constrained Devices [1.3124513975412255]
This paper presents a hardware-efficient deep neural network (DNN) optimized through hardware-aware neural architecture search (HW-NAS). It supports the classification of session-level encrypted traffic on resource-constrained Internet of Things (IoT) and edge devices. The optimized model attains an accuracy of 96.59% with just 88.26K parameters, 10.08M FLOPs, and a maximum tensor size of 20.12K.
arXiv Detail & Related papers (2025-06-12T21:37:45Z)
- EfficientLLM: Efficiency in Large Language Models [64.3537131208038]
Large Language Models (LLMs) have driven significant progress, yet their growing parameter counts and context windows incur prohibitive compute, energy, and monetary costs. We introduce EfficientLLM, a novel benchmark and the first comprehensive empirical study evaluating efficiency techniques for LLMs at scale.
arXiv Detail & Related papers (2025-05-20T02:27:08Z)
- Accelerating TinyML Inference on Microcontrollers through Approximate Kernels [3.566060656925169]
In this work, we combine approximate computing and software kernel design to accelerate the inference of approximate CNN models on microcontrollers.
Our evaluation on an STM32-Nucleo board and two popular CNNs trained on the CIFAR-10 dataset shows that, compared to state-of-the-art exact inference, our solutions achieve an average latency reduction of 21%.
arXiv Detail & Related papers (2024-09-25T11:10:33Z)
- A Precision-Optimized Fixed-Point Near-Memory Digital Processing Unit for Analog In-Memory Computing [10.992736723518036]
We propose a Near-Memory digital Processing Unit (NMPU) based on fixed-point arithmetic.
It achieves competitive accuracy and higher computing throughput than previous approaches.
We validate the efficacy of the NMPU by using data from an AIMC chip and demonstrate that a simulated AIMC system with the proposed NMPU outperforms existing FP16-based implementations.
arXiv Detail & Related papers (2024-02-12T10:30:45Z)
- Efficient Deep Learning Models for Privacy-preserving People Counting on Low-resolution Infrared Arrays [11.363207467478134]
Infrared (IR) array sensors offer a low-cost, energy-efficient, and privacy-preserving solution for people counting.
Previous work has shown that Deep Learning (DL) can yield superior performance on this task.
We compare 6 different DL architectures on a novel dataset composed of IR images collected from a commercial 8x8 array.
arXiv Detail & Related papers (2023-04-12T15:29:28Z)
- Ultra-low Power Deep Learning-based Monocular Relative Localization Onboard Nano-quadrotors [64.68349896377629]
This work presents a novel autonomous end-to-end system that addresses monocular relative localization of two peer nano-drones through deep neural networks (DNNs).
To cope with the ultra-constrained nano-drone platform, we propose a vertically-integrated framework, including dataset augmentation, quantization, and system optimizations.
Experimental results show that our DNN can precisely localize a 10 cm target nano-drone at distances of up to 2 m using only low-resolution monochrome images.
arXiv Detail & Related papers (2023-03-03T14:14:08Z)
- High-Throughput, High-Performance Deep Learning-Driven Light Guide Plate Surface Visual Quality Inspection Tailored for Real-World Manufacturing Environments [75.66288398180525]
Light guide plates are essential optical components widely used in a diverse range of applications ranging from medical lighting fixtures to back-lit TV displays.
In this work, we introduce a fully-integrated, high-performance deep learning-driven workflow for light guide plate surface visual quality inspection (VQI) tailored for real-world manufacturing environments.
To enable automated VQI at the edge within the fully-integrated VQI system, a highly compact deep anti-aliased attention condenser neural network (which we name LightDefectNet) was created.
Experiments show that LightDefectNet achieves a detection accuracy
arXiv Detail & Related papers (2022-12-20T20:11:11Z)
- On the Tradeoff between Energy, Precision, and Accuracy in Federated Quantized Neural Networks [68.52621234990728]
Federated learning (FL) over wireless networks requires balancing accuracy, energy efficiency, and precision.
We propose a quantized FL framework that represents data with a finite level of precision in both local training and uplink transmission.
Our framework can reduce energy consumption by up to 53% compared to a standard FL model.
arXiv Detail & Related papers (2021-11-15T17:00:03Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
For evaluation, we compare the estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- FastFlowNet: A Lightweight Network for Fast Optical Flow Estimation [81.76975488010213]
Dense optical flow estimation plays a key role in many robotic vision tasks.
Current networks often have a large number of parameters and incur heavy computation costs.
Our proposed FastFlowNet works in the well-known coarse-to-fine manner with the following innovations.
arXiv Detail & Related papers (2021-03-08T03:09:37Z)
- Sound Event Detection with Binary Neural Networks on Tightly Power-Constrained IoT Devices [20.349809458335532]
Sound event detection (SED) is a hot topic in consumer and smart city applications.
Existing approaches based on Deep Neural Networks are very effective, but highly demanding in terms of memory, power, and throughput.
In this paper, we explore the combination of extreme quantization to a small-print binary neural network (BNN) with the highly energy-efficient, RISC-V-based (8+1)-core GAP8 microcontroller.
arXiv Detail & Related papers (2021-01-12T12:38:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.