PlatformX: An End-to-End Transferable Platform for Energy-Efficient Neural Architecture Search
- URL: http://arxiv.org/abs/2510.08993v1
- Date: Fri, 10 Oct 2025 04:22:14 GMT
- Title: PlatformX: An End-to-End Transferable Platform for Energy-Efficient Neural Architecture Search
- Authors: Xiaolong Tu, Dawei Chen, Kyungtae Han, Onur Altintas, Haoxin Wang
- Abstract summary: Hardware-Aware Neural Architecture Search (HW-NAS) has emerged as a powerful tool for designing efficient deep neural networks (DNNs) tailored to edge devices. We present PlatformX, a fully automated and transferable HW-NAS framework designed to overcome the limitations of existing methods.
- Score: 10.727973227148114
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hardware-Aware Neural Architecture Search (HW-NAS) has emerged as a powerful tool for designing efficient deep neural networks (DNNs) tailored to edge devices. However, existing methods remain largely impractical for real-world deployment due to their high time cost, extensive manual profiling, and poor scalability across diverse hardware platforms with complex, device-specific energy behavior. In this paper, we present PlatformX, a fully automated and transferable HW-NAS framework designed to overcome these limitations. PlatformX integrates four key components: (i) an energy-driven search space that expands conventional NAS design by incorporating energy-critical configurations, enabling exploration of high-efficiency architectures; (ii) a kernel-level energy predictor that is transferable across devices and incrementally refined with minimal on-device samples; (iii) a Pareto-based multi-objective search algorithm that balances energy and accuracy to identify optimal trade-offs; and (iv) a high-resolution runtime energy profiling system that automates on-device power measurement using external monitors without human intervention. We evaluate PlatformX across multiple mobile platforms, showing that it significantly reduces search overhead while preserving accuracy and energy fidelity. It identifies models with up to 0.94 accuracy or as little as 0.16 mJ per inference, both outperforming MobileNet-V2 in accuracy and efficiency. Code and tutorials are available at github.com/amai-gsu/PlatformX.
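As a rough illustration of the Pareto-based selection named in component (iii), the Python sketch below keeps only the candidates that no other candidate beats on both accuracy and energy. It is a generic non-dominated filter, not PlatformX's actual search algorithm. The `Candidate` type, the `dominates` and `pareto_front` helpers, and all names and numbers in `CANDIDATES` are hypothetical and are not results from the paper.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical architecture with its two objectives."""
    name: str
    accuracy: float   # higher is better
    energy_mj: float  # energy per inference in mJ, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is at least as good as `b` on both objectives
    and strictly better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.energy_mj <= b.energy_mj
    strictly_better = a.accuracy > b.accuracy or a.energy_mj < b.energy_mj
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up values for illustration only.
CANDIDATES = [
    Candidate("A", accuracy=0.90, energy_mj=0.50),
    Candidate("B", accuracy=0.85, energy_mj=0.30),
    Candidate("C", accuracy=0.80, energy_mj=0.40),  # dominated by B
    Candidate("D", accuracy=0.92, energy_mj=0.90),
    Candidate("E", accuracy=0.90, energy_mj=0.60),  # dominated by A
]

if __name__ == "__main__":
    for c in pareto_front(CANDIDATES):
        print(c.name, c.accuracy, c.energy_mj)
```

The surviving candidates form the trade-off curve: each one can only gain accuracy by spending more energy, or save energy by giving up accuracy. This pairwise filter is quadratic in the number of candidates, which is fine for a small illustration but may need a sort-based variant for large search populations.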
Related papers
- SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices [72.0937240883345]
Recent advances in diffusion transformers (DiTs) have set new standards in image generation, yet remain impractical for on-device deployment. We present an efficient DiT framework tailored for mobile and edge devices that achieves transformer-level generation quality under strict resource constraints.
arXiv Detail & Related papers (2026-01-13T07:46:46Z)
- Lightweight Transformer Architectures for Edge Devices in Real-Time Applications [0.0]
This survey examines lightweight transformer architectures specifically designed for edge deployment. We systematically review prominent lightweight variants including MobileBERT, TinyBERT, DistilBERT, EfficientFormer, EdgeFormer, and MobileViT. Experimental results demonstrate that modern lightweight transformers can achieve 75-96% of full-model accuracy while reducing model size by 4-10x and inference latency by 3-9x.
arXiv Detail & Related papers (2026-01-05T01:04:25Z)
- Hardware-Aware Feature Extraction Quantisation for Real-Time Visual Odometry on FPGA Platforms [0.0]
We propose an embedded implementation of an unsupervised architecture capable of detecting and describing feature points. We implemented the solution on an FPGA System-on-Chip (SoC) platform, specifically the AMD/Xilinx Zynq UltraScale+. This allowed us to process 640 x 480 pixel images at up to 54 fps, outperforming state-of-the-art solutions in the field.
arXiv Detail & Related papers (2025-07-10T16:37:20Z)
- ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge [17.74343318260183]
ECORE is a framework that integrates multiple dynamic routing strategies. ECORE balances energy efficiency and detection performance based on object characteristics. Results demonstrate that our proposed context-aware routing strategies can reduce energy consumption and latency by 35% and 49%, respectively.
arXiv Detail & Related papers (2025-07-08T14:16:14Z)
- Benchmarking Energy and Latency in TinyML: A Novel Method for Resource-Constrained AI [0.0]
This work introduces an alternative benchmarking methodology that integrates energy and latency measurements. To evaluate our setup, we tested the STM32N6 MCU, which includes an NPU for executing neural networks. Our findings demonstrate that reducing the core voltage and clock frequency improves the efficiency of pre- and post-processing.
arXiv Detail & Related papers (2025-05-21T15:12:14Z)
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated superior accuracy for computer vision tasks compared to convolutional neural networks (CNNs).
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
arXiv Detail & Related papers (2024-07-25T16:35:46Z)
- Multi-objective Differentiable Neural Architecture Search [58.67218773054753]
We propose a novel NAS algorithm that encodes user preferences to trade off performance and hardware metrics. Our method outperforms existing MOO NAS methods across a broad range of qualitatively different search spaces and datasets.
arXiv Detail & Related papers (2024-02-28T10:09:04Z)
- Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, the random resistive memory-based deep extreme point learning machine (DEPLM).
Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
- Fast GraspNeXt: A Fast Self-Attention Neural Network Architecture for Multi-task Learning in Computer Vision Tasks for Robotic Grasping on the Edge [80.88063189896718]
High architectural and computational complexity can result in poor suitability for deployment on embedded devices.
Fast GraspNeXt is a fast self-attention neural network architecture tailored for embedded multi-task learning in computer vision tasks for robotic grasping.
arXiv Detail & Related papers (2023-04-21T18:07:14Z)
- Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs [10.680700357879601]
Neural architecture search (NAS) comes to the rescue for efficiently utilizing the high compute throughput offered by on-device ML accelerators.
Existing NAS frameworks have several practical limitations in scaling to multiple tasks and different target platforms.
We provide a two-pronged approach to this challenge: (i) a neural architecture that decouples model cost evaluation, search space design, and the algorithm to rapidly target various on-device ML tasks, and (ii) search spaces crafted from group convolution based inverted bottleneck (IBN) variants.
arXiv Detail & Related papers (2022-04-09T00:35:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.