Related papers: HAWX: A Hardware-Aware FrameWork for Fast and Scalable ApproXimation of DNNs

URL: http://arxiv.org/abs/2602.16336v2
Date: Sat, 21 Feb 2026 09:39:22 GMT
Title: HAWX: A Hardware-Aware FrameWork for Fast and Scalable ApproXimation of DNNs
Authors: Samira Nazari, Mohammad Saeed Almasi, Mahdi Taheri, Ali Azarpeyvand, Ali Mokhtari, Ali Mahani, Christian Herglotz,
Abstract summary: This work presents HAWX, a hardware-aware scalable exploration framework that employs multi-level sensitivity scoring to guide selective integration of AxC blocks. Supported by predictive models for accuracy, power, and area, HAWX accelerates the evaluation of candidate configurations. Experiments across state-of-the-art DNN benchmarks such as VGG-11, ResNet-18, and EfficientNetLite demonstrate that the efficiency benefits of HAWX scale exponentially with network size.
Score: 2.0919087464519275
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This work presents HAWX, a hardware-aware scalable exploration framework that employs multi-level sensitivity scoring at different DNN abstraction levels (operator, filter, layer, and model) to guide selective integration of heterogeneous AxC blocks. Supported by predictive models for accuracy, power, and area, HAWX accelerates the evaluation of candidate configurations, achieving over 23× speedup in a layer-level search with two candidate approximate blocks and more than (3×10^6)× speedup at the filter-level search only for LeNet-5, while maintaining accuracy comparable to exhaustive search. Experiments across state-of-the-art DNN benchmarks such as VGG-11, ResNet-18, and EfficientNetLite demonstrate that the efficiency benefits of HAWX scale exponentially with network size. The HAWX hardware-aware search algorithm supports both spatial and temporal accelerator architectures, leveraging either off-the-shelf approximate components or customized designs.
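
As a rough illustration of why exhaustive exploration becomes impractical as networks grow, consider a simplified counting model. This is a sketch under stated assumptions, not the HAWX algorithm: it assumes each of n units (layers or filters) can independently be assigned one of k candidate blocks, which gives k^n configurations. The function name and the unit counts below are hypothetical and are not taken from the paper.

```python
def num_configurations(num_units: int, num_choices: int) -> int:
    """Count assignments when each unit independently picks one of num_choices blocks."""
    if num_units < 0 or num_choices < 1:
        raise ValueError("num_units must be >= 0 and num_choices must be >= 1")
    return num_choices ** num_units


# Hypothetical unit counts, chosen only to show the growth rate
# (they are not layer or filter counts reported for any benchmark).
for n in (5, 10, 20, 40):
    print(f"{n} units, 2 choices each: {num_configurations(n, 2)} configurations")
```

Under this model, doubling the number of units squares the number of configurations, which is why a search that avoids evaluating every candidate can gain more as the network gets larger.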

Related papers

Ev-Edge: Efficient Execution of Event-based Vision Algorithms on Commodity Edge Platforms [10.104371980353973]
Ev-Edge is a framework that contains three key optimizations to boost the performance of event-based vision systems on edge platforms. On several state-of-the-art networks for a range of autonomous navigation tasks, Ev-Edge achieves 1.28x-2.05x improvements in latency and 1.23x-2.15x in energy.
arXiv Detail & Related papers (2024-03-23T04:44:55Z)
Flexible Channel Dimensions for Differentiable Architecture Search [50.33956216274694]
We propose a novel differentiable neural architecture search method with an efficient dynamic channel allocation algorithm. We show that the proposed framework is able to find DNN architectures that are equivalent to previous methods in task accuracy and inference latency.
arXiv Detail & Related papers (2023-06-13T15:21:38Z)
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search [73.05693037548932]
The X3D work presents a new family of efficient video models by expanding a hand-crafted image architecture along multiple axes. A probabilistic neural architecture search method is adopted to efficiently search in such a large space. Evaluations on Kinetics and Something-Something-V2 benchmarks confirm that our AutoX3D models outperform existing ones in accuracy by up to 1.3% under similar FLOPs.
arXiv Detail & Related papers (2021-12-09T05:40:33Z)
ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware. The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation. We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
HAO: Hardware-aware neural Architecture Optimization for Efficient Inference [25.265181492143107]
We develop an integer programming algorithm to prune the design space of a neural network search algorithm. Our algorithm achieves 72.5% top-1 accuracy on ImageNet at framerate 50, which is 60% faster than MnasNet and 135% faster than FBNet with comparable accuracy.
arXiv Detail & Related papers (2021-04-26T17:59:29Z)
Searching for Fast Model Families on Datacenter Accelerators [33.28421782921072]
We search for fast and accurate CNN model families for efficient inference on DC accelerators. We propose a latency-aware compound scaling (LACS) method optimizing both accuracy and latency. Our LACS discovers that network depth should grow much faster than image size and network width.
arXiv Detail & Related papers (2021-02-10T18:15:40Z)
UXNet: Searching Multi-level Feature Aggregation for 3D Medical Image Segmentation [34.8581851257193]
This paper proposes a novel NAS method for 3D medical image segmentation, named UXNet. UXNet searches both the scale-wise feature aggregation strategies as well as the block-wise operators in the encoder-decoder network. The architecture discovered by UXNet outperforms existing state-of-the-art models in terms of Dice on several public 3D medical image segmentation benchmarks.
arXiv Detail & Related papers (2020-09-16T06:50:57Z)
Binary DAD-Net: Binarized Driveable Area Detection Network for Autonomous Driving [94.40107679615618]
This paper proposes a novel binarized driveable area detection network (binary DAD-Net). It uses only binary weights and activations in the encoder, the bottleneck, and the decoder part. It outperforms state-of-the-art semantic segmentation networks on public datasets.
arXiv Detail & Related papers (2020-06-15T07:09:01Z)
Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes. The proposed method achieves the accuracy of 73.6% and 68.0% mean Intersection over Union (mIoU) with the inference speed of 51.0 fps and 39.3 fps.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space. With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.