ApproXAI: Energy-Efficient Hardware Acceleration of Explainable AI using Approximate Computing
- URL: http://arxiv.org/abs/2504.17929v1
- Date: Thu, 24 Apr 2025 20:40:29 GMT
- Title: ApproXAI: Energy-Efficient Hardware Acceleration of Explainable AI using Approximate Computing
- Authors: Ayesha Siddique, Khurram Khalil, Khaza Anuarul Hoque
- Abstract summary: XAIedge is a novel framework that integrates approximate computing techniques into XAI algorithms, including integrated gradients, model distillation, and Shapley analysis. XAIedge translates these algorithms into approximate matrix computations and exploits the synergy between convolution, Fourier transform, and approximate computing paradigms. Our comprehensive evaluation demonstrates that XAIedge achieves a $2\times$ improvement in energy efficiency compared to existing accurate XAI hardware acceleration techniques.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Explainable artificial intelligence (XAI) enhances AI system transparency by framing interpretability as an optimization problem. However, this approach often necessitates numerous iterations of computationally intensive operations, limiting its applicability in real-time scenarios. While recent research has focused on XAI hardware acceleration on FPGAs and TPUs, these methods do not fully address energy efficiency in real-time settings. To address this limitation, we propose XAIedge, a novel framework that integrates approximate computing techniques into XAI algorithms, including integrated gradients, model distillation, and Shapley analysis. XAIedge translates these algorithms into approximate matrix computations and exploits the synergy between convolution, Fourier transform, and approximate computing paradigms. This approach enables efficient hardware acceleration on TPU-based edge devices, facilitating faster real-time outcome interpretations. Our comprehensive evaluation demonstrates that XAIedge achieves a $2\times$ improvement in energy efficiency compared to existing accurate XAI hardware acceleration techniques while maintaining comparable accuracy. These results highlight the potential of XAIedge to significantly advance the deployment of explainable AI in energy-constrained real-time applications.
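As a rough illustration of the kind of computation involved, the sketch below approximates integrated gradients with a Riemann sum and casts the interpolated inputs and gradients to a reduced-precision dtype as a stand-in for approximate arithmetic. The gradient function and toy linear model are hypothetical placeholders, not XAIedge's actual kernels.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=32, dtype=np.float16):
    """Riemann-sum approximation of integrated gradients.

    grad_fn(x) must return the gradient of the model output w.r.t. x.
    Casting the interpolated inputs and gradients to a low-precision
    dtype stands in for the approximate arithmetic used on hardware.
    """
    alphas = np.linspace(0.0, 1.0, steps, dtype=np.float32)
    total = np.zeros_like(x, dtype=np.float32)
    for a in alphas:
        point = (baseline + a * (x - baseline)).astype(dtype)
        total += grad_fn(point).astype(np.float32)
    return (x - baseline) * total / steps

# Toy linear "model": f(x) = w.x, so the exact gradient is w everywhere.
w = np.array([0.5, -1.0, 2.0], dtype=np.float32)
ig = integrated_gradients(lambda p: w,
                          x=np.array([1.0, 2.0, 3.0], dtype=np.float32),
                          baseline=np.zeros(3, dtype=np.float32))
print(ig)  # approx. w * x = [0.5, -2.0, 6.0]
```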
Related papers
- QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge [55.75103034526652]
We propose QuartDepth, which adopts post-training quantization to quantize MDE models with hardware acceleration for ASICs. Our approach involves quantizing both weights and activations to 4-bit precision, reducing the model size and computation cost. We design a flexible and programmable hardware accelerator by supporting kernel fusion and customized instruction programmability.
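For context, a minimal sketch of generic symmetric 4-bit post-training weight quantization is shown below; QuartDepth's calibration, per-channel handling, and activation quantization are not reproduced here.

```python
import numpy as np

def quantize_int4(w):
    """Symmetric per-tensor quantization of weights to 4-bit integers.

    A generic post-training scheme, not the paper's exact recipe.
    """
    qmax = 7                      # use the symmetric int4 range [-7, 7]
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int4(w)
print(np.abs(w - dequantize(q, s)).max())  # worst-case quantization error
```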
arXiv Detail & Related papers (2025-03-20T21:03:10Z) - Efficient Implementation of LinearUCB through Algorithmic Improvements and Vector Computing Acceleration for Embedded Learning Systems [0.10470286407954035]
This paper presents algorithmic and hardware techniques to implement two LinearUCB Contextual Bandits algorithms on resource-constrained embedded devices. Results show notable improvements in execution time and energy consumption.
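For reference, a minimal textbook LinUCB (disjoint model) looks roughly like the sketch below; the paper's algorithmic and vector-computing optimizations are not reflected here.

```python
import numpy as np

class LinUCB:
    """Minimal disjoint LinUCB for a d-dimensional context (a generic
    reference implementation, not the paper's optimized embedded version)."""

    def __init__(self, d, alpha=1.0):
        self.alpha = alpha
        self.A = np.eye(d)          # ridge-regularized Gram matrix
        self.b = np.zeros(d)        # reward-weighted context sum

    def score(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

# Pick the arm whose context scores highest, then update with its reward.
bandit = LinUCB(d=3)
contexts = [np.array([1.0, 0.0, 0.5]), np.array([0.2, 1.0, 0.1])]
best = max(range(len(contexts)), key=lambda i: bandit.score(contexts[i]))
bandit.update(contexts[best], reward=1.0)
```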
arXiv Detail & Related papers (2025-01-22T13:39:44Z) - Enhancing Dropout-based Bayesian Neural Networks with Multi-Exit on FPGA [20.629635991749808]
This paper proposes an algorithm and hardware co-design framework that can generate field-programmable gate array (FPGA)-based accelerators for efficient BayesNNs.
At the algorithm level, we propose novel multi-exit dropout-based BayesNNs with reduced computational and memory overheads.
At the hardware level, this paper introduces a transformation framework that can generate FPGA-based accelerators for the proposed efficient BayesNNs.
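As background, a plain Monte-Carlo-dropout predictor, without the paper's multi-exit structure or FPGA mapping, can be sketched as follows; the toy single-layer model is an assumption made for illustration.

```python
import numpy as np

def mc_dropout_predict(forward, x, passes=20, p=0.5, rng=np.random.default_rng(0)):
    """Monte-Carlo dropout: keep dropout active at inference and average
    several stochastic passes to get a predictive mean and uncertainty.
    forward(x, mask) is assumed to apply the supplied dropout mask."""
    outs = []
    for _ in range(passes):
        mask = (rng.random(x.shape) > p) / (1.0 - p)   # inverted dropout mask
        outs.append(forward(x, mask))
    outs = np.stack(outs)
    return outs.mean(axis=0), outs.std(axis=0)

# Toy single-layer "network": dropout on the input, then a linear map.
w = np.array([[0.4, -0.2], [0.1, 0.7], [0.3, 0.3]])
mean, std = mc_dropout_predict(lambda x, m: (x * m) @ w,
                               x=np.array([1.0, 2.0, 3.0]))
print(mean, std)   # predictive mean and per-output uncertainty
```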
arXiv Detail & Related papers (2024-06-20T17:08:42Z) - Using the Abstract Computer Architecture Description Language to Model AI Hardware Accelerators [77.89070422157178]
Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements.
The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams.
In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results.
arXiv Detail & Related papers (2024-01-30T19:27:16Z) - Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z) - TPU as Cryptographic Accelerator [13.44836928672667]
Cryptographic schemes like Fully Homomorphic Encryption (FHE) and Zero-Knowledge Proofs (ZKPs) are often hindered by their computational complexity.
This paper explores the potential of leveraging TPUs/NPUs to accelerate cryptographic multiplication, thereby enhancing the performance of FHE and ZKP schemes.
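The underlying idea, multi-word multiplication expressed as a convolution over limbs, can be sketched in a few lines; the 16-bit limb width and carry handling below are illustrative choices, not the paper's design.

```python
import numpy as np

BASE = 1 << 16  # limb size

def to_limbs(n, width):
    """Split a non-negative integer into little-endian 16-bit limbs."""
    return np.array([(n >> (16 * i)) & 0xFFFF for i in range(width)], dtype=np.int64)

def limb_mul(a, b):
    """Multiply big integers held as limb vectors: the partial-product step is
    a convolution, the kind of kernel a TPU matrix unit can execute."""
    prod = np.convolve(a, b)           # partial products, no carries yet
    result, carry = [], 0
    for coeff in prod:
        carry += int(coeff)
        result.append(carry % BASE)
        carry //= BASE
    while carry:
        result.append(carry % BASE)
        carry //= BASE
    return sum(limb << (16 * i) for i, limb in enumerate(result))

x, y = 123456789123456789, 987654321987654321
assert limb_mul(to_limbs(x, 4), to_limbs(y, 4)) == x * y
```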
arXiv Detail & Related papers (2023-07-13T04:38:32Z) - GloptiNets: Scalable Non-Convex Optimization with Certificates [61.50835040805378]
We present a novel approach to non-convex optimization with certificates, which handles smooth functions on the hypercube or on the torus.
By exploiting the regularity of the target function intrinsic in the decay of its spectrum, we obtain precise certificates while leveraging the advanced and powerful computational techniques developed to optimize neural networks.
arXiv Detail & Related papers (2023-06-26T09:42:59Z) - Hardware Acceleration of Explainable Artificial Intelligence [5.076419064097733]
We propose a simple yet efficient framework to accelerate various XAI algorithms with existing hardware accelerators.
Our proposed approach can lead to real-time outcome interpretation.
arXiv Detail & Related papers (2023-05-04T19:07:29Z) - Efficient XAI Techniques: A Taxonomic Survey [40.74369038951756]
We review existing techniques of XAI acceleration into efficient non-amortized and efficient amortized methods.
We analyze the limitations of an efficient XAI pipeline from the perspectives of the training phase, the deployment phase, and the use scenarios.
arXiv Detail & Related papers (2023-02-07T03:15:38Z) - Gradient Backpropagation based Feature Attribution to Enable Explainable-AI on the Edge [1.7338677787507768]
In this work, we analyze the dataflow of gradient backpropagation based feature attribution algorithms to determine the resource overhead required over inference.
We develop a High-Level Synthesis (HLS) based FPGA design that is targeted for edge devices and supports three feature attribution algorithms.
Our design methodology demonstrates a pathway to repurpose inference accelerators to support feature attribution with minimal overhead, thereby enabling real-time XAI on the edge.
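A minimal gradient-times-input attribution for a toy linear scorer, sketched below, shows the kind of backpropagation-style computation such accelerators reuse; it is not the paper's HLS design, and the toy weights are assumptions.

```python
import numpy as np

def saliency(w, x, target):
    """Gradient-based feature attribution for a linear scorer f(x) = W x:
    the gradient of the target logit w.r.t. the input is row `target` of W,
    and gradient * input gives a per-feature relevance score."""
    grad = w[target]                # d f_target / d x
    return np.abs(grad * x)

w = np.array([[0.2, -0.5, 0.9],
              [0.7,  0.1, -0.3]])
x = np.array([1.0, 2.0, 0.5])
print(saliency(w, x, target=int(np.argmax(w @ x))))
```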
arXiv Detail & Related papers (2022-10-19T22:58:59Z) - Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with Heterogeneous Learning Tasks [53.1636151439562]
Mobile edge computing (MEC) provides a natural platform for AI applications.
We present an infrastructure to perform machine learning tasks at an MEC with the assistance of a reconfigurable intelligent surface (RIS).
Specifically, we minimize the learning error of all participating users by jointly optimizing transmit power of mobile users, beamforming vectors of the base station, and the phase-shift matrix of the RIS.
arXiv Detail & Related papers (2020-12-25T07:08:50Z) - Approximation Algorithms for Sparse Principal Component Analysis [57.5357874512594]
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and statistics.
Various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis.
We present thresholding as a provably accurate, polynomial time, approximation algorithm for the SPCA problem.
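A generic version of the thresholding idea, keeping only the largest-magnitude loadings of the leading eigenvector, can be sketched as follows; the plain eigendecomposition and per-entry magnitude threshold are assumptions, not necessarily the paper's exact procedure.

```python
import numpy as np

def thresholded_spca(cov, k):
    """Thresholding heuristic for sparse PCA: take the leading eigenvector of
    the covariance matrix, keep its k largest-magnitude loadings, renormalize."""
    _, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, -1]                      # leading principal direction
    keep = np.argsort(np.abs(v))[-k:]       # indices of the k largest loadings
    sparse_v = np.zeros_like(v)
    sparse_v[keep] = v[keep]
    return sparse_v / np.linalg.norm(sparse_v)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
cov = np.cov(X, rowvar=False)
print(thresholded_spca(cov, k=3))   # sparse principal direction with 3 nonzeros
```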
arXiv Detail & Related papers (2020-06-23T04:25:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.