Hardware Acceleration of Explainable Artificial Intelligence
- URL: http://arxiv.org/abs/2305.04887v1
- Date: Thu, 4 May 2023 19:07:29 GMT
- Title: Hardware Acceleration of Explainable Artificial Intelligence
- Authors: Zhixin Pan and Prabhat Mishra
- Abstract summary: We propose a simple yet efficient framework to accelerate various XAI algorithms with existing hardware accelerators.
Our proposed approach can lead to real-time outcome interpretation.
- Score: 5.076419064097733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) is successful in achieving human-level artificial intelligence in various fields. However, it lacks the ability to explain an outcome due to its black-box nature. While recent efforts on explainable AI (XAI) have received significant attention, most existing solutions are not applicable in real-time systems because they cast interpretability as an optimization problem, which leads to numerous iterations of time-consuming, complex computations. Although hardware-based acceleration frameworks for XAI exist, they are implemented on FPGAs and designed for specific tasks, leading to high cost and limited flexibility. In this paper, we propose a simple yet efficient framework to accelerate various XAI algorithms with existing hardware accelerators. Specifically, this paper makes three important contributions. (1) The proposed method is the first attempt to explore the effectiveness of the Tensor Processing Unit (TPU) for accelerating XAI. (2) Our solution exploits the close relationship between several existing XAI algorithms and matrix computations, as well as the synergy between convolution and the Fourier transform, taking full advantage of the TPU's inherent ability to accelerate matrix computations. (3) Our approach enables real-time outcome interpretation. Extensive experimental evaluation demonstrates that the proposed approach deployed on a TPU provides drastic improvements in interpretation time (39x on average) and energy efficiency (69x on average) compared to existing acceleration techniques.
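As a concrete illustration of the convolution and Fourier transform synergy the abstract refers to, the following NumPy sketch (an illustration only, not the authors' TPU implementation) checks that circular convolution equals the inverse FFT of the elementwise product of FFTs, and that the DFT itself can be written as a dense matrix-vector product, the form a matrix engine such as a TPU accelerates natively.

```python
import numpy as np

# Minimal sketch of the convolution-Fourier identity the abstract exploits:
# circular convolution in the signal domain equals elementwise multiplication
# in the frequency domain, and the DFT can be phrased as a matrix product,
# the kind of workload a systolic matrix engine accelerates well.

rng = np.random.default_rng(0)
n = 256
x = rng.standard_normal(n)   # e.g., a flattened feature map
k = rng.standard_normal(n)   # e.g., a convolution kernel, zero-padded to length n

# Direct circular convolution: O(n^2) multiply-accumulates.
direct = np.array([sum(x[j] * k[(i - j) % n] for j in range(n)) for i in range(n)])

# FFT route: O(n log n).
via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)).real

# The DFT written explicitly as a dense matrix, i.e., the direct mapping
# onto a matrix-multiply accelerator.
dft_matrix = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n)
via_matmul = (np.conj(dft_matrix) @ ((dft_matrix @ x) * (dft_matrix @ k)) / n).real

assert np.allclose(direct, via_fft) and np.allclose(direct, via_matmul)
```

Once an XAI computation is phrased as structured matrix arithmetic in this way, it maps directly onto the TPU's matrix-multiply units, which is the property contribution (2) above exploits.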
Related papers
- Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI.
As the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios.
This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z) - Using the Abstract Computer Architecture Description Language to Model AI Hardware Accelerators [77.89070422157178]
Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements.
The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams.
In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results.
arXiv Detail & Related papers (2024-01-30T19:27:16Z) - Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances, utilizing generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z) - TPU as Cryptographic Accelerator [13.44836928672667]
Cryptographic schemes like Fully Homomorphic Encryption (FHE) and Zero-Knowledge Proofs (ZKPs) are often hindered by their computational complexity.
This paper explores the potential of leveraging TPUs/NPUs to accelerate cryptographic multiplication, thereby enhancing the performance of FHE and ZKP schemes.
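To make the link to matrix hardware concrete, here is a hedged Python sketch of the general idea rather than the paper's pipeline: polynomial multiplication, the core primitive behind many FHE and ZKP constructions, can be expressed as a Toeplitz matrix-vector product, which is the kind of operation a TPU/NPU matrix unit executes natively. The helper name `poly_mul_matrix` and the toy negacyclic reduction are assumptions for illustration only.

```python
import numpy as np

# Hypothetical sketch (not the paper's pipeline): polynomial multiplication
# written as a matrix-vector product so a matrix engine could execute it.

def poly_mul_matrix(a: np.ndarray, m: int) -> np.ndarray:
    """Build the (len(a)+m-1) x m Toeplitz matrix T with T @ b == np.convolve(a, b)."""
    n = len(a)
    T = np.zeros((n + m - 1, m), dtype=np.int64)
    for col in range(m):
        T[col:col + n, col] = a
    return T

a = np.array([3, 1, 4, 1, 5], dtype=np.int64)   # coefficients of polynomial A
b = np.array([2, 7, 1, 8], dtype=np.int64)      # coefficients of polynomial B

product = poly_mul_matrix(a, len(b)) @ b        # one matrix-vector product
assert np.array_equal(product, np.convolve(a, b))

# Ring-based schemes typically reduce the result mod (x^N + 1) and a
# coefficient modulus q; a toy reduction for illustration only:
N, q = 4, 97
reduced = np.zeros(N, dtype=np.int64)
for i, c in enumerate(product):
    reduced[i % N] += c if (i // N) % 2 == 0 else -c   # uses x^N = -1
reduced %= q
```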
arXiv Detail & Related papers (2023-07-13T04:38:32Z) - Efficient XAI Techniques: A Taxonomic Survey [40.74369038951756]
We categorize existing XAI acceleration techniques into efficient non-amortized and efficient amortized methods.
We analyze the limitations of an efficient XAI pipeline from the perspectives of the training phase, the deployment phase, and the use scenarios.
arXiv Detail & Related papers (2023-02-07T03:15:38Z) - Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics [77.34726150561087]
Recent developments in artificial neural networks, particularly deep learning (DL), are reviewed in detail.
Both hybrid and pure machine learning (ML) methods are discussed.
The history and limitations of AI are recounted and discussed, with particular attention to pointing out misstatements and misconceptions about the classics.
arXiv Detail & Related papers (2022-12-18T02:03:00Z) - Gradient Backpropagation based Feature Attribution to Enable Explainable-AI on the Edge [1.7338677787507768]
In this work, we analyze the dataflow of gradient backpropagation based feature attribution algorithms to determine the resource overhead required over inference.
We develop a High-Level Synthesis (HLS) based FPGA design that is targeted for edge devices and supports three feature attribution algorithms.
Our design methodology demonstrates a pathway to repurpose inference accelerators to support feature attribution with minimal overhead, thereby enabling real-time XAI on the edge.
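As a minimal sketch of the underlying dataflow (a standard gradient-times-input attribution on a toy two-layer network, not the paper's HLS design), the backward pass below reuses the same matrix products as inference, which is why an inference accelerator can be repurposed for feature attribution with modest overhead:

```python
import numpy as np

# Illustrative sketch: gradient-based feature attribution is the inference
# dataflow plus one backward pass built from the same matrix operations.

rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((8, 4)), np.zeros(8)   # toy 2-layer MLP
W2, b2 = rng.standard_normal((3, 8)), np.zeros(3)
x = rng.standard_normal(4)                          # one input sample

# Forward pass (inference).
z1 = W1 @ x + b1
h = np.maximum(z1, 0.0)           # ReLU
logits = W2 @ h + b2
target = int(np.argmax(logits))   # explain the predicted class

# Backward pass: gradient of the target logit with respect to the input.
dlogits = np.zeros_like(logits)
dlogits[target] = 1.0
dh = W2.T @ dlogits
dz1 = dh * (z1 > 0)               # ReLU gradient
dx = W1.T @ dz1

# "Gradient x input" attribution: per-feature contribution scores.
attribution = dx * x
print(attribution)
```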
arXiv Detail & Related papers (2022-10-19T22:58:59Z) - Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
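One plausible reading of ALR, sketched below with hypothetical names and under the assumption that "length" refers to the L2 norm (the summary above does not spell out the exact rule): rescale each predicted novel-class weight vector to the average norm of the pretrained base-class weights.

```python
import numpy as np

# Hedged sketch of one reading of "adaptive length re-scaling"; the function
# name and the exact rule are assumptions, not the paper's implementation.

def adaptive_length_rescale(novel_w: np.ndarray, base_w: np.ndarray) -> np.ndarray:
    """novel_w: (num_novel, dim) predicted weights; base_w: (num_base, dim) pretrained weights."""
    target_norm = np.linalg.norm(base_w, axis=1).mean()
    novel_norms = np.linalg.norm(novel_w, axis=1, keepdims=True)
    return novel_w * (target_norm / np.clip(novel_norms, 1e-12, None))

rng = np.random.default_rng(2)
base = rng.standard_normal((60, 128))            # base-class classifier weights
novel = 0.1 * rng.standard_normal((5, 128))      # predicted novel-class weights (smaller norms)
rescaled = adaptive_length_rescale(novel, base)
print(np.linalg.norm(rescaled, axis=1))          # now close to the base-class average norm
```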
arXiv Detail & Related papers (2022-03-23T06:24:31Z) - Hardware Acceleration of Explainable Machine Learning using Tensor Processing Units [3.5027291542274357]
We propose a novel framework for accelerating explainable machine learning (ML) using Tensor Processing Units (TPUs).
The proposed framework exploits the synergy between matrix convolution and the Fourier transform, and takes full advantage of the TPU's natural ability to accelerate matrix computations.
Our proposed approach is applicable across a wide variety of ML algorithms, and effective utilization of TPU-based acceleration can lead to real-time outcome interpretation.
arXiv Detail & Related papers (2021-03-22T15:11:45Z) - 3-Regular 3-XORSAT Planted Solutions Benchmark of Classical and Quantum Heuristic Optimizers [0.30586855806896046]
Special-purpose hardware has emerged as an option to tackle specific computing-intensive challenges.
These platforms come in many different flavors, from highly efficient implementations in digital logic to proposals for analog hardware implementing new algorithms.
In this work, we benchmark several of these different approaches using a mapping of a specific class of linear equations whose solutions can be found efficiently.
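For context, a short Python sketch (illustrative only, not part of the benchmark suite) of why planted 3-XORSAT instances belong to a class of linear equations whose solutions can be found efficiently: each clause is a linear equation over GF(2), so Gaussian elimination recovers a satisfying assignment in polynomial time, even though heuristic optimizers can struggle on the same instances.

```python
import numpy as np

# Each 3-XORSAT clause x_i XOR x_j XOR x_k = b is a linear equation over
# GF(2), so a planted instance is solvable by Gaussian elimination.

def gf2_solve(A: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Solve A x = b over GF(2) by Gauss-Jordan elimination; assumes a solution exists."""
    A, b = A.copy() % 2, b.copy() % 2
    n_rows, n_cols = A.shape
    x = np.zeros(n_cols, dtype=np.int64)
    pivots, row = [], 0
    for col in range(n_cols):
        pivot = next((r for r in range(row, n_rows) if A[r, col]), None)
        if pivot is None:
            continue
        A[[row, pivot]], b[[row, pivot]] = A[[pivot, row]], b[[pivot, row]]
        for r in range(n_rows):
            if r != row and A[r, col]:
                A[r] ^= A[row]
                b[r] ^= b[row]
        pivots.append((row, col))
        row += 1
    for r, c in pivots:   # free variables stay at 0
        x[c] = b[r]
    return x

# Tiny planted instance: pick a hidden assignment, build 3-variable XOR
# clauses consistent with it, then recover a satisfying assignment.
rng = np.random.default_rng(3)
n_vars, n_clauses = 12, 12
planted = rng.integers(0, 2, n_vars)
A = np.zeros((n_clauses, n_vars), dtype=np.int64)
for m in range(n_clauses):
    A[m, rng.choice(n_vars, size=3, replace=False)] = 1
b = (A @ planted) % 2
solution = gf2_solve(A, b)
assert np.array_equal((A @ solution) % 2, b)
```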
arXiv Detail & Related papers (2021-03-15T15:40:00Z) - Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with Heterogeneous Learning Tasks [53.1636151439562]
Mobile edge computing (MEC) provides a natural platform for AI applications.
We present an infrastructure to perform machine learning tasks at an MEC with the assistance of a reconfigurable intelligent surface (RIS).
Specifically, we minimize the learning error of all participating users by jointly optimizing transmit power of mobile users, beamforming vectors of the base station, and the phase-shift matrix of the RIS.
arXiv Detail & Related papers (2020-12-25T07:08:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.