Hardware Acceleration of Explainable Machine Learning using Tensor
Processing Units
- URL: http://arxiv.org/abs/2103.11927v1
- Date: Mon, 22 Mar 2021 15:11:45 GMT
- Title: Hardware Acceleration of Explainable Machine Learning using Tensor
Processing Units
- Authors: Zhixin Pan and Prabhat Mishra
- Abstract summary: We propose a novel framework for accelerating explainable machine learning (ML) using Tensor Processing Units (TPUs).
The proposed framework exploits the synergy between matrix convolution and Fourier transform, and takes full advantage of TPU's natural ability in accelerating matrix computations.
Our proposed approach is applicable across a wide variety of ML algorithms, and effective utilization of TPU-based acceleration can lead to real-time outcome interpretation.
- Score: 3.5027291542274357
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Machine learning (ML) is successful in achieving human-level performance in
various fields. However, it lacks the ability to explain an outcome due to its
black-box nature. While existing explainable ML is promising, almost all of
these methods focus on formulating interpretability as an optimization problem.
Such a mapping leads to numerous iterations of time-consuming complex
computations, which limits their applicability in real-time applications. In
this paper, we propose a novel framework for accelerating explainable ML using
Tensor Processing Units (TPUs). The proposed framework exploits the synergy
between matrix convolution and Fourier transform, and takes full advantage of
TPU's natural ability in accelerating matrix computations. Specifically, this
paper makes three important contributions. (1) To the best of our knowledge,
our proposed work is the first attempt in enabling hardware acceleration of
explainable ML using TPUs. (2) Our proposed approach is applicable across a
wide variety of ML algorithms, and effective utilization of TPU-based
acceleration can lead to real-time outcome interpretation. (3) Extensive
experimental results demonstrate that our proposed approach can provide an
order-of-magnitude speedup in both classification time (25x on average) and
interpretation time (13x on average) compared to state-of-the-art techniques.
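To make the claimed synergy between matrix convolution and the Fourier transform concrete: by the convolution theorem, a convolution can be evaluated as an element-wise product of Fourier spectra, which maps onto the dense transforms and matrix operations a TPU accelerates. The snippet below is a minimal illustrative sketch of that idea using jax.numpy; it is not the authors' implementation, and the function and array names are our own.

```python
# Minimal sketch (not the paper's code): a 2-D circular convolution
# computed via the convolution theorem, i.e., an element-wise product
# of FFT spectra instead of a sliding-window accumulation.
import jax.numpy as jnp

def conv2d_via_fft(x, k):
    """Circular 2-D convolution of input x with kernel k using the FFT."""
    # Zero-pad the kernel to the input's shape so both spectra have equal size.
    k_pad = jnp.zeros_like(x).at[:k.shape[0], :k.shape[1]].set(k)
    # Convolution in the spatial domain == multiplication in the frequency domain.
    return jnp.real(jnp.fft.ifft2(jnp.fft.fft2(x) * jnp.fft.fft2(k_pad)))

x = jnp.arange(16.0).reshape(4, 4)          # toy feature map
k = jnp.array([[1.0, 0.0],
               [0.0, -1.0]])                # toy 2x2 kernel
print(conv2d_via_fft(x, k))
```

On a TPU the FFTs and element-wise products execute as batched dense operations, which is the kind of workload the abstract credits for the reported speedups.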
Related papers
- Flex-TPU: A Flexible TPU with Runtime Reconfigurable Dataflow Architecture [0.0]
The work herein consists of developing a reconfigurable dataflow TPU, called the Flex-TPU, which can dynamically change the dataflow per layer during run-time.
The results show that our Flex-TPU design achieves a significant performance increase of up to 2.75x compared to a conventional TPU, with only minor area and power overheads.
arXiv Detail & Related papers (2024-07-11T17:33:38Z)
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization [0.6445087473595953]
Large language models (LLMs) demonstrate outstanding performance in various tasks in machine learning.
However, deploying LLM inference poses challenges due to the high compute and memory requirements.
We present Tender, an algorithm-hardware co-design solution that enables efficient deployment of LLM inference at low precision.
arXiv Detail & Related papers (2024-06-16T09:51:55Z)
- Hardware Acceleration of Explainable Artificial Intelligence [5.076419064097733]
We propose a simple yet efficient framework to accelerate various XAI algorithms with existing hardware accelerators.
Our proposed approach can lead to real-time outcome interpretation.
arXiv Detail & Related papers (2023-05-04T19:07:29Z)
- Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics [77.34726150561087]
Recent developments in artificial neural networks, particularly deep learning (DL), are reviewed in detail.
Both hybrid and pure machine learning (ML) methods are discussed.
History and limitations of AI are recounted and discussed, with particular attention to pointing out misstatements or misconceptions of the classics.
arXiv Detail & Related papers (2022-12-18T02:03:00Z)
- Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
- A Deep Learning Inference Scheme Based on Pipelined Matrix Multiplication Acceleration Design and Non-uniform Quantization [9.454905560571085]
We introduce a low-power Multi-layer Perceptron (MLP) accelerator based on a pipelined matrix multiplication scheme and a nonuniform quantization methodology.
Results show that our method can achieve better performance with lower power consumption.
arXiv Detail & Related papers (2021-10-10T17:31:27Z)
- A Reinforcement Learning Environment for Polyhedral Optimizations [68.8204255655161]
We propose a shape-agnostic formulation for the space of legal transformations in the polyhedral model as a Markov Decision Process (MDP).
Instead of using transformations, the formulation is based on an abstract space of possible schedules.
Our generic MDP formulation enables using reinforcement learning to learn optimization policies over a wide range of loops.
arXiv Detail & Related papers (2021-04-28T12:41:52Z)
- Machine Learning Force Fields [54.48599172620472]
Machine Learning (ML) has enabled numerous advances in computational chemistry.
One of the most promising applications is the construction of ML-based force fields (FFs).
This review gives an overview of applications of ML-FFs and the chemical insights that can be obtained from them.
arXiv Detail & Related papers (2020-10-14T13:14:14Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to efficiently learn patterns from big data with comparable performance.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
- Approximation Algorithms for Sparse Principal Component Analysis [57.5357874512594]
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and statistics.
Various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis.
We present thresholding as a provably accurate, polynomial-time approximation algorithm for the SPCA problem (see the sketch after this list).
arXiv Detail & Related papers (2020-06-23T04:25:36Z)
- IMLI: An Incremental Framework for MaxSAT-Based Learning of Interpretable Classification Rules [40.497133083839664]
We propose IMLI, an incremental MaxSAT-based framework that achieves scalable runtime performance.
IMLI achieves up to three orders of magnitude runtime improvement without loss of accuracy and interpretability.
arXiv Detail & Related papers (2020-01-07T05:03:53Z)
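The thresholding idea referenced in the Sparse Principal Component Analysis entry above can be sketched in a few lines: keep only the k largest-magnitude coordinates of the top principal direction and renormalize. The snippet below is a minimal illustration under that assumption; the names, the choice of k, and the exact thresholding rule are ours, not the paper's precise algorithm.

```python
# Illustrative sketch (not the paper's exact algorithm): a k-sparse
# approximation of the leading principal direction by magnitude thresholding.
import jax.numpy as jnp

def sparse_pc_by_thresholding(A, k):
    """Return a k-sparse, unit-norm approximation of A's top principal direction."""
    cov = A.T @ A                               # unnormalized sample covariance
    _, vecs = jnp.linalg.eigh(cov)              # eigenvalues in ascending order
    v = vecs[:, -1]                             # top principal direction
    keep = jnp.argsort(jnp.abs(v))[-k:]         # indices of the k largest-magnitude entries
    sparse_v = jnp.zeros_like(v).at[keep].set(v[keep])
    return sparse_v / jnp.linalg.norm(sparse_v) # renormalize to unit length

A = jnp.array([[2.0, 0.1, 0.0],
               [1.9, 0.0, 0.1],
               [2.1, 0.2, 0.0]])                # toy data matrix
print(sparse_pc_by_thresholding(A, k=1))
```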