Hardware Acceleration of Explainable Machine Learning using Tensor
Processing Units
- URL: http://arxiv.org/abs/2103.11927v1
- Date: Mon, 22 Mar 2021 15:11:45 GMT
- Title: Hardware Acceleration of Explainable Machine Learning using Tensor
Processing Units
- Authors: Zhixin Pan and Prabhat Mishra
- Abstract summary: We propose a novel framework for accelerating explainable machine learning (ML) using Tensor Processing Units (TPUs).
The proposed framework exploits the synergy between matrix convolution and Fourier transform, and takes full advantage of TPU's natural ability in accelerating matrix computations.
Our proposed approach is applicable across a wide variety of ML algorithms, and effective utilization of TPU-based acceleration can lead to real-time outcome interpretation.
- Score: 3.5027291542274357
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Machine learning (ML) is successful in achieving human-level performance in
various fields. However, it lacks the ability to explain an outcome due to its
black-box nature. While existing explainable ML is promising, almost all of
these methods focus on formulating interpretability as an optimization problem.
Such a mapping leads to numerous iterations of time-consuming complex
computations, which limits their applicability in real-time applications. In
this paper, we propose a novel framework for accelerating explainable ML using
Tensor Processing Units (TPUs). The proposed framework exploits the synergy
between matrix convolution and Fourier transform, and takes full advantage of
TPU's natural ability in accelerating matrix computations. Specifically, this
paper makes three important contributions. (1) To the best of our knowledge,
our proposed work is the first attempt in enabling hardware acceleration of
explainable ML using TPUs. (2) Our proposed approach is applicable across a
wide variety of ML algorithms, and effective utilization of TPU-based
acceleration can lead to real-time outcome interpretation. (3) Extensive
experimental results demonstrate that our proposed approach can provide an
order-of-magnitude speedup in both classification time (25x on average) and
interpretation time (13x on average) compared to state-of-the-art techniques.
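To make the claimed synergy between matrix convolution and the Fourier transform concrete: by the convolution theorem, a convolution can be evaluated as an element-wise product of Fourier spectra, which maps onto the dense transforms and matrix operations a TPU accelerates. The snippet below is a minimal illustrative sketch of that idea using jax.numpy; it is not the authors' implementation, and the function and array names are our own.

```python
# Minimal sketch (not the paper's code): a 2-D circular convolution
# computed via the convolution theorem, i.e., an element-wise product
# of FFT spectra instead of a sliding-window accumulation.
import jax.numpy as jnp

def conv2d_via_fft(x, k):
    """Circular 2-D convolution of input x with kernel k using the FFT."""
    # Zero-pad the kernel to the input's shape so both spectra have equal size.
    k_pad = jnp.zeros_like(x).at[:k.shape[0], :k.shape[1]].set(k)
    # Convolution in the spatial domain == multiplication in the frequency domain.
    return jnp.real(jnp.fft.ifft2(jnp.fft.fft2(x) * jnp.fft.fft2(k_pad)))

x = jnp.arange(16.0).reshape(4, 4)          # toy feature map
k = jnp.array([[1.0, 0.0],
               [0.0, -1.0]])                # toy 2x2 kernel
print(conv2d_via_fft(x, k))
```

On a TPU the FFTs and element-wise products execute as batched dense operations, which is the kind of workload the abstract credits for the reported speedups.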
Related papers
- Flex-TPU: A Flexible TPU with Runtime Reconfigurable Dataflow Architecture [0.0]
The work herein consists of developing a reconfigurable dataflow TPU, called the Flex-TPU, which can dynamically change the dataflow per layer during run-time.
The results show that our Flex-TPU design achieves a significant performance increase of up to 2.75x compared to a conventional TPU, with only minor area and power overheads.
arXiv Detail & Related papers (2024-07-11T17:33:38Z)
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization [0.6445087473595953]
Large language models (LLMs) demonstrate outstanding performance in various tasks in machine learning.
However, deploying LLM inference poses challenges due to the high compute and memory requirements.
We present Tender, an algorithm-hardware co-design solution that enables efficient deployment of LLM inference at low precision.
arXiv Detail & Related papers (2024-06-16T09:51:55Z)
- Hardware Acceleration of Explainable Artificial Intelligence [5.076419064097733]
We propose a simple yet efficient framework to accelerate various XAI algorithms with existing hardware accelerators.
Our proposed approach can lead to real-time outcome interpretation.
arXiv Detail & Related papers (2023-05-04T19:07:29Z)
- Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics [77.34726150561087]
Recent developments in artificial neural networks, particularly deep learning (DL), are reviewed in detail.
Both hybrid and pure machine learning (ML) methods are discussed.
History and limitations of AI are recounted and discussed, with particular attention to pointing out misstatements or misconceptions of the classics.
arXiv Detail & Related papers (2022-12-18T02:03:00Z)
- Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
- A Deep Learning Inference Scheme Based on Pipelined Matrix Multiplication Acceleration Design and Non-uniform Quantization [9.454905560571085]
We introduce a low-power Multi-layer Perceptron (MLP) accelerator based on a pipelined matrix multiplication scheme and a nonuniform quantization methodology.
Results show that our method can achieve better performance with lower power consumption.
arXiv Detail & Related papers (2021-10-10T17:31:27Z)
- A Reinforcement Learning Environment for Polyhedral Optimizations [68.8204255655161]
We propose a shape-agnostic formulation for the space of legal transformations in the polyhedral model as a Markov Decision Process (MDP).
Instead of using transformations, the formulation is based on an abstract space of possible schedules.
Our generic MDP formulation enables using reinforcement learning to learn optimization policies over a wide range of loops.
arXiv Detail & Related papers (2021-04-28T12:41:52Z)
- Machine Learning Force Fields [54.48599172620472]
Machine Learning (ML) has enabled numerous advances in computational chemistry.
One of the most promising applications is the construction of ML-based force fields (FFs).
This review gives an overview of applications of ML-FFs and the chemical insights that can be obtained from them.
arXiv Detail & Related papers (2020-10-14T13:14:14Z)
- A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to efficiently learn patterns from big data with comparable performance.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)
- Approximation Algorithms for Sparse Principal Component Analysis [57.5357874512594]
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and statistics.
Various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis.
We present thresholding as a provably accurate, polynomial-time approximation algorithm for the SPCA problem (see the sketch after this list).
arXiv Detail & Related papers (2020-06-23T04:25:36Z)
- IMLI: An Incremental Framework for MaxSAT-Based Learning of Interpretable Classification Rules [40.497133083839664]
We propose IMLI, an incremental MaxSAT-based framework that achieves scalable runtime performance.
IMLI achieves up to three orders of magnitude runtime improvement without loss of accuracy and interpretability.
arXiv Detail & Related papers (2020-01-07T05:03:53Z)
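The thresholding idea referenced in the Sparse Principal Component Analysis entry above can be sketched in a few lines: keep only the k largest-magnitude coordinates of the top principal direction and renormalize. The snippet below is a minimal illustration under that assumption; the names, the choice of k, and the exact thresholding rule are ours, not the paper's precise algorithm.

```python
# Illustrative sketch (not the paper's exact algorithm): a k-sparse
# approximation of the leading principal direction by magnitude thresholding.
import jax.numpy as jnp

def sparse_pc_by_thresholding(A, k):
    """Return a k-sparse, unit-norm approximation of A's top principal direction."""
    cov = A.T @ A                               # unnormalized sample covariance
    _, vecs = jnp.linalg.eigh(cov)              # eigenvalues in ascending order
    v = vecs[:, -1]                             # top principal direction
    keep = jnp.argsort(jnp.abs(v))[-k:]         # indices of the k largest-magnitude entries
    sparse_v = jnp.zeros_like(v).at[keep].set(v[keep])
    return sparse_v / jnp.linalg.norm(sparse_v) # renormalize to unit length

A = jnp.array([[2.0, 0.1, 0.0],
               [1.9, 0.0, 0.1],
               [2.1, 0.2, 0.0]])                # toy data matrix
print(sparse_pc_by_thresholding(A, k=1))
```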