Non-Uniform Interpolation in Integrated Gradients for Low-Latency
Explainable-AI
- URL: http://arxiv.org/abs/2302.11107v1
- Date: Wed, 22 Feb 2023 03:03:28 GMT
- Title: Non-Uniform Interpolation in Integrated Gradients for Low-Latency
Explainable-AI
- Authors: Ashwin Bhat, Arijit Raychowdhury
- Abstract summary: Integrated Gradients (IG) is a popular XAI algorithm that attributes relevance scores to input features.
There is a significant computational overhead to generate the explanation which hinders real-time XAI.
We propose a novel non-uniform interpolation scheme to compute the IG attribution scores, replacing the baseline uniform interpolation.
- Score: 2.048335092363435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There has been a surge in Explainable-AI (XAI) methods that provide insights
into the workings of Deep Neural Network (DNN) models. Integrated Gradients
(IG) is a popular XAI algorithm that attributes relevance scores to input
features commensurate with their contribution to the model's output. However,
it requires multiple forward & backward passes through the model. Thus,
compared to a single forward-pass inference, there is a significant
computational overhead to generate the explanation, which hinders real-time XAI.
This work addresses the aforementioned issue by accelerating IG with a
hardware-aware algorithm optimization. We propose a novel non-uniform
interpolation scheme to compute the IG attribution scores which replaces the
baseline uniform interpolation. Our algorithm significantly reduces the total
number of interpolation steps required without adversely impacting convergence.
Experiments on the ImageNet dataset using a pre-trained InceptionV3 model
demonstrate a 2.6-3.6× performance speedup on GPU systems for iso-convergence.
This includes the minimal 0.2-3.2% latency overhead introduced by the
pre-processing stage of computing the non-uniform interpolation step-sizes.
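To make the scheme concrete, here is a minimal sketch of IG with a non-uniform interpolation schedule. The toy model, the squared-spacing schedule, and all names are illustrative assumptions, not the authors' implementation; their step-sizes come from a dedicated pre-processing stage.

```python
import numpy as np

def model(x):
    # Toy differentiable scalar-output model standing in for a DNN.
    return np.tanh(x).sum()

def grad_model(x):
    # Analytic gradient of the toy model above.
    return 1.0 - np.tanh(x) ** 2

def integrated_gradients(x, baseline, alphas):
    """Riemann-sum approximation of IG along the straight-line path
    from `baseline` to `x`, sampled at interpolation points `alphas`.
    Uniform IG uses equally spaced alphas; a non-uniform schedule
    passes unevenly spaced alphas, and each gradient is weighted by
    the width of its sub-interval so the path integral is preserved."""
    alphas = np.sort(np.asarray(alphas))
    edges = np.concatenate([[0.0], (alphas[:-1] + alphas[1:]) / 2, [1.0]])
    weights = np.diff(edges)  # quadrature weights for non-uniform steps
    total = np.zeros_like(x)
    for a, w in zip(alphas, weights):
        total += w * grad_model(baseline + a * (x - baseline))
    return (x - baseline) * total

x = np.array([0.5, -1.2, 2.0])
baseline = np.zeros_like(x)

# Baseline: uniform schedule with many steps.
uniform = integrated_gradients(x, baseline, np.linspace(0.02, 0.98, 25))

# Hypothetical non-uniform schedule with fewer steps, clustered near
# alpha = 0 where gradients of saturating models change fastest (the
# paper's actual schedule may differ; the speedup comes from needing
# fewer interpolation steps at iso-convergence).
nonuniform = integrated_gradients(x, baseline, np.linspace(0.02, 0.98, 10) ** 2)
print(uniform, nonuniform)
```

The only change from standard IG is the per-step quadrature weight, which is what lets an uneven schedule keep the attribution estimate consistent while using fewer forward/backward passes.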
Related papers
- Limited Memory Online Gradient Descent for Kernelized Pairwise Learning
with Dynamic Averaging [18.843097436906618]
We introduce a lightweight online gradient descent (OGD) algorithm that does not require the independence of examples and generalizes to kernelized pairwise learning.
Our algorithm builds the gradient from a random example and a moving average representing the past data, which yields a sub-linear regret bound with a complexity of O(T).
Several experiments with real-world datasets show that the proposed technique outperforms kernelized and linear online gradient descent in both offline and online scenarios.
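As a rough illustration of the moving-average idea, here is a hypothetical linear-scorer sketch; the paper works in kernel space and its exact update rule differs.

```python
import numpy as np

def pairwise_grad(w, x_t, y_t, x_avg, y_avg):
    # Illustrative squared-loss pairwise gradient for a linear scorer
    # f(x) = w @ x; a stand-in for the paper's kernelized losses.
    margin = (y_t - y_avg) - w @ (x_t - x_avg)
    return -2.0 * margin * (x_t - x_avg)

def limited_memory_ogd(stream, dim, lr=0.1, beta=0.9):
    """OGD where each update pairs the current example with a moving
    average of past examples instead of a stored buffer of pairs,
    keeping the overall complexity linear in the stream length."""
    w = np.zeros(dim)
    x_avg, y_avg = np.zeros(dim), 0.0
    for x_t, y_t in stream:
        w -= lr * pairwise_grad(w, x_t, y_t, x_avg, y_avg)
        # Exponential moving average stands in for "the past data".
        x_avg = beta * x_avg + (1 - beta) * x_t
        y_avg = beta * y_avg + (1 - beta) * y_t
    return w

rng = np.random.default_rng(0)
stream = [(rng.normal(size=3), rng.normal()) for _ in range(100)]
w = limited_memory_ogd(stream, dim=3)
```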
arXiv Detail & Related papers (2024-02-02T05:21:50Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
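A minimal sketch of the stabilization idea, with a toy nonmonotone objective standing in for a network loss; the paper's analysis covers general optimizers and minimax training.

```python
import numpy as np

def inner_step(w, grad_fn, lr=0.2):
    # One plain gradient step; in minimax training this would be a
    # full inner optimizer pass over both players.
    return w - lr * grad_fn(w)

def linear_interpolation_training(w, grad_fn, steps=200, lam=0.5):
    """Move only a fraction `lam` of the way toward the inner
    optimizer's proposal. This averaging makes the update map behave
    more like a nonexpansive operator, which is the stabilizing
    mechanism the summary refers to."""
    for _ in range(steps):
        proposal = inner_step(w, grad_fn)
        w = w + lam * (proposal - w)
    return w

# Toy nonconvex gradient standing in for a neural network loss.
grad_fn = lambda w: np.cos(w) + 0.5 * w
w_star = linear_interpolation_training(np.array([3.0]), grad_fn)
```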
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Error-Aware Spatial Ensembles for Video Frame Interpolation [50.63021118973639]
Video frame interpolation (VFI) algorithms have improved considerably in recent years due to unprecedented progress in both data-driven algorithms and their implementations.
Recent research has introduced advanced motion estimation or novel warping methods as the means to address challenging VFI scenarios.
This work introduces one such solution. By closely examining the correlation between optical flow and interpolation error (IE), the paper proposes novel error-prediction metrics that partition the middle frame into distinct regions corresponding to different IE levels.
arXiv Detail & Related papers (2022-07-25T16:15:38Z) - Interpolation-based Correlation Reduction Network for Semi-Supervised
Graph Learning [49.94816548023729]
We propose a novel graph contrastive learning method, termed Interpolation-based Correlation Reduction Network (ICRN).
In our method, we improve the discriminative capability of the latent feature by enlarging the margin of decision boundaries.
By combining the two settings, we extract rich supervision information from both the abundant unlabeled nodes and the rare yet valuable labeled nodes for discriminative representation learning.
arXiv Detail & Related papers (2022-06-06T14:26:34Z) - Joint inference and input optimization in equilibrium networks [68.63726855991052]
The deep equilibrium model is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings: network inference and optimization over the network's inputs.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
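For intuition, a bare-bones fixed-point forward pass of a deep equilibrium model (simple Picard iteration on a single tanh layer; the paper's joint inference-and-input-optimization solver is more sophisticated, and the shapes here are arbitrary).

```python
import numpy as np

def layer(z, x, W, U):
    # The single nonlinear layer whose fixed point defines the output.
    return np.tanh(W @ z + U @ x)

def deq_forward(x, W, U, iters=100):
    """Compute the network output by iterating one layer to its fixed
    point z* = layer(z*, x), instead of stacking many layers."""
    z = np.zeros(W.shape[0])
    for _ in range(iters):
        z = layer(z, x, W, U)
    return z

rng = np.random.default_rng(0)
W = 0.4 * rng.normal(size=(4, 4))  # small weights so iteration contracts
U = rng.normal(size=(4, 3))
z_star = deq_forward(rng.normal(size=3), W, U)
```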
arXiv Detail & Related papers (2021-11-25T19:59:33Z) - Scalable Optimal Transport in High Dimensions for Graph Distances,
Embedding Alignment, and More [7.484063729015126]
We propose two effective log-linear time approximations of the cost matrix for optimal transport.
These approximations enable general log-linear time algorithms for entropy-regularized OT that perform well even in complex, high-dimensional spaces.
For graph distance regression we propose the graph transport network (GTN), which combines graph neural networks (GNNs) with enhanced Sinkhorn iterations.
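For reference, the standard entropy-regularized Sinkhorn iteration that such methods build on; this sketch uses the exact O(n^2) cost matrix, whereas the paper's contribution is a log-linear approximation of it, omitted here.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, iters=200):
    """Entropy-regularized optimal transport between histograms `a`
    and `b` via alternating marginal scaling on the Gibbs kernel."""
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)  # match column marginals
        u = a / (K @ v)    # match row marginals
    return u[:, None] * K * v[None, :]  # the transport plan

rng = np.random.default_rng(0)
x, y = rng.normal(size=(5, 2)), rng.normal(size=(6, 2))
cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
plan = sinkhorn(cost, np.full(5, 0.2), np.full(6, 1 / 6))
```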
arXiv Detail & Related papers (2021-07-14T17:40:08Z) - Improved CNN-based Learning of Interpolation Filters for Low-Complexity
Inter Prediction in Video Coding [5.46121027847413]
This paper introduces a novel explainable neural network-based inter-prediction scheme.
A novel training framework enables each network branch to resemble a specific fractional shift.
When implemented in the context of the Versatile Video Coding (VVC) test model, 0.77%, 1.27% and 2.25% BD-rate savings can be achieved.
arXiv Detail & Related papers (2021-06-16T16:48:01Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study a distributed algorithm for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires a much smaller number of communication rounds while retaining theoretical guarantees.
Our experiments on several datasets demonstrate the effectiveness of our method and confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Lagrangian Decomposition for Neural Network Verification [148.0448557991349]
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take.
We propose a novel approach based on Lagrangian Decomposition.
We show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time.
arXiv Detail & Related papers (2020-02-24T17:55:10Z) - A Lagrangian Approach to Information Propagation in Graph Neural
Networks [21.077268852378385]
In this paper, we propose a novel approach to the state computation and the learning algorithm for Graph Neural Network (GNN) models.
The state convergence procedure is implicitly expressed by the constraint satisfaction mechanism and does not require a separate iterative phase for each epoch of the learning procedure.
In fact, the computational structure is based on the search for saddle points of the Lagrangian in the adjoint space composed of weights, neural outputs (node states) and Lagrange multipliers.
arXiv Detail & Related papers (2020-02-18T16:13:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.