Maximum Entropy Baseline for Integrated Gradients
- URL: http://arxiv.org/abs/2204.05948v1
- Date: Tue, 12 Apr 2022 17:04:42 GMT
- Title: Maximum Entropy Baseline for Integrated Gradients
- Authors: Hanxiao Tan
- Abstract summary: Integrated Gradients (IG) is one of the most popular explainability methods available.
This study proposes a new uniform baseline, i.e., the Maximum Entropy Baseline.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Integrated Gradients (IG), one of the most popular explainability methods
available, still remains ambiguous in the selection of baseline, which may
seriously impair the credibility of the explanations. This study proposes a new
uniform baseline, i.e., the Maximum Entropy Baseline, which is consistent with
the "uninformative" property of baselines defined in IG. In addition, we
propose an improved ablating evaluation approach incorporating the new
baseline, where the information conservativeness is maintained. We explain the
linear transformation invariance of IG baselines from an information
perspective. Finally, we assess the reliability of the explanations generated
by different explainability methods and different IG baselines through
extensive evaluation experiments.
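To make the setting concrete, below is a minimal NumPy sketch of Integrated Gradients with a uniform random baseline standing in for a maximum-entropy baseline. The feature range [0, 1], the function names, and the single-sample baseline are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def integrated_gradients(x, grad_fn, baseline=None, steps=50, rng=None):
    """Riemann-sum approximation of IG attributions for a single input x.

    grad_fn(z) must return d(model output)/d(z) with the same shape as z.
    """
    rng = np.random.default_rng() if rng is None else rng
    if baseline is None:
        # Assumed realization of a maximum-entropy baseline: each feature is
        # sampled uniformly over a presumed valid range of [0, 1].
        baseline = rng.uniform(0.0, 1.0, size=x.shape)

    # Average gradients along the straight-line path from baseline to x.
    alphas = np.linspace(0.0, 1.0, steps)
    total_grad = np.zeros_like(x)
    for a in alphas:
        total_grad += grad_fn(baseline + a * (x - baseline))
    avg_grad = total_grad / steps

    # Attribution per feature; approximately sums to f(x) - f(baseline).
    return (x - baseline) * avg_grad

if __name__ == "__main__":
    # Toy linear model f(x) = w . x, whose gradient is the constant vector w.
    w = np.array([0.5, -1.0, 2.0])
    x = np.array([1.0, 0.2, 0.7])
    print(integrated_gradients(x, grad_fn=lambda z: w))
```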
Related papers
- Weighted Integrated Gradients for Feature Attribution [2.3226745625632947]
In explainable AI, Integrated Gradients (IG) is a widely adopted technique for assessing the significance of input features for model outputs.
This study argues that baselines should not be treated equivalently.
We introduce Weighted Integrated Gradients (WG), a novel approach that evaluates baseline suitability in an unsupervised manner and incorporates a strategy for selecting effective baselines.
arXiv Detail & Related papers (2025-05-06T05:36:47Z) - Tangentially Aligned Integrated Gradients for User-Friendly Explanations [5.286919475372417]
Integrated gradients are prevalent in machine learning to address the black-box problem of neural networks.
The choice of base-point is not a priori obvious and can lead to drastically different explanations.
We propose that the base-point should be chosen such that it maximises the tangential alignment of the explanation.
arXiv Detail & Related papers (2025-03-11T10:04:13Z) - A Unified Invariant Learning Framework for Graph Classification [25.35939628738617]
Invariant learning aims to recognize stable features in graph data for classification.
We introduce the Unified Invariant Learning framework for graph classification.
We present both theoretical and empirical evidence to confirm our method's ability to recognize superior stable features.
arXiv Detail & Related papers (2025-01-22T02:45:21Z) - BEE: Metric-Adapted Explanations via Baseline Exploration-Exploitation [10.15605247436119]
Two prominent challenges in explainability research involve 1) the nuanced evaluation of explanations and 2) the modeling of missing information.
We propose Baseline Exploration-Exploitation (BEE) - a path-integration method that introduces randomness to the integration process.
BEE generates a comprehensive set of explanation maps, facilitating the selection of the best-performing explanation map.
arXiv Detail & Related papers (2024-12-23T12:19:03Z) - Differentiable Information Bottleneck for Deterministic Multi-view Clustering [9.723389925212567]
We propose a new differentiable information bottleneck (DIB) method, which provides a deterministic and analytical MVC solution.
Specifically, we first propose to directly fit the mutual information of high-dimensional spaces by leveraging a normalized kernel Gram matrix.
Then, based on the new mutual information measurement, a deterministic multi-view neural network with analytical gradients is explicitly trained to parameterize the IB principle.
arXiv Detail & Related papers (2024-03-23T02:13:22Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - A New Baseline Assumption of Integrated Gradients Based on Shapley value [31.2051113305947]
Integrated Gradients (IG) is a technique for mapping predictions back to the input features of deep neural networks (DNNs).
We argue that the standard approach of utilizing a single baseline is frequently inadequate, prompting the need for multiple baselines.
We develop a new baseline method called Shapley Integrated Gradients (SIG), which uses proportional sampling to mirror the Shapley Value process (a generic multi-baseline averaging sketch appears after this list).
arXiv Detail & Related papers (2023-10-07T14:19:07Z) - The Role of Baselines in Policy Gradient Optimization [83.42050606055822]
We show that the state value baseline allows on-policy natural policy gradient (NPG) to converge to a globally optimal policy at an $O(1/t)$ rate.
We find that the primary effect of the value baseline is to reduce the aggressiveness of the updates rather than their variance.
arXiv Detail & Related papers (2023-01-16T06:28:00Z) - Deterministic Decoupling of Global Features and its Application to Data Analysis [0.0]
We propose a new formalism that is based on defining transformations on submanifolds.
Through these transformations we define a normalization that, we demonstrate, allows for decoupling differentiable features.
We apply this method in the original data domain and at the output of a filter bank to regression and classification problems based on global descriptors.
arXiv Detail & Related papers (2022-07-05T15:54:39Z) - Revisiting GANs by Best-Response Constraint: Perspective, Methodology, and Application [49.66088514485446]
Best-Response Constraint (BRC) is a general learning framework to explicitly formulate the potential dependency of the generator on the discriminator.
We show that even with different motivations and formulations, a variety of existing GANs can all be uniformly improved by our flexible BRC methodology.
arXiv Detail & Related papers (2022-05-20T12:42:41Z) - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z) - Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z) - Inference-InfoGAN: Inference Independence via Embedding Orthogonal Basis
Expansion [2.198430261120653]
Disentanglement learning aims to construct independent and interpretable latent variables, for which generative models are a popular strategy.
We propose a novel GAN-based disentanglement framework via embedding Orthogonal Basis Expansion (OBE) into InfoGAN network.
Our Inference-InfoGAN achieves higher disentanglement scores in terms of the FactorVAE, Separated Attribute Predictability (SAP), Mutual Information Gap (MIG) and Variation Predictability (VP) metrics without model fine-tuning.
arXiv Detail & Related papers (2021-10-02T11:54:23Z) - A general sample complexity analysis of vanilla policy gradient [101.16957584135767]
Policy gradient (PG) is one of the most popular methods for solving reinforcement learning (RL) problems.
A solid theoretical understanding of even the "vanilla" PG, however, has remained elusive.
arXiv Detail & Related papers (2021-07-23T19:38:17Z) - GenDICE: Generalized Offline Estimation of Stationary Values [108.17309783125398]
We show that effective estimation can still be achieved in important applications.
Our approach is based on estimating a ratio that corrects for the discrepancy between the stationary and empirical distributions.
The resulting algorithm, GenDICE, is straightforward and effective.
arXiv Detail & Related papers (2020-02-21T00:27:52Z)
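Several of the entries above (e.g. Shapley Integrated Gradients and BEE) replace the single IG baseline with a set of baselines whose attributions are aggregated. The sketch below, under the same illustrative assumptions as the earlier snippet (features in [0, 1], a user-supplied gradient function), averages IG attributions over several randomly drawn baselines; it shows the general multi-baseline idea only, not either paper's specific selection or sampling procedure.

```python
import numpy as np

def multi_baseline_ig(x, grad_fn, baselines, steps=50):
    """Average IG attributions over a collection of baselines."""
    alphas = np.linspace(0.0, 1.0, steps)
    attributions = np.zeros_like(x)
    for baseline in baselines:
        # Average gradients along the path from this baseline to x.
        path_grads = [grad_fn(baseline + a * (x - baseline)) for a in alphas]
        avg_grad = np.mean(path_grads, axis=0)
        attributions += (x - baseline) * avg_grad
    return attributions / len(baselines)

if __name__ == "__main__":
    # Toy linear model f(x) = w . x, gradient is the constant vector w.
    w = np.array([0.5, -1.0, 2.0])
    x = np.array([1.0, 0.2, 0.7])
    rng = np.random.default_rng(0)
    baselines = rng.uniform(0.0, 1.0, size=(8, 3))  # eight randomly drawn baselines
    print(multi_baseline_ig(x, grad_fn=lambda z: w, baselines=baselines))
```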
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.