Deletion and Insertion Tests in Regression Models
- URL: http://arxiv.org/abs/2205.12423v3
- Date: Wed, 23 Aug 2023 12:02:28 GMT
- Title: Deletion and Insertion Tests in Regression Models
- Authors: Naofumi Hama, Masayoshi Mase and Art B. Owen
- Abstract summary: A basic task in explainable AI (XAI) is to identify the most important features behind a prediction made by a black box function $f$.
The insertion and deletion tests of Petsiuk et al. Kernel can be used to judge the quality of algorithms that rank pixels from most to at least important for a classification.
- Score: 1.2891210250935148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A basic task in explainable AI (XAI) is to identify the most important
features behind a prediction made by a black box function $f$. The insertion
and deletion tests of Petsiuk et al. (2018) can be used to judge the quality of
algorithms that rank pixels from most to least important for a classification.
Motivated by regression problems we establish a formula for their area under
the curve (AUC) criteria in terms of certain main effects and interactions in
an anchored decomposition of $f$. We find an expression for the expected value
of the AUC under a random ordering of inputs to $f$ and propose an alternative
area above a straight line for the regression setting. We use this criterion to
compare feature importances computed by integrated gradients (IG) to those
computed by Kernel SHAP (KS) as well as LIME, DeepLIFT, vanilla gradient and
input$\times$gradient methods. KS has the best overall performance in two
datasets we consider but it is very expensive to compute. We find that IG is
nearly as good as KS while being much faster. Our comparison problems include
some binary inputs that pose a challenge to IG because it must use values
between the possible variable levels and so we consider ways to handle binary
variables in IG. We show that sorting variables by their Shapley value does not
necessarily give the optimal ordering for an insertion-deletion test. It will
however do that for monotone functions of additive models, such as logistic
regression.
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z) - Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit
Feedback and Unknown Transition [71.33787410075577]
We study reinforcement learning with linear function approximation, unknown transition, and adversarial losses.
We propose a new algorithm that attains an $widetildeO(dsqrtHS3K + sqrtHSAK)$ regret with high probability.
arXiv Detail & Related papers (2024-03-07T15:03:50Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best identification in discounted linear Markov+Delta Decision in the fixed confidence setting under a generative model.
The lower bound as the solution of an intricate non- optimization program can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - A Conditional Randomization Test for Sparse Logistic Regression in
High-Dimension [36.00360315353985]
emphCRT-logit is an algorithm that combines a variable-distillation step and a decorrelation step.
We provide a theoretical analysis of this procedure, and demonstrate its effectiveness on simulations, along with experiments on large-scale brain-imaging and genomics datasets.
arXiv Detail & Related papers (2022-05-29T09:37:16Z) - Black-Box Generalization [31.80268332522017]
We provide the first error analysis for black-box learning through derivative generalization.
We show both generalization are independent $d$, $K$ and under appropriate choices a slightly decreased learning rate.
arXiv Detail & Related papers (2022-02-14T17:14:48Z) - Planning and Learning with Adaptive Lookahead [74.39132848733847]
Policy Iteration (PI) algorithm alternates between greedy one-step policy improvement and policy evaluation.
Recent literature shows that multi-step lookahead policy improvement leads to a better convergence rate at the expense of increased complexity per iteration.
We propose for the first time to dynamically adapt the multi-step lookahead horizon as a function of the state and of the value estimate.
arXiv Detail & Related papers (2022-01-28T20:26:55Z) - Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for non-critical optimization that finds an $O(epsilon)$epsilon point in the optimal product.
We validate our results on a variety of large-scale deep learning benchmarks and architectures.
arXiv Detail & Related papers (2021-03-04T19:01:20Z) - Learning to extrapolate using continued fractions: Predicting the
critical temperature of superconductor materials [5.905364646955811]
In the field of Artificial Intelligence (AI) and Machine Learning (ML), the approximation of unknown target functions $y=f(mathbfx)$ is a common objective.
We refer to $S$ as the training set and aim to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $mathbfx$.
arXiv Detail & Related papers (2020-11-27T04:57:40Z) - Statistical-Query Lower Bounds via Functional Gradients [19.5924910463796]
We show that any statistical-query algorithm with tolerance $n- (1/epsilon)b$ must use at least $2nc epsilon$ queries for some constant $b.
Our results rule out general (as opposed to correlational) SQ learning algorithms, which is unusual for real-valued learning problems.
arXiv Detail & Related papers (2020-06-29T05:15:32Z) - Least Squares Regression with Markovian Data: Fundamental Limits and
Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information theoretic minimax lower bounds for this problem in terms of $tau_mathsfmix$.
We propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.