Sampling Permutations for Shapley Value Estimation
- URL: http://arxiv.org/abs/2104.12199v1
- Date: Sun, 25 Apr 2021 16:44:18 GMT
- Title: Sampling Permutations for Shapley Value Estimation
- Authors: Rory Mitchell, Joshua Cooper, Eibe Frank, Geoffrey Holmes
- Abstract summary: Game-theoretic attribution techniques based on Shapley values are used extensively to interpret machine learning models.
As the computation of Shapley values can be expressed as a summation over a set of permutations, a common approach is to sample a subset of these permutations for approximation.
Unfortunately, standard Monte Carlo sampling methods can exhibit slow convergence, and more sophisticated quasi Monte Carlo methods are not well defined on the space of permutations.
- Score: 1.0323063834827415
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Game-theoretic attribution techniques based on Shapley values are used
extensively to interpret black-box machine learning models, but their exact
calculation is generally NP-hard, requiring approximation methods for
non-trivial models. As the computation of Shapley values can be expressed as a
summation over a set of permutations, a common approach is to sample a subset
of these permutations for approximation. Unfortunately, standard Monte Carlo
sampling methods can exhibit slow convergence, and more sophisticated quasi
Monte Carlo methods are not well defined on the space of permutations. To
address this, we investigate new approaches based on two classes of
approximation methods and compare them empirically. First, we demonstrate
quadrature techniques in a RKHS containing functions of permutations, using the
Mallows kernel to obtain explicit convergence rates of $O(1/n)$, improving on
$O(1/\sqrt{n})$ for plain Monte Carlo. The RKHS perspective also leads to quasi
Monte Carlo type error bounds, with a tractable discrepancy measure defined on
permutations. Second, we exploit connections between the hypersphere
$\mathbb{S}^{d-2}$ and permutations to create practical algorithms for
generating permutation samples with good properties. Experiments show the above
techniques provide significant improvements for Shapley value estimates over
existing methods, converging to a smaller RMSE in the same number of model
evaluations.
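To make the permutation-sampling formulation concrete, here is a minimal Python sketch (not the authors' implementation): the Shapley value of feature $i$ is the average over permutations of its marginal contribution to the coalition of features preceding it, and plain Monte Carlo draws those permutations uniformly. The `value_fn` argument is a hypothetical stand-in for the model-dependent set function $v$. A direct implementation of the Mallows kernel $K(\sigma, \tau) = \exp(-\lambda\, d(\sigma, \tau))$, with $d$ the Kendall tau distance, is included because it underpins the RKHS quadrature; computing the quadrature weights themselves is beyond this sketch.

```python
import itertools
import math
import random

def shapley_monte_carlo(value_fn, n_features, n_samples, seed=0):
    """Plain Monte Carlo Shapley estimate: average marginal contributions
    over uniformly sampled permutations (RMSE decays as O(1/sqrt(n)))."""
    rng = random.Random(seed)
    phi = [0.0] * n_features
    for _ in range(n_samples):
        perm = list(range(n_features))
        rng.shuffle(perm)                 # uniform random permutation
        coalition = set()
        v_prev = value_fn(coalition)
        for i in perm:                    # add features in permutation order
            coalition.add(i)
            v_curr = value_fn(coalition)
            phi[i] += v_curr - v_prev     # marginal contribution of feature i
            v_prev = v_curr
    return [p / n_samples for p in phi]

def kendall_tau_distance(sigma, tau):
    """Number of discordant pairs between two permutations."""
    pos = {v: i for i, v in enumerate(tau)}
    return sum(1 for i, j in itertools.combinations(range(len(sigma)), 2)
               if pos[sigma[i]] > pos[sigma[j]])

def mallows_kernel(sigma, tau, lam=1.0):
    """Mallows kernel K(sigma, tau) = exp(-lam * kendall_tau(sigma, tau))."""
    return math.exp(-lam * kendall_tau_distance(sigma, tau))

if __name__ == "__main__":
    # Toy symmetric set function v(S) = |S|^2 / 4 on d = 4 features; by
    # symmetry every exact Shapley value is v(N)/d = 1.
    print(shapley_monte_carlo(lambda s: len(s) ** 2 / 4, 4, 2000))
    print(mallows_kernel([0, 1, 2, 3], [1, 0, 2, 3]))  # exp(-1) ≈ 0.368
```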
Related papers
- A Stein Gradient Descent Approach for Doubly Intractable Distributions [5.63014864822787]
We propose a novel Monte Carlo Stein variational gradient descent (MC-SVGD) approach for inference for doubly intractable distributions.
The proposed method achieves substantial computational gains over existing algorithms, while providing comparable inferential performance for the posterior distributions.
arXiv Detail & Related papers (2024-10-28T13:42:27Z)
- Multi-fidelity Hamiltonian Monte Carlo [1.86413150130483]
We propose a novel two-stage Hamiltonian Monte Carlo algorithm with a surrogate model.
The acceptance probability is computed in the first stage via a standard HMC proposal.
If the proposal is accepted, the posterior is evaluated in the second stage using the high-fidelity numerical solver.
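The two-stage idea can be illustrated with a generic delayed-acceptance Metropolis sketch (an assumption-laden simplification: random-walk proposals stand in for the HMC proposals, and `log_post_surrogate` / `log_post_hi` are hypothetical callables). The surrogate screens proposals cheaply; only survivors pay for a high-fidelity evaluation, and the second-stage correction keeps the chain exact for the high-fidelity posterior.

```python
import math
import random

def delayed_acceptance_mh(log_post_hi, log_post_surrogate, x0, n_steps,
                          step=0.5, seed=0):
    """Generic two-stage (delayed-acceptance) Metropolis sketch."""
    rng = random.Random(seed)
    x = x0
    lo_x, hi_x = log_post_surrogate(x), log_post_hi(x)
    samples = []
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)       # symmetric random-walk proposal
        lo_y = log_post_surrogate(y)
        # Stage 1: accept/reject against the cheap surrogate only.
        if rng.random() < math.exp(min(0.0, lo_y - lo_x)):
            hi_y = log_post_hi(y)          # expensive call, stage 2 only
            # Stage 2: correct for the surrogate so the chain targets the
            # high-fidelity posterior exactly.
            if rng.random() < math.exp(min(0.0, (hi_y - hi_x) - (lo_y - lo_x))):
                x, lo_x, hi_x = y, lo_y, hi_y
        samples.append(x)
    return samples
```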
arXiv Detail & Related papers (2024-05-08T13:03:55Z)
- Stochastic automatic differentiation for Monte Carlo processes [1.1279808969568252]
We consider the extension of Automatic Differentiation (AD) techniques to Monte Carlo processes.
We show that the Hamiltonian approach can be understood as a change of variables of the reweighting approach.
arXiv Detail & Related papers (2023-07-28T08:59:01Z)
- Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits.
We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits.
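A hedged sketch of the flavor of such an update for a linear contextual bandit (not the paper's exact algorithm; the function name and defaults are illustrative): a few unadjusted Langevin steps on a ridge-regression posterior replace exact posterior sampling, and the sampled parameter is then used to pick an arm greedily.

```python
import numpy as np

def lmc_thompson_step(theta, contexts, rewards, lam=1.0, step=1e-2,
                      n_inner=20, rng=np.random.default_rng(0)):
    """Approximate Thompson sampling: run a few unadjusted Langevin steps
    on the Gaussian (ridge) posterior instead of sampling it exactly."""
    X = np.asarray(contexts)   # (t, d) contexts of previously pulled arms
    r = np.asarray(rewards)    # (t,)  observed rewards
    for _ in range(n_inner):
        # Gradient of the negative log posterior (ridge least squares).
        grad = X.T @ (X @ theta - r) + lam * theta
        theta = theta - step * grad + np.sqrt(2 * step) * rng.standard_normal(theta.shape)
    return theta               # then play arm argmax_a <theta, x_a>
```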
arXiv Detail & Related papers (2022-06-22T17:58:23Z)
- Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z)
- A Scalable Second Order Method for Ill-Conditioned Matrix Completion from Few Samples [0.0]
We propose an iterative algorithm for low-rank matrix completion.
It is able to complete very ill-conditioned matrices with a condition number of up to $10^{10}$ from few samples.
arXiv Detail & Related papers (2021-06-03T20:31:00Z)
- Sampling in Combinatorial Spaces with SurVAE Flow Augmented MCMC [83.48593305367523]
Hybrid Monte Carlo is a powerful Markov Chain Monte Carlo method for sampling from complex continuous distributions.
We introduce a new approach based on augmenting Monte Carlo methods with SurVAE Flows to sample from discrete distributions.
We demonstrate the efficacy of our algorithm on a range of examples from statistics, computational physics and machine learning, and observe improvements compared to alternative algorithms.
arXiv Detail & Related papers (2021-02-04T02:21:08Z)
- Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
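As a hedged illustration of the Gaussian embedding mentioned above (sketch-and-solve only, under assumed shapes; the paper builds full iterative methods on such embeddings):

```python
import numpy as np

def sketch_and_solve_ls(A, b, m, rng=np.random.default_rng(0)):
    """Compress an n x d least-squares problem to m x d (m << n) with a
    Gaussian embedding, then solve the small problem."""
    n, d = A.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian sketch
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x
```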
arXiv Detail & Related papers (2020-02-21T17:45:32Z)
- Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics [52.77024349608834]
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhoods and population behavior optimization metaheuristics.
Using a set of 16 data tables generated by a quasi-Monte Carlo experiment, one of the aggregation criteria, under the L1 dissimilarity, is compared against hierarchical clustering and a version of k-means: partitioning around medoids (PAM).
arXiv Detail & Related papers (2020-01-06T23:33:31Z)
- Sparse Orthogonal Variational Inference for Gaussian Processes [34.476453597078894]
We introduce a new interpretation of sparse variational approximations for Gaussian processes using inducing points.
We show that this formulation recovers existing approximations and at the same time allows to obtain tighter lower bounds on the marginal likelihood and new variational inference algorithms.
arXiv Detail & Related papers (2019-10-23T15:01:28Z)