Revisiting Score Function Estimators for $k$-Subset Sampling
- URL: http://arxiv.org/abs/2407.16058v2
- Date: Fri, 16 Aug 2024 10:29:46 GMT
- Title: Revisiting Score Function Estimators for $k$-Subset Sampling
- Authors: Klas Wijk, Ricardo Vinuesa, Hossein Azizpour,
- Abstract summary: We show how to efficiently compute the $k$-subset distribution's score function using a discrete Fourier transform.
The resulting estimator provides both exact samples and unbiased gradient estimates.
Experiments in feature selection show results competitive with current methods, despite weaker assumptions.
- Score: 5.464421236280698
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Are score function estimators an underestimated approach to learning with $k$-subset sampling? Sampling $k$-subsets is a fundamental operation in many machine learning tasks that is not amenable to differentiable parametrization, impeding gradient-based optimization. Prior work has focused on relaxed sampling or pathwise gradient estimators. Inspired by the success of score function estimators in variational inference and reinforcement learning, we revisit them within the context of $k$-subset sampling. Specifically, we demonstrate how to efficiently compute the $k$-subset distribution's score function using a discrete Fourier transform, and reduce the estimator's variance with control variates. The resulting estimator provides both exact samples and unbiased gradient estimates while also applying to non-differentiable downstream models, unlike existing methods. Experiments in feature selection show results competitive with current methods, despite weaker assumptions.
Related papers
- Generalized Differentiable RANSAC [95.95627475224231]
$nabla$-RANSAC is a differentiable RANSAC that allows learning the entire randomized robust estimation pipeline.
$nabla$-RANSAC is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives.
arXiv Detail & Related papers (2022-12-26T15:13:13Z) - SIMPLE: A Gradient Estimator for $k$-Subset Sampling [42.38652558807518]
In this work, we fall back to discrete $k$-subset sampling on the forward pass.
We show that our gradient estimator, SIMPLE, exhibits lower bias and variance compared to state-of-the-art estimators.
Empirical results show improved performance on learning to explain and sparse linear regression.
arXiv Detail & Related papers (2022-10-04T22:33:16Z) - Measuring the Effect of Training Data on Deep Learning Predictions via
Randomized Experiments [5.625056584412003]
We develop a principled algorithm for estimating the contribution of training data points to a deep learning model.
Our algorithm estimates the AME, a quantity that measures the expected (average) marginal effect of adding a data point to a subset of the training data.
arXiv Detail & Related papers (2022-06-20T21:27:18Z) - Gradient Estimation with Discrete Stein Operators [44.64146470394269]
We introduce a variance reduction technique based on Stein operators for discrete distributions.
Our technique achieves substantially lower variance than state-of-the-art estimators with the same number of function evaluations.
arXiv Detail & Related papers (2022-02-19T02:22:23Z) - Unrolling Particles: Unsupervised Learning of Sampling Distributions [102.72972137287728]
Particle filtering is used to compute good nonlinear estimates of complex systems.
We show in simulations that the resulting particle filter yields good estimates in a wide range of scenarios.
arXiv Detail & Related papers (2021-10-06T16:58:34Z) - Storchastic: A Framework for General Stochastic Automatic
Differentiation [9.34612743192798]
We introduce Storchastic, a new framework for automatic differentiation of graphs.
Storchastic allows the modeler to choose from a wide variety of gradient estimation methods at each sampling step.
Storchastic is provably unbiased for estimation of any-order gradients, and generalizes variance reduction techniques to higher-order gradient estimates.
arXiv Detail & Related papers (2021-04-01T12:19:54Z) - Optimal Off-Policy Evaluation from Multiple Logging Policies [77.62012545592233]
We study off-policy evaluation from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling.
We find the OPE estimator for multiple loggers with minimum variance for any instance, i.e., the efficient one.
arXiv Detail & Related papers (2020-10-21T13:43:48Z) - Exploratory Landscape Analysis is Strongly Sensitive to the Sampling
Strategy [8.246980996934347]
In black-box optimization, where an explicit problem representation is not available, the feature values need to be approximated from a small number of sample points.
In this work, we analyze how the sampling method and the sample size influence the quality of the feature value approximations.
arXiv Detail & Related papers (2020-06-19T13:45:13Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and or binary weights the training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z) - Spatially Adaptive Inference with Stochastic Feature Sampling and
Interpolation [72.40827239394565]
We propose to compute features only at sparsely sampled locations.
We then densely reconstruct the feature map with an efficient procedure.
The presented network is experimentally shown to save substantial computation while maintaining accuracy over a variety of computer vision tasks.
arXiv Detail & Related papers (2020-03-19T15:36:31Z) - Estimating Gradients for Discrete Random Variables by Sampling without
Replacement [93.09326095997336]
We derive an unbiased estimator for expectations over discrete random variables based on sampling without replacement.
We show that our estimator can be derived as the Rao-Blackwellization of three different estimators.
arXiv Detail & Related papers (2020-02-14T14:15:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.