Monte Carlo with kernel-based Gibbs measures: Guarantees for probabilistic herding
- URL: http://arxiv.org/abs/2402.11736v2
- Date: Sat, 09 Aug 2025 17:22:16 GMT
- Title: Monte Carlo with kernel-based Gibbs measures: Guarantees for probabilistic herding
- Authors: Martin Rouault, Rémi Bardenet, Mylène Maïda
- Abstract summary: We study a joint probability distribution over quadrature nodes, whose support tends to minimize the same worst-case error as kernel herding. Our main contribution is to prove that it does outperform i.i.d. Monte Carlo, in the sense of coming with a tighter concentration inequality on the worst-case integration error.
- Score: 5.502903075972815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kernel herding belongs to a family of deterministic quadratures that seek to minimize the worst-case integration error over a reproducing kernel Hilbert space (RKHS). These quadrature rules come with strong experimental evidence that this worst-case error decreases faster than the standard inverse square root of the number of quadrature nodes. This conjectured fast rate is key for integrating expensive-to-evaluate functions, as in Bayesian inference of expensive models, and makes up for the increased computational cost of sampling compared to i.i.d. or MCMC quadratures. However, there is little theoretical support for this faster-than-square-root rate, at least in the usual case where the RKHS is infinite-dimensional, while recent progress on distribution compression suggests that results on the direct minimization of the worst-case integration error are possible. In this paper, we study a joint probability distribution over quadrature nodes whose support tends to minimize the same worst-case error as kernel herding. Our main contribution is to prove that it does outperform i.i.d. Monte Carlo, in the sense of coming with a tighter concentration inequality on the worst-case integration error. This first step towards proving a fast error decay demonstrates that the mathematical toolbox developed around Gibbs measures can help understand to what extent kernel herding and its variants improve on computationally cheaper methods. Moreover, we investigate the computational bottlenecks of approximately sampling our quadrature, and we demonstrate on toy examples that a faster rate of convergence, though not worst-case, is likely.
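The quantity at stake throughout the abstract is the worst-case integration error over the unit ball of the RKHS, which for an empirical quadrature equals the maximum mean discrepancy (MMD) to the target. The sketch below is illustrative only and is not the paper's Gibbs-measure quadrature: it runs plain kernel herding with a Gaussian kernel on a finite candidate pool standing in for the target, and compares its worst-case error against i.i.d. sampling. The kernel choice, the pool construction, and all names are assumptions made for the example.

```python
# A minimal, illustrative sketch (not the paper's Gibbs-measure sampler):
# plain kernel herding on a finite candidate pool, compared to i.i.d.
# sampling through the worst-case integration error (MMD to the pool).
import numpy as np

rng = np.random.default_rng(0)


def gaussian_kernel(X, Y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), evaluated pairwise."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2))


# Finite pool of i.i.d. draws from the target (a 2-d standard Gaussian here);
# the pool stands in for the target measure pi.
pool = rng.standard_normal((2000, 2))
K = gaussian_kernel(pool, pool)
# Monte Carlo estimate of the mean embedding x -> E_{y ~ pi} k(x, y) on the pool.
mean_embedding = K.mean(axis=1)


def kernel_herding(n_nodes):
    """Greedy herding: each new node maximizes the mean embedding minus the
    (1 / (t + 1))-weighted sum of kernels to previously chosen nodes."""
    nodes = []
    for t in range(n_nodes):
        penalty = K[:, nodes].sum(axis=1) / (t + 1) if nodes else 0.0
        nodes.append(int(np.argmax(mean_embedding - penalty)))
    return nodes


def worst_case_error(nodes):
    """Worst-case integration error over the unit ball of the RKHS, i.e. the
    MMD between the uniform quadrature on `nodes` and the pool."""
    idx = np.asarray(nodes)
    mmd2 = (
        K[np.ix_(idx, idx)].mean()
        - 2.0 * mean_embedding[idx].mean()
        + mean_embedding.mean()
    )
    return float(np.sqrt(max(mmd2, 0.0)))


for n in (10, 50, 200):
    herded = kernel_herding(n)
    iid = rng.choice(len(pool), size=n, replace=False)
    print(f"n={n:4d}  herding: {worst_case_error(herded):.4f}  "
          f"i.i.d.: {worst_case_error(iid):.4f}")
```

The paper studies a joint (Gibbs) distribution over the nodes whose density favours configurations with small worst-case error, rather than the greedy argmax above; the sketch only serves to make concrete the error functional that both approaches target.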
Related papers
- Quenched large deviations for Monte Carlo integration with Coulomb gases [5.502903075972815]
We show that a random approximation of the potential preserves the fast large deviation principle that guarantees the proposed integration algorithm to outperform independent or Markov quadratures. For the Coulomb interaction kernel, we need the approximation to be based on another Gibbs measure, and we prove in passing a control on the uniform convergence of the approximation of the potential.
arXiv Detail & Related papers (2025-08-02T14:52:06Z) - On the Convergence of Irregular Sampling in Reproducing Kernel Hilbert Spaces [0.0]
We discuss approximation properties of kernel regression under minimalistic assumptions on both the kernel and the input data.
We first prove error estimates in the kernel's RKHS norm.
This leads to new results concerning uniform convergence of kernel regression on compact domains.
arXiv Detail & Related papers (2025-04-18T10:57:16Z) - Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation [51.248784084461334]
We prove new convergence rates for a generalized version of stochastic Nesterov acceleration under interpolation conditions.
Our analysis reduces the dependence on the strong growth constant from $\rho$ to $\sqrt{\rho}$ as compared to prior work.
arXiv Detail & Related papers (2024-04-03T00:41:19Z) - Quasi-Monte Carlo Graph Random Features [38.762964164013226]
We present a novel mechanism to improve the accuracy of graph random features (GRFs).
Our method induces negative correlations between the lengths of the algorithm's random walks by imposing antithetic termination.
We derive strong theoretical guarantees on the properties of these quasi-Monte Carlo GRFs.
arXiv Detail & Related papers (2023-05-21T14:12:02Z) - Quantizing Heavy-tailed Data in Statistical Estimation: (Near) Minimax
Rates, Covariate Quantization, and Uniform Recovery [22.267482745621997]
This paper studies the quantization of heavy-tailed data in some fundamental statistical estimation problems.
We propose to truncate and properly dither the data prior to a uniform quantization.
arXiv Detail & Related papers (2022-12-30T06:28:30Z) - Posterior and Computational Uncertainty in Gaussian Processes [52.26904059556759]
Gaussian processes scale prohibitively with the size of the dataset.
Many approximation methods have been developed, which inevitably introduce approximation error.
This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior.
We develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended.
arXiv Detail & Related papers (2022-05-30T22:16:25Z) - Lassoed Tree Boosting [53.56229983630983]
We prove that a gradient-boosted tree algorithm with early stopping achieves faster than $n^{-1/4}$ $L^2$ convergence in the large nonparametric space of càdlàg functions of bounded sectional variation.
Our convergence proofs are based on a novel, general theorem on early stopping with empirical loss minimizers of nested Donsker classes.
arXiv Detail & Related papers (2022-05-22T00:34:41Z) - Informative Planning for Worst-Case Error Minimisation in Sparse
Gaussian Process Regression [28.50144453974764]
We derive a universal worst-case error bound for sparse GP regression with bounded noise.
We show that the worst-case error minimisation can be achieved by solving a posterior entropy minimisation problem.
arXiv Detail & Related papers (2022-03-08T03:26:08Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Nystr\"om Kernel Mean Embeddings [92.10208929236826]
We propose an efficient approximation procedure based on the Nystr"om method.
It yields sufficient conditions on the subsample size to obtain the standard $n^{-1/2}$ rate.
We discuss applications of this result for the approximation of the maximum mean discrepancy and quadrature rules.
arXiv Detail & Related papers (2022-01-31T08:26:06Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - A Note on Optimizing Distributions using Kernel Mean Embeddings [94.96262888797257]
Kernel mean embeddings represent probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space.
We show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense.
We provide algorithms to optimize such distributions in the finite-sample setting.
arXiv Detail & Related papers (2021-06-18T08:33:45Z) - Acceleration of the kernel herding algorithm by improved gradient
approximation [6.802401545890963]
Kernel herding is a method used to construct quadrature formulas in a reproducing kernel Hilbert space.
We propose two improved versions of the kernel herding algorithm.
arXiv Detail & Related papers (2021-05-17T14:32:45Z) - One-Bit Compressed Sensing via One-Shot Hard Thresholding [7.594050968868919]
The problem of 1-bit compressed sensing is to estimate a sparse signal from a few binary measurements.
We present a novel and concise analysis that moves away from the widely used non-constrained notion of width.
arXiv Detail & Related papers (2020-07-07T17:28:03Z) - Free Energy Wells and Overlap Gap Property in Sparse PCA [81.64027805404483]
We study a variant of the sparse PCA (principal component analysis) problem in the "hard" regime.
We show bounds on the depth of free energy wells for various Gibbs measures naturally associated to the problem.
We prove that the Overlap Gap Property (OGP) holds in a significant part of the hard regime.
arXiv Detail & Related papers (2020-06-18T17:18:02Z) - Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \to \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z) - Counterexamples to the Low-Degree Conjecture [80.3668228845075]
A conjecture of Hopkins posits that for certain high-dimensional hypothesis testing problems, no polynomial-time algorithm can outperform so-called "simple statistics".
This conjecture formalizes the beliefs surrounding a line of recent work that seeks to understand statistical-versus-computational tradeoffs.
arXiv Detail & Related papers (2020-04-17T21:08:11Z) - Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable
Neural Distribution Alignment [52.02794488304448]
We propose a new distribution alignment method based on a log-likelihood ratio statistic and normalizing flows.
We experimentally verify that minimizing the resulting objective results in domain alignment that preserves the local structure of input domains.
arXiv Detail & Related papers (2020-03-26T22:10:04Z) - RFN: A Random-Feature Based Newton Method for Empirical Risk
Minimization in Reproducing Kernel Hilbert Spaces [14.924672048447334]
Large-scale finite-sum problems can be solved using efficient variants of Newton's method, where the Hessian is approximated via sub-samples of data.
In this paper, we observe that for this class of problems, one can naturally use kernel approximation to speed up the Newton method.
We provide a novel second-order algorithm that enjoys local superlinear convergence and global linear convergence.
arXiv Detail & Related papers (2020-02-12T01:14:44Z)