Optimal Posteriors for Chi-squared Divergence based PAC-Bayesian Bounds
and Comparison with KL-divergence based Optimal Posteriors and
Cross-Validation Procedure
- URL: http://arxiv.org/abs/2008.07330v1
- Date: Fri, 14 Aug 2020 03:15:23 GMT
- Title: Optimal Posteriors for Chi-squared Divergence based PAC-Bayesian Bounds
and Comparison with KL-divergence based Optimal Posteriors and
Cross-Validation Procedure
- Authors: Puja Sahu and Nandyala Hemachandra
- Abstract summary: We investigate optimal posteriors for chi-squared divergence based PAC-Bayesian bounds in terms of the nature of their distribution, scalability of computations, and test set performance.
Chi-squared divergence based posteriors have weaker bounds and worse test errors, hinting at an underlying regularization by KL-divergence based posteriors.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We investigate optimal posteriors for the recently introduced
\cite{begin2016pac} chi-squared divergence based PAC-Bayesian bounds in terms of
the nature of their distribution, scalability of computations, and test set
performance. For a
finite classifier set, we deduce bounds for three distance functions:
KL-divergence, linear and squared distances. Optimal posterior weights are
proportional to deviations of empirical risks, usually with subset support. For
a uniform prior, it is sufficient to search among posteriors on classifier
subsets ordered by these risks. We show the bound minimization for linear
distance as a convex program and obtain a closed-form expression for its
optimal posterior. The minimization for squared distance is a quasi-convex
program under a specific condition, while that for KL-divergence is a non-convex
optimization problem (a difference of convex functions). To compute such optimal
posteriors, we derive fast converging fixed point (FP) equations. We apply
these approaches to a finite set of SVM regularization parameter values to
yield stochastic SVMs with tight bounds. We perform a comprehensive performance
comparison between our optimal posteriors and known KL-divergence based
posteriors on a variety of UCI datasets with varying ranges and variances in
risk values, etc. Chi-squared divergence based posteriors have weaker bounds
and worse test errors, hinting at an underlying regularization by KL-divergence
based posteriors. Our study highlights the impact of divergence function on the
performance of PAC-Bayesian classifiers. We compare our stochastic classifiers
with a cross-validation based deterministic classifier. The latter has better
test errors, but ours are more sample robust, have quantifiable generalization
guarantees, and are computationally much faster.
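To make the finite-classifier setting above concrete, the following is a minimal Python sketch that searches over posteriors supported on risk-ordered subsets of classifiers, with weights proportional to deviations of empirical risks, and scores each candidate with a hypothetical chi-squared PAC-Bayes bound of the form B(Q) = E_Q[r] + sqrt(c * (chi^2(Q||P) + 1) / (2m)). The bound form, the constant c, and the weight parameterization are illustrative assumptions; they are not the paper's exact closed-form expression or fixed-point equations.

```python
import numpy as np

def chi2_divergence(q, p):
    """Chi-squared divergence chi^2(Q||P) = sum_h q_h^2 / p_h - 1."""
    support = q > 0
    return float(np.sum(q[support] ** 2 / p[support]) - 1.0)

def pac_bayes_bound(q, risks, p, m, c=1.0):
    """Hypothetical chi-squared PAC-Bayes bound (illustrative form only):
    B(Q) = E_Q[r] + sqrt(c * (chi^2(Q||P) + 1) / (2m))."""
    return float(q @ risks) + np.sqrt(c * (chi2_divergence(q, p) + 1.0) / (2.0 * m))

def optimal_posterior_risk_ordered(risks, m, c=1.0):
    """Search posteriors supported on the k lowest-risk classifiers, with
    weights proportional to deviations of empirical risks within the subset
    (an illustrative parameterization, not the paper's closed form).
    Returns the best candidate posterior and its bound value."""
    risks = np.asarray(risks, dtype=float)
    H = len(risks)
    p = np.full(H, 1.0 / H)              # uniform prior over the finite classifier set
    order = np.argsort(risks)            # classifiers ordered by empirical risk
    best_q, best_bound = None, np.inf
    for k in range(1, H + 1):
        idx = order[:k]
        dev = risks[idx].max() - risks[idx]   # deviations from the worst risk in the subset
        w = dev if dev.sum() > 0 else np.ones(k)
        q = np.zeros(H)
        q[idx] = w / w.sum()
        b = pac_bayes_bound(q, risks, p, m, c)
        if b < best_bound:
            best_q, best_bound = q, b
    return best_q, best_bound

# Toy usage: empirical risks of five SVMs, each trained with a different
# regularization value on a sample of size m = 1000.
risks = [0.12, 0.15, 0.18, 0.30, 0.45]
q_opt, b_opt = optimal_posterior_risk_ordered(risks, m=1000)
print("posterior weights:", np.round(q_opt, 3))
print("bound value:", round(b_opt, 3))
```

In the paper's setup, each classifier would correspond to an SVM trained with one value from a finite grid of regularization parameters, and its empirical risk would be computed on the training sample; the resulting posterior defines a stochastic SVM.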
Related papers
- Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits [12.674929126684528]
We address the problem of combinatorial semi-bandits, where a player selects among $P$ actions from the power set of a set containing $d$ base items.
We show that our approach efficiently leverages the semi-bandit feedback and outperforms bandit feedback approaches.
arXiv Detail & Related papers (2024-02-23T08:07:54Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) methods have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously allow for inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Bayesian Pseudo-Coresets via Contrastive Divergence [5.479797073162603]
We introduce a novel approach for constructing pseudo-coresets by utilizing contrastive divergence.
It eliminates the need for approximations in the pseudo-coreset construction process.
We conduct extensive experiments on multiple datasets, demonstrating its superiority over existing BPC techniques.
arXiv Detail & Related papers (2023-03-20T17:13:50Z) - Kernel-based off-policy estimation without overlap: Instance optimality
beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z) - Fully Stochastic Trust-Region Sequential Quadratic Programming for
Equality-Constrained Optimization Problems [62.83783246648714]
We propose a trust-region stochastic sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with stochastic objectives and deterministic equality constraints.
The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to utilize indefinite Hessian matrices.
arXiv Detail & Related papers (2022-11-29T05:52:17Z) - Asymptotically Unbiased Instance-wise Regularized Partial AUC
Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier.
Most of the existing methods could only optimize PAUC approximately, leading to inevitable biases that are not controllable.
We present a simpler reformulation of the PAUC problem via distributionally robust optimization of the AUC.
arXiv Detail & Related papers (2022-10-08T08:26:22Z) - Variational Refinement for Importance Sampling Using the Forward
Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z) - Bayesian Joint Chance Constrained Optimization: Approximations and
Statistical Consistency [10.20554144865699]
We focus on the question of statistical consistency of the optimal value, computed using an approximate posterior distribution.
We also prove the feasibility of the approximate optimization problem.
We also demonstrate the utility of our approach on an optimal staffing problem for an M/M/c queueing model.
arXiv Detail & Related papers (2021-06-23T07:11:39Z) - Last iterate convergence of SGD for Least-Squares in the Interpolation
regime [19.05750582096579]
We study the noiseless model in the fundamental least-squares setup.
We assume that an optimum predictor perfectly fits the inputs and outputs, $\langle \theta_*, \phi(X) \rangle = Y$, where $\phi(X)$ stands for a possibly infinite-dimensional non-linear feature map.
arXiv Detail & Related papers (2021-02-05T14:02:20Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z)