Stability and Generalization of Bilevel Programming in Hyperparameter
Optimization
- URL: http://arxiv.org/abs/2106.04188v1
- Date: Tue, 8 Jun 2021 08:59:55 GMT
- Title: Stability and Generalization of Bilevel Programming in Hyperparameter
Optimization
- Authors: Fan Bao, Guoqiang Wu, Chongxuan Li, Jun Zhu, Bo Zhang
- Abstract summary: We show that, from a theoretical perspective, gradient-based algorithms can be better than cross-validation under certain conditions.
We prove that regularization terms in both the outer and inner levels can relieve the overfitting problem in gradient-based algorithms.
- Score: 38.93716746097571
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the (gradient-based) bilevel programming framework has been
widely used in hyperparameter optimization and has achieved excellent performance
empirically. Previous theoretical work mainly focuses on its optimization
properties, while leaving the analysis of generalization largely open. This
paper attempts to address the issue by presenting an expectation bound w.r.t.
the validation set based on uniform stability. Our results can explain some
mysterious behaviours of the bilevel programming in practice, for instance,
overfitting to the validation set. We also present an expectation bound for the
classical cross-validation algorithm. Our results suggest that gradient-based
algorithms can, under certain conditions, be better than cross-validation from a
theoretical perspective. Furthermore, we prove that regularization terms in
both the outer and inner levels can relieve the overfitting problem in
gradient-based algorithms. In experiments on feature learning and data
reweighting for noisy labels, we corroborate our theoretical findings.
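As a concrete illustration of the bilevel setup analyzed above, here is a minimal, self-contained sketch (not the paper's algorithm; the data, sizes, and step sizes are toy assumptions) that tunes a single inner-level regularization weight by gradient descent on the validation loss, with the hypergradient obtained from the implicit function theorem for a ridge-regression inner problem:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy splits: the inner level fits on training data; the outer level
# evaluates on validation data (the set the expectation bound is w.r.t.).
X_tr, y_tr = rng.normal(size=(40, 5)), rng.normal(size=40)
X_val, y_val = rng.normal(size=(20, 5)), rng.normal(size=20)


def inner_solution(lam):
    """Closed-form inner minimizer of ||X_tr w - y_tr||^2 + lam * ||w||^2."""
    A = X_tr.T @ X_tr + lam * np.eye(X_tr.shape[1])
    return np.linalg.solve(A, X_tr.T @ y_tr), A


def val_loss(lam):
    """Outer objective: squared error of the inner solution on validation data."""
    w, _ = inner_solution(lam)
    r = X_val @ w - y_val
    return r @ r


def hypergradient(lam):
    """d L_val / d lam via the implicit function theorem:
    differentiating A w = X_tr' y_tr gives dw/dlam = -A^{-1} w."""
    w, A = inner_solution(lam)
    dw = -np.linalg.solve(A, w)
    return 2.0 * (X_val @ w - y_val) @ (X_val @ dw)


# Outer-level gradient descent on the hyperparameter lam.
lam, lr = 1.0, 0.01
for _ in range(500):
    lam = max(lam - lr * hypergradient(lam), 1e-6)  # keep lam feasible
```

Because the outer objective is evaluated on a finite validation set, running many such outer steps can overfit the validation data, which is exactly the behaviour the stability analysis above aims to quantify; an additional outer-level regularizer on lam would be the corresponding remedy.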
Related papers
- Understanding the Generalization of Bilevel Programming in Hyperparameter Optimization: A Tale of Bias-Variance Decomposition [53.68517860700599]
We propose an ensemble hypergradient strategy to effectively reduce the variance in HPO algorithms.
We also establish a connection between excess error and hypergradient estimation, offering some understanding of empirical observations.
arXiv Detail & Related papers (2026-02-20T02:52:13Z)
- Principled Algorithms for Optimizing Generalized Metrics in Binary Classification [53.604375124674796]
We introduce principled algorithms for optimizing generalized metrics, supported by $H$-consistency and finite-sample generalization bounds.
Our approach reformulates metric optimization as a generalized cost-sensitive learning problem.
We develop new algorithms, METRO, with strong theoretical performance guarantees.
arXiv Detail & Related papers (2025-12-29T01:33:42Z)
- Learning Theory for Kernel Bilevel Optimization [25.28618481877551]
We investigate the generalization properties for kernel bilevel optimization problems where the inner objective is optimized over a Reproducing Kernel Hilbert Space.
We establish novel generalization error bounds for the bilevel problem under finite-sample approximation.
These generalization error estimates allow us to characterize the statistical accuracy of gradient-based methods applied to the empirical discretization of the bilevel problem.
arXiv Detail & Related papers (2025-02-12T14:52:04Z)
- Bilevel Learning with Inexact Stochastic Gradients [2.247833425312671]
Bilevel learning has gained prominence in machine learning, inverse problems, and imaging applications.
The large-scale nature of these problems has led to the development of inexact and computationally efficient methods.
arXiv Detail & Related papers (2024-12-16T18:18:47Z)
- Scalable Bayesian Meta-Learning through Generalized Implicit Gradients [64.21628447579772]
The implicit Bayesian meta-learning (iBaML) method not only broadens the scope of learnable priors, but also quantifies the associated uncertainty.
Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one.
arXiv Detail & Related papers (2023-03-31T02:10:30Z)
- High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z)
- Distributed Stochastic Optimization under a General Variance Condition [13.911633636387059]
Distributed optimization has drawn great attention recently due to its effectiveness in solving large-scale machine learning problems.
We revisit the classical Federated Averaging (FedAvg) algorithm and establish convergence results under only a mild variance condition for smooth nonconvex objective functions.
Almost sure convergence to a stationary point is also established under the same condition on the gradients.
arXiv Detail & Related papers (2023-01-30T05:48:09Z)
- Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows that these algorithms match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z)
- Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded by the 'complexity' of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z)
- Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent [20.47598828422897]
We propose Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order gradient descent methods.
Our approach modifies the objective function by adding a regularization term, and casts the joint updating of the parameters and the learning rate as a single optimization process.
arXiv Detail & Related papers (2021-04-12T13:13:34Z)
- Learning Prediction Intervals for Regression: Generalization and Calibration [12.576284277353606]
We study the generation of prediction intervals in regression for uncertainty quantification.
We use a general learning theory to characterize the optimality-feasibility tradeoff that encompasses Lipschitz continuity and VC-subgraph classes.
We empirically demonstrate the strengths of our interval generation and calibration algorithms in terms of testing performances compared to existing benchmarks.
arXiv Detail & Related papers (2021-02-26T17:55:30Z)
- Recent Theoretical Advances in Non-Convex Optimization [56.88981258425256]
Motivated by recent increased interest in the analysis of optimization algorithms for non-convex optimization in deep networks and other data-driven problems, we give an overview of recent theoretical results on optimization algorithms for non-convex optimization.
arXiv Detail & Related papers (2020-12-11T08:28:51Z)
- Convergence Properties of Stochastic Hypergradients [38.64355126221992]
We study approximation schemes for the hypergradient, which are important when the lower-level problem is the empirical risk over a large dataset.
We provide numerical experiments to support our theoretical analysis and to show the advantage of using hypergradients in practice.
arXiv Detail & Related papers (2020-11-13T20:50:36Z)
- Beyond variance reduction: Understanding the true impact of baselines on policy optimization [24.09670734037029]
We show that learning dynamics are governed by the curvature of the loss function and the noise of the gradient estimates.
We present theoretical results showing that, at least for bandit problems, curvature and noise are not sufficient to explain the learning dynamics.
arXiv Detail & Related papers (2020-08-31T17:52:09Z)
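Several of the entries above (stochastic hypergradients, amortized implicit differentiation) hinge on approximating the inverse-Hessian-vector product that appears in the implicit hypergradient. Below is a minimal sketch of one standard scheme, a truncated Neumann series, applied to a toy symmetric positive-definite matrix; the matrix, vector, and step size are illustrative assumptions, not taken from any of the listed papers:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(6, 6))
A = M.T @ M + np.eye(6)   # stand-in SPD inner-level Hessian
v = rng.normal(size=6)    # stand-in outer-level gradient


def neumann_ihvp(A, v, alpha, K=2000):
    """Approximate A^{-1} v by the truncated Neumann series
    alpha * sum_{k=0}^{K} (I - alpha * A)^k v,
    which converges whenever 0 < alpha < 2 / lambda_max(A)."""
    out = np.zeros_like(v)
    term = v.copy()
    for _ in range(K + 1):
        out += term
        term = term - alpha * (A @ term)  # apply (I - alpha * A)
    return alpha * out


alpha = 1.0 / np.linalg.eigvalsh(A).max()  # safe step size for this toy A
approx = neumann_ihvp(A, v, alpha)
exact = np.linalg.solve(A, v)
```

Each iteration needs only a Hessian-vector product, never the Hessian itself, which is why such schemes scale to large inner problems where forming or inverting the Hessian explicitly is infeasible.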
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.