Hyperparameter Tuning Through Pessimistic Bilevel Optimization
- URL: http://arxiv.org/abs/2412.03666v1
- Date: Wed, 04 Dec 2024 19:01:20 GMT
- Title: Hyperparameter Tuning Through Pessimistic Bilevel Optimization
- Authors: Meltem Apaydin Ustun, Liang Xu, Bo Zeng, Xiaoning Qian,
- Abstract summary: We develop a novel relaxation-based approximation method to solve the computationally challenging pessimistic bilevel optimization problem.
pessimistic solutions have demonstrated better prediction performances than optimistic counterparts when we have limited training data or perturbed testing data.
- Score: 20.353504462492992
- License:
- Abstract: Automated hyperparameter search in machine learning, especially for deep learning models, is typically formulated as a bilevel optimization problem, with hyperparameter values determined by the upper level and the model learning achieved by the lower-level problem. Most of the existing bilevel optimization solutions either assume the uniqueness of the optimal training model given hyperparameters or adopt an optimistic view when the non-uniqueness issue emerges. Potential model uncertainty may arise when training complex models with limited data, especially when the uniqueness assumption is violated. Thus, the suitability of the optimistic view underlying current bilevel hyperparameter optimization solutions is questionable. In this paper, we propose pessimistic bilevel hyperparameter optimization to assure appropriate outer-level hyperparameters to better generalize the inner-level learned models, by explicitly incorporating potential uncertainty of the inner-level solution set. To solve the resulting computationally challenging pessimistic bilevel optimization problem, we develop a novel relaxation-based approximation method. It derives pessimistic solutions with more robust prediction models. In our empirical studies of automated hyperparameter search for binary linear classifiers, pessimistic solutions have demonstrated better prediction performances than optimistic counterparts when we have limited training data or perturbed testing data, showing the necessity of considering pessimistic solutions besides existing optimistic ones.
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z) - A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO)
We introduce a novel perspective by turning a given BLO problem into a ii optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - End-to-End Learning for Fair Multiobjective Optimization Under
Uncertainty [55.04219793298687]
The Predict-Then-Forecast (PtO) paradigm in machine learning aims to maximize downstream decision quality.
This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives.
It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z) - Decision-focused predictions via pessimistic bilevel optimization: a computational study [0.7499722271664147]
Uncertainty in optimization parameters is an important and longstanding challenge.
We build predictive models that measure a emphregret measure on decisions taken with them.
We show various computational techniques to achieve tractability.
arXiv Detail & Related papers (2023-12-29T15:05:00Z) - Predict-Then-Optimize by Proxy: Learning Joint Models of Prediction and
Optimization [59.386153202037086]
Predict-Then- framework uses machine learning models to predict unknown parameters of an optimization problem from features before solving.
This approach can be inefficient and requires handcrafted, problem-specific rules for backpropagation through the optimization step.
This paper proposes an alternative method, in which optimal solutions are learned directly from the observable features by predictive models.
arXiv Detail & Related papers (2023-11-22T01:32:06Z) - Fine-Tuning Adaptive Stochastic Optimizers: Determining the Optimal Hyperparameter $ε$ via Gradient Magnitude Histogram Analysis [0.7366405857677226]
We introduce a new framework based on the empirical probability density function of the loss's magnitude, termed the "gradient magnitude histogram"
We propose a novel algorithm using gradient magnitude histograms to automatically estimate a refined and accurate search space for the optimal safeguard.
arXiv Detail & Related papers (2023-11-20T04:34:19Z) - Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning
Approach [6.7826352751791985]
We propose a novel pessimism-based Bayesian learning method for optimal dynamic treatment regimes in the offline setting.
We integrate the pessimism principle with Thompson sampling and Bayesian machine learning for optimizing the degree of pessimism.
We develop the computational algorithm based on variational inference, which is highly efficient and scalable.
arXiv Detail & Related papers (2022-10-26T02:14:10Z) - Automatically Learning Compact Quality-aware Surrogates for Optimization
Problems [55.94450542785096]
Solving optimization problems with unknown parameters requires learning a predictive model to predict the values of the unknown parameters and then solving the problem using these values.
Recent work has shown that including the optimization problem as a layer in a complex training model pipeline results in predictions of iteration of unobserved decision making.
We show that we can improve solution quality by learning a low-dimensional surrogate model of a large optimization problem.
arXiv Detail & Related papers (2020-06-18T19:11:54Z) - Bilevel Optimization for Differentially Private Optimization in Energy
Systems [53.806512366696275]
This paper studies how to apply differential privacy to constrained optimization problems whose inputs are sensitive.
The paper shows that, under a natural assumption, a bilevel model can be solved efficiently for large-scale nonlinear optimization problems.
arXiv Detail & Related papers (2020-01-26T20:15:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.