Optimal Scalarizations for Sublinear Hypervolume Regret
- URL: http://arxiv.org/abs/2307.03288v2
- Date: Sun, 3 Dec 2023 00:08:15 GMT
- Title: Optimal Scalarizations for Sublinear Hypervolume Regret
- Authors: Qiuyi Zhang (Richard)
- Abstract summary: We show that hypervolume scalarizations with uniformly random weights are surprisingly optimal for provably minimizing the hypervolume regret.
We derive a novel non-Euclidean analysis that produces improved hypervolume regret bounds of $\tilde{O}(d T^{-1/2} + T^{-1/k})$.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scalarization is a general technique that can be deployed in any
multiobjective setting to reduce multiple objectives into one, such as recently
in RLHF for training reward models that align human preferences. Yet some have
dismissed this classical approach because linear scalarizations are known to
miss concave regions of the Pareto frontier. To that end, we aim to find simple
non-linear scalarizations that can explore a diverse set of $k$ objectives on
the Pareto frontier, as measured by the dominated hypervolume. We show that
hypervolume scalarizations with uniformly random weights are surprisingly
optimal for provably minimizing the hypervolume regret, achieving an optimal
sublinear regret bound of $O(T^{-1/k})$, with matching lower bounds that
preclude any algorithm from doing better asymptotically. As a theoretical case
study, we consider the multiobjective stochastic linear bandits problem and
demonstrate that by exploiting the sublinear regret bounds of the hypervolume
scalarizations, we can derive a novel non-Euclidean analysis that produces
improved hypervolume regret bounds of $\tilde{O}( d T^{-1/2} + T^{-1/k})$. We
support our theory with strong empirical results: simple hypervolume scalarizations
consistently outperform both the linear and Chebyshev scalarizations, as well as
standard multiobjective algorithms in Bayesian optimization, such as EHVI.
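For concreteness, the three scalarizations contrasted in the abstract can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's code: the reference point is fixed at the origin and the toy objective vector is an assumption.

```python
import numpy as np

def sample_weight(k, rng):
    """Uniformly random weight on the positive unit sphere S_+^{k-1}."""
    w = np.abs(rng.standard_normal(k))
    return w / np.linalg.norm(w)

def linear_scalarization(y, w):
    return float(w @ y)

def chebyshev_scalarization(y, w):
    # Weighted Chebyshev (maximization form), reference point at the origin.
    return float(np.min(w * y))

def hypervolume_scalarization(y, w, k):
    # s_w(y) = min_i max(0, y_i / w_i)^k; its expectation over uniformly random
    # weights is proportional to the hypervolume dominated by y w.r.t. the origin.
    return float(np.min(np.maximum(y, 0.0) / w) ** k)

rng = np.random.default_rng(0)
y = np.array([0.8, 0.3])          # illustrative bi-objective value (maximization)
w = sample_weight(2, rng)
print(linear_scalarization(y, w),
      chebyshev_scalarization(y, w),
      hypervolume_scalarization(y, w, k=2))
```

Maximizing the expectation of the hypervolume scalarization over uniformly random weights is what ties per-round scalarized progress to the dominated hypervolume, which is where the $O(T^{-1/k})$ rate in the abstract comes from.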
Related papers
- Directional Smoothness and Gradient Methods: Convergence and Adaptivity [16.779513676120096]
We develop new sub-optimality bounds for gradient descent that depend on the conditioning of the objective along the path of optimization.
Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper-bounds on the objective.
We prove that the Polyak step-size and normalized GD obtain fast, path-dependent rates despite using no knowledge of the directional smoothness.
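A minimal sketch of the Polyak step-size mentioned in this entry; Python, the toy quadratic, and the iteration budget are assumptions for illustration. The step size only needs the optimal value $f^*$, not the directional smoothness.

```python
import numpy as np

def polyak_gd(f, grad, x0, f_star=0.0, iters=100):
    """Gradient descent with the Polyak step size
    eta_t = (f(x_t) - f*) / ||grad f(x_t)||^2, which requires knowing
    (or lower-bounding) the optimal value f*."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        gnorm2 = float(g @ g)
        if gnorm2 == 0.0:
            break
        x = x - (f(x) - f_star) / gnorm2 * g
    return x

# Toy quadratic with known minimum value f* = 0 (illustrative).
A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
print(polyak_gd(f, grad, x0=[1.0, 1.0]))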
arXiv Detail & Related papers (2024-03-06T22:24:05Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
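The linear-interpolation idea can be sketched as a lookahead-style wrapper around an inner optimizer. This is a hedged illustration: the inner gradient descent-ascent step, the step sizes, and the bilinear toy saddle are assumptions, not the paper's setup.

```python
import numpy as np

def lookahead(inner_step, x0, alpha=0.5, inner_iters=6, outer_iters=100):
    """Linear-interpolation ('lookahead') wrapper, a minimal sketch:
    run a few inner optimizer steps, then average back toward the anchor,
    x <- (1 - alpha) * x + alpha * T^k(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(outer_iters):
        y = x.copy()
        for _ in range(inner_iters):
            y = inner_step(y)
        x = (1 - alpha) * x + alpha * y
    return x

# Illustrative inner step: simultaneous gradient descent-ascent on the bilinear
# saddle f(u, v) = u * v, which spirals outward on its own but is pulled back
# toward the equilibrium (0, 0) by the interpolation step.
def gda_step(z, lr=0.5):
    u, v = z
    return np.array([u - lr * v, v + lr * u])

print(lookahead(gda_step, x0=[1.0, 1.0]))   # approaches [0, 0]
```

Averaging toward the anchor damps the oscillation that plain gradient descent-ascent exhibits on this saddle, which is the nonexpansive-operator intuition in the summary above.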
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Constrained Optimization via Exact Augmented Lagrangian and Randomized
Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
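As background for this entry, a classical method-of-multipliers loop for equality constraints looks as follows. This plain gradient-based sketch is only for illustration; the paper's exact augmented Lagrangian with randomized iterative sketching of the Newton system is not reproduced here.

```python
import numpy as np

def augmented_lagrangian(f_grad, c, c_jac, x0, mu=10.0, outer=20, inner=200, lr=0.01):
    """Method-of-multipliers sketch for min f(x) s.t. c(x) = 0:
    approximately minimize L(x, lam) = f(x) + lam^T c(x) + (mu/2) ||c(x)||^2
    over x, then update the multipliers lam."""
    x = np.asarray(x0, dtype=float)
    lam = np.zeros(len(c(x)))
    for _ in range(outer):
        for _ in range(inner):      # inexact primal minimization by gradient descent
            g = f_grad(x) + c_jac(x).T @ (lam + mu * c(x))
            x = x - lr * g
        lam = lam + mu * c(x)       # dual (multiplier) update
    return x, lam

# Toy problem: minimize ||x||^2 subject to x_0 + x_1 = 1 (solution x = [0.5, 0.5]).
f_grad = lambda x: 2 * x
c = lambda x: np.array([x[0] + x[1] - 1.0])
c_jac = lambda x: np.array([[1.0, 1.0]])
print(augmented_lagrangian(f_grad, c, c_jac, x0=np.zeros(2)))
```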
arXiv Detail & Related papers (2023-05-28T06:33:37Z) - Robust Implicit Regularization via Weight Normalization [6.042206709451915]
We show that weight normalization enables a robust implicit bias that persists even if the weights are initialized at practically large scale.
Experiments suggest that both the convergence speed and the robustness of the implicit bias improve dramatically when weight normalization is used.
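The reparameterization at the heart of weight normalization is a single line, w = g * v / ||v||. The small numpy sketch below is illustrative; the linear unit and the numeric values are assumptions.

```python
import numpy as np

def weight_norm_forward(v, g, x):
    """Weight normalization reparameterizes w = g * v / ||v||, so the direction
    (v) and the scale (g) of the weight vector are learned separately."""
    return g * (v / np.linalg.norm(v)) @ x

# Illustrative linear unit: the output depends on v only through its direction.
rng = np.random.default_rng(0)
x = rng.standard_normal(5)
v = rng.standard_normal(5)                 # direction parameters (any scale)
g = 2.0                                    # learned scale
print(weight_norm_forward(v, g, x))
print(weight_norm_forward(10 * v, g, x))   # same output: the scale of v is irrelevant
```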
arXiv Detail & Related papers (2023-05-09T13:38:55Z) - Optimal Online Generalized Linear Regression with Stochastic Noise and
Its Application to Heteroscedastic Bandits [88.6139446295537]
We study the problem of online generalized linear regression in the setting of a generalized linear model with possibly unbounded additive noise.
We provide a sharp analysis of the classical follow-the-regularized-leader (FTRL) algorithm to cope with the label noise.
We propose an algorithm based on FTRL to achieve the first variance-aware regret bound.
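A minimal FTRL sketch for online linear regression: with a quadratic regularizer, FTRL amounts to playing the ridge solution on all past data at each round. The toy data stream and the plain (non-variance-aware) regularizer are assumptions for illustration and do not reproduce the paper's variance-aware bound.

```python
import numpy as np

def ftrl_online_regression(X, y, lam=1.0):
    """Follow-the-regularized-leader for online linear regression:
    at round t, play theta_t = argmin_theta sum_{s<t} (x_s^T theta - y_s)^2
                               + lam * ||theta||^2."""
    d = X.shape[1]
    A = lam * np.eye(d)                  # running Gram matrix plus regularizer
    b = np.zeros(d)                      # running sum of y_s * x_s
    preds = []
    for x_t, y_t in zip(X, y):
        theta = np.linalg.solve(A, b)    # FTRL iterate from past rounds
        preds.append(x_t @ theta)
        A += np.outer(x_t, x_t)          # incorporate the new observation
        b += y_t * x_t
    return np.array(preds)

# Toy stream with additive noise (illustrative).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
theta_star = np.array([1.0, -2.0, 0.5])
y = X @ theta_star + 0.1 * rng.standard_normal(200)
print(ftrl_online_regression(X, y)[-3:])
```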
arXiv Detail & Related papers (2022-02-28T08:25:26Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
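A compact AIS sketch with unadjusted Langevin transitions in place of Metropolis-Hastings: dropping the MH correction is what makes every step differentiable, at the cost of bias. The conjugate Gaussian toy model, step size, and annealing schedule are assumptions, not the paper's setup.

```python
import numpy as np

def ais_marginal_likelihood(log_prior_grad, log_lik, log_lik_grad,
                            n_particles=2000, n_steps=50, step=0.05, rng=None):
    """Annealed importance sampling along the geometric path
    pi_beta(x) \propto prior(x) * likelihood(x)^beta, using unadjusted
    Langevin transitions (no Metropolis-Hastings correction)."""
    rng = rng or np.random.default_rng(0)
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    x = rng.standard_normal(n_particles)        # samples from the N(0, 1) prior
    log_w = np.zeros(n_particles)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        log_w += (b - b_prev) * log_lik(x)      # importance-weight increment
        grad = log_prior_grad(x) + b * log_lik_grad(x)   # Langevin step targeting pi_b
        x = x + step * grad + np.sqrt(2 * step) * rng.standard_normal(n_particles)
    return np.mean(np.exp(log_w))               # marginal-likelihood estimate

# Conjugate Gaussian toy model: prior N(0, 1), likelihood N(y = 1 | x, 1).
y_obs = 1.0
est = ais_marginal_likelihood(
    log_prior_grad=lambda x: -x,
    log_lik=lambda x: -0.5 * (y_obs - x) ** 2 - 0.5 * np.log(2 * np.pi),
    log_lik_grad=lambda x: (y_obs - x),
)
print(est, np.exp(-y_obs**2 / 4) / np.sqrt(4 * np.pi))   # estimate vs. exact value
```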
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models [0.2741266294612776]
We analyze a class of gradient algorithms with momentum on a high-dimensional random least squares problem.
We show that (small-batch) momentum with a fixed momentum parameter provides no actual performance improvement over SGD when step sizes are adjusted correctly.
In the non-strongly convex setting, it is possible to get a large improvement over SGD using momentum.
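A minimal heavy-ball SGD sketch on a random least-squares problem in the spirit of the setting above, with a fixed momentum parameter. The dimensions, learning rate, batch size, and iteration count are illustrative assumptions.

```python
import numpy as np

def sgd_momentum_least_squares(A, b, lr=0.01, beta=0.9, batch=8, iters=2000, rng=None):
    """Mini-batch SGD with fixed heavy-ball momentum on
    0.5 * ||A x - b||^2 / n:  v <- beta * v + g_t;  x <- x - lr * v."""
    rng = rng or np.random.default_rng(0)
    n, d = A.shape
    x, v = np.zeros(d), np.zeros(d)
    for _ in range(iters):
        idx = rng.integers(0, n, size=batch)           # sample a mini-batch
        g = A[idx].T @ (A[idx] @ x - b[idx]) / batch   # stochastic gradient
        v = beta * v + g
        x = x - lr * v
    return x

# Toy high-dimensional random least-squares problem (illustrative).
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 50)) / np.sqrt(500)
x_star = rng.standard_normal(50)
b = A @ x_star
print(np.linalg.norm(sgd_momentum_least_squares(A, b) - x_star))
```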
arXiv Detail & Related papers (2021-06-07T15:08:24Z) - The Role of Momentum Parameters in the Optimal Convergence of Adaptive
Polyak's Heavy-ball Methods [12.93796690939018]
We prove that the adaptive Polyak's Heavy-ball (HB) method attains an optimal individual convergence rate of $O(\frac{1}{\sqrt{t}})$.
Our new analysis shows how the HB momentum and its time-varying weight help us to achieve the acceleration in convex optimization.
arXiv Detail & Related papers (2021-02-15T02:57:14Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence against the fact that a constant step-size learns faster, albeit only up to an error floor.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
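One way to picture the trade-off above is constant-step-size SGD whose mini-batch grows over time, so the shrinking gradient variance plays the role a decaying step size usually plays. The geometric growth schedule and the toy least-squares stream below are assumptions for illustration, not the paper's exact rule.

```python
import numpy as np

def sgd_adaptive_batch(grad_sample, x0, n_data, lr=0.1, batch0=2,
                       growth=1.05, iters=200, rng=None):
    """Constant-step-size SGD with a growing mini-batch: larger batches late in
    training reduce gradient variance instead of attenuating the step size."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    batch = float(batch0)
    for _ in range(iters):
        b = min(int(np.ceil(batch)), n_data)
        idx = rng.integers(0, n_data, size=b)
        x = x - lr * grad_sample(x, idx).mean(axis=0)   # averaged mini-batch gradient
        batch *= growth                                  # enlarge the batch over time
    return x

# Toy least-squares stream (illustrative).
rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 5))
x_star = np.ones(5)
y = A @ x_star + 0.5 * rng.standard_normal(1000)
grad_sample = lambda x, idx: (A[idx] @ x - y[idx])[:, None] * A[idx]   # per-sample gradients
print(sgd_adaptive_batch(grad_sample, np.zeros(5), n_data=1000))
```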
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Random Hypervolume Scalarizations for Provable Multi-Objective Black Box
Optimization [8.90548944387431]
In this paper, we consider multi-objective optimization, where $f(x)$ outputs a vector of possibly competing objectives.
We show that any provably convergent single-objective optimization process can be effortlessly converted to a multi-objective optimization process with provable convergence guarantees.
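The conversion recipe can be sketched by wrapping a single-objective routine with a freshly sampled scalarization weight each round; here plain random search stands in for a provably convergent single-objective optimizer, and the two toy objectives and budgets are assumptions.

```python
import numpy as np

def pareto_front(Y):
    """Return the points not dominated by any other point (maximization)."""
    keep = []
    for i, y in enumerate(Y):
        dominated = any(np.all(z >= y) and np.any(z > y)
                        for j, z in enumerate(Y) if j != i)
        if not dominated:
            keep.append(i)
    return Y[keep]

def multiobjective_via_scalarization(f, dim, k, rounds=30, budget=50, rng=None):
    """Each round: draw a random weight, optimize the hypervolume-scalarized
    objective with the single-objective subroutine, and pool the winners."""
    rng = rng or np.random.default_rng(0)
    chosen = []
    for _ in range(rounds):
        w = np.abs(rng.standard_normal(k)); w /= np.linalg.norm(w)
        X = rng.uniform(0, 1, size=(budget, dim))            # random-search candidates
        scores = [np.min(np.maximum(f(x), 0.0) / w) ** k for x in X]
        chosen.append(f(X[int(np.argmax(scores))]))
    return pareto_front(np.array(chosen))

# Two toy objectives on [0, 1]: f1 = x, f2 = 1 - x (maximization, illustrative).
f = lambda x: np.array([x[0], 1.0 - x[0]])
print(multiobjective_via_scalarization(f, dim=1, k=2))
```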
arXiv Detail & Related papers (2020-06-08T15:00:30Z) - Support recovery and sup-norm convergence rates for sparse pivotal
estimation [79.13844065776928]
In high dimensional sparse regression, pivotal estimators are estimators for which the optimal regularization parameter is independent of the noise level.
We show minimax sup-norm convergence rates for non smoothed and smoothed, single task and multitask square-root Lasso-type estimators.
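The square-root Lasso referenced above replaces the squared data-fit term with its square root, which is why a single regularization level works across noise levels. Below is a proximal-gradient sketch; the step size, iteration count, regularization choice, and toy design are assumptions for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sqrt_lasso(X, y, lam=0.5, lr=0.01, iters=3000):
    """Proximal-gradient sketch of the square-root Lasso,
    min_beta ||y - X beta||_2 / sqrt(n) + lam * ||beta||_1.
    The unsquared data-fit term is what makes the optimal lam independent of
    the noise level (the 'pivotal' property)."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(iters):
        r = y - X @ beta
        norm = np.linalg.norm(r)
        if norm < 1e-12:                                    # nonsmooth at zero residual
            break
        grad = -X.T @ r / (np.sqrt(n) * norm)               # gradient of the data-fit term
        beta = soft_threshold(beta - lr * grad, lr * lam)   # prox step for the l1 term
    return beta

# Sparse toy problem (illustrative); lam is set without knowing the noise level.
rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.standard_normal((n, d))
beta_star = np.zeros(d); beta_star[:3] = [2.0, -1.5, 1.0]
y = X @ beta_star + 0.5 * rng.standard_normal(n)
print(np.round(sqrt_lasso(X, y, lam=np.sqrt(2 * np.log(d) / n)), 2)[:6])
```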
arXiv Detail & Related papers (2020-01-15T16:11:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.