Optimal Scalarizations for Sublinear Hypervolume Regret
- URL: http://arxiv.org/abs/2307.03288v2
- Date: Sun, 3 Dec 2023 00:08:15 GMT
- Title: Optimal Scalarizations for Sublinear Hypervolume Regret
- Authors: Qiuyi Zhang (Richard)
- Abstract summary: We show that hypervolume scalarizations with uniformly random weights are surprisingly optimal for provably minimizing the hypervolume regret.
We derive a novel non-Euclidean analysis that produces improved hypervolume regret bounds of $\tilde{O}(d T^{-1/2} + T^{-1/k})$.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scalarization is a general technique that can be deployed in any
multiobjective setting to reduce multiple objectives into one, such as recently
in RLHF for training reward models that align human preferences. Yet some have
dismissed this classical approach because linear scalarizations are known to
miss concave regions of the Pareto frontier. To that end, we aim to find simple
non-linear scalarizations that can explore a diverse set of $k$ objectives on
the Pareto frontier, as measured by the dominated hypervolume. We show that
hypervolume scalarizations with uniformly random weights are surprisingly
optimal for provably minimizing the hypervolume regret, achieving an optimal
sublinear regret bound of $O(T^{-1/k})$, with matching lower bounds that
preclude any algorithm from doing better asymptotically. As a theoretical case
study, we consider the multiobjective stochastic linear bandits problem and
demonstrate that by exploiting the sublinear regret bounds of the hypervolume
scalarizations, we can derive a novel non-Euclidean analysis that produces
improved hypervolume regret bounds of $\tilde{O}( d T^{-1/2} + T^{-1/k})$. We
support our theory with strong empirical results, showing that simple
hypervolume scalarizations consistently outperform both the linear and
Chebyshev scalarizations, as well as standard multiobjective algorithms in
Bayesian optimization, such as EHVI.
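The abstract's central object, the hypervolume scalarization with uniformly random weights, can be made concrete with a short sketch. The identity used below (dominated hypervolume equals a dimension-dependent constant times the expected best scalarized value, with $s_\lambda(y) = \min_i ((y_i - z_i)_+/\lambda_i)^k$) follows the companion paper "Random Hypervolume Scalarizations for Provable Multi-Objective Black Box Optimization" listed among the related papers; the Monte Carlo estimator, function name, and example points here are illustrative assumptions rather than code from either paper.

```python
import math

import numpy as np


def estimate_hypervolume(Y, ref, n_weights=10_000, seed=0):
    """Monte Carlo estimate of the hypervolume dominated by the rows of Y
    (k-dimensional objective vectors) relative to a reference point, obtained
    by averaging the hypervolume scalarization
    s_lam(y) = min_i ((y_i - ref_i)_+ / lam_i)^k
    over uniformly random weights lam on the positive unit sphere."""
    rng = np.random.default_rng(seed)
    k = Y.shape[1]
    # Uniform directions on the positive part of the unit sphere.
    lam = np.abs(rng.standard_normal((n_weights, k)))
    lam /= np.linalg.norm(lam, axis=1, keepdims=True)
    # Scalarize every point under every weight, then keep the best point per weight.
    gains = np.maximum(Y - ref, 0.0)                              # (n_points, k)
    scalarized = (gains[None, :, :] / lam[:, None, :]).min(axis=2) ** k
    best_per_weight = scalarized.max(axis=1)                      # (n_weights,)
    # Normalizing constant c_k = pi^{k/2} / (2^k * Gamma(k/2 + 1)).
    c_k = math.pi ** (k / 2) / (2 ** k * math.gamma(k / 2 + 1))
    return c_k * best_per_weight.mean()


# Example: two Pareto-optimal points in 2D; the true dominated hypervolume is 0.75.
pareto_points = np.array([[1.0, 0.5], [0.5, 1.0]])
print(estimate_hypervolume(pareto_points, ref=np.array([0.0, 0.0])))
```

Because each random weight emphasizes a different region of the Pareto frontier, maximizing the scalarized value for many independently drawn weights is what yields the diverse coverage, and the sublinear hypervolume regret, that the abstract describes.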
Related papers
- Entropic Mirror Descent for Linear Systems: Polyak's Stepsize and Implicit Bias [55.72269695392027]
This paper focuses on applying entropic mirror descent to solve linear systems.
The main challenge for the convergence analysis stems from the unboundedness of the domain.
To overcome this without imposing restrictive assumptions, we introduce a variant of Polyak-type stepsizes.
arXiv Detail & Related papers (2025-05-05T12:33:18Z) - Implicit bias of Normalized Steepest Descent in Multiclass Classification: Sign Descent, Spectral Descent, and Adam [33.082961718280245]
We characterize the implicit bias of Adam and Sign gradient descent (SignGD) in multi-class cross-entropy minimization.
We generalize our analysis to p-norm normalized steepest descent (NSD) algorithms.
A key insight is that the analysis of general entry-wise and Schatten p-norms can be reduced to the analysis of NSD with max-norm.
arXiv Detail & Related papers (2025-02-07T05:09:32Z) - Mirror Descent Under Generalized Smoothness [23.5387392871236]
We introduce a new $\ell^*$-smoothness concept that measures the norm of the Hessian in terms of a general norm and its dual.
We establish convergence for mirror-descent-type algorithms, matching the rates under the classic smoothness.
arXiv Detail & Related papers (2025-02-02T11:23:10Z) - High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise [59.25598762373543]
We establish high-probability convergence guarantees for learning from streaming data in the presence of heavy-tailed noise.
We also show, analytically and numerically, how to identify the preferred choice of setting for a given problem.
arXiv Detail & Related papers (2023-10-28T18:53:41Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic gradient Langevin dynamics is one of the most fundamental algorithms for solving non-convex optimization problems.
In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models [0.2741266294612776]
We analyze a class of gradient algorithms with momentum on a high-dimensional random least squares problem.
We show that (small-batch) momentum with a fixed momentum parameter provides no actual performance improvement over SGD when step sizes are adjusted correctly.
In the non-strongly convex setting, it is possible to get a large improvement over SGD using momentum.
arXiv Detail & Related papers (2021-06-07T15:08:24Z) - Last iterate convergence of SGD for Least-Squares in the Interpolation
regime [19.05750582096579]
We study the noiseless model in the fundamental least-squares setup.
We assume that an optimum predictor fits the inputs and outputs perfectly, $\langle \theta_*, \phi(X) \rangle = Y$, where $\phi(X)$ stands for a possibly infinite-dimensional non-linear feature map.
arXiv Detail & Related papers (2021-02-05T14:02:20Z) - Adaptive Online Estimation of Piecewise Polynomial Trends [23.91519151164528]
We consider the framework of non-stationary optimization with squared error losses and noisy gradient feedback.
Motivated from the theory of non-parametric regression, we introduce a new variational constraint.
We show that the same policy is minimax optimal for several other non-parametric families of interest.
arXiv Detail & Related papers (2020-09-30T19:30:28Z) - Random Hypervolume Scalarizations for Provable Multi-Objective Black Box
Optimization [8.90548944387431]
In this paper, we consider multi-objective optimization, where $f(x)$ outputs a vector of possibly competing objectives.
We show that any provably convergent single-objective optimization process can be effortlessly converted to a multi-objective optimization process with provable convergence guarantees (a minimal sketch of this conversion appears after this list).
arXiv Detail & Related papers (2020-06-08T15:00:30Z) - On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and
Non-Asymptotic Concentration [115.1954841020189]
We study the asymptotic and non-asymptotic properties of linear stochastic approximation procedures with Polyak-Ruppert averaging.
We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z)
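As a concrete illustration of the conversion mentioned in the "Random Hypervolume Scalarizations" entry above, and of the strategy the main abstract advocates, the sketch below draws a fresh uniformly random weight each round, scalarizes the objectives with the hypervolume scalarization, and hands the resulting single-objective problem to a trivial solver (exhaustive argmax over a fixed candidate pool). The function names, candidate-pool interface, and round count are assumptions made for illustration only.

```python
import numpy as np


def hv_scalarize(ys, lam, ref):
    """Hypervolume scalarization s_lam(y) = min_i ((y_i - ref_i)_+ / lam_i)^k,
    applied row-wise to a matrix of objective vectors ys."""
    k = ys.shape[1]
    return np.min(np.maximum(ys - ref, 0.0) / lam, axis=1) ** k


def random_scalarized_search(ys, candidates, ref, n_rounds=50, seed=0):
    """Sketch: each round samples a weight uniformly from the positive unit
    sphere and solves the induced single-objective problem by argmax over the
    candidate pool; the selected points together aim to cover the Pareto
    frontier as measured by dominated hypervolume."""
    rng = np.random.default_rng(seed)
    k = ys.shape[1]
    selected = []
    for _ in range(n_rounds):
        lam = np.abs(rng.standard_normal(k))
        lam /= np.linalg.norm(lam)
        selected.append(candidates[int(np.argmax(hv_scalarize(ys, lam, ref)))])
    return selected
```

In the settings studied in the main paper, the exhaustive argmax would instead be a single-objective routine such as a linear-bandit or Bayesian-optimization step applied to the scalarized objective.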
This list is automatically generated from the titles and abstracts of the papers in this site.