GLISp-r: A preference-based optimization algorithm with convergence
guarantees
- URL: http://arxiv.org/abs/2202.01125v2
- Date: Mon, 2 Oct 2023 08:39:18 GMT
- Title: GLISp-r: A preference-based optimization algorithm with convergence
guarantees
- Authors: Davide Previtali, Mirko Mazzoleni, Antonio Ferramosca, Fabio Previdi
- Abstract summary: We propose an extension of a preference-based optimization procedure called GLISp-r.
In GLISp-r, we propose a different criterion, inspired by MSRS, for selecting new candidate samples.
Compared to GLISp, GLISp-r is less likely to get stuck on local optima of the preference-based optimization problem.
- Score: 2.517173388598129
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Preference-based optimization algorithms are iterative procedures that seek
the optimal calibration of a decision vector based only on comparisons between
pairs of different tunings. At each iteration, a human decision-maker
expresses a preference between two calibrations (samples), highlighting which
one, if any, is better than the other. The optimization procedure must use the
observed preferences to find the tuning of the decision vector that is most
preferred by the decision-maker, while also minimizing the number of
comparisons. In this work, we formulate the preference-based optimization
problem from a utility theory perspective. Then, we propose GLISp-r, an
extension of a recent preference-based optimization procedure called GLISp. The
latter uses a Radial Basis Function surrogate to describe the tastes of the
decision-maker. Iteratively, GLISp proposes new samples to compare with the
best calibration available by trading off exploitation of the surrogate model
and exploration of the decision space. In GLISp-r, we propose a different
criterion to use when looking for new candidate samples that is inspired by
MSRS, a popular procedure in the black-box optimization framework. Compared to
GLISp, GLISp-r is less likely to get stuck on local optima of the
preference-based optimization problem. We motivate this claim theoretically,
with a proof of global convergence, and empirically, by comparing the
performance of GLISp and GLISp-r on several benchmark optimization problems.
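To make the loop concrete, below is a minimal, illustrative sketch of a GLISp-style preference-based loop: an RBF surrogate is fitted so that its values respect the observed pairwise preferences, and new candidates are scored by an MSRS-inspired convex combination of the rescaled surrogate value (exploitation) and a distance-based term (exploration). All names (rbf_matrix, fit_surrogate, next_sample), the squared-hinge preference loss, the inverse-quadratic RBF, the weight schedule, and the synthetic preference oracle are assumptions made for readability; the actual GLISp/GLISp-r procedures solve a constrained surrogate-fitting problem and use a more refined acquisition criterion.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist


def rbf_matrix(X, centers, eps=1.0):
    """Inverse-quadratic RBF features between the rows of X and the sample centers."""
    D = cdist(X, centers)
    return 1.0 / (1.0 + (eps * D) ** 2)


def fit_surrogate(X, prefs, eps=1.0, reg=1e-3):
    """Fit RBF coefficients so that preferred samples receive lower surrogate values.

    prefs is a list of (i, j) pairs meaning "sample i is preferred to sample j".
    A squared-hinge loss on pairwise surrogate differences stands in for the
    constrained fit used by GLISp (a simplification).
    """
    Phi = rbf_matrix(X, X, eps)

    def loss(beta):
        f = Phi @ beta
        margins = np.array([f[i] - f[j] + 1.0 for i, j in prefs])  # want f[i] < f[j]
        return np.sum(np.maximum(margins, 0.0) ** 2) + reg * beta @ beta

    return minimize(loss, np.zeros(X.shape[0]), method="BFGS").x


def next_sample(X, beta, bounds, delta, n_candidates=2000, eps=1.0, seed=None):
    """MSRS-inspired acquisition: score uniformly drawn candidates by a convex
    combination of the rescaled surrogate value (exploitation) and the rescaled
    distance to the closest evaluated sample (exploration); return the best one."""
    rng = np.random.default_rng(seed)
    lb, ub = bounds
    C = rng.uniform(lb, ub, size=(n_candidates, X.shape[1]))
    f = rbf_matrix(C, X, eps) @ beta
    d = cdist(C, X).min(axis=1)
    f_scaled = (f - f.min()) / (f.max() - f.min() + 1e-12)        # 0 = lowest (best) surrogate value
    d_scaled = 1.0 - (d - d.min()) / (d.max() - d.min() + 1e-12)  # 0 = farthest from evaluated samples
    score = delta * f_scaled + (1.0 - delta) * d_scaled
    return C[np.argmin(score)]


if __name__ == "__main__":
    # The human decision-maker is replaced by a synthetic preference oracle.
    true_cost = lambda x: np.sum((x - 0.3) ** 2)   # hidden "taste" of the decision-maker
    bounds = (np.zeros(2), np.ones(2))
    rng = np.random.default_rng(0)
    X = rng.uniform(bounds[0], bounds[1], size=(4, 2))
    prefs, best = [], int(np.argmin([true_cost(x) for x in X]))

    for k in range(20):
        beta = fit_surrogate(X, prefs)
        # Cycle the exploitation/exploration weight (an assumption loosely
        # mimicking the rotation of trade-off values in GLISp-r).
        delta = [0.95, 0.7, 0.35, 0.0][k % 4]
        X = np.vstack([X, next_sample(X, beta, bounds, delta, seed=k)])
        j = len(X) - 1
        # One preference query per iteration: new sample vs. current incumbent.
        if true_cost(X[j]) < true_cost(X[best]):
            prefs.append((j, best))
            best = j
        else:
            prefs.append((best, j))

    print("most preferred calibration found:", X[best])
```

In an actual deployment the synthetic oracle above would be replaced by queries to the human decision-maker, so each iteration costs exactly one comparison between the new sample and the incumbent best calibration.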
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to capture potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
- Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z)
- Compact Optimality Verification for Optimization Proxies [15.761737742798157]
Recent years have witnessed increasing interest in machine learning models that approximate the input-output mapping of parametric optimization problems.
The paper proposes a compact formulation for optimality verification that brings substantial computational benefits.
arXiv Detail & Related papers (2024-05-31T17:11:39Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- Principled Preferential Bayesian Optimization [22.269732173306192]
We study the problem of preferential Bayesian optimization (BO), where we aim to optimize a black-box function with only preference feedback over a pair of candidate solutions.
An optimistic algorithm with an efficient computational method is then developed to solve the problem.
arXiv Detail & Related papers (2024-02-08T02:57:47Z)
- Towards Efficient Exact Optimization of Language Model Alignment [93.39181634597877]
Direct preference optimization (DPO) was proposed to directly optimize the policy from preference data.
We show that DPO, derived from the optimal solution of this problem, leads to a compromised mean-seeking approximation of the optimal solution in practice.
We propose efficient exact optimization (EXO) of the alignment objective.
arXiv Detail & Related papers (2024-02-01T18:51:54Z)
- Dual-Directed Algorithm Design for Efficient Pure Exploration [11.492736493413103]
We consider pure-exploration problems in the context of sequential adaptive experiments with a finite set of alternative options.
We derive a sufficient condition for optimality in terms of a notion of strong convergence to the optimal allocation of samples.
Our algorithm is optimal for $\epsilon$-best-arm identification and thresholding bandit problems.
arXiv Detail & Related papers (2023-10-30T07:29:17Z)
- An Empirical Evaluation of Zeroth-Order Optimization Methods on AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various ZO optimization methods for optimizing molecular objectives.
We show the advantages of ZO sign-based gradient descent (ZO-signGD).
We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
arXiv Detail & Related papers (2022-10-27T01:58:10Z)
- Optimizer Amalgamation [124.33523126363728]
We are motivated to study a new problem named Optimizer Amalgamation: how can we best combine a pool of "teacher" optimizers into a single "student" optimizer that can have stronger problem-specific performance?
First, we define three differentiable mechanisms to amalgamate a pool of analytical optimizers by gradient descent.
To reduce the variance of the amalgamation process, we also explore methods to stabilize it by perturbing the target.
arXiv Detail & Related papers (2022-03-12T16:07:57Z)
- A unified surrogate-based scheme for black-box and preference-based optimization [2.561649173827544]
We show that black-box and preference-based optimization problems are closely related and can be solved using the same family of approaches.
We propose the generalized Metric Response Surface (gMRS) algorithm, an optimization scheme that generalizes the popular MSRS framework; a sketch of the MSRS-type scoring rule is given after this list.
arXiv Detail & Related papers (2022-02-03T08:47:54Z)
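As a closing pointer for both the gMRS scheme above and the MSRS-inspired criterion adopted in GLISp-r, a hedged rendering of the MSRS-type weighted score is the following (the min-max scaling and the symbols $\hat f$, $d$, $\delta$ are illustrative, not the papers' exact definitions):

$$
s(x) \;=\; \delta \, \frac{\hat f(x) - \min_{z} \hat f(z)}{\max_{z} \hat f(z) - \min_{z} \hat f(z)} \;+\; (1 - \delta) \left( 1 - \frac{d(x) - \min_{z} d(z)}{\max_{z} d(z) - \min_{z} d(z)} \right), \qquad \delta \in [0, 1],
$$

where $\hat f$ is the surrogate, $d(x)$ is the distance from $x$ to the closest already-evaluated sample, and the minima and maxima run over the current pool of candidates $z$; the next sample is the candidate with the lowest score $s(x)$, so $\delta$ close to 1 exploits the surrogate while $\delta$ close to 0 explores the decision space.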