GLISp-r: A preference-based optimization algorithm with convergence
guarantees
- URL: http://arxiv.org/abs/2202.01125v2
- Date: Mon, 2 Oct 2023 08:39:18 GMT
- Title: GLISp-r: A preference-based optimization algorithm with convergence
guarantees
- Authors: Davide Previtali, Mirko Mazzoleni, Antonio Ferramosca, Fabio Previdi
- Abstract summary: We propose an extension of a preference-based optimization procedure called GLISp-r.
In GLISp-r, we propose a different criterion, inspired by MSRS, for selecting new candidate samples.
Compared to GLISp, GLISp-r is less likely to get stuck on local optima of the preference-based optimization problem.
- Score: 2.517173388598129
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Preference-based optimization algorithms are iterative procedures that seek
the optimal calibration of a decision vector based only on comparisons between
pairs of different tunings. At each iteration, a human decision-maker
expresses a preference between two calibrations (samples), highlighting which
one, if any, is better than the other. The optimization procedure must use the
observed preferences to find the tuning of the decision vector that is most
preferred by the decision-maker, while also minimizing the number of
comparisons. In this work, we formulate the preference-based optimization
problem from a utility theory perspective. Then, we propose GLISp-r, an
extension of a recent preference-based optimization procedure called GLISp. The
latter uses a Radial Basis Function surrogate to describe the tastes of the
decision-maker. Iteratively, GLISp proposes new samples to compare with the
best calibration available by trading off exploitation of the surrogate model
and exploration of the decision space. In GLISp-r, we propose a different
criterion to use when looking for new candidate samples that is inspired by
MSRS, a popular procedure in the black-box optimization framework. Compared to
GLISp, GLISp-r is less likely to get stuck on local optima of the
preference-based optimization problem. We motivate this claim theoretically,
with a proof of global convergence, and empirically, by comparing the
performance of GLISp and GLISp-r on several benchmark optimization problems.
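To make the loop concrete, below is a minimal, illustrative sketch of a GLISp-style preference-based loop: an RBF surrogate is fitted so that its values respect the observed pairwise preferences, and new candidates are scored by an MSRS-inspired convex combination of the rescaled surrogate value (exploitation) and a distance-based term (exploration). All names (rbf_matrix, fit_surrogate, next_sample), the squared-hinge preference loss, the inverse-quadratic RBF, the weight schedule, and the synthetic preference oracle are assumptions made for readability; the actual GLISp/GLISp-r procedures solve a constrained surrogate-fitting problem and use a more refined acquisition criterion.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import cdist


def rbf_matrix(X, centers, eps=1.0):
    """Inverse-quadratic RBF features between the rows of X and the sample centers."""
    D = cdist(X, centers)
    return 1.0 / (1.0 + (eps * D) ** 2)


def fit_surrogate(X, prefs, eps=1.0, reg=1e-3):
    """Fit RBF coefficients so that preferred samples receive lower surrogate values.

    prefs is a list of (i, j) pairs meaning "sample i is preferred to sample j".
    A squared-hinge loss on pairwise surrogate differences stands in for the
    constrained fit used by GLISp (a simplification).
    """
    Phi = rbf_matrix(X, X, eps)

    def loss(beta):
        f = Phi @ beta
        margins = np.array([f[i] - f[j] + 1.0 for i, j in prefs])  # want f[i] < f[j]
        return np.sum(np.maximum(margins, 0.0) ** 2) + reg * beta @ beta

    return minimize(loss, np.zeros(X.shape[0]), method="BFGS").x


def next_sample(X, beta, bounds, delta, n_candidates=2000, eps=1.0, seed=None):
    """MSRS-inspired acquisition: score uniformly drawn candidates by a convex
    combination of the rescaled surrogate value (exploitation) and the rescaled
    distance to the closest evaluated sample (exploration); return the best one."""
    rng = np.random.default_rng(seed)
    lb, ub = bounds
    C = rng.uniform(lb, ub, size=(n_candidates, X.shape[1]))
    f = rbf_matrix(C, X, eps) @ beta
    d = cdist(C, X).min(axis=1)
    f_scaled = (f - f.min()) / (f.max() - f.min() + 1e-12)        # 0 = lowest (best) surrogate value
    d_scaled = 1.0 - (d - d.min()) / (d.max() - d.min() + 1e-12)  # 0 = farthest from evaluated samples
    score = delta * f_scaled + (1.0 - delta) * d_scaled
    return C[np.argmin(score)]


if __name__ == "__main__":
    # The human decision-maker is replaced by a synthetic preference oracle.
    true_cost = lambda x: np.sum((x - 0.3) ** 2)   # hidden "taste" of the decision-maker
    bounds = (np.zeros(2), np.ones(2))
    rng = np.random.default_rng(0)
    X = rng.uniform(bounds[0], bounds[1], size=(4, 2))
    prefs, best = [], int(np.argmin([true_cost(x) for x in X]))

    for k in range(20):
        beta = fit_surrogate(X, prefs)
        # Cycle the exploitation/exploration weight (an assumption loosely
        # mimicking the rotation of trade-off values in GLISp-r).
        delta = [0.95, 0.7, 0.35, 0.0][k % 4]
        X = np.vstack([X, next_sample(X, beta, bounds, delta, seed=k)])
        j = len(X) - 1
        # One preference query per iteration: new sample vs. current incumbent.
        if true_cost(X[j]) < true_cost(X[best]):
            prefs.append((j, best))
            best = j
        else:
            prefs.append((best, j))

    print("most preferred calibration found:", X[best])
```

In an actual deployment the synthetic oracle above would be replaced by queries to the human decision-maker, so each iteration costs exactly one comparison between the new sample and the incumbent best calibration.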
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to capture potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
- Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z)
- Compact Optimality Verification for Optimization Proxies [15.761737742798157]
Recent years have witnessed increasing interest in machine learning models that approximate the input-output mapping of parametric optimization problems.
The paper proposes a compact formulation for optimality verification that brings substantial computational benefits.
arXiv Detail & Related papers (2024-05-31T17:11:39Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- Principled Preferential Bayesian Optimization [22.269732173306192]
We study the problem of preferential Bayesian optimization (BO), where we aim to optimize a black-box function with only preference feedback over a pair of candidate solutions.
An optimistic algorithm with an efficient computational method is then developed to solve the problem.
arXiv Detail & Related papers (2024-02-08T02:57:47Z)
- Towards Efficient Exact Optimization of Language Model Alignment [93.39181634597877]
Direct preference optimization (DPO) was proposed to directly optimize the policy from preference data.
We show that DPO, derived from the optimal solution of this problem, leads to a compromised mean-seeking approximation of the optimal solution in practice.
We propose efficient exact optimization (EXO) of the alignment objective.
arXiv Detail & Related papers (2024-02-01T18:51:54Z)
- Dual-Directed Algorithm Design for Efficient Pure Exploration [11.492736493413103]
We consider pure-exploration problems in the context of sequential adaptive experiments with a finite set of alternative options.
We derive a sufficient condition for optimality in terms of a notion of strong convergence to the optimal allocation of samples.
Our algorithm is optimal for $\epsilon$-best-arm identification and thresholding bandit problems.
arXiv Detail & Related papers (2023-10-30T07:29:17Z)
- An Empirical Evaluation of Zeroth-Order Optimization Methods on AI-driven Molecule Optimization [78.36413169647408]
We study the effectiveness of various ZO optimization methods for optimizing molecular objectives.
We show the advantages of ZO sign-based gradient descent (ZO-signGD).
We demonstrate the potential effectiveness of ZO optimization methods on widely used benchmark tasks from the Guacamol suite.
arXiv Detail & Related papers (2022-10-27T01:58:10Z)
- Optimizer Amalgamation [124.33523126363728]
We are motivated to study a new problem named Optimizer Amalgamation: how can we best combine a pool of "teacher" optimizers into a single "student" optimizer that can have stronger problem-specific performance?
First, we define three differentiable mechanisms to amalgamate a pool of analytical optimizers by gradient descent.
To reduce the variance of the amalgamation process, we also explore methods to stabilize it by perturbing the target.
arXiv Detail & Related papers (2022-03-12T16:07:57Z)
- A unified surrogate-based scheme for black-box and preference-based optimization [2.561649173827544]
We show that black-box and preference-based optimization problems are closely related and can be solved using the same family of approaches.
We propose the generalized Metric Response Surface (gMRS) algorithm, an optimization scheme that generalizes the popular MSRS framework; a sketch of the MSRS-type scoring rule is given after this list.
arXiv Detail & Related papers (2022-02-03T08:47:54Z)
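As a closing pointer for both the gMRS scheme above and the MSRS-inspired criterion adopted in GLISp-r, a hedged rendering of the MSRS-type weighted score is the following (the min-max scaling and the symbols $\hat f$, $d$, $\delta$ are illustrative, not the papers' exact definitions):

$$
s(x) \;=\; \delta \, \frac{\hat f(x) - \min_{z} \hat f(z)}{\max_{z} \hat f(z) - \min_{z} \hat f(z)} \;+\; (1 - \delta) \left( 1 - \frac{d(x) - \min_{z} d(z)}{\max_{z} d(z) - \min_{z} d(z)} \right), \qquad \delta \in [0, 1],
$$

where $\hat f$ is the surrogate, $d(x)$ is the distance from $x$ to the closest already-evaluated sample, and the minima and maxima run over the current pool of candidates $z$; the next sample is the candidate with the lowest score $s(x)$, so $\delta$ close to 1 exploits the surrogate while $\delta$ close to 0 explores the decision space.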