Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit
- URL: http://arxiv.org/abs/2311.14003v1
- Date: Thu, 23 Nov 2023 13:38:43 GMT
- Title: Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit
- Authors: Tian Huang, Ke Li
- Abstract summary: We propose a method that sidesteps the need for calculating the fitness function, relying solely on human feedback.
Our proposed approach entails conducting direct preference learning facilitated by an active dueling bandit algorithm.
This research presents a novel interactive preference-based MOEA framework that not only addresses the limitations of traditional techniques but also unveils new possibilities for optimization problems.
- Score: 6.434590883720791
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimization problems find widespread use in both single-objective and
multi-objective scenarios. In practical applications, users seek
solutions that converge to the region of interest (ROI) along the Pareto front
(PF). While the conventional approach involves approximating a fitness function
or an objective function to reflect user preferences, this paper explores an
alternative avenue. Specifically, we aim to discover a method that sidesteps
the need for calculating the fitness function, relying solely on human
feedback. Our proposed approach entails conducting direct preference learning
facilitated by an active dueling bandit algorithm. The experimental phase is
structured into three sessions. Firstly, we assess the performance of our
active dueling bandit algorithm. Secondly, we implement our proposed method
within the context of Multi-objective Evolutionary Algorithms (MOEAs). Finally,
we deploy our method in a practical problem, specifically in protein structure
prediction (PSP). This research presents a novel interactive preference-based
MOEA framework that not only addresses the limitations of traditional
techniques but also unveils new possibilities for optimization problems.
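The abstract describes the mechanism only at a high level. As a rough, non-authoritative illustration of what direct preference learning with an active dueling bandit can look like, the Python sketch below runs a simplified RUCB-style consultation loop over a toy Pareto front. The `simulated_human` oracle stands in for real interactive feedback, and every name, parameter, and design choice here is an assumption, not the authors' implementation.

```python
import math
import random

def simulated_human(a, b, utility, noise=1.0):
    # Bradley-Terry-style noisy preference: P(prefer a over b).
    p = 1.0 / (1.0 + math.exp(-(utility(a) - utility(b)) / noise))
    return random.random() < p

def dueling_bandit_select(candidates, ask, budget=300, alpha=0.6):
    """Simplified RUCB-style loop: actively pick informative duels,
    record pairwise wins, and return the empirically preferred index."""
    n = len(candidates)
    wins = [[0] * n for _ in range(n)]  # wins[i][j]: times i beat j

    def ucb(i, j, t):
        if i == j:
            return 0.5
        m = wins[i][j] + wins[j][i]
        mean = (wins[i][j] + 1) / (m + 2)            # smoothed win rate
        return mean + math.sqrt(alpha * math.log(t) / (m + 1))

    for t in range(2, budget + 2):
        # Champion: a candidate plausibly preferred to every other one.
        champs = [i for i in range(n)
                  if all(ucb(i, j, t) >= 0.5 for j in range(n))]
        c = random.choice(champs) if champs else random.randrange(n)
        # Challenger: the rival the champion is least surely ahead of.
        d = max((j for j in range(n) if j != c), key=lambda j: ucb(j, c, t))
        if ask(candidates[c], candidates[d]):
            wins[c][d] += 1
        else:
            wins[d][c] += 1

    # Return the empirical Copeland winner (most pairwise majorities).
    def beats(i, j):
        m = wins[i][j] + wins[j][i]
        return m > 0 and wins[i][j] > wins[j][i]
    return max(range(n), key=lambda i: sum(beats(i, j) for j in range(n)))

# Toy usage: the hidden utility encodes a user's region of interest on a
# 2-objective front; in the interactive MOEA setting, `candidates` would
# be the current nondominated set and `ask` a real human query.
front = [(k / 10, 1 - k / 10) for k in range(11)]
roi_utility = lambda s: -abs(s[0] - 0.3) - abs(s[1] - 0.7)
winner = dueling_bandit_select(front, lambda a, b: simulated_human(a, b, roi_utility))
print("Steer the MOEA toward:", front[winner])
```

The point of the design, as the abstract frames it, is that the loop never evaluates a fitness function: it only issues pairwise queries, which is exactly the kind of feedback a human consultant can supply.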
Related papers
- Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment [74.25832963097658]
Multi-Objective Alignment (MOA) aims to align responses with multiple human preference objectives.
We find that DPO-based MOA approaches suffer from widespread preference conflicts in the data.
arXiv Detail & Related papers (2025-02-20T08:27:00Z)
- Provably Efficient Multi-Objective Bandit Algorithms under Preference-Centric Customization [24.533662423325943]
We study a preference-aware MO-MAB framework in the presence of explicit user preferences.
This is the first theoretical study of customized MO-MAB optimization with explicit user preferences.
arXiv Detail & Related papers (2025-02-19T06:06:13Z)
- Online Clustering of Dueling Bandits [59.09590979404303]
We introduce the first "clustering of dueling bandit algorithms" to enable collaborative decision-making based on preference feedback.
We propose two novel algorithms: (1) Clustering of Linear Dueling Bandits (COLDB) which models the user reward functions as linear functions of the context vectors, and (2) Clustering of Neural Dueling Bandits (CONDB) which uses a neural network to model complex, non-linear user reward functions.
arXiv Detail & Related papers (2025-02-04T07:55:41Z)
- UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on the Upper Confidence Bound (UCB) to efficiently search for the most promising weight vectors during different stages of the learning process (a hedged sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-05-01T09:34:42Z)
- Analyzing and Overcoming Local Optima in Complex Multi-Objective Optimization by Decomposition-Based Evolutionary Algorithms [5.153202024713228]
Decomposition-based Multi-objective Evolutionary Algorithms (MOEA/Ds) often converge to local optima, limiting solution diversity.
We introduce an innovative reference point (RP) selection strategy, the Vector-Guided Weight-Hybrid method, designed to overcome the local optima issue.
Our research comprises two main experimental components: an ablation study involving 14 algorithms within the MOEA/D framework, published from 2014 to 2022, to validate our theoretical framework, and a series of empirical tests to evaluate the effectiveness of our proposed method against both traditional and cutting-edge alternatives.
arXiv Detail & Related papers (2024-04-12T14:29:45Z)
- Preference Inference from Demonstration in Multi-objective Multi-agent Decision Making [0.0]
We propose an algorithm to infer linear preference weights from either optimal or near-optimal demonstrations.
Empirical results demonstrate significant improvements compared to the baseline algorithms.
In future work, we plan to evaluate the algorithm's effectiveness in a multi-agent system.
arXiv Detail & Related papers (2023-04-27T12:19:28Z)
- Pareto Set Learning for Neural Multi-objective Combinatorial Optimization [6.091096843566857]
Multi-objective combinatorial optimization (MOCO) problems can be found in many real-world applications.
We develop a learning-based approach to approximate the whole Pareto set for a given MOCO problem without any further search procedure.
Our proposed method significantly outperforms several other methods on the multi-objective traveling salesman problem, multi-objective vehicle routing problem, and multi-objective knapsack problem in terms of solution quality, speed, and model efficiency.
arXiv Detail & Related papers (2022-03-29T09:26:22Z)
- Learning Proximal Operators to Discover Multiple Optima [66.98045013486794]
We present an end-to-end method to learn the proximal operator across a family of non-convex problems.
We show that for weakly regularized objectives and under mild conditions, the method converges globally.
arXiv Detail & Related papers (2022-01-28T05:53:28Z)
- RoMA: Robust Model Adaptation for Offline Model-based Optimization [115.02677045518692]
We consider the problem of searching for an input that maximizes a black-box objective function, given a static dataset of input-output queries.
A popular approach to solving this problem is maintaining a proxy model that approximates the true objective function.
Here, the main challenge is how to avoid adversarially optimized inputs during the search.
arXiv Detail & Related papers (2021-10-27T05:37:12Z)
- Batched Data-Driven Evolutionary Multi-Objective Optimization Based on Manifold Interpolation [6.560512252982714]
We propose a framework for implementing batched data-driven evolutionary multi-objective optimization.
The framework is general enough that any off-the-shelf evolutionary multi-objective optimization algorithm can be applied in a plug-in manner.
Our proposed framework features faster convergence and stronger resilience to various PF shapes.
arXiv Detail & Related papers (2021-09-12T23:54:26Z)
- Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelope value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z)
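As with the sketch above, the following is illustrative rather than taken from any of the cited papers: it shows one plausible reading of the UCB-driven utility-function search in the MORL entry earlier in this list, treating each candidate weight vector as a bandit arm and letting UCB1 decide which scalarization to evaluate next. `evaluate_policy` is a hypothetical stand-in for a short training-and-evaluation run.

```python
import math
import random

def evaluate_policy(w):
    # Hypothetical stand-in for a short MORL training/evaluation run under
    # the scalarized utility u(r) = w . r; here, a noisy synthetic score
    # that pretends w = (0.7, 0.3) is the best weighting.
    return 1.0 - abs(w[0] - 0.7) + random.gauss(0.0, 0.05)

def ucb_weight_search(weights, rounds=200, c=1.0):
    """UCB1 over a finite set of weight vectors: pull each arm once,
    then keep evaluating the scalarization with the highest upper bound."""
    n = len(weights)
    counts, sums = [0] * n, [0.0] * n
    for t in range(1, rounds + 1):
        if t <= n:
            arm = t - 1                    # initialisation: try every arm
        else:
            arm = max(range(n),
                      key=lambda i: sums[i] / counts[i]
                      + c * math.sqrt(math.log(t) / counts[i]))
        sums[arm] += evaluate_policy(weights[arm])
        counts[arm] += 1
    return weights[max(range(n), key=lambda i: sums[i] / counts[i])]

candidate_ws = [(k / 10, 1 - k / 10) for k in range(11)]
print("Most promising weight vector:", ucb_weight_search(candidate_ws))
```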
This list is automatically generated from the titles and abstracts of the papers on this site.