Direct Preference-Based Evolutionary Multi-Objective Optimization with
Dueling Bandit
- URL: http://arxiv.org/abs/2311.14003v1
- Date: Thu, 23 Nov 2023 13:38:43 GMT
- Title: Direct Preference-Based Evolutionary Multi-Objective Optimization with
Dueling Bandit
- Authors: Tian Huang, Ke Li
- Abstract summary: We propose a method that sidesteps the need for calculating the fitness function, relying solely on human feedback.
Our proposed approach entails conducting direct preference learning facilitated by an active dueling bandit algorithm.
This research presents a novel interactive preference-based MOEA framework that not only addresses the limitations of traditional techniques but also unveils new possibilities for optimization problems.
- Score: 6.434590883720791
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimization problems find widespread use in both single-objective and
multi-objective scenarios. In practical applications, users aspire for
solutions that converge to the region of interest (ROI) along the Pareto front
(PF). While the conventional approach involves approximating a fitness function
or an objective function to reflect user preferences, this paper explores an
alternative avenue. Specifically, we aim to discover a method that sidesteps
the need for calculating the fitness function, relying solely on human
feedback. Our proposed approach entails conducting direct preference learning
facilitated by an active dueling bandit algorithm. The experimental phase is
structured into three sessions. Firstly, we assess the performance of our
active dueling bandit algorithm. Secondly, we implement our proposed method
within the context of Multi-objective Evolutionary Algorithms (MOEAs). Finally,
we deploy our method in a practical problem, specifically in protein structure
prediction (PSP). This research presents a novel interactive preference-based
MOEA framework that not only addresses the limitations of traditional
techniques but also unveils new possibilities for optimization problems.
Related papers
- Preference-Guided Diffusion for Multi-Objective Offline Optimization [64.08326521234228]
We propose a preference-guided diffusion model for offline multi-objective optimization.
Our guidance is a preference model trained to predict the probability that one design dominates another.
Our results highlight the effectiveness of classifier-guided diffusion models in generating diverse and high-quality solutions.
arXiv Detail & Related papers (2025-03-21T16:49:38Z) - Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment [74.25832963097658]
Multi-Objective Alignment (MOA) aims to align responses with multiple human preference objectives.
We find that DPO-based MOA approaches suffer from widespread preference conflicts in the data.
arXiv Detail & Related papers (2025-02-20T08:27:00Z) - Provably Efficient Multi-Objective Bandit Algorithms under Preference-Centric Customization [24.533662423325943]
We study a preference-aware MO-MAB framework in the presence of explicit user preference.
This is the first theoretical study of customized MO-MAB optimization with explicit user preferences.
arXiv Detail & Related papers (2025-02-19T06:06:13Z) - Online Clustering of Dueling Bandits [59.09590979404303]
We introduce the first "clustering of dueling bandit algorithms" to enable collaborative decision-making based on preference feedback.
We propose two novel algorithms: (1) Clustering of Linear Dueling Bandits (COLDB) which models the user reward functions as linear functions of the context vectors, and (2) Clustering of Neural Dueling Bandits (CONDB) which uses a neural network to model complex, non-linear user reward functions.
arXiv Detail & Related papers (2025-02-04T07:55:41Z) - UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL) agents are tasked with optimising decision-making behaviours.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
arXiv Detail & Related papers (2024-05-01T09:34:42Z) - Analyzing and Overcoming Local Optima in Complex Multi-Objective Optimization by Decomposition-Based Evolutionary Algorithms [5.153202024713228]
Multi-objective Evolutionary Algorithms (MOEADs) often converge to local optima, limiting solution diversity.
We introduce an innovative RP selection strategy, the Vector-Guided Weight-Hybrid method, designed to overcome the local optima issue.
Our research comprises two main experimental components: an ablation involving 14 algorithms within the MOEADs framework from 2014 to 2022 to validate our theoretical framework, and a series empirical tests to evaluate the effectiveness of our proposed method against both traditional and cutting-edge alternatives.
arXiv Detail & Related papers (2024-04-12T14:29:45Z) - Sample Efficient Preference Alignment in LLMs via Active Exploration [63.84454768573154]
We take advantage of the fact that one can often choose contexts at which to obtain human feedback to most efficiently identify a good policy.
We propose an active exploration algorithm to efficiently select the data and provide theoretical proof that it has a worst-case regret bound.
Our method outperforms the baselines with limited samples of human preferences on several language models and four real-world datasets.
arXiv Detail & Related papers (2023-12-01T00:54:02Z) - Optimizing Solution-Samplers for Combinatorial Problems: The Landscape
of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max-and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z) - Preference Inference from Demonstration in Multi-objective Multi-agent
Decision Making [0.0]
We propose an algorithm to infer linear preference weights from either optimal or near-optimal demonstrations.
Empirical results demonstrate significant improvements compared to the baseline algorithms.
In future work, we plan to evaluate the algorithm's effectiveness in a multi-agent system.
arXiv Detail & Related papers (2023-04-27T12:19:28Z) - Representation Learning with Multi-Step Inverse Kinematics: An Efficient
and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z) - Characterization of Constrained Continuous Multiobjective Optimization
Problems: A Performance Space Perspective [0.0]
Constrained multiobjective optimization problems (CMOPs) are unsatisfactorily understood.
The choice of adequate CMOPs for benchmarking is difficult and lacks a formal background.
This paper presents a novel performance assessment approach designed explicitly for constrained multiobjective optimization.
arXiv Detail & Related papers (2023-02-04T14:12:30Z) - Pareto Set Learning for Neural Multi-objective Combinatorial
Optimization [6.091096843566857]
Multiobjective optimization (MOCO) problems can be found in many real-world applications.
We develop a learning-based approach to approximate the whole Pareto set for a given MOCO problem without further search procedure.
Our proposed method significantly outperforms some other methods on the multiobjective traveling salesman problem, multiconditioned vehicle routing problem and multi knapsack problem in terms of solution quality, speed, and model efficiency.
arXiv Detail & Related papers (2022-03-29T09:26:22Z) - Learning Proximal Operators to Discover Multiple Optima [66.98045013486794]
We present an end-to-end method to learn the proximal operator across non-family problems.
We show that for weakly-ized objectives and under mild conditions, the method converges globally.
arXiv Detail & Related papers (2022-01-28T05:53:28Z) - RoMA: Robust Model Adaptation for Offline Model-based Optimization [115.02677045518692]
We consider the problem of searching an input maximizing a black-box objective function given a static dataset of input-output queries.
A popular approach to solving this problem is maintaining a proxy model that approximates the true objective function.
Here, the main challenge is how to avoid adversarially optimized inputs during the search.
arXiv Detail & Related papers (2021-10-27T05:37:12Z) - Batched Data-Driven Evolutionary Multi-Objective Optimization Based on
Manifold Interpolation [6.560512252982714]
We propose a framework for implementing batched data-driven evolutionary multi-objective optimization.
It is so general that any off-the-shelf evolutionary multi-objective optimization algorithms can be applied in a plug-in manner.
Our proposed framework is featured with a faster convergence and a stronger resilience to various PF shapes.
arXiv Detail & Related papers (2021-09-12T23:54:26Z) - Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelop value (EVI) which generalizes the enveloped multi-objective $Q$-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.