Related papers: Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit

Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit

URL: http://arxiv.org/abs/2311.14003v1
Date: Thu, 23 Nov 2023 13:38:43 GMT
Title: Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandit
Authors: Tian Huang, Ke Li
Abstract summary: We propose a method that sidesteps the need for calculating the fitness function, relying solely on human feedback. Our proposed approach entails conducting direct preference learning facilitated by an active dueling bandit algorithm. This research presents a novel interactive preference-based MOEA framework that not only addresses the limitations of traditional techniques but also unveils new possibilities for optimization problems.
Score: 6.434590883720791
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Optimization problems find widespread use in both single-objective and multi-objective scenarios. In practical applications, users aspire for solutions that converge to the region of interest (ROI) along the Pareto front (PF). While the conventional approach involves approximating a fitness function or an objective function to reflect user preferences, this paper explores an alternative avenue. Specifically, we aim to discover a method that sidesteps the need for calculating the fitness function, relying solely on human feedback. Our proposed approach entails conducting direct preference learning facilitated by an active dueling bandit algorithm. The experimental phase is structured into three sessions. Firstly, we assess the performance of our active dueling bandit algorithm. Secondly, we implement our proposed method within the context of Multi-objective Evolutionary Algorithms (MOEAs). Finally, we deploy our method in a practical problem, specifically in protein structure prediction (PSP). This research presents a novel interactive preference-based MOEA framework that not only addresses the limitations of traditional techniques but also unveils new possibilities for optimization problems.

Related papers

Online Decision-Focused Learning [63.83903681295497]
Decision-focused learning (DFL) is an increasingly popular paradigm for training predictive models whose outputs are used in decision-making tasks.<n>We investigate DFL in dynamic environments where the objective function does not evolve over time.<n>We establish bounds on the expected dynamic regret, both when decision space is a simplex and when it is a general bounded convex polytope.
arXiv Detail & Related papers (2025-05-19T10:40:30Z)
Preference-Guided Diffusion for Multi-Objective Offline Optimization [64.08326521234228]
We propose a preference-guided diffusion model for offline multi-objective optimization. Our guidance is a preference model trained to predict the probability that one design dominates another. Our results highlight the effectiveness of classifier-guided diffusion models in generating diverse and high-quality solutions.
arXiv Detail & Related papers (2025-03-21T16:49:38Z)
Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment [74.25832963097658]
Multi-Objective Alignment (MOA) aims to align responses with multiple human preference objectives. We find that DPO-based MOA approaches suffer from widespread preference conflicts in the data.
arXiv Detail & Related papers (2025-02-20T08:27:00Z)
Provably Efficient Multi-Objective Bandit Algorithms under Preference-Centric Customization [24.533662423325943]
We study a preference-aware MO-MAB framework in the presence of explicit user preference. This is the first theoretical study of customized MO-MAB optimization with explicit user preferences.
arXiv Detail & Related papers (2025-02-19T06:06:13Z)
Online Clustering of Dueling Bandits [59.09590979404303]
We introduce the first "clustering of dueling bandit algorithms" to enable collaborative decision-making based on preference feedback. We propose two novel algorithms: (1) Clustering of Linear Dueling Bandits (COLDB) which models the user reward functions as linear functions of the context vectors, and (2) Clustering of Neural Dueling Bandits (CONDB) which uses a neural network to model complex, non-linear user reward functions.
arXiv Detail & Related papers (2025-02-04T07:55:41Z)
UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL) agents are tasked with optimising decision-making behaviours. We focus on the case of linear utility functions parameterised by weight vectors w. We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
arXiv Detail & Related papers (2024-05-01T09:34:42Z)
Analyzing and Overcoming Local Optima in Complex Multi-Objective Optimization by Decomposition-Based Evolutionary Algorithms [5.153202024713228]
Multi-objective Evolutionary Algorithms (MOEADs) often converge to local optima, limiting solution diversity. We introduce an innovative RP selection strategy, the Vector-Guided Weight-Hybrid method, designed to overcome the local optima issue. Our research comprises two main experimental components: an ablation involving 14 algorithms within the MOEADs framework from 2014 to 2022 to validate our theoretical framework, and a series empirical tests to evaluate the effectiveness of our proposed method against both traditional and cutting-edge alternatives.
arXiv Detail & Related papers (2024-04-12T14:29:45Z)
Sample Efficient Preference Alignment in LLMs via Active Exploration [63.84454768573154]
We take advantage of the fact that one can often choose contexts at which to obtain human feedback to most efficiently identify a good policy. We propose an active exploration algorithm to efficiently select the data and provide theoretical proof that it has a worst-case regret bound. Our method outperforms the baselines with limited samples of human preferences on several language models and four real-world datasets.
arXiv Detail & Related papers (2023-12-01T00:54:02Z)
Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods. Our main contribution holds for a broad class of problems including Max-and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and Traveling Salesman Problem. As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
Preference Inference from Demonstration in Multi-objective Multi-agent Decision Making [0.0]
We propose an algorithm to infer linear preference weights from either optimal or near-optimal demonstrations. Empirical results demonstrate significant improvements compared to the baseline algorithms. In future work, we plan to evaluate the algorithm's effectiveness in a multi-agent system.
arXiv Detail & Related papers (2023-04-27T12:19:28Z)
Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity. We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level. Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
Characterization of Constrained Continuous Multiobjective Optimization Problems: A Performance Space Perspective [0.0]
Constrained multiobjective optimization problems (CMOPs) are unsatisfactorily understood. The choice of adequate CMOPs for benchmarking is difficult and lacks a formal background. This paper presents a novel performance assessment approach designed explicitly for constrained multiobjective optimization.
arXiv Detail & Related papers (2023-02-04T14:12:30Z)
Pareto Set Learning for Neural Multi-objective Combinatorial Optimization [6.091096843566857]
Multiobjective optimization (MOCO) problems can be found in many real-world applications. We develop a learning-based approach to approximate the whole Pareto set for a given MOCO problem without further search procedure. Our proposed method significantly outperforms some other methods on the multiobjective traveling salesman problem, multiconditioned vehicle routing problem and multi knapsack problem in terms of solution quality, speed, and model efficiency.
arXiv Detail & Related papers (2022-03-29T09:26:22Z)
Learning Proximal Operators to Discover Multiple Optima [66.98045013486794]
We present an end-to-end method to learn the proximal operator across non-family problems. We show that for weakly-ized objectives and under mild conditions, the method converges globally.
arXiv Detail & Related papers (2022-01-28T05:53:28Z)
RoMA: Robust Model Adaptation for Offline Model-based Optimization [115.02677045518692]
We consider the problem of searching an input maximizing a black-box objective function given a static dataset of input-output queries. A popular approach to solving this problem is maintaining a proxy model that approximates the true objective function. Here, the main challenge is how to avoid adversarially optimized inputs during the search.
arXiv Detail & Related papers (2021-10-27T05:37:12Z)
Batched Data-Driven Evolutionary Multi-Objective Optimization Based on Manifold Interpolation [6.560512252982714]
We propose a framework for implementing batched data-driven evolutionary multi-objective optimization. It is so general that any off-the-shelf evolutionary multi-objective optimization algorithms can be applied in a plug-in manner. Our proposed framework is featured with a faster convergence and a stronger resilience to various PF shapes.
arXiv Detail & Related papers (2021-09-12T23:54:26Z)
Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives. Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process. We propose a new algorithm called model-based envelop value (EVI) which generalizes the enveloped multi-objective $Q$-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.