Provably Efficient Multi-Objective Bandit Algorithms under Preference-Centric Customization
- URL: http://arxiv.org/abs/2502.13457v1
- Date: Wed, 19 Feb 2025 06:06:13 GMT
- Title: Provably Efficient Multi-Objective Bandit Algorithms under Preference-Centric Customization
- Authors: Linfeng Cao, Ming Shi, Ness B. Shroff,
- Abstract summary: We study a preference-aware MO-MAB framework in the presence of explicit user preference.
This is the first theoretical study of customized MO-MAB optimization with explicit user preferences.
- Score: 24.533662423325943
- License:
- Abstract: Multi-objective multi-armed bandit (MO-MAB) problems traditionally aim to achieve Pareto optimality. However, real-world scenarios often involve users with varying preferences across objectives, resulting in a Pareto-optimal arm that may score high for one user but perform quite poorly for another. This highlights the need for customized learning, a factor often overlooked in prior research. To address this, we study a preference-aware MO-MAB framework in the presence of explicit user preference. It shifts the focus from achieving Pareto optimality to further optimizing within the Pareto front under preference-centric customization. To our knowledge, this is the first theoretical study of customized MO-MAB optimization with explicit user preferences. Motivated by practical applications, we explore two scenarios: unknown preference and hidden preference, each presenting unique challenges for algorithm design and analysis. At the core of our algorithms are preference estimation and preference-aware optimization mechanisms to adapt to user preferences effectively. We further develop novel analytical techniques to establish near-optimal regret of the proposed algorithms. Strong empirical performance confirm the effectiveness of our approach.
Related papers
- Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment [74.25832963097658]
Multi-Objective Alignment (MOA) aims to align responses with multiple human preference objectives.
We find that DPO-based MOA approaches suffer from widespread preference conflicts in the data.
arXiv Detail & Related papers (2025-02-20T08:27:00Z) - Quantifying User Coherence: A Unified Framework for Cross-Domain Recommendation Analysis [69.37718774071793]
This paper introduces novel information-theoretic measures for understanding recommender systems.
We evaluate 7 recommendation algorithms across 9 datasets, revealing the relationships between our measures and standard performance metrics.
arXiv Detail & Related papers (2024-10-03T13:02:07Z) - An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to model potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z) - mDPO: Conditional Preference Optimization for Multimodal Large Language Models [52.607764280030196]
Direct preference optimization (DPO) has shown to be an effective method for large language model (LLM) alignment.
Recent works have attempted to apply DPO to multimodal scenarios but have found it challenging to achieve consistent improvement.
We propose mDPO, a multimodal DPO objective that prevents the over-prioritization of language-only preferences by also optimizing image preference.
arXiv Detail & Related papers (2024-06-17T17:59:58Z) - MaxMin-RLHF: Alignment with Diverse Human Preferences [101.57443597426374]
Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data.
We learn a mixture of preference distributions via an expectation-maximization algorithm to better represent diverse human preferences.
Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms.
arXiv Detail & Related papers (2024-02-14T03:56:27Z) - Data-Efficient Interactive Multi-Objective Optimization Using ParEGO [6.042269506496206]
Multi-objective optimization seeks to identify a set of non-dominated solutions that provide optimal trade-offs among competing objectives.
In practical applications, decision-makers (DMs) will select a single solution that aligns with their preferences to be implemented.
We propose two novel algorithms that efficiently locate the most preferred region of the Pareto front in expensive-to-evaluate problems.
arXiv Detail & Related papers (2024-01-12T15:55:51Z) - Direct Preference-Based Evolutionary Multi-Objective Optimization with
Dueling Bandit [6.434590883720791]
We propose a method that sidesteps the need for calculating the fitness function, relying solely on human feedback.
Our proposed approach entails conducting direct preference learning facilitated by an active dueling bandit algorithm.
This research presents a novel interactive preference-based MOEA framework that not only addresses the limitations of traditional techniques but also unveils new possibilities for optimization problems.
arXiv Detail & Related papers (2023-11-23T13:38:43Z) - Multi-Objective Bayesian Optimization with Active Preference Learning [18.066263838953223]
We propose a Bayesian optimization (BO) approach to identifying the most preferred solution in a multi-objective optimization (MOO) problem.
To minimize the interaction cost with the decision maker (DM), we also propose an active learning strategy for the preference estimation.
arXiv Detail & Related papers (2023-11-22T15:24:36Z) - Interactive Hyperparameter Optimization in Multi-Objective Problems via
Preference Learning [65.51668094117802]
We propose a human-centered interactive HPO approach tailored towards multi-objective machine learning (ML)
Instead of relying on the user guessing the most suitable indicator for their needs, our approach automatically learns an appropriate indicator.
arXiv Detail & Related papers (2023-09-07T09:22:05Z) - Characterization of Constrained Continuous Multiobjective Optimization
Problems: A Performance Space Perspective [0.0]
Constrained multiobjective optimization problems (CMOPs) are unsatisfactorily understood.
The choice of adequate CMOPs for benchmarking is difficult and lacks a formal background.
This paper presents a novel performance assessment approach designed explicitly for constrained multiobjective optimization.
arXiv Detail & Related papers (2023-02-04T14:12:30Z) - A unified surrogate-based scheme for black-box and preference-based
optimization [2.561649173827544]
We show that black-box and preference-based optimization problems are closely related and can be solved using the same family of approaches.
We propose the generalized Metric Response Surface (gMRS) algorithm, an optimization scheme that is a generalization of the popular MSRS framework.
arXiv Detail & Related papers (2022-02-03T08:47:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.