Preference Construction: A Bayesian Interactive Preference Elicitation Framework Based on Monte Carlo Tree Search
- URL: http://arxiv.org/abs/2503.15150v1
- Date: Wed, 19 Mar 2025 12:16:54 GMT
- Title: Preference Construction: A Bayesian Interactive Preference Elicitation Framework Based on Monte Carlo Tree Search
- Authors: Yan Wang, Jiapeng Liu, Miłosz Kadziński, Xiuwu Liao
- Abstract summary: We present a novel preference learning framework to capture participant preferences efficiently within limited interaction rounds. First, we develop a variational Bayesian approach to infer the participant's preference model. Second, we propose an adaptive questioning policy that maximizes cumulative uncertainty reduction. Third, we apply the framework to Multiple Criteria Decision Aiding, with pairwise comparison as the preference information.
- Score: 6.473114631834851
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel preference learning framework to capture participant preferences efficiently within limited interaction rounds. It involves three main contributions. First, we develop a variational Bayesian approach to infer the participant's preference model by estimating posterior distributions and managing uncertainty from limited information. Second, we propose an adaptive questioning policy that maximizes cumulative uncertainty reduction, formulating questioning as a finite Markov decision process and using Monte Carlo Tree Search to prioritize promising question trajectories. By considering long-term effects and leveraging the efficiency of the Bayesian approach, the policy avoids shortsightedness. Third, we apply the framework to Multiple Criteria Decision Aiding, with pairwise comparison as the preference information and an additive value function as the preference model. We integrate the reparameterization trick to address high-variance issues, enhancing robustness and efficiency. Computational studies on real-world and synthetic datasets demonstrate the framework's practical usability, outperforming baselines in capturing preferences and achieving superior uncertainty reduction within limited interactions.
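To make the first and third contributions concrete, below is a minimal sketch of reparameterized variational inference for an additive value model learned from pairwise comparisons. The linear marginal value functions, Bradley-Terry-style logistic likelihood, standard-normal prior, and mean-field Gaussian posterior are simplifying assumptions for illustration; the paper's exact parameterization (e.g., piecewise-linear marginals) and its MCTS-based questioning policy are not reproduced here.

```python
# Hypothetical sketch: reparameterized variational inference for an additive
# value model learned from pairwise comparisons. The likelihood, prior, and
# parameter names are illustrative assumptions, not the paper's exact setup.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elbo_grads(mu, log_sigma, X_pref, X_other, n_samples=16):
    """Monte Carlo gradients of the ELBO for q(w) = N(mu, diag(sigma^2)).

    Row i encodes one pairwise comparison: alternative X_pref[i] was preferred
    to X_other[i]. Each comparison has likelihood sigmoid(w . (x_pref - x_other))
    under a linear additive value function with weights w.
    """
    sigma = np.exp(log_sigma)
    d_mu = np.zeros_like(mu)
    d_log_sigma = np.zeros_like(log_sigma)
    diff = X_pref - X_other                      # (n_comparisons, n_criteria)
    for _ in range(n_samples):
        eps = rng.standard_normal(mu.shape)
        w = mu + sigma * eps                     # reparameterization trick
        p = sigmoid(diff @ w)                    # P(comparison reproduced)
        g_w = diff.T @ (1.0 - p)                 # d log-likelihood / d w
        d_mu += g_w
        d_log_sigma += g_w * eps * sigma         # chain rule through w
    d_mu /= n_samples
    d_log_sigma /= n_samples
    # Subtract gradients of KL(q || N(0, I)) because we maximize the ELBO.
    d_mu -= mu
    d_log_sigma -= sigma ** 2 - 1.0
    return d_mu, d_log_sigma

# Toy data: 4 criteria rescaled to [0, 1], 20 pairwise comparisons generated
# from a hidden "true" weight vector.
n_criteria, n_pairs = 4, 20
w_true = np.array([0.4, 0.3, 0.2, 0.1])
A = rng.random((n_pairs, n_criteria))
B = rng.random((n_pairs, n_criteria))
prefers_A = (A - B) @ w_true > 0
X_pref = np.where(prefers_A[:, None], A, B)
X_other = np.where(prefers_A[:, None], B, A)

mu = np.zeros(n_criteria)
log_sigma = np.zeros(n_criteria)
lr = 0.05
for step in range(500):
    d_mu, d_ls = elbo_grads(mu, log_sigma, X_pref, X_other)
    mu += lr * d_mu
    log_sigma += lr * d_ls

print("rough posterior mean weights (rescaled):", mu / mu.sum())
print("posterior standard deviations:", np.exp(log_sigma))
```

The per-criterion posterior standard deviations printed at the end are the kind of uncertainty signal a questioning policy, whether greedy or MCTS-based as in the paper, could aim to reduce when selecting the next pairwise comparison.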
Related papers
- Integrating Response Time and Attention Duration in Bayesian Preference Learning for Multiple Criteria Decision Aiding [2.9457161327910693]
We introduce a multiple criteria Bayesian preference learning framework incorporating behavioral cues for decision aiding.
The framework integrates pairwise comparisons, response time, and attention duration to deepen insights into decision-making processes.
arXiv Detail & Related papers (2025-04-21T08:01:44Z)
- Epistemic Uncertainty-aware Recommendation Systems via Bayesian Deep Ensemble Learning [2.3310092106321365]
We propose an ensemble-based supermodel to generate more robust and reliable predictions.
We also introduce a new interpretable non-linear matching approach for the user and item embeddings.
arXiv Detail & Related papers (2025-04-14T23:04:35Z)
- Beyond Predictions: A Participatory Framework for Multi-Stakeholder Decision-Making [3.3044728148521623]
We propose a novel participatory framework that redefines decision-making as a multi-stakeholder optimization problem.
Our framework captures each actor's preferences through context-dependent reward functions.
We introduce a synthetic scoring mechanism that exploits user-defined preferences across multiple metrics to rank decision-making strategies.
arXiv Detail & Related papers (2025-02-12T16:27:40Z)
- Can foundation models actively gather information in interactive environments to test hypotheses? [56.651636971591536]
We introduce a framework in which a model must determine the factors influencing a hidden reward function.
We investigate whether approaches such as self-correction and increased inference time improve information gathering efficiency.
arXiv Detail & Related papers (2024-12-09T12:27:21Z)
- Self-Evolutionary Large Language Models through Uncertainty-Enhanced Preference Optimization [9.618391485742968]
Iterative preference optimization has recently become one of the de facto training paradigms for large language models (LLMs).
We present an uncertainty-enhanced Preference Optimization framework to make the LLM self-evolve with reliable feedback.
Our framework substantially alleviates the noise problem and improves the performance of iterative preference optimization.
arXiv Detail & Related papers (2024-09-17T14:05:58Z)
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to capture potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
- Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization [75.1240295759264]
We propose an effective framework for Bridging and Modeling Correlations in pairwise data, named BMC.
We increase the consistency and informativeness of the pairwise preference signals through targeted modifications.
We identify that DPO alone is insufficient to model these correlations and capture nuanced variations.
arXiv Detail & Related papers (2024-08-14T11:29:47Z)
- Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Tight Guarantees for Interactive Decision Making with the Decision-Estimation Coefficient [51.37720227675476]
We introduce a new variant of the Decision-Estimation Coefficient, and use it to derive new lower bounds that improve upon prior work on three fronts.
We provide upper bounds on regret that scale with the same quantity, thereby closing all but one of the gaps between upper and lower bounds in Foster et al.
Our results apply to both the regret framework and PAC framework, and make use of several new analysis and algorithm design techniques that we anticipate will find broader use.
arXiv Detail & Related papers (2023-01-19T18:24:08Z)
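As hedged background for the last entry above (an editorial addition, not drawn from that abstract): the offset Decision-Estimation Coefficient from the earlier work of Foster et al. is commonly written as follows, and the entry's regret bounds are stated in terms of this quantity.

```latex
% Background sketch (assumed standard form): offset Decision-Estimation
% Coefficient for a model class M, decision space Pi, and reference model Mbar.
\[
  \operatorname{dec}_{\gamma}(\mathcal{M}, \overline{M})
  \;=\;
  \inf_{p \in \Delta(\Pi)} \,
  \sup_{M \in \mathcal{M}} \,
  \mathbb{E}_{\pi \sim p}
  \Big[
    f^{M}(\pi_{M}) - f^{M}(\pi)
    \;-\; \gamma \, D_{\mathrm{H}}^{2}\big(M(\pi), \overline{M}(\pi)\big)
  \Big],
\]
```

where $f^{M}(\pi)$ is the expected reward of decision $\pi$ under model $M$, $\pi_{M}$ is an optimal decision for $M$, $\overline{M}$ is a reference (estimated) model, and $D_{\mathrm{H}}$ is the Hellinger distance.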