Related papers: Interactive Multi-Objective Probabilistic Preference Learning with Soft and Hard Bounds

URL: http://arxiv.org/abs/2506.21887v1
Date: Fri, 27 Jun 2025 03:44:20 GMT
Title: Interactive Multi-Objective Probabilistic Preference Learning with Soft and Hard Bounds
Authors: Edward Chen, Sang T. Truong, Natalie Dullerud, Sanmi Koyejo, Carlos Guestrin
Abstract summary: In brachytherapy, clinicians must balance maximizing tumor coverage against strict organ dose limits. Current methods often lack systematic approaches to iteratively refine these multi-faceted preference structures. We present Active-MoSH, an interactive local-global framework designed for this process.
Score: 20.97190146319937
License: http://creativecommons.org/licenses/by/4.0/
Abstract: High-stakes decision-making involves navigating multiple competing objectives with expensive evaluations. For instance, in brachytherapy, clinicians must balance maximizing tumor coverage (e.g., an aspirational target or soft bound of >95% coverage) against strict organ dose limits (e.g., a non-negotiable hard bound of <601 cGy to the bladder), with each plan evaluation being resource-intensive. Selecting Pareto-optimal solutions that match implicit preferences is challenging, as exhaustive Pareto frontier exploration is computationally and cognitively prohibitive, necessitating interactive frameworks to guide users. While decision-makers (DMs) often possess domain knowledge to narrow the search via such soft-hard bounds, current methods often lack systematic approaches to iteratively refine these multi-faceted preference structures. Critically, DMs must trust their final decision, confident they haven't missed superior alternatives; this trust is paramount in high-consequence scenarios. We present Active-MoSH, an interactive local-global framework designed for this process. Its local component integrates soft-hard bounds with probabilistic preference learning, maintaining distributions over DM preferences and bounds for adaptive Pareto subset refinement. This is guided by an active sampling strategy optimizing exploration-exploitation while minimizing cognitive burden. To build DM trust, Active-MoSH's global component, T-MoSH, leverages multi-objective sensitivity analysis to identify potentially overlooked, high-value points beyond immediate feedback. We demonstrate Active-MoSH's performance benefits through diverse synthetic and real-world applications. A user study on AI-generated image selection further validates our hypotheses regarding the framework's ability to improve convergence, enhance DM trust, and provide expressive preference articulation, enabling more effective DMs.
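The soft and hard bounds described in the abstract can be illustrated with a minimal Python sketch. It is not the Active-MoSH or MoSH method: it only drops plans that violate the hard bound, keeps the non-dominated ones, and ranks them by how far they fall short of the soft bound. The two thresholds come from the abstract's brachytherapy example; the Plan class, the helper functions, and all plan values are hypothetical and made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical treatment plan with two competing objectives."""
    name: str
    tumor_coverage: float  # percent, higher is better
    bladder_dose: float    # cGy, lower is better


SOFT_COVERAGE = 95.0  # aspirational target from the abstract (>95% coverage)
HARD_BLADDER = 601.0  # non-negotiable limit from the abstract (<601 cGy)


def is_feasible(plan):
    """A plan that violates the hard bound is discarded outright."""
    return plan.bladder_dose < HARD_BLADDER


def dominates(a, b):
    """True if a is no worse than b on both objectives and better on one."""
    no_worse = (a.tumor_coverage >= b.tumor_coverage
                and a.bladder_dose <= b.bladder_dose)
    better = (a.tumor_coverage > b.tumor_coverage
              or a.bladder_dose < b.bladder_dose)
    return no_worse and better


def pareto_front(plans):
    """Keep only the plans that no other plan dominates."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


def soft_shortfall(plan):
    """Distance below the soft bound; zero once the target is reached."""
    return max(0.0, SOFT_COVERAGE - plan.tumor_coverage)


def shortlist(plans):
    """Hard-bound filter, then Pareto filter, then rank by soft shortfall."""
    feasible = [p for p in plans if is_feasible(p)]
    front = pareto_front(feasible)
    return sorted(front, key=lambda p: (soft_shortfall(p), p.bladder_dose))


# Made-up plans, for illustration only.
PLANS = [
    Plan("A", 97.0, 640.0),  # best coverage, but violates the hard bound
    Plan("B", 96.0, 590.0),
    Plan("C", 93.0, 520.0),
    Plan("D", 92.0, 560.0),  # dominated by C
    Plan("E", 95.5, 600.0),  # dominated by B
]

if __name__ == "__main__":
    for plan in shortlist(PLANS):
        print(plan.name, plan.tumor_coverage, plan.bladder_dose)
```

In this sketch the hard bound acts as a filter and the soft bound only affects ranking, so a plan below 95% coverage remains on the shortlist as a trade-off option. The framework described in the abstract goes further by maintaining distributions over the DM's preferences and bounds, which this example does not attempt.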

Related papers

Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time [52.230936493691985]
We propose SITAlign, an inference framework that addresses the multifaceted nature of alignment by maximizing a primary objective while satisfying threshold-based constraints on secondary criteria. We provide theoretical insights by deriving sub-optimality bounds of our satisficing based inference alignment approach.
arXiv Detail & Related papers (2025-05-29T17:56:05Z)
RBF++: Quantifying and Optimizing Reasoning Boundaries across Measurable and Unmeasurable Capabilities for Chain-of-Thought Reasoning [60.84707424369494]
Chain-of-Thought (CoT) reasoning has proven effective in enhancing large language models (LLMs) on complex tasks. We introduce the Reasoning Boundary Framework++ (RBF++), a framework for evaluating and optimizing measurable boundaries of CoT capability.
arXiv Detail & Related papers (2025-05-19T16:25:55Z)
Multi-Level Aware Preference Learning: Enhancing RLHF for Complex Multi-Instruction Tasks [81.44256822500257]
RLHF has emerged as a predominant approach for aligning artificial intelligence systems with human preferences. RLHF exhibits insufficient compliance capabilities when confronted with complex multi-instruction tasks. We propose a novel Multi-level Aware Preference Learning (MAPL) framework, capable of enhancing multi-instruction capabilities.
arXiv Detail & Related papers (2025-05-19T08:33:11Z)
Uncertain Multi-Objective Recommendation via Orthogonal Meta-Learning Enhanced Bayesian Optimization [30.031396809114625]
We introduce a novel framework that categorizes RS autonomy into five distinct levels, ranging from basic rule-based accuracy-driven systems to behavior-aware, uncertain multi-objective RSs. We propose an approach that dynamically identifies and optimizes multiple objectives based on individual user preferences, fostering more ethical and intelligent user-centric recommendations.
arXiv Detail & Related papers (2025-02-18T08:10:09Z)
Beyond Predictions: A Participatory Framework for Multi-Stakeholder Decision-Making [3.3044728148521623]
We propose a participatory framework that reframes decision-making as a multi-stakeholder optimization problem. Our modular, model-agnostic framework employs k-fold cross-validation to fine-tune user-provided prediction models. A synthetic scoring mechanism aggregates user-defined preferences across multiple metrics to rank strategies.
arXiv Detail & Related papers (2025-02-12T16:27:40Z)
MoSH: Modeling Multi-Objective Tradeoffs with Soft and Hard Bounds [29.347695311801864]
We introduce a novel conceptual framework that operationalizes soft-hard functions, SHFs, which allow for the DM to intuitively impose soft and hard bounds on each objective. We show that many practical problems fit within the SHF framework and provide extensive empirical validation on diverse domains. Specifically, for brachytherapy, our approach returns a compact set of points with over 3% greater SHF-defined utility than the next best approach.
arXiv Detail & Related papers (2024-12-09T02:32:20Z)
Dynamic Detection of Relevant Objectives and Adaptation to Preference Drifts in Interactive Evolutionary Multi-Objective Optimization [2.4374097382908477]
We study the dynamic nature of DM preferences, which can evolve throughout the decision-making process and affect the relevance of objectives. We propose methods to discard outdated or conflicting preferences when such shifts occur. Our experimental results demonstrate that the proposed methods effectively manage evolving preferences and significantly enhance the quality and desirability of the solutions produced by the algorithm.
arXiv Detail & Related papers (2024-11-07T09:09:06Z)
Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm. HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies. HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z)
Human-in-the-Loop Policy Optimization for Preference-Based Multi-Objective Reinforcement Learning [13.627087954965695]
We propose a human-in-the-loop policy optimization framework for preference-based MORL. Our method proactively learns the DM's implicit preference information without requiring any a priori knowledge. We evaluate our approach against three conventional MORL algorithms and four state-of-the-art preference-based MORL algorithms.
arXiv Detail & Related papers (2024-01-04T09:17:53Z)
Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks. Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment. We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
Learning Control Admissibility Models with Graph Neural Networks for Multi-Agent Navigation [9.05607520128194]
Control admissibility models (CAMs) can be easily composed and used for online inference for an arbitrary number of agents. We show that the CAM models can be trained in environments with only a few agents and be easily composed for deployment in dense environments with hundreds of agents, achieving better performance than state-of-the-art methods.
arXiv Detail & Related papers (2022-10-17T19:20:58Z)
Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds. We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.