Efficient Contextual Preferential Bayesian Optimization with Historical Examples
- URL: http://arxiv.org/abs/2208.10300v4
- Date: Tue, 30 Sep 2025 07:12:58 GMT
- Title: Efficient Contextual Preferential Bayesian Optimization with Historical Examples
- Authors: Farha A. Khan, Tanmay Chakraborty, Jörg P. Dietrich, Christian Wirth
- Abstract summary: We propose an offline, interpretable utility learning method that uses expert knowledge, historical examples, and coarse information about the utility space to reduce sample requirements. Our method outperforms standard Gaussian processes and BOPE across four domains, showing strong performance even with biased samples, as encountered in the real world, and with limited expert input.
- Score: 0.5249805590164902
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: State-of-the-art multi-objective optimization often assumes a known utility function, learns it interactively, or computes the full Pareto front, each requiring costly expert input. Real-world problems, however, involve implicit preferences that are hard to formalize. To reduce expert involvement, we propose an offline, interpretable utility learning method that uses expert knowledge, historical examples, and coarse information about the utility space to reduce sample requirements. We model uncertainty via a full Bayesian posterior and propagate it throughout the optimization process. Our method outperforms standard Gaussian processes and BOPE across four domains, showing strong performance even with biased samples, as encountered in the real world, and with limited expert input.
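The abstract only sketches the idea of learning a utility function offline from historical examples. A minimal illustrative version, assuming a linear utility and a Bradley-Terry preference likelihood fitted by gradient ascent (function names and data are hypothetical; the paper itself uses a full Bayesian posterior rather than a point estimate):

```python
import numpy as np

def fit_linear_utility(X_pref, X_rej, lr=0.1, steps=500):
    """Fit weights w of a linear utility u(x) = w @ x from historical
    pairwise examples (each row of X_pref was preferred over the
    matching row of X_rej), via a Bradley-Terry / logistic likelihood
    maximized with plain gradient ascent. Illustrative sketch only."""
    d = X_pref.shape[1]
    w = np.zeros(d)
    diff = X_pref - X_rej
    for _ in range(steps):
        # P(pref beats rej) = sigmoid(w @ (x_pref - x_rej))
        p = 1.0 / (1.0 + np.exp(-diff @ w))
        grad = diff.T @ (1.0 - p)          # gradient of the log-likelihood
        w += lr * grad / len(diff)
    return w

# Hypothetical historical examples: preferred options score higher
# on both objectives on average.
rng = np.random.default_rng(0)
X_pref = rng.normal(1.0, 0.5, size=(50, 2))
X_rej = rng.normal(0.0, 0.5, size=(50, 2))
w = fit_linear_utility(X_pref, X_rej)
```

The learned weights can then rank new candidate solutions by `w @ x`; the paper additionally quantifies uncertainty over such utilities and propagates it through the optimization.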
Related papers
- Prompts Generalize with Low Data: Non-vacuous Generalization Bounds for Optimizing Prompts with More Informative Priors [26.84417698402442]
We argue that such widespread success can be more fully explained by more carefully considering data- or distribution-dependent perplexity. We derive novel generalization bounds that are non-vacuous for data-scarce prompt optimization via more useful priors.
arXiv Detail & Related papers (2025-10-09T16:32:46Z) - FoMEMO: Towards Foundation Models for Expensive Multi-objective Optimization [19.69959362934787]
We propose a new paradigm named FoMEMO, which enables the establishment of a foundation model conditioned on any domain trajectory and user preference. Rather than requiring extensive domain experiments in the real world, we demonstrate that pre-training the foundation model with a diverse set of hundreds of millions of synthetic data points can lead to superior adaptability to unknown problems.
arXiv Detail & Related papers (2025-09-03T12:00:24Z) - A Novel Unified Parametric Assumption for Nonconvex Optimization [53.943470475510196]
Nonconvex optimization is central to machine learning, but the general nonconvex framework yields convergence guarantees that are too weak and pessimistic compared to practice. We introduce a novel unified parametric assumption for nonconvex optimization algorithms.
arXiv Detail & Related papers (2025-02-17T21:25:31Z) - Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z) - Learning Linear Utility Functions From Pairwise Comparison Queries [35.01228510505625]
We study learnability of linear utility functions from pairwise comparison queries.
We show that in the passive learning setting, linear utilities are efficiently learnable with respect to the first objective.
In this case, we show that even the second objective is efficiently learnable, and present algorithms for both the noise-free and noisy query response settings.
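As the summary notes, learning a linear utility from pairwise comparisons reduces to binary classification on difference vectors. A minimal sketch under that reduction, using a perceptron-style learner on noise-free responses (the paper's actual algorithms and guarantees differ; all names here are illustrative):

```python
import numpy as np

def perceptron_utility(pairs, labels, epochs=20):
    """Learn w for u(x) = w @ x from comparisons: label 1 means the
    first item of the pair is preferred. Classifies the difference
    vector (x_a - x_b) with mistake-driven perceptron updates."""
    d = pairs[0][0].shape[0]
    w = np.zeros(d)
    for _ in range(epochs):
        for (xa, xb), y in zip(pairs, labels):
            diff = xa - xb
            pred = 1 if w @ diff > 0 else 0
            w += (y - pred) * diff   # update only on mistakes
    return w

# Hypothetical noise-free comparison data from a ground-truth utility.
rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(100, 2, 3))
pairs = [(a, b) for a, b in X]
labels = [1 if w_true @ (a - b) > 0 else 0 for a, b in pairs]
w_hat = perceptron_utility(pairs, labels)
agree = np.mean([(w_hat @ (a - b) > 0) == (w_true @ (a - b) > 0)
                 for a, b in pairs])
```

Because the comparisons are linearly separable in difference space, the learned direction agrees with the true utility on almost all training pairs.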
arXiv Detail & Related papers (2024-05-04T08:43:45Z) - Human-Algorithm Collaborative Bayesian Optimization for Engineering Systems [0.0]
We re-introduce the human back into the data-driven decision making loop by outlining an approach for collaborative Bayesian optimization.
Our methodology exploits the hypothesis that humans are more efficient at making discrete choices rather than continuous ones.
We demonstrate our approach across a number of applied and numerical case studies including bioprocess optimization and reactor geometry design.
arXiv Detail & Related papers (2024-04-16T23:17:04Z) - Enhanced Bayesian Optimization via Preferential Modeling of Abstract Properties [49.351577714596544]
We propose a human-AI collaborative Bayesian framework to incorporate expert preferences about unmeasured abstract properties into surrogate modeling.
We provide an efficient strategy that can also handle any incorrect/misleading expert bias in preferential judgments.
arXiv Detail & Related papers (2024-02-27T09:23:13Z) - Sharing Knowledge in Multi-Task Deep Reinforcement Learning [57.38874587065694]
We study the benefit of sharing representations among tasks to enable the effective use of deep neural networks in Multi-Task Reinforcement Learning.
We prove this by providing theoretical guarantees that highlight the conditions under which it is convenient to share representations among tasks.
arXiv Detail & Related papers (2024-01-17T19:31:21Z) - Sample Efficient Preference Alignment in LLMs via Active Exploration [63.84454768573154]
We take advantage of the fact that one can often choose contexts at which to obtain human feedback in order to most efficiently identify a good policy. We propose an active exploration algorithm to efficiently select the data and provide a theoretical proof that it has a worst-case regret bound. Our method outperforms the baselines with limited samples of human preferences on several language models and four real-world datasets.
arXiv Detail & Related papers (2023-12-01T00:54:02Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - Learning to Rank for Active Learning via Multi-Task Bilevel Optimization [29.207101107965563]
We propose a novel approach for active learning, which aims to select batches of unlabeled instances through a learned surrogate model for data acquisition.
A key challenge in this approach is developing an acquisition function that generalizes well, as the history of data, which forms part of the utility function's input, grows over time.
arXiv Detail & Related papers (2023-10-25T22:50:09Z) - Leveraging Prior Knowledge in Reinforcement Learning via Double-Sided Bounds on the Value Function [4.48890356952206]
We show how an arbitrary approximation for the value function can be used to derive double-sided bounds on the optimal value function of interest.
We extend the framework with error analysis for continuous state and action spaces.
arXiv Detail & Related papers (2023-02-19T21:47:24Z) - Network Utility Maximization with Unknown Utility Functions: A Distributed, Data-Driven Bilevel Optimization Approach [25.47492126908931]
Existing solutions almost exclusively assume that each user's utility function is known and concave.
This paper seeks to answer the question: how to allocate resources when utility functions are unknown, even to the users?
We provide a new solution using a distributed and data-driven bilevel optimization approach.
arXiv Detail & Related papers (2023-01-04T19:50:39Z) - Towards Sequence Utility Maximization under Utility Occupancy Measure [53.234101208024335]
Although utility is a flexible criterion for each pattern in a database, it remains an absolute criterion because utility sharing is neglected.
We first define utility occupancy on sequence data and raise the problem of High Utility-Occupancy Sequential Pattern Mining.
An algorithm called Sequence Utility Maximization with Utility occupancy measure (SUMU) is proposed.
arXiv Detail & Related papers (2022-12-20T17:28:53Z) - Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step forward by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
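The column-wise iterative scheme the summary describes can be sketched in a few lines: each column with missing entries is repeatedly regressed on the others and its fill-ins refreshed. This is a minimal stand-in with a single linear model per column (hypothetical data and names); HyperImpute additionally selects the model per column automatically:

```python
import numpy as np

def iterative_impute(X, n_iter=10):
    """Fill NaNs by iterating column-wise regressions on the other
    columns, starting from column-mean initialization."""
    X = X.copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])  # crude start
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            others = np.delete(X, j, axis=1)
            A = np.c_[others, np.ones(len(X))]       # add intercept
            obs = ~miss[:, j]
            coef, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            X[miss[:, j], j] = A[miss[:, j]] @ coef  # refresh fill-ins
    return X

# Hypothetical data: second column is ~2x the first, with some
# entries knocked out.
rng = np.random.default_rng(2)
Z = rng.normal(size=(200, 1))
data = np.c_[Z, 2 * Z + 0.01 * rng.normal(size=(200, 1))]
data[::10, 1] = np.nan
filled = iterative_impute(data)
```

Here the imputed cells land close to the true linear relationship after a few sweeps, which is the behavior the iterative framework generalizes across model classes.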
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - TRAIL: Near-Optimal Imitation Learning with Suboptimal Data [100.83688818427915]
We present training objectives that use offline datasets to learn a factored transition model.
Our theoretical analysis shows that the learned latent action space can boost the sample-efficiency of downstream imitation learning.
To learn the latent action space in practice, we propose TRAIL (Transition-Reparametrized Actions for Imitation Learning), an algorithm that learns an energy-based transition model.
arXiv Detail & Related papers (2021-10-27T21:05:00Z) - Towards the D-Optimal Online Experiment Design for Recommender Selection [18.204325860752768]
Finding the optimal online experiment is nontrivial since both the users and displayed recommendations carry contextual features that are informative to the reward.
We leverage the D-optimal design from the classical statistics literature to achieve the maximum information gain during exploration.
We then use our deployment example on Walmart.com to fully illustrate the practical insights and effectiveness of the proposed methods.
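The D-optimality criterion mentioned above picks the evaluations that most increase the determinant of the information matrix of a linear model. A minimal greedy sketch over a candidate pool (hypothetical names and data; the paper applies this within an online recommender experiment, not this exact loop):

```python
import numpy as np

def greedy_d_optimal(candidates, k, eps=1e-6):
    """Greedily select k rows of `candidates` maximizing
    log det(M) where M = eps*I + sum of outer products x x^T
    of the chosen feature vectors (the D-optimality objective)."""
    d = candidates.shape[1]
    M = eps * np.eye(d)                    # regularized information matrix
    chosen = []
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for i in range(len(candidates)):
            if i in chosen:
                continue
            x = candidates[i]
            _, logdet = np.linalg.slogdet(M + np.outer(x, x))
            if logdet > best_logdet:
                best, best_logdet = i, logdet
        chosen.append(best)
        M += np.outer(candidates[best], candidates[best])
    return chosen

# Hypothetical pool of candidate recommendation feature vectors.
rng = np.random.default_rng(3)
pool = rng.normal(size=(30, 4))
picks = greedy_d_optimal(pool, k=6)
```

Greedy selection is a standard heuristic here because the exact D-optimal subset problem is combinatorial; the log-determinant objective is submodular, so the greedy picks are near-optimal.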
arXiv Detail & Related papers (2021-10-23T04:30:27Z) - Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how value iteration under a more standard notion of low inherent Bellman error, typically employed in least-squares value-style algorithms, can provide strong PAC guarantees on learning a near-optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near-optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z) - Incorporating Expert Prior Knowledge into Experimental Design via Posterior Sampling [58.56638141701966]
Experimenters can often acquire knowledge about the location of the global optimum.
It has been unclear, however, how to incorporate such expert prior knowledge into Bayesian optimization.
An efficient Bayesian optimization approach has been proposed via posterior sampling on the posterior distribution of the global optimum.
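The core idea of sampling candidate evaluation points from a belief over the optimum's location can be sketched very simply. In this toy version the belief is just the expert's prior (all numbers hypothetical); the actual approach forms a posterior over the optimum by combining that prior with observed data:

```python
import numpy as np

def objective(x):
    """Toy black-box objective with its optimum at x = 0.7."""
    return -(x - 0.7) ** 2

rng = np.random.default_rng(4)
# Expert believes the optimum lies near x = 0.6 (assumed prior belief).
samples = rng.normal(loc=0.6, scale=0.1, size=50)
samples = np.clip(samples, 0.0, 1.0)     # keep within the search domain
vals = objective(samples)
x_best = samples[np.argmax(vals)]
```

Because evaluations concentrate where the expert expects the optimum, far fewer are wasted in uninformative regions than under uniform search, which is the efficiency gain the paper formalizes.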
arXiv Detail & Related papers (2020-02-26T01:57:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.