Interactively Learning Preference Constraints in Linear Bandits
- URL: http://arxiv.org/abs/2206.05255v1
- Date: Fri, 10 Jun 2022 17:52:58 GMT
- Title: Interactively Learning Preference Constraints in Linear Bandits
- Authors: David Lindner and Sebastian Tschiatschek and Katja Hofmann and Andreas
Krause
- Abstract summary: We study sequential decision-making with known rewards and unknown constraints.
As an application, we consider learning constraints to represent human preferences in a driving simulation.
- Score: 100.78514640066565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study sequential decision-making with known rewards and unknown
constraints, motivated by situations where the constraints represent
expensive-to-evaluate human preferences, such as safe and comfortable driving
behavior. We formalize the challenge of interactively learning about these
constraints as a novel linear bandit problem which we call constrained linear
best-arm identification. To solve this problem, we propose the Adaptive
Constraint Learning (ACOL) algorithm. We provide an instance-dependent lower
bound for constrained linear best-arm identification and show that ACOL's
sample complexity matches the lower bound in the worst-case. In the average
case, ACOL's sample complexity bound is still significantly tighter than bounds
of simpler approaches. In synthetic experiments, ACOL performs on par with an
oracle solution and outperforms a range of baselines. As an application, we
consider learning constraints to represent human preferences in a driving
simulation. ACOL is significantly more sample efficient than alternatives for
this application. Further, we find that learning preferences as constraints is
more robust to changes in the driving scenario than encoding the preferences
directly in the reward function.
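To make the setting concrete, below is a minimal Python sketch of constrained linear best-arm identification with a naive confidence-interval query rule. It illustrates the problem ACOL addresses, not the ACOL algorithm itself; the arm features, noise level, confidence widths, and stopping rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_arms = 4, 20
arms = rng.normal(size=(n_arms, d))    # arm feature vectors (assumed)
theta_r = rng.normal(size=d)           # KNOWN reward parameter
theta_c = rng.normal(size=d)           # UNKNOWN constraint parameter
tau, noise_sd = 0.0, 0.1               # arm x is safe iff <x, theta_c> <= tau

rewards = arms @ theta_r               # rewards are known up front
A, b = 1e-3 * np.eye(d), np.zeros(d)   # regularized least-squares statistics
best = None
for t in range(500):
    theta_hat = np.linalg.solve(A, b)  # current constraint estimate
    A_inv = np.linalg.inv(A)
    widths = 2.0 * np.sqrt(np.einsum("id,de,ie->i", arms, A_inv, arms))
    lcb, ucb = arms @ theta_hat - widths, arms @ theta_hat + widths
    maybe_safe = lcb <= tau            # could still be safe
    surely_safe = ucb <= tau           # certified safe
    if not maybe_safe.any():
        break                          # every arm certified unsafe
    best = max(np.flatnonzero(maybe_safe), key=lambda i: rewards[i])
    if surely_safe[best]:
        break                          # best possibly-safe arm is certified
    ambiguous = maybe_safe & ~surely_safe
    i = max(np.flatnonzero(ambiguous), key=lambda j: widths[j])
    y = arms[i] @ theta_c + noise_sd * rng.normal()  # noisy constraint query
    A += np.outer(arms[i], arms[i])    # update design matrix and targets
    b += y * arms[i]

print("identified arm:", best)
```

The key structural point is the stopping condition: once the highest-reward arm among the possibly-safe set is certified safe, no higher-reward arm can still be safe, so it must be the best feasible arm.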
Related papers
- Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints [8.784438985280094]
We study pure exploration problems in multi-armed bandits with unknown linear constraints.
First, we propose a Lagrangian relaxation of the sample complexity lower bound for pure exploration under constraints.
Second, we leverage the Lagrangian lower bound and the properties of convexity to propose two computationally efficient extensions of Track-and-Stop and Gamified Explorer, namely LATS and LAGEX.
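Schematically, the relaxation folds an allocation constraint into the pure-exploration objective via a multiplier; the notation below ($w$, $T$, $g$) is illustrative, not the paper's.

```latex
% Generic Lagrangian relaxation of a constrained allocation problem:
% the constraint g(w) <= 0 moves into the objective with lambda >= 0.
\[
  \max_{w \in \Delta_K} T(w) \;\;\text{s.t.}\;\; g(w) \le 0
  \quad\Longrightarrow\quad
  \max_{w \in \Delta_K} \, \min_{\lambda \ge 0} \, \big[ T(w) - \lambda\, g(w) \big]
\]
```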
arXiv Detail & Related papers (2024-10-24T15:26:14Z) - Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
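The minimax objective behind this summary, written out for $k$ distributions (a standard formulation; the loss $\ell$ and hypothesis class $\mathcal{H}$ are generic):

```latex
\[
  \min_{h \in \mathcal{H}} \; \max_{1 \le i \le k} \;
  \mathbb{E}_{(x,y) \sim \mathcal{D}_i} \big[ \ell(h(x), y) \big],
  \qquad
  \text{with sample complexity } \widetilde{O}\!\big( (d + k) / \varepsilon^2 \big).
\]
```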
arXiv Detail & Related papers (2023-12-08T16:06:29Z) - Active Learning in the Predict-then-Optimize Framework: A Margin-Based
Approach [5.371816551086118]
We develop a learning method that sequentially decides whether to request the "labels" of feature samples from an unlabeled data stream.
Our active learning method is the first to be directly informed by the decision error induced by the predicted parameters.
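A hedged sketch of such a margin-based query rule for a linear predict-then-optimize pipeline; the model, update rule, and threshold below are illustrative stand-ins, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)
actions = rng.normal(size=(5, 3))   # feasible decisions (rows), assumed fixed
W_true = rng.normal(size=(3, 3))    # true feature -> cost-parameter map
W_hat = W_true + 0.5 * rng.normal(size=(3, 3))  # current predictor
margin_threshold, lr = 0.2, 0.05

queried = 0
for _ in range(200):
    x = rng.normal(size=3)          # next feature sample from the stream
    c_hat = W_hat @ x               # predicted cost parameters
    values = np.sort(actions @ c_hat)
    margin = values[1] - values[0]  # gap between best and runner-up decision
    if margin < margin_threshold:   # decision nearly tied: request the label
        c = W_true @ x              # revealed cost parameters (the "label")
        W_hat += lr * np.outer(c - W_hat @ x, x)  # crude online regression step
        queried += 1
print("labels requested:", queried)
```

The point is that labels are requested only where they could change the downstream decision, not wherever prediction error is large.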
arXiv Detail & Related papers (2023-05-11T05:44:36Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach, called LBSGD, is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing constraint violations in policy optimization tasks in safe reinforcement learning.
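A toy deterministic rendering of the log-barrier idea (the actual LBSGD handles stochastic feedback and uses a specific adaptive step-size rule; the objective, constraint, and backtracking rule here are assumptions):

```python
import numpy as np

def f0(x):            # objective to minimize (stand-in)
    return np.sum((x - 2.0) ** 2)

def f1(x):            # safety constraint: f1(x) <= 0 must hold throughout
    return np.sum(x ** 2) - 4.0

def grad(f, x, eps=1e-6):
    # Central finite differences, enough for a toy example.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

x = np.zeros(3)       # strictly feasible start (f1(0) = -4 < 0)
eta = 0.1             # barrier weight
for t in range(200):
    # Barrier objective f0(x) - eta * log(-f1(x)); its gradient below.
    g = grad(f0, x) - eta * grad(f1, x) / f1(x)
    # Shrink the step until the new iterate stays strictly feasible.
    step = 0.1
    while f1(x - step * g) >= 0:
        step *= 0.5
    x = x - step * g
print("x =", x.round(3), "f1(x) =", round(f1(x), 3))
```

Because the barrier blows up at the constraint boundary, every iterate stays feasible; the unconstrained optimum here is infeasible, so the iterates settle near the boundary.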
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Active Learning with Safety Constraints [25.258564629480063]
We investigate the complexity of learning the best safe decision in interactive environments.
We propose an adaptive experimental-design-based algorithm, which we show efficiently trades off the difficulty of showing that an arm is unsafe against the difficulty of showing that it is suboptimal.
arXiv Detail & Related papers (2022-06-22T15:45:38Z) - Penalized Proximal Policy Optimization for Safe Reinforcement Learning [68.86485583981866]
We propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem.
P3O utilizes a simple-yet-effective penalty function to eliminate the cost constraints and replaces the trust-region constraint with a clipped surrogate objective.
We show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotion tasks.
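A toy PyTorch rendering of a P3O-style penalized objective, combining a clipped surrogate with a ReLU exact penalty on the cost budget; tensor shapes, the penalty weight, and the exact form of the penalty term are illustrative assumptions.

```python
import torch

def p3o_loss(ratio, adv_r, adv_c, cost_gap, clip=0.2, kappa=20.0):
    # Standard clipped surrogate for the reward advantage (to be maximized).
    surr_r = torch.minimum(ratio * adv_r,
                           ratio.clamp(1 - clip, 1 + clip) * adv_r)
    # Pessimistic clipped surrogate for the cost advantage.
    surr_c = torch.maximum(ratio * adv_c,
                           ratio.clamp(1 - clip, 1 + clip) * adv_c)
    # ReLU exact penalty: active only while the cost budget is exceeded.
    penalty = kappa * torch.relu(cost_gap + surr_c.mean())
    return -surr_r.mean() + penalty

ratio = torch.ones(8, requires_grad=True)        # pi_new / pi_old per sample
loss = p3o_loss(ratio,
                adv_r=torch.randn(8), adv_c=torch.randn(8),
                cost_gap=torch.tensor(0.3))      # cost of old policy - budget
loss.backward()
print(float(loss))
```

When the cost budget is satisfied with slack, the penalty vanishes and the loss reduces to the ordinary clipped surrogate.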
arXiv Detail & Related papers (2022-05-24T06:15:51Z) - Constrained Model-based Reinforcement Learning with Robust Cross-Entropy
Method [30.407700996710023]
This paper studies the constrained/safe reinforcement learning problem with sparse indicator signals for constraint violations.
We employ a neural network ensemble to estimate prediction uncertainty and use model predictive control as the underlying control framework.
The results show that our approach learns to complete the tasks with a much smaller number of constraint violations than state-of-the-art baselines.
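A compact sketch in this spirit: constraint-aware cross-entropy-method planning that scores action sequences by the worst case over a small ensemble. The dynamics, cost, and constraint functions are stand-ins, not the paper's learned models.

```python
import numpy as np

rng = np.random.default_rng(2)

def step(x, u, w):              # one ensemble member's dynamics (stand-in)
    return x + 0.1 * (u + w)

def cost(x):                    # task cost to minimize
    return float(np.sum((x - 1.0) ** 2))

def violates(x):                # sparse indicator constraint signal
    return float(np.any(np.abs(x) > 2.0))

def plan_cost(x0, us, members, penalty=1e3):
    # Worst case over ensemble members: robust to model uncertainty.
    worst = -np.inf
    for w in members:
        x, c = x0.copy(), 0.0
        for u in us:
            x = step(x, u, w)
            c += cost(x) + penalty * violates(x)
        worst = max(worst, c)
    return worst

x0, H = np.zeros(2), 5
members = rng.normal(scale=0.05, size=(4, 2))   # ensemble disagreement
mu, sd = np.zeros((H, 2)), np.ones((H, 2))
for _ in range(20):             # CEM iterations
    cands = mu + sd * rng.normal(size=(64, H, 2))
    scores = [plan_cost(x0, us, members) for us in cands]
    elite = cands[np.argsort(scores)[:8]]       # keep lowest-cost plans
    mu, sd = elite.mean(axis=0), elite.std(axis=0) + 1e-3
print("first planned action:", mu[0].round(3))
```

The large penalty on the sparse violation indicator steers the sampler toward plans that no ensemble member predicts to be unsafe.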
arXiv Detail & Related papers (2020-10-15T18:19:35Z) - Learning with Safety Constraints: Sample Complexity of Reinforcement
Learning for Constrained MDPs [13.922754427601491]
We characterize the relationship between safety constraints and the number of samples needed to ensure a desired level of accuracy.
Our main finding is that, compared to the best known bounds for the unconstrained regime, the sample complexity of constrained RL algorithms is increased by a factor that is logarithmic in the number of constraints.
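Schematically, with $m$ constraints (the symbols below are illustrative, not the paper's notation):

```latex
\[
  N_{\text{constrained}}(\varepsilon, \delta)
  \;=\; O\!\big( N_{\text{unconstrained}}(\varepsilon, \delta) \cdot \log m \big)
\]
```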
arXiv Detail & Related papers (2020-08-01T18:17:08Z) - Bloom Origami Assays: Practical Group Testing [90.2899558237778]
Group testing is a well-studied problem with several appealing solutions.
Recent biological studies motivated by COVID-19 impose practical constraints that are incompatible with traditional group-testing methods.
We develop a new method combining Bloom filters with belief propagation to scale to larger values of n (more than 100) with good empirical results.
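For reference, the Bloom-filter primitive the method builds on; only the basic filter is sketched here, since the paper's contribution is its combination with belief propagation.

```python
import hashlib

class BloomFilter:
    """Basic Bloom filter: k salted hashes into an m-bit array."""
    def __init__(self, m=256, k=3):
        self.m, self.k, self.bits = m, k, [False] * m

    def _hashes(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def add(self, item):
        for h in self._hashes(item):
            self.bits[h] = True

    def __contains__(self, item):
        # No false negatives; false positives at a rate set by m and k.
        return all(self.bits[h] for h in self._hashes(item))

bf = BloomFilter()
bf.add("sample-17")
assert "sample-17" in bf
```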
arXiv Detail & Related papers (2020-07-21T19:31:41Z) - The Simulator: Understanding Adaptive Sampling in the
Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-k problem that incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.