Design of Experiments for Stochastic Contextual Linear Bandits
- URL: http://arxiv.org/abs/2107.09912v2
- Date: Thu, 22 Jul 2021 23:20:28 GMT
- Title: Design of Experiments for Stochastic Contextual Linear Bandits
- Authors: Andrea Zanette, Kefan Dong, Jonathan Lee, Emma Brunskill
- Abstract summary: In the linear contextual bandit setting there exist several minimax procedures for exploration with policies that are reactive to the data being acquired.
We design a single policy to collect a good dataset from which a near-optimal policy can be extracted.
We present a theoretical analysis as well as numerical experiments on both synthetic and real-world datasets.
- Score: 47.804797753836894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the stochastic linear contextual bandit setting there exist several
minimax procedures for exploration with policies that are reactive to the data
being acquired. In practice, there can be a significant engineering overhead to
deploy these algorithms, especially when the dataset is collected in a
distributed fashion or when a human in the loop is needed to implement a
different policy. Exploring with a single non-reactive policy is beneficial in
such cases. Assuming some batch contexts are available, we design a single
stochastic policy to collect a good dataset from which a near-optimal policy
can be extracted. We present a theoretical analysis as well as numerical
experiments on both synthetic and real-world datasets.
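Below is a minimal sketch of the pipeline the abstract describes, not the authors' actual design: a non-reactive (fixed-in-advance) sampling distribution collects the dataset, and a near-optimal greedy policy is extracted offline by ridge regression. The uniform sampling distribution, problem sizes, and noise model are illustrative assumptions; the paper's contribution is precisely a better choice of the sampling distribution computed from batch contexts.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_actions, n_rounds = 5, 10, 2000
theta_star = rng.normal(size=d)   # unknown reward parameter (illustrative)

# Non-reactive data collection: the action-sampling distribution is fixed up front
# (uniform here; the paper designs a better distribution from the batch contexts).
sampling_dist = np.full(n_actions, 1.0 / n_actions)

X, y = [], []
for _ in range(n_rounds):
    feats = rng.normal(size=(n_actions, d))     # contextual action features for this round
    a = rng.choice(n_actions, p=sampling_dist)  # choice never depends on past rewards
    X.append(feats[a])
    y.append(feats[a] @ theta_star + rng.normal(scale=0.1))
X, y = np.array(X), np.array(y)

# Policy extraction: ridge regression on the logged data, then act greedily.
theta_hat = np.linalg.solve(X.T @ X + np.eye(d), X.T @ y)

def extracted_policy(feats):
    """Greedy policy with respect to the offline estimate theta_hat."""
    return int(np.argmax(feats @ theta_hat))
```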
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
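As a hedged illustration of that practice, the toy sketch below learns the mean of a one-dimensional Gaussian policy with REINFORCE and then deploys only the deterministic mean; the exploration level sigma is the quantity whose tuning the paper studies. The objective, step sizes, and baseline are illustrative, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def episode_return(a):
    return -(a - 2.0) ** 2        # toy objective with optimum at a = 2

mu, sigma, lr = 0.0, 0.5, 0.02    # sigma is the exploration level to be tuned
baseline = 0.0
for _ in range(5000):
    a = rng.normal(mu, sigma)                 # stochastic policy used during learning
    r = episode_return(a)
    grad_log_pi = (a - mu) / sigma ** 2       # d/dmu of log N(a; mu, sigma)
    mu += lr * (r - baseline) * grad_log_pi   # REINFORCE step with a running baseline
    baseline += 0.05 * (r - baseline)

deployed_action = mu                          # only the deterministic version is deployed
print(f"deployed action {deployed_action:.2f}, return {episode_return(deployed_action):.3f}")
```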
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - Experiment Planning with Function Approximation [49.50254688629728]
We study the problem of experiment planning with function approximation in contextual bandit problems.
We propose two experiment planning strategies compatible with function approximation.
We show that a uniform sampler achieves competitive optimality rates in the setting where the number of actions is small.
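A minimal sketch of a non-adaptive experiment plan with a uniform sampler, under illustrative assumptions about the context distribution, reward function, and function class (one linear head per action); it is not the paper's planner, only the shape of the procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
n_plan, n_actions, d = 1000, 4, 3
contexts = rng.normal(size=(n_plan, d))     # contexts assumed available for planning

# Planning phase: the whole action sequence is fixed before any reward is observed
# (a uniform sampler, shown to be competitive when the action set is small).
planned_actions = rng.integers(0, n_actions, size=n_plan)

# Execution phase: rewards come back for the planned (context, action) pairs.
def true_reward(x, a):                      # unknown to the learner; purely illustrative
    return np.sin(x[0] + a) + 0.1 * rng.normal()

rewards = np.array([true_reward(x, a) for x, a in zip(contexts, planned_actions)])

# Offline fit with a simple function class: one linear head per action.
theta = np.zeros((n_actions, d))
for a in range(n_actions):
    mask = planned_actions == a
    theta[a] = np.linalg.lstsq(contexts[mask], rewards[mask], rcond=None)[0]

def planned_policy(x):
    return int(np.argmax(theta @ x))        # greedy policy learned from the planned data
```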
arXiv Detail & Related papers (2024-01-10T14:40:23Z) - $K$-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic Control [0.6906005491572401]
We propose a novel $K$-nearest neighbor reparametric procedure for estimating the performance of a policy from historical data.
Our analysis allows for the sampling of entire episodes, as is common practice in most applications.
Compared to other OPE methods, our algorithm does not require optimization, can be efficiently implemented via tree-based nearest neighbor search and parallelization, and does not explicitly assume a parametric model for the environment's dynamics.
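The sketch below is a heavily simplified, one-step contextual variant of the nearest-neighbor resampling idea: it matches each evaluation context to the k nearest logged points that took the target policy's action and averages their rewards. The actual method resamples entire episodes; all names here are hypothetical.

```python
import numpy as np

def knn_ope_value(logged_contexts, logged_actions, logged_rewards,
                  eval_contexts, target_policy, k=10):
    """Rough one-step variant of K-nearest-neighbor resampling for OPE: for each
    evaluation context, average the rewards of the k nearest logged points that
    took the action the target policy would take there."""
    values = []
    for x in eval_contexts:
        a = target_policy(x)
        mask = logged_actions == a
        if not np.any(mask):
            continue                                   # no logged data for this action
        dists = np.linalg.norm(logged_contexts[mask] - x, axis=1)
        nearest = np.argsort(dists)[:k]
        values.append(logged_rewards[mask][nearest].mean())
    return float(np.mean(values))
```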
arXiv Detail & Related papers (2023-06-07T23:55:12Z) - Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning [23.222448307481073]
We propose a new practical algorithm for offline reinforcement learning (RL) in complex environments with insufficient data coverage.
Our algorithm combines the marginalized importance sampling framework with the actor-critic paradigm.
We provide both theoretical analysis and experimental results to validate the effectiveness of our proposed algorithm.
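As a rough, hedged sketch of combining importance weights with an actor-critic update, the snippet below performs one importance-weighted policy-gradient step for a tabular softmax actor; the marginalized importance ratios and advantage estimates are assumed to be given, whereas estimating them is the substance of the actual algorithm.

```python
import numpy as np

def iw_actor_step(logits, states, actions, weights, advantages, lr=0.1):
    """One importance-weighted actor update for a tabular softmax policy.
    `weights` stands in for marginalized importance ratios d^pi/d^mu(s, a);
    here they are assumed given, as are the critic's advantage estimates."""
    grad = np.zeros_like(logits)
    for s, a, w, adv in zip(states, actions, weights, advantages):
        pi = np.exp(logits[s] - logits[s].max())
        pi /= pi.sum()
        g = -pi                 # gradient of log pi(a|s) w.r.t. the logits of state s
        g[a] += 1.0
        grad[s] += w * adv * g
    return logits + lr * grad / len(states)
```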
arXiv Detail & Related papers (2023-01-30T07:53:53Z) - SPEED: Experimental Design for Policy Evaluation in Linear Heteroscedastic Bandits [13.02672341061555]
We study the problem of optimal data collection for policy evaluation in linear bandits.
We first formulate an optimal design for weighted least squares estimates in the heteroscedastic linear bandit setting.
We then use this formulation to derive the optimal allocation of samples per action during data collection.
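A hedged sketch of the weighted-least-squares ingredient: with per-action noise levels, samples are weighted by inverse variance when estimating the reward parameter. The allocation below (proportional to the noise level) is only a placeholder; the paper derives the optimal allocation from an experimental-design objective.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_actions, budget = 4, 6, 1200
features = rng.normal(size=(n_actions, d))
sigmas = rng.uniform(0.1, 1.0, size=n_actions)   # per-action noise levels (heteroscedastic)
theta_star = rng.normal(size=d)

# Illustrative allocation only: sample noisier actions more often. The paper's
# allocation comes from a weighted-least-squares optimal design, not this heuristic.
allocation = sigmas / sigmas.sum()
counts = rng.multinomial(budget, allocation)

# Collect data and form the weighted least squares estimate (weights = 1 / sigma^2).
A = np.zeros((d, d))
b = np.zeros(d)
for a, n_a in enumerate(counts):
    if n_a == 0:
        continue
    y = features[a] @ theta_star + rng.normal(scale=sigmas[a], size=n_a)
    w = 1.0 / sigmas[a] ** 2
    A += w * n_a * np.outer(features[a], features[a])
    b += w * features[a] * y.sum()
theta_wls = np.linalg.solve(A + 1e-6 * np.eye(d), b)
```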
arXiv Detail & Related papers (2023-01-29T04:33:13Z) - Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality [94.89246810243053]
This paper studies offline policy learning, which aims at utilizing observations collected a priori to learn an optimal individualized decision rule.
Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded.
We propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) instead of point estimates.
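A minimal sketch of pessimism via lower confidence bounds for offline policy learning with linear models, assuming disjoint per-action parameters and a fixed bonus multiplier beta; it illustrates the LCB-versus-point-estimate distinction rather than the paper's exact algorithm or its generalized Bernstein bonuses.

```python
import numpy as np

def lcb_policy(contexts, actions, rewards, n_actions, beta=1.0, lam=1.0):
    """Fit one ridge model per action and pick the action with the highest
    lower confidence bound instead of the highest point estimate."""
    d = contexts.shape[1]
    covs = [lam * np.eye(d) for _ in range(n_actions)]
    vecs = [np.zeros(d) for _ in range(n_actions)]
    for x, a, r in zip(contexts, actions, rewards):
        covs[a] += np.outer(x, x)
        vecs[a] += r * x
    thetas = [np.linalg.solve(covs[a], vecs[a]) for a in range(n_actions)]
    inv_covs = [np.linalg.inv(covs[a]) for a in range(n_actions)]

    def policy(x):
        lcbs = [thetas[a] @ x - beta * np.sqrt(x @ inv_covs[a] @ x)
                for a in range(n_actions)]
        return int(np.argmax(lcbs))
    return policy
```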
arXiv Detail & Related papers (2022-12-19T22:43:08Z) - Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
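A minimal sketch of generalized policy improvement over a set of base policies: act greedily with respect to the pointwise maximum of their Q-estimates. The tabular shapes are illustrative; the paper's results concern which sets of policies make this combination effective.

```python
import numpy as np

def gpi_action(q_values, state):
    """Generalized policy improvement: given Q-estimates for a set of base
    policies (shape [n_policies, n_states, n_actions]), act greedily with
    respect to the pointwise maximum over the set."""
    return int(np.argmax(q_values[:, state, :].max(axis=0)))

# Toy usage: three base policies' Q-tables over 4 states and 2 actions.
rng = np.random.default_rng(3)
q = rng.normal(size=(3, 4, 2))
print([gpi_action(q, s) for s in range(4)])
```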
arXiv Detail & Related papers (2021-12-30T12:20:46Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
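A hedged sketch of local policy search guided by a probabilistic surrogate: fit a Gaussian-process-style model to the returns of nearby parameter perturbations and ascend the gradient of its posterior mean. The random perturbation design, kernel, and step size are illustrative simplifications of the paper's active data selection.

```python
import numpy as np

rng = np.random.default_rng(4)

def rbf(a, b, ell=0.5):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def objective(theta):                       # stand-in for a noisy policy return
    return -np.sum((theta - 1.0) ** 2) + 0.05 * rng.normal()

theta, ell, noise = np.zeros(2), 0.5, 1e-2
for _ in range(30):
    # Evaluate a few random perturbations around the current policy parameters.
    X = theta + 0.3 * rng.normal(size=(8, 2))
    y = np.array([objective(x) for x in X])
    # Fit a GP surrogate to the local returns and differentiate its posterior mean.
    alpha = np.linalg.solve(rbf(X, X, ell) + noise * np.eye(len(X)), y)
    k_star = rbf(theta[None, :], X, ell)[0]            # k(theta, x_i)
    grad = ((X - theta) / ell ** 2 * (alpha * k_star)[:, None]).sum(axis=0)
    theta = theta + 0.2 * grad / (np.linalg.norm(grad) + 1e-8)

print("final parameters:", theta)
```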
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - Distributionally Robust Batch Contextual Bandits [20.667213458836734]
Policy learning using historical observational data is an important problem that has found widespread applications.
Existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment.
In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data.
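A minimal, non-contextual sketch of distributional robustness: evaluate each arm by its worst-case expected reward over a KL ball around the empirical reward distribution (via the standard dual form) and pick the arm with the best worst case. The radius, arm distributions, and the reduction to arms instead of context-dependent policies are illustrative assumptions, not the paper's method.

```python
import numpy as np

def robust_value(rewards, delta=0.1, alphas=np.geomspace(0.05, 50, 200)):
    """Worst-case expected reward over a KL ball of radius delta around the
    empirical distribution, via the dual form
    sup_alpha { -alpha * log E[exp(-r / alpha)] - alpha * delta }."""
    r_min = rewards.min()                      # shift for numerical stability
    vals = [r_min - a * np.log(np.mean(np.exp(-(rewards - r_min) / a))) - a * delta
            for a in alphas]
    return max(vals)

# Toy usage: prefer the arm whose worst-case (not average) value is largest.
rng = np.random.default_rng(5)
logged = {0: rng.normal(1.0, 2.0, size=500), 1: rng.normal(0.8, 0.2, size=500)}
best = max(logged, key=lambda a: robust_value(logged[a]))
print("robust choice:", best)
```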
arXiv Detail & Related papers (2020-06-10T03:11:40Z)