Experiment Planning with Function Approximation
- URL: http://arxiv.org/abs/2401.05193v1
- Date: Wed, 10 Jan 2024 14:40:23 GMT
- Title: Experiment Planning with Function Approximation
- Authors: Aldo Pacchiano, Jonathan N. Lee, Emma Brunskill
- Abstract summary: We study the problem of experiment planning with function approximation in contextual bandit problems.
We propose two experiment planning strategies compatible with function approximation.
We show that a uniform sampler achieves competitive optimality rates in the setting where the number of actions is small.
- Score: 49.50254688629728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of experiment planning with function approximation in
contextual bandit problems. In settings where there is a significant overhead
to deploying adaptive algorithms -- for example, when the execution of the data
collection policies is required to be distributed, or a human in the loop is
needed to implement these policies -- producing in advance a set of policies
for data collection is paramount. We study the setting where a large dataset of
contexts but not rewards is available and may be used by the learner to design
an effective data collection strategy. Although this problem has been well
studied when rewards are linear, results are still missing for more complex
reward models. In this work we propose two experiment planning strategies
compatible with function approximation. The first is an eluder planning and
sampling procedure that can recover optimality guarantees depending on the
eluder dimension of the reward function class. For the second, we show that a
uniform sampler achieves competitive optimality rates in the setting where the
number of actions is small. We conclude by introducing a statistical gap that
fleshes out the fundamental differences between planning and adaptive learning,
and we provide results for planning with model selection.
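To make the setting concrete, the sketch below is a minimal, non-authoritative illustration of a static experiment plan of the uniform-sampling flavor: the unlabeled context dataset is used to fix a data-collection plan in advance, the plan is executed without adaptation, and a reward model is fit offline afterwards. The `RandomForestRegressor` reward model, the synthetic contexts, and the placeholder rewards are illustrative assumptions, not the paper's construction.
```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor  # stand-in reward model class

def plan_uniform(contexts, n_actions, rng):
    """Static experiment plan: pick one action uniformly at random per context.

    The plan is decided before any rewards are observed, so it can be handed
    to a distributed or human-in-the-loop data-collection pipeline as-is.
    """
    return rng.integers(n_actions, size=len(contexts))

def fit_reward_model(contexts, actions, rewards):
    """Offline step: fit r_hat(x, a) on the logged (context, action, reward) triples."""
    features = np.column_stack([contexts, actions])
    return RandomForestRegressor(n_estimators=100).fit(features, rewards)

def greedy_policy(model, context, n_actions):
    """Return the action with the highest predicted reward for one context."""
    candidates = np.column_stack([np.tile(context, (n_actions, 1)),
                                  np.arange(n_actions)])
    return int(np.argmax(model.predict(candidates)))

# --- toy usage (synthetic data; the rewards would come from the real experiment) ---
rng = np.random.default_rng(0)
contexts = rng.normal(size=(500, 5))           # large dataset of contexts, no rewards
n_actions = 3
plan = plan_uniform(contexts, n_actions, rng)  # produced entirely in advance
rewards = rng.normal(size=500)                 # placeholder for observed outcomes
r_hat = fit_reward_model(contexts, plan, rewards)
print(greedy_policy(r_hat, contexts[0], n_actions))
```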
Related papers
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with far fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Contextual Bandits in a Survey Experiment on Charitable Giving: Within-Experiment Outcomes versus Policy Learning [21.9468085255912]
We design and implement an adaptive experiment (a "contextual bandit") to learn a targeted treatment assignment policy.
The goal is to use a participant's survey responses to determine which charity to expose them to in a donation solicitation.
We evaluate alternative experimental designs by collecting pilot data and then conducting a simulation study.
arXiv Detail & Related papers (2022-11-22T04:44:17Z)
- Adaptive Sampling Strategies to Construct Equitable Training Datasets [0.7036032466145111]
In domains ranging from computer vision to natural language processing, machine learning models have been shown to exhibit stark disparities.
One factor contributing to these performance gaps is a lack of representation in the data the models are trained on.
We formalize the problem of creating equitable training datasets, and propose a statistical framework for addressing this problem.
arXiv Detail & Related papers (2022-01-31T19:19:30Z)
- Model Selection in Batch Policy Optimization [88.52887493684078]
We study the problem of model selection in batch policy optimization.
We identify three sources of error that any model selection algorithm should optimally trade off in order to be competitive.
arXiv Detail & Related papers (2021-12-23T02:31:50Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient (a generic Bayesian-optimization policy-search loop is sketched after this list).
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Policy Learning with Adaptively Collected Data [22.839095992238537]
We address the challenge of learning the optimal policy with adaptively collected data.
We propose an algorithm based on generalized augmented inverse propensity weighted (AIPW) estimators (a minimal doubly robust estimate of this kind is sketched after this list).
We demonstrate our algorithm's effectiveness using both synthetic data and public benchmark datasets.
arXiv Detail & Related papers (2021-05-05T22:03:10Z)
- Decomposition and Adaptive Sampling for Data-Driven Inverse Linear Optimization [12.610576072466895]
This work addresses inverse linear optimization where the goal is to infer the unknown cost vector of a linear program.
We introduce a new formulation of the problem that, compared to other existing methods, allows the recovery of a less restrictive and generally more appropriate admissible set of cost estimates.
arXiv Detail & Related papers (2020-09-16T22:25:31Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing ranking fairness and utility in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
- Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments on two public datasets and obtain significant improvements on both.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)
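Following up on the Bayesian-optimization entry above: the sketch below is a generic GP-surrogate policy-search loop with a UCB acquisition, not that paper's gradient-informed method. The `rollout_return` objective, the perturbation scale, and the kernel choice are illustrative assumptions.
```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def rollout_return(theta):
    """Placeholder objective: expected return of the policy with parameters theta."""
    return -np.sum((theta - 0.5) ** 2)  # toy surrogate for an environment rollout

rng = np.random.default_rng(1)
dim = 4
thetas = list(rng.uniform(-1, 1, size=(5, dim)))   # initial random policy parameters
returns = [rollout_return(t) for t in thetas]

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
for _ in range(20):
    gp.fit(np.array(thetas), np.array(returns))
    # Local search: propose perturbations around the best policy found so far.
    best = thetas[int(np.argmax(returns))]
    candidates = best + rng.normal(scale=0.1, size=(64, dim))
    mean, std = gp.predict(candidates, return_std=True)
    pick = candidates[np.argmax(mean + 2.0 * std)]  # UCB acquisition
    thetas.append(pick)
    returns.append(rollout_return(pick))

print("best return:", max(returns))
```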
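Following up on the adaptively collected data entry above: this is a minimal doubly robust (AIPW-style) off-policy value estimate, not that paper's generalized estimator. The logged propensities, the outcome model `mu_hat`, and the target policy `pi` are hypothetical stand-ins.
```python
import numpy as np

def aipw_policy_value(pi, mu_hat, contexts, actions, rewards, propensities):
    """Doubly robust estimate of the value of target policy `pi`.

    pi(a, x): target policy probability of action a in context x
    mu_hat(x, a): estimated mean reward (outcome model)
    propensities[i]: probability with which the logged action was taken
    """
    n_actions = int(actions.max()) + 1
    values = np.zeros(len(contexts))
    for i, (x, a, r, p) in enumerate(zip(contexts, actions, rewards, propensities)):
        # Direct-method term: model-based value under the target policy.
        dm = sum(pi(b, x) * mu_hat(x, b) for b in range(n_actions))
        # Importance-weighted correction for the outcome model's bias.
        correction = pi(a, x) / p * (r - mu_hat(x, a))
        values[i] = dm + correction
    return values.mean()

# toy usage with hypothetical stand-ins
rng = np.random.default_rng(2)
contexts = rng.normal(size=(1000, 3))
actions = rng.integers(2, size=1000)
rewards = rng.normal(size=1000) + actions * (contexts[:, 0] > 0)
propensities = np.full(1000, 0.5)
pi = lambda a, x: float(a == int(x[0] > 0))  # target: play action 1 iff x[0] > 0
mu_hat = lambda x, a: a * (x[0] > 0)         # crude outcome model
print(aipw_policy_value(pi, mu_hat, contexts, actions, rewards, propensities))
```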