Experiment Planning with Function Approximation
- URL: http://arxiv.org/abs/2401.05193v1
- Date: Wed, 10 Jan 2024 14:40:23 GMT
- Title: Experiment Planning with Function Approximation
- Authors: Aldo Pacchiano, Jonathan N. Lee, Emma Brunskill
- Abstract summary: We study the problem of experiment planning with function approximation in contextual bandit problems.
We propose two experiment planning strategies compatible with function approximation.
We show that a uniform sampler achieves competitive optimality rates in the setting where the number of actions is small.
- Score: 49.50254688629728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of experiment planning with function approximation in
contextual bandit problems. In settings where there is a significant overhead
to deploying adaptive algorithms -- for example, when the execution of the data
collection policies is required to be distributed, or a human in the loop is
needed to implement these policies -- producing in advance a set of policies
for data collection is paramount. We study the setting where a large dataset of
contexts but not rewards is available and may be used by the learner to design
an effective data collection strategy. Although this problem has been well
studied when rewards are linear, results are still missing for more complex
reward models. In this work we propose two experiment planning strategies
compatible with function approximation. The first is an eluder planning and
sampling procedure that can recover optimality guarantees depending on the
eluder dimension of the reward function class. For the second, we show that a
uniform sampler achieves competitive optimality rates in the setting where the
number of actions is small. We conclude by introducing a statistical gap that
fleshes out the fundamental differences between planning and adaptive learning,
and we provide results for planning with model selection.
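To make the setting concrete, the sketch below is a minimal, non-authoritative illustration of a static experiment plan of the uniform-sampling flavor: the unlabeled context dataset is used to fix a data-collection plan in advance, the plan is executed without adaptation, and a reward model is fit offline afterwards. The `RandomForestRegressor` reward model, the synthetic contexts, and the placeholder rewards are illustrative assumptions, not the paper's construction.
```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor  # stand-in reward model class

def plan_uniform(contexts, n_actions, rng):
    """Static experiment plan: pick one action uniformly at random per context.

    The plan is decided before any rewards are observed, so it can be handed
    to a distributed or human-in-the-loop data-collection pipeline as-is.
    """
    return rng.integers(n_actions, size=len(contexts))

def fit_reward_model(contexts, actions, rewards):
    """Offline step: fit r_hat(x, a) on the logged (context, action, reward) triples."""
    features = np.column_stack([contexts, actions])
    return RandomForestRegressor(n_estimators=100).fit(features, rewards)

def greedy_policy(model, context, n_actions):
    """Return the action with the highest predicted reward for one context."""
    candidates = np.column_stack([np.tile(context, (n_actions, 1)),
                                  np.arange(n_actions)])
    return int(np.argmax(model.predict(candidates)))

# --- toy usage (synthetic data; the rewards would come from the real experiment) ---
rng = np.random.default_rng(0)
contexts = rng.normal(size=(500, 5))           # large dataset of contexts, no rewards
n_actions = 3
plan = plan_uniform(contexts, n_actions, rng)  # produced entirely in advance
rewards = rng.normal(size=500)                 # placeholder for observed outcomes
r_hat = fit_reward_model(contexts, plan, rewards)
print(greedy_policy(r_hat, contexts[0], n_actions))
```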
Related papers
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with far fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Contextual Bandits in a Survey Experiment on Charitable Giving: Within-Experiment Outcomes versus Policy Learning [21.9468085255912]
We design and implement an adaptive experiment (a "contextual bandit") to learn a targeted treatment assignment policy.
The goal is to use a participant's survey responses to determine which charity to expose them to in a donation solicitation.
We evaluate alternative experimental designs by collecting pilot data and then conducting a simulation study.
arXiv Detail & Related papers (2022-11-22T04:44:17Z)
- Adaptive Sampling Strategies to Construct Equitable Training Datasets [0.7036032466145111]
In domains ranging from computer vision to natural language processing, machine learning models have been shown to exhibit stark disparities.
One factor contributing to these performance gaps is a lack of representation in the data the models are trained on.
We formalize the problem of creating equitable training datasets, and propose a statistical framework for addressing this problem.
arXiv Detail & Related papers (2022-01-31T19:19:30Z)
- Model Selection in Batch Policy Optimization [88.52887493684078]
We study the problem of model selection in batch policy optimization.
We identify three sources of error that any model selection algorithm should optimally trade off in order to be competitive.
arXiv Detail & Related papers (2021-12-23T02:31:50Z)
- Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient (a generic Bayesian-optimization policy-search loop is sketched after this list).
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
- Policy Learning with Adaptively Collected Data [22.839095992238537]
We address the challenge of learning the optimal policy with adaptively collected data.
We propose an algorithm based on generalized augmented inverse propensity weighted (AIPW) estimators (a minimal doubly robust estimate of this kind is sketched after this list).
We demonstrate our algorithm's effectiveness using both synthetic data and public benchmark datasets.
arXiv Detail & Related papers (2021-05-05T22:03:10Z)
- Decomposition and Adaptive Sampling for Data-Driven Inverse Linear Optimization [12.610576072466895]
This work addresses inverse linear optimization where the goal is to infer the unknown cost vector of a linear program.
We introduce a new formulation of the problem that, compared to other existing methods, allows the recovery of a less restrictive and generally more appropriate admissible set of cost estimates.
arXiv Detail & Related papers (2020-09-16T22:25:31Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing ranking fairness and utility in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
- Improving Multi-Turn Response Selection Models with Complementary Last-Utterance Selection by Instance Weighting [84.9716460244444]
We consider utilizing the underlying correlation in the data resource itself to derive different kinds of supervision signals.
We conduct extensive experiments on two public datasets and obtain significant improvements on both.
arXiv Detail & Related papers (2020-02-18T06:29:01Z)
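Following up on the Bayesian-optimization entry above: the sketch below is a generic GP-surrogate policy-search loop with a UCB acquisition, not that paper's gradient-informed method. The `rollout_return` objective, the perturbation scale, and the kernel choice are illustrative assumptions.
```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def rollout_return(theta):
    """Placeholder objective: expected return of the policy with parameters theta."""
    return -np.sum((theta - 0.5) ** 2)  # toy surrogate for an environment rollout

rng = np.random.default_rng(1)
dim = 4
thetas = list(rng.uniform(-1, 1, size=(5, dim)))   # initial random policy parameters
returns = [rollout_return(t) for t in thetas]

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
for _ in range(20):
    gp.fit(np.array(thetas), np.array(returns))
    # Local search: propose perturbations around the best policy found so far.
    best = thetas[int(np.argmax(returns))]
    candidates = best + rng.normal(scale=0.1, size=(64, dim))
    mean, std = gp.predict(candidates, return_std=True)
    pick = candidates[np.argmax(mean + 2.0 * std)]  # UCB acquisition
    thetas.append(pick)
    returns.append(rollout_return(pick))

print("best return:", max(returns))
```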
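Following up on the adaptively collected data entry above: this is a minimal doubly robust (AIPW-style) off-policy value estimate, not that paper's generalized estimator. The logged propensities, the outcome model `mu_hat`, and the target policy `pi` are hypothetical stand-ins.
```python
import numpy as np

def aipw_policy_value(pi, mu_hat, contexts, actions, rewards, propensities):
    """Doubly robust estimate of the value of target policy `pi`.

    pi(a, x): target policy probability of action a in context x
    mu_hat(x, a): estimated mean reward (outcome model)
    propensities[i]: probability with which the logged action was taken
    """
    n_actions = int(actions.max()) + 1
    values = np.zeros(len(contexts))
    for i, (x, a, r, p) in enumerate(zip(contexts, actions, rewards, propensities)):
        # Direct-method term: model-based value under the target policy.
        dm = sum(pi(b, x) * mu_hat(x, b) for b in range(n_actions))
        # Importance-weighted correction for the outcome model's bias.
        correction = pi(a, x) / p * (r - mu_hat(x, a))
        values[i] = dm + correction
    return values.mean()

# toy usage with hypothetical stand-ins
rng = np.random.default_rng(2)
contexts = rng.normal(size=(1000, 3))
actions = rng.integers(2, size=1000)
rewards = rng.normal(size=1000) + actions * (contexts[:, 0] > 0)
propensities = np.full(1000, 0.5)
pi = lambda a, x: float(a == int(x[0] > 0))  # target: play action 1 iff x[0] > 0
mu_hat = lambda x, a: a * (x[0] > 0)         # crude outcome model
print(aipw_policy_value(pi, mu_hat, contexts, actions, rewards, propensities))
```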