Adaptive Combinatorial Allocation
- URL: http://arxiv.org/abs/2011.02330v1
- Date: Wed, 4 Nov 2020 15:02:59 GMT
- Title: Adaptive Combinatorial Allocation
- Authors: Maximilian Kasy and Alexander Teytelboym
- Abstract summary: We consider settings where an allocation has to be chosen repeatedly, returns are unknown but can be learned, and decisions are subject to constraints.
Our model covers two-sided and one-sided matching, even with complex constraints.
- Score: 77.86290991564829
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We consider settings where an allocation has to be chosen repeatedly, returns
are unknown but can be learned, and decisions are subject to constraints. Our
model covers two-sided and one-sided matching, even with complex constraints.
We propose an approach based on Thompson sampling. Our main result is a
prior-independent finite-sample bound on the expected regret for this
algorithm. Although the number of allocations grows exponentially in the number
of participants, the bound does not depend on this number. We illustrate the
performance of our algorithm using data on refugee resettlement in the United
States.
Related papers
- Sample size planning for conditional counterfactual mean estimation with
a K-armed randomized experiment [0.0]
We show how to determine a sufficiently large sample size for a $K$-armed randomized experiment.
Using policy trees to learn sub-groups, we evaluate our nominal guarantees on a large publicly-available randomized experiment test data set.
arXiv Detail & Related papers (2024-03-06T20:37:29Z) - Bayesian decision-making under misspecified priors with applications to
meta-learning [64.38020203019013]
Thompson sampling and other sequential decision-making algorithms are popular approaches to tackle explore/exploit trade-offs in contextual bandits.
We show that performance degrades gracefully with misspecified priors.
arXiv Detail & Related papers (2021-07-03T23:17:26Z) - Root-finding Approaches for Computing Conformal Prediction Set [18.405645120971496]
Conformal prediction constructs a confidence region for an unobserved response of a feature vector based on previous identically distributed and exchangeable observations.
We exploit the fact that, emphoften, conformal prediction sets are intervals whose boundaries can be efficiently approximated by classical root-finding software.
arXiv Detail & Related papers (2021-04-14T06:41:12Z) - Batched Neural Bandits [107.5072688105936]
BatchNeuralUCB combines neural networks with optimism to address the exploration-exploitation tradeoff.
We prove that BatchNeuralUCB achieves the same regret as the fully sequential version while reducing the number of policy updates considerably.
arXiv Detail & Related papers (2021-02-25T17:36:44Z) - Regret Bound Balancing and Elimination for Model Selection in Bandits
and RL [34.15978290083909]
We propose a simple model selection approach for algorithms in bandit and reinforcement learning problems.
We prove that the total regret of this approach is bounded by the best valid candidate regret bound times a multiplicative factor.
Unlike recent efforts in model selection for linear bandits, our approach is versatile enough to also cover cases where the context information is generated by an adversarial environment.
arXiv Detail & Related papers (2020-12-24T00:53:42Z) - Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a
Differentially Private Scheme [16.1694012177079]
We study the best-arm identification problem in multi-armed bandits with, potentially private rewards.
The goal is to identify the arm with the highest quantile at a fixed, prescribed level.
We show that our algorithm is $delta$-PAC and we characterize its sample complexity.
arXiv Detail & Related papers (2020-06-11T20:23:43Z) - The Price of Incentivizing Exploration: A Characterization via Thompson
Sampling and Sample Complexity [83.81297078039836]
We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents.
We focus on the price of incentives: the loss in performance, broadly construed, incurred for the sake of incentive-compatibility.
arXiv Detail & Related papers (2020-02-03T04:58:51Z) - Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z) - The Simulator: Understanding Adaptive Sampling in the
Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the em Simulator.
We prove the first instance-based lower bounds the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal for the best-arm and top-k identification, the first em practical of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.