Contextual Bandits with Large Action Spaces: Made Practical
- URL: http://arxiv.org/abs/2207.05836v1
- Date: Tue, 12 Jul 2022 21:01:48 GMT
- Title: Contextual Bandits with Large Action Spaces: Made Practical
- Authors: Yinglun Zhu, Dylan J. Foster, John Langford, Paul Mineiro
- Abstract summary: We present the first efficient, general-purpose algorithm for contextual bandits with continuous, linearly structured action spaces.
Our algorithm makes use of computational oracles for (i) supervised learning and (ii) optimization over the action space, and achieves sample complexity, runtime, and memory independent of the size of the action space.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A central problem in sequential decision making is to develop algorithms that
are practical and computationally efficient, yet support the use of flexible,
general-purpose models. Focusing on the contextual bandit problem, recent
progress provides provably efficient algorithms with strong empirical
performance when the number of possible alternatives ("actions") is small, but
guarantees for decision making in large, continuous action spaces have remained
elusive, leading to a significant gap between theory and practice. We present
the first efficient, general-purpose algorithm for contextual bandits with
continuous, linearly structured action spaces. Our algorithm makes use of
computational oracles for (i) supervised learning, and (ii) optimization over
the action space, and achieves sample complexity, runtime, and memory
independent of the size of the action space. In addition, it is simple and
practical. We perform a large-scale empirical evaluation, and show that our
approach typically enjoys superior performance and efficiency compared to
standard baselines.
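The two-oracle design described above can be illustrated with a minimal toy sketch (this is an illustrative assumption, not the paper's actual algorithm): rewards are linear in the action, a ridge-regression fit plays the role of the supervised-learning oracle, and maximizing a linear function over the unit ball plays the role of the optimization oracle over the action space. The epsilon-greedy exploration and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3  # action and context dimension

# Hidden linear structure (toy assumption): reward(x, a) = <W* x, a> + noise,
# where W* is unknown to the learner.
W_true = rng.normal(size=(d, d))

def reward(x, a):
    return (W_true @ x) @ a + rng.normal(scale=0.1)

# Oracle (ii): linear optimization over the action set (here the unit ball),
# argmax_{||a|| <= 1} <v, a> = v / ||v||.
def opt_oracle(v):
    n = np.linalg.norm(v)
    return v / n if n > 0 else np.zeros_like(v)

# Oracle (i): supervised learning, here ridge regression of observed rewards
# on the interaction features vec(a x^T), since <W x, a> = <vec(W), vec(a x^T)>.
feats, ys = [], []
W_hat = np.zeros((d, d))
for t in range(500):
    x = rng.normal(size=d)
    # epsilon-greedy: explore a random direction, else play the oracle's greedy action
    if rng.random() < 0.2:
        a = opt_oracle(rng.normal(size=d))
    else:
        a = opt_oracle(W_hat @ x)
    r = reward(x, a)
    feats.append(np.outer(a, x).ravel())
    ys.append(r)
    Phi = np.array(feats)
    w = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(d * d), Phi.T @ np.array(ys))
    W_hat = w.reshape(d, d)
```

Note that neither oracle ever enumerates actions: the learner only calls `opt_oracle` on a vector and refits the regression, so per-step cost does not depend on the (infinite) size of the action set.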
Related papers
- Provably Efficient Learning in Partially Observable Contextual Bandit [4.910658441596583]
We show how causal bounds can be applied to improve classical bandit algorithms.
This research has the potential to enhance the performance of contextual bandit agents in real-world applications.
arXiv Detail & Related papers (2023-08-07T13:24:50Z)
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, based on minimizing the population loss, that are better suited to active learning than the metric used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Practical, Provably-Correct Interactive Learning in the Realizable Setting: The Power of True Believers [12.09273192079783]
We consider interactive learning in the realizable setting and develop a general framework to handle problems ranging from best arm identification to active classification.
We design novel computationally efficient algorithms for the realizable setting that match the minimax lower bound up to logarithmic factors.
arXiv Detail & Related papers (2021-11-09T02:33:36Z)
- Outlier-Robust Sparse Estimation via Non-Convex Optimization [73.18654719887205]
We explore the connection between high-dimensional statistics and non-convex optimization in the presence of sparsity constraints.
We develop novel and simple optimization formulations for these problems.
As a corollary, we obtain that any first-order method that efficiently converges to a stationary point yields an efficient algorithm for these tasks.
arXiv Detail & Related papers (2021-09-23T17:38:24Z)
- Automated Decision-based Adversarial Attacks [48.01183253407982]
We consider the practical and challenging decision-based black-box adversarial setting.
Under this setting, the attacker can only acquire the final classification labels by querying the target model.
We propose to automatically discover decision-based adversarial attack algorithms.
arXiv Detail & Related papers (2021-05-09T13:15:10Z)
- Structure Adaptive Algorithms for Stochastic Bandits [22.871155520200773]
We study reward maximisation in a class of structured multi-armed bandit problems.
The mean rewards of arms satisfy some given structural constraints.
We develop algorithms from instance-dependent lower-bounds using iterative saddle-point solvers.
arXiv Detail & Related papers (2020-07-02T08:59:54Z)
- Adaptive Discretization for Model-Based Reinforcement Learning [10.21634042036049]
We introduce the technique of adaptive discretization to design an efficient model-based episodic reinforcement learning algorithm.
Our algorithm is based on optimistic one-step value iteration extended to maintain an adaptive discretization of the space.
arXiv Detail & Related papers (2020-07-01T19:36:46Z)
- Efficient Contextual Bandits with Continuous Actions [102.64518426624535]
We create a computationally tractable algorithm for contextual bandits with continuous actions having unknown structure.
Our reduction-style algorithm composes with most supervised learning representations.
arXiv Detail & Related papers (2020-06-10T19:38:01Z)
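The reduction idea in the entry above, discretizing the continuous action set and letting a supervised regressor score the bins, can be sketched as follows. This is a hypothetical epsilon-greedy toy under assumed names and reward shape, not the paper's actual smoothed algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy problem (illustrative assumption): actions live in the continuous
# interval [0, 1]; the unknown reward peaks at sigmoid(x[0]).
def reward(x, a):
    target = 1.0 / (1.0 + np.exp(-x[0]))
    return -abs(a - target) + rng.normal(scale=0.05)

# Reduction: discretize [0, 1] into K bins and fit one supervised regressor
# per bin (here online least squares on features [1, x0, x1]).
K = 20
bins = (np.arange(K) + 0.5) / K
theta = np.zeros((K, 3))  # per-bin linear reward model

def scores(x):
    z = np.array([1.0, x[0], x[1]])
    return theta @ z, z  # predicted reward of every bin, plus the features

for t in range(5000):
    x = rng.normal(size=2)
    s, z = scores(x)
    # epsilon-greedy over the discretized actions
    k = rng.integers(K) if rng.random() < 0.1 else int(np.argmax(s))
    r = reward(x, bins[k])
    theta[k] += 0.05 * (r - theta[k] @ z) * z  # SGD step on squared loss

# Evaluate the greedy discretized policy against uniformly random actions.
greedy, random_ = [], []
for _ in range(1000):
    x = rng.normal(size=2)
    s, _z = scores(x)
    greedy.append(reward(x, bins[int(np.argmax(s))]))
    random_.append(reward(x, rng.random()))
```

The per-bin regressor here is deliberately simple; the point of a reduction-style design is that any supervised representation could be swapped in for the linear model without changing the bandit loop.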
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.