Efficient Contextual Bandits with Continuous Actions
- URL: http://arxiv.org/abs/2006.06040v2
- Date: Thu, 3 Dec 2020 23:22:14 GMT
- Title: Efficient Contextual Bandits with Continuous Actions
- Authors: Maryam Majzoubi, Chicheng Zhang, Rajan Chari, Akshay Krishnamurthy,
John Langford, Aleksandrs Slivkins
- Abstract summary: We create a computationally tractable algorithm for contextual bandits with continuous actions having unknown structure.
Our reduction-style algorithm composes with most supervised learning representations.
- Score: 102.64518426624535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We create a computationally tractable algorithm for contextual bandits with
continuous actions having unknown structure. Our reduction-style algorithm
composes with most supervised learning representations. We prove that it works
in a general sense and verify the new functionality with large-scale
experiments.
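The abstract describes a reduction from continuous-action contextual bandits to supervised learning. The paper's actual algorithm uses tree-structured policies with a smoothing operator; the following is only a minimal illustrative sketch of the general idea, assuming a one-dimensional action space [0, 1], a fixed discretization into K bins, epsilon-greedy exploration, and a running-mean reward estimator standing in for the supervised-learning oracle. The environment and all parameter values are invented for illustration.

```python
import random

class DiscretizedSmoothedBandit:
    """Sketch: discretize [0, 1] into k bins, smooth each chosen action
    over a bandwidth h, and learn per-bin reward estimates from bandit
    feedback (a stand-in for a supervised regression oracle)."""

    def __init__(self, k=10, h=0.05, epsilon=0.1, seed=0):
        self.k, self.h, self.epsilon = k, h, epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * k
        self.means = [0.0] * k

    def act(self):
        if self.rng.random() < self.epsilon:
            bin_ = self.rng.randrange(self.k)  # explore a random bin
        else:
            bin_ = max(range(self.k), key=lambda i: self.means[i])  # exploit
        center = (bin_ + 0.5) / self.k
        # Smoothing: play a uniform draw within bandwidth h of the bin center.
        action = min(1.0, max(0.0, center + self.rng.uniform(-self.h, self.h)))
        return bin_, action

    def update(self, bin_, reward):
        # Incremental running mean of observed rewards for this bin.
        self.counts[bin_] += 1
        self.means[bin_] += (reward - self.means[bin_]) / self.counts[bin_]

# Toy environment (an assumption, not from the paper): reward peaks at 0.75.
def reward(a):
    return 1.0 - abs(a - 0.75)

agent = DiscretizedSmoothedBandit()
for _ in range(5000):
    b, a = agent.act()
    agent.update(b, reward(a))

best_bin = max(range(agent.k), key=lambda i: agent.means[i])
print(best_bin)
```

With the reward peak at 0.75, the agent should concentrate on the bin whose center is 0.75 (bin 7 under this discretization). The real algorithm replaces the per-bin running means with a learned policy representation, which is what lets it compose with most supervised learners.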
Related papers
- Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits [15.342585350280535] (2024-10-02)
  We study how representation learning can improve the learning efficiency of contextual bandit problems.
  We present a new algorithm based on alternating projected gradient descent (GD) and a minimization estimator.
- Sharing Knowledge in Multi-Task Deep Reinforcement Learning [57.38874587065694] (2024-01-17)
  We study the benefit of sharing representations among tasks to enable the effective use of deep neural networks in Multi-Task Reinforcement Learning.
  We support this with theoretical guarantees that highlight the conditions under which it is beneficial to share representations among tasks.
- Provably Efficient Learning in Partially Observable Contextual Bandit [4.910658441596583] (2023-08-07)
  We show how causal bounds can be applied to improve classical bandit algorithms.
  This research has the potential to enhance the performance of contextual bandit agents in real-world applications.
- Tree-Based Adaptive Model Learning [62.997667081978825] (2022-08-31)
  We extend the Kearns-Vazirani learning algorithm to handle systems that change over time.
  We present a new learning algorithm that can reuse and update previously learned behavior, implement it in the LearnLib library, and evaluate it on large examples.
- Contextual Bandits with Large Action Spaces: Made Practical [48.28690486203131] (2022-07-12)
  We present the first efficient, general-purpose algorithm for contextual bandits with continuous, linearly structured action spaces.
  Our algorithm makes use of computational oracles for (i) supervised learning and (ii) optimization over the action space, and achieves sample complexity, runtime, and memory independent of the size of the action space.
- Efficient Algorithms for Learning to Control Bandits with Unobserved Contexts [1.370633147306388] (2022-02-02)
  We present an implementable posterior sampling algorithm for bandits with imperfect context observations.
  The proposed algorithm learns efficiently from noisy, imperfect observations and takes actions accordingly.
- Co$^2$L: Contrastive Continual Learning [69.46643497220586] (2021-06-28)
  Recent breakthroughs in self-supervised learning show that such algorithms learn visual representations that transfer better to unseen tasks.
  We propose a rehearsal-based continual learning algorithm that focuses on continually learning and maintaining transferable representations.
- Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning [96.78504087416654] (2020-03-15)
  Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems, we investigate when this paradigm is provably efficient.
  We present a general algorithmic framework built upon two components: an unsupervised learning algorithm and a no-regret tabular RL algorithm.
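The last related paper above describes a framework composing an unsupervised learning algorithm with a no-regret tabular RL algorithm. The sketch below illustrates that composition in the simplest possible terms, under invented assumptions: a nearest-centroid mapping plays the role of the unsupervised component (its centroids assumed already learned), tabular Q-learning stands in for the no-regret tabular algorithm, and the three-state chain environment is purely illustrative.

```python
import random

# Unsupervised component (assumed pre-trained): map rich, noisy
# observations to a small number of discrete abstract states.
CENTROIDS = [0.0, 1.0, 2.0]

def abstract_state(obs):
    """Return the index of the centroid nearest to the observation."""
    return min(range(len(CENTROIDS)), key=lambda i: abs(obs - CENTROIDS[i]))

def q_learning(episodes=2000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular component: epsilon-greedy Q-learning over abstract states.
    Environment (illustrative): latent chain 0 -> 1 -> 2; action 1 advances
    the chain, action 0 stays put; reward 1 on reaching the goal state 2."""
    rng = random.Random(seed)
    n_states, n_actions = len(CENTROIDS), 2
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        latent = 0
        while latent < 2:
            obs = latent + rng.uniform(-0.3, 0.3)  # noisy observation
            s = abstract_state(obs)
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: q[s][x])
            nxt = latent + 1 if a == 1 else latent
            r = 1.0 if nxt == 2 else 0.0
            if nxt == 2:
                target = r  # terminal: no bootstrapping
            else:
                s2 = abstract_state(nxt + rng.uniform(-0.3, 0.3))
                target = r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            latent = nxt
    return q

q = q_learning()
# In each non-terminal abstract state, the advancing action should win.
best = [max(range(2), key=lambda a: q[s][a]) for s in range(2)]
print(best)
```

The point of the composition is that the tabular learner never sees the raw observations, only the abstract states produced by the unsupervised map, so its guarantees transfer whenever the abstraction is accurate.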
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.