Boosted Off-Policy Learning
- URL: http://arxiv.org/abs/2208.01148v2
- Date: Tue, 2 May 2023 17:30:59 GMT
- Title: Boosted Off-Policy Learning
- Authors: Ben London, Levi Lu, Ted Sandler, Thorsten Joachims
- Abstract summary: We propose the first boosting algorithm for off-policy learning from logged bandit feedback.
Unlike existing boosting methods for supervised learning, our algorithm directly optimizes an estimate of the policy's expected reward.
We show how to reduce the base learner to supervised learning, which opens up a broad range of readily available base learners.
- Score: 21.042970740577648
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose the first boosting algorithm for off-policy learning from logged
bandit feedback. Unlike existing boosting methods for supervised learning, our
algorithm directly optimizes an estimate of the policy's expected reward. We
analyze this algorithm and prove that the excess empirical risk decreases
(possibly exponentially fast) with each round of boosting, provided a "weak"
learning condition is satisfied by the base learner. We further show how to
reduce the base learner to supervised learning, which opens up a broad range of
readily available base learners with practical benefits, such as decision
trees. Experiments indicate that our algorithm inherits many desirable
properties of tree-based boosting algorithms (e.g., robustness to feature
scaling and hyperparameter tuning), and that it can outperform off-policy
learning with deep neural networks as well as methods that simply regress on
the observed rewards.
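The abstract outlines the core recipe: form an estimate of the policy's expected reward from the logged data and boost it, reducing each round to a supervised regression fit (e.g., a decision tree) on the gradient of that estimate. The sketch below illustrates this recipe under stated assumptions: a softmax policy over a boosted score function, the plain inverse-propensity-scoring (IPS) estimate, and scikit-learn regression trees as base learners. Class and variable names are hypothetical, and this is not the authors' exact algorithm.

```python
# Hedged sketch: functional-gradient boosting of the IPS estimate of a
# softmax policy's expected reward, with decision trees as base learners.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def softmax(scores):
    z = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class BoostedOffPolicyLearner:
    def __init__(self, n_actions, n_rounds=50, lr=0.1, max_depth=3):
        self.n_actions, self.n_rounds = n_actions, n_rounds
        self.lr, self.max_depth = lr, max_depth
        self.trees = []  # one regression tree per boosting round

    def _scores(self, X):
        # Boosted score function F(x, a), evaluated for every action a.
        n = len(X)
        F = np.zeros((n, self.n_actions))
        for tree in self.trees:
            for a in range(self.n_actions):
                Xa = np.hstack([X, np.full((n, 1), a)])
                F[:, a] += self.lr * tree.predict(Xa)
        return F

    def fit(self, X, actions, rewards, propensities):
        # Logged bandit feedback: contexts X, logged actions, rewards, and
        # logging propensities p_i = mu(a_i | x_i).
        n = len(X)
        for _ in range(self.n_rounds):
            pi = softmax(self._scores(X))                            # current policy
            w = rewards * pi[np.arange(n), actions] / propensities   # IPS terms
            # Functional gradient of the IPS estimate w.r.t. F(x_i, a):
            #   w_i * (1{a = a_i} - pi(a | x_i))
            grad = -w[:, None] * pi
            grad[np.arange(n), actions] += w
            # Reduction to supervised learning: regress the gradient targets
            # on (context, action) pairs, then take a gradient-ascent step.
            Xa = np.vstack([np.hstack([X, np.full((n, 1), a)])
                            for a in range(self.n_actions)])
            y = np.concatenate([grad[:, a] for a in range(self.n_actions)])
            self.trees.append(
                DecisionTreeRegressor(max_depth=self.max_depth).fit(Xa, y))

    def predict(self, X):
        return self._scores(X).argmax(axis=1)
```

Each boosting round is then just a regression fit on gradient targets, which is what makes off-the-shelf supervised base learners such as decision trees directly usable.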
Related papers
- Neural Active Learning Beyond Bandits [69.99592173038903]
We study both stream-based and pool-based active learning with neural network approximations.
We propose two algorithms based on the newly designed exploitation and exploration neural networks for stream-based and pool-based active learning.
arXiv Detail & Related papers (2024-04-18T21:52:14Z)
- Distributional Bellman Operators over Mean Embeddings [37.5480897544168]
We propose a novel framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions.
We derive several new algorithms for dynamic programming and temporal-difference learning based on this framework.
arXiv Detail & Related papers (2023-12-09T11:36:14Z)
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which does not rely on BP optimization as FF does.
Unlike FF, our framework directly outputs label distributions at each cascaded block and does not require the generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
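As a rough illustration of the block-local training idea summarized above (each block outputs its own label distribution and is trained independently, without backpropagation across blocks), here is a minimal PyTorch-style sketch. Layer sizes, the per-block loss, and the inference rule are assumptions, not the paper's exact design.

```python
# Hedged sketch of block-local training: each block has its own classification
# head, gradients never cross block boundaries.
import torch
import torch.nn as nn

class CaFoBlock(nn.Module):
    def __init__(self, in_dim, hidden_dim, n_classes):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.head = nn.Linear(hidden_dim, n_classes)  # block-local label distribution

    def forward(self, x):
        h = self.body(x)
        return h, self.head(h)

def train_step(blocks, optimizers, x, y, loss_fn=nn.CrossEntropyLoss()):
    """Train every block independently on the same labels y."""
    logits_per_block = []
    for block, opt in zip(blocks, optimizers):
        h, logits = block(x)
        loss = loss_fn(logits, y)
        opt.zero_grad()
        loss.backward()        # gradients stay inside this block
        opt.step()
        logits_per_block.append(logits.detach())
        x = h.detach()         # the next block never backpropagates into this one
    # One simple inference rule (an assumption): average the blocks' predictions.
    return torch.stack([l.softmax(-1) for l in logits_per_block]).mean(0)

# Usage sketch: blocks = [CaFoBlock(784, 256, 10), CaFoBlock(256, 256, 10)]
#               optimizers = [torch.optim.SGD(b.parameters(), lr=0.1) for b in blocks]
```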
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network (NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, based on minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Emphatic Algorithms for Deep Reinforcement Learning [43.17171330951343]
Temporal difference learning algorithms can become unstable when combined with function approximation and off-policy sampling.
The emphatic temporal difference (ETD($\lambda$)) algorithm ensures convergence in the linear case by appropriately weighting the TD($\lambda$) updates.
We show that naively adapting ETD($\lambda$) to popular deep reinforcement learning algorithms, which use forward-view multi-step returns, results in poor performance.
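For context, a minimal sketch of the standard linear-case ETD($\lambda$) update referenced above, with its follow-on trace, emphasis weighting, and emphasis-weighted eligibility trace; the class interface and hyperparameter values are illustrative assumptions.

```python
# Hedged sketch of emphatic TD(lambda) with linear function approximation.
import numpy as np

class LinearETD:
    def __init__(self, n_features, alpha=0.01, gamma=0.99, lam=0.9):
        self.w = np.zeros(n_features)   # value-function weights
        self.e = np.zeros(n_features)   # eligibility trace
        self.F = 0.0                    # follow-on trace
        self.rho_prev = 0.0             # previous step's importance ratio
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def update(self, x, reward, x_next, rho, interest=1.0):
        """One off-policy transition; rho = pi(a|s) / mu(a|s).
        (Reset e, F and rho_prev to zero at episode boundaries.)"""
        delta = reward + self.gamma * self.w @ x_next - self.w @ x   # TD error
        self.F = self.gamma * self.rho_prev * self.F + interest      # follow-on trace
        M = self.lam * interest + (1.0 - self.lam) * self.F          # emphasis
        self.e = rho * (self.gamma * self.lam * self.e + M * x)      # emphatic trace
        self.w += self.alpha * delta * self.e                        # weighted TD update
        self.rho_prev = rho
        return delta
```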
arXiv Detail & Related papers (2021-06-21T12:11:39Z)
- Boosting for Online Convex Optimization [64.15578413206715]
We consider the decision-making framework of online convex optimization with a large number of experts.
We define a weak learning algorithm as a mechanism that guarantees approximate regret against a base class of experts.
We give an efficient boosting algorithm that guarantees near-optimal regret against the convex hull of the base class.
arXiv Detail & Related papers (2021-02-18T12:30:49Z)
- Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
- Mastering Rate based Curriculum Learning [78.45222238426246]
We argue that the notion of learning progress itself has several shortcomings that lead to low sample efficiency for the learner.
We propose a new algorithm, based on the notion of mastering rate, that significantly outperforms learning progress-based algorithms.
arXiv Detail & Related papers (2020-08-14T16:34:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.