SoftTreeMax: Policy Gradient with Tree Search
- URL: http://arxiv.org/abs/2209.13966v1
- Date: Wed, 28 Sep 2022 09:55:47 GMT
- Title: SoftTreeMax: Policy Gradient with Tree Search
- Authors: Gal Dalal, Assaf Hallak, Shie Mannor, Gal Chechik
- Abstract summary: We introduce SoftTreeMax, the first approach that integrates tree-search into policy gradient.
On Atari, SoftTreeMax demonstrates up to 5x better performance in faster run-time compared with distributed PPO.
- Score: 72.9513807133171
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Policy-gradient methods are widely used for learning control policies. They
can be easily distributed to multiple workers and reach state-of-the-art
results in many domains. Unfortunately, they exhibit large variance and
consequently suffer from high sample complexity, since they aggregate gradients
over entire trajectories. At the other extreme, planning methods, like tree
search, optimize the policy using single-step transitions that consider future
lookahead. These approaches have been mainly considered for value-based
algorithms. Planning-based algorithms require a forward model and are
computationally intensive at each step, but are more sample efficient. In this
work, we introduce SoftTreeMax, the first approach that integrates tree-search
into policy gradient. Traditionally, gradients are computed for single
state-action pairs. Instead, our tree-based policy structure leverages all
gradients at the tree leaves in each environment step. This allows us to reduce
the variance of gradients by three orders of magnitude and to benefit from
better sample complexity compared with standard policy gradient. On Atari,
SoftTreeMax demonstrates up to 5x better performance in faster run-time
compared with distributed PPO.
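The tree-based policy described in the abstract can be sketched as follows. This is a hypothetical minimal implementation, not the authors' code: it assumes a deterministic forward model `model(state, action) -> (next_state, reward)` and a stand-in learned leaf score `leaf_score(state)`; the logit for each first action is a log-sum-exp over the cumulative rewards of all tree paths beginning with that action.

```python
import numpy as np

def softtreemax_policy(state, actions, model, leaf_score, depth=2, beta=1.0):
    """Sketch of a SoftTreeMax-style policy (illustrative, not the paper's code).

    For each candidate first action, expand every action sequence to `depth`,
    accumulate rewards along each path, add a learned score at the leaf, and
    log-sum-exp over the leaves to form that action's logit.
    """
    def expand(s, d):
        # Returns a list of (cumulative_reward, leaf_state) pairs.
        if d == 0:
            return [(0.0, s)]
        out = []
        for a in actions:
            s2, r = model(s, a)
            out.extend((r + cr, leaf) for cr, leaf in expand(s2, d - 1))
        return out

    logits = []
    for a in actions:
        s1, r1 = model(state, a)
        leaves = expand(s1, depth - 1)
        # Log-sum-exp over all leaves reachable after taking `a` first.
        vals = np.array([beta * (r1 + cr + leaf_score(leaf)) for cr, leaf in leaves])
        m = vals.max()
        logits.append(m + np.log(np.exp(vals - m).sum()))
    logits = np.array(logits)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()
```

Because every leaf contributes to the softmax logits, the gradient of the policy flows through all leaves in each environment step, which is the mechanism the abstract credits for the variance reduction.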
Related papers
- Large-Scale Multi-Robot Coverage Path Planning via Local Search [3.396452421780085]
We study graph-based Multi-Robot Coverage Path Planning (MCPP) that aims to compute coverage paths for multiple robots.
We introduce a new algorithmic framework, called LS-MCPP, which leverages a local search to operate directly on the decomposed graph $D$.
Our experiments demonstrate the effectiveness of LS-MCPP, consistently improving the initial solution.
arXiv Detail & Related papers (2023-12-17T19:14:07Z) - SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search [68.66904039405871]
We introduce SoftTreeMax, a generalization of softmax that takes planning into account.
We show for the first time the role of a tree expansion policy in mitigating the variance of the policy gradient.
Our differentiable tree-based policy leverages all gradients at the tree leaves in each environment step instead of the traditional single-sample-based gradient.
arXiv Detail & Related papers (2023-01-30T19:03:14Z) - Unbiased and Efficient Sampling of Dependency Trees [0.0]
Most treebanks require that every valid dependency tree has a single edge coming out of the ROOT node.
Zmigrod et al. have recently proposed algorithms for sampling with and without replacement from the single-root dependency tree distribution.
We show that their fastest algorithm for sampling with replacement, Wilson-RC, is in fact biased.
arXiv Detail & Related papers (2022-05-25T09:57:28Z) - Hierarchical Shrinkage: improving the accuracy and interpretability of tree-based methods [10.289846887751079]
We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm that does not modify the tree structure.
HS substantially increases the predictive performance of decision trees, even when used in conjunction with other regularization techniques.
All code and models are released in a full-fledged package available on Github.
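The post-hoc nature of Hierarchical Shrinkage can be illustrated with a short sketch. Everything here is an assumption for illustration: the nested-dict tree format, the field names, and the helper `hs_predict` are hypothetical, but the rule follows the HS idea that each parent-to-child jump in the prediction is shrunk by a factor depending on the parent's sample count, while the tree structure itself is untouched.

```python
def hs_predict(tree, x, lam=10.0):
    """Illustrative Hierarchical Shrinkage prediction on a fitted tree.

    Assumed tree format (hypothetical): internal nodes are dicts with
    'mean', 'n', 'feature', 'thresh', 'left', 'right'; leaves have only
    'mean' and 'n'. The prediction starts from the root mean and each
    step toward a child adds a shrunken increment.
    """
    node = tree
    pred = node['mean']                 # root mean is kept as-is
    while 'feature' in node:            # descend until a leaf is reached
        child = node['left'] if x[node['feature']] <= node['thresh'] else node['right']
        # Shrink the increment: small-sample parents shrink more.
        pred += (child['mean'] - node['mean']) / (1.0 + lam / node['n'])
        node = child
    return pred
```

With `lam=0` this reduces to the plain leaf prediction; as `lam` grows, predictions are pulled toward the root mean, regularizing without changing any split.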
arXiv Detail & Related papers (2022-02-02T02:43:23Z) - Planning and Learning with Adaptive Lookahead [74.39132848733847]
The Policy Iteration (PI) algorithm alternates between greedy one-step policy improvement and policy evaluation.
Recent literature shows that multi-step lookahead policy improvement leads to a better convergence rate at the expense of increased complexity per iteration.
We propose for the first time to dynamically adapt the multi-step lookahead horizon as a function of the state and of the value estimate.
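A state-dependent lookahead rule of this flavor can be sketched as follows. This is an illustrative criterion, not the paper's exact rule: it assumes a per-state scalar `value_gap` estimating the value-function error, and picks the smallest depth at which the discounted residual falls below a tolerance.

```python
def choose_lookahead(value_gap, gamma=0.99, eps=0.1, max_depth=10):
    """Illustrative adaptive-lookahead rule (not the paper's criterion).

    Pick the smallest depth h such that gamma**h * value_gap < eps,
    capped at max_depth. States with accurate value estimates get a
    shallow (cheap) lookahead; uncertain states get a deeper one.
    """
    for h in range(1, max_depth + 1):
        if gamma ** h * value_gap < eps:
            return h
    return max_depth
```

The trade-off this captures is the one the abstract describes: deeper lookahead improves the effective contraction rate at a higher per-iteration cost, so depth is spent only where the value estimate warrants it.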
arXiv Detail & Related papers (2022-01-28T20:26:55Z) - Dive into Decision Trees and Forests: A Theoretical Demonstration [0.0]
Decision trees use the strategy of "divide-and-conquer" to divide a complex problem on the dependency between input features and labels into smaller ones.
Recent advances have greatly improved their performance in computational advertising, recommender system, information retrieval, etc.
arXiv Detail & Related papers (2021-01-20T16:47:59Z) - An Efficient Adversarial Attack for Tree Ensembles [91.05779257472675]
We study adversarial attacks on tree-based ensembles such as gradient-boosted decision trees (GBDTs) and random forests (RFs).
We show that our method can be thousands of times faster than the previous mixed-integer linear programming (MILP) based approach.
Our code is available at https://github.com/chong-z/tree-ensemble-attack.
arXiv Detail & Related papers (2020-10-22T10:59:49Z) - MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z) - Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Adagrad algorithm for nonconvex min-max problems.
Our experiments show that the predicted advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.