SoftTreeMax: Policy Gradient with Tree Search
- URL: http://arxiv.org/abs/2209.13966v1
- Date: Wed, 28 Sep 2022 09:55:47 GMT
- Title: SoftTreeMax: Policy Gradient with Tree Search
- Authors: Gal Dalal, Assaf Hallak, Shie Mannor, Gal Chechik
- Abstract summary: We introduce SoftTreeMax, the first approach that integrates tree-search into policy gradient.
On Atari, SoftTreeMax demonstrates up to 5x better performance in faster run-time compared with distributed PPO.
- Score: 72.9513807133171
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Policy-gradient methods are widely used for learning control policies. They
can be easily distributed to multiple workers and reach state-of-the-art
results in many domains. Unfortunately, they exhibit large variance and
consequently suffer from high sample complexity, since they aggregate gradients
over entire trajectories. At the other extreme, planning methods, like tree
search, optimize the policy using single-step transitions that consider future
lookahead. These approaches have been mainly considered for value-based
algorithms. Planning-based algorithms require a forward model and are
computationally intensive at each step, but are more sample efficient. In this
work, we introduce SoftTreeMax, the first approach that integrates tree-search
into policy gradient. Traditionally, gradients are computed for single
state-action pairs. Instead, our tree-based policy structure leverages all
gradients at the tree leaves in each environment step. This allows us to reduce
the variance of gradients by three orders of magnitude and to benefit from
better sample complexity compared with standard policy gradient. On Atari,
SoftTreeMax demonstrates up to 5x better performance in faster run-time
compared with distributed PPO.
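The tree-based policy described in the abstract can be sketched as follows. This is a hypothetical minimal implementation, not the authors' code: it assumes a deterministic forward model `model(state, action) -> (next_state, reward)` and a stand-in learned leaf score `leaf_score(state)`; the logit for each first action is a log-sum-exp over the cumulative rewards of all tree paths beginning with that action.

```python
import numpy as np

def softtreemax_policy(state, actions, model, leaf_score, depth=2, beta=1.0):
    """Sketch of a SoftTreeMax-style policy (illustrative, not the paper's code).

    For each candidate first action, expand every action sequence to `depth`,
    accumulate rewards along each path, add a learned score at the leaf, and
    log-sum-exp over the leaves to form that action's logit.
    """
    def expand(s, d):
        # Returns a list of (cumulative_reward, leaf_state) pairs.
        if d == 0:
            return [(0.0, s)]
        out = []
        for a in actions:
            s2, r = model(s, a)
            out.extend((r + cr, leaf) for cr, leaf in expand(s2, d - 1))
        return out

    logits = []
    for a in actions:
        s1, r1 = model(state, a)
        leaves = expand(s1, depth - 1)
        # Log-sum-exp over all leaves reachable after taking `a` first.
        vals = np.array([beta * (r1 + cr + leaf_score(leaf)) for cr, leaf in leaves])
        m = vals.max()
        logits.append(m + np.log(np.exp(vals - m).sum()))
    logits = np.array(logits)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()
```

Because every leaf contributes to the softmax logits, the gradient of the policy flows through all leaves in each environment step, which is the mechanism the abstract credits for the variance reduction.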
Related papers
- Large-Scale Multi-Robot Coverage Path Planning via Local Search [3.396452421780085]
We study graph-based Multi-Robot Coverage Path Planning (MCPP) that aims to compute coverage paths for multiple robots.
We introduce a new algorithmic framework, called LS-MCPP, which leverages a local search to operate directly on the decomposed graph $D$.
Our experiments demonstrate the effectiveness of LS-MCPP, consistently improving the initial solution.
arXiv Detail & Related papers (2023-12-17T19:14:07Z) - SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search [68.66904039405871]
We introduce SoftTreeMax, a generalization of softmax that takes planning into account.
We show for the first time the role of a tree expansion policy in mitigating the variance of the policy gradient.
Our differentiable tree-based policy leverages all gradients at the tree leaves in each environment step instead of the traditional single-sample-based gradient.
arXiv Detail & Related papers (2023-01-30T19:03:14Z) - Unbiased and Efficient Sampling of Dependency Trees [0.0]
Most treebanks require that every valid dependency tree has a single edge coming out of the ROOT node.
Zmigrod et al. have recently proposed algorithms for sampling with and without replacement from the single-root dependency tree distribution.
We show that their fastest algorithm for sampling with replacement, Wilson-RC, is in fact biased.
arXiv Detail & Related papers (2022-05-25T09:57:28Z) - Hierarchical Shrinkage: improving the accuracy and interpretability of tree-based methods [10.289846887751079]
We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm that does not modify the tree structure.
HS substantially increases the predictive performance of decision trees, even when used in conjunction with other regularization techniques.
All code and models are released in a full-fledged package available on Github.
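The post-hoc nature of Hierarchical Shrinkage can be illustrated with a short sketch. Everything here is an assumption for illustration: the nested-dict tree format, the field names, and the helper `hs_predict` are hypothetical, but the rule follows the HS idea that each parent-to-child jump in the prediction is shrunk by a factor depending on the parent's sample count, while the tree structure itself is untouched.

```python
def hs_predict(tree, x, lam=10.0):
    """Illustrative Hierarchical Shrinkage prediction on a fitted tree.

    Assumed tree format (hypothetical): internal nodes are dicts with
    'mean', 'n', 'feature', 'thresh', 'left', 'right'; leaves have only
    'mean' and 'n'. The prediction starts from the root mean and each
    step toward a child adds a shrunken increment.
    """
    node = tree
    pred = node['mean']                 # root mean is kept as-is
    while 'feature' in node:            # descend until a leaf is reached
        child = node['left'] if x[node['feature']] <= node['thresh'] else node['right']
        # Shrink the increment: small-sample parents shrink more.
        pred += (child['mean'] - node['mean']) / (1.0 + lam / node['n'])
        node = child
    return pred
```

With `lam=0` this reduces to the plain leaf prediction; as `lam` grows, predictions are pulled toward the root mean, regularizing without changing any split.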
arXiv Detail & Related papers (2022-02-02T02:43:23Z) - Planning and Learning with Adaptive Lookahead [74.39132848733847]
The Policy Iteration (PI) algorithm alternates between greedy one-step policy improvement and policy evaluation.
Recent literature shows that multi-step lookahead policy improvement leads to a better convergence rate at the expense of increased complexity per iteration.
We propose for the first time to dynamically adapt the multi-step lookahead horizon as a function of the state and of the value estimate.
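A state-dependent lookahead rule of this flavor can be sketched as follows. This is an illustrative criterion, not the paper's exact rule: it assumes a per-state scalar `value_gap` estimating the value-function error, and picks the smallest depth at which the discounted residual falls below a tolerance.

```python
def choose_lookahead(value_gap, gamma=0.99, eps=0.1, max_depth=10):
    """Illustrative adaptive-lookahead rule (not the paper's criterion).

    Pick the smallest depth h such that gamma**h * value_gap < eps,
    capped at max_depth. States with accurate value estimates get a
    shallow (cheap) lookahead; uncertain states get a deeper one.
    """
    for h in range(1, max_depth + 1):
        if gamma ** h * value_gap < eps:
            return h
    return max_depth
```

The trade-off this captures is the one the abstract describes: deeper lookahead improves the effective contraction rate at a higher per-iteration cost, so depth is spent only where the value estimate warrants it.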
arXiv Detail & Related papers (2022-01-28T20:26:55Z) - Dive into Decision Trees and Forests: A Theoretical Demonstration [0.0]
Decision trees use the strategy of "divide-and-conquer" to divide a complex problem on the dependency between input features and labels into smaller ones.
Recent advances have greatly improved their performance in computational advertising, recommender system, information retrieval, etc.
arXiv Detail & Related papers (2021-01-20T16:47:59Z) - An Efficient Adversarial Attack for Tree Ensembles [91.05779257472675]
We study adversarial attacks on tree-based ensembles such as gradient-boosted decision trees (GBDTs) and random forests (RFs).
We show that our method can be thousands of times faster than the previous mixed-integer linear programming (MILP) based approach.
Our code is available at https://github.com/chong-z/tree-ensemble-attack.
arXiv Detail & Related papers (2020-10-22T10:59:49Z) - MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z) - Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Adagrad algorithm for nonconvex min-max problems.
Our experiments show that the predicted advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.