SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search
- URL: http://arxiv.org/abs/2301.13236v1
- Date: Mon, 30 Jan 2023 19:03:14 GMT
- Title: SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search
- Authors: Gal Dalal, Assaf Hallak, Gugan Thoppe, Shie Mannor, Gal Chechik
- Abstract summary: We introduce SoftTreeMax, a generalization of softmax that takes planning into account.
We analyze the gradient variance and show for the first time the role of a tree expansion policy in mitigating it.
Our differentiable tree-based policy leverages all gradients at the tree leaves in each environment step instead of the traditional single-sample-based gradient.
- Score: 68.66904039405871
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the popularity of policy gradient methods, they are known to suffer
from large variance and high sample complexity. To mitigate this, we introduce
SoftTreeMax -- a generalization of softmax that takes planning into account. In
SoftTreeMax, we extend the traditional logits with the multi-step discounted
cumulative reward, topped with the logits of future states. We consider two
variants of SoftTreeMax, one for cumulative reward and one for exponentiated
reward. For both, we analyze the gradient variance and reveal for the first
time the role of a tree expansion policy in mitigating this variance. We prove
that the resulting variance decays exponentially with the planning horizon as a
function of the expansion policy. Specifically, we show that the closer the
resulting state transitions are to uniform, the faster the decay. In a
practical implementation, we utilize a parallelized GPU-based simulator for
fast and efficient tree search. Our differentiable tree-based policy leverages
all gradients at the tree leaves in each environment step instead of the
traditional single-sample-based gradient. We then show in simulation how the
variance of the gradient is reduced by three orders of magnitude, leading to
better sample complexity compared to the standard policy gradient. On Atari,
SoftTreeMax demonstrates up to 5x better performance in a faster run time
compared to distributed PPO. Lastly, we demonstrate that high reward correlates
with lower variance.
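To make the construction concrete, here is a minimal, illustrative Python sketch of the exponentiated-reward variant: each root action's logit aggregates, over all depth-d trajectories starting with that action, the discounted cumulative reward plus gamma^d times a learned leaf logit. The helper names (`step_fn`, `theta_fn`, `n_actions`) and the assumption of a deterministic, resettable simulator are ours, not the paper's API; the authors use a parallelized GPU simulator rather than this naive recursion.

```python
import numpy as np
from scipy.special import logsumexp

def softtreemax_policy(state, step_fn, theta_fn, n_actions,
                       depth=2, gamma=0.99, beta=1.0):
    """Illustrative SoftTreeMax-style policy (exponentiated-reward variant).

    Assumptions (ours): step_fn(state, action) returns (next_state, reward)
    from a deterministic simulator; theta_fn(state) returns the learned
    scalar logit of a state.
    """
    def leaf_scores(s, acc_return, level):
        # At the horizon, score the leaf: discounted return + gamma^d * logit.
        if level == depth:
            return [beta * (acc_return + gamma ** depth * theta_fn(s))]
        scores = []
        for a in range(n_actions):
            s_next, r = step_fn(s, a)
            scores.extend(
                leaf_scores(s_next, acc_return + gamma ** level * r, level + 1))
        return scores

    logits = []
    for a in range(n_actions):
        s1, r0 = step_fn(state, a)                  # expand the root action
        logits.append(logsumexp(leaf_scores(s1, r0, 1)))
    logits = np.array(logits)
    return np.exp(logits - logsumexp(logits))       # softmax over root actions
```

Because every leaf contributes to each action's logit, gradients flow through all leaf states at every environment step, which is the mechanism behind the variance reduction described above.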
Related papers
- MAPTree: Beating "Optimal" Decision Trees with Bayesian Decision Trees [2.421336072915701]
We present a Bayesian approach to decision tree induction via maximum a posteriori inference of a posterior distribution over trees.
We propose an AND/OR search algorithm, dubbed MAPTree, which is able to recover the maximum a posteriori tree.
arXiv Detail & Related papers (2023-09-26T23:43:37Z)
- Bridging Discrete and Backpropagation: Straight-Through and Beyond [62.46558842476455]
We propose a novel approach to approximate the gradient of parameters involved in generating discrete latent variables.
We propose ReinMax, which achieves second-order accuracy by integrating Heun's method, a second-order numerical method for solving ODEs (a sketch of the straight-through baseline it refines appears below).
arXiv Detail & Related papers (2023-04-17T20:59:49Z)
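For the entry above: a minimal PyTorch sketch of the classic straight-through estimator that ReinMax refines. This is the first-order baseline only, not ReinMax's Heun-corrected update, and the function name is ours.

```python
import torch
import torch.nn.functional as F

def straight_through_sample(logits: torch.Tensor) -> torch.Tensor:
    """Classic straight-through estimator (the baseline ReinMax improves on).

    Forward pass emits a hard one-hot sample; the backward pass lets
    gradients flow as if the output were the softmax probabilities,
    a first-order approximation of the true gradient.
    """
    probs = F.softmax(logits, dim=-1)
    index = torch.multinomial(probs, num_samples=1).squeeze(-1)
    hard = F.one_hot(index, num_classes=logits.shape[-1]).to(logits.dtype)
    # Value of `hard` in the forward pass, gradient of `probs` in the backward.
    return hard + probs - probs.detach()
```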
- SoftTreeMax: Policy Gradient with Tree Search [72.9513807133171]
We introduce SoftTreeMax, the first approach that integrates tree-search into policy gradient.
On Atari, SoftTreeMax demonstrates up to 5x better performance in faster run-time compared with distributed PPO.
arXiv Detail & Related papers (2022-09-28T09:55:47Z)
- Social Interpretable Tree for Pedestrian Trajectory Prediction [75.81745697967608]
We propose a tree-based method, termed Social Interpretable Tree (SIT), to address the multi-modal trajectory prediction task.
A path in the tree from the root to leaf represents an individual possible future trajectory.
Despite relying on a hand-crafted tree, our method matches or exceeds the performance of state-of-the-art methods on the ETH-UCY and Stanford Drone datasets.
arXiv Detail & Related papers (2022-05-26T12:18:44Z)
- Lassoed Tree Boosting [53.56229983630983]
We prove that a gradient boosted tree algorithm with early stopping achieves faster than $n^{-1/4}$ $L^2$ convergence in the large nonparametric space of càdlàg functions of bounded sectional variation.
Our convergence proofs are based on a novel, general theorem on early stopping with empirical loss minimizers of nested Donsker classes.
arXiv Detail & Related papers (2022-05-22T00:34:41Z)
- Hierarchical Shrinkage: improving the accuracy and interpretability of tree-based methods [10.289846887751079]
We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm that does not modify the tree structure.
HS substantially increases the predictive performance of decision trees, even when used in conjunction with other regularization techniques.
All code and models are released in a full-fledged package available on GitHub (a minimal sketch of the HS prediction rule follows below).
arXiv Detail & Related papers (2022-02-02T02:43:23Z)
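For the Hierarchical Shrinkage entry above: a minimal sketch of the HS prediction rule, assuming a hand-rolled binary-tree node layout of our own (the fields `value`, `n`, `feature`, `threshold`, `left`, `right` are hypothetical). Walking from root to leaf, each child-versus-parent change in the mean prediction is shrunk by 1 / (1 + lam / n_parent), so splits fit on few samples contribute less; the tree structure itself is untouched.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    value: float                 # mean training response at this node
    n: int                       # number of training samples at this node
    feature: int = -1            # split feature index (internal nodes only)
    threshold: float = 0.0       # split threshold (internal nodes only)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def hs_predict(root: Node, x, lam: float) -> float:
    """Post-hoc hierarchically shrunk prediction for a single sample x."""
    node, pred = root, root.value
    while node.left is not None:
        child = node.left if x[node.feature] <= node.threshold else node.right
        # Shrink the child-vs-parent difference; small-n parents shrink more.
        pred += (child.value - node.value) / (1.0 + lam / node.n)
        node = child
    return pred
```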
- A better method to enforce monotonic constraints in regression and classification trees [0.0]
We present two new ways of enforcing monotone constraints in regression and classification trees.
One yields better results than LightGBM's current method at a similar computation time.
The other yields even better results, but is much slower than the current LightGBM (a usage sketch of LightGBM's built-in constraint API follows below).
arXiv Detail & Related papers (2020-11-02T14:04:21Z)
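For the monotone-constraints entry above: the paper compares against LightGBM's existing constraint handling. As a point of reference, here is a minimal usage sketch of that built-in API; the toy data and the particular constraint choices are placeholders of our own.

```python
import numpy as np
import lightgbm as lgb

# Toy data: y increases in feature 0 and decreases in feature 1 by construction.
rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 3))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=1000)

# monotone_constraints: +1 = increasing, -1 = decreasing, 0 = unconstrained.
# LightGBM also exposes monotone_constraints_method
# ("basic", "intermediate", "advanced") to trade off accuracy and speed.
model = lgb.LGBMRegressor(n_estimators=200, monotone_constraints=[1, -1, 0])
model.fit(X, y)
```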
- An Efficient Adversarial Attack for Tree Ensembles [91.05779257472675]
We study adversarial attacks on tree-based ensembles such as gradient boosted decision trees (GBDTs) and random forests (RFs).
We show that our method can be thousands of times faster than the previous mixed-integer linear programming (MILP) based approach.
Our code is available at https://github.com/chong-z/tree-ensemble-attack.
arXiv Detail & Related papers (2020-10-22T10:59:49Z)