Multi-Task Option Learning and Discovery for Stochastic Path Planning
- URL: http://arxiv.org/abs/2210.00068v1
- Date: Fri, 30 Sep 2022 19:57:52 GMT
- Title: Multi-Task Option Learning and Discovery for Stochastic Path Planning
- Authors: Naman Shah, Siddharth Srivastava
- Abstract summary: This paper addresses the problem of reliably and efficiently solving broad classes of long-horizon path planning problems.
Our approach computes useful options with policies as well as high-level paths that compose the discovered options.
We show that this approach yields strong guarantees of executability and solvability.
- Score: 27.384742641275228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the problem of reliably and efficiently solving broad
classes of long-horizon stochastic path planning problems. Starting with a
vanilla RL formulation with a stochastic dynamics simulator and an occupancy
matrix of the environment, our approach computes useful options with policies
as well as high-level paths that compose the discovered options.
Our main contributions are (1) data-driven methods for creating abstract
states that serve as endpoints for helpful options, (2) methods for computing
option policies using auto-generated option guides in the form of dense
pseudo-reward functions, and (3) an overarching algorithm for composing the
computed options. We show that this approach yields strong guarantees of
executability and solvability: under fairly general conditions, the computed
option guides lead to composable option policies and consequently ensure
downward refinability. Empirical evaluation on a range of robots, environments,
and tasks shows that this approach effectively transfers knowledge across
related tasks and that it outperforms existing approaches by a significant
margin.
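The abstract describes its interface only at a high level; the following minimal sketch (an editorial illustration, not the authors' code) shows one way the pieces could fit together: each option targets an abstract endpoint state, a dense pseudo-reward guides its policy toward that endpoint, and a high-level path executes the discovered options in sequence under stochastic dynamics. All names and the distance-based pseudo-reward are assumptions made for illustration.
```python
import numpy as np
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: an "option" whose policy is guided toward an abstract
# endpoint state by a dense pseudo-reward (names are illustrative, not the paper's).

@dataclass
class Option:
    endpoint: np.ndarray                         # abstract state used as the option's goal
    policy: Callable[[np.ndarray], np.ndarray]   # maps current state to an action

def pseudo_reward(state: np.ndarray, endpoint: np.ndarray) -> float:
    """Dense option guide: grows as the agent approaches the option's endpoint
    (one simple shaping choice, not the paper's)."""
    return -float(np.linalg.norm(state - endpoint))

def greedy_policy_toward(endpoint: np.ndarray, step: float = 0.1):
    """Stand-in for a learned option policy: step toward the endpoint."""
    def policy(state: np.ndarray) -> np.ndarray:
        direction = endpoint - state
        norm = np.linalg.norm(direction)
        return step * direction / norm if norm > 1e-8 else np.zeros_like(state)
    return policy

def execute_plan(start: np.ndarray, options: list, noise: float = 0.02,
                 tol: float = 0.15, max_steps: int = 500) -> np.ndarray:
    """Compose discovered options along a high-level path: run each option's
    policy under stochastic dynamics until its endpoint is (approximately) reached."""
    rng = np.random.default_rng(0)
    state = start.copy()
    for opt in options:
        for _ in range(max_steps):
            state = state + opt.policy(state) + rng.normal(0.0, noise, size=state.shape)
            if np.linalg.norm(state - opt.endpoint) < tol:
                break
    return state

if __name__ == "__main__":
    waypoints = [np.array([1.0, 0.0]), np.array([1.0, 1.0]), np.array([2.0, 2.0])]
    plan = [Option(w, greedy_policy_toward(w)) for w in waypoints]
    final = execute_plan(np.zeros(2), plan)
    print("final state:", final, "guide value:", pseudo_reward(final, waypoints[-1]))
```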
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
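A minimal sketch of the deployment pattern described here, assuming a toy 1-D objective and a REINFORCE-style gradient (not the paper's estimator): a Gaussian policy is trained stochastically and only its mean is deployed; the standard deviation `sigma` plays the role of the exploration level being tuned.
```python
import numpy as np

# Illustrative only: learn a Gaussian (stochastic) policy with a score-function
# gradient, then deploy only its deterministic mean.
rng = np.random.default_rng(0)

def reward(action: float) -> float:
    return -(action - 2.0) ** 2                  # toy objective, maximized at action = 2

mu, sigma, lr = 0.0, 0.5, 0.02                   # sigma is the exploration-level knob
for _ in range(5000):
    a = rng.normal(mu, sigma)                    # sample from the stochastic policy
    grad_mu = reward(a) * (a - mu) / sigma**2    # reward times d/d_mu of log N(a; mu, sigma)
    mu += lr * grad_mu                           # stochastic gradient ascent on expected reward

deterministic_action = mu                        # deployment: drop the exploration noise
print(f"learned mean {mu:.3f}, deployed reward {reward(deterministic_action):.4f}")
```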
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max- and Min-Cut, Max-$k$-Bipartite-Matching, Maximum-Weight-Bipartite-Matching, and the Traveling Salesman Problem.
As a byproduct of our analysis, we introduce a novel regularization process over vanilla gradient descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
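As a generic, hedged illustration of regularizing a solution-sampler trained by policy gradient (the paper's own regularizer is not specified here), the sketch below adds an entropy bonus to a product-of-Bernoullis Max-Cut sampler so the probabilities do not saturate at 0/1, where the sampler's gradient vanishes.
```python
import numpy as np

# Illustrative stand-in (not the paper's scheme): a product-of-Bernoullis
# solution sampler for Max-Cut, trained by policy gradient; an entropy bonus
# keeps probabilities away from 0/1, where the sampler's gradient vanishes.
rng = np.random.default_rng(1)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]      # small toy graph
n, lr, beta = 4, 0.1, 0.05                            # beta weights the regularizer

theta = np.zeros(n)                                   # logits of P(node on side A)
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-theta))
    x = (rng.random(n) < p).astype(float)             # sample a cut
    cut = sum(x[i] != x[j] for i, j in edges)         # objective value of the sample
    grad_logp = x - p                                 # d log P(x) / d theta
    grad_entropy = -theta * p * (1 - p)               # d/d theta of the Bernoulli entropies
    theta += lr * (cut * grad_logp + beta * grad_entropy)

p = 1.0 / (1.0 + np.exp(-theta))
best = (p > 0.5).astype(int)
print("assignment:", best, "cut size:", sum(best[i] != best[j] for i, j in edges))
```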
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
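A hedged sketch of a multimodal (latent-variable) policy parameterization in the spirit of this summary, with illustrative numbers rather than the paper's model: a discrete latent mode picks one Gaussian component, so sampled actions form distinct clusters instead of a single unimodal blob.
```python
import numpy as np

# Illustrative latent-variable policy: a discrete latent "mode" selects one
# Gaussian component, e.g. passing an obstacle on the left or on the right.
rng = np.random.default_rng(0)

mode_logits = np.array([0.3, -0.1])              # P(z): which trajectory mode to commit to
means = np.array([[-1.0, 0.5], [1.0, 0.5]])      # per-mode action means (left / right)
stds = np.array([[0.1, 0.1], [0.1, 0.1]])

def sample_action(state: np.ndarray) -> np.ndarray:
    # A real policy would condition the logits and means on the state.
    probs = np.exp(mode_logits) / np.exp(mode_logits).sum()
    z = rng.choice(len(probs), p=probs)          # pick a mode (discrete latent)
    return rng.normal(means[z], stds[z])         # then sample an action within that mode

actions = np.array([sample_action(np.zeros(2)) for _ in range(5)])
print(actions)                                   # actions cluster around two distinct modes
```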
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- An Efficient Learning-Based Solver for Two-Stage DC Optimal Power Flow with Feasibility Guarantees [4.029937264494929]
This paper proposes a learning method to solve the two-stage problem in a more efficient and optimal way.
A technique called the gauge map is incorporated into the learning architecture design to guarantee the learned solutions' feasibility to the network constraints.
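The gauge map has a simple closed form for a polytope that contains the origin; the sketch below (a simplified stand-in for the paper's architecture, with illustrative constraints) scales an unconstrained prediction back into {x : Ax <= b}, so the output is feasible by construction.
```python
import numpy as np

# Simplified illustration of the gauge-map idea: scale an unconstrained
# prediction into the polytope {x : A x <= b}, assuming b > 0 so that x = 0
# is strictly feasible.
A = np.array([[1.0, 1.0], [-1.0, 2.0], [0.0, -1.0]])
b = np.array([1.0, 2.0, 1.0])

def gauge(y: np.ndarray) -> float:
    """Minkowski gauge of y w.r.t. the polytope: smallest t with y / t feasible."""
    return max(float(np.max(A @ y / b)), 0.0)

def gauge_map(y: np.ndarray) -> np.ndarray:
    """Feasibility layer: leave feasible points alone, scale infeasible ones back."""
    return y / max(1.0, gauge(y))

raw_prediction = np.array([3.0, 2.5])            # e.g. the output of a neural network
x = gauge_map(raw_prediction)
print(x, "feasible:", bool(np.all(A @ x <= b + 1e-9)))
```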
arXiv Detail & Related papers (2023-04-03T22:56:08Z)
- Policy Gradient for Rectangular Robust Markov Decision Processes [62.397882389472564]
We introduce robust policy gradient (RPG), a policy-based method that efficiently solves rectangular robust Markov decision processes (MDPs).
Our resulting RPG can be estimated from data with the same time complexity as its non-robust equivalent.
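For intuition about (s,a)-rectangularity, here is a hedged sketch on a tiny randomly generated MDP (this is plain robust value iteration, not RPG itself): each state-action pair carries its own uncertainty set of transition distributions, and the backup takes the worst case independently per pair.
```python
import numpy as np

# (s,a)-rectangular robustness: each state-action pair has its own set of
# candidate transition vectors; the robust backup takes the worst case per (s, a).
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
rewards = rng.random((n_states, n_actions))

# Uncertainty set: a few candidate next-state distributions per (s, a).
candidates = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions, 4))

V = np.zeros(n_states)
for _ in range(200):
    worst_next = (candidates @ V).min(axis=2)            # worst case per (s, a), shape (S, A)
    V = (rewards + gamma * worst_next).max(axis=1)       # greedy over actions

robust_policy = (rewards + gamma * (candidates @ V).min(axis=2)).argmax(axis=1)
print("robust values:", np.round(V, 3), "robust greedy actions:", robust_policy)
```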
arXiv Detail & Related papers (2023-01-31T12:40:50Z)
- Reinforcement Learning Methods for Wordle: A POMDP/Adaptive Control Approach [0.3093890460224435]
We address the solution of the popular Wordle puzzle, using new reinforcement learning methods.
For the Wordle puzzle, they yield on-line solution strategies that are very close to optimal at relatively modest computational cost.
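A small sketch of the POMDP view of Wordle, assuming a toy word list (not the paper's method): the belief state is the set of secret words still consistent with all feedback observed so far, and each guess/observation shrinks it.
```python
from collections import Counter

def feedback(guess: str, secret: str) -> str:
    """Wordle feedback: G = right letter/right spot, Y = right letter/wrong spot,
    - = absent (repeated letters handled the standard way)."""
    result = ["-"] * len(guess)
    remaining = Counter(s for g, s in zip(guess, secret) if g != s)
    for i, (g, s) in enumerate(zip(guess, secret)):
        if g == s:
            result[i] = "G"
    for i, (g, s) in enumerate(zip(guess, secret)):
        if g != s and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1
    return "".join(result)

def update_belief(belief, guess: str, observed: str):
    """Keep only the words that would have produced the observed feedback."""
    return [w for w in belief if feedback(guess, w) == observed]

words = ["crane", "crate", "trace", "plumb", "caret"]
belief = update_belief(words, "crane", feedback("crane", "trace"))
print(belief)        # the belief shrinks to the words consistent with the feedback
```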
arXiv Detail & Related papers (2022-11-15T03:46:41Z)
- A Reinforcement Learning Approach to the Stochastic Cutting Stock Problem [0.0]
We propose a formulation of the cutting stock problem as a discounted infinite-horizon decision process.
An optimal solution corresponds to a policy that associates each state with a decision and minimizes the expected total cost.
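One toy way to encode the cutting stock problem as such a decision process (a simplified stand-in for the paper's formulation, with illustrative lengths and demands): the state is the vector of outstanding demands, an action is a cutting pattern for one stock roll, and each roll used incurs a cost of one.
```python
import itertools

STOCK_LEN = 10
LENGTHS = (3, 4, 5)                       # item lengths that can be ordered

def patterns():
    """All ways to cut one roll into pieces of the given lengths."""
    max_counts = [STOCK_LEN // l for l in LENGTHS]
    for counts in itertools.product(*(range(m + 1) for m in max_counts)):
        if sum(c * l for c, l in zip(counts, LENGTHS)) <= STOCK_LEN:
            yield counts

def step(demand, pattern):
    """Apply one cutting decision: reduce demands, pay a cost of one roll."""
    new_demand = tuple(max(d - c, 0) for d, c in zip(demand, pattern))
    return new_demand, 1.0

demand = (4, 3, 2)                        # outstanding orders per item length (state)
rolls = 0
while any(demand):
    # naive policy: pick the pattern that satisfies the most outstanding demand
    best = max(patterns(), key=lambda p: sum(min(c, d) for c, d in zip(p, demand)))
    demand, cost = step(demand, best)
    rolls += int(cost)
print("rolls used:", rolls)
```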
arXiv Detail & Related papers (2021-09-20T14:47:54Z)
- Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z)
- Provably Correct Optimization and Exploration with Non-linear Policies [65.60853260886516]
ENIAC is an actor-critic method that allows non-linear function approximation in the critic.
We show that under certain assumptions, the learner finds a near-optimal policy in $O(poly(d))$ exploration rounds.
We empirically evaluate this adaptation and show that it outperforms priors inspired by linear methods.
arXiv Detail & Related papers (2021-03-22T03:16:33Z)
- Data-efficient Hindsight Off-policy Option Learning [20.42535406663446]
We introduce Hindsight Off-policy Options (HO2), a data-efficient option learning algorithm.
It robustly trains all policy components off-policy and end-to-end.
The approach outperforms existing option learning methods on common benchmarks.
arXiv Detail & Related papers (2020-07-30T16:52:33Z)
- SOAC: The Soft Option Actor-Critic Architecture [25.198302636265286]
Methods have been proposed for concurrently learning low-level intra-option policies and a high-level option-selection policy.
Existing methods typically suffer from two major challenges: ineffective exploration and unstable updates.
We present a novel and stable off-policy approach that builds on the maximum entropy model to address these challenges.
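A minimal sketch of the maximum entropy model this builds on, using generic soft value iteration on a random MDP rather than SOAC itself: the soft backup V(s) = α log Σ_a exp(Q(s,a)/α) trades reward against policy entropy and induces a stochastic softmax policy.
```python
import numpy as np

# Generic soft (entropy-regularized) value iteration, as an illustration of the
# maximum-entropy RL objective; not the SOAC architecture.
n_states, n_actions, gamma, alpha = 3, 2, 0.9, 0.5
rng = np.random.default_rng(0)
R = rng.random((n_states, n_actions))
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] = next-state dist

V = np.zeros(n_states)
for _ in range(200):
    Q = R + gamma * P @ V                                  # shape (S, A)
    V = alpha * np.log(np.exp(Q / alpha).sum(axis=1))      # soft backup over actions

policy = np.exp(Q / alpha) / np.exp(Q / alpha).sum(axis=1, keepdims=True)
print("soft values:", np.round(V, 3))
print("stochastic softmax policy:\n", np.round(policy, 3))
```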
arXiv Detail & Related papers (2020-06-25T13:06:59Z)