Adversarial Skill Chaining for Long-Horizon Robot Manipulation via
Terminal State Regularization
- URL: http://arxiv.org/abs/2111.07999v1
- Date: Mon, 15 Nov 2021 18:59:03 GMT
- Title: Adversarial Skill Chaining for Long-Horizon Robot Manipulation via
Terminal State Regularization
- Authors: Youngwoon Lee and Joseph J. Lim and Anima Anandkumar and Yuke Zhu
- Abstract summary: We propose to chain multiple policies without excessively large initial state distributions.
We evaluate our approach on two complex long-horizon manipulation tasks of furniture assembly.
Our results show that our method is the first model-free reinforcement learning algorithm to solve these tasks.
- Score: 65.09725599705493
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Skill chaining is a promising approach for synthesizing complex behaviors by
sequentially combining previously learned skills. Yet, a naive composition of
skills fails when a policy encounters a starting state never seen during its
training. For successful skill chaining, prior approaches attempt to widen the
policy's starting state distribution. However, these approaches require larger
state distributions to be covered as more policies are sequenced, and thus are
limited to short skill sequences. In this paper, we propose to chain multiple
policies without excessively large initial state distributions by regularizing
the terminal state distributions in an adversarial learning framework. We
evaluate our approach on two complex long-horizon manipulation tasks of
furniture assembly. Our results show that our method is the first model-free
reinforcement learning algorithm to solve these tasks, whereas prior skill
chaining approaches fail. The code and videos are available at
https://clvrai.com/skill-chaining
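For concreteness, the terminal state regularization described in the abstract can be pictured as a GAN-style discriminator that separates terminal states of the current skill from valid starting states of the next skill, with the current skill rewarded for fooling it. The sketch below is a hypothetical illustration, not the authors' released code (see the URL above); the network sizes, data batches, and the coefficient `beta` are placeholders.

```python
import torch
import torch.nn as nn

# Minimal sketch of terminal state regularization (not the authors' code).
# A discriminator is trained to tell "good" initiation states of the next
# skill apart from terminal states produced by the current skill's policy.
# The current skill then receives an extra reward for terminal states the
# discriminator mistakes for good initiation states.

state_dim = 32          # hypothetical state dimensionality
beta = 1.0              # hypothetical weight of the regularization reward

discriminator = nn.Sequential(
    nn.Linear(state_dim, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)
disc_opt = torch.optim.Adam(discriminator.parameters(), lr=3e-4)
bce = nn.BCELoss()


def update_discriminator(init_states_next_skill, terminal_states_current_skill):
    """One adversarial update: initiation states of the next skill are 'real',
    terminal states reached by the current skill are 'fake'."""
    real = discriminator(init_states_next_skill)
    fake = discriminator(terminal_states_current_skill)
    loss = bce(real, torch.ones_like(real)) + bce(fake, torch.zeros_like(fake))
    disc_opt.zero_grad()
    loss.backward()
    disc_opt.step()
    return loss.item()


def regularized_terminal_reward(task_reward, terminal_state):
    """Add the terminal state regularization bonus to the task reward."""
    with torch.no_grad():
        d = discriminator(terminal_state)
    # High D(s_T) means the terminal state looks like a valid starting
    # state for the next skill, so the policy is rewarded for reaching it.
    return task_reward + beta * torch.log(d + 1e-8)
```

Compared with widening each policy's initial state distribution, regularizing terminal states keeps the set of states each downstream policy must handle roughly fixed as more skills are chained.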
Related papers
- Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning [0.8488322025656236]
One class of methods designed to address the difficulties of sparse-reward reinforcement learning forms temporally extended actions, often called skills, from interaction data collected in the same domain.
We propose a novel approach to skill-generation with two components. First we discretize the action space through clustering, and second we leverage a tokenization technique borrowed from natural language processing to generate temporally extended actions.
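A rough sketch of that two-step recipe, under assumed interfaces: continuous actions are discretized with k-means, and frequent adjacent token pairs are then merged, byte-pair-encoding style, into longer candidate skills. The data, cluster count, and merge count below are placeholders, and the paper's actual tokenizer details may differ.

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

# Rough sketch: discretize actions by clustering, then merge frequent
# adjacent token pairs (byte-pair-encoding style) into longer "skills".
# The dataset, cluster count, and merge count are hypothetical.

actions = np.random.randn(10000, 4)           # offline actions (placeholder data)
n_clusters, n_merges = 16, 8

# 1) Discretize the action space through clustering.
kmeans = KMeans(n_clusters=n_clusters, n_init=10).fit(actions)
tokens = list(kmeans.labels_)                  # one discrete token per action

# 2) BPE-style merges: repeatedly fuse the most frequent adjacent pair.
vocab = {i: (i,) for i in range(n_clusters)}   # token id -> primitive sequence
next_id = n_clusters
for _ in range(n_merges):
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        break
    (a, b), _ = pairs.most_common(1)[0]
    vocab[next_id] = vocab[a] + vocab[b]       # new temporally extended action
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            merged.append(next_id)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    tokens = merged
    next_id += 1

# Each merged vocabulary entry is a candidate skill: a fixed sequence of
# action-cluster centroids that can be executed as one temporally extended action.
skills = {k: kmeans.cluster_centers_[list(v)] for k, v in vocab.items() if len(v) > 1}
```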
arXiv Detail & Related papers (2023-09-08T17:37:05Z)
- A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning [61.406020873047794]
A major hurdle to real-world application arises from the development of algorithms in an episodic setting.
We propose a new method, MEDAL, that trains the backward policy to match the state distribution in the provided demonstrations.
Our experiments show that MEDAL matches or outperforms prior methods on three sparse-reward continuous control tasks.
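A minimal sketch of how such a state-distribution-matching reward could look: a classifier separates demonstration states from states visited by the backward (reset) policy, and the backward policy is rewarded where the classifier finds the state demonstration-like. The network sizes and training loop are hypothetical, not MEDAL's actual implementation.

```python
import torch
import torch.nn as nn

# Sketch of a MEDAL-style reward (illustrative only): a classifier separates
# demonstration states from states visited by the backward (reset) policy,
# and the backward policy is rewarded for demonstration-like states.

state_dim = 17                      # hypothetical state dimensionality
classifier = nn.Sequential(
    nn.Linear(state_dim, 128), nn.ReLU(),
    nn.Linear(128, 1),
)
opt = torch.optim.Adam(classifier.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()


def update_classifier(demo_states, backward_states):
    logits_demo = classifier(demo_states)
    logits_back = classifier(backward_states)
    loss = bce(logits_demo, torch.ones_like(logits_demo)) + \
           bce(logits_back, torch.zeros_like(logits_back))
    opt.zero_grad()
    loss.backward()
    opt.step()


def backward_policy_reward(state):
    # High reward where the classifier believes the state comes from the
    # demonstration distribution, pulling the agent back toward states
    # from which the forward task can be attempted again.
    with torch.no_grad():
        return torch.log(torch.sigmoid(classifier(state)) + 1e-8)
```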
arXiv Detail & Related papers (2022-05-11T00:06:29Z)
- Chaining Value Functions for Off-Policy Learning [22.54793586116019]
We discuss a novel family of off-policy prediction algorithms which are convergent by construction.
We prove that the proposed scheme is convergent and corresponds to an iterative decomposition of the inverse key matrix.
Empirically, we evaluate the idea on challenging MDPs such as Baird's counterexample and observe favourable results.
arXiv Detail & Related papers (2022-01-17T15:26:47Z)
- Conjugated Discrete Distributions for Distributional Reinforcement Learning [0.0]
We show that one of the most successful methods may not yield an optimal policy when the underlying process is non-deterministic.
We argue that distributional reinforcement learning lends itself to remedy this situation completely.
arXiv Detail & Related papers (2021-12-14T14:14:49Z)
- Training Transition Policies via Distribution Matching for Complex Tasks [7.310043452300736]
Hierarchical reinforcement learning seeks to leverage lower-level policies for simple tasks to solve complex ones.
We introduce transition policies that smoothly connect lower-level policies by producing a distribution of states and actions that matches what is expected by the next policy.
We show that it smoothly connects the lower-level policies, achieving higher success rates than previous methods.
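As a purely illustrative sketch of how such transition policies slot into execution (not the paper's training procedure), the distribution-matching signal is abstracted here as a hypothetical `start_score` function, and `skills`, `transitions`, and the environment interface are assumed objects.

```python
# Sketch of executing a chain of skills with learned transition policies.
# `skills[i]` and `transitions[i]` are hypothetical policy objects with
# `act(state)` and `done(state)` methods; `start_score[i](state)` stands in
# for the learned distribution-matching signal estimating how well the
# current state matches what skill i expects at its start.

def run_chain(env, skills, transitions, start_score, max_transition_steps=100):
    state = env.reset()
    for i, skill in enumerate(skills):
        # Run the current skill until it signals completion.
        while not skill.done(state):
            state, _, _, _ = env.step(skill.act(state))
        # Hand over: let the transition policy steer toward states the
        # next skill was trained to start from.
        if i + 1 < len(skills):
            for _ in range(max_transition_steps):
                if start_score[i + 1](state) > 0.5:
                    break
                state, _, _, _ = env.step(transitions[i].act(state))
    return state
```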
arXiv Detail & Related papers (2021-10-08T19:57:37Z)
- Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
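A hypothetical sketch of the idea: a high-level network proposes a subgoal between the current state and the final goal, and the goal-conditioned policy is pulled toward what it would do under that nearer subgoal. For brevity the policies below are deterministic and a mean-squared term stands in for the KL-style prior used in the paper; all networks, sizes, and the critic `q_value_fn` are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of imagined subgoals as a policy prior (placeholder networks).
state_dim, goal_dim, act_dim = 16, 16, 4      # hypothetical sizes
alpha = 0.1                                    # weight of the subgoal prior term

subgoal_net = nn.Linear(state_dim + goal_dim, goal_dim)   # high-level policy (mean only)
policy = nn.Linear(state_dim + goal_dim, act_dim)         # low-level policy (mean only)


def policy_loss(state, goal, q_value_fn):
    # The high-level policy imagines an intermediate subgoal.
    subgoal = subgoal_net(torch.cat([state, goal], dim=-1))
    action_goal = policy(torch.cat([state, goal], dim=-1))
    with torch.no_grad():
        # What the policy would do if it were pursuing the easier subgoal.
        action_subgoal = policy(torch.cat([state, subgoal], dim=-1))
    # Maximize the critic's value while staying close to the subgoal-
    # conditioned behaviour, which is assumed to be easier to learn.
    rl_term = -q_value_fn(state, goal, action_goal).mean()
    prior_term = F.mse_loss(action_goal, action_subgoal)
    return rl_term + alpha * prior_term
```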
arXiv Detail & Related papers (2021-07-01T15:30:59Z)
- Same State, Different Task: Continual Reinforcement Learning without Interference [21.560701568064864]
A key challenge in Continual Learning (CL) is catastrophic forgetting, which arises when performance on a previously mastered task degrades while a new task is being learned.
We show that existing CL methods based on single neural network predictors with shared replay buffers fail in the presence of interference.
We propose a simple method, OWL, to address this challenge. OWL learns a factorized policy, using shared feature extraction layers, but separate heads, each specializing on a new task.
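A minimal sketch of such a factorized policy, with placeholder sizes: a shared trunk feeds one head per task, so tasks that demand different actions in the same state do not overwrite each other.

```python
import torch.nn as nn

# Sketch of an OWL-style factorized policy: shared feature extraction layers
# with one output head per task. Sizes and the number of tasks are placeholders.

class FactorizedPolicy(nn.Module):
    def __init__(self, obs_dim=32, act_dim=6, n_tasks=4, hidden=256):
        super().__init__()
        # Shared layers reused across all tasks.
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One head per task, so tasks that assign different optimal actions
        # to the same state do not interfere with each other.
        self.heads = nn.ModuleList([nn.Linear(hidden, act_dim) for _ in range(n_tasks)])

    def forward(self, obs, task_id):
        return self.heads[task_id](self.trunk(obs))
```

Which head to use when the task identity is unknown at test time is a separate selection problem that the paper also addresses; here the task index is simply passed in.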
arXiv Detail & Related papers (2021-06-05T17:55:10Z)
- State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards [88.30521204048551]
A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds.
We show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards.
This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods.
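An illustrative sketch of the state augmentation idea, with a hypothetical environment and policy interface: the Lagrange multipliers are appended to the observation, and a dual-ascent step adjusts them from the observed constraint returns after each episode.

```python
import numpy as np

# Illustrative sketch (not the paper's code): the Lagrange multipliers become
# part of the observation, and dual ascent updates them from constraint
# violations. `env`, `policy`, and `thresholds` are hypothetical placeholders;
# env.step is assumed to return a reward vector whose first entry is the task
# reward and whose remaining entries are the constraint rewards.

def run_episode(env, policy, thresholds, lam, lr_dual=0.01, horizon=200):
    lam = np.array(lam, dtype=float)
    state = env.reset()
    returns = np.zeros_like(lam)                   # accumulated constraint rewards
    for _ in range(horizon):
        augmented = np.concatenate([state, lam])   # policy sees the multipliers
        state, rewards, done, _ = env.step(policy(augmented))
        returns += rewards[1:]                     # skip the task reward
        if done:
            break
    # Dual ascent: raise a multiplier when its constraint is violated,
    # lower it (down to zero) when the constraint is satisfied with slack.
    lam = np.maximum(0.0, lam + lr_dual * (thresholds - returns))
    return lam
```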
arXiv Detail & Related papers (2021-02-23T21:07:35Z)
- Breaking the Deadly Triad with a Target Network [80.82586530205776]
The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously.
We provide the first convergent linear $Q$-learning algorithms under nonrestrictive and changing behavior policies without bi-level optimization.
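A small sketch of the basic mechanism, with placeholder features and step sizes: the bootstrap target is computed with a separate, slowly updated copy of the weights rather than the weights currently being learned.

```python
import numpy as np

# Sketch of linear Q-learning with a target network (illustrative only).
# phi_* are feature vectors of (state, action); step sizes are placeholders.

def linear_q_update(w, w_target, phi_s_a, reward, phi_next_best, gamma=0.99, lr=0.05):
    """One semi-gradient update; the bootstrap target uses the target weights."""
    td_target = reward + gamma * np.dot(w_target, phi_next_best)
    td_error = td_target - np.dot(w, phi_s_a)
    return w + lr * td_error * phi_s_a


def sync_target(w, w_target, tau=0.05):
    """Polyak-style target update; tau=1 gives a periodic hard copy."""
    return (1.0 - tau) * w_target + tau * w
```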
arXiv Detail & Related papers (2021-01-21T21:50:10Z)
- DDPG++: Striving for Simplicity in Continuous-control Off-Policy Reinforcement Learning [95.60782037764928]
First, we show that the simple Deterministic Policy Gradient works remarkably well as long as the overestimation bias is controlled.
Second, we pinpoint training instabilities, typical of off-policy algorithms, to the greedy policy update step.
Third, we show that ideas from the propensity estimation literature can be used to importance-sample transitions from the replay buffer and update the policy to prevent deterioration of performance.
arXiv Detail & Related papers (2020-06-26T20:21:12Z)
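The summary does not spell out how the overestimation bias is controlled; one standard device in deterministic policy gradient methods is the clipped double-Q target popularized by TD3, shown below as an illustrative stand-in with hypothetical network objects.

```python
import torch

# Illustrative clipped double-Q target (not necessarily DDPG++'s exact recipe).
# `actor_target`, `critic1_target`, and `critic2_target` are hypothetical
# target copies of the actor and twin critics.

def critic_targets(reward, next_state, done, actor_target,
                   critic1_target, critic2_target, gamma=0.99):
    with torch.no_grad():
        next_action = actor_target(next_state)
        q1 = critic1_target(next_state, next_action)
        q2 = critic2_target(next_state, next_action)
        # Taking the minimum of the two estimates keeps the bootstrap target
        # pessimistic, controlling the overestimation of the greedy update.
        target_q = reward + gamma * (1.0 - done) * torch.min(q1, q2)
    return target_q
```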
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.