GO-DICE: Goal-Conditioned Option-Aware Offline Imitation Learning via
Stationary Distribution Correction Estimation
- URL: http://arxiv.org/abs/2312.10802v1
- Date: Sun, 17 Dec 2023 19:47:49 GMT
- Title: GO-DICE: Goal-Conditioned Option-Aware Offline Imitation Learning via
Stationary Distribution Correction Estimation
- Authors: Abhinav Jain, Vaibhav Unhelkar
- Abstract summary: GO-DICE is an offline IL technique for goal-conditioned long-horizon sequential tasks.
Inspired by the expansive DICE family of techniques, policy learning at both levels takes place within the space of stationary distributions.
Experimental results substantiate that GO-DICE outperforms recent baselines.
- Score: 1.4703485217797363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline imitation learning (IL) refers to learning expert behavior solely
from demonstrations, without any additional interaction with the environment.
Despite significant advances in offline IL, existing techniques find it
challenging to learn policies for long-horizon tasks and require significant
re-training when task specifications change. Towards addressing these
limitations, we present GO-DICE, an offline IL technique for goal-conditioned
long-horizon sequential tasks. GO-DICE discerns a hierarchy of sub-tasks from
demonstrations and uses these to learn separate policies for sub-task
transitions and action execution, respectively; this hierarchical policy
learning facilitates long-horizon reasoning. Inspired by the expansive
DICE family of techniques, policy learning at both levels takes place within
the space of stationary distributions. Further, both policies are learnt with
goal conditioning to minimize the need for retraining when task goals change.
Experimental results substantiate that GO-DICE outperforms recent baselines, as
evidenced by a marked improvement in the completion rate of increasingly
challenging pick-and-place Mujoco robotic tasks. GO-DICE is also capable of
leveraging imperfect demonstrations and partial task segmentation when
available, both of which boost task performance relative to learning from
expert demonstrations alone.
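For intuition, below is a minimal, hypothetical sketch of a goal-conditioned, DICE-style imitation objective in PyTorch (a ValueDICE-like formulation with goal-conditioned inputs). It is not the authors' GO-DICE implementation: GO-DICE is hierarchical (separate sub-task transition and action policies), whereas this sketch collapses everything to a single level, and all names, dimensions, and network choices are illustrative assumptions.

```python
# Hypothetical sketch of a goal-conditioned, ValueDICE-style objective.
# NOT the GO-DICE algorithm: no sub-task hierarchy; illustrative names only.
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim, out_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, *xs):
        return self.net(torch.cat(xs, dim=-1))

obs_dim, act_dim, goal_dim, gamma = 10, 4, 3, 0.99
nu = MLP(obs_dim + act_dim + goal_dim, 1)   # critic nu(s, a, g)
policy = MLP(obs_dim + goal_dim, act_dim)   # deterministic pi(s, g), for brevity
nu_opt = torch.optim.Adam(nu.parameters(), lr=3e-4)
pi_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def dice_objective(s, a, s_next, s0, g):
    """ValueDICE-style saddle-point objective on expert transitions:
    log E[exp(nu(s,a,g) - gamma * nu(s', pi(s',g), g))]
      - (1 - gamma) * E[nu(s0, pi(s0,g), g)].
    The critic ascends this quantity; the policy descends it."""
    residual = (nu(s, a, g) - gamma * nu(s_next, policy(s_next, g), g)).squeeze(-1)
    log_term = torch.logsumexp(residual, dim=0) - torch.log(
        torch.tensor(float(residual.numel()))
    )
    linear_term = (1.0 - gamma) * nu(s0, policy(s0, g), g).mean()
    return log_term - linear_term

# One alternating update on a synthetic expert batch (random placeholders).
N = 64
s, a = torch.randn(N, obs_dim), torch.randn(N, act_dim)
s_next, s0 = torch.randn(N, obs_dim), torch.randn(N, obs_dim)
g = torch.randn(N, goal_dim)

nu_opt.zero_grad()
(-dice_objective(s, a, s_next, s0, g)).backward()  # maximize w.r.t. the critic
nu_opt.step()

pi_opt.zero_grad()
dice_objective(s, a, s_next, s0, g).backward()     # minimize w.r.t. the policy
pi_opt.step()
```

Per the abstract, GO-DICE additionally learns a higher-level policy over sub-task transitions with an analogous distribution-matching objective, and conditions both levels on the task goal so that changing goals does not require retraining from scratch.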
Related papers
- Exploiting Hybrid Policy in Reinforcement Learning for Interpretable Temporal Logic Manipulation [12.243491328213217]
Reinforcement Learning (RL) based methods have been increasingly explored for robot learning.
We propose a Temporal-Logic-guided Hybrid policy framework (HyTL) which leverages three-level decision layers to improve the agent's performance.
We evaluate HyTL on four challenging manipulation tasks, which demonstrate its effectiveness and interpretability.
arXiv Detail & Related papers (2024-12-29T03:34:53Z) - Adaptformer: Sequence models as adaptive iterative planners [0.0]
Decision-making in multi-task missions is a challenging problem for autonomous systems.
We propose Adaptformer, an adaptive planner that utilizes sequence models for sample-efficient exploration and exploitation.
We show that Adaptformer outperforms the state-of-the-art method by up to 25% in multi-goal maze reachability tasks.
arXiv Detail & Related papers (2024-11-30T00:34:41Z) - Hierarchical Orchestra of Policies [1.6574413179773757]
HOP dynamically forms a hierarchy of policies based on a similarity metric between the current observations and previously encountered observations in successful tasks.
HOP does not require task labelling, allowing for robust adaptation in environments where boundaries between tasks are ambiguous.
Our experiments, conducted across multiple tasks in a procedurally generated suite of environments, demonstrate that HOP significantly outperforms baseline methods in retaining knowledge across tasks.
arXiv Detail & Related papers (2024-11-05T11:13:09Z) - Universal Visual Decomposer: Long-Horizon Manipulation Made Easy [54.93745986073738]
Real-world robotic tasks stretch over extended horizons and encompass multiple stages.
Prior task decomposition methods require task-specific knowledge, are computationally intensive, and cannot readily be applied to new tasks.
We propose Universal Visual Decomposer (UVD), an off-the-shelf task decomposition method for visual long horizon manipulation.
We extensively evaluate UVD on both simulation and real-world tasks, and in all cases, UVD substantially outperforms baselines across imitation and reinforcement learning settings.
arXiv Detail & Related papers (2023-10-12T17:59:41Z) - Efficient Learning of High Level Plans from Play [57.29562823883257]
We present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL.
We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks.
arXiv Detail & Related papers (2023-03-16T20:09:47Z) - Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in
Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z) - Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z) - Hierarchical Few-Shot Imitation with Skill Transition Models [66.81252581083199]
Few-shot Imitation with Skill Transition Models (FIST) is an algorithm that extracts skills from offline data and utilizes them to generalize to unseen tasks.
We show that FIST is capable of generalizing to new tasks and substantially outperforms prior baselines in navigation experiments.
arXiv Detail & Related papers (2021-07-19T15:56:01Z) - Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
arXiv Detail & Related papers (2020-07-23T17:59:57Z)