GO-DICE: Goal-Conditioned Option-Aware Offline Imitation Learning via
Stationary Distribution Correction Estimation
- URL: http://arxiv.org/abs/2312.10802v1
- Date: Sun, 17 Dec 2023 19:47:49 GMT
- Title: GO-DICE: Goal-Conditioned Option-Aware Offline Imitation Learning via
Stationary Distribution Correction Estimation
- Authors: Abhinav Jain, Vaibhav Unhelkar
- Abstract summary: GO-DICE is an offline IL technique for goal-conditioned long-horizon sequential tasks.
Inspired by the expansive DICE family of techniques, policy learning at both levels takes place in the space of stationary distributions.
Experimental results substantiate that GO-DICE outperforms recent baselines.
- Score: 1.4703485217797363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline imitation learning (IL) refers to learning expert behavior solely from demonstrations, without any additional interaction with the environment. Despite significant advances in offline IL, existing techniques find it challenging to learn policies for long-horizon tasks and require significant re-training when task specifications change. Towards addressing these limitations, we present GO-DICE, an offline IL technique for goal-conditioned long-horizon sequential tasks. GO-DICE discerns a hierarchy of sub-tasks from demonstrations and uses these to learn separate policies for sub-task transitions and action execution; this hierarchical policy learning facilitates long-horizon reasoning. Inspired by the expansive DICE family of techniques, policy learning at both levels takes place in the space of stationary distributions. Further, both policies are learnt with goal conditioning to minimize the need for retraining when task goals change. Experimental results substantiate that GO-DICE outperforms recent baselines, as evidenced by a marked improvement in the completion rate of increasingly challenging pick-and-place MuJoCo robotic tasks. GO-DICE can also leverage imperfect demonstrations and partial task segmentation when available, both of which boost task performance relative to learning from expert demonstrations alone.
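As a rough illustration of the two-level policy structure the abstract describes, the sketch below shows a goal-conditioned hierarchical policy: a high-level network that selects the next sub-task (option) from the state, goal, and previous option, and a low-level network that outputs an action conditioned on the state, goal, and chosen option. All module names, dimensions, and the use of simple PyTorch MLPs are illustrative assumptions; the sketch does not reproduce the paper's DICE-based training, which learns both levels via stationary distribution correction rather than direct behavior cloning.

```python
# Illustrative sketch of a goal-conditioned, option-aware hierarchical policy.
# Names and dimensions are assumptions, not the paper's released code.
import torch
import torch.nn as nn


class MLP(nn.Module):
    def __init__(self, in_dim, out_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)


class HierarchicalGoalPolicy(nn.Module):
    """Two-level, goal-conditioned policy: pi_hi picks sub-tasks, pi_lo picks actions."""

    def __init__(self, state_dim, goal_dim, num_options, action_dim):
        super().__init__()
        self.num_options = num_options
        # High level: (state, goal, previous option) -> distribution over next option.
        self.pi_hi = MLP(state_dim + goal_dim + num_options, num_options)
        # Low level: (state, goal, current option) -> continuous action (e.g. MuJoCo control).
        self.pi_lo = MLP(state_dim + goal_dim + num_options, action_dim)

    def forward(self, state, goal, prev_option_onehot):
        option_logits = self.pi_hi(torch.cat([state, goal, prev_option_onehot], dim=-1))
        option = torch.distributions.Categorical(logits=option_logits).sample()
        option_onehot = nn.functional.one_hot(option, self.num_options).float()
        action = self.pi_lo(torch.cat([state, goal, option_onehot], dim=-1))
        return option, action


# Toy rollout step with made-up dimensions.
policy = HierarchicalGoalPolicy(state_dim=10, goal_dim=3, num_options=4, action_dim=4)
state, goal = torch.randn(1, 10), torch.randn(1, 3)
prev_option = nn.functional.one_hot(torch.tensor([0]), 4).float()
option, action = policy(state, goal, prev_option)
```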
Related papers
- Hierarchical Orchestra of Policies [1.6574413179773757]
HOP dynamically forms a hierarchy of policies based on a similarity metric between the current observations and previously encountered observations in successful tasks.
HOP does not require task labelling, allowing for robust adaptation in environments where boundaries between tasks are ambiguous.
Our experiments, conducted across multiple tasks in a procedurally generated suite of environments, demonstrate that HOP significantly outperforms baseline methods in retaining knowledge across tasks.
arXiv Detail & Related papers (2024-11-05T11:13:09Z)
- Universal Visual Decomposer: Long-Horizon Manipulation Made Easy [54.93745986073738]
Real-world robotic tasks stretch over extended horizons and encompass multiple stages.
Prior task decomposition methods require task-specific knowledge, are computationally intensive, and cannot readily be applied to new tasks.
We propose Universal Visual Decomposer (UVD), an off-the-shelf task decomposition method for visual long-horizon manipulation.
We extensively evaluate UVD on both simulation and real-world tasks, and in all cases, UVD substantially outperforms baselines across imitation and reinforcement learning settings.
arXiv Detail & Related papers (2023-10-12T17:59:41Z)
- An Offline Time-aware Apprenticeship Learning Framework for Evolving Reward Functions [19.63724590121946]
Apprenticeship learning (AL) is a process of inducing effective decision-making policies via observing and imitating experts' demonstrations.
Most existing AL approaches are not designed to cope with the evolving reward functions commonly found in human-centric tasks such as healthcare.
We propose an offline Time-aware Hierarchical EM Energy-based Sub-trajectory (THEMES) AL framework to tackle the evolving reward functions in such tasks.
arXiv Detail & Related papers (2023-05-15T23:51:07Z)
- Automaton-Guided Curriculum Generation for Reinforcement Learning Agents [14.20447398253189]
Automaton-guided Curriculum Learning (AGCL) is a novel method for automatically generating curricula for the target task in the form of Directed Acyclic Graphs (DAGs).
AGCL encodes the specification in the form of a deterministic finite automaton (DFA), and then uses the DFA along with the Object-Oriented MDP representation to generate a curriculum as a DAG.
Experiments in gridworld and physics-based simulated robotics domains show that the curricula produced by AGCL achieve improved time-to-threshold performance.
arXiv Detail & Related papers (2023-04-11T15:14:31Z)
- Efficient Learning of High Level Plans from Play [57.29562823883257]
We present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL.
We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks.
arXiv Detail & Related papers (2023-03-16T20:09:47Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z)
- Hierarchical Few-Shot Imitation with Skill Transition Models [66.81252581083199]
Few-shot Imitation with Skill Transition Models (FIST) is an algorithm that extracts skills from offline data and utilizes them to generalize to unseen tasks.
We show that FIST is capable of generalizing to new tasks and substantially outperforms prior baselines in navigation experiments.
arXiv Detail & Related papers (2021-07-19T15:56:01Z)
- Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration; a minimal sketch of such adaptive weighting appears after this list.
arXiv Detail & Related papers (2020-07-23T17:59:57Z)
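The adaptive loss weighting that the ADVISOR summary above refers to can be illustrated with a minimal sketch. The particular weighting rule below (trusting the imitation loss more where an auxiliary imitation-only policy assigns high probability to the expert action) and all function and variable names are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch of adaptively weighting an imitation loss against an RL loss.
import torch
import torch.nn.functional as F


def combined_loss(policy_logits, aux_logits, expert_action, rl_loss):
    """Blend a per-sample imitation loss with a per-sample RL loss.

    policy_logits: main-policy logits for the current states, shape (batch, num_actions)
    aux_logits:    logits of an auxiliary imitation-only policy, shape (batch, num_actions)
    expert_action: expert action indices, shape (batch,)
    rl_loss:       per-sample RL loss term (e.g. a policy-gradient loss), shape (batch,)
    """
    # Imitation (behavior-cloning) loss for the main policy.
    imitation_loss = F.cross_entropy(policy_logits, expert_action, reduction="none")

    # Weight: trust imitation more where the auxiliary imitation policy already
    # assigns high probability to the expert action; fall back to RL elsewhere.
    with torch.no_grad():
        aux_prob = F.softmax(aux_logits, dim=-1).gather(1, expert_action.unsqueeze(1)).squeeze(1)
        w = aux_prob  # in [0, 1]; other weighting schedules are possible

    return (w * imitation_loss + (1.0 - w) * rl_loss).mean()


# Toy usage with random tensors.
batch, num_actions = 8, 5
loss = combined_loss(
    torch.randn(batch, num_actions, requires_grad=True),
    torch.randn(batch, num_actions),
    torch.randint(0, num_actions, (batch,)),
    torch.rand(batch),
)
loss.backward()
```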
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.