GO-DICE: Goal-Conditioned Option-Aware Offline Imitation Learning via
Stationary Distribution Correction Estimation
- URL: http://arxiv.org/abs/2312.10802v1
- Date: Sun, 17 Dec 2023 19:47:49 GMT
- Title: GO-DICE: Goal-Conditioned Option-Aware Offline Imitation Learning via
Stationary Distribution Correction Estimation
- Authors: Abhinav Jain, Vaibhav Unhelkar
- Abstract summary: GO-DICE is an offline IL technique for goal-conditioned long-horizon sequential tasks.
Inspired by the expansive DICE family of techniques, policy learning at both levels takes place within the space of stationary distributions.
Experimental results substantiate that GO-DICE outperforms recent baselines.
- Score: 1.4703485217797363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline imitation learning (IL) refers to learning expert behavior solely
from demonstrations, without any additional interaction with the environment.
Despite significant advances in offline IL, existing techniques find it
challenging to learn policies for long-horizon tasks and require significant
re-training when task specifications change. Towards addressing these
limitations, we present GO-DICE, an offline IL technique for goal-conditioned
long-horizon sequential tasks. GO-DICE discerns a hierarchy of sub-tasks from
demonstrations and uses these to learn separate policies for sub-task
transitions and action execution, respectively; this hierarchical policy
learning facilitates long-horizon reasoning. Inspired by the expansive
DICE family of techniques, policy learning at both levels takes place within
the space of stationary distributions. Further, both policies are learnt with
goal conditioning to minimize the need for retraining when task goals change.
Experimental results substantiate that GO-DICE outperforms recent baselines, as
evidenced by a marked improvement in the completion rate of increasingly
challenging pick-and-place Mujoco robotic tasks. GO-DICE is also capable of
leveraging imperfect demonstrations and partial task segmentation when
available, both of which boost task performance relative to learning from
expert demonstrations alone.
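For intuition, below is a minimal, hypothetical sketch of a goal-conditioned, DICE-style imitation objective in PyTorch (a ValueDICE-like formulation with goal-conditioned inputs). It is not the authors' GO-DICE implementation: GO-DICE is hierarchical (separate sub-task transition and action policies), whereas this sketch collapses everything to a single level, and all names, dimensions, and network choices are illustrative assumptions.

```python
# Hypothetical sketch of a goal-conditioned, ValueDICE-style objective.
# NOT the GO-DICE algorithm: no sub-task hierarchy; illustrative names only.
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim, out_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, *xs):
        return self.net(torch.cat(xs, dim=-1))

obs_dim, act_dim, goal_dim, gamma = 10, 4, 3, 0.99
nu = MLP(obs_dim + act_dim + goal_dim, 1)   # critic nu(s, a, g)
policy = MLP(obs_dim + goal_dim, act_dim)   # deterministic pi(s, g), for brevity
nu_opt = torch.optim.Adam(nu.parameters(), lr=3e-4)
pi_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def dice_objective(s, a, s_next, s0, g):
    """ValueDICE-style saddle-point objective on expert transitions:
    log E[exp(nu(s,a,g) - gamma * nu(s', pi(s',g), g))]
      - (1 - gamma) * E[nu(s0, pi(s0,g), g)].
    The critic ascends this quantity; the policy descends it."""
    residual = (nu(s, a, g) - gamma * nu(s_next, policy(s_next, g), g)).squeeze(-1)
    log_term = torch.logsumexp(residual, dim=0) - torch.log(
        torch.tensor(float(residual.numel()))
    )
    linear_term = (1.0 - gamma) * nu(s0, policy(s0, g), g).mean()
    return log_term - linear_term

# One alternating update on a synthetic expert batch (random placeholders).
N = 64
s, a = torch.randn(N, obs_dim), torch.randn(N, act_dim)
s_next, s0 = torch.randn(N, obs_dim), torch.randn(N, obs_dim)
g = torch.randn(N, goal_dim)

nu_opt.zero_grad()
(-dice_objective(s, a, s_next, s0, g)).backward()  # maximize w.r.t. the critic
nu_opt.step()

pi_opt.zero_grad()
dice_objective(s, a, s_next, s0, g).backward()     # minimize w.r.t. the policy
pi_opt.step()
```

Per the abstract, GO-DICE additionally learns a higher-level policy over sub-task transitions with an analogous distribution-matching objective, and conditions both levels on the task goal so that changing goals does not require retraining from scratch.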
Related papers
- Exploiting Hybrid Policy in Reinforcement Learning for Interpretable Temporal Logic Manipulation [12.243491328213217]
Reinforcement Learning (RL) based methods have been increasingly explored for robot learning.
We propose a Temporal-Logic-guided Hybrid policy framework (HyTL) which leverages three-level decision layers to improve the agent's performance.
We evaluate HyTL on four challenging manipulation tasks, which demonstrate its effectiveness and interpretability.
arXiv Detail & Related papers (2024-12-29T03:34:53Z) - Adaptformer: Sequence models as adaptive iterative planners [0.0]
Decision-making in multi-task missions is a challenging problem for autonomous systems.
We propose Adaptformer, an adaptive planner that utilizes sequence models for sample-efficient exploration and exploitation.
We show that Adaptformer outperforms the state-of-the-art method by up to 25% in multi-goal maze reachability tasks.
arXiv Detail & Related papers (2024-11-30T00:34:41Z) - Hierarchical Orchestra of Policies [1.6574413179773757]
HOP dynamically forms a hierarchy of policies based on a similarity metric between the current observations and previously encountered observations in successful tasks.
HOP does not require task labelling, allowing for robust adaptation in environments where boundaries between tasks are ambiguous.
Our experiments, conducted across multiple tasks in a procedurally generated suite of environments, demonstrate that HOP significantly outperforms baseline methods in retaining knowledge across tasks.
arXiv Detail & Related papers (2024-11-05T11:13:09Z) - Universal Visual Decomposer: Long-Horizon Manipulation Made Easy [54.93745986073738]
Real-world robotic tasks stretch over extended horizons and encompass multiple stages.
Prior task decomposition methods require task-specific knowledge, are computationally intensive, and cannot readily be applied to new tasks.
We propose Universal Visual Decomposer (UVD), an off-the-shelf task decomposition method for visual long horizon manipulation.
We extensively evaluate UVD on both simulation and real-world tasks, and in all cases, UVD substantially outperforms baselines across imitation and reinforcement learning settings.
arXiv Detail & Related papers (2023-10-12T17:59:41Z) - Efficient Learning of High Level Plans from Play [57.29562823883257]
We present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL.
We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks.
arXiv Detail & Related papers (2023-03-16T20:09:47Z) - Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in
Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z) - Skill-based Meta-Reinforcement Learning [65.31995608339962]
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z) - Hierarchical Few-Shot Imitation with Skill Transition Models [66.81252581083199]
Few-shot Imitation with Skill Transition Models (FIST) is an algorithm that extracts skills from offline data and utilizes them to generalize to unseen tasks.
We show that FIST is capable of generalizing to new tasks and substantially outperforms prior baselines in navigation experiments.
arXiv Detail & Related papers (2021-07-19T15:56:01Z) - Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
arXiv Detail & Related papers (2020-07-23T17:59:57Z)