LS3: Latent Space Safe Sets for Long-Horizon Visuomotor Control of
Iterative Tasks
- URL: http://arxiv.org/abs/2107.04775v1
- Date: Sat, 10 Jul 2021 06:46:10 GMT
- Title: LS3: Latent Space Safe Sets for Long-Horizon Visuomotor Control of
Iterative Tasks
- Authors: Albert Wilcox and Ashwin Balakrishna and Brijen Thananjeyan and Joseph
E. Gonzalez and Ken Goldberg
- Abstract summary: Reinforcement learning algorithms have shown impressive success in exploring high-dimensional environments to learn complex, long-horizon tasks.
A promising strategy for safe learning in dynamically uncertain environments is requiring that the agent can robustly return to states where task success can be guaranteed.
We present Latent Space Safe Sets (LS3), which extends this strategy to iterative, long-horizon tasks with image observations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) algorithms have shown impressive success in
exploring high-dimensional environments to learn complex, long-horizon tasks,
but can often exhibit unsafe behaviors and require extensive environment
interaction when exploration is unconstrained. A promising strategy for safe
learning in dynamically uncertain environments is requiring that the agent can
robustly return to states where task success (and therefore safety) can be
guaranteed. While this approach has been successful in low-dimensional settings,
enforcing this constraint in environments with high-dimensional state spaces,
such as images, is challenging. We present Latent Space Safe Sets (LS3), which
extends this strategy to iterative, long-horizon tasks with image observations
by using suboptimal demonstrations and a learned dynamics model to restrict
exploration to the neighborhood of a learned Safe Set where task completion is
likely. We evaluate LS3 on 4 domains, including a challenging sequential
pushing task in simulation and a physical cable routing task. We find that LS3
can use prior task successes to restrict exploration and learn more efficiently
than prior algorithms while satisfying constraints. See
https://tinyurl.com/latent-ss for code and supplementary material.
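As a concrete illustration of the constraint described above, here is a minimal sketch of an LS3-style planning step. It is not the authors' implementation: `encoder`, `dynamics`, `safe_set`, and `value` are hypothetical stand-ins for the components the paper learns from suboptimal demonstrations, and a simple random-shooting MPC planner is assumed.

```python
# Hedged sketch of an LS3-style planning step (not the authors' code).
# Assumes: encoder embeds image observations into a latent space,
# dynamics is a learned latent transition model, safe_set is a learned
# classifier over latent states, and value scores latent states.
import numpy as np

def plan_action(obs, encoder, dynamics, safe_set, value,
                n_samples=512, horizon=5, action_dim=2):
    """Random-shooting MPC: sample action sequences, roll them out in
    latent space, and keep only those ending in the learned Safe Set."""
    z0 = encoder(obs)
    best_action, best_score = None, -np.inf
    for _ in range(n_samples):
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, action_dim))
        z = z0
        for a in actions:
            z = dynamics(z, a)  # roll out the learned latent dynamics
        # Restrict exploration: the terminal latent state must lie in
        # the learned Safe Set, where task completion is likely.
        if safe_set(z) and value(z) > best_score:
            best_score, best_action = value(z), actions[0]
    return best_action  # execute, observe, and replan at each step
```

Executing only the first action and replanning from each new observation is the usual MPC pattern a constraint of this kind relies on.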
Related papers
- Spatial Reasoning and Planning for Deep Embodied Agents
This thesis explores the development of data-driven techniques for spatial reasoning and planning tasks.
It focuses on enhancing learning efficiency, interpretability, and transferability across novel scenarios.
arXiv Detail & Related papers (2024-09-28T23:05:56Z)
- Safe Guaranteed Exploration for Non-linear Systems
We propose a novel safe guaranteed exploration framework using optimal control, which achieves first-of-its-kind results.
Based on this framework we propose an efficient algorithm, SageMPC, SAfe Guaranteed Exploration using Model Predictive Control.
We demonstrate safe efficient exploration in challenging unknown environments using SageMPC with a car model.
arXiv Detail & Related papers (2024-02-09T17:26:26Z)
- Mission-driven Exploration for Accelerated Deep Reinforcement Learning with Temporal Logic Task Specifications
We consider robots with unknown dynamics operating in environments with unknown structure.
Our goal is to synthesize a control policy that maximizes the probability of satisfying an automaton-encoded task.
We propose a novel DRL algorithm that learns control policies notably faster than similar methods.
arXiv Detail & Related papers (2023-11-28T18:59:58Z)
- Generalizable Long-Horizon Manipulations with Large Language Models
This work introduces a framework harnessing the capabilities of Large Language Models (LLMs) to generate primitive task conditions for generalizable long-horizon manipulations.
We create a challenging robotic manipulation task suite based on PyBullet for long-horizon task evaluation.
arXiv Detail & Related papers (2023-10-03T17:59:46Z)
- Latent Exploration for Reinforcement Learning
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network (a minimal sketch appears after this list).
arXiv Detail & Related papers (2023-05-31T17:40:43Z)
- Skill-based Meta-Reinforcement Learning
We devise a method that enables meta-learning on long-horizon, sparse-reward tasks.
Our core idea is to leverage prior experience extracted from offline datasets during meta-learning.
arXiv Detail & Related papers (2022-04-25T17:58:19Z)
- Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning
We introduce Successor Feature Landmarks (SFL), a framework for exploring large, high-dimensional environments.
SFL drives exploration by estimating state-novelty and enables high-level planning by abstracting the state-space as a non-parametric landmark-based graph.
We show in our experiments on MiniGrid and ViZDoom that SFL enables efficient exploration of large, high-dimensional state spaces.
arXiv Detail & Related papers (2021-11-18T18:36:05Z)
- Discovering and Exploiting Sparse Rewards in a Learned Behavior Space
Learning optimal policies in sparse-reward settings is difficult, as the learning agent has little to no feedback on the quality of its actions.
We introduce STAX, an algorithm designed to learn a behavior space on-the-fly and to explore it while efficiently optimizing any reward discovered.
arXiv Detail & Related papers (2021-11-02T22:21:11Z)
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning
Batch Exploration with Examples (BEE) explores relevant regions of the state space, guided by a modest number of human-provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
- Broadly-Exploring, Local-Policy Trees for Long-Horizon Task Planning
Long-horizon planning in realistic environments requires the ability to reason over sequential tasks in high-dimensional state spaces.
We present Broadly-Exploring-Local-policy Trees (BELT), a task-conditioned, model-based tree search.
BELT is demonstrated experimentally to plan long-horizon, sequential tasks with a goal-conditioned policy and to generate robust plans.
arXiv Detail & Related papers (2020-10-13T15:51:24Z)
- Weakly-Supervised Reinforcement Learning for Controllable Behavior
Reinforcement learning (RL) is a powerful framework for learning to take actions to solve tasks.
In many settings, an agent must winnow down the inconceivably large space of all possible tasks to the single task that it is currently being asked to solve.
We introduce a framework for using weak supervision to automatically disentangle this semantically meaningful subspace of tasks from the enormous space of nonsensical "chaff" tasks.
arXiv Detail & Related papers (2020-04-06T17:50:28Z)
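The sketch promised in the Latent Exploration entry above: a minimal illustration of temporally-correlated noise injected into a policy's latent state. The two-layer policy and Ornstein-Uhlenbeck-style noise process are assumptions made for illustration, not the paper's exact perturbation scheme.

```python
# Hedged sketch in the spirit of Lattice: persist an Ornstein-Uhlenbeck
# noise process and add it to the policy's hidden (latent) activations,
# so exploration noise is correlated across consecutive time steps.
import numpy as np

class NoisyLatentPolicy:
    def __init__(self, obs_dim, hidden_dim, action_dim,
                 theta=0.15, sigma=0.2, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(size=(hidden_dim, obs_dim)) * 0.1
        self.w_out = rng.normal(size=(action_dim, hidden_dim)) * 0.1
        self.theta, self.sigma = theta, sigma  # OU mean-reversion / scale
        self.noise = np.zeros(hidden_dim)      # persistent latent noise

    def act(self, obs):
        # OU update: the noise decays toward zero but carries over
        # between steps, giving temporally-correlated perturbations.
        self.noise += (-self.theta * self.noise
                       + self.sigma * np.random.randn(len(self.noise)))
        h = np.tanh(self.w_in @ obs) + self.noise  # perturbed latent state
        return np.tanh(self.w_out @ h)             # action in [-1, 1]
```

Calling `NoisyLatentPolicy(obs_dim=4, hidden_dim=32, action_dim=2).act(obs)` at successive steps yields smoothly varying exploratory actions, in contrast to independent per-step action noise.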