Safe Deep RL in 3D Environments using Human Feedback
- URL: http://arxiv.org/abs/2201.08102v2
- Date: Fri, 21 Jan 2022 16:10:14 GMT
- Title: Safe Deep RL in 3D Environments using Human Feedback
- Authors: Matthew Rahtz, Vikrant Varma, Ramana Kumar, Zachary Kenton, Shane
Legg, Jan Leike
- Abstract summary: ReQueST aims to solve this problem by learning a neural simulator of the environment from safe human trajectories.
It is yet unknown whether this approach is feasible in complex 3D environments with feedback obtained from real humans.
We show that the resulting agent exhibits an order of magnitude reduction in unsafe behaviour compared to standard reinforcement learning.
- Score: 15.038298345682556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Agents should avoid unsafe behaviour during both training and deployment.
This typically requires a simulator and a procedural specification of unsafe
behaviour. Unfortunately, a simulator is not always available, and procedurally
specifying constraints can be difficult or impossible for many real-world
tasks. A recently introduced technique, ReQueST, aims to solve this problem by
learning a neural simulator of the environment from safe human trajectories,
then using the learned simulator to efficiently learn a reward model from human
feedback. However, it is yet unknown whether this approach is feasible in
complex 3D environments with feedback obtained from real humans - whether
sufficient pixel-based neural simulator quality can be achieved, and whether
the human data requirements are viable in terms of both quantity and quality.
In this paper we answer this question in the affirmative, using ReQueST to
train an agent to perform a 3D first-person object collection task using data
entirely from human contractors. We show that the resulting agent exhibits an
order of magnitude reduction in unsafe behaviour compared to standard
reinforcement learning.
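As a rough illustration of the two-stage pipeline described in the abstract, the sketch below shows a minimal, hypothetical ReQueST-style setup in PyTorch: a neural simulator is first fit to safe human trajectories by next-state prediction, then a reward model is fit to feedback labels on states rolled out in that simulator. This is not the authors' implementation; the state and action dimensions, MLP architectures, and scalar feedback labels are placeholder assumptions (the paper's simulator is pixel-based and feedback comes from human contractors), and training the agent against the learned reward model is omitted.

```python
# Hypothetical sketch of a ReQueST-style pipeline (not the paper's code):
# 1) fit a neural dynamics model ("simulator") on safe human trajectories,
# 2) fit a reward model from feedback labels on simulator rollouts.
# All sizes and data below are illustrative placeholders.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HIDDEN = 32, 4, 128  # placeholder sizes, not from the paper

class NeuralSimulator(nn.Module):
    """Predicts the next observation from the current observation and action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, OBS_DIM),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

class RewardModel(nn.Module):
    """Scores observations; trained from human feedback labels."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, HIDDEN), nn.ReLU(), nn.Linear(HIDDEN, 1),
        )

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

def fit_simulator(sim, safe_trajectories, epochs=10):
    """Step 1: supervised next-state prediction on safe human trajectories."""
    opt = torch.optim.Adam(sim.parameters(), lr=1e-3)
    for _ in range(epochs):
        for obs, act, next_obs in safe_trajectories:
            loss = nn.functional.mse_loss(sim(obs, act), next_obs)
            opt.zero_grad(); loss.backward(); opt.step()

def fit_reward_model(reward, labelled_states, epochs=10):
    """Step 2: regress feedback labels on states rolled out in the learned
    simulator (scalar labels stand in for real human feedback here)."""
    opt = torch.optim.Adam(reward.parameters(), lr=1e-3)
    for _ in range(epochs):
        for obs, label in labelled_states:
            loss = nn.functional.mse_loss(reward(obs), label)
            opt.zero_grad(); loss.backward(); opt.step()

if __name__ == "__main__":
    # Toy random data standing in for human trajectories and feedback.
    traj = [(torch.randn(64, OBS_DIM), torch.randn(64, ACT_DIM),
             torch.randn(64, OBS_DIM)) for _ in range(5)]
    feedback = [(torch.randn(64, OBS_DIM), torch.randn(64)) for _ in range(5)]
    sim, rew = NeuralSimulator(), RewardModel()
    fit_simulator(sim, traj)
    fit_reward_model(rew, feedback)
```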
Related papers
- ReGentS: Real-World Safety-Critical Driving Scenario Generation Made Stable [88.08120417169971]
Machine learning based autonomous driving systems often face challenges with safety-critical scenarios that are rare in real-world data.
This work explores generating safety-critical driving scenarios by modifying complex real-world regular scenarios through trajectory optimization.
Our approach addresses unrealistic diverging trajectories and unavoidable collision scenarios that are not useful for training a robust planner.
arXiv Detail & Related papers (2024-09-12T08:26:33Z)
- OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering [55.50438181721271]
The previous method, which utilizes NeRF-based surface rendering to recover occluded areas, requires more than one day to train and several seconds to render the occluded areas.
We propose OccGaussian based on 3D Gaussian Splatting, which can be trained within 6 minutes and produces high-quality human renderings up to 160 FPS with occluded input.
arXiv Detail & Related papers (2024-04-12T13:00:06Z)
- Reducing Training Demands for 3D Gait Recognition with Deep Koopman Operator Constraints [8.382355998881879]
We introduce a new Linear Dynamical Systems (LDS) module and loss based on Koopman operator theory, which provides an unsupervised motion regularization for the periodic nature of gait.
We also show that our 3D modeling approach is much better than other 3D gait approaches in overcoming viewpoint variation under normal, bag-carrying and clothing change conditions.
arXiv Detail & Related papers (2023-08-14T21:39:33Z)
- Off Environment Evaluation Using Convex Risk Minimization [0.0]
We propose a convex risk minimization algorithm to estimate the model mismatch between the simulator and the target domain.
We show that this estimator can be used along with the simulator to evaluate the performance of an RL agent in the target domain.
arXiv Detail & Related papers (2021-12-21T21:31:54Z)
- Pre-training of Deep RL Agents for Improved Learning under Domain Randomization [63.09932240840656]
We show how to pre-train a perception encoder that already provides an embedding invariant to the randomization.
We demonstrate this yields consistently improved results on a randomized version of DeepMind control suite tasks and a stacking environment on arbitrary backgrounds with zero-shot transfer to a physical robot.
arXiv Detail & Related papers (2021-04-29T14:54:11Z)
- Learning What To Do by Simulating the Past [76.86449554580291]
We show that by combining a learned feature encoder with learned inverse models, we can enable agents to simulate human actions backwards in time to infer what they must have done.
The resulting algorithm is able to reproduce a specific skill in MuJoCo environments given a single state sampled from the optimal policy for that skill.
arXiv Detail & Related papers (2021-04-08T17:43:29Z)
- Reactive Long Horizon Task Execution via Visual Skill and Precondition Models [59.76233967614774]
We describe an approach for sim-to-real training that can accomplish unseen robotic tasks using models learned in simulation to ground components of a simple task planner.
We show an increase in success rate from 91.6% to 98% in simulation and from 10% to 80% in the real world, compared with naive baselines.
arXiv Detail & Related papers (2020-11-17T15:24:01Z)
- Robust Reinforcement Learning-based Autonomous Driving Agent for Simulation and Real World [0.0]
We present a DRL-based algorithm capable of performing autonomous robot control using Deep Q-Networks (DQN).
In our approach, the agent is trained in a simulated environment and is able to navigate in both simulated and real-world environments.
The trained agent is able to run on limited hardware resources and its performance is comparable to state-of-the-art approaches.
arXiv Detail & Related papers (2020-09-23T15:23:54Z)
- Cascaded deep monocular 3D human pose estimation with evolutionary training data [76.3478675752847]
Deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation.
This paper proposes a novel data augmentation method that is scalable to massive amounts of training data.
Our method synthesizes unseen 3D human skeletons based on a hierarchical human representation and heuristics inspired by prior knowledge.
arXiv Detail & Related papers (2020-06-14T03:09:52Z)
- Exploring the Capabilities and Limits of 3D Monocular Object Detection -- A Study on Simulation and Real World Data [0.0]
3D object detection based on monocular camera data is a key enabler for autonomous driving.
Recent deep learning methods show promising results for recovering depth information from single images.
In this paper, we evaluate the performance of a 3D object detection pipeline which is parameterizable with different depth estimation configurations.
arXiv Detail & Related papers (2020-05-15T09:05:17Z)