Safe Deep RL in 3D Environments using Human Feedback
- URL: http://arxiv.org/abs/2201.08102v2
- Date: Fri, 21 Jan 2022 16:10:14 GMT
- Title: Safe Deep RL in 3D Environments using Human Feedback
- Authors: Matthew Rahtz, Vikrant Varma, Ramana Kumar, Zachary Kenton, Shane
Legg, Jan Leike
- Abstract summary: ReQueST aims to solve this problem by learning a neural simulator of the environment from safe human trajectories.
It is yet unknown whether this approach is feasible in complex 3D environments with feedback obtained from real humans.
We show that the resulting agent exhibits an order of magnitude reduction in unsafe behaviour compared to standard reinforcement learning.
- Score: 15.038298345682556
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Agents should avoid unsafe behaviour during both training and deployment.
This typically requires a simulator and a procedural specification of unsafe
behaviour. Unfortunately, a simulator is not always available, and procedurally
specifying constraints can be difficult or impossible for many real-world
tasks. A recently introduced technique, ReQueST, aims to solve this problem by
learning a neural simulator of the environment from safe human trajectories,
then using the learned simulator to efficiently learn a reward model from human
feedback. However, it is yet unknown whether this approach is feasible in
complex 3D environments with feedback obtained from real humans - whether
sufficient pixel-based neural simulator quality can be achieved, and whether
the human data requirements are viable in terms of both quantity and quality.
In this paper we answer this question in the affirmative, using ReQueST to
train an agent to perform a 3D first-person object collection task using data
entirely from human contractors. We show that the resulting agent exhibits an
order of magnitude reduction in unsafe behaviour compared to standard
reinforcement learning.
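As a rough illustration of the two-stage pipeline described in the abstract, the sketch below shows a minimal, hypothetical ReQueST-style setup in PyTorch: a neural simulator is first fit to safe human trajectories by next-state prediction, then a reward model is fit to feedback labels on states rolled out in that simulator. This is not the authors' implementation; the state and action dimensions, MLP architectures, and scalar feedback labels are placeholder assumptions (the paper's simulator is pixel-based and feedback comes from human contractors), and training the agent against the learned reward model is omitted.

```python
# Hypothetical sketch of a ReQueST-style pipeline (not the paper's code):
# 1) fit a neural dynamics model ("simulator") on safe human trajectories,
# 2) fit a reward model from feedback labels on simulator rollouts.
# All sizes and data below are illustrative placeholders.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HIDDEN = 32, 4, 128  # placeholder sizes, not from the paper

class NeuralSimulator(nn.Module):
    """Predicts the next observation from the current observation and action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, OBS_DIM),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

class RewardModel(nn.Module):
    """Scores observations; trained from human feedback labels."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, HIDDEN), nn.ReLU(), nn.Linear(HIDDEN, 1),
        )

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

def fit_simulator(sim, safe_trajectories, epochs=10):
    """Step 1: supervised next-state prediction on safe human trajectories."""
    opt = torch.optim.Adam(sim.parameters(), lr=1e-3)
    for _ in range(epochs):
        for obs, act, next_obs in safe_trajectories:
            loss = nn.functional.mse_loss(sim(obs, act), next_obs)
            opt.zero_grad(); loss.backward(); opt.step()

def fit_reward_model(reward, labelled_states, epochs=10):
    """Step 2: regress feedback labels on states rolled out in the learned
    simulator (scalar labels stand in for real human feedback here)."""
    opt = torch.optim.Adam(reward.parameters(), lr=1e-3)
    for _ in range(epochs):
        for obs, label in labelled_states:
            loss = nn.functional.mse_loss(reward(obs), label)
            opt.zero_grad(); loss.backward(); opt.step()

if __name__ == "__main__":
    # Toy random data standing in for human trajectories and feedback.
    traj = [(torch.randn(64, OBS_DIM), torch.randn(64, ACT_DIM),
             torch.randn(64, OBS_DIM)) for _ in range(5)]
    feedback = [(torch.randn(64, OBS_DIM), torch.randn(64)) for _ in range(5)]
    sim, rew = NeuralSimulator(), RewardModel()
    fit_simulator(sim, traj)
    fit_reward_model(rew, feedback)
```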
Related papers
- ReGentS: Real-World Safety-Critical Driving Scenario Generation Made Stable [88.08120417169971]
Machine learning based autonomous driving systems often face challenges with safety-critical scenarios that are rare in real-world data.
This work explores generating safety-critical driving scenarios by modifying complex real-world regular scenarios through trajectory optimization.
Our approach addresses unrealistic diverging trajectories and unavoidable collision scenarios that are not useful for training a robust planner.
arXiv Detail & Related papers (2024-09-12T08:26:33Z)
- OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering [55.50438181721271]
The previous method, which utilizes NeRF-based surface rendering to recover occluded areas, requires more than one day to train and several seconds to render the occluded areas.
We propose OccGaussian based on 3D Gaussian Splatting, which can be trained within 6 minutes and produces high-quality human renderings up to 160 FPS with occluded input.
arXiv Detail & Related papers (2024-04-12T13:00:06Z)
- Reducing Training Demands for 3D Gait Recognition with Deep Koopman Operator Constraints [8.382355998881879]
We introduce a new Linear Dynamical Systems (LDS) module and loss based on Koopman operator theory, which provides an unsupervised motion regularization for the periodic nature of gait.
We also show that our 3D modeling approach is much better than other 3D gait approaches in overcoming viewpoint variation under normal, bag-carrying and clothing change conditions.
arXiv Detail & Related papers (2023-08-14T21:39:33Z)
- Off Environment Evaluation Using Convex Risk Minimization [0.0]
We propose a convex risk minimization algorithm to estimate the model mismatch between the simulator and the target domain.
We show that this estimator can be used along with the simulator to evaluate the performance of an RL agent in the target domain.
arXiv Detail & Related papers (2021-12-21T21:31:54Z)
- Pre-training of Deep RL Agents for Improved Learning under Domain Randomization [63.09932240840656]
We show how to pre-train a perception encoder that already provides an embedding invariant to the randomization.
We demonstrate this yields consistently improved results on a randomized version of DeepMind control suite tasks and a stacking environment on arbitrary backgrounds with zero-shot transfer to a physical robot.
arXiv Detail & Related papers (2021-04-29T14:54:11Z)
- Learning What To Do by Simulating the Past [76.86449554580291]
We show that by combining a learned feature encoder with learned inverse models, we can enable agents to simulate human actions backwards in time to infer what they must have done.
The resulting algorithm is able to reproduce a specific skill in MuJoCo environments given a single state sampled from the optimal policy for that skill.
arXiv Detail & Related papers (2021-04-08T17:43:29Z)
- Reactive Long Horizon Task Execution via Visual Skill and Precondition Models [59.76233967614774]
We describe an approach for sim-to-real training that can accomplish unseen robotic tasks using models learned in simulation to ground components of a simple task planner.
We show an increase in success rate from 91.6% to 98% in simulation and from 10% to 80% in the real world, compared with naive baselines.
arXiv Detail & Related papers (2020-11-17T15:24:01Z)
- Robust Reinforcement Learning-based Autonomous Driving Agent for Simulation and Real World [0.0]
We present a DRL-based algorithm capable of performing autonomous robot control using Deep Q-Networks (DQN).
In our approach, the agent is trained in a simulated environment and is able to navigate in both simulated and real-world environments.
The trained agent is able to run on limited hardware resources and its performance is comparable to state-of-the-art approaches.
arXiv Detail & Related papers (2020-09-23T15:23:54Z)
- Cascaded deep monocular 3D human pose estimation with evolutionary training data [76.3478675752847]
Deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation.
This paper proposes a novel data augmentation method that is scalable to massive amounts of training data.
Our method synthesizes unseen 3D human skeletons based on a hierarchical human representation and heuristics inspired by prior knowledge.
arXiv Detail & Related papers (2020-06-14T03:09:52Z)
- Exploring the Capabilities and Limits of 3D Monocular Object Detection -- A Study on Simulation and Real World Data [0.0]
3D object detection based on monocular camera data is a key enabler for autonomous driving.
Recent deep learning methods show promising results for recovering depth information from single images.
In this paper, we evaluate the performance of a 3D object detection pipeline which is parameterizable with different depth estimation configurations.
arXiv Detail & Related papers (2020-05-15T09:05:17Z)