Learning What To Do by Simulating the Past
- URL: http://arxiv.org/abs/2104.03946v1
- Date: Thu, 8 Apr 2021 17:43:29 GMT
- Title: Learning What To Do by Simulating the Past
- Authors: David Lindner, Rohin Shah, Pieter Abbeel, Anca Dragan
- Abstract summary: We show that by combining a learned feature encoder with learned inverse models, we can enable agents to simulate human actions backwards in time to infer what they must have done.
The resulting algorithm is able to reproduce a specific skill in MuJoCo environments given a single state sampled from the optimal policy for that skill.
- Score: 76.86449554580291
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since reward functions are hard to specify, recent work has focused on
learning policies from human feedback. However, such approaches are impeded by
the expense of acquiring such feedback. Recent work proposed that agents have
access to a source of information that is effectively free: in any environment
that humans have acted in, the state will already be optimized for human
preferences, and thus an agent can extract information about what humans want
from the state. Such learning is possible in principle, but requires simulating
all possible past trajectories that could have led to the observed state. This
is feasible in gridworlds, but how do we scale it to complex tasks? In this
work, we show that by combining a learned feature encoder with learned inverse
models, we can enable agents to simulate human actions backwards in time to
infer what they must have done. The resulting algorithm is able to reproduce a
specific skill in MuJoCo environments given a single state sampled from the
optimal policy for that skill.
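The backward-simulation idea in the abstract can be illustrated compactly. Below is a minimal sketch, assuming (as in the RLSP line of work) a reward that is linear in learned features, r(s) = w·φ(s): learned inverse models propose a plausible past for the observed state, and the reward weights move toward the features of that simulated past and away from the features of the current policy's rollouts. All module names, shapes, and the toy data here are illustrative assumptions, not the paper's code.

```python
# Hypothetical sketch of backward simulation for reward inference.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, FEAT_DIM, HORIZON = 8, 2, 4, 10

encoder = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.Tanh(), nn.Linear(32, FEAT_DIM))
inverse_policy = nn.Sequential(   # a_t given s_{t+1}: which action likely led here?
    nn.Linear(STATE_DIM, 32), nn.Tanh(), nn.Linear(32, ACTION_DIM))
inverse_dynamics = nn.Sequential( # s_t given (s_{t+1}, a_t): which state was it taken from?
    nn.Linear(STATE_DIM + ACTION_DIM, 32), nn.Tanh(), nn.Linear(32, STATE_DIM))

def simulate_backward(s_T: torch.Tensor, horizon: int) -> torch.Tensor:
    """Walk backwards from an observed state, inferring predecessor states."""
    states, s = [s_T], s_T
    for _ in range(horizon):
        a = inverse_policy(s)                    # action that likely led to s
        s = inverse_dynamics(torch.cat([s, a]))  # state it was likely taken from
        states.append(s)
    return torch.stack(states)

def rlsp_weight_update(w, s_T, forward_rollout, lr=0.1):
    """Move w toward features seen in the simulated past and away from
    features the current policy produces (in the spirit of the RLSP gradient)."""
    with torch.no_grad():
        past_feats = encoder(simulate_backward(s_T, HORIZON)).mean(0)
        policy_feats = encoder(forward_rollout).mean(0)
    return w + lr * (past_feats - policy_feats)

w = torch.zeros(FEAT_DIM)
s_T = torch.randn(STATE_DIM)               # stand-in for a state sampled from the optimal policy
rollout = torch.randn(HORIZON, STATE_DIM)  # stand-in for current-policy rollout states
w = rlsp_weight_update(w, s_T, rollout)
print("updated reward weights:", w)
```

In the full algorithm the encoder, inverse models, and policy are trained in an alternating loop; the sketch shows only a single reward-weight update.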
Related papers
- TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction [25.36756787147331]
Learning in simulation and transferring the learned policy to the real world has the potential to enable generalist robots.
We propose a data-driven approach to enable successful sim-to-real transfer based on a human-in-the-loop framework.
We show that our approach can achieve successful sim-to-real transfer in complex and contact-rich manipulation tasks such as furniture assembly.
arXiv Detail & Related papers (2024-05-16T17:59:07Z)
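One plausible reading of the human-in-the-loop framework in the TRANSIC entry above is a residual policy fit to logged human corrections and added to the simulation-trained base policy. The sketch below is an assumption-level illustration with invented interface names, not the TRANSIC implementation.

```python
# Hypothetical residual-correction scheme for sim-to-real transfer.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 16, 4
base_policy = nn.Linear(STATE_DIM, ACTION_DIM)  # frozen, trained in simulation
residual = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM))
opt = torch.optim.Adam(residual.parameters(), lr=1e-3)

def train_on_corrections(states, corrected_actions, epochs=100):
    """Fit the residual so base + residual reproduces the human's corrections."""
    for _ in range(epochs):
        with torch.no_grad():
            base_a = base_policy(states)
        loss = ((base_a + residual(states) - corrected_actions) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

def act(state):
    """Deploy-time action: simulation policy plus learned real-world correction."""
    with torch.no_grad():
        return base_policy(state) + residual(state)

# Toy data standing in for logged human interventions on the real robot.
states = torch.randn(32, STATE_DIM)
corrections = torch.randn(32, ACTION_DIM)
train_on_corrections(states, corrections)
```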
- Dexterous Functional Grasping [39.15442658671798]
This paper combines the best of both worlds to accomplish functional grasping for in-the-wild objects.
We propose a novel application of eigengrasps to reduce the search space of RL using a small amount of human data.
We find that the eigengrasp action space beats baselines in simulation, outperforms hardcoded grasping in the real world, and matches or exceeds a trained human teleoperator.
arXiv Detail & Related papers (2023-12-05T18:59:23Z)
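The eigengrasp idea above can be illustrated with plain PCA: principal components of recorded human grasp configurations become a low-dimensional action basis for RL. A hedged sketch with synthetic stand-in data (the paper's actual construction may differ):

```python
# Illustrative eigengrasp construction via PCA (SVD) over human grasp data.
import numpy as np

N_JOINTS, N_EIG = 16, 3
human_grasps = np.random.randn(200, N_JOINTS)  # stand-in for recorded grasp poses

mean = human_grasps.mean(axis=0)
_, _, vt = np.linalg.svd(human_grasps - mean, full_matrices=False)
eigengrasps = vt[:N_EIG]                       # top principal "grasp directions"

def to_joint_angles(action: np.ndarray) -> np.ndarray:
    """Map a low-dimensional RL action to a full hand configuration."""
    return mean + action @ eigengrasps

action = np.random.uniform(-1, 1, N_EIG)       # what the RL policy would output
print(to_joint_angles(action).shape)           # (16,) joint targets
```

The payoff is that the RL search space shrinks from 16 joint dimensions to 3 coefficients while staying within human-like grasps.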
- Reinforcement Learning from Passive Data via Latent Intentions [86.4969514480008]
We show that passive data can still be used to learn features that accelerate downstream RL.
Our approach learns from passive data by modeling intentions.
Our experiments demonstrate the ability to learn from many forms of passive data, including cross-embodiment video data and YouTube videos.
arXiv Detail & Related papers (2023-04-10T17:59:05Z)
- Guiding Pretraining in Reinforcement Learning with Large Language Models [133.32146904055233]
We describe a method that uses background knowledge from text corpora to shape exploration.
This method, called ELLM, rewards an agent for achieving goals suggested by a language model.
By leveraging large-scale language model pretraining, ELLM guides agents toward human-meaningful and plausibly useful behaviors without requiring a human in the loop.
arXiv Detail & Related papers (2023-02-13T21:16:03Z)
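A rough sketch of the ELLM-style reward described above: the agent is rewarded when a caption of what it just did is close to a goal suggested by a language model. The captioner, goal suggester, and embedding below are placeholders, not the paper's components.

```python
# Hypothetical LM-suggested-goal reward via embedding similarity.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in sentence embedding; a real system would use a trained encoder."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

def suggest_goals(context: str) -> list[str]:
    """Placeholder for querying a language model with the agent's context."""
    return ["chop a tree", "collect wood", "build a shelter"]

def ellm_reward(transition_caption: str, goals: list[str], threshold=0.0) -> float:
    """Reward = best cosine similarity between the caption and any suggested goal."""
    c = embed(transition_caption)
    sim = max(float(embed(g) @ c) for g in goals)
    return sim if sim > threshold else 0.0

print(ellm_reward("the agent chops a tree", suggest_goals("in a forest")))
```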
- DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality [64.51295032956118]
We train a policy that can perform robust dexterous manipulation on an anthropomorphic robot hand.
Our work reaffirms the possibilities of sim-to-real transfer for dexterous manipulation across diverse hardware and simulator setups.
arXiv Detail & Related papers (2022-10-25T01:51:36Z)
- Human-to-Robot Imitation in the Wild [50.49660984318492]
We propose an efficient one-shot robot learning algorithm, centered around learning from a third-person perspective.
We show one-shot generalization and success in real-world settings, including 20 different manipulation tasks in the wild.
arXiv Detail & Related papers (2022-07-19T17:59:59Z)
- Safe Deep RL in 3D Environments using Human Feedback [15.038298345682556]
ReQueST aims to solve this problem by learning a neural simulator of the environment from safe human trajectories.
It was previously unknown whether this approach is feasible in complex 3D environments with feedback obtained from real humans.
We show that the resulting agent exhibits an order of magnitude reduction in unsafe behaviour compared to standard reinforcement learning.
arXiv Detail & Related papers (2022-01-20T10:26:34Z)
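The first step of the ReQueST recipe above, learning a neural simulator purely from safe trajectories, reduces to supervised one-step dynamics modelling. A minimal sketch under that assumption (all names and sizes are invented):

```python
# Hypothetical neural simulator fit only to safe human trajectories.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 12, 3
dynamics = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                         nn.Linear(64, STATE_DIM))
opt = torch.optim.Adam(dynamics.parameters(), lr=1e-3)

def fit_simulator(states, actions, next_states, steps=200):
    """Supervised one-step model, trained exclusively on safe human data."""
    for _ in range(steps):
        pred = dynamics(torch.cat([states, actions], dim=-1))
        loss = ((pred - next_states) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

def imagine(s0, policy, horizon=20):
    """Roll out candidate behaviour inside the learned simulator; humans can
    then label these imagined trajectories instead of risking unsafe actions."""
    traj, s = [s0], s0
    with torch.no_grad():
        for _ in range(horizon):
            s = dynamics(torch.cat([s, policy(s)]))
            traj.append(s)
    return torch.stack(traj)

# Toy stand-ins for logged safe transitions.
safe_s, safe_a = torch.randn(256, STATE_DIM), torch.randn(256, ACTION_DIM)
fit_simulator(safe_s, safe_a, torch.randn(256, STATE_DIM))
print(imagine(torch.randn(STATE_DIM), lambda s: torch.randn(ACTION_DIM)).shape)
```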
- Combining Learning from Human Feedback and Knowledge Engineering to Solve Hierarchical Tasks in Minecraft [1.858151490268935]
We present the solution that won first place and was awarded the most human-like agent in the 2021 NeurIPS MineRL BASALT competition, Learning from Human Feedback in Minecraft.
Our approach uses the available human demonstration data to train an imitation learning policy for navigation.
We compare this hybrid intelligence approach to both end-to-end machine learning and pure engineered solutions, which are then judged by human evaluators.
arXiv Detail & Related papers (2021-12-07T04:12:23Z)
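The navigation component described above rests on standard behavioural cloning. A generic sketch, with toy tensors standing in for MineRL observations and demonstrator actions (the winning entry's actual pipeline is more involved):

```python
# Generic behavioural cloning of discrete actions from human demonstrations.
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS = 64, 8
policy = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

def clone(demo_obs, demo_actions, epochs=50):
    """Maximise the likelihood of the demonstrators' actions under the policy."""
    for _ in range(epochs):
        loss = loss_fn(policy(demo_obs), demo_actions)
        opt.zero_grad(); loss.backward(); opt.step()

obs = torch.randn(512, OBS_DIM)            # stand-in for encoded MineRL frames
acts = torch.randint(0, N_ACTIONS, (512,)) # stand-in for demonstrators' actions
clone(obs, acts)
```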
- Reactive Long Horizon Task Execution via Visual Skill and Precondition Models [59.76233967614774]
We describe an approach for sim-to-real training that can accomplish unseen robotic tasks using models learned in simulation to ground components of a simple task planner.
We show an increase in success rate from 91.6% to 98% in simulation, and from 10% to 80% in the real world, compared with naive baselines.
arXiv Detail & Related papers (2020-11-17T15:24:01Z)
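One way to read "precondition models grounding a task planner" in the entry above is a per-skill success classifier that gates execution. The sketch below illustrates that reading; the skill names, thresholds, and network are invented, not the paper's models.

```python
# Hypothetical per-skill precondition classifiers gating a reactive plan.
import torch
import torch.nn as nn

OBS_DIM = 32
SKILLS = ["open_drawer", "pick_object", "place_object"]
preconditions = {s: nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                  nn.Linear(64, 1), nn.Sigmoid()) for s in SKILLS}

def executable(skill, obs, threshold=0.5):
    """Predicted probability, from current visual features, that the skill succeeds."""
    with torch.no_grad():
        return preconditions[skill](obs).item() > threshold

def step_plan(plan, obs):
    """Reactive execution: run the first remaining skill whose precondition holds."""
    for skill in plan:
        if executable(skill, obs):
            return skill  # hand off to the skill's low-level policy
    return None           # nothing executable: replan or wait

print(step_plan(SKILLS, torch.randn(OBS_DIM)))
```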
- Feature Expansive Reward Learning: Rethinking Human Input [31.413656752926208]
We introduce a new type of human input in which the person guides the robot from states where the feature being taught is highly expressed to states where it is not.
We propose an algorithm for learning the feature from the raw state space and integrating it into the reward function.
arXiv Detail & Related papers (2020-06-23T17:59:34Z)
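A hedged sketch of the guided-trace idea above, assuming the feature network is fit so its output decays from 1 to 0 along each human-provided trace and is then folded linearly into the reward (the targets and integration here are assumptions, not the paper's exact scheme):

```python
# Hypothetical feature learning from a human-guided trace.
import torch
import torch.nn as nn

STATE_DIM = 8
feature = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.Tanh(),
                        nn.Linear(32, 1), nn.Sigmoid())
opt = torch.optim.Adam(feature.parameters(), lr=1e-3)

def fit_feature(trace, steps=200):
    """trace: (T, STATE_DIM) states from high to low feature expression;
    targets decay linearly from 1 to 0 along the demonstration."""
    targets = torch.linspace(1.0, 0.0, len(trace)).unsqueeze(1)
    for _ in range(steps):
        loss = ((feature(trace) - targets) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

base_reward = lambda s: torch.zeros(())  # placeholder for the existing reward

def reward(state, w_new=1.0):
    """Fold the newly learned feature into the existing reward function."""
    with torch.no_grad():
        return base_reward(state) + w_new * feature(state).squeeze()

trace = torch.randn(30, STATE_DIM)       # stand-in for a human-guided trace
fit_feature(trace)
print(reward(torch.randn(STATE_DIM)))
```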
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents (including all of the above) and is not responsible for any consequences of its use.