Learning What To Do by Simulating the Past
- URL: http://arxiv.org/abs/2104.03946v1
- Date: Thu, 8 Apr 2021 17:43:29 GMT
- Title: Learning What To Do by Simulating the Past
- Authors: David Lindner, Rohin Shah, Pieter Abbeel, Anca Dragan
- Abstract summary: We show that by combining a learned feature encoder with learned inverse models, we can enable agents to simulate human actions backwards in time to infer what they must have done.
The resulting algorithm is able to reproduce a specific skill in MuJoCo environments given a single state sampled from the optimal policy for that skill.
- Score: 76.86449554580291
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since reward functions are hard to specify, recent work has focused on
learning policies from human feedback. However, such approaches are impeded by
the expense of acquiring such feedback. Recent work proposed that agents have
access to a source of information that is effectively free: in any environment
that humans have acted in, the state will already be optimized for human
preferences, and thus an agent can extract information about what humans want
from the state. Such learning is possible in principle, but requires simulating
all possible past trajectories that could have led to the observed state. This
is feasible in gridworlds, but how do we scale it to complex tasks? In this
work, we show that by combining a learned feature encoder with learned inverse
models, we can enable agents to simulate human actions backwards in time to
infer what they must have done. The resulting algorithm is able to reproduce a
specific skill in MuJoCo environments given a single state sampled from the
optimal policy for that skill.
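The backward-simulation idea in the abstract can be illustrated compactly. Below is a minimal sketch, assuming (as in the RLSP line of work) a reward that is linear in learned features, r(s) = w·φ(s): learned inverse models propose a plausible past for the observed state, and the reward weights move toward the features of that simulated past and away from the features of the current policy's rollouts. All module names, shapes, and the toy data here are illustrative assumptions, not the paper's code.

```python
# Hypothetical sketch of backward simulation for reward inference.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, FEAT_DIM, HORIZON = 8, 2, 4, 10

encoder = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.Tanh(), nn.Linear(32, FEAT_DIM))
inverse_policy = nn.Sequential(   # a_t given s_{t+1}: which action likely led here?
    nn.Linear(STATE_DIM, 32), nn.Tanh(), nn.Linear(32, ACTION_DIM))
inverse_dynamics = nn.Sequential( # s_t given (s_{t+1}, a_t): which state was it taken from?
    nn.Linear(STATE_DIM + ACTION_DIM, 32), nn.Tanh(), nn.Linear(32, STATE_DIM))

def simulate_backward(s_T: torch.Tensor, horizon: int) -> torch.Tensor:
    """Walk backwards from an observed state, inferring predecessor states."""
    states, s = [s_T], s_T
    for _ in range(horizon):
        a = inverse_policy(s)                    # action that likely led to s
        s = inverse_dynamics(torch.cat([s, a]))  # state it was likely taken from
        states.append(s)
    return torch.stack(states)

def rlsp_weight_update(w, s_T, forward_rollout, lr=0.1):
    """Move w toward features seen in the simulated past and away from
    features the current policy produces (in the spirit of the RLSP gradient)."""
    with torch.no_grad():
        past_feats = encoder(simulate_backward(s_T, HORIZON)).mean(0)
        policy_feats = encoder(forward_rollout).mean(0)
    return w + lr * (past_feats - policy_feats)

w = torch.zeros(FEAT_DIM)
s_T = torch.randn(STATE_DIM)               # stand-in for a state sampled from the optimal policy
rollout = torch.randn(HORIZON, STATE_DIM)  # stand-in for current-policy rollout states
w = rlsp_weight_update(w, s_T, rollout)
print("updated reward weights:", w)
```

In the full algorithm the encoder, inverse models, and policy are trained in an alternating loop; the sketch shows only a single reward-weight update.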
Related papers
- TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction [25.36756787147331]
Learning in simulation and transferring the learned policy to the real world has the potential to enable generalist robots.
We propose a data-driven approach to enable successful sim-to-real transfer based on a human-in-the-loop framework.
We show that our approach can achieve successful sim-to-real transfer in complex and contact-rich manipulation tasks such as furniture assembly.
arXiv Detail & Related papers (2024-05-16T17:59:07Z)
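One plausible reading of the human-in-the-loop framework in the TRANSIC entry above is a residual policy fit to logged human corrections and added to the simulation-trained base policy. The sketch below is an assumption-level illustration with invented interface names, not the TRANSIC implementation.

```python
# Hypothetical residual-correction scheme for sim-to-real transfer.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 16, 4
base_policy = nn.Linear(STATE_DIM, ACTION_DIM)  # frozen, trained in simulation
residual = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM))
opt = torch.optim.Adam(residual.parameters(), lr=1e-3)

def train_on_corrections(states, corrected_actions, epochs=100):
    """Fit the residual so base + residual reproduces the human's corrections."""
    for _ in range(epochs):
        with torch.no_grad():
            base_a = base_policy(states)
        loss = ((base_a + residual(states) - corrected_actions) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

def act(state):
    """Deploy-time action: simulation policy plus learned real-world correction."""
    with torch.no_grad():
        return base_policy(state) + residual(state)

# Toy data standing in for logged human interventions on the real robot.
states = torch.randn(32, STATE_DIM)
corrections = torch.randn(32, ACTION_DIM)
train_on_corrections(states, corrections)
```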
- Dexterous Functional Grasping [39.15442658671798]
This paper combines the best of both worlds to accomplish functional grasping for in-the-wild objects.
We propose a novel application of eigengrasps to reduce the search space of RL using a small amount of human data.
We find that the eigengrasp action space beats baselines in simulation, outperforms hardcoded grasping in the real world, and matches or exceeds a trained human teleoperator.
arXiv Detail & Related papers (2023-12-05T18:59:23Z)
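The eigengrasp idea above can be illustrated with plain PCA: principal components of recorded human grasp configurations become a low-dimensional action basis for RL. A hedged sketch with synthetic stand-in data (the paper's actual construction may differ):

```python
# Illustrative eigengrasp construction via PCA (SVD) over human grasp data.
import numpy as np

N_JOINTS, N_EIG = 16, 3
human_grasps = np.random.randn(200, N_JOINTS)  # stand-in for recorded grasp poses

mean = human_grasps.mean(axis=0)
_, _, vt = np.linalg.svd(human_grasps - mean, full_matrices=False)
eigengrasps = vt[:N_EIG]                       # top principal "grasp directions"

def to_joint_angles(action: np.ndarray) -> np.ndarray:
    """Map a low-dimensional RL action to a full hand configuration."""
    return mean + action @ eigengrasps

action = np.random.uniform(-1, 1, N_EIG)       # what the RL policy would output
print(to_joint_angles(action).shape)           # (16,) joint targets
```

The payoff is that the RL search space shrinks from 16 joint dimensions to 3 coefficients while staying within human-like grasps.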
- Reinforcement Learning from Passive Data via Latent Intentions [86.4969514480008]
We show that passive data can still be used to learn features that accelerate downstream RL.
Our approach learns from passive data by modeling intentions.
Our experiments demonstrate the ability to learn from many forms of passive data, including cross-embodiment video data and YouTube videos.
arXiv Detail & Related papers (2023-04-10T17:59:05Z)
- Guiding Pretraining in Reinforcement Learning with Large Language Models [133.32146904055233]
We describe a method that uses background knowledge from text corpora to shape exploration.
This method, called ELLM, rewards an agent for achieving goals suggested by a language model.
By leveraging large-scale language model pretraining, ELLM guides agents toward human-meaningful and plausibly useful behaviors without requiring a human in the loop.
arXiv Detail & Related papers (2023-02-13T21:16:03Z)
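A rough sketch of the ELLM-style reward described above: the agent is rewarded when a caption of what it just did is close to a goal suggested by a language model. The captioner, goal suggester, and embedding below are placeholders, not the paper's components.

```python
# Hypothetical LM-suggested-goal reward via embedding similarity.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in sentence embedding; a real system would use a trained encoder."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

def suggest_goals(context: str) -> list[str]:
    """Placeholder for querying a language model with the agent's context."""
    return ["chop a tree", "collect wood", "build a shelter"]

def ellm_reward(transition_caption: str, goals: list[str], threshold=0.0) -> float:
    """Reward = best cosine similarity between the caption and any suggested goal."""
    c = embed(transition_caption)
    sim = max(float(embed(g) @ c) for g in goals)
    return sim if sim > threshold else 0.0

print(ellm_reward("the agent chops a tree", suggest_goals("in a forest")))
```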
- DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality [64.51295032956118]
We train a policy that can perform robust dexterous manipulation on an anthropomorphic robot hand.
Our work reaffirms the possibilities of sim-to-real transfer for dexterous manipulation across diverse hardware and simulator setups.
arXiv Detail & Related papers (2022-10-25T01:51:36Z)
- Human-to-Robot Imitation in the Wild [50.49660984318492]
We propose an efficient one-shot robot learning algorithm, centered around learning from a third-person perspective.
We show one-shot generalization and success in real-world settings, including 20 different manipulation tasks in the wild.
arXiv Detail & Related papers (2022-07-19T17:59:59Z)
- Safe Deep RL in 3D Environments using Human Feedback [15.038298345682556]
ReQueST aims to solve this problem by learning a neural simulator of the environment from safe human trajectories.
It was previously unknown whether this approach is feasible in complex 3D environments with feedback obtained from real humans.
We show that the resulting agent exhibits an order of magnitude reduction in unsafe behaviour compared to standard reinforcement learning.
arXiv Detail & Related papers (2022-01-20T10:26:34Z)
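The first step of the ReQueST recipe above, learning a neural simulator purely from safe trajectories, reduces to supervised one-step dynamics modelling. A minimal sketch under that assumption (all names and sizes are invented):

```python
# Hypothetical neural simulator fit only to safe human trajectories.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 12, 3
dynamics = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                         nn.Linear(64, STATE_DIM))
opt = torch.optim.Adam(dynamics.parameters(), lr=1e-3)

def fit_simulator(states, actions, next_states, steps=200):
    """Supervised one-step model, trained exclusively on safe human data."""
    for _ in range(steps):
        pred = dynamics(torch.cat([states, actions], dim=-1))
        loss = ((pred - next_states) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

def imagine(s0, policy, horizon=20):
    """Roll out candidate behaviour inside the learned simulator; humans can
    then label these imagined trajectories instead of risking unsafe actions."""
    traj, s = [s0], s0
    with torch.no_grad():
        for _ in range(horizon):
            s = dynamics(torch.cat([s, policy(s)]))
            traj.append(s)
    return torch.stack(traj)

# Toy stand-ins for logged safe transitions.
safe_s, safe_a = torch.randn(256, STATE_DIM), torch.randn(256, ACTION_DIM)
fit_simulator(safe_s, safe_a, torch.randn(256, STATE_DIM))
print(imagine(torch.randn(STATE_DIM), lambda s: torch.randn(ACTION_DIM)).shape)
```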
- Combining Learning from Human Feedback and Knowledge Engineering to Solve Hierarchical Tasks in Minecraft [1.858151490268935]
We present the solution that won first place and was awarded the most human-like agent in the 2021 NeurIPS MineRL BASALT competition, Learning from Human Feedback in Minecraft.
Our approach uses the available human demonstration data to train an imitation learning policy for navigation.
We compare this hybrid intelligence approach to both end-to-end machine learning and pure engineered solutions, which are then judged by human evaluators.
arXiv Detail & Related papers (2021-12-07T04:12:23Z)
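The navigation component described above rests on standard behavioural cloning. A generic sketch, with toy tensors standing in for MineRL observations and demonstrator actions (the winning entry's actual pipeline is more involved):

```python
# Generic behavioural cloning of discrete actions from human demonstrations.
import torch
import torch.nn as nn

OBS_DIM, N_ACTIONS = 64, 8
policy = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

def clone(demo_obs, demo_actions, epochs=50):
    """Maximise the likelihood of the demonstrators' actions under the policy."""
    for _ in range(epochs):
        loss = loss_fn(policy(demo_obs), demo_actions)
        opt.zero_grad(); loss.backward(); opt.step()

obs = torch.randn(512, OBS_DIM)            # stand-in for encoded MineRL frames
acts = torch.randint(0, N_ACTIONS, (512,)) # stand-in for demonstrators' actions
clone(obs, acts)
```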
- Reactive Long Horizon Task Execution via Visual Skill and Precondition Models [59.76233967614774]
We describe an approach for sim-to-real training that can accomplish unseen robotic tasks using models learned in simulation to ground components of a simple task planner.
We show an increase in success rate from 91.6% to 98% in simulation, and from 10% to 80% in the real world, compared with naive baselines.
arXiv Detail & Related papers (2020-11-17T15:24:01Z)
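One way to read "precondition models grounding a task planner" in the entry above is a per-skill success classifier that gates execution. The sketch below illustrates that reading; the skill names, thresholds, and network are invented, not the paper's models.

```python
# Hypothetical per-skill precondition classifiers gating a reactive plan.
import torch
import torch.nn as nn

OBS_DIM = 32
SKILLS = ["open_drawer", "pick_object", "place_object"]
preconditions = {s: nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                  nn.Linear(64, 1), nn.Sigmoid()) for s in SKILLS}

def executable(skill, obs, threshold=0.5):
    """Predicted probability, from current visual features, that the skill succeeds."""
    with torch.no_grad():
        return preconditions[skill](obs).item() > threshold

def step_plan(plan, obs):
    """Reactive execution: run the first remaining skill whose precondition holds."""
    for skill in plan:
        if executable(skill, obs):
            return skill  # hand off to the skill's low-level policy
    return None           # nothing executable: replan or wait

print(step_plan(SKILLS, torch.randn(OBS_DIM)))
```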
- Feature Expansive Reward Learning: Rethinking Human Input [31.413656752926208]
We introduce a new type of human input in which the person guides the robot from states where the feature being taught is highly expressed to states where it is not.
We propose an algorithm for learning the feature from the raw state space and integrating it into the reward function.
arXiv Detail & Related papers (2020-06-23T17:59:34Z)
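A hedged sketch of the guided-trace idea above, assuming the feature network is fit so its output decays from 1 to 0 along each human-provided trace and is then folded linearly into the reward (the targets and integration here are assumptions, not the paper's exact scheme):

```python
# Hypothetical feature learning from a human-guided trace.
import torch
import torch.nn as nn

STATE_DIM = 8
feature = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.Tanh(),
                        nn.Linear(32, 1), nn.Sigmoid())
opt = torch.optim.Adam(feature.parameters(), lr=1e-3)

def fit_feature(trace, steps=200):
    """trace: (T, STATE_DIM) states from high to low feature expression;
    targets decay linearly from 1 to 0 along the demonstration."""
    targets = torch.linspace(1.0, 0.0, len(trace)).unsqueeze(1)
    for _ in range(steps):
        loss = ((feature(trace) - targets) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()

base_reward = lambda s: torch.zeros(())  # placeholder for the existing reward

def reward(state, w_new=1.0):
    """Fold the newly learned feature into the existing reward function."""
    with torch.no_grad():
        return base_reward(state) + w_new * feature(state).squeeze()

trace = torch.randn(30, STATE_DIM)       # stand-in for a human-guided trace
fit_feature(trace)
print(reward(torch.randn(STATE_DIM)))
```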
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents (including all of the above) and is not responsible for any consequences of its use.