Combining Learning from Human Feedback and Knowledge Engineering to
Solve Hierarchical Tasks in Minecraft
- URL: http://arxiv.org/abs/2112.03482v1
- Date: Tue, 7 Dec 2021 04:12:23 GMT
- Title: Combining Learning from Human Feedback and Knowledge Engineering to
Solve Hierarchical Tasks in Minecraft
- Authors: Vinicius G. Goecks, Nicholas Waytowich, David Watkins, Bharat Prakash
- Abstract summary: We present the solution that won first place and was awarded the most human-like agent in the 2021 NeurIPS Competition MineRL BASALT Challenge: Learning from Human Feedback in Minecraft.
Our approach uses the available human demonstration data to train an imitation learning policy for navigation.
We compare this hybrid intelligence approach to both end-to-end machine learning and pure engineered solutions, which are then judged by human evaluators.
- Score: 1.858151490268935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-world tasks of interest are generally poorly defined by
human-readable descriptions and have no pre-defined reward signals unless one
is defined by a human designer. Conversely, data-driven algorithms are often
designed to solve a specific, narrowly defined task with performance metrics
that drive the agent's learning. In this work, we present the solution that
won first place and was awarded the most human-like agent in the 2021 NeurIPS
Competition MineRL BASALT Challenge: Learning from Human Feedback in
Minecraft, which challenged participants to use human data to solve four tasks
defined only by a natural language description, with no reward function. Our
approach uses the available human demonstration data to train an imitation
learning policy for navigation, and additional human feedback to train an
image classifier. These modules, together with an estimated odometry map, are
combined into a state machine designed from human knowledge of the tasks,
which breaks them down into a natural hierarchy and controls which macro
behavior the learning agent should follow at any instant. We compare this
hybrid intelligence approach to both end-to-end machine learning and purely
engineered solutions, which are then judged by human evaluators. The codebase
is available at https://github.com/viniciusguigo/kairos_minerl_basalt.
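The abstract describes a knowledge-engineered state machine that decides, at each step, which learned module (the imitation-learned navigation policy or a scripted macro-behavior gated by the human-feedback-trained classifier) controls the agent. The following is a minimal sketch of that dispatch pattern; the class, state names, and callable interfaces are illustrative assumptions, not the authors' actual API.

```python
from enum import Enum, auto

class Macro(Enum):
    """Macro-behaviors the hand-designed state machine switches between."""
    FIND_SPOT = auto()   # navigate until a suitable location is recognized
    DO_TASK = auto()     # execute the scripted task behavior
    DONE = auto()        # end the episode

class StateMachineAgent:
    def __init__(self, nav_policy, classifier, max_task_steps=3):
        # nav_policy: callable obs -> action, standing in for the
        #   imitation-learned navigation policy.
        # classifier: callable obs -> bool, standing in for the image
        #   classifier trained from human feedback.
        self.nav_policy = nav_policy
        self.classifier = classifier
        self.max_task_steps = max_task_steps
        self.task_steps = 0
        self.state = Macro.FIND_SPOT

    def act(self, obs):
        # FIND_SPOT: the learned navigator drives the agent until the
        # classifier recognizes a suitable location in the observation.
        if self.state is Macro.FIND_SPOT:
            if self.classifier(obs):
                self.state = Macro.DO_TASK
            else:
                return self.nav_policy(obs)
        # DO_TASK: run the scripted macro-behavior for a fixed budget,
        # then transition to DONE.
        if self.state is Macro.DO_TASK:
            self.task_steps += 1
            if self.task_steps >= self.max_task_steps:
                self.state = Macro.DONE
            return "task_action"
        return "end_episode"
```

The hierarchy lives entirely in the transition logic, so each learned module only has to solve its narrow sub-problem, which is the hybrid-intelligence decomposition the paper argues for.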
Related papers
- Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful machine learning technique for robots to acquire manipulation skills.
We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval.
GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
- Few-Shot Preference Learning for Human-in-the-Loop RL [13.773589150740898]
Motivated by the success of meta-learning, we pre-train preference models on prior task data and quickly adapt them for new tasks using only a handful of queries.
We reduce the amount of online feedback needed to train manipulation policies in Meta-World by 20$\times$, and demonstrate the effectiveness of our method on a real Franka Panda robot.
arXiv Detail & Related papers (2022-12-06T23:12:26Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
- Learning from humans: combining imitation and deep reinforcement learning to accomplish human-level performance on a virtual foraging task [6.263481844384228]
We develop a method to learn bio-inspired foraging policies using human data.
We conduct an experiment in which humans are virtually immersed in an open-field foraging environment and are trained to collect as many rewards as possible.
arXiv Detail & Related papers (2022-03-11T20:52:30Z)
- HAKE: A Knowledge Engine Foundation for Human Activity Understanding [65.24064718649046]
Human activity understanding is of widespread interest in artificial intelligence and spans diverse applications like health care and behavior analysis.
We propose a novel paradigm to reformulate this task in two stages: first mapping pixels to an intermediate space spanned by atomic activity primitives, then programming detected primitives with interpretable logic rules to infer semantics.
Our framework, the Human Activity Knowledge Engine (HAKE), exhibits superior generalization ability and performance upon challenging benchmarks.
arXiv Detail & Related papers (2022-02-14T16:38:31Z)
- Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback [82.96694147237113]
We present Skill Preferences, an algorithm that learns a model over human preferences and uses it to extract human-aligned skills from offline data.
We show that SkiP enables a simulated kitchen robot to solve complex multi-step manipulation tasks.
arXiv Detail & Related papers (2021-08-11T18:04:08Z)
- The MineRL BASALT Competition on Learning from Human Feedback [58.17897225617566]
The MineRL BASALT competition aims to spur forward research on this important class of techniques.
We design a suite of four tasks in Minecraft for which we expect it will be hard to write down hardcoded reward functions.
We provide a dataset of human demonstrations on each of the four tasks, as well as an imitation learning baseline.
arXiv Detail & Related papers (2021-07-05T12:18:17Z)
- Learning What To Do by Simulating the Past [76.86449554580291]
We show that by combining a learned feature encoder with learned inverse models, we can enable agents to simulate human actions backwards in time to infer what they must have done.
The resulting algorithm is able to reproduce a specific skill in MuJoCo environments given a single state sampled from the optimal policy for that skill.
arXiv Detail & Related papers (2021-04-08T17:43:29Z)
- Learning Human Rewards by Inferring Their Latent Intelligence Levels in Multi-Agent Games: A Theory-of-Mind Approach with Application to Driving Data [18.750834997334664]
We argue that humans are boundedly rational and exhibit different intelligence levels when reasoning about others' decision-making processes.
We propose a new multi-agent Inverse Reinforcement Learning framework that reasons about humans' latent intelligence levels during learning.
arXiv Detail & Related papers (2021-03-07T07:48:31Z)
- Human Instruction-Following with Deep Reinforcement Learning via Transfer-Learning from Text [12.88819706338837]
Recent work has described neural-network-based agents that are trained with reinforcement learning to execute language-like commands in simulated worlds.
We propose a conceptually simple method for training instruction-following agents with deep RL that are robust to natural human instructions.
arXiv Detail & Related papers (2020-05-19T12:16:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.