The MineRL BASALT Competition on Learning from Human Feedback
- URL: http://arxiv.org/abs/2107.01969v1
- Date: Mon, 5 Jul 2021 12:18:17 GMT
- Title: The MineRL BASALT Competition on Learning from Human Feedback
- Authors: Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton,
William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay
Topin, Pieter Abbeel, Stuart Russell, Anca Dragan
- Abstract summary: The MineRL BASALT competition aims to spur forward research on this important class of techniques.
We design a suite of four tasks in Minecraft for which we expect it will be hard to write down hardcoded reward functions.
We provide a dataset of human demonstrations on each of the four tasks, as well as an imitation learning baseline.
- Score: 58.17897225617566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The last decade has seen a significant increase of interest in deep learning
research, with many public successes that have demonstrated its potential. As
such, these systems are now being incorporated into commercial products. With
this comes an additional challenge: how can we build AI systems that solve
tasks where there is not a crisp, well-defined specification? While multiple
solutions have been proposed, in this competition we focus on one in
particular: learning from human feedback. Rather than training AI systems using
a predefined reward function or using a labeled dataset with a predefined set
of categories, we instead train the AI system using a learning signal derived
from some form of human feedback, which can evolve over time as the
understanding of the task changes, or as the capabilities of the AI system
improve.
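
To make this concrete, the sketch below shows one common form such a learning signal can take: a reward model fit to pairwise human preferences with a Bradley-Terry style loss. This is a minimal, hypothetical illustration, not a method prescribed by the competition; all names, shapes, and hyperparameters are assumptions.

```python
# Minimal sketch (not the competition's method): fit a reward model to
# pairwise human preferences via a Bradley-Terry style loss. All names,
# shapes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a (flattened) observation to a scalar reward estimate."""
    def __init__(self, obs_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)  # per-step reward estimates

def preference_loss(model, seg_a, seg_b, prefers_a):
    """seg_a, seg_b: (batch, time, obs_dim) trajectory segments shown to a human.
    prefers_a: (batch,) floats, 1.0 where the human preferred segment A."""
    # Sum predicted per-step rewards over each segment.
    r_a = model(seg_a).sum(dim=1)
    r_b = model(seg_b).sum(dim=1)
    # Bradley-Terry: P(A preferred) = sigmoid(R_A - R_B), fit by cross-entropy.
    return nn.functional.binary_cross_entropy_with_logits(r_a - r_b, prefers_a)

# Toy usage with random stand-in data.
obs_dim = 16
model = RewardModel(obs_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
seg_a = torch.randn(8, 20, obs_dim)   # 8 pairs of 20-step segments
seg_b = torch.randn(8, 20, obs_dim)
prefers_a = torch.randint(0, 2, (8,)).float()
loss = preference_loss(model, seg_a, seg_b, prefers_a)
opt.zero_grad(); loss.backward(); opt.step()
```

The key property is that the reward signal is estimated from human comparisons rather than hardcoded, so it can be refit as more feedback arrives or as the human's understanding of the task changes.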
The MineRL BASALT competition aims to spur forward research on this important
class of techniques. We design a suite of four tasks in Minecraft for which we
expect it will be hard to write down hardcoded reward functions. These tasks
are defined by a paragraph of natural language: for example, "create a
waterfall and take a scenic picture of it", with additional clarifying details.
Participants must train a separate agent for each task, using any method they
want. Agents are then evaluated by humans who have read the task description.
To help participants get started, we provide a dataset of human demonstrations
on each of the four tasks, as well as an imitation learning baseline that
leverages these demonstrations.
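
For intuition, here is a rough sketch of what behavioral cloning on such demonstrations looks like. This is a generic illustration, not the organizers' released baseline: MineRL's dict-valued observations and actions are simplified to flat tensors and a discretized action set, and all names are assumptions.

```python
# Rough behavioral-cloning sketch (a generic illustration, not the
# organizers' released baseline). Observations and actions are simplified
# to flat tensors; names are assumptions.
import torch
import torch.nn as nn

class Policy(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)  # logits over a discretized action set

def bc_update(policy, optimizer, obs, demo_actions):
    """One supervised step: push the policy toward the demonstrator's action."""
    logits = policy(obs)
    loss = nn.functional.cross_entropy(logits, demo_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with stand-in demonstration batches.
policy = Policy(obs_dim=32, n_actions=10)
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
obs = torch.randn(64, 32)            # batch of demo observations
acts = torch.randint(0, 10, (64,))   # corresponding demo actions
print(bc_update(policy, opt, obs, acts))
```

Because the loss depends only on demonstration (observation, action) pairs, no reward function is needed; evaluation of the resulting agent is left to human judges who have read the task description.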
Our hope is that this competition will improve our ability to build AI
systems that do what their designers intend them to do, even when the intent
cannot be easily formalized. Besides allowing AI to solve more tasks, this can
also enable more effective regulation of AI systems, as well as making progress
on the value alignment problem.
Related papers
- Not Just Novelty: A Longitudinal Study on Utility and Customization of an AI Workflow [18.15979295351043]
Generative AI brings novel and impressive abilities to help people in everyday tasks.
It is uncertain how useful generative AI tools are after the novelty wears off.
We conducted a three-week longitudinal study with 12 users to understand the familiarization and customization of generative AI tools for science communication.
arXiv Detail & Related papers (2024-02-15T11:39:11Z)
- Exploration with Principles for Diverse AI Supervision [88.61687950039662]
Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI.
While this generative AI approach has produced impressive results, it heavily leans on human supervision.
This strong reliance on human oversight poses a significant hurdle to the advancement of AI innovation.
We propose a novel paradigm termed Exploratory AI (EAI) aimed at autonomously generating high-quality training data.
arXiv Detail & Related papers (2023-10-13T07:03:39Z)
- Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data [101.43350024175157]
Self-supervised learning has the potential to decrease the amount of human annotation and engineering effort required to learn control strategies.
Our work builds on prior work showing that reinforcement learning (RL) itself can be cast as a self-supervised problem.
We demonstrate that a self-supervised RL algorithm based on contrastive learning can solve real-world, image-based robotic manipulation tasks.
arXiv Detail & Related papers (2023-06-06T01:36:56Z)
- Human Decision Makings on Curriculum Reinforcement Learning with Difficulty Adjustment [52.07473934146584]
We guide curriculum reinforcement learning toward a preferred performance level that is neither too hard nor too easy by learning from the human decision process.
Our system is highly parallelizable, making it possible for a human to train large-scale reinforcement learning applications.
It shows that reinforcement learning performance can successfully adjust in sync with the human-desired difficulty level.
arXiv Detail & Related papers (2022-08-04T23:53:51Z)
- Building Human-like Communicative Intelligence: A Grounded Perspective [1.0152838128195465]
After making astounding progress in language learning, AI systems seem to be approaching a ceiling that does not reflect important aspects of human communicative capacities.
This paper suggests that the dominant cognitively-inspired AI directions, based on nativist and symbolic paradigms, lack necessary substantiation and concreteness to guide progress in modern AI.
I propose a list of concrete, implementable components for building "grounded" linguistic intelligence.
arXiv Detail & Related papers (2022-01-02T01:43:24Z)
- Combining Learning from Human Feedback and Knowledge Engineering to Solve Hierarchical Tasks in Minecraft [1.858151490268935]
We present the solution that won first place and received the award for the most human-like agent in the 2021 NeurIPS Competition MineRL BASALT Challenge: Learning from Human Feedback in Minecraft.
Our approach uses the available human demonstration data to train an imitation learning policy for navigation.
We compare this hybrid intelligence approach to both end-to-end machine learning and pure engineered solutions, which are then judged by human evaluators.
arXiv Detail & Related papers (2021-12-07T04:12:23Z)
- Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills [93.12417203541948]
We propose the objective of learning a functional understanding of the environment by learning to reach any goal state in a given dataset.
We find that our method can operate on high-dimensional camera images and learn a variety of skills on real robots that generalize to previously unseen scenes and objects.
arXiv Detail & Related papers (2021-04-15T20:10:11Z)
- Empowering Things with Intelligence: A Survey of the Progress, Challenges, and Opportunities in Artificial Intelligence of Things [98.10037444792444]
We show how AI can empower the IoT to make it faster, smarter, greener, and safer.
First, we present progress in AI research for IoT from four perspectives: perceiving, learning, reasoning, and behaving.
Finally, we summarize some promising applications of AIoT that are likely to profoundly reshape our world.
arXiv Detail & Related papers (2020-11-17T13:14:28Z)
- Explainability via Responsibility [0.9645196221785693]
We present an approach to explainable artificial intelligence in which certain training instances are offered to human users.
We evaluate this approach by approximating its ability to provide human users with explanations of an AI agent's actions.
arXiv Detail & Related papers (2020-10-04T20:41:03Z)
- AI from concrete to abstract: demystifying artificial intelligence to the general public [0.0]
This article presents a new methodology, AI from concrete to abstract (AIcon2abs).
The main strategy adopted by AIcon2abs is to promote the demystification of artificial intelligence.
The simplicity of the WiSARD weightless artificial neural network model enables easy visualization and understanding of training and classification tasks.
arXiv Detail & Related papers (2020-06-07T01:14:06Z)