Benchmarking the Spectrum of Agent Capabilities
- URL: http://arxiv.org/abs/2109.06780v1
- Date: Tue, 14 Sep 2021 15:49:31 GMT
- Title: Benchmarking the Spectrum of Agent Capabilities
- Authors: Danijar Hafner
- Abstract summary: We introduce Crafter, an open world survival game with visual inputs that evaluates a wide range of general abilities within a single environment.
Agents learn from the provided reward signal or through intrinsic objectives and are evaluated by semantically meaningful achievements.
We experimentally verify that Crafter is of appropriate difficulty to drive future research and provide baseline scores for reward agents and unsupervised agents.
- Score: 7.088856621650764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evaluating the general abilities of intelligent agents requires complex
simulation environments. Existing benchmarks typically evaluate only one narrow
task per environment, requiring researchers to perform expensive training runs
on many different environments. We introduce Crafter, an open world survival
game with visual inputs that evaluates a wide range of general abilities within
a single environment. Agents either learn from the provided reward signal or
through intrinsic objectives and are evaluated by semantically meaningful
achievements that can be unlocked during each episode, such as discovering
resources and crafting tools. Consistently unlocking all achievements requires
strong generalization, deep exploration, and long-term reasoning. We
experimentally verify that Crafter is of appropriate difficulty to drive future
research and provide baseline scores for reward agents and unsupervised agents.
Furthermore, we observe sophisticated behaviors emerging from maximizing the
reward signal, such as building tunnel systems, bridges, houses, and
plantations. We hope that Crafter will accelerate research progress by quickly
evaluating a wide spectrum of abilities.
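To make the evaluation protocol concrete, here is a minimal random-agent loop against the released crafter package, assuming its gym-style API (reset/step returning observation, reward, done flag, and an info dict with per-episode achievement counts); this is a sketch, not code from the paper:

```python
# Minimal random-agent loop for Crafter (pip install crafter).
# The gym-style interface and the info["achievements"] dict are
# assumptions based on the released package, not code from the paper.
import crafter

env = crafter.Env()  # 64x64x3 image observations by default
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # replace with a learned policy
    obs, reward, done, info = env.step(action)

# Achievements such as "collect_wood" are counted per episode.
unlocked = [name for name, count in info["achievements"].items() if count > 0]
print(f"Unlocked {len(unlocked)} achievements: {unlocked}")
```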
Related papers
- OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization [66.22117723598872]
We introduce an open-source framework designed to facilitate the development of multimodal web agents.
We first train the base model with imitation learning to gain the basic abilities.
We then let the agent explore the open web and collect feedback on its trajectories.
arXiv Detail & Related papers (2024-10-25T15:01:27Z)
- Evaluating Environments Using Exploratory Agents [0.0]
We investigate using an exploratory agent to provide feedback on the design of procedurally generated game levels.
Our study showed that our exploratory agent can clearly distinguish between engaging and unengaging levels.
arXiv Detail & Related papers (2024-09-04T11:51:26Z)
- Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter [72.80855376702746]
Reinforcement learning agents must generalize beyond their training experience.
We introduce a new set of environments suitable for evaluating an agent's ability to generalize.
We show that current agents struggle to generalize, and introduce novel object-centric agents that improve over strong baselines.
arXiv Detail & Related papers (2022-08-05T20:05:46Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
We propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards or domain knowledge.
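As a rough illustration of the rollback idea, the sketch below saves environment states and restores them to continue exploration from earlier points. The get_state/set_state methods are hypothetical stand-ins for the emulator's checkpointing, and the real method additionally clusters similar states; this is not the authors' code.

```python
# Hypothetical sketch of rollback-style exploration in a persistent MDP,
# i.e. an environment whose exact state can be saved and later restored.
# env.get_state() / env.set_state() are stand-ins for whatever
# checkpointing the underlying emulator provides.
import random

def rollback_explore(env, iterations=1000, horizon=50):
    archive = [env.get_state()]  # states we can roll back to
    for _ in range(iterations):
        env.set_state(random.choice(archive))  # roll back, then explore onward
        for _ in range(horizon):
            _, _, done, _ = env.step(env.action_space.sample())
            if done:
                break
            archive.append(env.get_state())
    return archive
```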
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- Open-Ended Learning Leads to Generally Capable Agents [12.079718607356178]
We define a universe of tasks within an environment domain and demonstrate the ability to train agents that are capable across this vast space and beyond.
The resulting space is exceptionally diverse in terms of the challenges posed to agents, and as such, even measuring the learning progress of an agent is an open research problem.
We show that by constructing an open-ended learning process, which dynamically changes the training task distributions and training objectives so that the agent never stops learning, we achieve consistent learning of new behaviours.
arXiv Detail & Related papers (2021-07-27T13:30:07Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills, marked by clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- Self-Supervised Exploration via Latent Bayesian Surprise [4.088019409160893]
In this work, we propose a curiosity-based bonus as an intrinsic reward for reinforcement learning.
We extensively evaluate our model by measuring the agent's performance in terms of environment exploration.
Our model is computationally cheap and empirically achieves state-of-the-art performance on several problems.
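The summary leaves the bonus implicit; one common reading of latent Bayesian surprise is the KL divergence between a latent posterior (conditioned on the full transition) and a latent prior (conditioned on state and action only). The sketch below shows that computation for diagonal Gaussians; the two distribution heads are illustrative, not the paper's exact architecture.

```python
# Hedged sketch: intrinsic reward as KL(posterior || prior) over a
# latent variable, with both distributions taken to be diagonal
# Gaussians. Shapes and constants are placeholders.
import torch
from torch.distributions import Normal, kl_divergence

def surprise_bonus(prior_mu, prior_std, post_mu, post_std):
    prior = Normal(prior_mu, prior_std)    # p(z | s, a)
    posterior = Normal(post_mu, post_std)  # q(z | s, a, s')
    return kl_divergence(posterior, prior).sum(dim=-1)  # one scalar per transition

# Example: batch of 4 transitions with an 8-dimensional latent.
bonus = surprise_bonus(torch.zeros(4, 8), torch.ones(4, 8),
                       torch.randn(4, 8), 0.5 * torch.ones(4, 8))
print(bonus.shape)  # torch.Size([4])
```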
arXiv Detail & Related papers (2021-04-15T14:40:16Z)
- Fast active learning for pure exploration in reinforcement learning [48.98199700043158]
We show that bonuses that scale with $1/n$ bring faster learning rates, improving the known upper bounds with respect to the dependence on the horizon.
We also show that with an improved analysis of the stopping time, we can improve by a factor $H$ the sample complexity in the best-policy identification setting.
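To see why the schedule matters, the toy comparison below contrasts the classical $1/\sqrt{n}$ bonus with the $1/n$ bonus the paper analyzes, as a function of the visit count n; the constant is a placeholder, not a value from the paper.

```python
# Toy comparison of exploration-bonus schedules: the classical
# UCB-style c/sqrt(n) versus the faster-decaying c/n analyzed above.
import math

def bonus_sqrt(n, c=1.0):
    return c / math.sqrt(n)

def bonus_linear(n, c=1.0):
    return c / n

for n in (1, 10, 100, 1000):
    print(f"n={n:5d}  1/sqrt(n)={bonus_sqrt(n):.4f}  1/n={bonus_linear(n):.4f}")
```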
arXiv Detail & Related papers (2020-07-27T11:28:32Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard-exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control.
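A minimal sketch of such an episodic bonus, assuming a learned embedding function: the current embedding is compared against this episode's memory via its k nearest neighbours, and the reward grows as those neighbours get farther away. Constants and normalization are simplified relative to the paper.

```python
# Hedged sketch of an episodic novelty bonus in the spirit of
# Never Give Up: inverse kernel similarity to the k nearest
# neighbours in an episodic memory of embeddings. The embedding
# itself (from the inverse-dynamics model) is a stand-in here.
import numpy as np

def episodic_bonus(embedding, memory, k=10, eps=1e-3, c=1e-3):
    if not memory:
        return 1.0  # everything is novel in an empty memory
    sq_dists = np.sum((np.asarray(memory) - embedding) ** 2, axis=1)
    nearest = np.sort(sq_dists)[:k]
    nearest = nearest / max(nearest.mean(), eps)  # normalize by mean neighbour distance
    kernel = eps / (nearest + eps)                # similarity of each neighbour
    return 1.0 / np.sqrt(kernel.sum() + c)

memory = []  # cleared at every episode boundary
emb = np.random.randn(32)  # stand-in for the learned controllable embedding
print(episodic_bonus(emb, memory))
memory.append(emb)
```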
arXiv Detail & Related papers (2020-02-14T13:57:22Z)