Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter
- URL: http://arxiv.org/abs/2208.03374v1
- Date: Fri, 5 Aug 2022 20:05:46 GMT
- Title: Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter
- Authors: Aleksandar Stanić, Yujin Tang, David Ha, Jürgen Schmidhuber
- Abstract summary: Reinforcement learning agents must generalize beyond their training experience.
We introduce a new set of environments suitable for evaluating an agent's ability to generalize.
We show that current agents struggle to generalize, and introduce novel object-centric agents that improve over strong baselines.
- Score: 72.80855376702746
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning agents must generalize beyond their training
experience. Prior work has focused mostly on identical training and evaluation
environments. Starting from the recently introduced Crafter benchmark, a 2D
open world survival game, we introduce a new set of environments suitable for
evaluating an agent's ability to generalize to previously unseen (numbers of)
objects and to adapt quickly (meta-learning). In Crafter, the agents are
evaluated by the number of unlocked achievements (such as collecting resources)
when trained for 1M steps. We show that current agents struggle to generalize,
and introduce novel object-centric agents that improve over strong baselines.
We also provide critical insights of general interest for future work on
Crafter through several experiments. We show that careful hyper-parameter
tuning improves the PPO baseline agent by a large margin and that even
feedforward agents can unlock almost all achievements by relying on the
inventory display. We achieve new state-of-the-art performance on the original
Crafter environment. Additionally, when trained beyond 1M steps, our tuned
agents can unlock almost all achievements. We show that the recurrent PPO
agents improve over feedforward ones, even with the inventory information
removed. We introduce CrafterOOD, a set of 15 new environments that evaluate
OOD generalization. On CrafterOOD, we show that the current agents fail to
generalize, whereas our novel object-centric agents achieve state-of-the-art
OOD generalization while also being interpretable. Our code is public.
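Since Crafter scores agents by the achievements they unlock, a minimal sketch of that evaluation loop may help. It assumes the public `crafter` package and the classic gym API shown in its README; the random policy is only a stand-in for the paper's tuned PPO and object-centric agents.

```python
# Minimal sketch of Crafter's achievement-based evaluation, assuming the
# public `crafter` package and the classic gym API from its README; a random
# policy stands in for the trained agents. Newer Gym/Gymnasium APIs differ
# slightly (reset/step signatures).
import gym
import crafter  # registers CrafterReward-v1 with gym

env = gym.make('CrafterReward-v1')
unlocked = set()

for episode in range(10):
    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()  # replace with a trained policy
        obs, reward, done, info = env.step(action)
    # info['achievements'] maps achievement names to per-episode counts.
    unlocked |= {k for k, v in info['achievements'].items() if v > 0}

print(f'unlocked {len(unlocked)} achievements:', sorted(unlocked))
```

Counting the set of unlocked achievements is only the simplest view of the signal; the benchmark's official score aggregates per-achievement success rates across episodes.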
Related papers
- OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization [66.22117723598872]
We introduce an open-source framework designed to facilitate the development of multimodal web agents.
We first train the base model with imitation learning to gain the basic abilities.
We then let the agent explore the open web and collect feedback on its trajectories.
arXiv Detail & Related papers (2024-10-25T15:01:27Z)
- Training on more Reachable Tasks for Generalisation in Reinforcement Learning [5.855552389030083]
In multi-task reinforcement learning, agents train on a fixed set of tasks and have to generalise to new ones.
Recent work has shown that increased exploration improves this generalisation, but it remains unclear why exactly that is.
We introduce the concept of reachability in multi-task reinforcement learning and show that an initial exploration phase increases the number of reachable tasks the agent is trained on.
arXiv Detail & Related papers (2024-10-04T16:15:31Z)
- Explore-Go: Leveraging Exploration for Generalisation in Deep Reinforcement Learning [5.624791703748109]
We show that increased exploration during training can be leveraged to increase the generalisation performance of the agent.
We propose a novel method, Explore-Go, that exploits this intuition by increasing the number of states on which the agent trains (a minimal sketch follows this entry).
arXiv Detail & Related papers (2024-06-12T10:39:31Z)
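A minimal sketch of the Explore-Go idea, under stated assumptions: each episode begins with a pure-exploration prefix of random length, so the subsequent on-policy rollout starts from a wider set of states. `policy` and `buffer` are hypothetical gym-style stand-ins, not the paper's implementation.

```python
# Minimal sketch of an Explore-Go-style episode: a pure-exploration prefix
# widens the set of states the on-policy update sees. `env`, `policy`, and
# `buffer` are hypothetical gym-style stand-ins.
import random

EXPLORE_STEPS = 50  # maximum length of the exploration prefix (assumed)

def collect_episode(env, policy, buffer):
    obs = env.reset()
    # Phase 1: pure exploration; these transitions are not used for the
    # on-policy update, they only move the agent to less-visited states.
    for _ in range(random.randint(0, EXPLORE_STEPS)):
        obs, _, done, _ = env.step(env.action_space.sample())
        if done:
            obs = env.reset()
    # Phase 2: on-policy rollout starting from the state reached above.
    done = False
    while not done:
        action = policy.act(obs)
        next_obs, reward, done, _ = env.step(action)
        buffer.add(obs, action, reward, next_obs, done)
        obs = next_obs
```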
- AgentGym: Evolving Large Language Model-based Agents across Diverse Environments [116.97648507802926]
Large language models (LLMs) are considered a promising foundation to build such agents.
We take the first step towards building generally-capable LLM-based agents with self-evolution ability.
We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration.
arXiv Detail & Related papers (2024-06-06T15:15:41Z)
- Can Agents Run Relay Race with Strangers? Generalization of RL to Out-of-Distribution Trajectories [88.08381083207449]
We show the prevalence of generalization failure on controllable states from stranger agents.
We propose a novel method called Self-Trajectory Augmentation (STA), which resets the environment to the agent's old states, selected according to the Q function, during training (a minimal sketch follows this entry).
arXiv Detail & Related papers (2023-04-26T10:12:12Z)
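A minimal sketch of an STA-style reset, under stated assumptions: with some probability the environment is restored to a previously visited state instead of the initial one. The save/restore simulator API and the softmax-over-Q selection rule are illustrative stand-ins; the paper defines the actual Q-based criterion.

```python
# Minimal sketch of Self-Trajectory Augmentation (STA) under stated
# assumptions: occasionally reset to one of the agent's old states rather
# than the initial state. `env.restore_state` and the softmax-over-Q
# weighting are hypothetical stand-ins.
import random

import numpy as np

def sta_reset(env, q_fn, visited_states, p_augment=0.5):
    """Reset normally, or restore one of the agent's old states."""
    if visited_states and random.random() < p_augment:
        # Score stored states with the Q function and sample one
        # (softmax weighting is an illustrative assumption).
        values = np.array([q_fn(s).max() for s in visited_states])
        probs = np.exp(values - values.max())
        probs /= probs.sum()
        idx = np.random.choice(len(visited_states), p=probs)
        return env.restore_state(visited_states[idx])  # hypothetical API
    return env.reset()
```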
- Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback [16.268581985382433]
An important goal in artificial intelligence is to create agents that can both interact naturally with humans and learn from their feedback.
Here we demonstrate how to use reinforcement learning from human feedback to improve upon simulated, embodied agents (a minimal sketch of the preference-learning step follows this entry).
arXiv Detail & Related papers (2022-11-21T16:00:31Z)
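The entry above turns human feedback into a learning signal; the standard recipe fits a reward model to human preference comparisons. A minimal sketch of that Bradley-Terry preference loss, as a generic formulation rather than the cited paper's exact setup:

```python
# Minimal sketch of the standard reward-model step in RL from human
# feedback: fit a reward model so preferred trajectory segments score
# higher (Bradley-Terry loss). Shapes and dimensions are illustrative.
import torch
import torch.nn as nn

reward_model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

def preference_loss(preferred, rejected):
    """preferred/rejected: (batch, steps, obs_dim) trajectory segments."""
    r_pref = reward_model(preferred).sum(dim=1)  # summed reward per segment
    r_rej = reward_model(rejected).sum(dim=1)
    # P(preferred > rejected) under Bradley-Terry; maximize log-likelihood.
    return -torch.log(torch.sigmoid(r_pref - r_rej)).mean()

# One update on a batch of synthetic segments (obs_dim=16, 20 steps).
preferred, rejected = torch.randn(8, 20, 16), torch.randn(8, 20, 16)
loss = preference_loss(preferred, rejected)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```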
- Benchmarking the Spectrum of Agent Capabilities [7.088856621650764]
We introduce Crafter, an open world survival game with visual inputs that evaluates a wide range of general abilities within a single environment.
Agents learn from the provided reward signal or through intrinsic objectives and are evaluated by semantically meaningful achievements.
We experimentally verify that Crafter is of appropriate difficulty to drive future research and provide baseline scores for reward agents and unsupervised agents.
arXiv Detail & Related papers (2021-09-14T15:49:31Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences (a minimal sketch follows this entry).
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
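A minimal sketch of the adversarial surprise game, under stated assumptions: an Explore policy is paid for the surprise the agent experiences and a Control policy is paid for avoiding it, with surprise approximated by negative log-likelihood under a density model. `policy.act` and `density.log_prob` are hypothetical stand-ins.

```python
# Minimal sketch of an adversarial surprise phase: two policies alternate
# control and receive zero-sum rewards for the surprise experienced.
# `policy` and `density` are hypothetical stand-ins.
def surprise(density, obs):
    return -density.log_prob(obs)  # high when obs is unfamiliar

def play_phase(env, obs, policy, density, steps, sign):
    """sign=+1 for the Explore policy, -1 for the Control policy."""
    total = 0.0
    for _ in range(steps):
        obs, _, done, _ = env.step(policy.act(obs))
        total += sign * surprise(density, obs)  # zero-sum surprise reward
        if done:
            break
    return obs, total
```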
- Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function (a minimal sketch follows this entry).
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
arXiv Detail & Related papers (2020-06-10T20:12:38Z)
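A minimal sketch of the learned-incentive idea, under stated assumptions: besides its policy, each agent learns an incentive function that pays reward directly to the other agents, and each policy trains on environment reward plus incentives received. The additive combination and function shapes are illustrative stand-ins, not the paper's exact formulation.

```python
# Minimal sketch of learned incentive functions in a multi-agent setting:
# each agent's effective reward is its environment reward plus the
# incentives the others pay it. Shapes and the additive rule are assumed.
import torch
import torch.nn as nn

n_agents, obs_dim = 3, 8

# incentive_fns[i] maps agent i's observation to rewards it gives the others.
incentive_fns = [nn.Linear(obs_dim, n_agents) for _ in range(n_agents)]

def effective_rewards(obs, env_rewards):
    """obs: (n_agents, obs_dim); env_rewards: (n_agents,)."""
    given = torch.stack([f(obs[i]) for i, f in enumerate(incentive_fns)])
    mask = 1.0 - torch.eye(n_agents)   # no self-rewarding
    given = torch.relu(given) * mask   # non-negative incentives (assumed)
    received = given.sum(dim=0)        # rewards received by each agent
    return env_rewards + received      # train each policy on this signal

print(effective_rewards(torch.randn(n_agents, obs_dim), torch.zeros(n_agents)))
```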