Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models
- URL: http://arxiv.org/abs/2405.15143v2
- Date: Thu, 30 May 2024 06:48:44 GMT
- Title: Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models
- Authors: Cong Lu, Shengran Hu, Jeff Clune,
- Abstract summary: Go-Explore is a powerful family of algorithms designed to solve hard-exploration problems.
We propose Intelligent Go-Explore (IGE) which greatly extends the scope of the original Go-Explore.
IGE has a human-like ability to instinctively identify how interesting or promising any new state is.
- Score: 5.404186221463082
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Go-Explore is a powerful family of algorithms designed to solve hard-exploration problems, built on the principle of archiving discovered states, and iteratively returning to and exploring from the most promising states. This approach has led to superhuman performance across a wide variety of challenging problems including Atari games and robotic control, but requires manually designing heuristics to guide exploration, which is time-consuming and infeasible in general. To resolve this, we propose Intelligent Go-Explore (IGE) which greatly extends the scope of the original Go-Explore by replacing these heuristics with the intelligence and internalized human notions of interestingness captured by giant foundation models (FMs). This provides IGE with a human-like ability to instinctively identify how interesting or promising any new state is (e.g. discovering new objects, locations, or behaviors), even in complex environments where heuristics are hard to define. Moreover, IGE offers the exciting and previously impossible opportunity to recognize and capitalize on serendipitous discoveries that cannot be predicted ahead of time. We evaluate IGE on a range of language-based tasks that require search and exploration. In Game of 24, a multistep mathematical reasoning problem, IGE reaches 100% success rate 70.8% faster than the best classic graph search baseline. Next, in BabyAI-Text, a challenging partially observable gridworld, IGE exceeds the previous SOTA with orders of magnitude fewer online samples. Finally, in TextWorld, we show the unique ability of IGE to succeed in settings requiring long-horizon exploration where prior SOTA FM agents like Reflexion completely fail. Overall, IGE combines the tremendous strengths of FMs and the powerful Go-Explore algorithm, opening up a new frontier of research into creating more generally capable agents with impressive exploration capabilities.
Related papers
- A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond [84.95530356322621]
This survey presents a systematic review of the advancements in code intelligence.
It covers over 50 representative models and their variants, more than 20 categories of tasks, and an extensive coverage of over 680 related works.
Building on our examination of the developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence.
arXiv Detail & Related papers (2024-03-21T08:54:56Z) - Ghost in the Minecraft: Generally Capable Agents for Open-World
Environments via Large Language Models with Text-based Knowledge and Memory [97.87093169454431]
Ghost in the Minecraft (GITM) is a novel framework that integrates Large Language Models (LLMs) with text-based knowledge and memory.
We develop a set of structured actions and leverage LLMs to generate action plans for the agents to execute.
The resulting LLM-based agent markedly surpasses previous methods, achieving a remarkable improvement of +47.5% in success rate.
arXiv Detail & Related papers (2023-05-25T17:59:49Z) - OpenAGI: When LLM Meets Domain Experts [51.86179657467822]
Human Intelligence (HI) excels at combining basic skills to solve complex tasks.
This capability is vital for Artificial Intelligence (AI) and should be embedded in comprehensive AI Agents.
We introduce OpenAGI, an open-source platform designed for solving multi-step, real-world tasks.
arXiv Detail & Related papers (2023-04-10T03:55:35Z) - Sparks of Artificial General Intelligence: Early experiments with GPT-4 [66.1188263570629]
GPT-4, developed by OpenAI, was trained using an unprecedented scale of compute and data.
We demonstrate that GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more.
We believe GPT-4 could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.
arXiv Detail & Related papers (2023-03-22T16:51:28Z) - Time-Myopic Go-Explore: Learning A State Representation for the
Go-Explore Paradigm [0.5156484100374059]
We introduce a novel time-myopic state representation that clusters temporal close states together.
We demonstrate the first learned state representation that reliably estimates novelty instead of using the hand-crafted representation.
Our approach is evaluated on the hard exploration environments MontezumaRevenge, Gravitar and Frostbite (Atari)
arXiv Detail & Related papers (2023-01-13T16:13:44Z) - First Go, then Post-Explore: the Benefits of Post-Exploration in
Intrinsic Motivation [7.021281655855703]
Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards.
Key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state.
We refer to such exploration after a goal is reached as 'post-exploration'
arXiv Detail & Related papers (2022-12-06T18:56:47Z) - Cell-Free Latent Go-Explore [3.1868913341776106]
Latent Go-Explore (LGE) is a simple and general approach based on the Go-Explore paradigm for exploration in reinforcement learning (RL)
We show that LGE can be flexibly combined with any strategy for learning a latent representation.
Our results indicate that LGE, although simpler than Go-Explore, is more robust and outperforms state-of-the-art algorithms in terms of pure exploration.
arXiv Detail & Related papers (2022-08-31T15:52:02Z) - When to Go, and When to Explore: The Benefit of Post-Exploration in
Intrinsic Motivation [7.021281655855703]
Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards.
We refer to such exploration after a goal is reached as 'post-exploration'
We introduce new methodology to adaptively decide when to post-explore and for how long to post-explore.
arXiv Detail & Related papers (2022-03-29T16:50:12Z) - Long-Term Exploration in Persistent MDPs [68.8204255655161]
We propose an exploration method called Rollback-Explore (RbExplore)
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z) - Batch Exploration with Examples for Scalable Robotic Reinforcement
Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state-space guided by a modest number of human provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z) - First return, then explore [18.876005532689234]
Go-Explore is a family of algorithms that explicitly remembers promising states and first returning to such states before intentionally exploring.
Go-Explore solves all heretofore unsolved Atari games and surpasses the state of the art on all hard-exploration games.
We show that adding a goal-conditioned policy can further improve Go-Explore's exploration efficiency and enable it to handleity throughout training.
arXiv Detail & Related papers (2020-04-27T16:31:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.