Dual-Scale World Models for LLM Agents Towards Hard-Exploration Problems
- URL: http://arxiv.org/abs/2509.24116v2
- Date: Tue, 30 Sep 2025 02:57:33 GMT
- Title: Dual-Scale World Models for LLM Agents Towards Hard-Exploration Problems
- Authors: Minsoo Kim, Seung-won Hwang
- Abstract summary: LLM-based agents have seen promising advances, yet they are still limited in "hard-exploration" tasks requiring learning new knowledge through exploration. We present GLoW, a novel approach leveraging dual-scale world models, maintaining a trajectory frontier of high-value discoveries at the global scale. We tackle the Jericho benchmark suite of text-based games, where GLoW achieves a new state-of-the-art performance for LLM-based approaches.
- Score: 41.790981479496644
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LLM-based agents have seen promising advances, yet they are still limited in "hard-exploration" tasks requiring learning new knowledge through exploration. We present GLoW, a novel approach leveraging dual-scale world models, maintaining a trajectory frontier of high-value discoveries at the global scale, while learning from local trial-and-error in exploration through a Multi-path Advantage Reflection mechanism which infers advantage-based progress signals to guide exploration. To evaluate our framework for hard-exploration, we tackle the Jericho benchmark suite of text-based games, where GLoW achieves a new state-of-the-art performance for LLM-based approaches. Compared to state-of-the-art RL-based methods, our approach achieves comparable performance while requiring 100-800x fewer environment interactions.
Related papers
- Language-based Trial and Error Falls Behind in the Era of Experience [50.503828360874536]
Large Language Models (LLMs) excel in language-based agentic tasks, but their applicability to unseen, nonlinguistic environments remains limited. In this work, we demonstrate the primary bottleneck is the prohibitive cost of exploration. We propose SCOUT, a novel framework that decouples exploration from semantic exploitation.
arXiv Detail & Related papers (2026-01-29T14:08:41Z)
- Spatial-VLN: Zero-Shot Vision-and-Language Navigation With Explicit Spatial Perception and Exploration [16.651645602449577]
Vision-and-Language Navigation (VLN) agents leveraging Large Language Models (LLMs) excel in generalization but suffer from insufficient spatial perception. We present Spatial-VLN, a perception-guided exploration framework designed to overcome these challenges.
arXiv Detail & Related papers (2026-01-19T06:53:02Z)
- ReEXplore: Improving MLLMs for Embodied Exploration with Contextualized Retrospective Experience Replay [22.725471788115403]
Embodied exploration is a target-driven process that requires embodied agents to possess fine-grained perception and knowledge-enhanced decision making. While recent attempts leverage MLLMs for exploration due to their strong perceptual and reasoning abilities, we find that MLLM-based embodied agents remain suboptimal in exploring new environments. We address these challenges with ReEXplore, a training-free framework that performs retrospective experience replay to inject distilled, abstract experience at inference time, and hierarchical frontier selection to decompose frontier ranking into coarse-to-fine decisions.
arXiv Detail & Related papers (2025-11-24T12:13:05Z)
- Guiding Exploration in Reinforcement Learning Through LLM-Augmented Observations [0.0]
Large Language Models (LLMs) possess procedural knowledge and reasoning capabilities from text pretraining. We propose a framework that provides LLM-generated action recommendations through augmented observation spaces.
arXiv Detail & Related papers (2025-10-09T19:54:31Z)
- Exploration with Foundation Models: Capabilities, Limitations, and Hybrid Approaches [2.9165586612027234]
We show that VLM guidance can significantly improve early-stage sample efficiency. Our results provide a clear analysis of the potential and constraints of using foundation models to guide exploration rather than for end-to-end control.
arXiv Detail & Related papers (2025-09-24T09:25:15Z)
- VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction [14.873988791609127]
We present VIR-Bench, a benchmark consisting of 200 travel videos that frames itinerary reconstruction as a challenging task. Experimental results reveal that state-of-the-art MLLMs, including proprietary ones, struggle to achieve high scores. We conduct an in-depth case study in which we develop a prototype travel-planning agent.
arXiv Detail & Related papers (2025-09-23T13:46:31Z)
- Sample Efficient Reinforcement Learning via Large Vision Language Model Distillation [19.48826538310603]
We introduce LVLM to Policy (LVLM2P), a framework that distills knowledge from large vision-language models (LVLM) into more efficient Reinforcement Learning agents. Our approach leverages the LVLM as a teacher, providing instructional actions based on trajectories collected by the RL agent. We show that LVLM2P significantly enhances the sample efficiency of baseline RL algorithms.
arXiv Detail & Related papers (2025-05-16T13:15:54Z)
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
R1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models. Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start. Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z)
- Open-World Reinforcement Learning over Long Short-Term Imagination [91.28998327423295]
Training visual reinforcement learning agents in a high-dimensional open world presents significant challenges. We present LS-Imagine, which extends the imagination horizon within a limited number of state transition steps. Our method demonstrates significant improvements over state-of-the-art techniques in MineDojo.
arXiv Detail & Related papers (2024-10-04T17:17:30Z)
- World Models with Hints of Large Language Models for Goal Achieving [56.91610333715712]
Reinforcement learning struggles in the face of long-horizon tasks and sparse goals.
Inspired by human cognition, we propose a new multi-modal model-based RL approach named Dreaming with Large Language Models (DLLM). DLLM integrates the proposed hinting subgoals into the model rollouts to encourage goal discovery and reaching in challenging tasks.
arXiv Detail & Related papers (2024-06-11T15:49:08Z)
- From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world.
Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z)
- LLaMA Rider: Spurring Large Language Models to Explore the Open World [36.261626047323695]
The capacity of Large Language Models to continuously acquire environmental knowledge and adapt in an open world remains uncertain.
We propose an approach to spur LLMs to explore the open world, gather experiences, and learn to improve their task-solving capabilities.
By evaluation in Minecraft, an open-ended sandbox world, we demonstrate that our approach LLaMA-Rider enhances the efficiency of the LLM in exploring the environment.
arXiv Detail & Related papers (2023-10-13T07:47:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.