Disentangling Exploration of Large Language Models by Optimal Exploitation
- URL: http://arxiv.org/abs/2501.08925v3
- Date: Sun, 24 Aug 2025 16:04:08 GMT
- Title: Disentangling Exploration of Large Language Models by Optimal Exploitation
- Authors: Tim Grams, Patrick Betz, Sascha Marton, Stefan Lüdtke, Christian Bartelt
- Abstract summary: This work isolates exploration as the sole objective, tasking an agent with gathering information that enhances future returns. We decompose missing rewards into their exploration and exploitation components based on the optimal achievable return.
- Score: 17.346054308224993
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Exploration is a crucial skill for in-context reinforcement learning in unknown environments. However, it remains unclear if large language models can effectively explore a partially hidden state space. This work isolates exploration as the sole objective, tasking an agent with gathering information that enhances future returns. Within this framework, we argue that measuring agent returns is not sufficient for a fair evaluation. Hence, we decompose missing rewards into their exploration and exploitation components based on the optimal achievable return. Experiments with various models reveal that most struggle to explore the state space and that weak exploration alone is insufficient. Nevertheless, we found a positive correlation between exploration performance and reasoning capabilities. Our decomposition can provide insights into differences in behaviors driven by prompt engineering, offering a valuable tool for refining performance in exploratory tasks.
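As a concrete illustration, here is a minimal Python sketch of one plausible reading of that decomposition; the function name, the intermediate quantity `optimal_return_given_info`, and the exact split are illustrative assumptions, not taken from the paper:

```python
def decompose_missing_reward(optimal_return: float,
                             optimal_return_given_info: float,
                             agent_return: float) -> dict:
    """Split the reward an agent failed to collect into an exploration
    and an exploitation share.

    Assumed quantities (not the paper's exact definitions):
      optimal_return            -- best return achievable in the environment
      optimal_return_given_info -- best return an optimal exploiter could
                                   achieve using only the information the
                                   agent actually gathered
      agent_return              -- return the agent actually obtained
    """
    missing = optimal_return - agent_return
    # Reward lost because the agent did not gather enough information.
    exploration_gap = optimal_return - optimal_return_given_info
    # Reward lost because the agent failed to act on what it already knew.
    exploitation_gap = optimal_return_given_info - agent_return
    # The two components account for all missing reward.
    assert abs(missing - (exploration_gap + exploitation_gap)) < 1e-9
    return {"exploration": exploration_gap, "exploitation": exploitation_gap}
```

Under this reading, a small exploration gap paired with a large exploitation gap would indicate an agent that gathered useful information but failed to act on it, which is the distinction raw returns cannot reveal.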
Related papers
- Task-Aware Exploration via a Predictive Bisimulation Metric [13.445649480300132]
We present TEB, a Task-aware Exploration approach that tightly couples task-relevant representations with exploration through a predictive Bisimulation metric. Specifically, TEB leverages the metric not only to learn behaviorally grounded task representations but also to measure behaviorally intrinsic novelty over the learned latent space. Building on this robust metric, we design potential-based exploration bonuses, which measure the relative novelty of adjacent observations over the latent space.
arXiv Detail & Related papers (2026-02-21T05:30:34Z) - Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration? [83.13508919229939]
Theory of Space is defined as an agent's ability to actively acquire information through self-directed, active exploration. A key innovation is spatial belief probing, which prompts models to reveal their internal spatial representations at each step. Our findings suggest that current foundation models struggle to maintain coherent, revisable spatial beliefs during active exploration.
arXiv Detail & Related papers (2026-02-04T19:06:40Z) - Explore with Long-term Memory: A Benchmark and Multimodal LLM-based Reinforcement Learning Framework for Embodied Exploration [52.35887679314727]
Long-term Memory Embodied Exploration aims to unify the agent's exploratory cognition and decision-making behaviors. To enhance the agent's memory recall and proactive exploration capabilities, we propose MemoryExplorer.
arXiv Detail & Related papers (2026-01-11T16:23:22Z) - Goal Discovery with Causal Capacity for Efficient Reinforcement Learning [85.28685202281918]
Causal inference is crucial for humans to explore the world. We propose a novel Goal Discovery with Causal Capacity framework for efficient environment exploration.
arXiv Detail & Related papers (2025-08-13T08:54:56Z) - Exploitation Is All You Need... for Exploration [0.0]
We show that an agent trained solely to maximize a greedy (exploitation-only) objective can nonetheless exhibit emergent exploratory behavior. Under the right prerequisites, exploration and exploitation need not be treated as separate objectives but can emerge from a unified reward-maximization process.
arXiv Detail & Related papers (2025-08-02T09:42:59Z) - Imagine, Verify, Execute: Memory-guided Agentic Exploration with Vision-Language Models [81.08295968057453]
We present IVE, an agentic exploration framework inspired by human curiosity. We evaluate IVE in both simulated and real-world tabletop environments.
arXiv Detail & Related papers (2025-05-12T17:59:11Z) - Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering [87.76784654371312]
Embodied Question Answering requires agents to dynamically explore 3D environments, actively gather visual information, and perform multi-step reasoning to answer questions.
Existing datasets often introduce biases or prior knowledge, leading to disembodied reasoning.
We construct the largest dataset designed specifically to evaluate both exploration and reasoning capabilities.
arXiv Detail & Related papers (2025-03-14T06:29:47Z) - Fostering Intrinsic Motivation in Reinforcement Learning with Pretrained Foundation Models [8.255197802529118]
The recent rise of foundation models, such as CLIP, offers an opportunity to leverage pretrained, semantically rich embeddings.
Intrinsic motivation modules can effectively utilize full state information, significantly increasing sample efficiency.
We show that embeddings provided by foundation models are sometimes even better than those constructed by the agent during training.
arXiv Detail & Related papers (2024-10-09T20:05:45Z) - Variable-Agnostic Causal Exploration for Reinforcement Learning [56.52768265734155]
We introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL).
Our approach automatically identifies crucial observation-action steps associated with key variables using attention mechanisms.
It constructs the causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion.
arXiv Detail & Related papers (2024-07-17T09:45:27Z) - World Models with Hints of Large Language Models for Goal Achieving [56.91610333715712]
Reinforcement learning struggles in the face of long-horizon tasks and sparse goals.
Inspired by human cognition, we propose a new multi-modal model-based RL approach named Dreaming with Large Language Models (DLLM). DLLM integrates the proposed hinting subgoals into the model rollouts to encourage goal discovery and reaching in challenging tasks.
arXiv Detail & Related papers (2024-06-11T15:49:08Z) - WESE: Weak Exploration to Strong Exploitation for LLM Agents [95.6720931773781]
This paper proposes a novel approach, Weak Exploration to Strong Exploitation (WESE), to enhance LLM agents in solving open-world interactive tasks.
WESE decouples the exploration and exploitation processes, employing a cost-effective weak agent to perform exploration tasks for global knowledge.
A knowledge graph-based strategy is then introduced to store the acquired knowledge and extract task-relevant knowledge, enhancing the stronger agent in success rate and efficiency for the exploitation task.
arXiv Detail & Related papers (2024-04-11T03:31:54Z) - Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents [49.85633804913796]
We present an exploration-based trajectory optimization approach, referred to as ETO.
This learning method is designed to enhance the performance of open LLM agents.
Our experiments on three complex tasks demonstrate that ETO consistently surpasses baseline performance by a large margin.
arXiv Detail & Related papers (2024-03-04T21:50:29Z) - Efficient Exploration for LLMs [27.59380499111532]
We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models.
In our experiments, an agent sequentially generates queries while fitting a reward model to the feedback received.
Our results demonstrate that efficient exploration enables high levels of performance with far fewer queries.
arXiv Detail & Related papers (2024-02-01T07:32:24Z) - Successor-Predecessor Intrinsic Exploration [18.440869985362998]
We focus on exploration with intrinsic rewards, where the agent transiently augments the external rewards with self-generated intrinsic rewards.
We propose Successor-Predecessor Intrinsic Exploration (SPIE), an exploration algorithm based on a novel intrinsic reward combining prospective and retrospective information.
We show that SPIE yields more efficient and ethologically plausible exploratory behaviour in environments with sparse rewards and bottleneck states than competing methods.
arXiv Detail & Related papers (2023-05-24T16:02:51Z) - Self-supervised network distillation: an effective approach to exploration in sparse reward environments [0.0]
Reinforcement learning can train an agent to behave in an environment according to a predesigned reward function.
In sparse-reward environments, however, such feedback is scarce; a solution may be to equip the agent with an intrinsic motivation that provides informed exploration.
We present Self-supervised Network Distillation (SND), a class of intrinsic motivation algorithms based on the distillation error as a novelty indicator.
arXiv Detail & Related papers (2023-02-22T18:58:09Z) - Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning [5.40729975786985]
This paper explores the idea of combining exploration with auxiliary task learning using General Value Functions (GVFs) and a directed exploration strategy.
We provide a simple way to learn options (sequences of actions) instead of having to handcraft them, and demonstrate the performance advantage in three navigation tasks.
arXiv Detail & Related papers (2022-03-02T05:14:11Z) - Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z) - Exploration in Deep Reinforcement Learning: A Comprehensive Survey [24.252352133705735]
Deep Reinforcement Learning (DRL) and Deep Multi-agent Reinforcement Learning (MARL) have achieved significant success across a wide range of domains, such as game AI, autonomous vehicles, robotics and finance.
DRL and deep MARL agents are widely known to be sample-inefficient and millions of interactions are usually needed even for relatively simple game settings.
This paper provides a comprehensive survey on existing exploration methods in DRL and deep MARL.
arXiv Detail & Related papers (2021-09-14T13:16:33Z) - Online reinforcement learning with sparse rewards through an active inference capsule [62.997667081978825]
This paper introduces an active inference agent which minimizes the novel free energy of the expected future.
Our model is capable of solving sparse-reward problems with very high sample efficiency.
We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives.
arXiv Detail & Related papers (2021-06-04T10:03:36Z) - Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles [73.15950858151594]
This paper presents Latent Optimistic Value Exploration (LOVE), a strategy that enables deep exploration through optimism in the face of uncertain long-term rewards.
We combine latent world models with value function estimation to predict infinite-horizon returns and recover associated uncertainty via ensembling.
We apply LOVE to visual robot control tasks in continuous action spaces and demonstrate on average more than 20% improved sample efficiency in comparison to state-of-the-art and other exploration objectives.
arXiv Detail & Related papers (2020-10-27T22:06:57Z) - Batch Exploration with Examples for Scalable Robotic Reinforcement Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state-space guided by a modest number of human provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z) - Soft Expert Reward Learning for Vision-and-Language Navigation [94.86954695912125]
Vision-and-Language Navigation (VLN) requires an agent to find a specified spot in an unseen environment by following natural language instructions.
We introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering and generalisation problems of the VLN task.
arXiv Detail & Related papers (2020-07-21T14:17:36Z) - AutoOD: Automated Outlier Detection via Curiosity-guided Search and Self-imitation Learning [72.99415402575886]
Outlier detection is an important data mining task with numerous practical applications.
We propose AutoOD, an automated outlier detection framework, which aims to search for an optimal neural network model.
Experimental results on various real-world benchmark datasets demonstrate that the deep model identified by AutoOD achieves the best performance.
arXiv Detail & Related papers (2020-06-19T18:57:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.