Exploration Unbound
- URL: http://arxiv.org/abs/2407.12178v1
- Date: Tue, 16 Jul 2024 21:14:43 GMT
- Title: Exploration Unbound
- Authors: Dilip Arumugam, Wanqiao Xu, Benjamin Van Roy
- Abstract summary: A sequential decision-making agent balances between exploring to gain new knowledge and exploiting current knowledge to maximize immediate reward.
We offer a simple, quintessential example of such a complex environment.
In this environment, rewards are unbounded and an agent can always increase the rate at which rewards accumulate by exploring to learn more.
- Score: 26.27811928866858
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A sequential decision-making agent balances between exploring to gain new knowledge about an environment and exploiting current knowledge to maximize immediate reward. For environments studied in the traditional literature, optimal decisions gravitate over time toward exploitation as the agent accumulates sufficient knowledge and the benefits of further exploration vanish. What if, however, the environment offers an unlimited amount of useful knowledge and there is large benefit to further exploration no matter how much the agent has learned? We offer a simple, quintessential example of such a complex environment. In this environment, rewards are unbounded and an agent can always increase the rate at which rewards accumulate by exploring to learn more. Consequently, an optimal agent forever maintains a propensity to explore.
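To make the abstract's setting concrete, below is a minimal, hypothetical sketch of an environment with the stated property: exploration yields no immediate reward but permanently raises the rate at which exploitation pays off, so the benefit of further exploration never vanishes. The environment, reward scale, and exploration schedule are invented for illustration and are not the paper's actual construction.

```python
import random

class UnboundedKnowledgeEnv:
    """Toy environment in the spirit of the abstract: every unit of exploration
    permanently raises the reward rate available to exploitation, so the value
    of further exploration never vanishes. (Hypothetical illustration only.)"""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.knowledge = 0  # units of knowledge acquired so far

    def step(self, action):
        if action == "explore":
            self.knowledge += 1   # learning never saturates
            return 0.0            # exploring forgoes immediate reward
        # Exploiting pays at a rate that grows with accumulated knowledge.
        return self.rng.gauss(self.knowledge, 1.0)


env = UnboundedKnowledgeEnv()
total = 0.0
for t in range(1, 101):
    # A simple schedule that never stops exploring: one step in three explores.
    action = "explore" if t % 3 == 0 else "exploit"
    total += env.step(action)
print(f"knowledge={env.knowledge}, cumulative reward={total:.1f}")
```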
Related papers
- Exploration and Persuasion [58.87314871998078]
Consider a population of self-interested agents that make decisions under uncertainty.
They "explore" to acquire new information and "exploit" this information to make good decisions.
Left to themselves, such agents prefer to exploit, because exploration is costly while its benefits are spread over many agents in the future.
We show how to incentivize these self-interested agents to explore when they prefer to exploit.
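As a rough illustration of the pooling idea behind incentivized exploration, the sketch below has a planner aggregate observations from a stream of myopic agents and occasionally steer one toward the under-sampled arm. The arm means, the schedule, and the assumption that agents simply follow recommendations are all invented for illustration; the incentive-compatibility analysis that is the paper's actual contribution is not modeled.

```python
import random

rng = random.Random(1)
true_means = [0.5, 0.7]            # hypothetical arms; arm 1 is better but unknown a priori
counts, sums = [0, 0], [0.0, 0.0]

def recommend(t):
    """The planner pools every agent's past observation; a small fraction of
    arriving agents are steered toward the less-sampled arm so that information
    keeps accumulating for the agents who come later."""
    if t % 10 == 0:                                    # occasional exploration slot
        return min(range(2), key=lambda a: counts[a])
    est = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(2)]
    return max(range(2), key=lambda a: est[a])         # otherwise exploit pooled data

for t in range(200):
    arm = recommend(t)             # each myopic agent simply follows the recommendation
    reward = 1.0 if rng.random() < true_means[arm] else 0.0
    counts[arm] += 1
    sums[arm] += reward

print("pulls per arm:", counts)
```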
arXiv Detail & Related papers (2024-10-22T15:13:13Z)
- WESE: Weak Exploration to Strong Exploitation for LLM Agents [95.6720931773781]
This paper proposes a novel approach, Weak Exploration to Strong Exploitation (WESE), to enhance LLM agents in solving open-world interactive tasks.
WESE decouples exploration from exploitation, employing a cost-effective weak agent to perform the exploration and gather global knowledge.
A knowledge graph-based strategy is then introduced to store the acquired knowledge and extract task-relevant knowledge, improving the stronger agent's success rate and efficiency on the exploitation task.
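A schematic of this decoupling might look like the following; the agent stubs, the fact format, and the knowledge-graph structure are invented for illustration and are not the WESE implementation.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Stores facts gathered during exploration as entity -> related-entity edges."""
    def __init__(self):
        self.edges = defaultdict(set)

    def add(self, subject, obj):
        self.edges[subject].add(obj)

    def relevant(self, task_entities):
        # Extract only the slice of the graph that touches the task's entities.
        return {e: self.edges[e] for e in task_entities if e in self.edges}

def weak_explore(environment_facts, kg):
    """Cheap 'weak' agent: sweep the environment once and record everything seen."""
    for subject, obj in environment_facts:
        kg.add(subject, obj)

def strong_exploit(task_entities, kg):
    """Expensive 'strong' agent: acts only on the task-relevant slice of the graph."""
    context = kg.relevant(task_entities)
    n_facts = sum(len(v) for v in context.values())
    return f"plan using {n_facts} task-relevant facts"

kg = KnowledgeGraph()
weak_explore([("kitchen", "knife"), ("kitchen", "apple"), ("garage", "hammer")], kg)
print(strong_exploit(["kitchen"], kg))   # -> plan using 2 task-relevant facts
```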
arXiv Detail & Related papers (2024-04-11T03:31:54Z)
- Information Content Exploration [1.7034813545878589]
We propose a new intrinsic reward that systematically quantifies exploratory behavior and promotes state coverage.
We show that our information-theoretic reward induces efficient exploration and outperforms competing approaches in various games.
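One generic, count-based way to realize an intrinsic reward of this kind is to pay the agent the Shannon information content of the state it visits under the empirical visitation distribution, as sketched below; the paper's actual estimator and reward may differ.

```python
import math
from collections import Counter

visits = Counter()

def information_content_bonus(state):
    """Intrinsic reward equal to the Shannon information content, -log p_hat(state),
    of the visited state under the empirical visitation distribution: rarely visited
    states are 'surprising' and earn a larger bonus, pushing the policy toward
    broad state coverage."""
    visits[state] += 1
    p_hat = visits[state] / sum(visits.values())
    return -math.log(p_hat)

# A frequently revisited state quickly loses its bonus.
for s in ["s0", "s0", "s1", "s0", "s2"]:
    print(s, round(information_content_bonus(s), 3))
```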
arXiv Detail & Related papers (2023-10-10T16:51:32Z)
- Successor-Predecessor Intrinsic Exploration [18.440869985362998]
We focus on exploration with intrinsic rewards, where the agent transiently augments the external rewards with self-generated intrinsic rewards.
We propose Successor-Predecessor Intrinsic Exploration (SPIE), an exploration algorithm based on a novel intrinsic reward combining prospective and retrospective information.
We show that SPIE yields more efficient and ethologically plausible exploratory behaviour in environments with sparse rewards and bottleneck states than competing methods.
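The sketch below shows one simplified way to combine a prospective (successor-representation) term with a retrospective (predecessor-count) term into a single bonus; the update rule and the way the two terms are mixed are assumptions for illustration, not the SPIE equations.

```python
import numpy as np

n_states, gamma, alpha = 5, 0.95, 0.1
successor = np.eye(n_states)          # M[s, s'] ~ expected discounted future visits to s'
predecessor_counts = np.zeros(n_states)

def spie_like_bonus(s, s_next):
    # Prospective side: TD(0) update of the successor representation for state s.
    one_hot = np.eye(n_states)[s]
    successor[s] += alpha * (one_hot + gamma * successor[s_next] - successor[s])
    # Retrospective side: how rarely s_next has been reached so far.
    predecessor_counts[s_next] += 1
    retrospective = 1.0 / np.sqrt(predecessor_counts[s_next])
    # Prospective side: how little future occupancy is predicted from s_next.
    prospective = 1.0 / (1.0 + successor[s_next].sum())
    return prospective + retrospective

print(round(spie_like_bonus(0, 1), 3))
```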
arXiv Detail & Related papers (2023-05-24T16:02:51Z)
- Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning [5.40729975786985]
This paper explores the idea of combining exploration with auxiliary task learning using General Value Functions (GVFs) and a directed exploration strategy.
We provide a simple way to learn options (sequences of actions) instead of having to handcraft them, and demonstrate the performance advantage in three navigation tasks.
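As a toy illustration of driving exploration with a General Value Function, the sketch below learns a GVF over a small chain whose cumulant fires on first visits; the chain environment, the novelty cumulant, and the absence of the paper's option machinery are all simplifying assumptions.

```python
import random

n_states, gamma, alpha = 6, 0.9, 0.2
gvf = [0.0] * n_states               # predicts a discounted "novelty" cumulant
visited = [False] * n_states
rng = random.Random(0)

def cumulant(s):
    return 0.0 if visited[s] else 1.0   # the signal fires on first visits

state = 0
for _ in range(500):
    nxt = min(max(state + rng.choice([-1, 1]), 0), n_states - 1)
    # TD(0) update of the GVF toward the novelty cumulant.
    gvf[state] += alpha * (cumulant(nxt) + gamma * gvf[nxt] - gvf[state])
    visited[nxt] = True
    state = nxt

print([round(v, 2) for v in gvf])    # the GVF's learned predictions of novelty ahead
```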
arXiv Detail & Related papers (2022-03-02T05:14:11Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
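A minimal restore-and-explore loop in this spirit is sketched below; the toy environment interface (save/restore/step) and the uniform choice of restore points are simplifying assumptions, and RbExplore's state-similarity clustering and persistence structure are not shown.

```python
import random

class ToyPersistentEnv:
    """Stand-in for a persistent environment that can be saved and restored."""
    def __init__(self):
        self.pos = 0
    def save(self):
        return self.pos
    def restore(self, snapshot):
        self.pos = snapshot
    def step(self, action):
        self.pos += action
        return self.pos

rng = random.Random(0)
env = ToyPersistentEnv()
archive = {0: env.save()}              # state id -> snapshot it can be rolled back to

for _ in range(200):
    start = rng.choice(list(archive))  # pick any previously reached state
    env.restore(archive[start])        # roll back to it
    for _ in range(5):                 # explore outward without using any reward
        state = env.step(rng.choice([-1, 1]))
        if state not in archive:
            archive[state] = env.save()

print("distinct states reached:", len(archive))
```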
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- Understanding the origin of information-seeking exploration in probabilistic objectives for control [62.997667081978825]
An exploration-exploitation trade-off is central to the description of adaptive behaviour.
One approach to resolving this trade-off has been to propose that agents possess, or to equip them with, an intrinsic 'exploratory drive'.
We show that this combination of utility-maximizing and information-seeking behaviour arises from the minimization of an entirely different class of objectives.
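For concreteness, the sketch below scores actions by expected utility plus expected information gain about a hidden variable, one generic member of the family of utility-plus-information objectives; the specific objective class analysed in the paper, and its derivation, are not reproduced here, and the probabilities used are invented.

```python
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

prior = np.array([0.5, 0.5])                   # belief over two hidden world states
expected_reward = {"safe": 0.2, "probe": 0.0}  # immediate utility of each action
# p(observation = 1 | hidden state, action): "probe" is informative, "safe" is not.
p_obs1 = {"safe": np.array([0.5, 0.5]), "probe": np.array([0.9, 0.1])}

def expected_info_gain(action):
    gain = 0.0
    for o in (0, 1):
        like = p_obs1[action] if o == 1 else 1.0 - p_obs1[action]
        marginal = float(np.dot(prior, like))
        posterior = prior * like / marginal
        gain += marginal * (entropy(prior) - entropy(posterior))
    return gain

# The informative action wins once information gain is added to bare utility.
for a in ("safe", "probe"):
    print(a, round(expected_reward[a] + expected_info_gain(a), 3))
```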
arXiv Detail & Related papers (2021-03-11T18:42:39Z)
- Fast active learning for pure exploration in reinforcement learning [48.98199700043158]
We show that bonuses that scale with $1/n$ bring faster learning rates, improving the known upper bounds with respect to the dependence on the horizon.
We also show that, with an improved analysis of the stopping time, we can improve the sample complexity by a factor of $H$ in the best-policy identification setting.
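The contrast between the bonus scalings can be seen in a tiny numerical sketch; the constant and the way the horizon enters are placeholders, not the paper's exact expressions.

```python
import math

H, c = 10, 1.0                         # placeholder horizon and confidence constant

def classic_bonus(n):
    return c * H / math.sqrt(n)        # optimism shrinking like 1/sqrt(n)

def fast_bonus(n):
    return c * H / n                   # optimism shrinking like 1/n

for n in (1, 10, 100, 1000):
    print(n, round(classic_bonus(n), 3), round(fast_bonus(n), 4))
```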
arXiv Detail & Related papers (2020-07-27T11:28:32Z)
- WordCraft: An Environment for Benchmarking Commonsense Agents [107.20421897619002]
We propose WordCraft, an RL environment based on Little Alchemy 2.
This lightweight environment is fast to run and built upon entities and relations inspired by real-world semantics.
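A drastically reduced crafting environment of this kind might look as follows; the recipe table and entity names are made up and merely stand in for the Little Alchemy 2 recipes that WordCraft actually uses.

```python
RECIPES = {frozenset({"water", "fire"}): "steam",
           frozenset({"earth", "water"}): "mud",
           frozenset({"steam", "earth"}): "geyser"}

class ToyWordCraft:
    """Each step the agent combines two held entities; valid recipes yield new ones."""
    def __init__(self, goal="geyser"):
        self.inventory = {"water", "fire", "earth"}
        self.goal = goal

    def step(self, a, b):
        product = RECIPES.get(frozenset({a, b}))
        if product:
            self.inventory.add(product)
        done = product == self.goal
        return product, (1.0 if done else 0.0), done   # reward only for the goal entity

env = ToyWordCraft()
print(env.step("water", "fire"))    # ('steam', 0.0, False)
print(env.step("steam", "earth"))   # ('geyser', 1.0, True)
```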
arXiv Detail & Related papers (2020-07-17T18:40:46Z)
- Curiosity Killed or Incapacitated the Cat and the Asymptotically Optimal Agent [21.548271801592907]
Reinforcement learners are agents that learn to pick actions that lead to high reward.
We show that if an agent is guaranteed to be "asymptotically optimal" in any environment, then, subject to an assumption about the true environment, this agent will be either "destroyed" or "incapacitated".
We present an agent, Mentee, with the modest guarantee of approaching the performance of a mentor, performing safe exploration instead of reckless exploration.
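One hedged way to picture mentor-guided, non-reckless exploration is the sketch below, in which the agent queries a mentor at a decaying rate and otherwise reuses only actions the mentor has already demonstrated; the decay schedule and imitation rule are illustrative choices, not the Mentee construction from the paper.

```python
import random
from collections import defaultdict

rng = random.Random(0)
demonstrated = defaultdict(set)        # state -> actions the mentor has shown there

def mentor_policy(state):
    return "safe_action"               # stand-in for an external, trusted policy

def mentee_act(state, t):
    # Query the mentor at a rate that decays with time, or whenever nothing has
    # been demonstrated yet; never try an action the mentor hasn't vetted.
    if rng.random() < 1.0 / (1.0 + t) or not demonstrated[state]:
        action = mentor_policy(state)
        demonstrated[state].add(action)
        return action
    return rng.choice(sorted(demonstrated[state]))

for t in range(5):
    print(t, mentee_act("s0", t))
```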
arXiv Detail & Related papers (2020-06-05T10:42:29Z)