From Laws to Motivation: Guiding Exploration through Law-Based Reasoning and Rewards
- URL: http://arxiv.org/abs/2411.15891v1
- Date: Sun, 24 Nov 2024 15:57:53 GMT
- Title: From Laws to Motivation: Guiding Exploration through Law-Based Reasoning and Rewards
- Authors: Ziyu Chen, Zhiqing Xiao, Xinbei Jiang, Junbo Zhao
- Abstract summary: Large Language Models (LLMs) and Reinforcement Learning (RL) are powerful approaches for building autonomous agents.
We propose a method that extracts experience from interaction records to model the underlying laws of the game environment.
- Score: 12.698095783768322
- Abstract: Large Language Models (LLMs) and Reinforcement Learning (RL) are two powerful approaches for building autonomous agents. However, due to a limited understanding of the game environment, agents often resort to inefficient exploration and trial-and-error, struggling to develop long-term strategies or make decisions. We propose a method that extracts experience from interaction records to model the underlying laws of the game environment, using these experiences as internal motivation to guide agents. These experiences, expressed in language, are highly flexible and can either assist agents in reasoning directly or be transformed into rewards that guide training. Our evaluation results in Crafter demonstrate that both RL and LLM agents benefit from these experiences, leading to improved overall performance.
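The reward path of this idea is easiest to see as reward shaping: laws distilled from interaction records become checkable conditions, and satisfying one yields an intrinsic bonus. Below is a minimal sketch of that idea; the `Law` and `LawRewardWrapper` names, the one-time-bonus policy, and the Crafter-style predicates are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch: language-expressed environment "laws" turned into an intrinsic
# reward that supplements the task reward. All names and predicates here are
# illustrative assumptions, not the paper's API.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Law:
    text: str                                  # the law in natural language
    holds: Callable[[Dict, str, Dict], bool]   # predicate over (state, action, next_state)
    bonus: float                               # intrinsic reward when first satisfied

class LawRewardWrapper:
    """Adds a one-time intrinsic bonus whenever a transition satisfies a law."""
    def __init__(self, laws: List[Law]):
        self.laws = laws
        self.satisfied = set()

    def shaped_reward(self, state, action, next_state, env_reward: float) -> float:
        intrinsic = 0.0
        for law in self.laws:
            if law.text not in self.satisfied and law.holds(state, action, next_state):
                self.satisfied.add(law.text)
                intrinsic += law.bonus
        return env_reward + intrinsic

# Example laws one might extract from Crafter interaction records:
laws = [
    Law("Chopping a tree yields wood.",
        lambda s, a, ns: a == "chop" and ns.get("wood", 0) > s.get("wood", 0), 0.5),
    Law("A table is needed before crafting a pickaxe.",
        lambda s, a, ns: a == "craft_pickaxe" and s.get("table", 0) > 0, 1.0),
]
wrapper = LawRewardWrapper(laws)
print(wrapper.shaped_reward({"wood": 0}, "chop", {"wood": 1}, env_reward=0.0))  # 0.5
```

The same law strings could instead be injected into an LLM agent's context, which is the "assist agents in reasoning directly" path the abstract mentions.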
Related papers
- Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents [49.85633804913796]
We present an exploration-based trajectory optimization approach, referred to as ETO.
This learning method is designed to enhance the performance of open LLM agents.
Our experiments on three complex tasks demonstrate that ETO consistently surpasses baseline performance by a large margin.
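The abstract leaves the optimization step abstract; one plausible instantiation, sketched below, contrasts a successful trajectory against a failed exploration of the same task with a DPO-style loss. The toy log-probabilities and the `beta` value are assumptions for illustration, not ETO's reported setup.

```python
# Explore-then-contrast sketch: pair a failed trajectory with a successful
# one on the same task and apply a DPO-style contrastive loss that pushes
# the policy toward the successful trajectory.
import math

def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """-log sigmoid of the policy-vs-reference log-prob margin."""
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy numbers: the current policy slightly prefers the failing trajectory,
# so the loss is above log(2) and the update will correct the preference.
print(dpo_loss(logp_win=-12.0, logp_lose=-11.5, ref_logp_win=-12.0, ref_logp_lose=-12.0))
```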
arXiv Detail & Related papers (2024-03-04T21:50:29Z)
- Empowering Large Language Model Agents through Action Learning [85.39581419680755]
Large Language Model (LLM) agents have recently garnered increasing interest, yet they are limited in their ability to learn from trial and error.
We argue that the capacity to learn new actions from experience is fundamental to the advancement of learning in LLM agents.
We introduce LearnAct, a framework with an iterative learning strategy to create and improve actions in the form of Python functions.
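A hedged sketch of what such an iterative action-learning loop could look like: the LLM drafts an action as a Python function, the draft is executed against simple checks, and failures are fed back for revision. `call_llm`, the test-case format, and the stopping rule are placeholders inferred from the abstract, not LearnAct's actual code.

```python
# Iterative action learning: draft a Python-function action, test it, and
# revise on failure. The LLM call is stubbed out with a fixed correct draft.
def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would query an LLM with the prompt
    # and any failure feedback; here we return one fixed, already-correct draft.
    return (
        "def craft_plank(inventory):\n"
        "    if inventory.get('wood', 0) >= 1:\n"
        "        inventory['wood'] -= 1\n"
        "        inventory['plank'] = inventory.get('plank', 0) + 1\n"
        "        return True\n"
        "    return False\n"
    )

def learn_action(task: str, test_cases, max_rounds: int = 3):
    feedback = ""
    for _ in range(max_rounds):
        code = call_llm(f"Write a Python action for: {task}\n{feedback}")
        namespace: dict = {}
        exec(code, namespace)  # materialize the drafted function
        action = next(v for k, v in namespace.items()
                      if callable(v) and not k.startswith("__"))
        failures = [case for case in test_cases if not action(dict(case))]
        if not failures:
            return action                                   # all checks pass: keep it
        feedback = f"Previous draft failed on: {failures}"  # revise next round
    return None

action = learn_action("craft a plank from wood", test_cases=[{"wood": 2}])
print(action({"wood": 2}))  # True once an accepted action is returned
```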
arXiv Detail & Related papers (2024-02-24T13:13:04Z)
- Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game [40.438765131992525]
We develop strategic language agents that generate flexible language actions and possess strong decision-making abilities.
To mitigate the intrinsic bias in language actions, our agents use an LLM to perform deductive reasoning and generate a diverse set of action candidates.
Experiments show that our agents overcome the intrinsic bias and outperform existing LLM-based agents in the Werewolf game.
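One way to picture the candidate-then-select pattern: the LLM proposes several reasoned actions, and a small policy trained with RL chooses among them, so the final decision is not locked to the LLM's single, possibly biased sample. Everything below (the candidate texts, the softmax weights) is invented for illustration.

```python
# Candidate generation by an LLM, final selection by a learned policy.
import math, random

def propose_candidates(observation: str) -> list:
    # Placeholder for LLM deductive reasoning over the game state.
    return ["vote player 3", "vote player 5", "abstain"]

def select(candidates, weights):
    """Softmax policy over candidate scores (weights would be learned by RL)."""
    scores = [weights.get(c, 0.0) for c in candidates]
    z = sum(math.exp(s) for s in scores)
    probs = [math.exp(s) / z for s in scores]
    return random.choices(candidates, probs)[0]

print(select(propose_candidates("day 2, player 5 acted suspiciously"),
             weights={"vote player 5": 1.5}))
```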
arXiv Detail & Related papers (2023-10-29T09:02:57Z)
- ExpeL: LLM Agents Are Experiential Learners [60.54312035818746]
We introduce the Experiential Learning (ExpeL) agent to allow learning from agent experiences without requiring parametric updates.
Our agent autonomously gathers experiences and extracts knowledge using natural language from a collection of training tasks.
At inference, the agent recalls its extracted insights and past experiences to make informed decisions.
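A minimal sketch of that store-distill-recall loop, with keyword-overlap retrieval standing in for whatever similarity measure ExpeL actually uses; the insight strings are invented examples.

```python
# Store natural-language insights distilled from past trajectories and
# retrieve the most task-relevant ones at inference time.
class ExperienceMemory:
    def __init__(self):
        self.insights = []

    def add_insight(self, insight: str):
        # In ExpeL the insight would come from an LLM comparing its own
        # successful and failed experiences; here we add it directly.
        self.insights.append(insight)

    def recall(self, task: str, k: int = 2):
        """Rank insights by keyword overlap with the task description."""
        words = set(task.lower().split())
        ranked = sorted(self.insights,
                        key=lambda i: -len(words & set(i.lower().split())))
        return ranked[:k]

memory = ExperienceMemory()
memory.add_insight("Check the inventory before crafting tools.")
memory.add_insight("Drink water before the thirst meter empties.")
print(memory.recall("craft a stone pickaxe with the tools available"))
```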
arXiv Detail & Related papers (2023-08-20T03:03:34Z)
- Learning from Ambiguous Demonstrations with Self-Explanation Guided Reinforcement Learning [20.263419567168388]
Our work aims at efficiently leveraging ambiguous demonstrations for the training of a reinforcement learning (RL) agent.
Inspired by how humans handle such situations, we propose to use self-explanation to recognize valuable high-level relational features.
Our main contribution is to propose the Self-Explanation for RL from Demonstrations (SERLfD) framework, which can overcome the limitations of traditional RLfD works.
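As a rough illustration only: one could score candidate relational predicates by how well they separate successful from failed demonstrations and keep the discriminative ones as guidance. The predicates, data, and scoring rule below are assumptions, not the SERLfD formulation.

```python
# Score candidate high-level predicates by how well they discriminate
# successful from failed demonstrations; discriminative predicates can then
# serve as shaping signals when demonstrations are ambiguous.
def discriminativeness(demos, predicate):
    """P(predicate holds | success) minus P(predicate holds | failure)."""
    succ = [d for d in demos if d["success"]]
    fail = [d for d in demos if not d["success"]]
    p_s = sum(predicate(d) for d in succ) / max(len(succ), 1)
    p_f = sum(predicate(d) for d in fail) / max(len(fail), 1)
    return p_s - p_f

demos = [{"holding_key": True,  "success": True},
         {"holding_key": False, "success": False},
         {"holding_key": True,  "success": True}]
print(discriminativeness(demos, lambda d: d["holding_key"]))  # 1.0 -> useful feature
```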
arXiv Detail & Related papers (2021-10-11T13:59:48Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
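The competitive objective can be stated compactly: one policy is rewarded with the surprisal of incoming observations, the other with its negative. The toy count-based density model below is a simplification assumed for illustration.

```python
# Two policies alternate: the explorer maximizes observation surprisal, the
# controller minimizes it, producing an adversarial game over novelty.
import math
from collections import Counter

counts = Counter()

def surprisal(obs) -> float:
    """Negative log-probability under a simple count-based observation model."""
    counts[obs] += 1
    total = sum(counts.values())
    return -math.log(counts[obs] / total)

for phase, obs in [("explore", "cave"), ("explore", "river"), ("control", "cave")]:
    s = surprisal(obs)
    reward = s if phase == "explore" else -s   # explorer maximizes, controller minimizes
    print(phase, obs, round(reward, 3))        # the novel "river" is the most surprising
```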
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
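A distinctive ingredient here is relabeling: whenever the learned reward model changes, stored transitions are re-scored so off-policy updates never train on stale rewards. The sketch below assumes a linear reward model purely for illustration.

```python
# Relabel a replay buffer under the current learned reward model so that
# off-policy learning always sees rewards consistent with the latest feedback.
def reward_model(obs, weights):
    return sum(w * x for w, x in zip(weights, obs))

def relabel(buffer, weights):
    """Recompute stored rewards under the current reward model."""
    return [(obs, act, reward_model(obs, weights), next_obs)
            for obs, act, _, next_obs in buffer]

buffer = [((1.0, 0.0), "left", 0.0, (0.5, 0.5)),
          ((0.0, 1.0), "right", 0.0, (0.2, 0.8))]
print(relabel(buffer, weights=(0.3, -0.1)))  # rewards re-scored under new weights
```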
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Co-Imitation Learning without Expert Demonstration [39.988945772085465]
We propose a novel learning framework called Co-Imitation Learning (CoIL) to exploit the past good experiences of the agents without expert demonstration.
While the experiences could be valuable or misleading, we propose to estimate the potential utility of each piece of experience with the expected gain of the value function.
Experimental results on various tasks show the significant superiority of the proposed Co-Imitation Learning framework.
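The utility estimate can be read as an expected gain of the value function: imitate an experience only if its observed return beats what the current value estimate already predicts. A tabular toy version, with invented numbers:

```python
# Keep an experience for imitation only when its observed return exceeds the
# current value estimate, i.e. when the expected gain of imitating is positive.
value = {"s0": 1.0, "s1": 0.4}  # current (toy, tabular) value function

def utility(state, observed_return):
    """Expected gain of the value function if this experience is imitated."""
    return observed_return - value.get(state, 0.0)

experiences = [("s0", 0.8), ("s1", 0.9)]  # (state, return actually achieved)
useful = [(s, g) for s, g in experiences if utility(s, g) > 0]
print(useful)  # only the experience that would improve the value estimate
```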
arXiv Detail & Related papers (2021-03-27T06:58:40Z)
- Soft Expert Reward Learning for Vision-and-Language Navigation [94.86954695912125]
Vision-and-Language Navigation (VLN) requires an agent to find a specified spot in an unseen environment by following natural language instructions.
We introduce a Soft Expert Reward Learning (SERL) model to overcome the reward engineering and generalisation problems of the VLN task.
arXiv Detail & Related papers (2020-07-21T14:17:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.