True Knowledge Comes from Practice: Aligning LLMs with Embodied
Environments via Reinforcement Learning
- URL: http://arxiv.org/abs/2401.14151v2
- Date: Mon, 11 Mar 2024 03:15:58 GMT
- Title: True Knowledge Comes from Practice: Aligning LLMs with Embodied
Environments via Reinforcement Learning
- Authors: Weihao Tan, Wentao Zhang, Shanqi Liu, Longtao Zheng, Xinrun Wang, Bo
An
- Abstract summary: Large language models (LLMs) often fail in solving simple decision-making tasks due to misalignment of the knowledge in LLMs with environments.
We propose TWOSOME, a novel framework that deploys LLMs as decision-making agents to efficiently interact and align with embodied environments via RL.
- Score: 37.10401435242991
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the impressive performance across numerous tasks, large language
models (LLMs) often fail in solving simple decision-making tasks due to the
misalignment of the knowledge in LLMs with environments. On the contrary,
reinforcement learning (RL) agents learn policies from scratch, which makes
them always align with environments but difficult to incorporate prior
knowledge for efficient explorations. To narrow the gap, we propose TWOSOME, a
novel general online framework that deploys LLMs as decision-making agents to
efficiently interact and align with embodied environments via RL without
requiring any prepared datasets or prior knowledge of the environments.
Firstly, we query the joint probabilities of each valid action with LLMs to
form behavior policies. Then, to enhance the stability and robustness of the
policies, we propose two normalization methods and summarize four prompt design
principles. Finally, we design a novel parameter-efficient training
architecture where the actor and critic share one frozen LLM equipped with
low-rank adapters (LoRA) updated by PPO. We conduct extensive experiments to
evaluate TWOSOME. i) TWOSOME exhibits significantly better sample efficiency
and performance compared to the conventional RL method, PPO, and prompt tuning
method, SayCan, in both classical decision-making environment, Overcooked, and
simulated household environment, VirtualHome. ii) Benefiting from LLMs'
open-vocabulary feature, TWOSOME shows superior generalization ability to
unseen tasks. iii) Under our framework, there is no significant loss of the
LLMs' original ability during online PPO finetuning.
Related papers
- LLM-PySC2: Starcraft II learning environment for Large Language Models [16.918044347226104]
This paper introduces a new environment that serves to develop Large Language Models (LLMs) based decision-making methodologies.
This environment is the first to offer the complete StarCraft II action space, multi-modal observation interfaces, and a structured game knowledge database.
arXiv Detail & Related papers (2024-11-08T06:04:22Z) - Efficient Reinforcement Learning with Large Language Model Priors [18.72288751305885]
Large language models (LLMs) have recently emerged as powerful general-purpose tools.
We propose treating LLMs as prior action distributions and integrating them into RL frameworks.
We show that incorporating LLM-based action priors significantly reduces exploration and complexity optimization.
arXiv Detail & Related papers (2024-10-10T13:54:11Z) - From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world.
Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z) - One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models [67.49462724595445]
Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs)
We propose a novel method that involves learning scalable and pluggable virtual tokens for RAG.
arXiv Detail & Related papers (2024-05-30T03:44:54Z) - Reinforcement Learning Problem Solving with Large Language Models [0.0]
Large Language Models (LLMs) have an extensive amount of world knowledge, and this has enabled their application in various domains to improve the performance of Natural Language Processing (NLP) tasks.
This has also facilitated a more accessible paradigm of conversation-based interactions between humans and AI systems to solve intended problems.
We show the practicality of our approach through two detailed case studies for "Research Scientist" and "Legal Matter Intake"
arXiv Detail & Related papers (2024-04-29T12:16:08Z) - Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts [10.929547354171723]
This paper introduces Knowledgeable Agents from Language Model Rollouts (KALM)
It extracts knowledge from large language models (LLMs) in the form of imaginary rollouts that can be easily learned by the agent through offline reinforcement learning methods.
It achieves a success rate of 46% in executing tasks with unseen goals, substantially surpassing the 26% success rate achieved by baseline methods.
arXiv Detail & Related papers (2024-04-14T13:19:40Z) - EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents [65.38474102119181]
We propose EnvGen, a framework to adaptively create training environments.
We train a small RL agent in a mixture of the original and LLM-generated environments.
We find that a small RL agent trained with EnvGen can outperform SOTA methods, including a GPT-4 agent, and learns long-horizon tasks significantly faster.
arXiv Detail & Related papers (2024-03-18T17:51:16Z) - How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z) - Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making
using Language Guided World Modelling [101.59430768507997]
Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world.
We propose using few-shot large language models (LLMs) to hypothesize an Abstract World Model (AWM)
Our method of hypothesizing an AWM with LLMs and then verifying the AWM based on agent experience not only increases sample efficiency over contemporary methods by an order of magnitude.
arXiv Detail & Related papers (2023-01-28T02:04:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.