Reason for Future, Act for Now: A Principled Framework for Autonomous
LLM Agents with Provable Sample Efficiency
- URL: http://arxiv.org/abs/2309.17382v2
- Date: Wed, 11 Oct 2023 06:18:04 GMT
- Title: Reason for Future, Act for Now: A Principled Framework for Autonomous
LLM Agents with Provable Sample Efficiency
- Authors: Zhihan Liu, Hao Hu, Shenao Zhang, Hongyi Guo, Shuqi Ke, Boyi Liu,
Zhaoran Wang
- Abstract summary: We propose a principled framework with provable regret guarantees to orchestrate reasoning and acting.
Specifically, we design a prompt template for reasoning that learns from the memory buffer and plans a future trajectory over a long horizon.
At each step, the LLM agent takes the initial action of the planned trajectory ("act for now"), stores the collected feedback in the memory buffer, and reinvokes the reasoning routine to replan the future trajectory from the new state.
- Score: 53.8779374188643
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) demonstrate impressive reasoning abilities, but
translating reasoning into actions in the real world remains challenging. In
particular, it remains unclear how to complete a given task provably within a
minimum number of interactions with the external environment, e.g., through an
internal mechanism of reasoning. To this end, we propose a principled framework
with provable regret guarantees to orchestrate reasoning and acting, which we
call "reason for future, act for now" (\texttt{RAFA}). Specifically, we design
a prompt template for reasoning that learns from the memory buffer and plans a
future trajectory over a long horizon ("reason for future"). At each step, the
LLM agent takes the initial action of the planned trajectory ("act for now"),
stores the collected feedback in the memory buffer, and reinvokes the reasoning
routine to replan the future trajectory from the new state.
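Procedurally, that loop is simple to state. Below is a minimal sketch in Python, assuming a generic environment interface (`reset`, `step`) and two hypothetical callables, `learn` and `plan`, that stand in for the paper's learning and planning prompt templates; none of these names are the authors' actual API.
```python
from typing import Any, Callable, List, Tuple

def rafa(
    reset: Callable[[], Any],
    step: Callable[[Any], Tuple[Any, Any, bool]],          # -> (state, feedback, done)
    learn: Callable[[List[Tuple[Any, Any, Any]]], Any],    # posterior from the buffer
    plan: Callable[[Any, Any, int], List[Any]],            # long-horizon trajectory
    horizon: int = 8,
    max_steps: int = 100,
) -> List[Tuple[Any, Any, Any]]:
    """One RAFA episode: reason for future, act for now (sketch)."""
    memory: List[Tuple[Any, Any, Any]] = []   # buffer of (state, action, feedback)
    state = reset()
    for _ in range(max_steps):
        model = learn(memory)                     # "reason": learn from the buffer
        trajectory = plan(model, state, horizon)  # plan multiple steps ahead
        action = trajectory[0]                    # "act for now": first action only
        next_state, feedback, done = step(action)
        memory.append((state, action, feedback))
        if done:
            break
        state = next_state                        # replan from the new state
    return memory
```
Executing only `trajectory[0]` before replanning is what distinguishes "act for now" from open-loop execution of the full plan.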
The key idea is to cast reasoning in LLMs as learning and planning in
Bayesian adaptive Markov decision processes (MDPs). Correspondingly, we prompt
LLMs to form an updated posterior of the unknown environment from the memory
buffer (learning) and generate an optimal trajectory for multiple future steps
that maximizes a value function (planning). The learning and planning
subroutines are performed in an "in-context" manner to emulate the actor-critic
update for MDPs. Our theoretical analysis proves that the novel combination of
long-term reasoning and short-term acting achieves a $\sqrt{T}$ regret. In
particular, the regret bound highlights an intriguing interplay between the
prior knowledge obtained through pretraining and the uncertainty reduction
achieved by reasoning and acting. Our empirical validation shows that RAFA
outperforms various existing frameworks and achieves nearly perfect scores on a
few benchmarks.
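For concreteness, the $\sqrt{T}$ guarantee is a bound on cumulative regret. The display below gives only the standard schematic shape of such a bound, with $V^{\star}$ the optimal value and $\pi_t$ the policy executed at round $t$; the precise statement, including the prior-dependent terms, is in the paper.
```latex
% Schematic shape of the guarantee (exact statement and constants in the
% paper): cumulative regret over T rounds grows only as sqrt(T).
\[
  \mathrm{Regret}(T)
    = \sum_{t=1}^{T} \bigl[ V^{\star}(s_t) - V^{\pi_t}(s_t) \bigr]
    \le \widetilde{\mathcal{O}}\bigl(\sqrt{T}\bigr).
\]
```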
Related papers
- ReasonPlanner: Enhancing Autonomous Planning in Dynamic Environments with Temporal Knowledge Graphs and LLMs [0.32141666878560626]
We introduce ReasonPlanner, a novel generalist agent designed for reflective thinking, planning, and interactive reasoning.
ReasonPlanner significantly outperforms previous state-of-the-art prompting-based methods on the ScienceWorld benchmark by more than 1.8 times.
It relies solely on frozen weights, and thus requires no gradient updates.
arXiv Detail & Related papers (2024-10-11T20:58:51Z)
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding LLMs' decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
- From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world.
Under the proposed model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z)
- Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing [61.98556945939045]
We propose a framework to learn planning-based reasoning through Direct Preference Optimization (DPO) on collected trajectories.
Our results on challenging logical reasoning benchmarks demonstrate the effectiveness of our learning framework.
arXiv Detail & Related papers (2024-02-01T15:18:33Z)
- Are LLMs Rigorous Logical Reasoner? Empowering Natural Language Proof Generation with Contrastive Stepwise Decoding [11.385103498440932]
We introduce contrastive decoding to stepwise proof generation, making use of negative reasoning paths to strengthen the model's capacity for logical deduction.
Experiments on EntailmentBank underscore the success of our method in augmenting the proof planning abilities of language models.
arXiv Detail & Related papers (2023-11-12T05:12:49Z)
- DetermLR: Augmenting LLM-based Logical Reasoning from Indeterminacy to Determinacy [76.58614128865652]
We propose DetermLR, a novel perspective that rethinks the reasoning process as an evolution from indeterminacy to determinacy.
First, we categorize known conditions into two types: determinate and indeterminate premises. This provides an overall direction for the reasoning process and guides LLMs in converting indeterminate data into progressively determinate insights.
We automate the storage and extraction of available premises and reasoning paths with reasoning memory, preserving historical reasoning details for subsequent reasoning steps.
arXiv Detail & Related papers (2023-10-28T10:05:51Z)
- Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting [100.75479161884935]
We propose a novel training paradigm called Remembering for the Right Reasons (RRR).
RRR stores visual model explanations for each example in the buffer and ensures the model has "the right reasons" for its predictions.
We demonstrate how RRR can be easily added to any memory or regularization-based approach and results in reduced forgetting.
arXiv Detail & Related papers (2020-10-04T10:05:27Z)