Ask more, know better: Reinforce-Learned Prompt Questions for Decision
Making with Large Language Models
- URL: http://arxiv.org/abs/2310.18127v2
- Date: Thu, 29 Feb 2024 03:41:23 GMT
- Title: Ask more, know better: Reinforce-Learned Prompt Questions for Decision
Making with Large Language Models
- Authors: Xue Yan, Yan Song, Xinyu Cui, Filippos Christianos, Haifeng Zhang,
David Henry Mguni, Jun Wang
- Abstract summary: Large language models (LLMs) show promise in tackling complicated practical challenges by combining action-based policies with chain-of-thought (CoT) reasoning.
Human intervention is also required to develop grounding functions that ensure low-level controllers appropriately process CoT reasoning.
We propose a comprehensive training framework for complex task-solving, incorporating human prior knowledge into the learning of action policies.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) demonstrate their promise in tackling
complicated practical challenges by combining action-based policies with chain
of thought (CoT) reasoning. Having high-quality prompts on hand, however, is
vital to the framework's effectiveness. Currently, these prompts are
handcrafted utilising extensive human labor, resulting in CoT policies that
frequently fail to generalise. Human intervention is also required to develop
grounding functions that ensure low-level controllers appropriately process CoT
reasoning. In this paper, we propose a comprehensive training framework for
complex task-solving, incorporating human prior knowledge into the learning of
action policies. To that end, we offer a new leader-follower bilevel
framework that is capable of learning to ask relevant questions (prompts) and
subsequently undertaking reasoning to guide the learning of actions. The prompt
policy is employed to make introspective revisions based on historical
findings, leading the CoT process to consider the anticipated goals and
generate outputs that lead to decisive, high-performing actions. The action
policy subsequently learns to comprehend and integrate the CoT outputs to take
actions. Our empirical data reveal that our framework outperforms leading
methods in 5 decision-making tasks such as Overcooked and FourRoom.
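As a rough illustration of the leader-follower loop described above, here is a minimal, self-contained sketch. Everything in it (the question pool, the stub LLM, the bandit-style value updates) is a toy stand-in for the paper's components, not the authors' implementation; in the real framework the action policy also conditions on the CoT text, which the stub below omits.

```python
import math
import random

# Toy stand-ins for the framework's components: a prompt (leader) policy
# that scores candidate questions, a stub LLM that answers them with CoT
# text, and an action (follower) policy. Names and updates are illustrative.
QUESTIONS = ["What is the goal?", "What went wrong last step?", "What next?"]
ACTIONS = ["up", "down", "left", "right"]

prompt_values = {q: 0.0 for q in QUESTIONS}  # leader: bandit-style values
action_values = {a: 0.0 for a in ACTIONS}    # follower: bandit-style values

def query_llm(question, history):
    """Stub for a real LLM call; returns a CoT string."""
    return f"Reasoning about '{question}' given {len(history)} past steps."

def env_step(action):
    """Toy environment: noisy reward standing in for task feedback."""
    return random.gauss(0.5 if action == "up" else 0.0, 0.1)

def softmax_sample(values, temp=1.0):
    keys = list(values)
    weights = [math.exp(values[k] / temp) for k in keys]
    return random.choices(keys, weights=weights)[0]

history, lr = [], 0.1
for step in range(200):
    question = softmax_sample(prompt_values)  # leader moves first
    cot = query_llm(question, history)        # CoT induced by the prompt
    action = softmax_sample(action_values)    # follower acts (toy: ignores cot)
    reward = env_step(action)
    # Both policies share the task reward, so the leader is pushed toward
    # questions whose CoT ultimately yields high-performing actions.
    prompt_values[question] += lr * (reward - prompt_values[question])
    action_values[action] += lr * (reward - action_values[action])
    history.append((question, action, reward))
```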
Related papers
- Active Fine-Tuning of Generalist Policies [54.65568433408307]
We propose AMF (Active Multi-task Fine-tuning) to maximize multi-task policy performance under a limited demonstration budget.
We derive performance guarantees for AMF under regularity assumptions and demonstrate its empirical effectiveness in complex and high-dimensional environments.
arXiv Detail & Related papers (2024-10-07T13:26:36Z)
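A generic sketch in the spirit of the AMF entry above: spend each unit of a limited demonstration budget on the task where an ensemble of value estimates disagrees most. The ensemble-variance criterion and all names are illustrative assumptions, not the paper's exact objective.

```python
import random
import statistics

# Hypothetical multi-task setup: an ensemble of value estimates per task.
# We spend the budget on the most uncertain task, an illustrative stand-in
# for AMF's selection criterion.
TASKS = ["stack", "push", "open-drawer"]
ensemble = {t: [random.gauss(0.0, 1.0) for _ in range(5)] for t in TASKS}
budget = 10

def collect_demo(task):
    """Stub: fetch one expert demonstration for `task`."""
    return {"task": task, "obs": [], "acts": []}

def fine_tune(demo):
    """Stub gradient step: nudge estimates together, shrinking variance."""
    vals = ensemble[demo["task"]]
    mean = statistics.mean(vals)
    ensemble[demo["task"]] = [v + 0.5 * (mean - v) for v in vals]

for _ in range(budget):
    # Query the task whose policy estimate is most uncertain right now.
    task = max(TASKS, key=lambda t: statistics.pvariance(ensemble[t]))
    fine_tune(collect_demo(task))
```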
- Taking Action Towards Graceful Interaction: The Effects of Performing Actions on Modelling Policies for Instruction Clarification Requests [23.405917899107767]
Transformer-based models fail to learn good policies for when to ask instruction clarification requests (CRs).
We discuss the shortcomings of the data-driven paradigm for learning meta-communication acts.
arXiv Detail & Related papers (2024-01-30T14:18:31Z)
- On the Value of Myopic Behavior in Policy Reuse [67.37788288093299]
Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence.
In this work, we present a framework called Selective Myopic bEhavior Control (SMEC).
SMEC adaptively aggregates the sharable short-term behaviors of prior policies and the long-term behaviors of the task policy, leading to coordinated decisions.
arXiv Detail & Related papers (2023-05-28T03:59:37Z)
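A loose sketch of the SMEC entry's aggregation idea, under an assumed value-gated rule: per state, follow a prior policy's short-term suggestion only when it promises more than the task policy's long-term estimate. SMEC's actual criterion differs; this is illustration only.

```python
import random

# Illustrative gating between a prior policy's sharable short-term behavior
# and the task policy's long-term behavior. All stubs and the rule are ours.
def prior_policy(state):
    """Stub for a previously learned behavior."""
    return "reuse-action", random.random()   # (action, short-term value)

def task_policy(state):
    """Stub for the policy being learned on the current task."""
    return "task-action", random.random()    # (action, long-term value)

def smec_like_act(state):
    prior_a, prior_v = prior_policy(state)
    task_a, task_v = task_policy(state)
    # Defer to the prior only when it claims a higher value than the
    # task policy's own estimate.
    return prior_a if prior_v > task_v else task_a

print(smec_like_act(state={"pos": 0}))
```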
- Active Prompting with Chain-of-Thought for Large Language Models [26.5029080638055]
This paper proposes a new method, Active-Prompt, to adapt large language models to different tasks.
By borrowing ideas from the related problem of uncertainty-based active learning, we introduce several metrics to characterize the uncertainty.
Experimental results demonstrate the superiority of our proposed method, achieving state-of-the-art on eight complex reasoning tasks.
arXiv Detail & Related papers (2023-02-23T18:58:59Z)
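Active-Prompt's uncertainty-driven selection step can be sketched with a disagreement metric: sample k CoT answers per question and annotate the questions whose final answers disagree most. The stub LLM and question pool below are illustrative.

```python
import random
from collections import Counter

def sample_answer(question):
    """Stub for one CoT completion from an LLM; returns a final answer."""
    return random.choice(["A", "B", "C"])

def disagreement(question, k=10):
    """Fraction of distinct answers among k samples: 1/k (certain) up to 1."""
    answers = [sample_answer(question) for _ in range(k)]
    return len(Counter(answers)) / k

pool = ["Q1: ...", "Q2: ...", "Q3: ..."]
# Annotate (with human CoT) the questions the model is least sure about.
to_annotate = sorted(pool, key=disagreement, reverse=True)[:2]
print(to_annotate)
```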
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
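The transfer mechanism in the behavior-basis entry above builds on generalized policy improvement: in each state, act greedily with respect to the maximum Q-value across the basis policies. A minimal sketch with toy Q-tables (values ours):

```python
# Generalized policy improvement over a basis of policies: in each state,
# take the action whose best Q-value across all basis policies is highest.
q_tables = [
    {("s0", "left"): 1.0, ("s0", "right"): 0.2},  # Q of basis policy 1
    {("s0", "left"): 0.1, ("s0", "right"): 0.8},  # Q of basis policy 2
]
ACTIONS = ["left", "right"]

def gpi_action(state):
    return max(ACTIONS, key=lambda a: max(q[(state, a)] for q in q_tables))

print(gpi_action("s0"))  # -> "left" (best Q across the basis: 1.0 vs 0.8)
```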
- Learning When and What to Ask: a Hierarchical Reinforcement Learning Framework [17.017688226277834]
We formulate a hierarchical reinforcement learning framework for learning to decide when to request additional information from humans.
Results on a simulated human-assisted navigation problem demonstrate the effectiveness of our framework.
arXiv Detail & Related papers (2021-10-14T01:30:36Z)
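One way to picture the "when to ask" level in the entry above is as an extra high-level action carrying a query cost; the threshold rule and numbers below are assumptions, not the paper's learned policy.

```python
import random

ASK_COST = 0.1  # asking the human is helpful but not free

def high_level_policy(uncertainty):
    """When to ask: query the human only if uncertainty is high."""
    return "ask" if uncertainty > 0.7 else "act"

def low_level_step(hint=None):
    """Stub navigation step; a human hint raises success probability."""
    p = 0.9 if hint else 0.5
    return 1.0 if random.random() < p else 0.0

total = 0.0
for _ in range(100):
    uncertainty = random.random()  # stub confidence estimate
    if high_level_policy(uncertainty) == "ask":
        total += low_level_step(hint="go left") - ASK_COST
    else:
        total += low_level_step()
print(total / 100)  # average return under the ask/act trade-off
```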
- Feudal Reinforcement Learning by Reading Manuals [23.19226806839748]
We present a Feudal Reinforcement Learning model consisting of a manager agent and a worker agent.
Our model effectively alleviates the mismatch between text-level inference and low-level perceptions and actions.
arXiv Detail & Related papers (2021-10-13T03:50:15Z)
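A toy sketch of the feudal manager/worker split from the entry above, assuming the manager retrieves the manual line relevant to the current observation and the worker grounds that text into an action (manual, rules, and names are illustrative):

```python
def manager(manual, observation):
    """Reads the manual and emits a text-level subgoal for the worker."""
    for line in manual:
        if observation["enemy"] in line:
            return line  # e.g. the rule that mentions this enemy
    return "explore"

def worker(subgoal, observation):
    """Grounds the text subgoal into a low-level action."""
    return "attack" if "weak to sword" in subgoal else "flee"

manual = ["the dragon is weak to sword", "the ghost cannot be harmed"]
obs = {"enemy": "dragon"}
print(worker(manager(manual, obs), obs))  # -> "attack"
```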
- Attaining Interpretability in Reinforcement Learning via Hierarchical Primitive Composition [3.1078562713129765]
We propose a hierarchical reinforcement learning algorithm that improves interpretability by decomposing the original task into a hierarchy of primitives.
We show how the proposed scheme can be employed in practice by solving a pick-and-place task with a 6-DoF manipulator.
arXiv Detail & Related papers (2021-10-05T05:59:31Z)
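A sketch of composing the entry's pick-and-place task from a hierarchy of primitives run by a fixed top-level sequence; the primitive set is an illustrative guess, and each step stays individually inspectable, which is where the interpretability comes from.

```python
# Pick-and-place decomposed into primitives executed by a top-level
# sequence; each primitive is small and inspectable. Illustrative only.
def reach(state):
    state["gripper"] = state["obj"]
    return state

def grasp(state):
    state["holding"] = True
    return state

def move(state):
    state["gripper"] = state["goal"]
    return state

def release(state):
    state["holding"] = False
    return state

PICK_AND_PLACE = [reach, grasp, move, release]

state = {"obj": (0.1, 0.2), "goal": (0.5, 0.5),
         "gripper": (0.0, 0.0), "holding": False}
for primitive in PICK_AND_PLACE:
    state = primitive(state)
    print(primitive.__name__, state)
```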
- CINS: Comprehensive Instruction for Few-shot Learning in Task-oriented Dialog Systems [56.302581679816775]
This paper proposes Comprehensive Instruction (CINS), which exploits pre-trained language models (PLMs) with task-specific instructions.
We design a schema (definition, constraint, prompt) for instructions and their customized realizations for three important downstream tasks in task-oriented dialog (ToD).
Experiments are conducted on these ToD tasks in realistic few-shot learning scenarios with small validation data.
arXiv Detail & Related papers (2021-09-10T03:23:06Z)
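CINS's (definition, constraint, prompt) schema can be rendered as a simple template; the intent-detection content below is made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Instruction:
    definition: str  # what the task is
    constraint: str  # task-specific restrictions
    prompt: str      # the cue that precedes the model's output

    def render(self, utterance: str) -> str:
        return f"{self.definition} {self.constraint} {utterance} {self.prompt}"

intent = Instruction(
    definition="Intent detection classifies a user utterance into an intent.",
    constraint="Choose one intent from: book_flight, book_hotel, greeting.",
    prompt="The intent is:",
)
print(intent.render("I need a room in Paris for two nights."))
```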
- Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
arXiv Detail & Related papers (2021-07-01T15:30:59Z)
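A toy sketch of the division of labour in the imagined-subgoals entry: a high-level policy imagines an intermediate subgoal (here simply the midpoint) for a goal-conditioned low-level controller. In the paper the subgoal predictor is learned jointly with the policy and its critic; the stubs here are illustrative.

```python
def high_level_policy(state, goal):
    """Imagines an intermediate subgoal between the state and the goal."""
    return tuple((s + g) / 2 for s, g in zip(state, goal))

def low_level_policy(state, subgoal):
    """Goal-conditioned controller: step greedily toward the subgoal."""
    return tuple(0.1 * (sg - s) for s, sg in zip(state, subgoal))

state, goal = (0.0, 0.0), (1.0, 1.0)
for _ in range(5):
    subgoal = high_level_policy(state, goal)
    action = low_level_policy(state, subgoal)
    state = tuple(s + a for s, a in zip(state, action))
print(state)  # drifts toward the goal via imagined midpoints
```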
- Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video [128.08590291947544]
Temporal language grounding in untrimmed videos is a recently introduced task in video understanding.
Inspired by human's coarse-to-fine decision-making paradigm, we formulate a novel Tree-Structured Policy based Progressive Reinforcement Learning framework.
arXiv Detail & Related papers (2020-01-18T15:08:04Z)
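The coarse-to-fine idea in the tree-structured entry can be sketched as a two-level tree: a root policy picks which temporal boundary to adjust (or stops), and a leaf policy picks the magnitude. Both stubs below are random placeholders, not the learned policies.

```python
import random

def root_policy(segment, query):
    """Coarse decision: which boundary to adjust, or stop."""
    return random.choice(["shift_start", "shift_end", "stop"])

def leaf_policy(branch):
    """Fine decision: how far to move the chosen boundary (seconds)."""
    return random.choice([-2.0, -0.5, 0.5, 2.0])

segment = [10.0, 30.0]  # current (start, end) guess in the video
for _ in range(10):     # progressively refine the segment
    branch = root_policy(segment, query="person opens the door")
    if branch == "stop":
        break
    segment[0 if branch == "shift_start" else 1] += leaf_policy(branch)
print(segment)
```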