Related papers: METAREFLECTION: Learning Instructions for Language Agents using Past Reflections

METAREFLECTION: Learning Instructions for Language Agents using Past Reflections

URL: http://arxiv.org/abs/2405.13009v1
Date: Mon, 13 May 2024 10:51:43 GMT
Title: METAREFLECTION: Learning Instructions for Language Agents using Past Reflections
Authors: Priyanshu Gupta, Shashank Kirtania, Ananya Singha, Sumit Gulwani, Arjun Radhakrishna, Sherry Shi, Gustavo Soares,
Abstract summary: We introduce METAREFLECTION, a novel technique that learns general prompt instructions for a specific domain from individual self-reflections gathered during a training phase. We evaluate our technique in two domains: Infrastructure as Code (IAC) vulnerability detection and question-answering (QA) using REACT and COT.
Score: 11.028256182234017
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite the popularity of Large Language Models (LLMs), crafting specific prompts for LLMs to perform particular tasks remains challenging. Users often engage in multiple conversational turns with an LLM-based agent to accomplish their intended task. Recent studies have demonstrated that linguistic feedback, in the form of self-reflections generated by the model, can work as reinforcement during these conversations, thus enabling quicker convergence to the desired outcome. Motivated by these findings, we introduce METAREFLECTION, a novel technique that learns general prompt instructions for a specific domain from individual self-reflections gathered during a training phase. We evaluate our technique in two domains: Infrastructure as Code (IAC) vulnerability detection and question-answering (QA) using REACT and COT. Our results demonstrate a notable improvement, with METARELECTION outperforming GPT-4 by 16.82% (IAC), 31.33% (COT), and 15.42% (REACT), underscoring the potential of METAREFLECTION as a viable method for enhancing the efficiency of LLMs.

Related papers

Beyond Syntax: Action Semantics Learning for App Agents [60.56331102288794]
Action Semantics Learning (ASL) is a learning framework where the learning objective is capturing the semantics of the ground truth actions.<n>ASL significantly improves the accuracy and generalisation of App agents over existing methods.
arXiv Detail & Related papers (2025-06-21T12:08:19Z)
Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and. Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting. LLMs to downstream tasks. We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
RAG-Modulo: Solving Sequential Tasks using Experience, Critics, and Language Models [5.0741409008225755]
Large language models (LLMs) have emerged as promising tools for solving challenging robotic tasks. Most existing LLM-based agents lack the ability to retain and learn from past interactions. We propose RAG-Modulo, a framework that enhances LLM-based agents with a memory of past interactions and incorporates critics to evaluate the agents' decisions.
arXiv Detail & Related papers (2024-09-18T20:03:32Z)
RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents [27.807695570974644]
We propose a novel method, textscRePrompt, which does agradient descent"-like approach to optimize the step-by-step instructions in the prompts given to LLM agents. By leveraging intermediate feedback, textscRePrompt can optimize the prompt without the need for a final solution checker.
arXiv Detail & Related papers (2024-06-17T01:23:11Z)
Bridging the Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs [15.911445732909849]
Large language models (LLMs) are at the forefront of transforming numerous domains globally. However, their inclusivity and effectiveness remain limited for non-Latin scripts and low-resource languages. This paper tackles the imperative challenge of enhancing the multilingual performance of LLMs without extensive training or fine-tuning.
arXiv Detail & Related papers (2024-05-28T16:56:42Z)
An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications [7.286515881369693]
This paper systematically investigates the capacity of Large Language Models (LLMs) to repair declarative specifications in Alloy.<n>We designed 12 different repair settings, encompassing single-agent and dual-agent paradigms, utilizing various LLMs.<n>Our study reveals that dual-agent with auto-prompting setup outperforms the other settings, albeit with a marginal increase in the number of iterations and token usage.
arXiv Detail & Related papers (2024-04-17T03:46:38Z)
Unleashing the Potential of Large Language Models as Prompt Optimizers: An Analogical Analysis with Gradient-based Model Optimizers [108.72225067368592]
We propose a novel perspective to investigate the design of large language models (LLMs)-based prompts. We identify two pivotal factors in model parameter learning: update direction and update method. In particular, we borrow the theoretical framework and learning methods from gradient-based optimization to design improved strategies.
arXiv Detail & Related papers (2024-02-27T15:05:32Z)
Are Large Language Models Good Prompt Optimizers? [65.48910201816223]
We conduct a study to uncover the actual mechanism of LLM-based Prompt Optimization. Our findings reveal that the LLMs struggle to identify the true causes of errors during reflection, tending to be biased by their own prior knowledge. We introduce a new "Automatic Behavior Optimization" paradigm, which directly optimize the target model's behavior in a more controllable manner.
arXiv Detail & Related papers (2024-02-03T09:48:54Z)
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs) Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z)
CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization [62.0397906276669]
CLIN is the first language-based agent to continually improve over multiple trials. It can improve its zero-shot performance by 4 points (13 for new tasks) and can further improve performance there through continual memory updates. This suggests a new architecture for agents built on frozen models that can still continually and rapidly improve over time.
arXiv Detail & Related papers (2023-10-16T07:17:27Z)
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models [31.509994889286183]
We introduce Language Agent Tree Search (LATS) -- the first general framework that synergizes the capabilities of language models (LMs) in reasoning, acting, and planning. A key feature of our approach is the incorporation of an environment for external feedback, which offers a more deliberate and adaptive problem-solving mechanism. LATS achieves state-of-the-art pass@1 accuracy (92.7%) for programming on HumanEval with GPT-4 and demonstrates gradient-free performance (average score of 75.9) comparable to gradient-based fine-tuning for web navigation on WebShop with GPT
arXiv Detail & Related papers (2023-10-06T17:55:11Z)
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization [103.70896967077294]
This paper introduces a principled framework for reinforcing large language agents by learning a retrospective model. Our proposed agent architecture learns from rewards across multiple environments and tasks, for fine-tuning a pre-trained language model. Experimental results on various tasks demonstrate that the language agents improve over time.
arXiv Detail & Related papers (2023-08-04T06:14:23Z)
Bridging the Language Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs [15.911445732909849]
Large language models (LLMs) have revolutionized various domains but still struggle with non-Latin scripts and low-resource languages. We introduce a novel dynamic learning approach that optimize prompt strategy, embedding model, and LLM per query at runtime. We show our approach results in 10-15% improvements in multilingual performance over pre-trained models and 4x gains compared to fine-tuned, language-specific models.
arXiv Detail & Related papers (2023-05-28T14:48:38Z)
Reflexion: Language Agents with Verbal Reinforcement Learning [44.85337947858337]
Reflexion is a novel framework to reinforce language agents not by updating weights, but through linguistic feedback. It is flexible enough to incorporate various types (scalar values or free-form language) and sources (external or internally simulated) of feedback signals. For example, Reflexion achieves a 91% pass@1 accuracy on the HumanEval coding benchmark, surpassing the previous state-of-the-art GPT-4 that achieves 80%.
arXiv Detail & Related papers (2023-03-20T18:08:50Z)
LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models [67.19124099815645]
We propose a novel Language-Aware Soft Prompting (LASP) learning method to alleviate base class overfitting. LASP is inherently amenable to including, during training, virtual classes, i.e. class names for which no visual samples are available. LASP matches and surpasses, for the first time, the accuracy on novel classes obtained by hand-crafted prompts and CLIP for 8 out of 11 test datasets.
arXiv Detail & Related papers (2022-10-03T17:56:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.