Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents
- URL: http://arxiv.org/abs/2402.11651v2
- Date: Tue, 16 Apr 2024 11:41:13 GMT
- Title: Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents
- Authors: Renxi Wang, Haonan Li, Xudong Han, Yixuan Zhang, Timothy Baldwin,
- Abstract summary: Large language models (LLMs) have achieved success in acting as agents, which interact with environments through tools such as search engines.
Previous work has first collected interaction trajectories between LLMs and environments, using only trajectories that successfully finished the task to fine-tune smaller models.
We argue that unsuccessful trajectories offer valuable insights, and LLMs can learn from these trajectories through appropriate quality control and fine-tuning strategies.
- Score: 41.14201835950814
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models (LLMs) have achieved success in acting as agents, which interact with environments through tools such as search engines. However, LLMs are optimized for language generation instead of tool use during training or alignment, limiting their effectiveness as agents. To resolve this problem, previous work has first collected interaction trajectories between LLMs and environments, using only trajectories that successfully finished the task to fine-tune smaller models, making fine-tuning data scarce and acquiring it both difficult and costly. Discarding failed trajectories also leads to significant wastage of data and resources and limits the possible optimization paths during fine-tuning. In this paper, we argue that unsuccessful trajectories offer valuable insights, and LLMs can learn from these trajectories through appropriate quality control and fine-tuning strategies. By simply adding a prefix or suffix that tells the model whether to generate a successful trajectory during training, we improve model performance by a large margin on mathematical reasoning, multi-hop question answering, and strategic question answering tasks. We further analyze the inference results and find that our method provides a better trade-off between valuable information and errors in unsuccessful trajectories. To our knowledge, we are the first to demonstrate the value of negative trajectories and their application in agent-tunning scenarios. Our findings offer guidance for developing better agent-tuning methods and low-resource data usage techniques.
Related papers
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language models (LLMs) unlearning via gradient ascent (GA)
Despite their simplicity and efficiency, we suggest that GA-based methods face the propensity towards excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z) - Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning [56.82041895921434]
Open-source pre-trained Large Language Models (LLMs) exhibit strong language understanding and generation capabilities.
When used as agents for dealing with complex problems in the real world, their performance is far inferior to large commercial models such as ChatGPT and GPT-4.
arXiv Detail & Related papers (2024-03-29T03:48:12Z) - TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned
Decision [32.24857534147114]
Large language model (LLM) agents have been built for different tasks like web navigation and online shopping.
In this paper, we propose a novel framework (TRAD) to address these issues.
TRAD conducts Thought Retrieval, achieving step-level demonstration selection via thought matching.
Then, TRAD introduces Aligned Decision, complementing retrieved demonstration steps with their previous or subsequent steps.
arXiv Detail & Related papers (2024-03-10T13:58:38Z) - Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents [49.85633804913796]
We present an exploration-based trajectory optimization approach, referred to as ETO.
This learning method is designed to enhance the performance of open LLM agents.
Our experiments on three complex tasks demonstrate that ETO consistently surpasses baseline performance by a large margin.
arXiv Detail & Related papers (2024-03-04T21:50:29Z) - Unlearnable Algorithms for In-context Learning [36.895152458323764]
In this paper, we focus on efficient unlearning methods for the task adaptation phase of a pretrained large language model.
We observe that an LLM's ability to do in-context learning for task adaptation allows for efficient exact unlearning of task adaptation training data.
We propose a new holistic measure of unlearning cost which accounts for varying inference costs.
arXiv Detail & Related papers (2024-02-01T16:43:04Z) - LLMaAA: Making Large Language Models as Active Annotators [32.57011151031332]
We propose LLMaAA, which takes large language models as annotators and puts them into an active learning loop to determine what to annotate efficiently.
We conduct experiments and analysis on two classic NLP tasks, named entity recognition and relation extraction.
With LLMaAA, task-specific models trained from LLM-generated labels can outperform the teacher within only hundreds of annotated examples.
arXiv Detail & Related papers (2023-10-30T14:54:15Z) - Contrastive Trajectory Similarity Learning with Dual-Feature Attention [24.445998309807965]
Tray similarity measures act as query predicates in trajectory databases.
We propose a contrastive learning-based trajectory modelling method named TrajCL.
TrajCL is consistently and significantly more accurate and faster than the state-of-the-art trajectory similarity measures.
arXiv Detail & Related papers (2022-10-11T05:25:14Z) - TRAIL: Near-Optimal Imitation Learning with Suboptimal Data [100.83688818427915]
We present training objectives that use offline datasets to learn a factored transition model.
Our theoretical analysis shows that the learned latent action space can boost the sample-efficiency of downstream imitation learning.
To learn the latent action space in practice, we propose TRAIL (Transition-Reparametrized Actions for Imitation Learning), an algorithm that learns an energy-based transition model.
arXiv Detail & Related papers (2021-10-27T21:05:00Z) - Few-shot Quality-Diversity Optimization [50.337225556491774]
Quality-Diversity (QD) optimization has been shown to be effective tools in dealing with deceptive minima and sparse rewards in Reinforcement Learning.
We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which when used to initialize QD methods in unseen environments, allows for few-shot adaptation.
Experiments carried in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations that are required for QD optimization in these environments.
arXiv Detail & Related papers (2021-09-14T17:12:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.