O3D: Offline Data-driven Discovery and Distillation for Sequential
Decision-Making with Large Language Models
- URL: http://arxiv.org/abs/2310.14403v5
- Date: Mon, 26 Feb 2024 18:29:45 GMT
- Title: O3D: Offline Data-driven Discovery and Distillation for Sequential
Decision-Making with Large Language Models
- Authors: Yuchen Xiao, Yanchao Sun, Mengda Xu, Udari Madhushani, Jared Vann,
Deepeka Garg, Sumitra Ganesh
- Abstract summary: Offline Data-driven Discovery and Distillation (O3D) is proposed to improve the sequential decision-making capabilities of large language models (LLMs) without finetuning.
O3D automatically discovers reusable skills and distills generalizable knowledge across multiple tasks based on offline interaction data.
Empirical results on two interactive decision-making benchmarks (ALFWorld and WebShop) verify that O3D can notably enhance the decision-making capabilities of LLMs.
- Score: 16.91329676173649
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in large language models (LLMs) have exhibited promising
performance in solving sequential decision-making problems. By imitating
few-shot examples provided in the prompts (i.e., in-context learning), an LLM
agent can interact with an external environment and complete given tasks
without additional training. However, such few-shot examples are often
insufficient to generate high-quality solutions for complex and long-horizon
tasks, while the limited context window cannot accommodate larger-scale
demonstrations with long interaction horizons. To this end, we propose an
offline learning framework that utilizes offline data at scale (e.g., logs of
human interactions) to improve LLM-powered policies without finetuning. The
proposed method O3D (Offline Data-driven Discovery and Distillation)
automatically discovers reusable skills and distills generalizable knowledge
across multiple tasks based on offline interaction data, advancing the
capability of solving downstream tasks. Empirical results on two interactive
decision-making benchmarks (ALFWorld and WebShop) verify that O3D can notably
enhance the decision-making capabilities of LLMs through the offline discovery
and distillation process, and consistently outperform baselines across various
LLMs.
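Neither the abstract nor this page includes code, but the discovery-and-distillation loop can be sketched in a few lines. The sketch below is a minimal illustration, assuming a generic `llm` callable (prompt string in, completion string out) and an illustrative trajectory format; the function names, prompts, and data layout are hypothetical, not the paper's.

```python
# Minimal sketch of an O3D-style offline discovery/distillation loop.
# Hypothetical interfaces: `llm` is any prompt -> completion callable;
# the trajectory format and prompt wording are illustrative, not the paper's.
from dataclasses import dataclass


@dataclass
class Trajectory:
    task: str
    steps: list[tuple[str, str]]  # (observation, action) pairs from offline logs


def discover_skills(llm, trajectories):
    """Offline phase 1: ask the LLM to segment logged trajectories into reusable skills."""
    skills = set()
    for traj in trajectories:
        transcript = "\n".join(f"obs: {o}\nact: {a}" for o, a in traj.steps)
        prompt = (
            f"Task: {traj.task}\n{transcript}\n"
            "List the reusable sub-skills demonstrated above, one per line."
        )
        skills.update(line.strip() for line in llm(prompt).splitlines() if line.strip())
    return skills


def distill_knowledge(llm, trajectories, skills):
    """Offline phase 2: distill a short, generalizable tip for each discovered skill."""
    examples = "\n---\n".join(
        f"Task: {t.task}\n" + "\n".join(f"{o} -> {a}" for o, a in t.steps)
        for t in trajectories
    )
    return {
        skill: llm(
            f"Skill: {skill}\nOffline examples:\n{examples}\n"
            "State one general rule for executing this skill correctly."
        ).strip()
        for skill in skills
    }


def act(llm, task, observation, knowledge):
    """Downstream policy: condition on the distilled tips instead of raw demonstrations."""
    tips = "\n".join(f"- {skill}: {tip}" for skill, tip in knowledge.items())
    prompt = (
        f"Distilled tips:\n{tips}\n"
        f"Task: {task}\nObservation: {observation}\nNext action:"
    )
    return llm(prompt).strip()
```

Because the distilled tips are short text, the downstream prompt stays within the context window even when the offline logs are far too large to include as in-context demonstrations, which is what lets such a method exploit offline data at scale without finetuning.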
Related papers
- From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions [9.344348861402928]
MemoryCode is a dataset designed to test Large Language Models' ability to track and execute simple coding instructions amid irrelevant information.
Our results highlight a fundamental limitation of current LLMs, restricting their ability to collaborate effectively in long interactions.
arXiv Detail & Related papers (2025-02-19T14:58:04Z)
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
- Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation [55.21013307734612]
AoPS-Instruct is a dataset of more than 600,000 high-quality QA pairs.
LiveAoPSBench is an evolving evaluation set with timestamps, derived from the latest forum data.
Our work presents a scalable approach to creating and maintaining large-scale, high-quality datasets for advanced math reasoning.
arXiv Detail & Related papers (2025-01-24T06:39:38Z)
- Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making [85.24399869971236]
We aim to evaluate Large Language Models (LLMs) for embodied decision making.
Existing evaluations tend to rely solely on a final success rate.
We propose a generalized interface (Embodied Agent Interface) that supports the formalization of various types of tasks.
arXiv Detail & Related papers (2024-10-09T17:59:00Z)
- Practical Unlearning for Large Language Models [23.515444452866404]
Machine unlearning (MU) has emerged as a promising solution for removing the influence of undesired data from LLMs.
MU typically assumes full access to the original training data to preserve utility.
Existing LLM unlearning methods also often assume access to the data most affected by removing the undesired data.
We propose the O3 framework to overcome these challenges and achieve practical LLM unlearning.
arXiv Detail & Related papers (2024-07-14T14:26:17Z)
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z)
- Sub-goal Distillation: A Method to Improve Small Language Agents [21.815417165548187]
Large Language Models (LLMs) have demonstrated significant promise as agents in interactive tasks.
We propose a method for transferring the performance of an LLM with billions of parameters to a much smaller language model.
In ScienceWorld, a challenging and multi-task interactive text environment, our method surpasses standard imitation learning based solely on elementary actions by 16.7%.
arXiv Detail & Related papers (2024-05-04T20:34:06Z)
- Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have demonstrated impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z)
- OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs.
Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z)
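As a concrete illustration of the batching idea behind OverPrompt, the sketch below packs several inputs into a single prompt so per-request overhead is amortized. It assumes the same kind of generic `llm` callable and a JSON answer format; the prompt wording is hypothetical, not the paper's.

```python
# Minimal sketch of OverPrompt-style batched zero-shot classification:
# several inputs share one prompt, amortizing per-call cost.
# `llm` is any prompt -> completion callable; the JSON answer format is assumed.
import json


def classify_batch(llm, texts, labels):
    numbered = "\n".join(f"{i}. {t}" for i, t in enumerate(texts, 1))
    prompt = (
        f"Classify each numbered text as one of {labels}.\n{numbered}\n"
        'Answer with JSON only, e.g. {"1": "<label>", "2": "<label>"}.'
    )
    answers = json.loads(llm(prompt))  # assumes the model returns valid JSON
    return [answers[str(i)] for i in range(1, len(texts) + 1)]
```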