O3D: Offline Data-driven Discovery and Distillation for Sequential
Decision-Making with Large Language Models
- URL: http://arxiv.org/abs/2310.14403v5
- Date: Mon, 26 Feb 2024 18:29:45 GMT
- Title: O3D: Offline Data-driven Discovery and Distillation for Sequential
Decision-Making with Large Language Models
- Authors: Yuchen Xiao, Yanchao Sun, Mengda Xu, Udari Madhushani, Jared Vann,
Deepeka Garg, Sumitra Ganesh
- Abstract summary: Offline Data-driven Discovery and Distillation (O3D) is proposed to improve the sequential decision-making capabilities of large language models (LLMs) without finetuning.
O3D automatically discovers reusable skills and distills generalizable knowledge across multiple tasks based on offline interaction data.
Empirical results on two interactive decision-making benchmarks (ALFWorld and WebShop) verify that O3D can notably enhance the decision-making capabilities of LLMs.
- Score: 16.91329676173649
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in large language models (LLMs) have exhibited promising
performance in solving sequential decision-making problems. By imitating
few-shot examples provided in the prompts (i.e., in-context learning), an LLM
agent can interact with an external environment and complete given tasks
without additional training. However, such few-shot examples are often
insufficient to generate high-quality solutions for complex and long-horizon
tasks, while the limited context window cannot accommodate larger-scale
demonstrations with long interaction horizons. To this end, we propose an
offline learning framework that utilizes offline data at scale (e.g., logs of
human interactions) to improve LLM-powered policies without finetuning. The
proposed method O3D (Offline Data-driven Discovery and Distillation)
automatically discovers reusable skills and distills generalizable knowledge
across multiple tasks based on offline interaction data, advancing the
capability of solving downstream tasks. Empirical results on two interactive
decision-making benchmarks (ALFWorld and WebShop) verify that O3D can notably
enhance the decision-making capabilities of LLMs through the offline discovery
and distillation process, and consistently outperform baselines across various
LLMs.
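Neither the abstract nor this page includes code, but the discovery-and-distillation loop can be sketched in a few lines. The sketch below is a minimal illustration, assuming a generic `llm` callable (prompt string in, completion string out) and an illustrative trajectory format; the function names, prompts, and data layout are hypothetical, not the paper's.

```python
# Minimal sketch of an O3D-style offline discovery/distillation loop.
# Hypothetical interfaces: `llm` is any prompt -> completion callable;
# the trajectory format and prompt wording are illustrative, not the paper's.
from dataclasses import dataclass


@dataclass
class Trajectory:
    task: str
    steps: list[tuple[str, str]]  # (observation, action) pairs from offline logs


def discover_skills(llm, trajectories):
    """Offline phase 1: ask the LLM to segment logged trajectories into reusable skills."""
    skills = set()
    for traj in trajectories:
        transcript = "\n".join(f"obs: {o}\nact: {a}" for o, a in traj.steps)
        prompt = (
            f"Task: {traj.task}\n{transcript}\n"
            "List the reusable sub-skills demonstrated above, one per line."
        )
        skills.update(line.strip() for line in llm(prompt).splitlines() if line.strip())
    return skills


def distill_knowledge(llm, trajectories, skills):
    """Offline phase 2: distill a short, generalizable tip for each discovered skill."""
    examples = "\n---\n".join(
        f"Task: {t.task}\n" + "\n".join(f"{o} -> {a}" for o, a in t.steps)
        for t in trajectories
    )
    return {
        skill: llm(
            f"Skill: {skill}\nOffline examples:\n{examples}\n"
            "State one general rule for executing this skill correctly."
        ).strip()
        for skill in skills
    }


def act(llm, task, observation, knowledge):
    """Downstream policy: condition on the distilled tips instead of raw demonstrations."""
    tips = "\n".join(f"- {skill}: {tip}" for skill, tip in knowledge.items())
    prompt = (
        f"Distilled tips:\n{tips}\n"
        f"Task: {task}\nObservation: {observation}\nNext action:"
    )
    return llm(prompt).strip()
```

Because the distilled tips are short text, the downstream prompt stays within the context window even when the offline logs are far too large to include as in-context demonstrations, which is what lets such a method exploit offline data at scale without finetuning.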
Related papers
- From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions [9.344348861402928]
MemoryCode is a dataset designed to test Large Language Models' ability to track and execute simple coding instructions amid irrelevant information.
Our results highlight a fundamental limitation of current LLMs, restricting their ability to collaborate effectively in long interactions.
arXiv Detail & Related papers (2025-02-19T14:58:04Z)
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
- Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation [55.21013307734612]
AoPS-Instruct is a dataset of more than 600,000 high-quality QA pairs.
LiveAoPSBench is an evolving evaluation set with timestamps, derived from the latest forum data.
Our work presents a scalable approach to creating and maintaining large-scale, high-quality datasets for advanced math reasoning.
arXiv Detail & Related papers (2025-01-24T06:39:38Z)
- Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making [85.24399869971236]
We aim to evaluate Large Language Models (LLMs) for embodied decision making.
Existing evaluations tend to rely solely on a final success rate.
We propose a generalized interface (Embodied Agent Interface) that supports the formalization of various types of tasks.
arXiv Detail & Related papers (2024-10-09T17:59:00Z)
- Practical Unlearning for Large Language Models [23.515444452866404]
Machine unlearning (MU) has emerged as a promising solution for removing the influence of undesired data from LLMs.
MU typically assumes full access to the original training data to preserve utility.
Existing LLM unlearning methods also often assume access to the data most affected by removing the undesired data.
We propose the O3 framework to overcome these challenges and achieve practical LLM unlearning.
arXiv Detail & Related papers (2024-07-14T14:26:17Z)
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z)
- Sub-goal Distillation: A Method to Improve Small Language Agents [21.815417165548187]
Large Language Models (LLMs) have demonstrated significant promise as agents in interactive tasks.
We propose a method for transferring the performance of an LLM with billions of parameters to a much smaller language model.
In ScienceWorld, a challenging and multi-task interactive text environment, our method surpasses standard imitation learning based solely on elementary actions by 16.7%.
arXiv Detail & Related papers (2024-05-04T20:34:06Z)
- Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have demonstrated impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z)
- OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs.
Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z)
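As a concrete illustration of the batching idea behind OverPrompt, the sketch below packs several inputs into a single prompt so per-request overhead is amortized. It assumes the same kind of generic `llm` callable and a JSON answer format; the prompt wording is hypothetical, not the paper's.

```python
# Minimal sketch of OverPrompt-style batched zero-shot classification:
# several inputs share one prompt, amortizing per-call cost.
# `llm` is any prompt -> completion callable; the JSON answer format is assumed.
import json


def classify_batch(llm, texts, labels):
    numbered = "\n".join(f"{i}. {t}" for i, t in enumerate(texts, 1))
    prompt = (
        f"Classify each numbered text as one of {labels}.\n{numbered}\n"
        'Answer with JSON only, e.g. {"1": "<label>", "2": "<label>"}.'
    )
    answers = json.loads(llm(prompt))  # assumes the model returns valid JSON
    return [answers[str(i)] for i in range(1, len(texts) + 1)]
```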