O3D: Offline Data-driven Discovery and Distillation for Sequential
Decision-Making with Large Language Models
- URL: http://arxiv.org/abs/2310.14403v5
- Date: Mon, 26 Feb 2024 18:29:45 GMT
- Title: O3D: Offline Data-driven Discovery and Distillation for Sequential
Decision-Making with Large Language Models
- Authors: Yuchen Xiao, Yanchao Sun, Mengda Xu, Udari Madhushani, Jared Vann,
Deepeka Garg, Sumitra Ganesh
- Abstract summary: Offline Data-driven Discovery and Distillation (O3D) is proposed to improve large language models (LLMs)
O3D automatically discovers reusable skills and distills generalizable knowledge across multiple tasks based on offline interaction data.
Empirical results under two interactive decision-making benchmarks (ALFWorld and WebShop) verify that O3D can notably enhance the decision-making capabilities of LLMs.
- Score: 16.91329676173649
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in large language models (LLMs) have exhibited promising
performance in solving sequential decision-making problems. By imitating
few-shot examples provided in the prompts (i.e., in-context learning), an LLM
agent can interact with an external environment and complete given tasks
without additional training. However, such few-shot examples are often
insufficient to generate high-quality solutions for complex and long-horizon
tasks, while the limited context length cannot consume larger-scale
demonstrations with long interaction horizons. To this end, we propose an
offline learning framework that utilizes offline data at scale (e.g, logs of
human interactions) to improve LLM-powered policies without finetuning. The
proposed method O3D (Offline Data-driven Discovery and Distillation)
automatically discovers reusable skills and distills generalizable knowledge
across multiple tasks based on offline interaction data, advancing the
capability of solving downstream tasks. Empirical results under two interactive
decision-making benchmarks (ALFWorld and WebShop) verify that O3D can notably
enhance the decision-making capabilities of LLMs through the offline discovery
and distillation process, and consistently outperform baselines across various
LLMs.
Related papers
- Practical Unlearning for Large Language Models [23.515444452866404]
Machine unlearning (MU) has emerged as a promising solution to address these issues.
MU typically assumes full access to the original training data to preserve utility.
Existing LLM unlearning methods often assume access to data most affected by undesired data unlearning.
We propose the O3 framework to overcome these challenges and achieve practical LLM unlearning.
arXiv Detail & Related papers (2024-07-14T14:26:17Z) - Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z) - Sub-goal Distillation: A Method to Improve Small Language Agents [21.815417165548187]
Large Language Models (LLMs) have demonstrated significant promise as agents in interactive tasks.
We propose a method for transferring the performance of an LLM with billions of parameters to a much smaller language model.
In ScienceWorld, a challenging and multi-task interactive text environment, our method surpasses standard imitation learning based solely on elementary actions by 16.7%.
arXiv Detail & Related papers (2024-05-04T20:34:06Z) - Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z) - Unmemorization in Large Language Models via Self-Distillation and
Deliberate Imagination [58.36408867180233]
Large Language Models (LLMs) struggle with crucial issues of privacy violation and unwanted exposure of sensitive data.
We introduce a novel approach termed deliberate imagination in the context of LLM unlearning.
Our results demonstrate the usefulness of this approach across different models and sizes, and also with parameter-efficient fine-tuning.
arXiv Detail & Related papers (2024-02-15T16:21:14Z) - LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language
Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs)
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z) - LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset,
Framework, and Benchmark [81.42376626294812]
We present Language-Assisted Multi-Modal instruction tuning dataset, framework, and benchmark.
Our aim is to establish LAMM as a growing ecosystem for training and evaluating MLLMs.
We present a comprehensive dataset and benchmark, which cover a wide range of vision tasks for 2D and 3D vision.
arXiv Detail & Related papers (2023-06-11T14:01:17Z) - Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach [31.6589518077397]
Large language models (LLMs) encode a vast amount of world knowledge acquired from massive text datasets.
LLMs can assist an embodied agent in solving complex sequential decision making tasks by providing high-level instructions.
We propose When2Ask, a reinforcement learning based approach that learns when it is necessary to query LLMs for high-level instructions.
arXiv Detail & Related papers (2023-06-06T11:49:09Z) - OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs.
Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.