CESAR: Automatic Induction of Compositional Instructions for Multi-turn
Dialogs
- URL: http://arxiv.org/abs/2311.17376v1
- Date: Wed, 29 Nov 2023 06:02:16 GMT
- Title: CESAR: Automatic Induction of Compositional Instructions for Multi-turn
Dialogs
- Authors: Taha Aksu, Devamanyu Hazarika, Shikib Mehri, Seokhwan Kim, Dilek
Hakkani-Tür, Yang Liu, Mahdi Namazifar
- Abstract summary: We propose a novel framework, CESAR, that unifies a large number of dialog tasks in the same format.
We apply CESAR on InstructDial, a benchmark for instruction-based dialog tasks.
- Score: 27.092581945832713
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instruction-based multitasking has played a critical role in the success of
large language models (LLMs) in multi-turn dialog applications. While publicly
available LLMs have shown promising performance, when exposed to complex
instructions with multiple constraints, they lag behind state-of-the-art
models like ChatGPT. In this work, we hypothesize that the availability of
large-scale complex demonstrations is crucial in bridging this gap. Focusing on
dialog applications, we propose a novel framework, CESAR, that unifies a large
number of dialog tasks in the same format and allows programmatic induction of
complex instructions without any manual effort.
We apply CESAR on InstructDial, a benchmark for instruction-based dialog
tasks. We further enhance InstructDial with new datasets and tasks and utilize
CESAR to induce complex tasks with compositional instructions. This results in
a new benchmark called InstructDial++, which includes 63 datasets with 86 basic
tasks and 68 composite tasks. Through rigorous experiments, we demonstrate the
scalability of CESAR in providing rich instructions. Models trained on
InstructDial++ can follow compositional prompts, such as prompts that ask for
multiple stylistic constraints.
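The following sketch illustrates what such programmatic induction of compositional instructions might look like. It is a minimal, hypothetical example in the spirit of CESAR, not the paper's actual implementation: the BasicTask fields, the example tasks, and the pairwise composition rule are illustrative assumptions.

```python
# Hypothetical sketch of compositional instruction induction in the spirit of
# CESAR. Task names, fields, and the composition rule are illustrative
# assumptions; the real framework and its schema are defined in the paper.
from dataclasses import dataclass
from itertools import combinations

@dataclass
class BasicTask:
    name: str            # e.g. "keyword-controlled generation"
    instruction: str     # natural-language constraint on the response
    output_field: str    # which part of the dialog sample the task constrains

def compose(tasks):
    """Merge basic tasks that constrain the same output into one composite task."""
    joined = " and ".join(t.instruction for t in tasks)
    return BasicTask(
        name="+".join(t.name for t in tasks),
        instruction=f"Generate a response that {joined}.",
        output_field=tasks[0].output_field,
    )

basic = [
    BasicTask("keyword", "includes the keyword 'refund'", "response"),
    BasicTask("emotion", "expresses an apologetic tone", "response"),
    BasicTask("length", "is at most two sentences long", "response"),
]

# Induce composite tasks programmatically: every pair of compatible basic
# tasks yields a new compositional instruction, with no manual annotation.
composites = [
    compose(pair)
    for pair in combinations(basic, 2)
    if len({t.output_field for t in pair}) == 1
]

for task in composites:
    print(task.instruction)
```

In this toy setup, the number of composite instructions grows combinatorially with the number of basic tasks that constrain the same output, which is one way to read the abstract's claim that CESAR scales to rich compositional instructions without manual effort.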
Related papers
- CodeIF-Bench: Evaluating Instruction-Following Capabilities of Large Language Models in Interactive Code Generation [10.438717413104062]
We introduce CodeIF-Bench, a benchmark for evaluating Large Language Models' instruction-following capabilities in interactive code generation.
CodeIF-Bench incorporates nine types of verifiable instructions aligned with real-world software development requirements.
We evaluate nine prominent LLMs using CodeIF-Bench, and the experimental results reveal a significant disparity between their basic programming capability and their instruction-following capability.
arXiv Detail & Related papers (2025-03-05T09:47:02Z)
- HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models [13.963676467274109]
We extend the capabilities of HELPER by expanding its memory with a wider array of examples and prompts.
This simple expansion of HELPER into a shared memory enables the agent to work across domains, executing plans from dialogue, natural language instruction, active question asking, and commonsense room reorganization.
We evaluate the agent on four diverse interactive vision-language embodied agent benchmarks: ALFRED, TEACh, DialFRED, and the Tidy Task.
arXiv Detail & Related papers (2024-04-29T19:12:42Z)
- Inductive-Deductive Strategy Reuse for Multi-Turn Instructional Dialogues [15.959842501166511]
We propose to explicitly capture the complex rules to help the user simulator pose diverse and in-depth instructions.
Experimental results show that our method can generate diverse and in-depth instructions.
arXiv Detail & Related papers (2024-04-17T06:26:32Z)
- Context-dependent Instruction Tuning for Dialogue Response Generation [61.21790201307179]
Recent language models have achieved impressive performance in natural language tasks by incorporating instructions with task input during fine-tuning.
We introduce a context-based instruction fine-tuning framework for multi-turn dialogue, in which instructions are tailored to each dialogue context.
During the evaluation, the model generates instructions based on the previous context to self-guide the response.
arXiv Detail & Related papers (2023-11-13T01:25:30Z)
- Ada-Instruct: Adapting Instruction Generators for Complex Reasoning [14.456571495691561]
We introduce Ada-Instruct, an adaptive instruction generator developed through fine-tuning.
We empirically validate Ada-Instruct's efficacy across different applications.
arXiv Detail & Related papers (2023-10-06T13:28:04Z)
- Self-Explanation Prompting Improves Dialogue Understanding in Large Language Models [52.24756457516834]
We propose a novel "Self-Explanation" prompting strategy to enhance the comprehension abilities of Large Language Models (LLMs).
This task-agnostic approach requires the model to analyze each dialogue utterance before task execution, thereby improving performance across various dialogue-centric tasks.
Experimental results from six benchmark datasets confirm that our method consistently outperforms other zero-shot prompts and matches or exceeds the efficacy of few-shot prompts.
arXiv Detail & Related papers (2023-09-22T15:41:34Z)
- Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech [107.81472531864195]
Text language models have shown remarkable zero-shot capability in generalizing to unseen tasks when provided with well-formulated instructions.
We present Dynamic-SUPERB, a benchmark for building universal speech models capable of leveraging instruction tuning to perform multiple tasks in a zero-shot fashion.
arXiv Detail & Related papers (2023-09-18T06:43:30Z)
- Can Large Language Models Understand Real-World Complex Instructions? [54.86632921036983]
Large language models (LLMs) can understand human instructions, but struggle with complex instructions.
Existing benchmarks are insufficient to assess LLMs' ability to understand complex instructions.
We propose CELLO, a benchmark for evaluating LLMs' ability to follow complex instructions systematically.
arXiv Detail & Related papers (2023-09-17T04:18:39Z)
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks [55.42850359286304]
We propose Decomposed Prompting to solve complex tasks by decomposing them (via prompting) into simpler sub-tasks.
This modular structure allows each prompt to be optimized for its specific sub-task.
We show that the flexibility and modularity of Decomposed Prompting allows it to outperform prior work on few-shot prompting.
arXiv Detail & Related papers (2022-10-05T17:28:20Z)
- Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning [27.92734269206744]
InstructDial is an instruction tuning framework for dialogue.
It consists of a repository of 48 diverse dialogue tasks in a unified text-to-text format created from 59 openly available dialogue datasets.
Our analysis reveals that InstructDial enables good zero-shot performance on unseen datasets and tasks such as dialogue evaluation and intent detection, and even better performance in a few-shot setting.
arXiv Detail & Related papers (2022-05-25T11:37:06Z)
- CINS: Comprehensive Instruction for Few-shot Learning in Task-oriented Dialog Systems [56.302581679816775]
This paper proposes Comprehensive Instruction (CINS) that exploits PLMs with task-specific instructions.
We design a schema (definition, constraint, prompt) of instructions and their customized realizations for three important downstream tasks in ToD.
Experiments are conducted on these ToD tasks in realistic few-shot learning scenarios with small validation data.
arXiv Detail & Related papers (2021-09-10T03:23:06Z)