SGD-X: A Benchmark for Robust Generalization in Schema-Guided Dialogue
Systems
- URL: http://arxiv.org/abs/2110.06800v1
- Date: Wed, 13 Oct 2021 15:38:29 GMT
- Title: SGD-X: A Benchmark for Robust Generalization in Schema-Guided Dialogue
Systems
- Authors: Harrison Lee and Raghav Gupta and Abhinav Rastogi and Yuan Cao and Bin
Zhang and Yonghui Wu
- Abstract summary: We release SGD-X, a benchmark for measuring robustness of dialogue systems to linguistic variations in schemas.
We evaluate two dialogue state tracking models on SGD-X and observe that neither generalizes well across schema variations.
We present a simple model-agnostic data augmentation method to improve schema robustness and zero-shot generalization to unseen services.
- Score: 26.14268488547028
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Zero/few-shot transfer to unseen services is a critical challenge in
task-oriented dialogue research. The Schema-Guided Dialogue (SGD) dataset
introduced a paradigm for enabling models to support an unlimited number of
services without additional data collection or re-training through the use of
schemas. Schemas describe service APIs in natural language, which models
consume to understand the services they need to support. However, the impact of
the choice of language in these schemas on model performance remains
unexplored. We address this by releasing SGD-X, a benchmark for measuring the
robustness of dialogue systems to linguistic variations in schemas. SGD-X
extends the SGD dataset with crowdsourced variants for every schema, where
variants are semantically similar yet stylistically diverse. We evaluate two
dialogue state tracking models on SGD-X and observe that neither generalizes
well across schema variations, measured by joint goal accuracy and a novel
metric for measuring schema sensitivity. Furthermore, we present a simple
model-agnostic data augmentation method to improve schema robustness and
zero-shot generalization to unseen services.
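To make the evaluation setup concrete, below is a minimal sketch in Python. All schema text, slot names, and scores are hypothetical illustrations, not the released SGD-X data or the authors' code, and it assumes a simplified form of the schema sensitivity metric: the coefficient of variation of joint goal accuracy (JGA) across the original schema and its variants.

```python
from statistics import mean, pstdev

# Hypothetical SGD-style schema: a service API described in natural language.
original_schema = {
    "service_name": "Flights_1",
    "description": "Search for and book one-way or round-trip flights",
    "slots": [
        {"name": "origin", "description": "City where the trip starts"},
        {"name": "destination", "description": "City where the trip ends"},
    ],
}

# An SGD-X-style variant: semantically similar, stylistically different wording.
variant_schema = {
    "service_name": "Flights_1",
    "description": "Find and reserve plane tickets",
    "slots": [
        {"name": "origin", "description": "Departure city for the journey"},
        {"name": "destination", "description": "Arrival city for the journey"},
    ],
}

def schema_sensitivity(jga_per_variant: list[float]) -> float:
    """Coefficient of variation of JGA across schema variants: higher values
    mean predictions shift more when only the schema wording changes
    (an assumed, simplified stand-in for the paper's metric)."""
    mu = mean(jga_per_variant)
    return pstdev(jga_per_variant) / mu if mu else 0.0

# Illustrative (made-up) JGA scores on the original schema and five variants.
jga_scores = [0.72, 0.69, 0.64, 0.70, 0.61, 0.66]
print(f"Mean JGA: {mean(jga_scores):.3f}")
print(f"Schema sensitivity (CV of JGA): {schema_sensitivity(jga_scores):.3f}")
```

Under the same assumptions, the model-agnostic data augmentation mentioned above amounts to training on a mixture of original and paraphrased schema descriptions, so that the tracker cannot overfit to a single writing style.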
Related papers
- Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, composed of candidate generation, refinement, and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z)
- LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments [70.91258869156353]
We introduce LangSuitE, a versatile and simulation-free testbed featuring 6 representative embodied tasks in textual embodied worlds.
Compared with previous LLM-based testbeds, LangSuitE offers adaptability to diverse environments without multiple simulation engines.
We devise a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states w.r.t. history information.
arXiv Detail & Related papers (2024-06-24T03:36:29Z)
- Schema Graph-Guided Prompt for Multi-Domain Dialogue State Tracking [16.955887768832046]
We propose a graph-based framework that learns domain-specific prompts by incorporating the dialogue schema.
Specifically, we embed domain-specific schema encoded by a graph neural network into the pre-trained language model.
Our experiments demonstrate that the proposed graph-based method outperforms other multi-domain DST approaches.
arXiv Detail & Related papers (2023-11-10T19:00:02Z)
- Span-Selective Linear Attention Transformers for Effective and Robust Schema-Guided Dialogue State Tracking [7.176787451868171]
We introduce SPLAT, a novel architecture which achieves better generalization and efficiency than prior approaches.
We demonstrate the effectiveness of our model on the Schema-Guided Dialogue (SGD) and MultiWOZ datasets.
arXiv Detail & Related papers (2023-06-15T17:59:31Z)
- More Robust Schema-Guided Dialogue State Tracking via Tree-Based Paraphrase Ranking [0.0]
Fine-tuned language models excel at schema-guided dialogue state tracking (DST).
We propose a framework for generating synthetic schemas which uses tree-based ranking to jointly optimise diversity and semantic faithfulness.
arXiv Detail & Related papers (2023-03-17T11:43:08Z)
- A Multi-Task BERT Model for Schema-Guided Dialogue State Tracking [78.2700757742992]
Task-oriented dialogue systems often employ a Dialogue State Tracker (DST) to successfully complete conversations.
Recent state-of-the-art DST implementations rely on schemata of diverse services to improve model robustness.
We propose a single multi-task BERT-based model that jointly solves the three DST tasks of intent prediction, requested slot prediction and slot filling.
arXiv Detail & Related papers (2022-07-02T13:27:59Z)
- Show, Don't Tell: Demonstrations Outperform Descriptions for Schema-Guided Task-Oriented Dialogue [27.43338545216015]
Show, Don't Tell is a prompt format for seq2seq modeling which uses a short labeled example dialogue to show the semantics of schema elements.
While requiring similar effort from service developers, we show that using short examples as schema representations with large language models results in stronger performance and better generalization.
arXiv Detail & Related papers (2022-04-08T23:27:18Z)
- Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation [70.81596088969378]
The Cross-lingual Outline-based Dialogue dataset (COD) enables natural language understanding, dialogue state tracking, and end-to-end dialogue modelling and evaluation in 4 diverse languages.
arXiv Detail & Related papers (2022-01-31T18:11:21Z)
- SGD-QA: Fast Schema-Guided Dialogue State Tracking for Unseen Services [15.21976869687864]
We propose SGD-QA, a model for schema-guided dialogue state tracking based on a question answering approach.
The proposed multi-pass model shares a single encoder between the domain information and dialogue utterance.
The model improves performance on unseen services by at least 1.6x compared to single-pass baseline models.
arXiv Detail & Related papers (2021-05-17T17:54:32Z)
- RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems [75.87418236410296]
We introduce the RADDLE benchmark, a collection of corpora and tools for evaluating the performance of models across a diverse set of domains.
RADDLE is designed to favor and encourage models with a strong generalization ability.
We evaluate recent state-of-the-art systems based on pre-training and fine-tuning, and find that grounded pre-training on heterogeneous dialog corpora performs better than training a separate model per domain.
arXiv Detail & Related papers (2020-12-29T08:58:49Z)
- Few-shot Natural Language Generation for Task-Oriented Dialog [113.07438787659859]
We present FewShotWoz, the first NLG benchmark to simulate the few-shot learning setting in task-oriented dialog systems.
We develop the SC-GPT model, which is pre-trained on a large set of annotated NLG corpora to acquire controllable generation ability.
Experiments on FewShotWoz and the large Multi-Domain-WOZ datasets show that the proposed SC-GPT significantly outperforms existing methods.
arXiv Detail & Related papers (2020-02-27T18:48:33Z)