Schema-Guided Semantic Accuracy: Faithfulness in Task-Oriented Dialogue Response Generation
- URL: http://arxiv.org/abs/2301.12568v1
- Date: Sun, 29 Jan 2023 22:32:48 GMT
- Title: Schema-Guided Semantic Accuracy: Faithfulness in Task-Oriented Dialogue Response Generation
- Authors: Jinghong Chen, Weizhe Lin and Bill Byrne
- Abstract summary: We propose Schema-Guided Semantic Accuracy (SGSAcc) to evaluate utterances generated from both categorical and non-categorical slots.
We show that SGSAcc can be applied to evaluate utterances generated from a wide range of dialogue actions with good agreement with human judgment.
We also identify a previously overlooked weakness in generating faithful utterances from categorical slots in unseen domains.
- Score: 12.165005406799134
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensuring that generated utterances are faithful to dialogue actions is
crucial for Task-Oriented Dialogue Response Generation. Slot Error Rate (SER)
only partially measures generation quality in that it solely assesses
utterances generated from non-categorical slots whose values are expected to be
reproduced exactly. Utterances generated from categorical slots, which are more
variable, are not assessed by SER. We propose Schema-Guided Semantic Accuracy
(SGSAcc) to evaluate utterances generated from both categorical and
non-categorical slots by recognizing textual entailment. We show that SGSAcc
can be applied to evaluate utterances generated from a wide range of dialogue
actions in the Schema Guided Dialogue (SGD) dataset with good agreement with
human judgment. We also identify a previously overlooked weakness in generating
faithful utterances from categorical slots in unseen domains. We show that
prefix tuning applied to T5 generation can address this problem. We further
build an ensemble of prefix-tuning and fine-tuning models that achieves the
lowest SER reported and high SGSAcc on the SGD dataset.
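The gap the abstract describes can be made concrete. SER only checks whether non-categorical slot values are reproduced verbatim, so an utterance that paraphrases a categorical value passes SER unchecked. The following is a minimal illustrative sketch (not the authors' implementation; slot names and the example utterance are invented):

```python
# Illustrative sketch of Slot Error Rate's blind spot: only non-categorical
# slot values are checked for verbatim reproduction, so paraphrased
# categorical values (e.g. "inexpensive" for "cheap") are never verified.

def slot_errors(utterance, slots):
    """Count non-categorical slot values not reproduced exactly.

    `slots` is a list of (name, value, is_categorical) triples.
    """
    return sum(
        1 for name, value, categorical in slots
        if not categorical and value.lower() not in utterance.lower()
    )

utterance = "Sure, I booked a table at Nandos for 7 pm; it is inexpensive."
slots = [
    ("restaurant", "Nandos", False),  # non-categorical: must appear verbatim
    ("time", "7 pm", False),          # non-categorical: must appear verbatim
    ("price_range", "cheap", True),   # categorical: paraphrased, never checked
]

print(slot_errors(utterance, slots))  # 0 errors, yet "cheap" was not verified
```

SGSAcc closes this gap by instead asking a textual-entailment (NLI) model whether the utterance entails a premise built from the schema and slot values, which handles paraphrased categorical values that exact matching misses.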
Related papers
- Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems [57.16442740983528]
Crowdsourced labels play a crucial role in evaluating task-oriented dialogue systems.
Previous studies suggest using only a portion of the dialogue context in the annotation process.
This study investigates the influence of dialogue context on annotation quality.
arXiv Detail & Related papers (2024-04-15T17:56:39Z) - SIG: Speaker Identification in Literature via Prompt-Based Generation [13.042070464592374]
We propose a generation-based method that verbalizes the task and quotation input based on designed prompt templates.
The prediction can either come from direct generation by the model, or be determined by the highest generation probability of each speaker candidate.
We perform both cross-domain evaluation and in-domain evaluation on PDNC, the largest dataset of this task.
arXiv Detail & Related papers (2023-12-22T10:29:18Z) - InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems [60.53276524369498]
Large language models (LLMs) have been used for diverse tasks in natural language processing (NLP).
We present InstructTODS, a novel framework for zero-shot end-to-end task-oriented dialogue systems.
InstructTODS generates a proxy belief state that seamlessly translates user intentions into dynamic queries.
arXiv Detail & Related papers (2023-10-13T06:36:26Z) - Unsupervised Dialogue Topic Segmentation with Topic-aware Utterance Representation [51.22712675266523]
Dialogue Topic Segmentation (DTS) plays an essential role in a variety of dialogue modeling tasks.
We propose a novel unsupervised DTS framework, which learns topic-aware utterance representations from unlabeled dialogue data.
arXiv Detail & Related papers (2023-05-04T11:35:23Z) - More Robust Schema-Guided Dialogue State Tracking via Tree-Based Paraphrase Ranking [0.0]
Fine-tuned language models excel at schema-guided dialogue state tracking (DST).
We propose a framework for generating synthetic schemas which uses tree-based ranking to jointly optimise diversity and semantic faithfulness.
arXiv Detail & Related papers (2023-03-17T11:43:08Z) - SWING: Balancing Coverage and Faithfulness for Dialogue Summarization [67.76393867114923]
We propose to utilize natural language inference (NLI) models to improve coverage while avoiding factual inconsistencies.
We use NLI to compute fine-grained training signals that encourage the model to generate content from the reference summaries that has not yet been covered.
Experiments on the DialogSum and SAMSum datasets confirm the effectiveness of the proposed approach.
arXiv Detail & Related papers (2023-01-25T09:33:11Z) - Dialogue Meaning Representation for Task-Oriented Dialogue Systems [51.91615150842267]
We propose Dialogue Meaning Representation (DMR), a flexible and easily extendable representation for task-oriented dialogue.
Our representation contains a set of nodes and edges with inheritance hierarchy to represent rich semantics for compositional semantics and task-specific concepts.
We propose two evaluation tasks to evaluate different machine learning based dialogue models, and further propose a novel coreference resolution model GNNCoref for the graph-based coreference resolution task.
arXiv Detail & Related papers (2022-04-23T04:17:55Z) - SGD-X: A Benchmark for Robust Generalization in Schema-Guided Dialogue Systems [26.14268488547028]
We release SGD-X, a benchmark for measuring robustness of dialogue systems to linguistic variations in schemas.
We evaluate two dialogue state tracking models on SGD-X and observe that neither generalizes well across schema variations.
We present a simple model-agnostic data augmentation method to improve schema robustness and zero-shot generalization to unseen services.
arXiv Detail & Related papers (2021-10-13T15:38:29Z) - Zero-shot Generalization in Dialog State Tracking through Generative Question Answering [10.81203437307028]
We introduce a novel framework that supports natural language queries for unseen constraints and slots in task-oriented dialogs.
Our approach is based on generative question-answering using a conditional domain model pre-trained on substantive English sentences.
arXiv Detail & Related papers (2021-01-20T21:47:20Z) - End-to-end speech-to-dialog-act recognition [38.58540444573232]
We present an end-to-end model which directly converts speech into dialog acts without the deterministic transcription process.
In the proposed model, the dialog act recognition network is coupled with an acoustic-to-word ASR model at its latent layer.
The entire network is fine-tuned in an end-to-end manner.
arXiv Detail & Related papers (2020-04-23T18:44:27Z) - Few-shot Natural Language Generation for Task-Oriented Dialog [113.07438787659859]
We present FewShotWoz, the first NLG benchmark to simulate the few-shot learning setting in task-oriented dialog systems.
We develop the SC-GPT model, which is pre-trained on a large annotated NLG corpus to acquire controllable generation ability.
Experiments on FewShotWoz and the large Multi-Domain-WOZ datasets show that the proposed SC-GPT significantly outperforms existing methods.
arXiv Detail & Related papers (2020-02-27T18:48:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.