Schema-Guided Semantic Accuracy: Faithfulness in Task-Oriented Dialogue Response Generation
- URL: http://arxiv.org/abs/2301.12568v1
- Date: Sun, 29 Jan 2023 22:32:48 GMT
- Title: Schema-Guided Semantic Accuracy: Faithfulness in Task-Oriented Dialogue Response Generation
- Authors: Jinghong Chen, Weizhe Lin and Bill Byrne
- Abstract summary: We propose Schema-Guided Semantic Accuracy (SGSAcc) to evaluate utterances generated from both categorical and non-categorical slots.
We show that SGSAcc can be applied to evaluate utterances generated from a wide range of dialogue actions with good agreement with human judgment.
We also identify a previously overlooked weakness in generating faithful utterances from categorical slots in unseen domains.
- Score: 12.165005406799134
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ensuring that generated utterances are faithful to dialogue actions is
crucial for Task-Oriented Dialogue Response Generation. Slot Error Rate (SER)
only partially measures generation quality in that it solely assesses
utterances generated from non-categorical slots whose values are expected to be
reproduced exactly. Utterances generated from categorical slots, which are more
variable, are not assessed by SER. We propose Schema-Guided Semantic Accuracy
(SGSAcc) to evaluate utterances generated from both categorical and
non-categorical slots by recognizing textual entailment. We show that SGSAcc
can be applied to evaluate utterances generated from a wide range of dialogue
actions in the Schema Guided Dialogue (SGD) dataset with good agreement with
human judgment. We also identify a previously overlooked weakness in generating
faithful utterances from categorical slots in unseen domains. We show that
prefix tuning applied to T5 generation can address this problem. We further
build an ensemble of prefix-tuning and fine-tuning models that achieves the
lowest SER reported and high SGSAcc on the SGD dataset.
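The gap the abstract describes can be made concrete. SER only checks whether non-categorical slot values are reproduced verbatim, so an utterance that paraphrases a categorical value passes SER unchecked. The following is a minimal illustrative sketch (not the authors' implementation; slot names and the example utterance are invented):

```python
# Illustrative sketch of Slot Error Rate's blind spot: only non-categorical
# slot values are checked for verbatim reproduction, so paraphrased
# categorical values (e.g. "inexpensive" for "cheap") are never verified.

def slot_errors(utterance, slots):
    """Count non-categorical slot values not reproduced exactly.

    `slots` is a list of (name, value, is_categorical) triples.
    """
    return sum(
        1 for name, value, categorical in slots
        if not categorical and value.lower() not in utterance.lower()
    )

utterance = "Sure, I booked a table at Nandos for 7 pm; it is inexpensive."
slots = [
    ("restaurant", "Nandos", False),  # non-categorical: must appear verbatim
    ("time", "7 pm", False),          # non-categorical: must appear verbatim
    ("price_range", "cheap", True),   # categorical: paraphrased, never checked
]

print(slot_errors(utterance, slots))  # 0 errors, yet "cheap" was not verified
```

SGSAcc closes this gap by instead asking a textual-entailment (NLI) model whether the utterance entails a premise built from the schema and slot values, which handles paraphrased categorical values that exact matching misses.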
Related papers
- Context Does Matter: Implications for Crowdsourced Evaluation Labels in Task-Oriented Dialogue Systems [57.16442740983528]
Crowdsourced labels play a crucial role in evaluating task-oriented dialogue systems.
Previous studies suggest using only a portion of the dialogue context in the annotation process.
This study investigates the influence of dialogue context on annotation quality.
arXiv Detail & Related papers (2024-04-15T17:56:39Z) - SIG: Speaker Identification in Literature via Prompt-Based Generation [13.042070464592374]
We propose a generation-based method that verbalizes the task and quotation input based on designed prompt templates.
The prediction can either come from direct generation by the model, or be determined by the highest generation probability of each speaker candidate.
We perform both cross-domain evaluation and in-domain evaluation on PDNC, the largest dataset of this task.
arXiv Detail & Related papers (2023-12-22T10:29:18Z) - InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems [60.53276524369498]
Large language models (LLMs) have been used for diverse tasks in natural language processing (NLP).
We present InstructTODS, a novel framework for zero-shot end-to-end task-oriented dialogue systems.
InstructTODS generates a proxy belief state that seamlessly translates user intentions into dynamic queries.
arXiv Detail & Related papers (2023-10-13T06:36:26Z) - Unsupervised Dialogue Topic Segmentation with Topic-aware Utterance Representation [51.22712675266523]
Dialogue Topic Segmentation (DTS) plays an essential role in a variety of dialogue modeling tasks.
We propose a novel unsupervised DTS framework, which learns topic-aware utterance representations from unlabeled dialogue data.
arXiv Detail & Related papers (2023-05-04T11:35:23Z) - More Robust Schema-Guided Dialogue State Tracking via Tree-Based Paraphrase Ranking [0.0]
Fine-tuned language models excel at schema-guided dialogue state tracking (DST).
We propose a framework for generating synthetic schemas which uses tree-based ranking to jointly optimise diversity and semantic faithfulness.
arXiv Detail & Related papers (2023-03-17T11:43:08Z) - SWING: Balancing Coverage and Faithfulness for Dialogue Summarization [67.76393867114923]
We propose to utilize natural language inference (NLI) models to improve coverage while avoiding factual inconsistencies.
We use NLI to compute fine-grained training signals that encourage the model to generate content from the reference summaries that has not yet been covered.
Experiments on the DialogSum and SAMSum datasets confirm the effectiveness of the proposed approach.
arXiv Detail & Related papers (2023-01-25T09:33:11Z) - Dialogue Meaning Representation for Task-Oriented Dialogue Systems [51.91615150842267]
We propose Dialogue Meaning Representation (DMR), a flexible and easily extendable representation for task-oriented dialogue.
Our representation contains a set of nodes and edges with inheritance hierarchy to represent rich semantics for compositional semantics and task-specific concepts.
We propose two evaluation tasks to evaluate different machine learning based dialogue models, and further propose a novel coreference resolution model GNNCoref for the graph-based coreference resolution task.
arXiv Detail & Related papers (2022-04-23T04:17:55Z) - SGD-X: A Benchmark for Robust Generalization in Schema-Guided Dialogue Systems [26.14268488547028]
We release SGD-X, a benchmark for measuring robustness of dialogue systems to linguistic variations in schemas.
We evaluate two dialogue state tracking models on SGD-X and observe that neither generalizes well across schema variations.
We present a simple model-agnostic data augmentation method to improve schema robustness and zero-shot generalization to unseen services.
arXiv Detail & Related papers (2021-10-13T15:38:29Z) - Zero-shot Generalization in Dialog State Tracking through Generative Question Answering [10.81203437307028]
We introduce a novel framework that supports natural language queries for unseen constraints and slots in task-oriented dialogs.
Our approach is based on generative question-answering using a conditional domain model pre-trained on substantive English sentences.
arXiv Detail & Related papers (2021-01-20T21:47:20Z) - End-to-end speech-to-dialog-act recognition [38.58540444573232]
We present an end-to-end model which directly converts speech into dialog acts without the deterministic transcription process.
In the proposed model, the dialog act recognition network is coupled with an acoustic-to-word ASR model at its latent layer.
The entire network is fine-tuned in an end-to-end manner.
arXiv Detail & Related papers (2020-04-23T18:44:27Z) - Few-shot Natural Language Generation for Task-Oriented Dialog [113.07438787659859]
We present FewShotWoz, the first NLG benchmark to simulate the few-shot learning setting in task-oriented dialog systems.
We develop the SC-GPT model, which is pre-trained on a large annotated NLG corpus to acquire controllable generation ability.
Experiments on FewShotWoz and the large Multi-Domain-WOZ datasets show that the proposed SC-GPT significantly outperforms existing methods.
arXiv Detail & Related papers (2020-02-27T18:48:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.