TaskDiff: A Similarity Metric for Task-Oriented Conversations
- URL: http://arxiv.org/abs/2310.15298v2
- Date: Wed, 25 Oct 2023 06:10:07 GMT
- Title: TaskDiff: A Similarity Metric for Task-Oriented Conversations
- Authors: Ankita Bhaumik, Praveen Venkateswaran, Yara Rizk, Vatche Isahagian
- Abstract summary: We present TaskDiff, a novel conversational similarity metric.
It uses different dialogue components (utterances, intents, and slots) and their distributions to compute similarity.
- Score: 6.136198298002772
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The popularity of conversational digital assistants has resulted in the
availability of large amounts of conversational data which can be utilized for
improved user experience and personalized response generation. Building these
assistants using popular large language models like ChatGPT also requires
additional emphasis on prompt engineering and evaluation methods. Textual
similarity metrics are a key ingredient for such analysis and evaluations.
While many similarity metrics have been proposed in the literature, they have
not proven effective for task-oriented conversations as they do not take
advantage of unique conversational features. To address this gap, we present
TaskDiff, a novel conversational similarity metric that utilizes different
dialogue components (utterances, intents, and slots) and their distributions to
compute similarity. Extensive experimental evaluation of TaskDiff on a
benchmark dataset demonstrates its superior performance and improved robustness
over other related approaches.
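To make the component-based idea concrete, the following is a minimal sketch, assuming each conversation is a list of turns annotated with an utterance embedding, an intent label, and slot names. The field names, the weighting scheme, and the use of a simple total-variation overlap are illustrative assumptions, not the TaskDiff formulation from the paper.

```python
# Illustrative sketch only -- NOT the TaskDiff implementation from the paper.
# Assumes each conversation is a list of turns annotated with an utterance
# embedding (from any sentence encoder), an intent label, and slot names.
from collections import Counter
import numpy as np

def _histogram(labels, vocabulary):
    """Normalized frequency distribution of labels over a shared vocabulary."""
    counts = Counter(labels)
    hist = np.array([counts.get(v, 0) for v in vocabulary], dtype=float)
    return hist / hist.sum() if hist.sum() > 0 else hist

def _cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def conversation_similarity(conv_a, conv_b, weights=(0.4, 0.3, 0.3)):
    """Combine utterance, intent, and slot similarity into one score.

    conv_a / conv_b: lists of dicts with keys "embedding", "intent", "slots".
    weights: relative importance of the three components (an assumption).
    """
    # Utterance component: mean turn embedding per conversation, compared by cosine.
    emb_a = np.mean([t["embedding"] for t in conv_a], axis=0)
    emb_b = np.mean([t["embedding"] for t in conv_b], axis=0)
    utterance_sim = _cosine(emb_a, emb_b)

    # Intent component: overlap of normalized intent distributions
    # (1 minus total variation distance).
    intents = sorted({t["intent"] for t in conv_a + conv_b})
    intent_sim = 1.0 - 0.5 * np.abs(
        _histogram([t["intent"] for t in conv_a], intents)
        - _histogram([t["intent"] for t in conv_b], intents)
    ).sum()

    # Slot component: same overlap over the slot names mentioned in each conversation.
    slots = sorted({s for t in conv_a + conv_b for s in t["slots"]})
    slot_sim = 1.0 - 0.5 * np.abs(
        _histogram([s for t in conv_a for s in t["slots"]], slots)
        - _histogram([s for t in conv_b for s in t["slots"]], slots)
    ).sum()

    w_u, w_i, w_s = weights
    return w_u * utterance_sim + w_i * intent_sim + w_s * slot_sim
```

A faithful reimplementation would use whatever distance the paper actually defines over these component distributions; the averaging and overlap above are only stand-ins that keep the sketch dependency-free.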
Related papers
- MetricPrompt: Prompting Model as a Relevance Metric for Few-shot Text Classification [65.51149771074944]
MetricPrompt eases verbalizer design difficulty by reformulating the few-shot text classification task into a text pair relevance estimation task.
We conduct experiments on three widely used text classification datasets across four few-shot settings.
Results show that MetricPrompt outperforms manual verbalizer and other automatic verbalizer design methods across all few-shot settings.
arXiv Detail & Related papers (2023-06-15T06:51:35Z)
- Unsupervised Dialogue Topic Segmentation with Topic-aware Utterance Representation [51.22712675266523]
Dialogue Topic Segmentation (DTS) plays an essential role in a variety of dialogue modeling tasks.
We propose a novel unsupervised DTS framework, which learns topic-aware utterance representations from unlabeled dialogue data.
arXiv Detail & Related papers (2023-05-04T11:35:23Z)
- FCC: Fusing Conversation History and Candidate Provenance for Contextual Response Ranking in Dialogue Systems [53.89014188309486]
We present a flexible neural framework that can integrate contextual information from multiple channels.
We evaluate our model on the MSDialog dataset widely used for evaluating conversational response ranking tasks.
arXiv Detail & Related papers (2023-03-31T23:58:28Z)
- Active Learning of Ordinal Embeddings: A User Study on Football Data [4.856635699699126]
Humans innately measure distance between instances in an unlabeled dataset using an unknown similarity function.
This work uses deep metric learning to learn these user-defined similarity functions from few annotations for a large football trajectory dataset.
arXiv Detail & Related papers (2022-07-26T07:55:23Z)
- KETOD: Knowledge-Enriched Task-Oriented Dialogue [77.59814785157877]
Existing studies in dialogue system research mostly treat task-oriented dialogue and chit-chat as separate domains.
We investigate how task-oriented dialogue and knowledge-grounded chit-chat can be effectively integrated into a single model.
arXiv Detail & Related papers (2022-05-11T16:01:03Z)
- We've had this conversation before: A Novel Approach to Measuring Dialog Similarity [9.218829323265371]
We propose a novel adaptation of the edit distance metric to the scenario of dialog similarity.
Our approach takes into account various conversation aspects such as utterance semantics, conversation flow, and the participants.
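As a rough illustration of this edit-distance idea (a sketch under assumed inputs, not the cited paper's formulation), a Levenshtein-style distance can be run over per-turn labels such as intents, with a pluggable substitution cost standing in for semantic closeness:

```python
# Illustrative sketch of an edit-distance-style dialog distance; the labels,
# cost function, and normalization are assumptions, not the cited paper's method.
def dialog_edit_distance(turns_a, turns_b, substitution_cost):
    """Levenshtein distance over two sequences of dialog turn labels.

    turns_a / turns_b: sequences of turn labels (e.g. intents).
    substitution_cost: function mapping a pair of labels to a cost in [0, 1].
    """
    n, m = len(turns_a), len(turns_b)
    # dp[i][j] = cost of aligning the first i turns of A with the first j of B.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = float(i)          # deletions
    for j in range(1, m + 1):
        dp[0][j] = float(j)          # insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + 1.0,                                   # delete
                dp[i][j - 1] + 1.0,                                   # insert
                dp[i - 1][j - 1] + substitution_cost(turns_a[i - 1],  # substitute
                                                     turns_b[j - 1]),
            )
    return dp[n][m] / max(n, m, 1)   # normalize by the longer conversation


# Example with a hypothetical 0/1 cost: identical intents align for free.
cost = lambda a, b: 0.0 if a == b else 1.0
print(dialog_edit_distance(["greet", "book_flight", "confirm"],
                           ["greet", "book_hotel", "confirm"], cost))  # ~0.33
```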
arXiv Detail & Related papers (2021-10-12T07:24:12Z)
- Phonetic Word Embeddings [1.2192936362342826]
We present a novel methodology for calculating the phonetic similarity between words taking motivation from the human perception of sounds.
This metric is employed to learn a continuous vector embedding space that groups similar sounding words together.
The efficacy of the method is presented for two different languages (English and Hindi), and performance gains over previously reported work are discussed.
arXiv Detail & Related papers (2021-09-30T01:46:01Z)
- POSSCORE: A Simple Yet Effective Evaluation of Conversational Search with Part of Speech Labelling [25.477834359694473]
Conversational search systems, such as Google Assistant and Microsoft Cortana, provide a new search paradigm in which users communicate with the search system through natural language dialogues.
We propose POSSCORE, a simple yet effective automatic evaluation method for conversational search.
We show that our metrics can correlate with human preference, achieving significant improvements over state-of-the-art baseline metrics.
arXiv Detail & Related papers (2021-09-07T12:31:29Z)
- Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding [101.24748444126982]
Decomposable tasks are complex and comprise a hierarchy of sub-tasks.
Existing benchmarks, however, typically hold out examples for only the surface-level sub-task.
We propose a framework to construct robust test sets using coordinate ascent over sub-task specific utility functions.
arXiv Detail & Related papers (2021-06-29T02:53:59Z)
- Meta-evaluation of Conversational Search Evaluation Metrics [15.942419892035124]
We systematically meta-evaluate a variety of conversational search metrics.
We find that METEOR is the best existing single-turn metric considering all three perspectives.
We also demonstrate that adapted session-based evaluation metrics can be used to measure multi-turn conversational search.
arXiv Detail & Related papers (2021-04-27T20:01:03Z)
- Learning an Unreferenced Metric for Online Dialogue Evaluation [53.38078951628143]
We propose an unreferenced automated evaluation metric that uses large pre-trained language models to extract latent representations of utterances.
We show that our model achieves higher correlation with human annotations in an online setting, while not requiring true responses for comparison during inference.
arXiv Detail & Related papers (2020-05-01T20:01:39Z)
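For intuition about the last entry above, a reference-free scorer along these lines can be sketched as the similarity between encoder representations of the dialogue context and the candidate response. The pluggable encoder and the use of raw cosine are assumptions; the cited work learns its scorer on top of such representations rather than using cosine directly.

```python
# Minimal reference-free scoring sketch: rate a candidate response by how well
# its representation matches the dialogue context, with no gold response needed.
# The encoder is pluggable (e.g. any sentence-embedding model); this is only an
# illustration, not the cited paper's trained metric.
import numpy as np

def unreferenced_score(context_turns, candidate_response, encode):
    """Score a response against its context without a reference answer.

    context_turns: list of previous utterances (strings).
    candidate_response: the system response to evaluate (string).
    encode: callable mapping a list of strings to a (num_texts, dim) array.
    """
    vectors = encode(context_turns + [candidate_response])
    context_vec = np.mean(vectors[:-1], axis=0)
    response_vec = vectors[-1]
    denom = np.linalg.norm(context_vec) * np.linalg.norm(response_vec)
    return float(context_vec @ response_vec / denom) if denom > 0 else 0.0
```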
This list is automatically generated from the titles and abstracts of the papers listed on this site.