Dialogue Term Extraction using Transfer Learning and Topological Data
Analysis
- URL: http://arxiv.org/abs/2208.10448v1
- Date: Mon, 22 Aug 2022 17:04:04 GMT
- Title: Dialogue Term Extraction using Transfer Learning and Topological Data
Analysis
- Authors: Renato Vukovic, Michael Heck, Benjamin Matthias Ruppik, Carel van
Niekerk, Marcus Zibrowius, Milica Gašić
- Abstract summary: We explore different features that can enable systems to discover realizations of domains, slots, and values in dialogues in a purely data-driven fashion.
To examine the utility of each feature set, we train a seed model based on the widely used MultiWOZ data-set.
Our method outperforms the previously proposed approach that relies solely on word embeddings.
- Score: 0.8185867455104834
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Goal oriented dialogue systems were originally designed as a natural language
interface to a fixed data-set of entities that users might inquire about,
further described by domain, slots, and values. As we move towards adaptable
dialogue systems where knowledge about domains, slots, and values may change,
there is an increasing need to automatically extract these terms from raw
dialogues or related non-dialogue data on a large scale. In this paper, we take
an important step in this direction by exploring different features that can
enable systems to discover realizations of domains, slots, and values in
dialogues in a purely data-driven fashion. The features that we examine stem
from word embeddings, language modelling features, as well as topological
features of the word embedding space. To examine the utility of each feature
set, we train a seed model based on the widely used MultiWOZ data-set. Then, we
apply this model to a different corpus, the Schema-Guided Dialogue data-set.
Our method outperforms the previously proposed approach that relies solely on
word embeddings. We also demonstrate that each of the features is responsible
for discovering different kinds of content. We believe our results warrant
further research towards ontology induction, and continued harnessing of
topological data analysis for dialogue and natural language processing
research.
Related papers
- Bridging Information Gaps in Dialogues With Grounded Exchanges Using Knowledge Graphs [4.449835214520727]
We study the potential of large language models for conversational grounding.
Our approach involves annotating human conversations across five knowledge domains to create a new dialogue corpus called BridgeKG.
Our findings offer insights into how these models use in-context learning for conversational grounding tasks and common prediction errors.
arXiv Detail & Related papers (2024-08-02T08:07:15Z)
- CGoDial: A Large-Scale Benchmark for Chinese Goal-oriented Dialog Evaluation [75.60156479374416]
CGoDial is a new challenging and comprehensive Chinese benchmark for Goal-oriented Dialog evaluation.
It contains 96,763 dialog sessions and 574,949 dialog turns in total, covering three datasets with different knowledge sources.
To bridge the gap between academic benchmarks and spoken dialog scenarios, we either collect data from real conversations or add spoken features to existing datasets via crowd-sourcing.
arXiv Detail & Related papers (2022-11-21T16:21:41Z)
- Manual-Guided Dialogue for Flexible Conversational Agents [84.46598430403886]
How to build and use dialogue data efficiently, and how to deploy models in different domains at scale can be critical issues in building a task-oriented dialogue system.
We propose a novel manual-guided dialogue scheme, where the agent learns the tasks from both dialogue and manuals.
Our proposed scheme reduces the dependence of dialogue models on fine-grained domain ontology, and makes them more flexible to adapt to various domains.
arXiv Detail & Related papers (2022-08-16T08:21:12Z)
- Building a Role Specified Open-Domain Dialogue System Leveraging Large-Scale Language Models [15.062014096238803]
We study the challenge of imposing roles on open-domain dialogue systems.
We propose an efficient data collection framework for building a role-satisfying dialogue dataset from scratch.
Our models return few out-of-bounds utterances, keeping competitive performance on general metrics.
arXiv Detail & Related papers (2022-04-30T06:23:06Z)
- Structure Extraction in Task-Oriented Dialogues with Slot Clustering [94.27806592467537]
In task-oriented dialogues, dialogue structure has often been considered as transition graphs among dialogue states.
We propose a simple yet effective approach for structure extraction in task-oriented dialogues.
arXiv Detail & Related papers (2022-02-28T20:18:12Z)
- Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation [70.81596088969378]
The Cross-lingual Outline-based Dialogue dataset (termed COD) enables natural language understanding, dialogue state tracking, and end-to-end dialogue modelling and evaluation in 4 diverse languages.
arXiv Detail & Related papers (2022-01-31T18:11:21Z)
- Language Model as an Annotator: Exploring DialoGPT for Dialogue Summarization [29.887562761942114]
We show how DialoGPT, a pre-trained model for conversational response generation, can be developed as an unsupervised dialogue annotator.
We apply DialoGPT to label three types of features on two dialogue summarization datasets, SAMSum and AMI, and employ pre-trained and non-pre-trained models as our summarizers.
arXiv Detail & Related papers (2021-05-26T13:50:13Z)
- RiSAWOZ: A Large-Scale Multi-Domain Wizard-of-Oz Dataset with Rich Semantic Annotations for Task-Oriented Dialogue Modeling [35.75880078666584]
RiSAWOZ is a large-scale multi-domain Chinese Wizard-of-Oz dataset with Rich Semantic Annotations.
It contains 11.2K human-to-human (H2H) multi-turn semantically annotated dialogues, with more than 150K utterances spanning over 12 domains.
arXiv Detail & Related papers (2020-10-17T08:18:59Z)
- Meta-Context Transformers for Domain-Specific Response Generation [4.377737808397113]
We present DSRNet, a transformer-based model for dialogue response generation by reinforcing domain-specific attributes.
We study the use of DSRNet in a multi-turn multi-interlocutor environment for domain-specific response generation.
Our model shows significant improvement over the state-of-the-art for multi-turn dialogue systems supported by better BLEU and semantic similarity (BertScore) scores.
arXiv Detail & Related papers (2020-10-12T09:49:27Z)
- Ranking Enhanced Dialogue Generation [77.8321855074999]
How to effectively utilize the dialogue history is a crucial problem in multi-turn dialogue generation.
Previous works usually employ various neural network architectures to model the history.
This paper proposes a Ranking Enhanced Dialogue generation framework.
arXiv Detail & Related papers (2020-08-13T01:49:56Z)
- Variational Hierarchical Dialog Autoencoder for Dialog State Tracking Data Augmentation [59.174903564894954]
In this work, we extend this approach to the task of dialog state tracking for goal-oriented dialogs.
We propose the Variational Hierarchical Dialog Autoencoder (VHDA) for modeling the complete aspects of goal-oriented dialogs.
Experiments on various dialog datasets show that our model improves the downstream dialog trackers' robustness via generative data augmentation.
arXiv Detail & Related papers (2020-01-23T15:34:56Z)