Hierarchical Pre-training for Sequence Labelling in Spoken Dialog
- URL: http://arxiv.org/abs/2009.11152v3
- Date: Mon, 8 Feb 2021 13:49:19 GMT
- Title: Hierarchical Pre-training for Sequence Labelling in Spoken Dialog
- Authors: Emile Chapuis, Pierre Colombo, Matteo Manica, Matthieu Labeau, Chloe Clavel
- Abstract summary: We propose a new approach to learn generic representations adapted to spoken dialog.
We obtain our representations with a hierarchical encoder based on transformer architectures.
Pre-training is performed on OpenSubtitles: a large corpus of spoken dialog containing over $2.3$ billion tokens.
- Score: 10.216901061363641
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequence labelling tasks like Dialog Act and Emotion/Sentiment identification
are a key component of spoken dialog systems. In this work, we propose a new
approach to learn generic representations adapted to spoken dialog, which we
evaluate on a new benchmark we call Sequence labellIng evaLuatIon benChmark fOr
spoken laNguagE benchmark (\texttt{SILICONE}). \texttt{SILICONE} is
model-agnostic and contains 10 different datasets of various sizes. We obtain
our representations with a hierarchical encoder based on transformer
architectures, for which we extend two well-known pre-training objectives.
Pre-training is performed on OpenSubtitles: a large corpus of spoken dialog
containing over $2.3$ billion tokens. We demonstrate how hierarchical
encoders achieve competitive results with consistently fewer parameters
compared to state-of-the-art models and we show their importance for both
pre-training and fine-tuning.
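The abstract leaves the architecture implicit, so the following is a minimal sketch of what such a hierarchical encoder typically looks like: a word-level transformer encodes each utterance into a single vector, an utterance-level transformer contextualizes those vectors across the dialog, and a linear head labels every utterance. This assumes PyTorch, mean pooling, and illustrative hyperparameters; none of the names or sizes below come from the paper.

```python
import torch
import torch.nn as nn

class HierarchicalDialogEncoder(nn.Module):
    """Two-level encoder: a word-level transformer builds one vector per
    utterance, an utterance-level transformer contextualizes those vectors
    across the dialog, and a linear head labels each utterance."""

    def __init__(self, vocab_size, d_model=256, n_heads=4, n_layers=2, n_labels=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        word_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.word_encoder = nn.TransformerEncoder(word_layer, n_layers)
        utt_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.utt_encoder = nn.TransformerEncoder(utt_layer, n_layers)
        self.classifier = nn.Linear(d_model, n_labels)

    def forward(self, token_ids):
        # token_ids: (batch, n_utterances, n_tokens) of vocabulary indices.
        b, u, t = token_ids.shape
        # Encode every utterance independently at the word level.
        word_states = self.word_encoder(self.embed(token_ids.view(b * u, t)))
        # Mean-pool word states into one vector per utterance
        # (the paper's pooling choice may differ; this is an assumption).
        utt_vectors = word_states.mean(dim=1).view(b, u, -1)
        # Contextualize utterance vectors across the whole dialog.
        dialog_states = self.utt_encoder(utt_vectors)
        # One label per utterance, e.g. a dialog act or an emotion.
        return self.classifier(dialog_states)  # (batch, n_utterances, n_labels)

model = HierarchicalDialogEncoder(vocab_size=30000)
logits = model(torch.randint(1, 30000, (2, 5, 12)))  # 2 dialogs, 5 utterances, 12 tokens
```

During pre-training the labelling head would be replaced by the paper's extended objectives (for example a hierarchical variant of masked language modelling); the head shown here corresponds to fine-tuning on a sequence labelling task such as dialog act identification.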
Related papers
- TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition [51.565319173790314]
TokenSplit is a sequence-to-sequence encoder-decoder model that uses the Transformer architecture.
We show that our model achieves excellent performance in terms of separation, both with and without transcript conditioning.
We also measure the automatic speech recognition (ASR) performance and provide audio samples of speech synthesis to demonstrate the additional utility of our model.
arXiv Detail & Related papers (2023-08-21T01:52:01Z)
- Hierarchical Dialogue Understanding with Special Tokens and Turn-level Attention [19.03781524017955]
We propose a simple but effective Hierarchical Dialogue Understanding model, HiDialog.
We first insert multiple special tokens into a dialogue and propose turn-level attention to learn turn embeddings hierarchically (a loose sketch of the special-token idea appears after this list).
We evaluate our model on various dialogue understanding tasks including dialogue relation extraction, dialogue emotion recognition, and dialogue act classification.
arXiv Detail & Related papers (2023-04-29T13:53:48Z)
- DIONYSUS: A Pre-trained Model for Low-Resource Dialogue Summarization [127.714919036388]
DIONYSUS is a pre-trained encoder-decoder model for summarizing dialogues in any new domain.
Our experiments show that DIONYSUS outperforms existing methods on six datasets.
arXiv Detail & Related papers (2022-12-20T06:21:21Z)
- SPACE-2: Tree-Structured Semi-Supervised Contrastive Pre-training for Task-Oriented Dialog Understanding [68.94808536012371]
We propose a tree-structured pre-trained conversation model, which learns dialog representations from limited labeled dialogs and large-scale unlabeled dialog corpora.
Our method can achieve new state-of-the-art results on the DialoGLUE benchmark consisting of seven datasets and four popular dialog understanding tasks.
arXiv Detail & Related papers (2022-09-14T13:42:50Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- Structure Extraction in Task-Oriented Dialogues with Slot Clustering [94.27806592467537]
In task-oriented dialogues, dialogue structure has often been represented as a transition graph among dialogue states.
We propose a simple yet effective approach for structure extraction in task-oriented dialogues.
arXiv Detail & Related papers (2022-02-28T20:18:12Z)
- GALAXY: A Generative Pre-trained Model for Task-Oriented Dialog with Semi-Supervised Learning and Explicit Policy Injection [36.77204909711832]
We propose a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora.
Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency regularization term to refine the learned representation (a generic sketch of such a regularizer appears after this list).
Empirical results show that GALAXY substantially improves the performance of task-oriented dialog systems.
arXiv Detail & Related papers (2021-11-29T15:24:36Z)
- What Helps Transformers Recognize Conversational Structure? Importance of Context, Punctuation, and Labels in Dialog Act Recognition [41.1669799542627]
We apply two pre-trained transformer models to structure a conversational transcript as a sequence of dialog acts.
We find that the inclusion of a broader conversational context helps disambiguate many dialog act classes.
A detailed analysis reveals specific segmentation patterns that emerge when punctuation is absent.
arXiv Detail & Related papers (2021-07-05T21:56:00Z)
- DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances [18.199473005335093]
This paper presents DialogBERT, a novel conversational response generation model that enhances previous PLM-based dialogue models.
To efficiently capture the discourse-level coherence among utterances, we propose two training objectives: masked utterance regression and distributed utterance order ranking (the former is sketched after this list).
Experiments on three multi-turn conversation datasets show that our approach remarkably outperforms the baselines.
arXiv Detail & Related papers (2020-12-03T09:06:23Z)
- Variational Hierarchical Dialog Autoencoder for Dialog State Tracking Data Augmentation [59.174903564894954]
In this work, we extend generative data augmentation to the task of dialog state tracking for goal-oriented dialogs.
We propose the Variational Hierarchical Dialog Autoencoder (VHDA) for modeling the complete aspects of goal-oriented dialogs.
Experiments on various dialog datasets show that our model improves the downstream dialog trackers' robustness via generative data augmentation.
arXiv Detail & Related papers (2020-01-23T15:34:56Z)
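As referenced in the HiDialog entry above, the special-token idea is easy to picture: flatten a dialogue into one token sequence with a marker token in front of every turn, encode it jointly, and read each turn's embedding off the encoder state at its marker position. The sketch below illustrates only that idea under assumed names and shapes (PyTorch); it omits HiDialog's actual turn-level attention mechanism.

```python
import torch
import torch.nn as nn

class TurnTokenClassifier(nn.Module):
    """Classify each dialogue turn from the encoder state at the position
    of the special marker token inserted in front of that turn."""

    def __init__(self, vocab_size, d_model=256, n_heads=4, n_layers=2, n_labels=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_labels)

    def forward(self, token_ids, turn_mask):
        # token_ids: (batch, seq_len); turn_mask: (batch, seq_len) bool,
        # True exactly where an inserted turn-marker token sits.
        states = self.encoder(self.embed(token_ids))
        turn_embeddings = states[turn_mask]   # (total_turns, d_model)
        return self.head(turn_embeddings)     # one label per turn
```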
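The GALAXY entry names a consistency regularization term without defining it. A common generic form of such a regularizer, not necessarily GALAXY's exact one, is a symmetric KL divergence between two predictions for the same input (for example, two forward passes with different dropout masks):

```python
import torch.nn.functional as F

def consistency_loss(logits_a, logits_b):
    """Symmetric KL between two predicted distributions for the same input,
    e.g. from two dropout-perturbed forward passes."""
    p = F.log_softmax(logits_a, dim=-1)
    q = F.log_softmax(logits_b, dim=-1)
    return 0.5 * (F.kl_div(p, q, log_target=True, reduction="batchmean")
                  + F.kl_div(q, p, log_target=True, reduction="batchmean"))
```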
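Finally, masked utterance regression, named in the DialogBERT entry, can be pictured as hiding one utterance vector, re-encoding the dialogue, and regressing the hidden vector back. This is a loose sketch under assumed names and shapes, not the authors' code:

```python
import torch
import torch.nn.functional as F

def masked_utterance_regression_loss(utt_vectors, context_encoder, mask_vector, mask_pos):
    # utt_vectors: (batch, n_utterances, d_model) utterance embeddings.
    # context_encoder: any module mapping that tensor to same-shape outputs.
    # mask_vector: learned (d_model,) replacement embedding.
    # mask_pos: index of the utterance to hide.
    target = utt_vectors[:, mask_pos].detach()  # regression target (gradients stopped, a simplification)
    corrupted = utt_vectors.clone()
    corrupted[:, mask_pos] = mask_vector        # hide the chosen utterance
    predicted = context_encoder(corrupted)[:, mask_pos]
    return F.mse_loss(predicted, target)
```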