Towards Generalized Models for Task-oriented Dialogue Modeling on Spoken
Conversations
- URL: http://arxiv.org/abs/2203.04045v1
- Date: Tue, 8 Mar 2022 12:26:57 GMT
- Title: Towards Generalized Models for Task-oriented Dialogue Modeling on Spoken
Conversations
- Authors: Ruijie Yan, Shuang Peng, Haitao Mi, Liang Jiang, Shihui Yang, Yuchi
Zhang, Jiajun Li, Liangrui Peng, Yongliang Wang, Zujie Wen
- Abstract summary: This paper presents our approach to building generalized models for the Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations Challenge of DSTC-10.
We employ extensive data augmentation strategies on written data, including artificial error injection and round-trip text-speech transformation.
Our approach ranks third on the objective evaluation and second on the final official human evaluation.
- Score: 22.894541507068933
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Building robust and general dialogue models for spoken conversations is
challenging due to the gap in distributions of spoken and written data. This
paper presents our approach to building generalized models for the
Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations
Challenge of DSTC-10. In order to mitigate the discrepancies between spoken and
written text, we mainly employ extensive data augmentation strategies on
written data, including artificial error injection and round-trip text-speech
transformation. To train robust models for spoken conversations, we improve
pre-trained language models, and apply ensemble algorithms for each sub-task.
Typically, for the detection task, we fine-tune \roberta and ELECTRA, and run
an error-fixing ensemble algorithm. For the selection task, we adopt a
two-stage framework that consists of entity tracking and knowledge ranking, and
propose a multi-task learning method to learn multi-level semantic information
by domain classification and entity selection. For the generation task, we
adopt a cross-validation data process to improve pre-trained generative
language models, followed by a consensus decoding algorithm, which can add
arbitrary features like the relative ROUGE metric, and tune associated feature
weights directly toward BLEU. Our approach ranks third on the objective
evaluation and second on the final official human evaluation.
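The paper does not include code for its augmentation pipeline, but the artificial error injection idea can be illustrated with a minimal, self-contained sketch. The noise types, probabilities, and filler inventory below are illustrative assumptions rather than the authors' settings; the round-trip text-speech transformation mentioned above would instead pass written utterances through TTS and then ASR to pick up realistic recognition errors.
```python
import random

# Minimal sketch of artificial error injection (illustrative, not the authors' code):
# simulate ASR-style noise on written dialogue text by lower-casing, stripping
# punctuation, and randomly dropping, repeating, or padding words with fillers.

FILLERS = ["uh", "um", "you know"]  # assumed filler inventory


def inject_asr_noise(text, p_drop=0.05, p_repeat=0.05, p_filler=0.05, seed=None):
    rng = random.Random(seed)
    # ASR transcripts are typically lower-cased and unpunctuated
    words = text.lower().replace(",", "").replace(".", "").replace("?", "").split()
    noisy = []
    for w in words:
        r = rng.random()
        if r < p_drop:
            continue                                   # word deletion error
        noisy.append(w)
        if r < p_drop + p_repeat:
            noisy.append(w)                            # repetition / disfluency
        elif r < p_drop + p_repeat + p_filler:
            noisy.append(rng.choice(FILLERS))          # inserted filler word
    return " ".join(noisy)


if __name__ == "__main__":
    print(inject_asr_noise("Is breakfast included in the room rate?", seed=0))
```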
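For the selection sub-task, the abstract describes a multi-task learner that combines domain classification and entity selection on top of a shared encoder. The sketch below shows one plausible shared-encoder, two-head arrangement; the base checkpoint, head sizes, and loss weighting are assumptions for illustration, not the authors' configuration.
```python
import torch.nn as nn
from transformers import AutoModel

# Hedged sketch of a multi-task selection model: a shared pre-trained encoder
# with one head for domain classification and one for entity selection, trained
# with a weighted sum of the two losses.


class MultiTaskSelector(nn.Module):
    def __init__(self, model_name="roberta-base", num_domains=4, alpha=0.5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.domain_head = nn.Linear(hidden, num_domains)  # which domain (hotel, restaurant, ...)
        self.entity_head = nn.Linear(hidden, 1)            # does this candidate entity match?
        self.alpha = alpha                                  # assumed loss mixing weight

    def forward(self, input_ids, attention_mask, domain_labels=None, entity_labels=None):
        # Use the [CLS]-position representation as a pooled summary of the dialogue context
        cls = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state[:, 0]
        domain_logits = self.domain_head(cls)
        entity_logits = self.entity_head(cls).squeeze(-1)
        loss = None
        if domain_labels is not None and entity_labels is not None:
            loss = (self.alpha * nn.functional.cross_entropy(domain_logits, domain_labels)
                    + (1 - self.alpha) * nn.functional.binary_cross_entropy_with_logits(
                        entity_logits, entity_labels.float()))
        return domain_logits, entity_logits, loss
```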
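For the generation sub-task, consensus decoding re-scores candidate responses produced by an ensemble of generators using inter-candidate agreement features. A hedged sketch of one such re-ranking step is given below, with a toy ROUGE-1 F1 standing in for the relative ROUGE feature and a single mixing weight in place of feature weights tuned toward BLEU.
```python
from collections import Counter
from typing import List

# Illustrative consensus re-ranking: each candidate is scored by its average
# ROUGE-1 F1 against the other candidates (agreement), mixed with its model
# score via a tunable weight w. Not the authors' exact algorithm.


def rouge1_f1(hyp: str, ref: str) -> float:
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    p, rec = overlap / sum(h.values()), overlap / sum(r.values())
    return 2 * p * rec / (p + rec)


def consensus_rerank(candidates: List[str], model_scores: List[float], w: float = 0.5) -> str:
    best, best_score = candidates[0], float("-inf")
    for i, cand in enumerate(candidates):
        others = [c for j, c in enumerate(candidates) if j != i]
        consensus = sum(rouge1_f1(cand, o) for o in others) / max(len(others), 1)
        score = w * consensus + (1 - w) * model_scores[i]
        if score > best_score:
            best, best_score = cand, score
    return best


if __name__ == "__main__":
    cands = ["yes breakfast is included",
             "breakfast is included in the rate",
             "no pets are allowed"]
    print(consensus_rerank(cands, model_scores=[0.3, 0.5, 0.2]))
```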
Related papers
- Learning Phonotactics from Linguistic Informants [54.086544221761486]
Our model iteratively selects or synthesizes a data-point according to one of a range of information-theoretic policies.
We find that the information-theoretic policies that our model uses to select items to query the informant achieve sample efficiency comparable to, or greater than, fully supervised approaches.
arXiv Detail & Related papers (2024-05-08T00:18:56Z)
- Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model [57.78191634042409]
We propose Pseudo-Word HuBERT (PW-HuBERT), a framework that integrates pseudo word-level targets into the training process.
Our experimental results on four spoken language understanding (SLU) benchmarks suggest the superiority of our model in capturing semantic information.
arXiv Detail & Related papers (2024-02-08T16:55:21Z)
- Pre-training Multi-party Dialogue Models with Latent Discourse Inference [85.9683181507206]
We pre-train a model that understands the discourse structure of multi-party dialogues, namely, to whom each utterance is replying.
To fully utilize the unlabeled data, we propose to treat the discourse structures as latent variables, then jointly infer them and pre-train the discourse-aware model.
arXiv Detail & Related papers (2023-05-24T14:06:27Z)
- Conversation Disentanglement with Bi-Level Contrastive Learning [26.707584899718288]
Existing methods overemphasize pairwise utterance relations but pay inadequate attention to utterance-to-context relation modeling.
We propose a general disentanglement model based on bi-level contrastive learning. It pulls utterances in the same session closer together while encouraging each utterance to stay near its clustered session prototype in the representation space.
arXiv Detail & Related papers (2022-10-27T08:41:46Z)
- OPAL: Ontology-Aware Pretrained Language Model for End-to-End Task-Oriented Dialogue [40.62090743056549]
This paper presents an ontology-aware pretrained language model (OPAL) for end-to-end task-oriented dialogue (TOD).
Unlike chit-chat dialogue models, task-oriented dialogue models include at least two task-specific modules: a dialogue state tracker (DST) and a response generator (RG).
arXiv Detail & Related papers (2022-09-10T04:38:27Z)
- Probing Task-Oriented Dialogue Representation from Language Models [106.02947285212132]
This paper investigates pre-trained language models to find out which model intrinsically carries the most informative representation for task-oriented dialogue tasks.
We fine-tune a feed-forward layer as the classifier probe on top of a fixed pre-trained language model with annotated labels in a supervised way.
arXiv Detail & Related papers (2020-10-26T21:34:39Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
- An Empirical Investigation of Pre-Trained Transformer Language Models for Open-Domain Dialogue Generation [23.343006562849126]
We present an empirical investigation of pre-trained Transformer-based auto-regressive language models for the task of open-domain dialogue generation.
The pre-training and fine-tuning paradigm is employed for learning.
Experiments are conducted on typical single-turn and multi-turn dialogue corpora such as Weibo, Douban, Reddit, DailyDialog, and Persona-Chat.
arXiv Detail & Related papers (2020-03-09T15:20:21Z)