Towards Making the Most of Dialogue Characteristics for Neural Chat Translation
- URL: http://arxiv.org/abs/2109.00668v1
- Date: Thu, 2 Sep 2021 02:04:00 GMT
- Title: Towards Making the Most of Dialogue Characteristics for Neural Chat Translation
- Authors: Yunlong Liang, Chulun Zhou, Fandong Meng, Jinan Xu, Yufeng Chen, Jinsong Su and Jie Zhou
- Abstract summary: We propose to promote chat translation by introducing the modeling of dialogue characteristics into the NCT model.
We optimize the NCT model through the training objectives of all these tasks.
Comprehensive experiments on four language directions verify the effectiveness and superiority of the proposed approach.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural Chat Translation (NCT) aims to translate conversational text between
speakers of different languages. Despite the promising performance of
sentence-level and context-aware neural machine translation models, there still
remain limitations in current NCT models because the inherent dialogue
characteristics of chat, such as dialogue coherence and speaker personality,
are neglected. In this paper, we propose to promote chat translation by
introducing the modeling of dialogue characteristics into the NCT model. To
this end, we design four auxiliary tasks including monolingual response
generation, cross-lingual response generation, next utterance discrimination,
and speaker identification. Together with the main chat translation task, we
optimize the NCT model through the training objectives of all these tasks. By
this means, the NCT model can be enhanced by capturing the inherent dialogue
characteristics, thus generating more coherent and speaker-relevant
translations. Comprehensive experiments on four language directions
(English-German and English-Chinese) verify the effectiveness and superiority
of the proposed approach.
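To make the joint training concrete, one plausible form of the overall objective is sketched below; the abstract does not give the exact weighting scheme, so the interpolation weights $\alpha_k$ are an assumption:

$$\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{NCT}}(\theta) + \sum_{k \in \{\mathrm{MRG},\,\mathrm{CRG},\,\mathrm{NUD},\,\mathrm{SI}\}} \alpha_k \, \mathcal{L}_{k}(\theta)$$

where $\mathcal{L}_{\mathrm{NCT}}$ is the chat translation loss and $\mathcal{L}_{\mathrm{MRG}}$, $\mathcal{L}_{\mathrm{CRG}}$, $\mathcal{L}_{\mathrm{NUD}}$, and $\mathcal{L}_{\mathrm{SI}}$ are the losses of the four auxiliary tasks (monolingual response generation, cross-lingual response generation, next utterance discrimination, and speaker identification).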
Related papers
- FutureTOD: Teaching Future Knowledge to Pre-trained Language Model for Task-Oriented Dialogue
We propose a novel dialogue pre-training model, FutureTOD, which distills future knowledge to the representation of the previous dialogue context.
Our intuition is that a good dialogue representation both learns local context information and predicts future information.
arXiv Detail & Related papers (2023-06-17T10:40:07Z)
- A Multi-task Multi-stage Transitional Training Framework for Neural Chat Translation
Neural chat translation (NCT) aims to translate a cross-lingual chat between speakers of different languages.
Existing context-aware NMT models cannot achieve satisfactory performance due to the limited availability of annotated bilingual dialogues.
We propose a multi-task multi-stage transitional (MMT) training framework, where an NCT model is trained using the bilingual chat translation dataset and additional monolingual dialogues.
arXiv Detail & Related papers (2023-01-27T14:41:16Z)
- STRUDEL: Structured Dialogue Summarization for Dialogue Comprehension
Abstractive dialogue summarization has long been viewed as an important standalone task in natural language processing.
We propose a novel type of dialogue summarization task - STRUctured DiaLoguE Summarization.
We show that our STRUDEL dialogue comprehension model can significantly improve the dialogue comprehension performance of transformer encoder language models.
arXiv Detail & Related papers (2022-12-24T04:39:54Z)
- Scheduled Multi-task Learning for Neural Chat Translation
We propose a scheduled multi-task learning framework for Neural Chat Translation (NCT).
Specifically, we devise a three-stage training framework to incorporate the large-scale in-domain chat translation data into training.
Extensive experiments on four language directions verify the effectiveness and superiority of the proposed approach.
arXiv Detail & Related papers (2022-05-08T02:57:28Z)
- Back to the Future: Bidirectional Information Decoupling Network for Multi-turn Dialogue Modeling
We propose Bidirectional Information Decoupling Network (BiDeN) as a universal dialogue encoder.
BiDeN explicitly incorporates both the past and future contexts and can be generalized to a wide range of dialogue-related tasks.
Experimental results on datasets of different downstream tasks demonstrate the universality and effectiveness of our BiDeN.
arXiv Detail & Related papers (2022-04-18T03:51:46Z)
- Modeling Bilingual Conversational Characteristics for Neural Chat Translation
We aim to promote the translation quality of conversational text by modeling its bilingual conversational characteristics.
We evaluate our approach on the benchmark dataset BConTrasT (English-German) and a self-collected bilingual dialogue corpus named BMELD (English-Chinese).
Our approach notably boosts the performance over strong baselines by a large margin and significantly surpasses some state-of-the-art context-aware NMT models in terms of BLEU and TER.
arXiv Detail & Related papers (2021-07-23T12:23:34Z)
- Structured Attention for Unsupervised Dialogue Structure Induction
We propose to incorporate structured attention layers into a Variational Recurrent Neural Network (VRNN) model with discrete latent states to learn dialogue structure in an unsupervised fashion.
Compared to a vanilla VRNN, structured attention enables a model to focus on different parts of the source sentence embeddings while enforcing a structural inductive bias.
arXiv Detail & Related papers (2020-09-17T23:07:03Z)
- TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue
In this work, we unify nine human-human and multi-turn task-oriented dialogue datasets for language modeling.
To better model dialogue behavior during pre-training, we incorporate user and system tokens into the masked language modeling.
arXiv Detail & Related papers (2020-04-15T04:09:05Z)
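As a minimal sketch of how the user and system tokens described above might be injected for masked language modeling (assuming a HuggingFace-style tokenizer and model; the [USR]/[SYS] token names and the example turn are illustrative, not taken from the paper):

from transformers import BertForMaskedLM, BertTokenizer

# Start from a standard pre-trained encoder.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Prefix each turn with a speaker-role token so the masked LM can model
# dialogue behavior; the token names here are assumptions for illustration.
tokenizer.add_special_tokens({"additional_special_tokens": ["[USR]", "[SYS]"]})
model.resize_token_embeddings(len(tokenizer))

dialogue = "[USR] i need a cheap hotel [SYS] there are three options in the centre"
inputs = tokenizer(dialogue, return_tensors="pt")
outputs = model(**inputs)  # logits over the vocabulary for each position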