A Multi-task Multi-stage Transitional Training Framework for Neural Chat
Translation
- URL: http://arxiv.org/abs/2301.11749v1
- Date: Fri, 27 Jan 2023 14:41:16 GMT
- Title: A Multi-task Multi-stage Transitional Training Framework for Neural Chat
Translation
- Authors: Chulun Zhou, Yunlong Liang, Fandong Meng, Jie Zhou, Jinan Xu, Hongji
Wang, Min Zhang and Jinsong Su
- Abstract summary: Neural chat translation (NCT) aims to translate a cross-lingual chat between speakers of different languages.
Existing context-aware NMT models cannot achieve satisfactory performances due to limited resources of annotated bilingual dialogues.
We propose a multi-task multi-stage transitional (MMT) training framework, where an NCT model is trained using the bilingual chat translation dataset and additional monolingual dialogues.
- Score: 84.59697583372888
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Neural chat translation (NCT) aims to translate a cross-lingual chat between
speakers of different languages. Existing context-aware NMT models cannot
achieve satisfactory performances due to the following inherent problems: 1)
limited resources of annotated bilingual dialogues; 2) the neglect of modelling
conversational properties; 3) training discrepancy between different stages. To
address these issues, in this paper, we propose a multi-task multi-stage
transitional (MMT) training framework, where an NCT model is trained using the
bilingual chat translation dataset and additional monolingual dialogues. We
elaborately design two auxiliary tasks, namely utterance discrimination and
speaker discrimination, to introduce the modelling of dialogue coherence and
speaker characteristic into the NCT model. The training process consists of
three stages: 1) sentence-level pre-training on large-scale parallel corpus; 2)
intermediate training with auxiliary tasks using additional monolingual
dialogues; 3) context-aware fine-tuning with gradual transition. Particularly,
the second stage serves as an intermediate phase that alleviates the training
discrepancy between the pre-training and fine-tuning stages. Moreover, to make
the stage transition smoother, we train the NCT model using a gradual
transition strategy, i.e., gradually transiting from using monolingual to
bilingual dialogues. Extensive experiments on two language pairs demonstrate
the effectiveness and superiority of our proposed training framework.
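To make the training recipe concrete, here is a minimal Python sketch of stage 3: batches are drawn from monolingual dialogues early on and, with growing probability, from the annotated bilingual chat data, while the chat-translation loss is combined with the two auxiliary objectives. The linear schedule, the weights lambda_utt and lambda_spk, and all function names are illustrative assumptions, not details specified by the paper.

```python
import random

def bilingual_prob(step: int, total_steps: int) -> float:
    """Probability of drawing a bilingual chat batch at this step.
    A linear ramp is assumed purely for illustration; the paper only
    states that training transits gradually from monolingual to
    bilingual dialogues."""
    return min(1.0, step / max(1, total_steps))

def pick_data_pool(step: int, total_steps: int) -> str:
    """Choose the data pool for the next batch under the gradual transition."""
    if random.random() < bilingual_prob(step, total_steps):
        return "bilingual_chat_dialogues"   # annotated bilingual chat data
    return "monolingual_dialogues"          # additional monolingual dialogues

def total_loss(l_nct: float, l_utt: float, l_spk: float,
               lambda_utt: float = 1.0, lambda_spk: float = 1.0) -> float:
    """Combine the chat-translation loss with the utterance-discrimination
    and speaker-discrimination losses; the weights are hypothetical
    hyper-parameters, not values given in the abstract."""
    return l_nct + lambda_utt * l_utt + lambda_spk * l_spk

if __name__ == "__main__":
    steps = 10
    print([pick_data_pool(s, steps) for s in range(steps)])  # shifts toward bilingual
    print(total_loss(2.3, 0.7, 0.5))                          # 3.5
```

Under this sketch, stage 2 would correspond to running the same loop with the bilingual probability fixed at zero, so that only the additional monolingual dialogues and the auxiliary tasks are used before the transition begins.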
Related papers
- VECO 2.0: Cross-lingual Language Model Pre-training with
Multi-granularity Contrastive Learning [56.47303426167584]
We propose a cross-lingual pre-trained model VECO 2.0 based on contrastive learning with multi-granularity alignments.
Specifically, the sequence-to-sequence alignment is induced to maximize the similarity of parallel pairs and minimize that of non-parallel pairs (a generic contrastive-loss sketch is given after this list).
Token-to-token alignment is integrated to separate synonymous tokens, excavated via a thesaurus dictionary, from the other unpaired tokens in a bilingual instance.
arXiv Detail & Related papers (2023-04-17T12:23:41Z) - Cross-Align: Modeling Deep Cross-lingual Interactions for Word Alignment [63.0407314271459]
Experiments show that the proposed Cross-Align achieves state-of-the-art (SOTA) performance on four out of five language pairs.
arXiv Detail & Related papers (2022-10-09T02:24:35Z) - Scheduled Multi-task Learning for Neural Chat Translation [66.81525961469494]
We propose a scheduled multi-task learning framework for Neural Chat Translation (NCT).
Specifically, we devise a three-stage training framework to incorporate the large-scale in-domain chat translation data into training.
Extensive experiments in four language directions verify the effectiveness and superiority of the proposed approach.
arXiv Detail & Related papers (2022-05-08T02:57:28Z) - MVP: Multi-Stage Vision-Language Pre-Training via Multi-Level Semantic
Alignment [24.720485548282845]
We introduce concepts in both modalities to construct two-level semantic representations for language and vision.
We train the cross-modality model in two stages, namely, uni-modal learning and cross-modal learning.
Our model achieves state-of-the-art results on several vision and language tasks.
arXiv Detail & Related papers (2022-01-29T14:30:59Z) - Towards Making the Most of Dialogue Characteristics for Neural Chat
Translation [39.995680617671184]
We propose to promote chat translation by introducing the modeling of dialogue characteristics into the NCT model.
We optimize the NCT model through the training objectives of all these tasks.
Comprehensive experiments on four language directions verify the effectiveness and superiority of the proposed approach.
arXiv Detail & Related papers (2021-09-02T02:04:00Z) - Cross-lingual Transferring of Pre-trained Contextualized Language Models [73.97131976850424]
We propose a novel cross-lingual model transferring framework for PrLMs: TreLM.
To handle the symbol order and sequence length differences between languages, we propose an intermediate "TRILayer" structure.
We show the proposed framework significantly outperforms language models trained from scratch with limited data in both performance and efficiency.
arXiv Detail & Related papers (2021-07-27T06:51:13Z) - Cross-lingual Spoken Language Understanding with Regularized
Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
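The sequence-to-sequence alignment described in the VECO 2.0 entry above (pulling parallel sentence pairs together while pushing non-parallel ones apart) is commonly implemented as an InfoNCE-style contrastive loss. The sketch below is a textbook version of that idea, not VECO 2.0's actual objective; the function name and temperature value are assumptions.

```python
import math

def sequence_contrastive_loss(sim, temperature: float = 0.1) -> float:
    """InfoNCE-style loss over a batch of parallel pairs.
    sim[i][j] is the similarity between source sentence i and target
    sentence j; diagonal entries are the parallel (positive) pairs."""
    n = len(sim)
    loss = 0.0
    for i in range(n):
        logits = [s / temperature for s in sim[i]]
        log_denom = math.log(sum(math.exp(x) for x in logits))
        loss += -(logits[i] - log_denom)  # pull the parallel pair together, push others apart
    return loss / n

# Toy usage: two parallel pairs; higher diagonal similarity yields a lower loss.
print(sequence_contrastive_loss([[0.9, 0.1], [0.2, 0.8]]))
```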