Auto Correcting in the Process of Translation -- Multi-task Learning
Improves Dialogue Machine Translation
- URL: http://arxiv.org/abs/2103.16189v1
- Date: Tue, 30 Mar 2021 09:12:47 GMT
- Title: Auto Correcting in the Process of Translation -- Multi-task Learning
Improves Dialogue Machine Translation
- Authors: Tao Wang, Chengqi Zhao, Mingxuan Wang, Lei Li, Deyi Xiong
- Abstract summary: We conduct a deep analysis of a dialogue corpus and summarize three major issues in dialogue translation.
We propose a joint learning method to identify omissions and typos, and to utilize context when translating dialogue utterances.
Our experiments show that the proposed method improves translation quality by 3.2 BLEU over the baselines.
- Score: 31.247920419523066
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic translation of dialogue texts is in high demand in many
real-life scenarios. However, existing neural machine translation systems
deliver unsatisfactory results on dialogue. In this paper, we conduct a deep
analysis of a dialogue corpus and summarize three major issues in dialogue
translation: pronoun dropping, punctuation dropping, and typos. In response to
these challenges, we propose a joint learning method that identifies omissions
and typos and utilizes context to translate dialogue
utterances. To properly evaluate the performance, we propose a manually
annotated dataset with 1,931 Chinese-English parallel utterances from 300
dialogues as a benchmark testbed for dialogue translation. Our experiments show
that the proposed method improves translation quality by 3.2 BLEU over the
baselines. It also raises the recovery rate of omitted pronouns from 26.09%
to 47.16%. We will release the code and dataset publicly at
https://github.com/rgwt123/DialogueMT.
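The joint learning objective described in the abstract (translation plus omission and typo identification) can be sketched as a weighted multi-task loss. The weights, task names, and numbers below are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of a multi-task training objective that combines a
# translation loss with auxiliary omission- and typo-detection losses.
# Task names and weights are assumptions for illustration only.

def multi_task_loss(losses, weights=None):
    """Combine per-task losses into a single scalar objective."""
    if weights is None:
        weights = {task: 1.0 for task in losses}
    return sum(weights[task] * loss for task, loss in losses.items())

# Example: translation dominates, detection tasks act as regularizers.
batch_losses = {
    "translation": 2.31,         # cross-entropy of the MT decoder
    "omission_detection": 0.45,  # token-level tagging loss
    "typo_detection": 0.27,      # token-level tagging loss
}
weights = {"translation": 1.0, "omission_detection": 0.5, "typo_detection": 0.5}
total = multi_task_loss(batch_losses, weights)  # 2.31 + 0.225 + 0.135 = 2.67
```

In practice the per-task losses would come from shared encoder representations with task-specific heads; the weighted sum is the standard way to train such heads jointly.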
Related papers
- Multi-turn Dialogue Comprehension from a Topic-aware Perspective [70.37126956655985]
This paper proposes to model multi-turn dialogues from a topic-aware perspective.
We use a dialogue segmentation algorithm to split a dialogue passage into topic-concentrated fragments in an unsupervised way.
We also present a novel model, Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements.
arXiv Detail & Related papers (2023-09-18T11:03:55Z) - Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue
Questions with LLMs [59.74002011562726]
We propose Cue-CoT, a novel linguistic cue-based chain-of-thought prompting method, to provide more personalized and engaging responses.
We build a benchmark with in-depth dialogue questions, consisting of 6 datasets in both Chinese and English.
Empirical results demonstrate that our proposed Cue-CoT method outperforms standard prompting methods in terms of both helpfulness and acceptability on all datasets.
arXiv Detail & Related papers (2023-05-19T16:27:43Z) - HanoiT: Enhancing Context-aware Translation via Selective Context [95.93730812799798]
Context-aware neural machine translation aims to use the document-level context to improve translation quality.
The irrelevant or trivial words may bring some noise and distract the model from learning the relationship between the current sentence and the auxiliary context.
We propose a novel end-to-end encoder-decoder model with a layer-wise selection mechanism to sift and refine the long document context.
arXiv Detail & Related papers (2023-01-17T12:07:13Z) - Don't Copy the Teacher: Data and Model Challenges in Embodied Dialogue [92.01165203498299]
Embodied dialogue instruction following requires an agent to complete a complex sequence of tasks from a natural language exchange.
This paper argues that imitation learning (IL) and related low-level metrics are actually misleading and do not align with the goals of embodied dialogue research.
arXiv Detail & Related papers (2022-10-10T05:51:40Z) - Controllable Dialogue Simulation with In-Context Learning [39.04491297557292]
Dialogic is a dialogue simulation method based on in-context learning with large language models.
Our method can rapidly expand a small set of dialogue data with minimum or zero human involvement.
Our simulated dialogues have near-human fluency and annotation accuracy.
arXiv Detail & Related papers (2022-10-09T06:32:58Z) - Contextual Semantic Parsing for Multilingual Task-Oriented Dialogues [7.8378818005171125]
Given a large-scale dialogue dataset in one language, we can automatically produce effective semantic parsers for other languages using machine translation.
We propose automatic translation of dialogue datasets with alignment to ensure faithful translation of slot values.
We show that the succinct representation reduces the compounding effect of translation errors.
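The slot-value alignment idea above can be sketched with a common placeholder technique: wrap slot values in markers before translation so they survive the (hypothetical) MT step and can be restored afterwards. The function names and marker format are illustrative assumptions, not the paper's actual method.

```python
def protect_slots(utterance, slot_values):
    """Replace each slot value with a numbered placeholder before
    translation, so the MT system cannot garble or drop it."""
    mapping = {}
    for i, value in enumerate(slot_values):
        marker = f"[SLOT{i}]"
        mapping[marker] = value
        utterance = utterance.replace(value, marker)
    return utterance, mapping

def restore_slots(translated, mapping):
    """Put the original slot values back into the translated sentence."""
    for marker, value in mapping.items():
        translated = translated.replace(marker, value)
    return translated

# Example: placeholders pass through translation unchanged.
protected, mapping = protect_slots(
    "book a table at Golden Dragon for 7pm", ["Golden Dragon", "7pm"]
)
# protected == "book a table at [SLOT0] for [SLOT1]"
restored = restore_slots(protected, mapping)
```

The faithfulness property the summary mentions follows directly: slot values are copied verbatim rather than translated, so translation errors cannot compound inside them.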
arXiv Detail & Related papers (2021-11-04T01:08:14Z) - ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality
Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and Cherokee, an endangered language.
It supports both statistical and neural translation models and provides quality estimation to inform users of reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z) - Document-aligned Japanese-English Conversation Parallel Corpus [4.793904440030568]
Sentence-level (SL) machine translation (MT) has reached acceptable quality for many high-resource languages, but document-level (DL) MT has not.
We present a document-aligned Japanese-English conversation corpus, including balanced, high-quality business conversation data for tuning and testing.
We train MT models using our corpus to demonstrate how using context leads to improvements.
arXiv Detail & Related papers (2020-12-11T06:03:33Z) - Rethinking Dialogue State Tracking with Reasoning [76.0991910623001]
This paper proposes to track dialogue states gradually with reasoning over dialogue turns with the help of the back-end data.
Empirical results demonstrate that our method significantly outperforms the state-of-the-art methods by 38.6% in terms of joint belief accuracy for MultiWOZ 2.1.
arXiv Detail & Related papers (2020-05-27T02:05:33Z) - Diversifying Dialogue Generation with Non-Conversational Text [38.03510529185192]
We propose a new perspective to diversify dialogue generation by leveraging non-conversational text.
We collect a large-scale non-conversational corpus from multiple sources, including forum comments, idioms, and book snippets.
The resulting model is tested on two conversational datasets and is shown to produce significantly more diverse responses without sacrificing the relevance with context.
arXiv Detail & Related papers (2020-05-09T02:16:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.