Towards Personalised and Document-level Machine Translation of Dialogue
- URL: http://arxiv.org/abs/2102.10979v1
- Date: Thu, 11 Feb 2021 09:18:20 GMT
- Title: Towards Personalised and Document-level Machine Translation of Dialogue
- Authors: Sebastian T. Vincent
- Abstract summary: This thesis proposal focuses on PersNMT and DocNMT for the domain of dialogue extracted from TV subtitles in five languages.
Three main challenges are addressed: (1) incorporating extra-textual information directly into NMT systems; (2) improving the machine translation of cohesion devices; and (3) reliable evaluation for PersNMT and DocNMT.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art (SOTA) neural machine translation (NMT) systems translate
texts at sentence level, ignoring context: intra-textual information, like the
previous sentence, and extra-textual information, like the gender of the
speaker. Because of that, some sentences are translated incorrectly.
Personalised NMT (PersNMT) and document-level NMT (DocNMT) incorporate this
information into the translation process. Both fields are relatively new and
previous work within them is limited. Moreover, there are no readily available
robust evaluation metrics for them, which makes it difficult to develop better
systems, as well as track global progress and compare different methods. This
thesis proposal focuses on PersNMT and DocNMT for the domain of dialogue
extracted from TV subtitles in five languages: English, Brazilian Portuguese,
German, French and Polish. Three main challenges are addressed: (1)
incorporating extra-textual information directly into NMT systems; (2)
improving the machine translation of cohesion devices; (3) reliable evaluation
for PersNMT and DocNMT.
Related papers
- Importance-Aware Data Augmentation for Document-Level Neural Machine Translation [51.74178767827934]
Document-level neural machine translation (DocNMT) aims to generate translations that are both coherent and cohesive.
Due to its longer input length and limited availability of training data, DocNMT often faces the challenge of data sparsity.
We propose a novel Importance-Aware Data Augmentation (IADA) algorithm for DocNMT that augments the training data based on token importance information estimated by the norm of hidden states and training gradients.
arXiv Detail & Related papers (2024-01-27T09:27:47Z)
- Improving Long Context Document-Level Machine Translation [51.359400776242786]
Document-level context for neural machine translation (NMT) is crucial to improve translation consistency and cohesion.
Many works have been published on the topic of document-level NMT, but most restrict the system to just local context.
We propose a constrained attention variant that focuses the attention on the most relevant parts of the sequence, while simultaneously reducing the memory consumption.
arXiv Detail & Related papers (2023-06-08T13:28:48Z)
- Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al.
We investigate the similarities and differences between the discourse structures of source and target languages.
We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z)
- Phrase-level Active Learning for Neural Machine Translation [107.28450614074002]
We propose an active learning setting where we can spend a given budget on translating in-domain data.
We select both full sentences and individual phrases from unlabelled data in the new domain for routing to human translators.
In a German-English translation task, our active learning approach achieves consistent improvements over uncertainty-based sentence selection methods.
arXiv Detail & Related papers (2021-06-21T19:20:42Z)
- Diving Deep into Context-Aware Neural Machine Translation [36.17847243492193]
This paper analyzes the performance of document-level NMT models on four diverse domains.
We find that there is no single best approach to document-level NMT, but rather that different architectures come out on top on different tasks.
arXiv Detail & Related papers (2020-10-19T13:23:12Z)
- SJTU-NICT's Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task [111.91077204077817]
We participated in four translation directions of three language pairs: English-Chinese, English-Polish, and German-Upper Sorbian.
Based on different conditions of language pairs, we have experimented with diverse neural machine translation (NMT) techniques.
In our submissions, the primary systems won first place in the English to Chinese, Polish to English, and German to Upper Sorbian translation directions.
arXiv Detail & Related papers (2020-10-11T00:40:05Z)
- Neural Machine Translation: Challenges, Progress and Future [62.75523637241876]
Machine translation (MT) is a technique that leverages computers to translate human languages automatically.
Neural machine translation (NMT) models a direct mapping between source and target languages with deep neural networks.
This article reviews the NMT framework, discusses the challenges in NMT, and introduces some exciting recent progress.
arXiv Detail & Related papers (2020-04-13T07:53:57Z)
- A Comprehensive Survey of Multilingual Neural Machine Translation [22.96845346423759]
We present a survey on multilingual neural machine translation (MNMT).
MNMT is more promising than its statistical machine translation counterpart because end-to-end modeling and distributed representations open new avenues for research on machine translation.
We first categorize various approaches based on their central use-case and then further categorize them based on resource scenarios, underlying modeling principles, core-issues and challenges.
arXiv Detail & Related papers (2020-01-04T19:38:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.