Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus
- URL: http://arxiv.org/abs/2305.11142v1
- Date: Thu, 18 May 2023 17:36:41 GMT
- Title: Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus
- Authors: Yuchen Eleanor Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Mrinmaya Sachan, Ryan Cotterell
- Abstract summary: This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al. (2022).
We investigate the similarities and differences between the discourse structures of source and target languages.
We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
- Score: 82.07304301996562
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Several recent papers claim human parity at sentence-level Machine
Translation (MT), especially in high-resource languages. Thus, in response, the
MT community has, in part, shifted its focus to document-level translation.
Translating documents requires a deeper understanding of the structure and
meaning of text, which is often captured by various kinds of discourse
phenomena such as consistency, coherence, and cohesion. However, this renders
conventional sentence-level MT evaluation benchmarks inadequate for evaluating
the performance of context-aware MT systems. This paper presents a new dataset
with rich discourse annotations, built upon the large-scale parallel corpus BWB
introduced in Jiang et al. (2022). The new BWB annotation introduces four extra
evaluation aspects, i.e., entity, terminology, coreference, and quotation,
covering 15,095 entity mentions in both languages. Using these annotations, we
systematically investigate the similarities and differences between the
discourse structures of source and target languages, and the challenges they
pose to MT. We discover that MT outputs differ fundamentally from human
translations in terms of their latent discourse structures. This gives us a new
perspective on the challenges and opportunities in document-level MT. We make
our resource publicly available to spur future research in document-level MT
and the generalization to other language translation tasks.
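To make the four annotation aspects concrete, here is a minimal sketch of a data structure one might use for a discourse-annotated sentence pair. The schema and field names are hypothetical illustrations, not the released BWB format.

```python
from dataclasses import dataclass, field


@dataclass
class Mention:
    """A single entity mention on one language side (hypothetical schema)."""
    text: str          # surface form, e.g. "小李" or "Xiao Li"
    span: tuple        # (start, end) character offsets in the sentence
    entity_id: str     # shared across languages, linking coreferent mentions
    mention_type: str  # e.g. "entity", "terminology", "coreference", "quotation"


@dataclass
class AnnotatedPair:
    """One sentence pair with discourse annotations on both sides."""
    source: str
    target: str
    source_mentions: list = field(default_factory=list)
    target_mentions: list = field(default_factory=list)


# Toy example: a coreference link that crosses the language boundary.
pair = AnnotatedPair(
    source="小李推开了门。",
    target="Xiao Li pushed the door open.",
    source_mentions=[Mention("小李", (0, 2), "E1", "entity")],
    target_mentions=[Mention("Xiao Li", (0, 7), "E1", "entity")],
)

# Aligning mentions by entity_id lets an evaluator check whether an MT
# system rendered the same entity consistently across a document.
aligned = [
    (s.text, t.text)
    for s in pair.source_mentions
    for t in pair.target_mentions
    if s.entity_id == t.entity_id
]
print(aligned)  # [('小李', 'Xiao Li')]
```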
Related papers
- Improving Long Context Document-Level Machine Translation [51.359400776242786]
Document-level context for neural machine translation (NMT) is crucial to improve translation consistency and cohesion.
Many works have been published on the topic of document-level NMT, but most restrict the system to just local context.
We propose a constrained attention variant that focuses the attention on the most relevant parts of the sequence, while simultaneously reducing the memory consumption.
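The general idea can be sketched in a few lines of numpy: restrict each query to a local window of keys so that attention is computed only over the most relevant nearby positions. This is a dense illustration of windowed attention in general, not the paper's exact variant.

```python
import numpy as np


def windowed_attention(Q, K, V, window=2):
    """Dot-product attention where each query attends only to keys within a
    fixed window around its own position. A dense sketch for clarity; a real
    implementation would materialize only the window to save memory."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                      # (n, n) logits
    idx = np.arange(n)
    outside = np.abs(idx[:, None] - idx[None, :]) > window
    scores[outside] = -np.inf                          # forbid distant keys
    scores -= scores.max(axis=-1, keepdims=True)       # stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V


rng = np.random.default_rng(0)
x = rng.normal(size=(10, 8))                           # 10 positions, dim 8
print(windowed_attention(x, x, x, window=2).shape)     # (10, 8)
```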
arXiv Detail & Related papers (2023-06-08T13:28:48Z)
- Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks.
This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z)
- A Bilingual Parallel Corpus with Discourse Annotations [82.07304301996562]
This paper describes BWB, a large parallel corpus first introduced in Jiang et al. (2022), along with an annotated test set.
The BWB corpus consists of Chinese novels translated by experts into English, and the annotated test set is designed to probe the ability of machine translation systems to model various discourse phenomena.
arXiv Detail & Related papers (2022-10-26T12:33:53Z)
- DivEMT: Neural Machine Translation Post-Editing Effort Across Typologically Diverse Languages [5.367993194110256]
DivEMT is the first publicly available post-editing study of Neural Machine Translation (NMT) over a typologically diverse set of target languages.
We assess the impact of two state-of-the-art NMT systems, namely Google Translate and the open-source multilingual model mBART50, on translation productivity.
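Post-editing effort in such studies is commonly summarized with edit-distance-based measures between the raw MT output and its post-edited version. Below is a minimal sketch of a simplified HTER-style score (true TER also counts block shifts); it is an illustration, not the study's exact measure of productivity.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]


def hter(mt_output, post_edit):
    """Edits needed to turn the MT output into its post-edited version,
    normalized by the length of the post-edited text."""
    mt, pe = mt_output.split(), post_edit.split()
    return edit_distance(mt, pe) / max(len(pe), 1)


print(hter("the cat sat on mat", "the cat sat on the mat"))  # 1/6 ≈ 0.167
```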
arXiv Detail & Related papers (2022-05-24T17:22:52Z)
- When Does Translation Require Context? A Data-driven, Multilingual Exploration [71.43817945875433]
Proper handling of discourse significantly contributes to the quality of machine translation (MT).
Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation.
We develop the Multilingual Discourse-Aware benchmark, a series of taggers that identify discourse phenomena in a dataset and evaluate model performance on them.
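As a toy illustration of what such a tagger can look like, the sketch below flags sentences containing third-person pronouns, whose antecedents may lie in a previous sentence and thus require context to translate correctly. This crude rule and word list are purely illustrative, not the benchmark's much richer taggers.

```python
# Toy discourse tagger (illustrative only): mark sentences with third-person
# pronouns as potentially context-dependent for translation.
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "his", "their"}


def has_context_dependent_pronoun(sentence):
    tokens = [t.strip(".,!?;:").lower() for t in sentence.split()]
    return any(t in PRONOUNS for t in tokens)


document = [
    "Maria bought a scarf.",
    "She wore it the next day.",
]
print([has_context_dependent_pronoun(s) for s in document])  # [False, True]
```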
arXiv Detail & Related papers (2021-09-15T17:29:30Z)
- Diving Deep into Context-Aware Neural Machine Translation [36.17847243492193]
This paper analyzes the performance of document-level NMT models on four diverse domains.
We find that there is no single best approach to document-level NMT, but rather that different architectures come out on top on different tasks.
arXiv Detail & Related papers (2020-10-19T13:23:12Z)
- On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation [55.02832094101173]
Evaluation of cross-lingual encoders is usually performed either via zero-shot cross-lingual transfer in supervised downstream tasks or via unsupervised cross-lingual similarity.
This paper concerns itself with reference-free machine translation (MT) evaluation, where source texts are compared directly to (sometimes low-quality) system translations.
We systematically investigate a range of metrics based on state-of-the-art cross-lingual semantic representations obtained with pretrained M-BERT and LASER.
We find that they perform poorly as semantic encoders for reference-free MT evaluation and identify their two key limitations.
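The basic recipe behind such metrics is to embed the source and the system output in a shared cross-lingual space and use their similarity directly as the quality score, with no reference translation needed. In the sketch below, `embed` is a hypothetical placeholder standing in for any such encoder (e.g. mean-pooled M-BERT states or LASER embeddings).

```python
import zlib

import numpy as np


def embed(sentence):
    """Placeholder for a cross-lingual sentence encoder. Here a
    deterministic dummy vector so the sketch runs end to end; a real
    system would call the actual encoder instead."""
    rng = np.random.default_rng(zlib.crc32(sentence.encode("utf-8")))
    return rng.normal(size=128)


def reference_free_score(source, hypothesis):
    """Cosine similarity between source and system translation in the
    shared embedding space, used directly as a quality estimate."""
    s, h = embed(source), embed(hypothesis)
    return float(s @ h / (np.linalg.norm(s) * np.linalg.norm(h)))


print(reference_free_score("Das Haus ist groß.", "The house is big."))
```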
arXiv Detail & Related papers (2020-05-03T22:10:23Z)
- Can Your Context-Aware MT System Pass the DiP Benchmark Tests? : Evaluation Benchmarks for Discourse Phenomena in Machine Translation [7.993547048820065]
We introduce the first MT benchmark datasets of their kind, which aim to track and hail improvements across four main discourse phenomena.
Surprisingly, we find that existing context-aware models do not improve discourse-related translations consistently across languages and phenomena.
arXiv Detail & Related papers (2020-04-30T07:15:36Z)