Findings of the WMT 2023 Shared Task on Discourse-Level Literary
Translation: A Fresh Orb in the Cosmos of LLMs
- URL: http://arxiv.org/abs/2311.03127v1
- Date: Mon, 6 Nov 2023 14:23:49 GMT
- Title: Findings of the WMT 2023 Shared Task on Discourse-Level Literary
Translation: A Fresh Orb in the Cosmos of LLMs
- Authors: Longyue Wang, Zhaopeng Tu, Yan Gu, Siyou Liu, Dian Yu, Qingsong Ma,
Chenyang Lyu, Liting Zhou, Chao-Hong Liu, Yufeng Ma, Weiyu Chen, Yvette
Graham, Bonnie Webber, Philipp Koehn, Andy Way, Yulin Yuan, Shuming Shi
- Abstract summary: We release a copyrighted, document-level Chinese-English web novel corpus.
This year, we received a total of 14 submissions from 7 academia and industry teams.
The official ranking of the systems is based on the overall human judgments.
- Score: 80.05205710881789
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Translating literary works has perennially stood as an elusive dream in
machine translation (MT), a journey steeped in intricate challenges. To foster
progress in this domain, we hold a new shared task at WMT 2023, the first
edition of the Discourse-Level Literary Translation task. First, we (Tencent
AI Lab and China Literature Ltd.) release a copyrighted, document-level
Chinese-English web novel corpus. Furthermore, we put forth industry-endorsed
criteria to guide the human evaluation process. This year, we received a total
of 14 submissions from 7 academia and industry teams. We employ
both automatic and human evaluations to measure the performance of the
submitted systems. The official ranking of the systems is based on the overall
human judgments. In addition, our extensive analysis reveals a series of
interesting findings on literary and discourse-aware MT. We release data,
system outputs, and leaderboard at
http://www2.statmt.org/wmt23/literary-translation-task.html.
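The task uses both automatic and human evaluation; for document-level MT, one common automatic metric is document-level BLEU (d-BLEU), which scores whole documents rather than isolated sentences. Below is a minimal sketch of d-BLEU using the sacrebleu library; the data layout and its use here are illustrative assumptions, not the task's official scoring pipeline.

```python
# Minimal d-BLEU sketch: concatenate each document's sentences into one
# long "segment", then score with corpus-level BLEU via sacrebleu.
# The data layout (lists of documents, each a list of sentences) is an
# illustrative assumption, not the task's official scoring pipeline.
import sacrebleu

def d_bleu(sys_docs: list[list[str]], ref_docs: list[list[str]]) -> float:
    """Document-level BLEU: one hypothesis/reference string per document."""
    hyps = [" ".join(doc) for doc in sys_docs]
    refs = [" ".join(doc) for doc in ref_docs]
    return sacrebleu.corpus_bleu(hyps, [refs]).score

if __name__ == "__main__":
    system = [["He opened the door.", "The hall was silent."]]
    reference = [["He pushed the door open.", "The hall was silent."]]
    print(f"d-BLEU: {d_bleu(system, reference):.2f}")
```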
Related papers
- How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs [23.247387152595067]
LITEVAL-CORPUS is a parallel corpus comprising multiple verified human translations and outputs from 9 machine translation systems.
We find that Multidimensional Quality Metrics (MQM), as the de facto standard in non-literary human MT evaluation, is inadequate for literary translation.
arXiv Detail & Related papers (2024-10-24T12:48:03Z)
- (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts [52.18246881218829]
We introduce a novel multi-agent framework based on large language models (LLMs) for literary translation, implemented as a company called TransAgents.
To evaluate the effectiveness of our system, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP).
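BLP, as summarized above, has an LLM compare two candidate translations against the source text. Below is a minimal sketch of such a pairwise judging setup; the prompt wording and the `ask_llm` judge stub are hypothetical illustrations, not the paper's actual protocol.

```python
# Sketch of a Bilingual LLM Preference (BLP)-style pairwise comparison.
# The prompt wording and `ask_llm` stub are illustrative assumptions;
# the paper's actual prompt and judging setup may differ.

BLP_PROMPT = """You are a bilingual literary translation judge.

Source (Chinese):
{source}

Translation A:
{trans_a}

Translation B:
{trans_b}

Considering accuracy, fluency, and literary style, which translation
is better? Answer with exactly "A" or "B"."""

def blp_judge(source: str, trans_a: str, trans_b: str, ask_llm) -> str:
    """Return 'A' or 'B' from an LLM judge callable (hypothetical stub)."""
    answer = ask_llm(BLP_PROMPT.format(source=source, trans_a=trans_a,
                                       trans_b=trans_b))
    return "A" if answer.strip().upper().startswith("A") else "B"
```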
arXiv Detail & Related papers (2024-05-20T05:55:08Z)
- LFED: A Literary Fiction Evaluation Dataset for Large Language Models [58.85989777743013]
We collect 95 works of literary fiction that were either originally written in Chinese or translated into Chinese, covering a wide range of topics across several centuries.
We define a question taxonomy with 8 question categories to guide the creation of 1,304 questions.
We conduct an in-depth analysis to ascertain how specific attributes of literary fiction (e.g., novel type, number of characters, year of publication) impact LLM performance in evaluations.
arXiv Detail & Related papers (2024-05-16T15:02:24Z)
- Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al. (2022).
We investigate the similarities and differences between the discourse structures of source and target languages.
We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z)
- Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models [57.80514758695275]
Using large language models (LLMs) for assessing the quality of machine translation (MT) achieves state-of-the-art performance at the system level.
We propose a new prompting method called Error Analysis Prompting (EAPrompt).
This technique emulates the commonly accepted human evaluation framework, Multidimensional Quality Metrics (MQM), and produces explainable and reliable MT evaluations at both the system and segment levels.
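EAPrompt has the LLM identify and classify translation errors, then derives a score from the error counts in the spirit of MQM. Below is a minimal sketch of MQM-style scoring from counted errors; the severity weights (major = 5, minor = 1) follow common MQM practice and are an assumption, not necessarily the paper's exact configuration.

```python
# MQM-style scoring from error counts, as an EAPrompt-style evaluation
# would produce after the LLM has identified and classified errors.
# Severity weights (major=5, minor=1) follow common MQM practice and
# are an assumption, not necessarily the paper's exact configuration.

def mqm_score(n_major: int, n_minor: int,
              w_major: float = 5.0, w_minor: float = 1.0) -> float:
    """Negative penalty score: 0 is perfect, lower is worse."""
    return -(w_major * n_major + w_minor * n_minor)

def system_score(segment_errors: list[tuple[int, int]]) -> float:
    """Average segment score over (n_major, n_minor) error-count pairs."""
    return sum(mqm_score(ma, mi) for ma, mi in segment_errors) / len(segment_errors)

if __name__ == "__main__":
    # Two segments: one with a major error, one with two minor errors.
    print(system_score([(1, 0), (0, 2)]))  # -> -3.5
```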
arXiv Detail & Related papers (2023-03-24T05:05:03Z)
- A Bilingual Parallel Corpus with Discourse Annotations [82.07304301996562]
This paper describes BWB, a large parallel corpus first introduced in Jiang et al. (2022), along with an annotated test set.
The BWB corpus consists of Chinese novels translated by experts into English, and the annotated test set is designed to probe the ability of machine translation systems to model various discourse phenomena.
arXiv Detail & Related papers (2022-10-26T12:33:53Z)
- Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature [35.1398797683712]
We show that literary translators prefer reference human translations over machine-translated paragraphs at a rate of 84%.
We train a post-editing model whose output is preferred by experts over normal MT output at a rate of 69%.
arXiv Detail & Related papers (2022-10-25T18:03:34Z)
- Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task [30.889496911261677]
This paper describes our contribution to the WMT 2020 Metrics Shared Task.
We make several submissions based on BLEURT, a metric based on transfer learning.
We show how to combine BLEURT's predictions with those of YiSi and how to use alternative reference translations to enhance performance.
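One simple way to combine two metrics' predictions is to z-normalize each metric's segment scores and average them. The sketch below shows that generic recipe; it is an illustrative assumption, not necessarily how the authors combine BLEURT and YiSi.

```python
# Generic two-metric ensembling: z-normalize each metric's segment
# scores, then average. This is an illustrative recipe, not
# necessarily the authors' exact BLEURT + YiSi combination.
import statistics

def z_norm(scores: list[float]) -> list[float]:
    mu = statistics.mean(scores)
    sd = statistics.pstdev(scores) or 1.0  # guard against zero variance
    return [(s - mu) / sd for s in scores]

def ensemble(bleurt: list[float], yisi: list[float]) -> list[float]:
    """Average z-normalized scores from the two metrics, per segment."""
    return [(b + y) / 2 for b, y in zip(z_norm(bleurt), z_norm(yisi))]

if __name__ == "__main__":
    print(ensemble([0.62, 0.40, 0.75], [0.81, 0.55, 0.90]))
```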
arXiv Detail & Related papers (2020-10-08T23:16:26Z)