CUNI systems for WMT21: Terminology translation Shared Task
- URL: http://arxiv.org/abs/2109.09350v1
- Date: Mon, 20 Sep 2021 08:05:39 GMT
- Title: CUNI systems for WMT21: Terminology translation Shared Task
- Authors: Josef Jon, Michal Novák, João Paulo Aires, Dušan Variš and Ondřej Bojar
- Abstract summary: The objective of this task is to design a system which translates certain terms based on a provided terminology database.
Our approach is based on providing the desired translations alongside the input sentence and training the model to use these provided terms.
We lemmatize the terms during both training and inference, allowing the model to learn how to produce correct surface forms of the words.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper describes the Charles University submission to the Terminology
translation Shared Task at WMT21. The objective of this task is to design a
system that translates certain terms based on a provided terminology database,
while preserving high overall translation quality. We competed in the
English-French language pair. Our approach is based on providing the desired
translations alongside the input sentence and training the model to use these
provided terms. We lemmatize the terms during both training and inference,
allowing the model to learn how to produce correct surface forms of the words
when they differ from the forms provided in the terminology database. Our
submission ranked second in the Exact Match metric, which evaluates the model's
ability to produce the desired terms in the translation.
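The annotation scheme described in the abstract can be sketched in a few lines: the desired target term is lemmatized and appended next to the matching source word, wrapped in pseudo-tokens the model is trained to interpret. The tag names and the toy lemmatizer below are illustrative assumptions, not the authors' exact implementation.

```python
def lemmatize(word: str) -> str:
    """Toy lemmatizer: strips a plural 's'. A real system would use a
    morphological analyzer instead."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def annotate(source: str, term_db: dict[str, str]) -> str:
    """Append the lemmatized target term after each matching source word,
    delimited by hypothetical <trans> pseudo-tokens."""
    out = []
    for token in source.split():
        out.append(token)
        key = token.lower().strip(".,")
        if key in term_db:
            out.append(f"<trans> {lemmatize(term_db[key])} </trans>")
    return " ".join(out)

db = {"terminology": "terminologie"}
print(annotate("The terminology database helps translators .", db))
# -> The terminology <trans> terminologie </trans> database helps translators .
```

At inference time, passing the lemma rather than a fixed surface form leaves the model free to inflect the term to fit the target-side context.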
Related papers
- Efficient Technical Term Translation: A Knowledge Distillation Approach for Parenthetical Terminology Translation
This paper addresses the challenge of accurately translating technical terms, which are crucial for clear communication in specialized fields.
We introduce the Parenthetical Terminology Translation (PTT) task, designed to mitigate potential inaccuracies by displaying the original term in parentheses alongside its translation.
We developed a novel evaluation metric to assess both overall translation accuracy and the correct parenthetical presentation of terms.
arXiv Detail & Related papers (2024-10-01T13:40:28Z)
- Towards Zero-Shot Multimodal Machine Translation
We propose a method to bypass the need for fully supervised data to train multimodal machine translation systems.
Our method, called ZeroMMT, consists of adapting a strong text-only machine translation (MT) model by training it on a mixture of two objectives.
To prove that our method generalizes to languages with no fully supervised training data available, we extend the CoMMuTE evaluation dataset to three new languages: Arabic, Russian and Chinese.
arXiv Detail & Related papers (2024-07-18T15:20:31Z)
- Domain Terminology Integration into Machine Translation: Leveraging Large Language Models
This paper discusses the methods that we used for our submissions to the WMT 2023 Terminology Shared Task for German-to-English (DE-EN), English-to-Czech (EN-CS), and Chinese-to-English (ZH-EN) language pairs.
The task aims to advance machine translation (MT) by challenging participants to develop systems that accurately translate technical terms.
arXiv Detail & Related papers (2023-10-22T23:25:28Z)
- Terminology-Aware Translation with Constrained Decoding and Large Language Model Prompting
We submit to the WMT 2023 terminology translation task.
We adopt a translate-then-refine approach which can be domain-independent and requires minimal manual efforts.
Results show that our terminology-aware model learns to incorporate terminologies effectively.
arXiv Detail & Related papers (2023-10-09T16:08:23Z)
- Unify word-level and span-level tasks: NJUNLP's Participation for the WMT2023 Quality Estimation Shared Task
We introduce the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task.
Our team submitted predictions for the English-German language pair on both sub-tasks.
Our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks.
arXiv Detail & Related papers (2023-09-23T01:52:14Z)
- KIT's Multilingual Speech Translation System for IWSLT 2023
We describe our speech translation system for the multilingual track of IWSLT 2023.
The task requires translation into 10 languages of varying amounts of resources.
Our cascaded speech system substantially outperforms its end-to-end counterpart on scientific talk translation.
arXiv Detail & Related papers (2023-06-08T16:13:20Z)
- Alibaba-Translate China's Submission for WMT 2022 Metrics Shared Task
We build our system on the core idea of UNITE (Unified Translation Evaluation).
During the model pre-training phase, we first use pseudo-labeled data examples to continue pre-training UNITE.
During the fine-tuning phase, we use both Direct Assessment (DA) and Multidimensional Quality Metrics (MQM) data from past years' WMT competitions.
arXiv Detail & Related papers (2022-10-18T08:51:25Z)
- Lingua Custodia's participation at the WMT 2021 Machine Translation using Terminologies shared task
We consider three directions, namely English to French, Russian, and Chinese.
We introduce two main changes to the standard procedure to handle terminologies.
Our method satisfies most terminology constraints while maintaining high translation quality.
arXiv Detail & Related papers (2021-11-03T10:36:32Z)
- Facilitating Terminology Translation with Target Lemma Annotations
We train machine translation systems using a source-side data augmentation method that annotates randomly selected source language words with their target language lemmas.
Experiments on terminology translation into the morphologically complex Baltic and Uralic languages show an improvement of up to 7 BLEU points over baseline systems.
Results of the human evaluation indicate a 47.7% absolute improvement over the previous work in term translation accuracy when translating into Latvian.
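The source-side augmentation described in this entry can be sketched as a simple pass over training data: randomly chosen aligned source words get their target-language lemma appended in pseudo-token brackets. The tag name, alignment format, and annotation probability below are illustrative assumptions, not the paper's exact setup.

```python
import random

def augment(source_tokens, alignment, target_lemmas, p=0.5, seed=0):
    """Annotate aligned source words with their target-language lemma.
    `alignment` maps a source position to an index into `target_lemmas`;
    each aligned word is annotated with probability p (seeded for
    reproducibility in this sketch)."""
    rng = random.Random(seed)
    out = []
    for i, tok in enumerate(source_tokens):
        out.append(tok)
        if i in alignment and rng.random() < p:
            out.append(f"<lemma> {target_lemmas[alignment[i]]} </lemma>")
    return " ".join(out)

tokens = ["the", "terminology", "database"]
print(augment(tokens, {1: 0}, ["terminologie"], p=1.0))
# -> the terminology <lemma> terminologie </lemma> database
```

Annotating only a random subset of words (p < 1) keeps the model from becoming dependent on annotations being present, so it still translates well when no terminology constraints are supplied.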
arXiv Detail & Related papers (2021-01-25T12:07:20Z)
- DiDi's Machine Translation System for WMT2020
We participate in the translation direction of Chinese->English.
In this direction, we use the Transformer as our baseline model.
As a result, our submission achieves a BLEU score of 36.6 in Chinese->English.
arXiv Detail & Related papers (2020-10-16T06:25:48Z)
- Grounded Compositional Outputs for Adaptive Language Modeling
A language model's vocabulary, typically selected before training and fixed permanently thereafter, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.