Lingua Custodia's participation at the WMT 2021 Machine Translation
using Terminologies shared task
- URL: http://arxiv.org/abs/2111.02120v1
- Date: Wed, 3 Nov 2021 10:36:32 GMT
- Title: Lingua Custodia's participation at the WMT 2021 Machine Translation
using Terminologies shared task
- Authors: Melissa Ailem, Jinghsu Liu, Raheel Qader
- Abstract summary: We consider three directions, namely English to French, Russian, and Chinese.
We introduce two main changes to the standard procedure to handle terminologies.
Our method satisfies most terminology constraints while maintaining high translation quality.
- Score: 3.3108924994485096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes Lingua Custodia's submission to the WMT21 shared task on
machine translation using terminologies. We consider three directions, namely
English to French, Russian, and Chinese. We rely on a Transformer-based
architecture as a building block, and we explore a method which introduces two
main changes to the standard procedure to handle terminologies. The first one
consists in augmenting the training data in such a way as to encourage the
model to learn a copy behavior when it encounters terminology constraint terms.
The second change is constraint token masking, whose purpose is to ease copy
behavior learning and to improve model generalization. Empirical results show
that our method satisfies most terminology constraints while maintaining high
translation quality.
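The two changes described above can be sketched as a simple data-augmentation step. This is a minimal illustration, not the paper's exact scheme: the tag tokens (`<term>`, `</term>`, `<mask>`), the masking probability, and the function name are all assumptions made for the example.

```python
import random

MASK = "<mask>"  # assumed placeholder for constraint token masking


def augment_with_terminology(source_tokens, term_src, term_tgt, mask_prob=0.5):
    """Inline the desired target-side term right after its source occurrence,
    so the model can learn a copy behavior; randomly replace the constraint
    with a mask token to ease copy learning and improve generalization."""
    augmented = []
    for tok in source_tokens:
        augmented.append(tok)
        if tok == term_src:
            # Append the terminology constraint, possibly masked.
            constraint = MASK if random.random() < mask_prob else term_tgt
            augmented.extend(["<term>", constraint, "</term>"])
    return augmented
```

For example, with `mask_prob=0.0`, the English tokens `["the", "interest", "rate"]` with the constraint pair (`interest`, `intérêt`) become `["the", "interest", "<term>", "intérêt", "</term>", "rate"]`; at training time some fraction of constraints would instead carry the mask token.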
Related papers
- Efficient Technical Term Translation: A Knowledge Distillation Approach for Parenthetical Terminology Translation [0.0]
This paper addresses the challenge of accurately translating technical terms, which are crucial for clear communication in specialized fields.
We introduce the Parenthetical Terminology Translation (PTT) task, designed to mitigate potential inaccuracies by displaying the original term in parentheses alongside its translation.
We developed a novel evaluation metric to assess both overall translation accuracy and the correct parenthetical presentation of terms.
arXiv Detail & Related papers (2024-10-01T13:40:28Z)
- Domain Terminology Integration into Machine Translation: Leveraging Large Language Models [3.178046741931973]
This paper discusses the methods that we used for our submissions to the WMT 2023 Terminology Shared Task for German-to-English (DE-EN), English-to-Czech (EN-CS), and Chinese-to-English (ZH-EN) language pairs.
The task aims to advance machine translation (MT) by challenging participants to develop systems that accurately translate technical terms.
arXiv Detail & Related papers (2023-10-22T23:25:28Z)
- Terminology-Aware Translation with Constrained Decoding and Large Language Model Prompting [11.264272119913311]
We submit to the WMT 2023 terminology translation task.
We adopt a translate-then-refine approach which can be domain-independent and requires minimal manual efforts.
Results show that our terminology-aware model learns to incorporate terminologies effectively.
arXiv Detail & Related papers (2023-10-09T16:08:23Z)
- Dual-Alignment Pre-training for Cross-lingual Sentence Embedding [79.98111074307657]
We propose a dual-alignment pre-training (DAP) framework for cross-lingual sentence embedding.
We introduce a novel representation translation learning (RTL) task, where the model learns to use one-side contextualized token representation to reconstruct its translation counterpart.
Our approach can significantly improve sentence embedding.
arXiv Detail & Related papers (2023-05-16T03:53:30Z)
- Modeling Target-Side Morphology in Neural Machine Translation: A Comparison of Strategies [72.56158036639707]
Morphologically rich languages pose difficulties to machine translation.
A large number of differently inflected word surface forms entails a larger vocabulary.
Some inflected forms of infrequent terms typically do not appear in the training corpus.
Linguistic agreement requires the system to correctly match the grammatical categories between inflected word forms in the output sentence.
arXiv Detail & Related papers (2022-03-25T10:13:20Z)
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
- CUNI systems for WMT21: Terminology translation Shared Task [0.0]
The objective of this task is to design a system which translates certain terms based on a provided terminology database.
Our approach is based on providing the desired translations alongside the input sentence and training the model to use these provided terms.
We lemmatize the terms both during the training and inference, to allow the model to learn how to produce correct surface forms of the words.
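A minimal sketch of this annotation idea follows. The tag tokens, the function name, and the use of lowercasing as a stand-in lemmatizer are illustrative assumptions, not the CUNI system's actual implementation:

```python
def annotate_with_terms(source, term_pairs, lemmatize):
    """Append lemmatized desired translations to the input sentence, so the
    model learns to use the provided terms and to produce correct surface
    forms itself. `lemmatize` maps a word to its lemma."""
    source_lemmas = {lemmatize(word) for word in source.split()}
    annotations = []
    for src_term, tgt_term in term_pairs:
        # Matching on lemmas lets inflected occurrences trigger the constraint.
        if lemmatize(src_term) in source_lemmas:
            annotations.append(f"<term> {lemmatize(tgt_term)} </term>")
    return source + " " + " ".join(annotations) if annotations else source
```

With `str.lower` as a toy lemmatizer, `annotate_with_terms("The Bank raised rates", [("bank", "banque")], str.lower)` yields `"The Bank raised rates <term> banque </term>"`; a real system would use a proper morphological lemmatizer for both training and inference.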
arXiv Detail & Related papers (2021-09-20T08:05:39Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation [59.38247587308604]
We introduce a novel transformer based architecture that jointly learns Continuous Sign Language Recognition and Translation.
We evaluate the recognition and translation performances of our approaches on the challenging RWTH-PHOENIX-Weather-2014T dataset.
Our translation networks outperform both sign video to spoken language and gloss to spoken language translation models.
arXiv Detail & Related papers (2020-03-30T21:35:09Z)
- Learning Coupled Policies for Simultaneous Machine Translation using Imitation Learning [85.70547744787]
We present an approach to efficiently learn a simultaneous translation model with coupled programmer-interpreter policies.
Experiments on six language-pairs show our method outperforms strong baselines in terms of translation quality.
arXiv Detail & Related papers (2020-02-11T10:56:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.