Predicting Anchored Text from Translation Memories for Machine
Translation Using Deep Learning Methods
- URL: http://arxiv.org/abs/2409.17939v1
- Date: Thu, 26 Sep 2024 15:12:59 GMT
- Title: Predicting Anchored Text from Translation Memories for Machine
Translation Using Deep Learning Methods
- Authors: Richard Yue, John E. Ortega
- Abstract summary: We show that for anchored words that follow the continuous bag-of-words (CBOW) paradigm, Word2Vec, BERT, and GPT-4 can be used, in some cases achieving better results than neural machine translation for translating anchored words from French to English.
- Score: 2.44755919161855
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Translation memories (TMs) are the backbone for professional translation
tools called computer-aided translation (CAT) tools. In order to perform a
translation using a CAT tool, a translator uses the TM to gather translations
similar to the desired segment to translate (s'). Many CAT tools offer a
fuzzy-match algorithm to locate segments (s) in the TM that are close in
distance to s'. After locating two similar segments, the CAT tool will present
parallel segments (s, t) that contain one segment in the source language along
with its translation in the target language. Additionally, CAT tools contain
fuzzy-match repair (FMR) techniques that will automatically use the parallel
segments from the TM to create new TM entries containing a modified version of
the original with the idea in mind that it will be the translation of s'. Most
FMR techniques use machine translation as a way of "repairing" those words that
have to be modified. In this article, we show that for a large part of those
words which are anchored, we can use other techniques that are based on machine
learning approaches such as Word2Vec, BERT, and even ChatGPT. Specifically, we
show that for anchored words that follow the continuous bag-of-words (CBOW)
paradigm, Word2Vec, BERT, and GPT-4 can be used to achieve similar and, for
some cases, better results than neural machine translation for translating
anchored words from French to English.
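The fuzzy-match and anchored-word workflow described in the abstract can be sketched in a few lines of Python. This is a toy illustration, not the paper's implementation: the helper names (`fuzzy_match`, `anchored_words`), the similarity measure (`difflib.SequenceMatcher` over word lists), and the 0.7 threshold are all assumptions. Real CAT tools use more elaborate fuzzy-match algorithms, and the actual repair step would predict replacements for the non-anchored words with Word2Vec, BERT, or GPT-4.

```python
# Toy sketch of TM fuzzy matching and anchored-word identification.
# All names and thresholds here are illustrative assumptions.
from difflib import SequenceMatcher

def fuzzy_match(s_prime, tm, threshold=0.7):
    """Return (score, s, t) triples from the TM whose source segment s
    is close in distance to the segment to translate, s'."""
    matches = []
    for s, t in tm:
        # Word-level similarity ratio as a stand-in for a fuzzy-match score.
        score = SequenceMatcher(None, s_prime.split(), s.split()).ratio()
        if score >= threshold:
            matches.append((score, s, t))
    return sorted(matches, reverse=True)

def anchored_words(s_prime, s):
    """Split the retrieved segment s into words shared with s' (anchors)
    and words that fuzzy-match repair would need to modify."""
    sp_words = s_prime.split()
    anchors = [w for w in s.split() if w in sp_words]
    to_repair = [w for w in s.split() if w not in sp_words]
    return anchors, to_repair

# A tiny French-English translation memory (invented example data).
tm = [
    ("le chat noir dort", "the black cat sleeps"),
    ("la maison est grande", "the house is big"),
]
s_prime = "le chien noir dort"              # segment the translator wants
score, s, t = fuzzy_match(s_prime, tm)[0]   # closest parallel segment (s, t)
anchors, repair = anchored_words(s_prime, s)
```

Here the anchors (`le`, `noir`, `dort`) carry over unchanged, while `chat` is the word an FMR technique would repair, e.g. by asking a CBOW-style model which word best fits that slot in the context of s'.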
Related papers
- DRT: Deep Reasoning Translation via Long Chain-of-Thought [73.21414200780171]
We introduce DRT, an attempt to bring the success of long chain-of-thought to neural machine translation (MT). We first mine sentences containing similes or metaphors from existing literature books, and then develop a multi-agent framework to translate these sentences via long thought. Using Qwen2.5 and Llama-3.1 as the backbones, DRT models can learn the thought process during machine translation.
arXiv Detail & Related papers (2024-12-23T11:55:33Z)
- Creating Domain-Specific Translation Memories for Machine Translation Fine-tuning: The TRENCARD Bilingual Cardiology Corpus [0.0]
The article introduces a semi-automatic TM preparation methodology leveraging primarily translation tools used by translators.
The resulting corpus called TRENCARD Corpus has approximately 800,000 source words and 50,000 sentences.
arXiv Detail & Related papers (2024-09-04T12:48:30Z)
- LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation [67.24113079928668]
We present LexMatcher, a method for data curation driven by the coverage of senses found in bilingual dictionaries.
Our approach outperforms the established baselines on the WMT2022 test sets.
arXiv Detail & Related papers (2024-06-03T15:30:36Z)
- Cross-lingual Contextualized Phrase Retrieval [63.80154430930898]
We propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval.
We train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning.
On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher.
arXiv Detail & Related papers (2024-03-25T14:46:51Z)
- Subword Segmental Machine Translation: Unifying Segmentation and Target Sentence Generation [7.252933737829635]
Subword segmental machine translation (SSMT) learns to segment target sentence words while jointly learning to generate target sentences.
Experiments across 6 translation directions show that SSMT improves chrF scores for morphologically rich agglutinative languages.
arXiv Detail & Related papers (2023-05-11T17:44:29Z)
- Neural Machine Translation with Contrastive Translation Memories [71.86990102704311]
Retrieval-augmented Neural Machine Translation models have been successful in many translation scenarios.
We propose a new retrieval-augmented NMT to model contrastively retrieved translation memories that are holistically similar to the source sentence.
In the training phase, a Multi-TM contrastive learning objective is introduced to learn the salient features of each TM with respect to the target sentence.
arXiv Detail & Related papers (2022-12-06T17:10:17Z)
- Multilingual Domain Adaptation for NMT: Decoupling Language and Domain Information with Adapters [66.7986513246294]
We study the compositionality of language and domain adapters in the context of Machine Translation.
We find that in the partial resource scenario a naive combination of domain-specific and language-specific adapters often results in 'catastrophic forgetting' of the missing languages.
arXiv Detail & Related papers (2021-10-18T18:55:23Z)
- Neural Machine Translation with Monolingual Translation Memory [58.98657907678992]
We propose a new framework that uses monolingual memory and performs learnable memory retrieval in a cross-lingual manner.
Experiments show that the proposed method obtains substantial improvements.
arXiv Detail & Related papers (2021-05-24T13:35:19Z)
- Bootstrapping a Crosslingual Semantic Parser [74.99223099702157]
We adapt a semantic parser trained on a single language, such as English, to new languages and multiple domains with minimal annotation.
We ask whether machine translation is an adequate substitute for training data, and extend this to investigate bootstrapping using joint training with English, paraphrasing, and multilingual pre-trained models.
arXiv Detail & Related papers (2020-04-06T12:05:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.