Predicting Anchored Text from Translation Memories for Machine
Translation Using Deep Learning Methods
- URL: http://arxiv.org/abs/2409.17939v1
- Date: Thu, 26 Sep 2024 15:12:59 GMT
- Title: Predicting Anchored Text from Translation Memories for Machine
Translation Using Deep Learning Methods
- Authors: Richard Yue, John E. Ortega
- Abstract summary: We show that for anchored words that follow the continuous bag-of-words (CBOW) paradigm, Word2Vec, BERT, and GPT-4 can be used, in some cases achieving better results than neural machine translation for translating anchored words from French to English.
- Score: 2.44755919161855
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Translation memories (TMs) are the backbone for professional translation
tools called computer-aided translation (CAT) tools. In order to perform a
translation using a CAT tool, a translator uses the TM to gather translations
similar to the desired segment to translate (s'). Many CAT tools offer a
fuzzy-match algorithm to locate segments (s) in the TM that are close in
distance to s'. After locating two similar segments, the CAT tool will present
parallel segments (s, t) that contain one segment in the source language along
with its translation in the target language. Additionally, CAT tools contain
fuzzy-match repair (FMR) techniques that will automatically use the parallel
segments from the TM to create new TM entries containing a modified version of
the original with the idea in mind that it will be the translation of s'. Most
FMR techniques use machine translation as a way of "repairing" those words that
have to be modified. In this article, we show that for a large part of those
words which are anchored, we can use other techniques that are based on machine
learning approaches such as Word2Vec, BERT, and even ChatGPT. Specifically, we
show that for anchored words that follow the continuous bag-of-words (CBOW)
paradigm, Word2Vec, BERT, and GPT-4 can be used to achieve similar and, for
some cases, better results than neural machine translation for translating
anchored words from French to English.
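The fuzzy-match and anchored-word workflow described in the abstract can be sketched in a few lines of Python. This is a toy illustration, not the paper's implementation: the helper names (`fuzzy_match`, `anchored_words`), the similarity measure (`difflib.SequenceMatcher` over word lists), and the 0.7 threshold are all assumptions. Real CAT tools use more elaborate fuzzy-match algorithms, and the actual repair step would predict replacements for the non-anchored words with Word2Vec, BERT, or GPT-4.

```python
# Toy sketch of TM fuzzy matching and anchored-word identification.
# All names and thresholds here are illustrative assumptions.
from difflib import SequenceMatcher

def fuzzy_match(s_prime, tm, threshold=0.7):
    """Return (score, s, t) triples from the TM whose source segment s
    is close in distance to the segment to translate, s'."""
    matches = []
    for s, t in tm:
        # Word-level similarity ratio as a stand-in for a fuzzy-match score.
        score = SequenceMatcher(None, s_prime.split(), s.split()).ratio()
        if score >= threshold:
            matches.append((score, s, t))
    return sorted(matches, reverse=True)

def anchored_words(s_prime, s):
    """Split the retrieved segment s into words shared with s' (anchors)
    and words that fuzzy-match repair would need to modify."""
    sp_words = s_prime.split()
    anchors = [w for w in s.split() if w in sp_words]
    to_repair = [w for w in s.split() if w not in sp_words]
    return anchors, to_repair

# A tiny French-English translation memory (invented example data).
tm = [
    ("le chat noir dort", "the black cat sleeps"),
    ("la maison est grande", "the house is big"),
]
s_prime = "le chien noir dort"              # segment the translator wants
score, s, t = fuzzy_match(s_prime, tm)[0]   # closest parallel segment (s, t)
anchors, repair = anchored_words(s_prime, s)
```

Here the anchors (`le`, `noir`, `dort`) carry over unchanged, while `chat` is the word an FMR technique would repair, e.g. by asking a CBOW-style model which word best fits that slot in the context of s'.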
Related papers
- DRT: Deep Reasoning Translation via Long Chain-of-Thought [73.21414200780171]
We introduce DRT, an attempt to bring the success of long chain-of-thought to neural machine translation (MT). We first mine sentences containing similes or metaphors from existing literature books, and then develop a multi-agent framework to translate these sentences via long thought. Using Qwen2.5 and Llama-3.1 as the backbones, DRT models can learn the thought process during machine translation.
arXiv Detail & Related papers (2024-12-23T11:55:33Z)
- Creating Domain-Specific Translation Memories for Machine Translation Fine-tuning: The TRENCARD Bilingual Cardiology Corpus [0.0]
The article introduces a semi-automatic TM preparation methodology leveraging primarily translation tools used by translators.
The resulting corpus called TRENCARD Corpus has approximately 800,000 source words and 50,000 sentences.
arXiv Detail & Related papers (2024-09-04T12:48:30Z)
- LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation [67.24113079928668]
We present LexMatcher, a method for data curation driven by the coverage of senses found in bilingual dictionaries.
Our approach outperforms the established baselines on the WMT2022 test sets.
arXiv Detail & Related papers (2024-06-03T15:30:36Z)
- Cross-lingual Contextualized Phrase Retrieval [63.80154430930898]
We propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval.
We train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning.
On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher.
arXiv Detail & Related papers (2024-03-25T14:46:51Z)
- Subword Segmental Machine Translation: Unifying Segmentation and Target Sentence Generation [7.252933737829635]
Subword segmental machine translation (SSMT) learns to segment target sentence words while jointly learning to generate target sentences.
Experiments across 6 translation directions show that SSMT improves chrF scores for morphologically rich agglutinative languages.
arXiv Detail & Related papers (2023-05-11T17:44:29Z)
- Neural Machine Translation with Contrastive Translation Memories [71.86990102704311]
Retrieval-augmented Neural Machine Translation models have been successful in many translation scenarios.
We propose a new retrieval-augmented NMT to model contrastively retrieved translation memories that are holistically similar to the source sentence.
In the training phase, a Multi-TM contrastive learning objective is introduced to learn the salient features of each TM with respect to the target sentence.
arXiv Detail & Related papers (2022-12-06T17:10:17Z)
- Multilingual Domain Adaptation for NMT: Decoupling Language and Domain Information with Adapters [66.7986513246294]
We study the compositionality of language and domain adapters in the context of Machine Translation.
We find that in the partial resource scenario a naive combination of domain-specific and language-specific adapters often results in 'catastrophic forgetting' of the missing languages.
arXiv Detail & Related papers (2021-10-18T18:55:23Z)
- Neural Machine Translation with Monolingual Translation Memory [58.98657907678992]
We propose a new framework that uses monolingual memory and performs learnable memory retrieval in a cross-lingual manner.
Experiments show that the proposed method obtains substantial improvements.
arXiv Detail & Related papers (2021-05-24T13:35:19Z)
- Bootstrapping a Crosslingual Semantic Parser [74.99223099702157]
We adapt a semantic parser trained on a single language, such as English, to new languages and multiple domains with minimal annotation.
We ask whether machine translation is an adequate substitute for training data, and extend this to investigate bootstrapping using joint training with English, paraphrasing, and multilingual pre-trained models.
arXiv Detail & Related papers (2020-04-06T12:05:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.