On Translating Technical Terminology: A Translation Workflow for
Machine-Translated Acronyms
- URL: http://arxiv.org/abs/2409.17943v1
- Date: Thu, 26 Sep 2024 15:18:34 GMT
- Title: On Translating Technical Terminology: A Translation Workflow for
Machine-Translated Acronyms
- Authors: Richard Yue, John E. Ortega, Kenneth Ward Church
- Abstract summary: We find that an important step is being missed: the translation of technical terms, specifically acronyms.
Some state-of-the-art machine translation systems that are publicly available, such as Google Translate, can be erroneous when dealing with acronyms.
We propose an additional step in the SL-TL (FR-EN) translation workflow: we first offer a new acronym corpus for public consumption and then experiment with a search-based thresholding algorithm.
- Score: 3.053989095162017
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The typical workflow for a professional translator to translate a document
from its source language (SL) to a target language (TL) is not always focused
on what many language models in natural language processing (NLP) do: predict
the next word in a series of words. While high-resource languages like English
and French are reported to achieve near human parity using common metrics for
measurement such as BLEU and COMET, we find that an important step is being
missed: the translation of technical terms, specifically acronyms. Some
state-of-the-art machine translation systems that are publicly available, such as
Google Translate, can be erroneous when dealing with acronyms, as much as 50% of
the time in our findings. This article addresses acronym disambiguation for MT systems
by proposing an additional step to the SL-TL (FR-EN) translation workflow where
we first offer a new acronym corpus for public consumption and then experiment
with a search-based thresholding algorithm that achieves nearly a 10% increase
compared to Google Translate and OpusMT.
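The abstract does not spell out the thresholding procedure, so the following is only a minimal sketch of what a search-based thresholding step for acronym translation might look like, assuming a small FR-EN acronym glossary and a toy in-domain target-language corpus; the data, function names, and threshold value are hypothetical placeholders, not the authors' released corpus or exact algorithm.

```python
# Illustrative sketch (not the authors' exact algorithm): candidate English
# expansions for a French acronym are looked up in a glossary, scored by how
# often they appear in a small in-domain English corpus, and the best one
# replaces the acronym in the MT output only if its count clears a threshold.

# Hypothetical FR->EN acronym glossary; the paper's released acronym corpus
# would be loaded here instead.
GLOSSARY = {
    "PIB": ["gross domestic product (GDP)", "GDP"],
    "ONU": ["United Nations (UN)", "UN"],
}

# Toy in-domain English corpus standing in for a larger searchable resource.
CORPUS = [
    "gross domestic product (GDP) rose sharply in the last quarter",
    "analysts expect GDP growth to slow next year",
]

def search_count(candidate: str) -> int:
    """Number of corpus lines containing the candidate expansion (case-insensitive)."""
    return sum(candidate.lower() in line.lower() for line in CORPUS)

def resolve_acronym(acronym: str, mt_output: str, threshold: int = 1) -> str:
    """Replace the acronym in the MT output with the best-supported expansion,
    keeping the MT output unchanged when no candidate clears the threshold."""
    candidates = GLOSSARY.get(acronym, [])
    if not candidates:
        return mt_output
    best = max(candidates, key=search_count)
    if search_count(best) >= threshold and acronym in mt_output:
        return mt_output.replace(acronym, best)
    return mt_output

if __name__ == "__main__":
    # An MT system that left the French acronym "PIB" untranslated.
    print(resolve_acronym("PIB", "the PIB grew by 2 percent last year"))
    # -> "the GDP grew by 2 percent last year"
```

In this toy run, the untranslated acronym "PIB" is rewritten to the expansion best supported by the corpus search; when no candidate clears the threshold, the MT output is left untouched.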
Related papers
- Simplifying Translations for Children: Iterative Simplification Considering Age of Acquisition with LLMs [19.023628411128406]
We propose a method that replaces words with a high Age of Acquisition (AoA) in translations with simpler words to match the translations to the user's level.
The experimental results obtained from the dataset show that our method effectively replaces high-AoA words with lower-AoA words.
arXiv Detail & Related papers (2024-08-08T04:57:36Z)
- LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation [67.24113079928668]
We present LexMatcher, a method for data curation driven by the coverage of senses found in bilingual dictionaries.
Our approach outperforms the established baselines on the WMT2022 test sets.
arXiv Detail & Related papers (2024-06-03T15:30:36Z)
- Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing [12.843274390224853]
Large Language Models (LLMs) have demonstrated considerable success in various Natural Language Processing tasks.
We show that they have yet to attain state-of-the-art performance in Neural Machine Translation.
We propose adapting LLMs as Automatic Post-Editors (APE) rather than direct translators.
arXiv Detail & Related papers (2023-10-23T12:22:15Z)
- Towards Effective Disambiguation for Machine Translation with Large Language Models [65.80775710657672]
We study the capabilities of large language models to translate "ambiguous sentences".
Experiments show that our methods can match or outperform state-of-the-art systems such as DeepL and NLLB in four out of five language directions.
arXiv Detail & Related papers (2023-09-20T22:22:52Z)
- Translate to Disambiguate: Zero-shot Multilingual Word Sense Disambiguation with Pretrained Language Models [67.19567060894563]
Pretrained Language Models (PLMs) learn rich cross-lingual knowledge and can be finetuned to perform well on diverse tasks.
We present a new study investigating how well PLMs capture cross-lingual word sense with Contextual Word-Level Translation (C-WLT).
We find that as the model size increases, PLMs encode more cross-lingual word sense knowledge and better use context to improve WLT performance.
arXiv Detail & Related papers (2023-04-26T19:55:52Z)
- Dictionary-based Phrase-level Prompting of Large Language Models for Machine Translation [91.57514888410205]
Large language models (LLMs) demonstrate remarkable machine translation (MT) abilities via prompting.
LLMs can struggle to translate inputs with rare words, which are common in low resource or domain transfer scenarios.
We show that LLM prompting can provide an effective solution for rare words as well, by using prior knowledge from bilingual dictionaries to provide control hints in the prompts (an illustrative prompt-construction sketch follows this list).
arXiv Detail & Related papers (2023-02-15T18:46:42Z)
- DICTDIS: Dictionary Constrained Disambiguation for Improved NMT [50.888881348723295]
We present DictDis, a lexically constrained NMT system that disambiguates between multiple candidate translations derived from dictionaries.
We demonstrate the utility of DictDis via extensive experiments on English-Hindi and English-German sentences in a variety of domains, including regulatory, finance, and engineering.
arXiv Detail & Related papers (2022-10-13T13:04:16Z)
- AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations [5.8010446129208155]
We present the construction of multilingual parallel corpora with annotation of multiword expressions (MWEs).
The languages covered include English, Chinese, Polish, and German.
We present a categorisation of the error types encountered by MT systems in performing MWE related translation.
arXiv Detail & Related papers (2020-11-07T14:28:54Z)
- Bootstrapping a Crosslingual Semantic Parser [74.99223099702157]
We adapt a semantic parser trained on a single language, such as English, to new languages and multiple domains with minimal annotation.
We ask whether machine translation is an adequate substitute for training data, and extend this to investigate bootstrapping using joint training with English, paraphrasing, and multilingual pre-trained models.
arXiv Detail & Related papers (2020-04-06T12:05:02Z)
- Language-Independent Tokenisation Rivals Language-Specific Tokenisation for Word Similarity Prediction [12.376752724719005]
Language-independent tokenisation (LIT) methods do not require labelled language resources or lexicons.
Language-specific tokenisation (LST) methods have a long and established history, and are developed using carefully created lexicons and training resources.
We empirically compare the two approaches using semantic similarity measurement as an evaluation task across a diverse set of languages.
arXiv Detail & Related papers (2020-02-25T16:24:42Z)
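For the dictionary-based phrase-level prompting entry above, the sketch below illustrates one way dictionary hints for rare source words could be injected into an MT prompt; the dictionary entries, prompt wording, and helper names are assumptions for illustration and are not taken from that paper.

```python
# Illustrative sketch of dictionary-hinted prompting for MT: rare source words
# found in the sentence are paired with their dictionary translations, and the
# pairs are attached to the prompt as control hints.

# Hypothetical bilingual (FR->EN) dictionary entries for rare source words.
BILINGUAL_DICT = {
    "ordonnance": ["prescription", "decree"],
}

def build_prompt(source_sentence: str) -> str:
    """Build a translation prompt with dictionary hints for any rare words found."""
    hints = []
    for word, translations in BILINGUAL_DICT.items():
        if word in source_sentence.lower():
            hints.append(f'"{word}" may mean: {", ".join(translations)}')
    hint_block = ("Hints:\n" + "\n".join(hints) + "\n") if hints else ""
    return (
        "Translate the following French sentence into English.\n"
        f"{hint_block}"
        f"French: {source_sentence}\nEnglish:"
    )

if __name__ == "__main__":
    print(build_prompt("Le médecin a rédigé une ordonnance."))
```

The resulting prompt string would then be passed to whichever LLM is being used for translation.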