On Translating Technical Terminology: A Translation Workflow for
Machine-Translated Acronyms
- URL: http://arxiv.org/abs/2409.17943v1
- Date: Thu, 26 Sep 2024 15:18:34 GMT
- Title: On Translating Technical Terminology: A Translation Workflow for
Machine-Translated Acronyms
- Authors: Richard Yue, John E. Ortega, Kenneth Ward Church
- Abstract summary: We find that an important step is being missed: the translation of technical terms, specifically acronyms.
Some publicly available state-of-the-art machine translation systems, such as Google Translate, can be erroneous when dealing with acronyms.
We propose an additional step to the SL-TL (FR-EN) translation workflow where we first offer a new acronym corpus for public consumption and then experiment with a search-based thresholding algorithm.
- Score: 3.053989095162017
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The typical workflow for a professional translator to translate a document
from its source language (SL) to a target language (TL) is not always focused
on what many language models in natural language processing (NLP) do - predict
the next word in a series of words. While high-resource languages like English
and French are reported to achieve near human parity using common metrics for
measurement such as BLEU and COMET, we find that an important step is being
missed: the translation of technical terms, specifically acronyms. Some
state-of-the-art machine translation systems, such as the publicly available
Google Translate, can be erroneous when dealing with acronyms - as often as 50%
of the time in our findings. This article addresses acronym disambiguation for MT systems
by proposing an additional step to the SL-TL (FR-EN) translation workflow where
we first offer a new acronym corpus for public consumption and then experiment
with a search-based thresholding algorithm that achieves a nearly 10% increase
when compared to Google Translate and OpusMT.
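The abstract names the technique but not its details; as a minimal sketch of what a search-based thresholding step could look like, the hypothetical Python fragment below post-edits MT output by looking up source-side acronyms in a small FR-EN glossary and overriding the MT rendering when a similarity score falls below a cutoff. The glossary entries, the character-level `similarity` function, and the 0.8 threshold are all illustrative assumptions, not the paper's released corpus or actual scoring.

```python
import re
from difflib import SequenceMatcher

# Hypothetical FR-EN acronym glossary; the paper releases a real corpus.
GLOSSARY = {
    "ONU": "UN",     # Organisation des Nations unies -> United Nations
    "OTAN": "NATO",  # Organisation du traité de l'Atlantique nord
    "ADN": "DNA",    # acide désoxyribonucléique
}

THRESHOLD = 0.8  # assumed cutoff; the paper searches for a good value


def similarity(a: str, b: str) -> float:
    """Character-level similarity, a stand-in for the paper's scoring."""
    return SequenceMatcher(None, a, b).ratio()


def post_edit_acronyms(source: str, mt_output: str) -> str:
    """Override MT renderings of source acronyms that fail the threshold."""
    edited = mt_output
    for src_acronym in re.findall(r"\b[A-Z]{2,}\b", source):
        expected = GLOSSARY.get(src_acronym)
        if expected is None:
            continue  # no glossary entry; trust the MT output
        # If the expected TL acronym is missing and the SL form was copied
        # through with a low match score, substitute the glossary form.
        if expected not in edited and similarity(src_acronym, expected) < THRESHOLD:
            edited = re.sub(rf"\b{src_acronym}\b", expected, edited)
    return edited


print(post_edit_acronyms("L'ONU a publié un rapport.", "The ONU published a report."))
# -> "The UN published a report."
```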
Related papers
- Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu [53.437954702561065]
In-context machine translation (MT) with large language models (LLMs) is a promising approach for low-resource MT.
This study systematically investigates how each resource and its quality affect translation performance, using Manchu as a case study.
Our results indicate that high-quality dictionaries and good parallel examples are very helpful, while grammars hardly help; a toy prompt-assembly sketch follows this entry.
arXiv Detail & Related papers (2025-02-17T14:53:49Z)
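The entry above identifies which resources help; as a hedged illustration of how dictionaries and parallel examples might be assembled into an in-context MT prompt, here is a small Python sketch. The Manchu words, glosses, example pair, and prompt wording are toy assumptions rather than the study's actual data or template.

```python
def build_incontext_mt_prompt(source: str,
                              dictionary: dict[str, str],
                              parallel_examples: list[tuple[str, str]]) -> str:
    """Assemble a translation prompt from a bilingual dictionary and
    parallel examples, the two resources the study finds most helpful."""
    lines = ["Translate the following Manchu sentence into English."]
    lines.append("Relevant dictionary entries:")
    for word, gloss in dictionary.items():
        lines.append(f"  {word} = {gloss}")
    lines.append("Example translations:")
    for src, tgt in parallel_examples:
        lines.append(f"  Manchu: {src}")
        lines.append(f"  English: {tgt}")
    lines.append(f"Manchu: {source}")
    lines.append("English:")
    return "\n".join(lines)


# Toy resources; the prompt would be sent to an LLM for completion.
print(build_incontext_mt_prompt(
    source="bi boode bimbi",
    dictionary={"bi": "I", "boo": "house", "bimbi": "to be, to stay"},
    parallel_examples=[("si aibide bi", "where are you")],
))
```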
- Towards Global AI Inclusivity: A Large-Scale Multilingual Terminology Dataset (GIST) [19.91873751674613]
GIST is a large-scale multilingual AI terminology dataset containing 5K terms extracted from top AI conference papers spanning 2000 to 2023.
The terms are translated into Arabic, Chinese, French, Japanese, and Russian using a hybrid framework that combines LLMs for extraction with human expertise for translation.
This work aims to address critical gaps in AI terminology resources and to foster global inclusivity and collaboration in AI research.
arXiv Detail & Related papers (2024-12-24T11:50:18Z)
- Retrieval-Augmented Machine Translation with Unstructured Knowledge [74.84236945680503]
Retrieval-augmented generation (RAG) introduces additional information to enhance large language models (LLMs).
In machine translation (MT), previous work typically retrieves in-context examples from paired MT corpora, or domain-specific knowledge from knowledge graphs.
In this paper, we study retrieval-augmented MT using unstructured documents; a toy retrieval-and-prompt sketch follows this entry.
arXiv Detail & Related papers (2024-12-05T17:00:32Z)
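To make the retrieval step concrete, here is a minimal sketch assuming a naive token-overlap retriever over the sentences of one unstructured document; a real system would use a proper retriever and an LLM call, both omitted here, and the example document is invented.

```python
def retrieve_context(source: str, document: str, k: int = 2) -> list[str]:
    """Rank the document's sentences by token overlap with the source
    sentence and keep the top k as translation context."""
    src_tokens = set(source.lower().split())
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return sorted(
        sentences,
        key=lambda s: len(src_tokens & set(s.lower().split())),
        reverse=True,
    )[:k]


def build_rag_mt_prompt(source: str, document: str) -> str:
    """Prepend retrieved sentences to the translation request."""
    context = retrieve_context(source, document)
    bullets = "\n".join(f"- {c}" for c in context)
    return (f"Background retrieved from unstructured documents:\n{bullets}\n\n"
            f"Translate into English: {source}")


doc = ("La BCE est la Banque centrale européenne. "
       "Elle fixe les taux directeurs de la zone euro.")
print(build_rag_mt_prompt("La BCE a relevé ses taux.", doc))
```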
- Simplifying Translations for Children: Iterative Simplification Considering Age of Acquisition with LLMs [19.023628411128406]
We propose a method that replaces words with a high Age of Acquisition (AoA) in translations with simpler words to match the translations to the user's level.
The experimental results show that our method effectively replaces high-AoA words with lower-AoA words; a toy replacement loop follows this entry.
arXiv Detail & Related papers (2024-08-08T04:57:36Z)
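A minimal sketch of such an iterative replacement loop follows, assuming toy AoA ratings and a toy synonym table; the paper works with real AoA data and LLM-generated simplifications.

```python
# Invented AoA ratings (in years) and synonym table for illustration.
AOA = {"purchase": 10.9, "buy": 4.3, "assist": 9.5, "help": 3.8}
SYNONYMS = {"purchase": ["buy"], "assist": ["help"]}
TARGET_AOA = 6.0  # assumed reading level of the target user


def simplify(translation: str) -> str:
    """Iteratively replace words whose AoA exceeds the target with their
    lowest-AoA synonym until no further replacement is possible."""
    words = translation.split()
    changed = True
    while changed:
        changed = False
        for i, word in enumerate(words):
            if AOA.get(word, 0.0) > TARGET_AOA and word in SYNONYMS:
                best = min(SYNONYMS[word], key=lambda s: AOA.get(s, float("inf")))
                if AOA.get(best, float("inf")) < AOA[word]:
                    words[i], changed = best, True
    return " ".join(words)


print(simplify("please assist me to purchase bread"))
# -> "please help me to buy bread"
```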
- Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing [12.843274390224853]
Large language models (LLMs) have demonstrated considerable success in various natural language processing tasks.
We show that they have yet to attain state-of-the-art performance in Neural Machine Translation.
We propose adapting LLMs as automatic post-editors (APE) rather than direct translators; a minimal post-editing prompt sketch follows this entry.
arXiv Detail & Related papers (2023-10-23T12:22:15Z)
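As a hedged illustration of the post-editing framing, here is a minimal prompt builder; the instruction wording and the FR-EN example are assumptions, not the paper's actual prompt.

```python
def build_ape_prompt(source: str, mt_hypothesis: str) -> str:
    """Frame the LLM as an automatic post-editor: it receives the source
    and an MT hypothesis and is asked to correct, not retranslate."""
    return (
        "You are a translation post-editor.\n"
        f"Source (French): {source}\n"
        f"Machine translation (English): {mt_hypothesis}\n"
        "Minimally edit the machine translation so that it is accurate "
        "and fluent. Return only the corrected translation."
    )


print(build_ape_prompt("Le chat dort sur le canapé.",
                       "The cat sleep on the sofa."))
```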
- Towards Effective Disambiguation for Machine Translation with Large Language Models [65.80775710657672]
We study the capabilities of large language models to translate "ambiguous sentences".
Experiments show that our methods can match or outperform state-of-the-art systems such as DeepL and NLLB in four out of five language directions.
arXiv Detail & Related papers (2023-09-20T22:22:52Z)
- Translate to Disambiguate: Zero-shot Multilingual Word Sense Disambiguation with Pretrained Language Models [67.19567060894563]
Pretrained Language Models (PLMs) learn rich cross-lingual knowledge and can be fine-tuned to perform well on diverse tasks.
We present a new study investigating how well PLMs capture cross-lingual word sense with Contextual Word-Level Translation (C-WLT).
We find that as the model size increases, PLMs encode more cross-lingual word sense knowledge and better use context to improve WLT performance; a toy C-WLT probe follows this entry.
arXiv Detail & Related papers (2023-04-26T19:55:52Z)
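A minimal sketch of a C-WLT-style probe, assuming a simple fill-in prompt format (the paper's exact prompt may differ): the same polysemous word is queried in two contexts, and the elicited translations reveal the sense.

```python
def cwlt_prompt(sentence: str, word: str, target_lang: str = "French") -> str:
    """Contextual word-level translation: ask for the translation of one
    word in context; the chosen translation indicates the word sense."""
    return (f'In the sentence "{sentence}", '
            f'the word "{word}" translates into {target_lang} as:')


# The two contexts should elicit different translations ("banque" vs.
# "rive"), which is how translation disambiguates the sense of "bank".
print(cwlt_prompt("She deposited cash at the bank.", "bank"))
print(cwlt_prompt("They fished from the river bank.", "bank"))
```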
- Dictionary-based Phrase-level Prompting of Large Language Models for Machine Translation [91.57514888410205]
Large language models (LLMs) demonstrate remarkable machine translation (MT) abilities via prompting.
LLMs can struggle to translate inputs with rare words, which are common in low-resource or domain transfer scenarios.
We show that LLM prompting can provide an effective solution for rare words as well, by using prior knowledge from bilingual dictionaries to provide control hints in the prompts; a toy hint-injection sketch follows this entry.
arXiv Detail & Related papers (2023-02-15T18:46:42Z)
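To illustrate the control-hint idea, here is a minimal sketch with an invented rare-word dictionary; the hint format and wording are assumptions, not the paper's template.

```python
# Hypothetical bilingual dictionary for rare or domain-specific terms.
RARE_WORD_DICT = {"téléphérique": "cable car", "péage": "toll"}


def prompt_with_hints(source: str) -> str:
    """Attach dictionary 'control hints' for any rare source words
    before asking for the translation."""
    hints = [f'"{word}" means "{translation}"'
             for word, translation in RARE_WORD_DICT.items() if word in source]
    hint_text = "Hints: " + "; ".join(hints) + "\n" if hints else ""
    return f"{hint_text}Translate into English: {source}"


print(prompt_with_hints("Le téléphérique passe au-dessus du péage."))
```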
- DICTDIS: Dictionary Constrained Disambiguation for Improved NMT [50.888881348723295]
We present DictDis, a lexically constrained NMT system that disambiguates between multiple candidate translations derived from dictionaries.
We demonstrate the utility of DictDis via extensive experiments on English-Hindi and English-German sentences in a variety of domains, including regulatory, finance, and engineering; a toy constraint-filtering sketch follows this entry.
arXiv Detail & Related papers (2022-10-13T13:04:16Z)
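DictDis constrains decoding inside the NMT system itself; as a simplified stand-in, the sketch below merely filters complete hypotheses against the dictionary candidates permitted by the domain. The dictionary entry, domain terms, and hypotheses are all invented for illustration.

```python
# Toy dictionary mapping one German source term to candidate translations.
CANDIDATES = {"Anlage": ["plant", "facility", "investment", "attachment"]}


def disambiguate(source_term: str, hypotheses: list[str],
                 domain_terms: set[str]) -> str:
    """Prefer the hypothesis whose rendering of the source term matches a
    dictionary candidate favored by the domain; else keep the top one."""
    allowed = set(CANDIDATES.get(source_term, [])) & domain_terms
    for hyp in hypotheses:
        if any(term in hyp for term in allowed):
            return hyp
    return hypotheses[0]


hyps = ["The plant was profitable.", "The investment was profitable."]
print(disambiguate("Anlage", hyps, domain_terms={"investment", "finance"}))
# -> "The investment was profitable."
```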
- AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations [5.8010446129208155]
We present the construction of multilingual parallel corpora with annotation of multiword expressions (MWEs).
The languages covered include English, Chinese, Polish, and German.
We present a categorisation of the error types encountered by MT systems in performing MWE-related translation.
arXiv Detail & Related papers (2020-11-07T14:28:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.