Analyzing Context Contributions in LLM-based Machine Translation
- URL: http://arxiv.org/abs/2410.16246v1
- Date: Mon, 21 Oct 2024 17:51:41 GMT
- Title: Analyzing Context Contributions in LLM-based Machine Translation
- Authors: Emmanouil Zaranis, Nuno M. Guerreiro, André F. T. Martins
- Abstract summary: Large language models (LLMs) have achieved state-of-the-art performance in machine translation (MT).
We study how LLMs use various context parts, such as few-shot examples and the source text, when generating translations.
Our findings shed light on the internal workings of LLM-based MT which go beyond those known for standard encoder-decoder MT models.
- Score: 21.95318929582271
- Abstract: Large language models (LLMs) have achieved state-of-the-art performance in machine translation (MT) and demonstrated the ability to leverage in-context learning through few-shot examples. However, the mechanisms by which LLMs use different parts of the input context remain largely unexplored. In this work, we provide a comprehensive analysis of context utilization in MT, studying how LLMs use various context parts, such as few-shot examples and the source text, when generating translations. We highlight several key findings: (1) the source part of few-shot examples appears to contribute more than its corresponding targets, irrespective of translation direction; (2) finetuning LLMs with parallel data alters the contribution patterns of different context parts; and (3) there is a positional bias where earlier few-shot examples have higher contributions to the translated sequence. Finally, we demonstrate that inspecting anomalous context contributions can potentially uncover pathological translations, such as hallucinations. Our findings shed light on the internal workings of LLM-based MT which go beyond those known for standard encoder-decoder MT models.
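A rough way to probe this kind of context-contribution analysis is sketched below: for each labelled part of a one-shot MT prompt (few-shot source, few-shot target, test source), it sums the attention mass that the generated tokens place on that part. This is a proxy only; the model choice, the prompt layout, and the use of averaged raw attention are illustrative assumptions, and the paper relies on a dedicated input-attribution method rather than attention weights.

```python
# Minimal sketch: estimate how much each part of an MT prompt contributes to
# the generated translation, using attention mass as a simple proxy.
# Assumptions (not from the paper): the model choice, the prompt layout, and
# averaging raw attention instead of using a dedicated attribution method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-hf"  # hypothetical choice; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_attentions=True)
model.eval()

# A one-shot MT prompt split into labelled parts, mirroring the context parts
# analysed in the abstract: few-shot source, few-shot target, and test source.
parts = {
    "fewshot_src": "English: The cat sleeps on the sofa.\n",
    "fewshot_tgt": "Portuguese: O gato dorme no sofá.\n",
    "test_src":    "English: The weather is nice today.\nPortuguese:",
}

# Tokenise each part separately so we know which prompt positions belong where.
spans, prompt_ids = {}, []
for name, text in parts.items():
    ids = tokenizer(text, add_special_tokens=False).input_ids
    spans[name] = (len(prompt_ids), len(prompt_ids) + len(ids))
    prompt_ids.extend(ids)

prompt = torch.tensor([prompt_ids])
with torch.no_grad():
    generated = model.generate(prompt, max_new_tokens=20, do_sample=False)
    out = model(generated, output_attentions=True)

# Average attention over layers and heads, then sum, for every generated token,
# the attention mass falling on each labelled span of the prompt.
attn = torch.stack(out.attentions).mean(dim=(0, 2))[0]   # (seq_len, seq_len)
gen_positions = list(range(prompt.shape[1], generated.shape[1]))
for name, (start, end) in spans.items():
    contribution = attn[gen_positions, start:end].sum(dim=-1).mean().item()
    print(f"{name:12s} mean attention mass ≈ {contribution:.3f}")
# Anomalously low mass on `test_src` could flag a pathological translation,
# in the spirit of the hallucination analysis mentioned in the abstract.
```

Comparing the few-shot source and target spans this way is one crude way to revisit finding (1), with the caveat that raw attention can diverge substantially from principled attribution scores.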
Related papers
- Effective Self-Mining of In-Context Examples for Unsupervised Machine Translation with LLMs [16.98133269527045]
We propose an unsupervised approach to mine in-context examples for machine translation (MT).
We introduce a filtering criterion to select the optimal in-context examples from a pool of unsupervised parallel sentences.
Our findings demonstrate the effectiveness of our unsupervised approach in mining in-context examples for MT.
arXiv Detail & Related papers (2024-10-14T18:47:04Z) - Shortcomings of LLMs for Low-Resource Translation: Retrieval and Understanding are Both the Problem [4.830018386227]
This work investigates the in-context learning abilities of pretrained large language models (LLMs) when instructed to translate text from a low-resource language into a high-resource language as part of an automated machine translation pipeline.
We conduct a set of experiments translating Southern Quechua to Spanish and examine the informativity of various types of context retrieved from a constrained database of digitized pedagogical materials and parallel corpora.
arXiv Detail & Related papers (2024-06-21T20:02:22Z) - TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z) - Efficiently Exploring Large Language Models for Document-Level Machine Translation with In-context Learning [38.89119606657543]
In contrast to sentence-level translation, document-level translation (DOCMT) by large language models (LLMs) based on in-context learning faces two major challenges.
We propose a Context-Aware Prompting method (CAP) to generate more accurate, cohesive, and coherent translations via in-context learning.
We conduct extensive experiments across various DOCMT tasks, and the results demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-06-11T09:11:17Z) - A Preference-driven Paradigm for Enhanced Translation with Large Language Models [33.51585908894444]
Large language models (LLMs) can achieve remarkable translation performance using only a small amount of parallel data.
Supervised fine-tuning (SFT) simply instructs the model to imitate the reference translations at the token level, making it vulnerable to the noise present in the references.
We propose a preference-based approach built upon the Plackett-Luce model to overcome the resulting performance plateau (a minimal loss sketch appears after this list).
arXiv Detail & Related papers (2024-04-17T11:52:47Z) - Building Accurate Translation-Tailored LLMs with Language Aware Instruction Tuning [57.323716555996114]
Off-target translation remains an unsolved problem, especially for low-resource languages.
Recent works have either designed advanced prompting strategies to highlight the functionality of translation instructions or exploited the in-context learning ability of LLMs.
In this work, we design a two-stage fine-tuning algorithm to improve the instruction-following ability (especially the translation direction) of LLMs.
arXiv Detail & Related papers (2024-03-21T13:47:40Z) - Adapting Large Language Models for Document-Level Machine Translation [46.370862171452444]
Large language models (LLMs) have significantly advanced various natural language processing (NLP) tasks.
Recent research indicates that moderately-sized LLMs often outperform larger ones after task-specific fine-tuning.
This study focuses on adapting LLMs for document-level machine translation (DocMT) for specific language pairs.
arXiv Detail & Related papers (2024-01-12T09:29:13Z) - Towards Effective Disambiguation for Machine Translation with Large Language Models [65.80775710657672]
We study the capabilities of large language models to translate "ambiguous sentences".
Experiments show that our methods can match or outperform state-of-the-art systems such as DeepL and NLLB in four out of five language directions.
arXiv Detail & Related papers (2023-09-20T22:22:52Z) - Exploring Human-Like Translation Strategy with Large Language Models [93.49333173279508]
Large language models (LLMs) have demonstrated impressive capabilities in general scenarios.
This work proposes the MAPS framework, which stands for Multi-Aspect Prompting and Selection.
We employ a selection mechanism based on quality estimation to filter out noisy and unhelpful knowledge.
arXiv Detail & Related papers (2023-05-06T19:03:12Z) - Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis [103.89753784762445]
Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT).
This paper systematically investigates the advantages and challenges of LLMs for MMT.
We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4.
arXiv Detail & Related papers (2023-04-10T15:51:30Z) - Dictionary-based Phrase-level Prompting of Large Language Models for Machine Translation [91.57514888410205]
Large language models (LLMs) demonstrate remarkable machine translation (MT) abilities via prompting.
LLMs can struggle to translate inputs with rare words, which are common in low-resource or domain transfer scenarios.
We show that LLM prompting can provide an effective solution for rare words as well, by using prior knowledge from bilingual dictionaries to provide control hints in the prompts (see the prompt sketch after this list).
arXiv Detail & Related papers (2023-02-15T18:46:42Z)
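For the preference-driven entry above, the ranking likelihood at the heart of the Plackett-Luce model is compact enough to sketch. The function below is a generic Plackett-Luce negative log-likelihood over candidate-translation scores, not the cited paper's actual training objective.

```python
# Generic Plackett-Luce ranking loss over model scores for candidate
# translations, ordered from most to least preferred. This is a textbook
# formulation, not the cited paper's exact objective.
import torch

def plackett_luce_nll(scores: torch.Tensor) -> torch.Tensor:
    """NLL of the ranking: P = prod_i exp(s_i) / sum_{j>=i} exp(s_j)."""
    nll = torch.zeros(())
    for i in range(scores.shape[0]):
        nll = nll - (scores[i] - torch.logsumexp(scores[i:], dim=0))
    return nll

# Toy usage: three candidates scored by the model, listed best to worst
# according to the preference data; lower loss means the scores respect it.
print(plackett_luce_nll(torch.tensor([2.0, 0.5, -1.0])))
```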
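And for the dictionary-based prompting entry, the sketch below shows one way bilingual-dictionary hints could be injected into a translation prompt. The toy dictionary, the prompt wording, and the `build_prompt` helper are hypothetical, not the paper's template.

```python
# Illustrative sketch of dictionary-hinted prompting for rare words.
# The toy dictionary, prompt wording, and `build_prompt` helper are
# assumptions for illustration; the paper's exact template may differ.
TOY_DICTIONARY = {  # English -> German hints for rare or domain-specific terms
    "glomerulus": "Nierenkörperchen",
    "sluice": "Schleuse",
}

def build_prompt(source: str, src_lang: str = "English", tgt_lang: str = "German") -> str:
    """Attach dictionary hints for any source words found in the dictionary."""
    hints = [
        f'The word "{word}" can be translated as "{translation}".'
        for word, translation in TOY_DICTIONARY.items()
        if word in source.lower()
    ]
    hint_block = ("\n".join(hints) + "\n") if hints else ""
    return (
        f"{hint_block}"
        f"Translate the following {src_lang} sentence into {tgt_lang}.\n"
        f"{src_lang}: {source}\n{tgt_lang}:"
    )

# The resulting prompt contains the hint line for "sluice", nudging the LLM
# toward the dictionary translation where it might otherwise guess.
print(build_prompt("Water flows through the sluice into the canal."))
```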
This list is automatically generated from the titles and abstracts of the papers in this site.