Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models
- URL: http://arxiv.org/abs/2407.16470v2
- Date: Thu, 25 Jul 2024 16:31:39 GMT
- Title: Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models
- Authors: Kenza Benkirane, Laura Gongas, Shahar Pelles, Naomi Fuchs, Joshua Darmon, Pontus Stenetorp, David Ifeoluwa Adelani, Eduardo Sánchez
- Abstract summary: This paper evaluates hallucination detection approaches using Large Language Models (LLMs) and semantic similarity within massively multilingual embeddings.
LLMs can achieve performance comparable to, or even better than, previously proposed models, despite not being explicitly trained for any machine translation task.
- Score: 12.447489454369636
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advancements in massively multilingual machine translation systems have significantly enhanced translation accuracy; however, even the best performing systems still generate hallucinations, severely impacting user trust. Detecting hallucinations in Machine Translation (MT) remains a critical challenge, particularly since existing methods excel with High-Resource Languages (HRLs) but exhibit substantial limitations when applied to Low-Resource Languages (LRLs). This paper evaluates hallucination detection approaches using Large Language Models (LLMs) and semantic similarity within massively multilingual embeddings. Our study spans 16 language directions, covering HRLs and LRLs with diverse scripts. We find that the choice of model is essential for performance. On average, for HRLs, Llama3-70B outperforms the previous state of the art by as much as 0.16 MCC (Matthews Correlation Coefficient). However, for LRLs we observe that Claude Sonnet outperforms other LLMs on average by 0.03 MCC. The key takeaway from our study is that LLMs can achieve performance comparable to, or even better than, previously proposed models, despite not being explicitly trained for any machine translation task. However, their advantage is less significant for LRLs.
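The two ingredients the abstract names, cosine similarity over multilingual sentence embeddings as a hallucination signal and MCC as the evaluation metric, can be sketched in plain Python. This is a minimal illustration only, not the paper's pipeline: the embeddings, the 0.4 threshold, and the `flag_hallucination` helper are all hypothetical, and a real system would use an actual multilingual encoder (e.g. the embeddings behind SONAR/LASER-style models) rather than toy vectors.

```python
import math

def cosine_similarity(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def flag_hallucination(src_emb, tgt_emb, threshold=0.4):
    # Hypothetical detector: flag a translation as a hallucination (label 1)
    # when source and target sentence embeddings diverge below a threshold.
    return 1 if cosine_similarity(src_emb, tgt_emb) < threshold else 0

def matthews_corrcoef(y_true, y_pred):
    # MCC over binary labels (1 = hallucination); ranges from -1 to +1,
    # with 0 meaning no better than chance. Returns 0.0 when undefined.
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For example, with gold labels `[1, 1, 0, 0, 1, 0]` and predictions `[1, 0, 0, 0, 1, 1]`, `matthews_corrcoef` gives 1/3 ≈ 0.33, which makes concrete what an improvement of "0.16 MCC" over a baseline means.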
Related papers
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z) - Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages [60.162717568496355]
Large language models (LLMs) have been pre-trained on multilingual corpora.
Their performance still lags behind in most languages compared to a few resource-rich languages.
arXiv Detail & Related papers (2024-02-19T15:07:32Z) - Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation [64.5862977630713]
This study investigates how Large Language Models (LLMs) leverage source and reference data in machine translation evaluation task.
We find that reference information significantly enhances the evaluation accuracy, while surprisingly, source information sometimes is counterproductive.
arXiv Detail & Related papers (2024-01-12T13:23:21Z) - Adapting Large Language Models for Document-Level Machine Translation [46.370862171452444]
Large language models (LLMs) have significantly advanced various natural language processing (NLP) tasks.
Recent research indicates that moderately-sized LLMs often outperform larger ones after task-specific fine-tuning.
This study focuses on adapting LLMs for document-level machine translation (DocMT) for specific language pairs.
arXiv Detail & Related papers (2024-01-12T09:29:13Z) - Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages [51.301942056881146]
We investigate how large language models (LLMs) function as rerankers in cross-lingual information retrieval systems for African languages.
Our implementation covers English and four African languages (Hausa, Somali, Swahili, and Yoruba).
We examine cross-lingual reranking with queries in English and passages in the African languages.
arXiv Detail & Related papers (2023-12-26T18:38:54Z) - ChatGPT MT: Competitive for High- (but not Low-) Resource Languages [62.178282377729566]
Large language models (LLMs) implicitly learn to perform a range of language tasks, including machine translation (MT).
We present the first experimental evidence for an expansive set of 204 languages, along with MT cost analysis.
Our analysis reveals that a language's resource level is the most important feature in determining ChatGPT's relative ability to translate it.
arXiv Detail & Related papers (2023-09-14T04:36:00Z) - When your Cousin has the Right Connections: Unsupervised Bilingual Lexicon Induction for Related Data-Imbalanced Languages [29.346191691508125]
Unsupervised bilingual lexicon induction is most likely to be useful for low-resource languages, where large datasets are not available.
We show that state-of-the-art BLI methods in the literature exhibit near-zero performance for severely data-imbalanced language pairs.
We present a new method for unsupervised BLI between a related LRL and HRL that only requires inference on a masked language model of the HRL.
arXiv Detail & Related papers (2023-05-23T12:49:21Z) - Chain-of-Dictionary Prompting Elicits Translation in Large Language Models [100.47154959254937]
Large language models (LLMs) have shown surprisingly good performance in multilingual neural machine translation (MNMT).
We present a novel method, CoD, which augments LLMs with prior knowledge via chains of multilingual dictionaries for a subset of input words, eliciting their translation abilities.
arXiv Detail & Related papers (2023-05-11T05:19:47Z) - CharSpan: Utilizing Lexical Similarity to Enable Zero-Shot Machine Translation for Extremely Low-resource Languages [22.51558549091902]
We address the task of machine translation (MT) from an extremely low-resource language (ELRL) to English by leveraging cross-lingual transfer from a closely related high-resource language (HRL).
Many ELRLs share lexical similarities with some HRLs, which presents a novel modeling opportunity.
Existing subword-based neural MT models do not explicitly harness this lexical similarity, as they only implicitly align HRL and ELRL latent embedding space.
We propose CharSpan, a novel approach based on 'character-span noise augmentation' of the HRL training data.
arXiv Detail & Related papers (2023-05-09T07:23:01Z) - Letz Translate: Low-Resource Machine Translation for Luxembourgish [4.860100893494234]
We build resource-efficient models based on German, knowledge distillation from the multilingual No Language Left Behind model, and pseudo-translation.
We find that our efficient models are more than 30% faster and perform only 4% lower compared to the large state-of-the-art NLLB model.
arXiv Detail & Related papers (2023-03-02T15:26:46Z) - Overlap-based Vocabulary Generation Improves Cross-lingual Transfer Among Related Languages [18.862296065737347]
We argue that relatedness among languages in a language family along the dimension of lexical overlap may be leveraged to overcome some of the corpora limitations of LRLs.
We propose Overlap BPE, a simple yet effective modification to the BPE vocabulary generation algorithm which enhances overlap across related languages.
arXiv Detail & Related papers (2022-03-03T19:35:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.