Ready to Translate, Not to Represent? Bias and Performance Gaps in Multilingual LLMs Across Language Families and Domains
- URL: http://arxiv.org/abs/2510.07877v2
- Date: Fri, 31 Oct 2025 11:31:13 GMT
- Title: Ready to Translate, Not to Represent? Bias and Performance Gaps in Multilingual LLMs Across Language Families and Domains
- Authors: Md. Faiyaz Abdullah Sayeedi, Md. Mahbub Alam, Subhey Sadi Rahman, Md. Adnanul Islam, Jannatul Ferdous Deepti, Tasnim Mohiuddin, Md Mofijul Islam, Swakkhar Shatabda
- Abstract summary: Large Language Models (LLMs) have redefined Machine Translation (MT), yet they often exhibit uneven performance across language families and specialized domains. We introduce Translation Tangles, a unified framework and dataset for evaluating the translation quality and fairness of open-source LLMs.
- Score: 6.357124887141297
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of Large Language Models (LLMs) has redefined Machine Translation (MT), enabling context-aware and fluent translations across hundreds of languages and textual domains. Despite their remarkable capabilities, LLMs often exhibit uneven performance across language families and specialized domains. Moreover, recent evidence reveals that these models can encode and amplify different biases present in their training data, posing serious concerns for fairness, especially in low-resource languages. To address these gaps, we introduce Translation Tangles, a unified framework and dataset for evaluating the translation quality and fairness of open-source LLMs. Our approach benchmarks 24 bidirectional language pairs across multiple domains using different metrics. We further propose a hybrid bias detection pipeline that integrates rule-based heuristics, semantic similarity filtering, and LLM-based validation. We also introduce a high-quality, bias-annotated dataset based on human evaluations of 1,439 translation-reference pairs. The code and dataset are accessible on GitHub: https://github.com/faiyazabdullah/TranslationTangles
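The abstract describes a hybrid bias-detection pipeline combining rule-based heuristics, semantic similarity filtering, and LLM-based validation. A minimal sketch of that three-stage flow is below; all function names, the toy lexicon, the `difflib`-based similarity stand-in, and the threshold are illustrative assumptions, not the authors' implementation (their code is in the linked GitHub repository).

```python
# Hypothetical sketch of a hybrid bias-detection pipeline: cheap rule-based
# flagging, then a semantic-similarity filter, then a (stubbed) LLM
# validation stage. Names and thresholds are illustrative only.
from difflib import SequenceMatcher

GENDERED_TERMS = {"he", "she", "his", "her"}  # toy rule-based lexicon

def rule_based_flags(source: str, translation: str) -> set:
    """Flag gendered terms that appear in the translation but not the source."""
    src = set(source.lower().split())
    tgt = set(translation.lower().split())
    return (tgt & GENDERED_TERMS) - src

def semantically_divergent(reference: str, translation: str, thresh: float = 0.8) -> bool:
    """Cheap stand-in for an embedding-based semantic similarity filter."""
    return SequenceMatcher(None, reference, translation).ratio() < thresh

def llm_validate(source: str, translation: str, flags: set) -> bool:
    """Placeholder for an LLM judgment; here, any surviving flag counts as bias."""
    return bool(flags)

def detect_bias(source: str, reference: str, translation: str) -> bool:
    flags = rule_based_flags(source, translation)
    if not flags and not semantically_divergent(reference, translation):
        return False  # passes both cheap filters; skip the expensive LLM call
    return llm_validate(source, translation, flags)

# A gender-neutral source rendered with an injected pronoun gets flagged:
print(detect_bias("The doctor arrived.", "The doctor arrived.",
                  "He arrived at the clinic."))  # → True
```

The staging matters: the two cheap filters gate the expensive LLM call, so only suspicious translation-reference pairs reach the validation stage.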
Related papers
- BETA-Labeling for Multilingual Dataset Construction in Low-Resource IR [0.06363400715351396]
This work presents a Bangla IR dataset constructed using a BETA-labeling framework. We examine whether IR datasets from other low-resource languages can be effectively reused through one-hop machine translation.
arXiv Detail & Related papers (2026-02-16T06:04:04Z) - Mind the Gap... or Not? How Translation Errors and Evaluation Details Skew Multilingual Results [16.391752298134474]
We study the performance of different large language models (LLMs) across languages. We find that there exists a non-negligible and consistent gap in the performance of the models across languages. We propose a method for automatic quality assurance to address the first issue at scale, and give recommendations to address the second one.
arXiv Detail & Related papers (2025-11-07T11:30:10Z) - HPLT 3.0: Very Large-Scale Multilingual Resources for LLM and MT. Mono- and Bi-lingual Data, Multilingual Evaluation, and Pre-Trained Models [25.953042884928006]
We present an initiative to provide open, very large, high-quality, and richly annotated textual datasets for almost 200 languages. At 30 trillion tokens, this is likely the largest generally available multilingual collection of LLM pre-training data. We train and evaluate a family of 57 monolingual encoder-decoder models, as well as a handful of monolingual GPT-like reference models.
arXiv Detail & Related papers (2025-11-02T20:16:38Z) - Aligning Large Language Models to Low-Resource Languages through LLM-Based Selective Translation: A Systematic Study [1.0470286407954037]
Selective translation is a technique that translates only the translatable parts of a text while preserving non-translatable content and sentence structure. Our experiments focus on the low-resource Indic language Hindi and compare translations generated by Google Cloud Translation (GCP) and Llama-3.1-405B.
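The selective-translation idea above can be sketched in a few lines: protect non-translatable spans (here, URLs and inline code) during translation and leave them untouched. The regex, the `selective_translate` helper, and the uppercasing `translate()` stub are assumptions for illustration; a real MT system such as GCP or an LLM would replace the stub.

```python
# Illustrative sketch of selective translation: split out protected spans
# (inline code and URLs), run MT only on the remaining prose, and reassemble.
import re

# Capturing group so re.split keeps the protected spans in the result.
PROTECTED = re.compile(r"(`[^`]+`|https?://\S+)")

def translate(text: str) -> str:
    """Stub MT call; uppercasing makes the translated portions visible."""
    return text.upper()

def selective_translate(text: str) -> str:
    parts = PROTECTED.split(text)
    # Odd indices hold the spans captured by the regex group; pass them through.
    return "".join(p if i % 2 else translate(p) for i, p in enumerate(parts))

print(selective_translate("See `np.mean` at https://numpy.org for details."))
# → SEE `np.mean` AT https://numpy.org FOR DETAILS.
```

Because `re.split` with a capturing group interleaves prose and protected spans, index parity alone tells the code which pieces to send to the MT system.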
arXiv Detail & Related papers (2025-07-18T18:21:52Z) - Trans-Zero: Self-Play Incentivizes Large Language Models for Multilingual Translation Without Parallel Data [64.4458540273004]
We propose a self-play framework that leverages only monolingual data and the intrinsic multilingual knowledge of Large Language Models (LLMs). Experiments demonstrate that this approach not only matches the performance of models trained on large-scale parallel data but also excels in non-English translation directions.
arXiv Detail & Related papers (2025-04-20T16:20:30Z) - Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking [2.321323878201932]
MultiSynFact is the first large-scale multilingual fact-checking dataset, containing 2.2M claim-source pairs. Our dataset generation pipeline leverages Large Language Models (LLMs), integrating external knowledge from Wikipedia. We open-source a user-friendly framework to facilitate further research in multilingual fact-checking and dataset generation.
arXiv Detail & Related papers (2025-02-21T12:38:26Z) - Understanding In-Context Machine Translation for Low-Resource Languages: A Case Study on Manchu [53.437954702561065]
In-context machine translation (MT) with large language models (LLMs) is a promising approach for low-resource MT. This study systematically investigates how each type of resource, e.g., dictionary, grammar book, and retrieved parallel examples, affects translation performance. Our results indicate that high-quality dictionaries and good parallel examples are very helpful, while grammars hardly help.
arXiv Detail & Related papers (2025-02-17T14:53:49Z) - What do Large Language Models Need for Machine Translation Evaluation? [12.42394213466485]
Large language models (LLMs) can achieve results comparable to fine-tuned multilingual pre-trained language models.
This paper explores what translation information, such as the source, reference, translation errors and annotation guidelines, is needed for LLMs to evaluate machine translation quality.
arXiv Detail & Related papers (2024-10-04T09:50:45Z) - Quality or Quantity? On Data Scale and Diversity in Adapting Large Language Models for Low-Resource Translation [62.202893186343935]
We explore what it would take to adapt Large Language Models for low-resource languages.
We show that parallel data is critical during both pre-training and Supervised Fine-Tuning (SFT).
Our experiments with three LLMs across two low-resourced language groups reveal consistent trends, underscoring the generalizability of our findings.
arXiv Detail & Related papers (2024-08-23T00:59:38Z) - Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, i.e., be crosslingual? This study evaluates state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z) - UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset [69.33424532827608]
Open-source large language models (LLMs) have gained significant strength across diverse fields.
In this work, we construct an open-source multilingual supervised fine-tuning dataset.
The resulting UltraLink dataset comprises approximately 1 million samples across five languages.
arXiv Detail & Related papers (2024-02-07T05:05:53Z) - Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis [103.89753784762445]
Large language models (LLMs) have demonstrated remarkable potential in handling multilingual machine translation (MMT).
This paper systematically investigates the advantages and challenges of LLMs for MMT.
We thoroughly evaluate eight popular LLMs, including ChatGPT and GPT-4.
arXiv Detail & Related papers (2023-04-10T15:51:30Z) - Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across a diverse setting, including low-, medium-, and rich-resource pairs, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.