Benchmarking Cross-Lingual Semantic Alignment in Multilingual Embeddings
- URL: http://arxiv.org/abs/2601.09732v1
- Date: Mon, 29 Dec 2025 14:32:57 GMT
- Title: Benchmarking Cross-Lingual Semantic Alignment in Multilingual Embeddings
- Authors: Wen G. Gong
- Abstract summary: Task-driven benchmarks (MTEB) may mask fundamental alignment shortcomings. We introduce Semantic Affinity (SA), a bounded (between 0 and 1) metric measuring the inter-lingual to intra-lingual spread ratio. Benchmarking 13 models across 4 datasets (52 experiments) reveals a three-tier structure.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With hundreds of multilingual embedding models available, practitioners lack clear guidance on which provide genuine cross-lingual semantic alignment versus task performance through language-specific patterns. Task-driven benchmarks (MTEB) may mask fundamental alignment shortcomings. We introduce Semantic Affinity (SA), a bounded (between 0 and 1) metric measuring the inter-lingual to intra-lingual spread ratio using cosine distance, combined with PHATE visualization in our Semanscope framework. Benchmarking 13 models across 4 datasets (52 experiments) reveals a three-tier structure: (1) top BERT models (LaBSE SA = 0.70, USE SA = 0.68, S-BERT SA = 0.68) achieve strong alignment via translation-pair supervision; (2) LLM embeddings plateau at SA between 0.55 and 0.61 regardless of scale from 0.6B to 8B parameters; (3) MLM-only BERT models (mBERT, XLM-R; SA < 0.50) fail despite training on more than 100 languages. Training objective, not architecture or scale, determines alignment. Oracle Bone primitives (1200 BCE) expose semantic drift: models learn corpus patterns rather than cognitive primitives. This work provides semantic benchmarking to help practitioners select quality multilingual embeddings from hundreds of available models, showing that cross-lingual alignment requires explicit translation supervision, not merely model scale or multilingual data.
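The abstract specifies SA only as a bounded ratio of inter-lingual to intra-lingual spread under cosine distance; the exact formula is not reproduced in this summary. Below is a minimal Python sketch of one plausible formalization, assuming row-aligned translation-pair embeddings and assuming the ratio is squashed into (0, 1]; the function names and squashing form are ours, not the paper's.

```python
import numpy as np

def cosine_dist(a, b):
    # Pairwise cosine distances between rows of a and rows of b.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T

def semantic_affinity(emb_l1, emb_l2):
    # emb_l1[i] and emb_l2[i] embed translation-equivalent sentences.
    n = len(emb_l1)
    # Inter-lingual spread: mean distance between paired translations.
    inter = np.mean(np.diag(cosine_dist(emb_l1, emb_l2)))
    # Intra-lingual spread: mean off-diagonal distance within each language.
    off = ~np.eye(n, dtype=bool)
    intra = 0.5 * (cosine_dist(emb_l1, emb_l1)[off].mean()
                   + cosine_dist(emb_l2, emb_l2)[off].mean())
    # Squash the spread ratio into (0, 1]; this functional form is an assumption.
    return 1.0 / (1.0 + inter / intra)
```

Under this reading, SA approaches 1 when translation pairs nearly coincide relative to within-language spread and falls toward 0 when cross-lingual distances dominate, consistent with the reported ordering (LaBSE at 0.70, mBERT and XLM-R below 0.50).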
Related papers
- XplaiNLP at CheckThat! 2025: Multilingual Subjectivity Detection with Finetuned Transformers and Prompt-Based Inference with Large Language Models [2.749729059235755]
This notebook reports the XplaiNLP submission to the CheckThat! 2025 shared task on multilingual subjectivity detection. We evaluate two approaches: supervised fine-tuning of transformer encoders (EuroBERT, XLM-RoBERTa, and German-BERT) on monolingual and machine-translated training data, and prompt-based inference with large language models. For German, a German-BERT model fine-tuned on translated training data from typologically related languages yields competitive performance over the baseline.
arXiv Detail & Related papers (2025-09-15T16:53:41Z)
- Mario at EXIST 2025: A Simple Gateway to Effective Multilingual Sexism Detection [8.40042895828361]
EXIST 2025 Task 1 addresses text-based sexism detection in English and Spanish tweets through hierarchical Low-Rank Adaptation (LoRA) of Llama 3.1 8B. Our method introduces conditional adapter routing that explicitly models dependencies across three hierarchically structured subtasks (sketched below). Our approach reduces training time by 75% and model storage by 98% while achieving competitive performance across all subtasks.
arXiv Detail & Related papers (2025-07-15T05:30:32Z)
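The summary above names conditional adapter routing without detailing it; the following PyTorch sketch shows one way a child LoRA adapter could be gated on the parent subtask's decision. The class names and gating scheme are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Frozen base projection plus a trainable low-rank update (LoRA).
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        return self.base(x) + x @ self.A.T @ self.B.T

class ConditionalRouter(nn.Module):
    # Route to a child adapter conditioned on the parent subtask's label.
    def __init__(self, base: nn.Linear, num_children: int):
        super().__init__()
        self.branch_adapters = nn.ModuleList(
            LoRALinear(base) for _ in range(num_children))

    def forward(self, x, parent_label: int):
        # Only the adapter for the predicted parent branch runs, which is
        # one way per-subtask training cost and storage stay small.
        return self.branch_adapters[parent_label](x)
```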
- Enhancing LLM Language Adaption through Cross-lingual In-Context Pre-training [57.62126373849383]
Cross-lingual In-context Pre-training (CrossIC-PT) is a simple and scalable approach that enhances cross-lingual transfer. We construct CrossIC-PT samples by interleaving semantically related bilingual Wikipedia documents into a single context window (sketched below). Experimental results demonstrate that CrossIC-PT improves multilingual performance on three models across six target languages.
arXiv Detail & Related papers (2025-04-29T07:24:25Z)
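A minimal sketch of how such interleaved samples might be assembled, assuming paired semantically related documents and any tokenizer exposing an encode method; the packing policy here is an assumption, not the paper's exact procedure.

```python
def build_crossic_samples(doc_pairs, tokenizer, window=4096):
    # doc_pairs: iterable of (source_doc, related_target_doc) text pairs.
    samples, buffer = [], []
    for src_doc, tgt_doc in doc_pairs:
        # Keep each document adjacent to its semantically related partner
        # so both languages co-occur inside one context window.
        for doc in (src_doc, tgt_doc):
            buffer.extend(tokenizer.encode(doc))
        while len(buffer) >= window:
            samples.append(buffer[:window])
            buffer = buffer[window:]
    return samples
```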
- Machine Translation for Ge'ez Language [0.0]
Machine translation for low-resource languages such as Ge'ez faces challenges including out-of-vocabulary words, domain mismatches, and a lack of labeled training data.
We develop a multilingual neural machine translation (MNMT) model based on language relatedness.
We also experiment with using GPT-3.5, a state-of-the-art LLM, for few-shot translation with fuzzy matches (sketched below).
arXiv Detail & Related papers (2023-11-24T14:55:23Z)
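A minimal sketch of few-shot prompting with fuzzy matches, using difflib surface similarity to pick translation-memory examples closest to the input; the prompt layout, translation direction, and retrieval scheme are assumptions rather than the paper's exact setup.

```python
from difflib import SequenceMatcher

def fuzzy_few_shot_prompt(source, memory, k=3):
    # memory: list of (geez_sentence, english_translation) pairs.
    scored = sorted(
        memory,
        key=lambda pair: SequenceMatcher(None, source, pair[0]).ratio(),
        reverse=True)
    # The k most similar stored translations become in-context examples.
    shots = "\n".join(f"Ge'ez: {s}\nEnglish: {t}" for s, t in scored[:k])
    return f"{shots}\nGe'ez: {source}\nEnglish:"
```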
- Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations [59.056367787688146]
This paper pioneers the exploration and training of powerful Multilingual Math Reasoning (xMR) LLMs.
By utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct, encompassing ten distinct languages.
arXiv Detail & Related papers (2023-10-31T08:09:20Z)
- XSemPLR: Cross-Lingual Semantic Parsing in Multiple Natural Languages and Meaning Representations [25.50509874992198]
Cross-Lingual Semantic Parsing (CLSP) aims to translate queries in multiple natural languages into meaning representations.
Existing CLSP models are separately proposed and evaluated on datasets of limited tasks and applications.
We present XSemPLR, a unified benchmark for cross-lingual semantic parsing featuring 22 natural languages and 8 meaning representations.
arXiv Detail & Related papers (2023-06-07T01:09:37Z)
- OneAligner: Zero-shot Cross-lingual Transfer with One Rich-Resource Language Pair for Low-Resource Sentence Retrieval [91.76575626229824]
We present OneAligner, an alignment model specially designed for sentence retrieval tasks.
When trained with all language pairs of the large-scale parallel multilingual corpus OPUS-100, this model achieves state-of-the-art results.
We conclude through empirical results and analyses that performance on the sentence alignment task depends mostly on monolingual and parallel data size (a retrieval-scoring sketch follows below).
arXiv Detail & Related papers (2022-05-17T19:52:42Z)
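This summary does not give OneAligner's scoring; parallel sentence retrieval of this kind is commonly scored with margin-normalized cosine similarity in the style of Artetxe and Schwenk, which the sketch below assumes.

```python
import numpy as np

def retrieve(src_emb, tgt_emb, k=4):
    # For each source sentence, pick the target whose cosine similarity
    # stands out most against each side's k-nearest-neighbor average.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T
    nn_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1, keepdims=True)
    nn_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0, keepdims=True)
    margin = sim / (0.5 * (nn_src + nn_tgt))
    return margin.argmax(axis=1)  # best target index per source sentence
```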
- Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings (the weight-update loop is sketched below).
arXiv Detail & Related papers (2021-09-09T03:48:35Z)
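The iterated best response scheme is only named above; a minimal sketch of one standard instantiation alternates model updates on a weighted objective with an exponentiated-gradient best response over the language-pair weights. The train_step API and step size are assumptions.

```python
import numpy as np

def dro_training_loop(train_step, num_pairs, rounds=100, lr=0.1):
    # w is the adversary's distribution over language pairs.
    w = np.full(num_pairs, 1.0 / num_pairs)
    for _ in range(rounds):
        # train_step (assumed API) updates the model on the w-weighted
        # objective and returns current per-language-pair losses.
        losses = train_step(w)
        w = w * np.exp(lr * np.asarray(losses))  # up-weight high-loss pairs
        w = w / w.sum()                          # stay on the simplex
    return w
```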
- AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages [75.08199398141744]
We present AmericasNLI, an extension of XNLI (Conneau et al.) to 10 indigenous languages of the Americas.
We conduct experiments with XLM-R, testing multiple zero-shot and translation-based approaches.
We find that XLM-R's zero-shot performance is poor for all 10 languages, with an average performance of 38.62%.
arXiv Detail & Related papers (2021-04-18T05:32:28Z)
- Self-Learning for Zero Shot Neural Machine Translation [13.551731309506874]
This work proposes a novel zero-shot NMT modeling approach that learns without the now-standard assumption of a pivot language sharing parallel data with both source and target languages.
Compared to unsupervised NMT, consistent improvements are observed even in a domain-mismatch setting.
arXiv Detail & Related papers (2021-03-10T09:15:19Z)
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model (a sentence-level alignment sketch follows below).
arXiv Detail & Related papers (2020-10-15T18:34:13Z)
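The two alignment granularities are not detailed in this summary; below is a minimal sketch of a sentence-level contrastive alignment objective of the kind such methods use, contrasting each translation pair against in-batch negatives. This is our reading, not AMBER's exact loss, and the word-level objective is omitted.

```python
import torch
import torch.nn.functional as F

def sentence_alignment_loss(src_emb, tgt_emb, temperature=0.05):
    # The i-th source sentence should retrieve the i-th target sentence
    # against all other batch members, and vice versa.
    src = F.normalize(src_emb, dim=1)
    tgt = F.normalize(tgt_emb, dim=1)
    logits = src @ tgt.T / temperature
    labels = torch.arange(len(src), device=src.device)
    # Symmetric cross-entropy over both retrieval directions.
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.T, labels))
```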