Multilingual Self-Taught Faithfulness Evaluators
- URL: http://arxiv.org/abs/2507.20752v1
- Date: Mon, 28 Jul 2025 12:01:59 GMT
- Title: Multilingual Self-Taught Faithfulness Evaluators
- Authors: Carlo Alfano, Aymen Al Marjani, Zeno Jonke, Amin Mantrach, Saab Mansour, Marcello Federico
- Abstract summary: Self-Taught Evaluators for Multilingual Faithfulness is a framework that learns exclusively from synthetic multilingual summarization data. Our framework shows improvements over existing baselines, including state-of-the-art English evaluators and machine translation-based approaches.
- Score: 11.200203292660758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing use of large language models (LLMs) has increased the need for automatic evaluation systems, particularly to address the challenge of information hallucination. Although existing faithfulness evaluation approaches have shown promise, they are predominantly English-focused and often require expensive human-labeled training data for fine-tuning specialized models. As LLMs see increased adoption in multilingual contexts, there is a need for accurate faithfulness evaluators that can operate across languages without extensive labeled data. This paper presents Self-Taught Evaluators for Multilingual Faithfulness, a framework that learns exclusively from synthetic multilingual summarization data while leveraging cross-lingual transfer learning. Through experiments comparing language-specific and mixed-language fine-tuning approaches, we demonstrate a consistent relationship between an LLM's general language capabilities and its performance in language-specific evaluation tasks. Our framework shows improvements over existing baselines, including state-of-the-art English evaluators and machine translation-based approaches.
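The abstract sketches a self-taught loop: an LLM synthesizes faithful and deliberately corrupted multilingual summaries, the evaluator judges them, and only judgments that agree with the labels known by construction are kept for fine-tuning. The Python sketch below is a minimal, hypothetical rendering of that loop reconstructed from the abstract alone; the function names (make_synthetic_pairs, self_taught_round, parse_verdict), the prompts, and the generate/judge callables are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of one self-taught training round, reconstructed from
# the abstract; names and prompts are illustrative, not the authors' code.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Example:
    document: str   # source document in some language
    summary: str    # candidate summary to be judged
    label: str      # "faithful" or "unfaithful", known by construction

def make_synthetic_pairs(document: str, generate: Callable[[str], str]) -> list[Example]:
    """Use a generator LLM to build one faithful and one corrupted summary."""
    faithful = generate(f"Summarize the following text faithfully:\n{document}")
    corrupted = generate(
        "Rewrite the summary below, introducing one claim that is not "
        f"supported by the source:\n{faithful}"
    )
    return [
        Example(document, faithful, "faithful"),
        Example(document, corrupted, "unfaithful"),
    ]

def parse_verdict(judgment: str) -> str:
    """Map the judge's free-form output to a label (assumed output format)."""
    return "unfaithful" if "unfaithful" in judgment.lower() else "faithful"

def self_taught_round(
    documents: list[str],
    generate: Callable[[str], str],    # generator LLM (hypothetical API)
    judge: Callable[[str, str], str],  # evaluator: (document, summary) -> reasoning + verdict
) -> list[tuple[Example, str]]:
    """Judge synthetic pairs and keep only judgments whose verdict matches
    the label known by construction; the result is the fine-tuning set."""
    kept = []
    for doc in documents:
        for ex in make_synthetic_pairs(doc, generate):
            judgment = judge(ex.document, ex.summary)
            if parse_verdict(judgment) == ex.label:
                kept.append((ex, judgment))
    return kept  # fine-tune the evaluator on these (example, judgment) pairs
```

Under this reading, the paper's language-specific variant would run the loop on one language's documents at a time, while the mixed-language variant would presumably pool synthetic data from all languages into a single fine-tuning set.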
Related papers
- Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization [9.349707150988893]
We propose a continuous multilingual integration strategy that injects text-only multilingual data during visual instruction tuning. Our approach significantly improves linguistic fidelity across languages without degradation in visual performance.
arXiv Detail & Related papers (2025-03-28T16:26:52Z)
- Entity-aware Cross-lingual Claim Detection for Automated Fact-checking [7.242609314791262]
We introduce EX-Claim, an entity-aware cross-lingual claim detection model that generalizes well to handle multilingual claims. We show consistent performance gains across 27 languages and robust knowledge transfer between languages seen and unseen during training.
arXiv Detail & Related papers (2025-03-19T14:00:55Z)
- Enhancing Multilingual Language Models for Code-Switched Input Data [0.0]
This research investigates whether pre-training Multilingual BERT (mBERT) on code-switched datasets improves the model's performance on critical NLP tasks. We use a dataset of Spanglish tweets for pre-training and evaluate the pre-trained model against a baseline model. Our findings show that our pre-trained mBERT model outperforms or matches the baseline model on the given tasks, with the most significant improvements seen for part-of-speech tagging.
arXiv Detail & Related papers (2025-03-11T02:49:41Z)
- P-MMEval: A Parallel Multilingual Multitask Benchmark for Consistent Evaluation of LLMs [84.24644520272835]
We introduce P-MMEval, a large-scale benchmark covering effective fundamental and capability-specialized datasets. P-MMEval delivers consistent language coverage across various datasets and provides parallel samples. We conduct extensive experiments on representative multilingual model series to compare performance across models and tasks.
arXiv Detail & Related papers (2024-11-14T01:29:36Z)
- Analyzing the Evaluation of Cross-Lingual Knowledge Transfer in Multilingual Language Models [12.662039551306632]
We show that the observed high performance of multilingual models can be largely attributed to factors not requiring the transfer of actual linguistic knowledge.
More specifically, we observe that what is transferred across languages consists mostly of data artifacts and biases, especially for low-resource languages.
arXiv Detail & Related papers (2024-02-03T09:41:52Z)
- Is Prompt-Based Finetuning Always Better than Vanilla Finetuning? Insights from Cross-Lingual Language Understanding [0.30586855806896046]
We propose the ProFiT pipeline to investigate the cross-lingual capabilities of Prompt-based Finetuning.
Our results reveal the effectiveness and versatility of prompt-based finetuning in cross-lingual language understanding.
arXiv Detail & Related papers (2023-07-15T20:33:33Z)
- Analyzing the Mono- and Cross-Lingual Pretraining Dynamics of Multilingual Language Models [73.11488464916668]
This study investigates the dynamics of the multilingual pretraining process.
We probe checkpoints taken throughout XLM-R pretraining, using a suite of linguistic tasks.
Our analysis shows that the model achieves high in-language performance early on, with lower-level linguistic skills acquired before more complex ones.
arXiv Detail & Related papers (2022-05-24T03:35:00Z)
- Cross-lingual Lifelong Learning [53.06904052325966]
We present a principled Cross-lingual Continual Learning (CCL) evaluation paradigm.
We provide insights into what makes multilingual sequential learning particularly challenging.
The implications of this analysis include a recipe for how to measure and balance different cross-lingual continual learning desiderata.
arXiv Detail & Related papers (2022-05-23T09:25:43Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation (IGLUE) benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
- On Cross-Lingual Retrieval with Multilingual Text Encoders [51.60862829942932]
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
We benchmark their performance in unsupervised ad-hoc sentence- and document-level cross-lingual information retrieval (CLIR) experiments.
We also evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., learning to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z)
- XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization [128.37244072182506]
The Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark evaluates the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks.
We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models.
arXiv Detail & Related papers (2020-03-24T19:09:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.