XRAG: Cross-lingual Retrieval-Augmented Generation
- URL: http://arxiv.org/abs/2505.10089v1
- Date: Thu, 15 May 2025 08:47:55 GMT
- Title: XRAG: Cross-lingual Retrieval-Augmented Generation
- Authors: Wei Liu, Sony Trenous, Leonardo F. R. Ribeiro, Bill Byrne, Felix Hieber,
- Abstract summary: XRAG is designed to evaluate the generation abilities of LLMs in cross-lingual Retrieval-Augmented Generation settings.<n>XRAG is constructed from recent news articles to ensure that its questions require external knowledge to be answered.
- Score: 21.548347969135254
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose XRAG, a novel benchmark designed to evaluate the generation abilities of LLMs in cross-lingual Retrieval-Augmented Generation (RAG) settings where the user language does not match the retrieval results. XRAG is constructed from recent news articles to ensure that its questions require external knowledge to be answered. It covers the real-world scenarios of monolingual and multilingual retrieval, and provides relevancy annotations for each retrieved document. Our novel dataset construction pipeline results in questions that require complex reasoning, as evidenced by the significant gap between human and LLM performance. Consequently, XRAG serves as a valuable benchmark for studying LLM reasoning abilities, even before considering the additional cross-lingual complexity. Experimental results on five LLMs uncover two previously unreported challenges in cross-lingual RAG: 1) in the monolingual retrieval setting, all evaluated models struggle with response language correctness; 2) in the multilingual retrieval setting, the main challenge lies in reasoning over retrieved information across languages rather than generation of non-English text.
Related papers
- Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate [36.641755706551336]
Large language models (LLMs) provide detailed and impressive responses to queries in English.<n>But are they really consistent at responding to the same query in other languages?<n>We propose a framework to evaluate LLM's cross-lingual consistency based on a simple Translate then Evaluate strategy.
arXiv Detail & Related papers (2025-05-28T06:00:21Z) - Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering [73.73820209993515]
We introduce KoLasSimpleQA, the first benchmark evaluating the multilingual factual ability of Large Language Models (LLMs)<n>Inspired by existing research, we created the question set with features such as single knowledge point coverage, absolute objectivity, unique answers, and temporal stability.<n>Results show significant performance differences between the two domains.
arXiv Detail & Related papers (2025-05-22T12:27:02Z) - Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Task [73.35882908048423]
Retrieval-augmented generation (RAG) has become a cornerstone of contemporary NLP.<n>This paper investigates the effectiveness of RAG across multiple languages by proposing novel approaches for multilingual open-domain question-answering.
arXiv Detail & Related papers (2025-04-04T17:35:43Z) - On the Consistency of Multilingual Context Utilization in Retrieval-Augmented Generation [7.478369203246005]
Retrieval-augmented generation (RAG) with large language models (LLMs) has demonstrated strong performance in multilingual question-answering tasks.<n>In multilingual RAG, retrieved passages can be written in languages other than that of the query entered by the user.
arXiv Detail & Related papers (2025-04-01T09:55:23Z) - Investigating Language Preference of Multilingual RAG Systems [4.438698005789677]
MRAG systems struggle with retrieving relevant information due to linguistic variations between queries and documents.<n>We propose Dual Knowledge Multilingual RAG (DKM-RAG), a framework that fuses translated multilingual passages with complementary model knowledge.<n> Empirical results demonstrate that DKM-RAG mitigates language preference in generation and enhances performance across diverse linguistic settings.
arXiv Detail & Related papers (2025-02-16T15:54:05Z) - Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization ( CLS) aims to generate a summary for the source text in a different target language.<n>Currently, instruction-tuned large language models (LLMs) excel at various English tasks.<n>Recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even with few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z) - Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models [22.859955360764275]
We introduce the MultiLingual Needle-in-a-Haystack (MLNeedle) test to assess a model's ability to retrieve relevant information.
We evaluate four state-of-the-art large language models on MLNeedle.
arXiv Detail & Related papers (2024-08-19T17:02:06Z) - Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.<n>But can these models relate corresponding concepts across languages, i.e., be crosslingual?<n>This study evaluates state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z) - MELA: Multilingual Evaluation of Linguistic Acceptability [7.524375463656369]
We present the largest benchmark to date on linguistic acceptability: Multilingual Evaluation of Linguistic Acceptability -- MELA, with 46K samples covering 10 languages.
In pursuit of multilingual interpretability, we conduct probing experiments with fine-tuned XLM-R.
Cross-lingual transfer experiments show that transfer in acceptability judgment is non-trivial.
arXiv Detail & Related papers (2023-11-15T15:25:28Z) - XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z) - XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training,
Understanding and Generation [100.09099800591822]
XGLUE is a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models.
XGLUE provides 11 diversified tasks that cover both natural language understanding and generation scenarios.
arXiv Detail & Related papers (2020-04-03T07:03:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.