Multilingual Retrieval Augmented Generation for Culturally-Sensitive Tasks: A Benchmark for Cross-lingual Robustness
- URL: http://arxiv.org/abs/2410.01171v3
- Date: Sun, 22 Jun 2025 21:14:14 GMT
- Title: Multilingual Retrieval Augmented Generation for Culturally-Sensitive Tasks: A Benchmark for Cross-lingual Robustness
- Authors: Bryan Li, Fiona Luo, Samar Haider, Adwait Agashe, Tammy Li, Runqi Liu, Muqing Miao, Shriya Ramakrishnan, Yuan Yuan, Chris Callison-Burch
- Abstract summary: We introduce BordIRLines, a dataset of territorial disputes paired with retrieved Wikipedia documents, across 49 languages. We evaluate the cross-lingual robustness of this RAG setting by formalizing several modes for multilingual retrieval. Our experiments show that incorporating perspectives from diverse languages can in fact improve robustness.
- Score: 30.00463676754559
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The paradigm of retrieval-augmented generation (RAG) helps mitigate hallucinations of large language models (LLMs). However, RAG also introduces biases contained within the retrieved documents. These biases can be amplified in scenarios that are multilingual and culturally-sensitive, such as territorial disputes. We thus introduce BordIRLines, a dataset of territorial disputes paired with retrieved Wikipedia documents, across 49 languages. We evaluate the cross-lingual robustness of this RAG setting by formalizing several modes for multilingual retrieval. Our experiments on several LLMs show that incorporating perspectives from diverse languages can in fact improve robustness; retrieving multilingual documents best improves response consistency and decreases geopolitical bias over RAG with purely in-language documents. We also consider how RAG responses utilize presented documents, finding a much wider variance in the linguistic distribution of response citations when querying in low-resource languages. Our further analyses investigate the various aspects of a cross-lingual RAG pipeline, from retrieval to document contents. We release our benchmark and code to support continued research towards equitable information access across languages at https://huggingface.co/datasets/borderlines/bordirlines.
Related papers
- The Cross-Lingual Cost: Retrieval Biases in RAG over Arabic-English Corpora [6.594531626178451]
Cross-lingual retrieval-augmented generation (RAG) is a critical capability for retrieving and generating answers across languages. We study Arabic-English RAG in a domain-specific setting using benchmarks derived from real-world corporate datasets. We propose a simple retrieval strategy that addresses this source of failure by enforcing equal retrieval from both languages.
arXiv Detail & Related papers (2025-07-10T08:38:31Z) - Multilingual Retrieval-Augmented Generation for Knowledge-Intensive Task [73.35882908048423]
Retrieval-augmented generation (RAG) has become a cornerstone of contemporary NLP.
This paper investigates the effectiveness of RAG across multiple languages by proposing novel approaches for multilingual open-domain question-answering.
arXiv Detail & Related papers (2025-04-04T17:35:43Z) - Investigating Language Preference of Multilingual RAG Systems [4.438698005789677]
MRAG systems struggle with retrieving relevant information due to linguistic variations between queries and documents. We propose Dual Knowledge Multilingual RAG (DKM-RAG), a framework that fuses translated multilingual passages with complementary model knowledge. Empirical results demonstrate that DKM-RAG mitigates language preference in generation and enhances performance across diverse linguistic settings.
arXiv Detail & Related papers (2025-02-16T15:54:05Z) - Not All Languages are Equal: Insights into Multilingual Retrieval-Augmented Generation [38.631934251052485]
We evaluate six multilingual RALMs using our benchmark to explore the challenges of multilingual RALMs.
High-resource languages stand out in Monolingual Knowledge Extraction.
Indo-European languages lead RALMs to provide answers directly from documents.
English benefits from RALMs' selection bias and speaks louder in multilingual knowledge selection.
arXiv Detail & Related papers (2024-10-29T11:53:19Z) - Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, i.e., be crosslingual?
This study evaluates state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z) - Integrating Large Language Models with Graph-based Reasoning for Conversational Question Answering [58.17090503446995]
We focus on a conversational question answering task which combines the challenges of understanding questions in context and reasoning over evidence gathered from heterogeneous sources like text, knowledge graphs, tables, and infoboxes.
Our method utilizes a graph structured representation to aggregate information about a question and its context.
arXiv Detail & Related papers (2024-06-14T13:28:03Z) - XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples [64.79218405438871]
We introduce XAMPLER: Cross-Lingual Example Retrieval, a method tailored to tackle the challenge of cross-lingual in-context learning. XAMPLER first trains a retriever based on Glot500, a multilingual small language model. It can directly retrieve English examples as few-shot examples for in-context learning of target languages.
arXiv Detail & Related papers (2024-05-08T15:13:33Z) - What Evidence Do Language Models Find Convincing? [94.90663008214918]
We build a dataset that pairs controversial queries with a series of real-world evidence documents that contain different facts.
We use this dataset to perform sensitivity and counterfactual analyses to explore which text features most affect LLM predictions.
Overall, we find that current models rely heavily on the relevance of a website to the query, while largely ignoring stylistic features that humans find important.
arXiv Detail & Related papers (2024-02-19T02:15:34Z) - Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval [62.82448161570428]
This dataset is designed to investigate fairness in a multilingual information retrieval context.
It boasts an authentic multilingual corpus, featuring topics translated into all 24 languages.
It offers rich demographic information associated with its documents, facilitating the study of demographic bias.
arXiv Detail & Related papers (2023-11-03T12:29:11Z) - DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z) - Breaking Language Barriers with MMTweets: Advancing Cross-Lingual Debunked Narrative Retrieval for Fact-Checking [5.880794128275313]
Cross-lingual debunked narrative retrieval is an understudied problem.
This study introduces cross-lingual debunked narrative retrieval and addresses this research gap by: (i) creating Multilingual Misinformation Tweets (MMTweets)
MMTweets features cross-lingual pairs, images, human annotations, and fine-grained labels, making it a comprehensive resource compared to its counterparts.
We find that MMTweets presents challenges for cross-lingual debunked narrative retrieval, highlighting areas for improvement in retrieval models.
arXiv Detail & Related papers (2023-08-10T16:33:17Z) - Evaluating and Modeling Attribution for Cross-Lingual Question Answering [80.4807682093432]
This work is the first to study attribution for cross-lingual question answering.
We collect data in 5 languages to assess the attribution level of a state-of-the-art cross-lingual QA system.
We find that a substantial portion of the answers is not attributable to any retrieved passages.
arXiv Detail & Related papers (2023-05-23T17:57:46Z) - Soft Prompt Decoding for Multilingual Dense Retrieval [30.766917713997355]
We show that applying state-of-the-art approaches developed for cross-lingual information retrieval to MLIR tasks leads to sub-optimal performance.
This is due to the heterogeneous and imbalanced nature of multilingual collections.
We present KD-SPD, a novel soft prompt decoding approach for MLIR that implicitly "translates" the representation of documents in different languages into the same embedding space.
arXiv Detail & Related papers (2023-05-15T21:17:17Z) - PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale [53.92008514395125]
PAXQA (Projecting annotations for cross-lingual (x) QA) decomposes cross-lingual QA into two stages.
We propose a novel use of lexically-constrained machine translation, in which constrained entities are extracted from the parallel bitexts.
We show that models fine-tuned on these datasets outperform prior synthetic data generation models over several extractive QA datasets.
arXiv Detail & Related papers (2023-04-24T15:46:26Z) - ZusammenQA: Data Augmentation with Specialized Models for Cross-lingual Open-retrieval Question Answering System [16.89747171947662]
This paper introduces our proposed system for the MIA Shared Task on Cross-lingual Open-retrieval Question Answering (COQA).
In this challenging scenario, given an input question the system has to gather evidence documents from a multilingual pool and generate an answer in the language of the question.
We devised several approaches combining different model variants for three main components: Data Augmentation, Passage Retrieval, and Answer Generation.
arXiv Detail & Related papers (2022-05-30T10:31:08Z) - Cross-Lingual Phrase Retrieval [49.919180978902915]
Cross-lingual retrieval aims to retrieve relevant text across languages.
Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations in word or sentence level.
We propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences.
arXiv Detail & Related papers (2022-04-19T13:35:50Z) - Models and Datasets for Cross-Lingual Summarisation [78.56238251185214]
We present a cross-lingual summarisation corpus with long documents in a source language associated with multi-sentence summaries in a target language.
The corpus covers twelve language pairs and directions for four European languages, namely Czech, English, French and German.
We derive cross-lingual document-summary instances from Wikipedia by combining lead paragraphs and articles' bodies from language aligned Wikipedia titles.
arXiv Detail & Related papers (2022-02-19T11:55:40Z) - A Survey on non-English Question Answering Dataset [0.0]
The aim of this survey is to recognize, summarize and analyze the existing datasets that have been released by many researchers.
In this paper, we review question answering datasets that are available in common languages other than English such as French, German, Japanese, Chinese, Arabic, Russian, as well as the multilingual and cross-lingual question-answering datasets.
arXiv Detail & Related papers (2021-12-27T12:45:06Z) - On Cross-Lingual Retrieval with Multilingual Text Encoders [51.60862829942932]
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments.
We evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., we learn to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z) - Ground-Truth, Whose Truth? -- Examining the Challenges with Annotating Toxic Text Datasets [26.486492641924226]
This study examines selected toxic text datasets with the goal of shedding light on some of the inherent issues.
We re-annotate samples from three toxic text datasets and find that a multi-label approach to annotating toxic text samples can help to improve dataset quality.
arXiv Detail & Related papers (2021-12-07T06:58:22Z) - Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval [51.004601358498135]
Mr. TyDi is a benchmark dataset for mono-lingual retrieval in eleven typologically diverse languages.
The goal of this resource is to spur research in dense retrieval techniques in non-English languages.
arXiv Detail & Related papers (2021-08-19T16:53:43Z) - One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval [39.061900747689094]
CORA is a Cross-lingual Open-Retrieval Answer Generation model.
It can answer questions across many languages even when language-specific annotated data or knowledge sources are unavailable.
arXiv Detail & Related papers (2021-07-26T06:02:54Z) - XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z) - LAReQA: Language-agnostic answer retrieval from a multilingual pool [29.553907688813347]
LAReQA tests for "strong" cross-lingual alignment.
We find that augmenting training data via machine translation is effective.
This finding underscores our claim that language-agnostic retrieval is a substantively new kind of cross-lingual evaluation.
arXiv Detail & Related papers (2020-04-11T20:51:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.