Discrepancy Detection at the Data Level: Toward Consistent Multilingual Question Answering
- URL: http://arxiv.org/abs/2510.11928v1
- Date: Mon, 13 Oct 2025 20:48:26 GMT
- Title: Discrepancy Detection at the Data Level: Toward Consistent Multilingual Question Answering
- Authors: Lorena Calvo-Bartolomé, Valérie Aldana, Karla Cantarero, Alonso Madroñal de Mesa, Jerónimo Arenas-García, Jordan Boyd-Graber
- Abstract summary: We propose MIND, a user-in-the-loop fact-checking pipeline to detect factual and cultural discrepancies in multilingual QA knowledge bases. MIND highlights divergent answers to culturally sensitive questions that vary by region and context. We evaluate MIND on a bilingual QA system in the maternal and infant health domain and release a dataset of bilingual questions annotated for factual and cultural inconsistencies.
- Score: 4.724569611822116
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multilingual question answering (QA) systems must ensure factual consistency across languages, especially for objective queries such as "What is jaundice?", while also accounting for cultural variation in subjective responses. We propose MIND, a user-in-the-loop fact-checking pipeline to detect factual and cultural discrepancies in multilingual QA knowledge bases. MIND highlights divergent answers to culturally sensitive questions (e.g., "Who assists in childbirth?") that vary by region and context. We evaluate MIND on a bilingual QA system in the maternal and infant health domain and release a dataset of bilingual questions annotated for factual and cultural inconsistencies. We further test MIND on datasets from other domains to assess generalization. In all cases, MIND reliably identifies inconsistencies, supporting the development of more culturally aware and factually consistent QA systems.
Related papers
- Language over Content: Tracing Cultural Understanding in Multilingual Large Language Models [10.798925500517823]
Results show that internal paths overlap more for same-language, cross-country questions than for cross-language, same-country questions.
arXiv Detail & Related papers (2025-10-18T16:19:45Z) - XLQA: A Benchmark for Locale-Aware Multilingual Open-Domain Question Answering [48.913480244527925]
Large Language Models (LLMs) have shown significant progress in open-domain question answering (ODQA). Most evaluations focus on English and assume locale-invariant answers across languages. We introduce XLQA, a novel benchmark explicitly designed for locale-sensitive multilingual ODQA.
arXiv Detail & Related papers (2025-08-22T07:00:13Z) - Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation [71.59208664920452]
Cultural biases in multilingual datasets pose significant challenges for their effectiveness as global benchmarks. We show that progress on MMLU depends heavily on learning Western-centric concepts, with 28% of all questions requiring culturally sensitive knowledge. We release Global MMLU, an improved MMLU with evaluation coverage across 42 languages.
arXiv Detail & Related papers (2024-12-04T13:27:09Z) - CaLMQA: Exploring culturally specific long-form question answering across 23 languages [58.18984409715615]
CaLMQA is a dataset of 51.7K culturally specific questions across 23 different languages. We evaluate factuality, relevance and surface-level quality of LLM-generated long-form answers.
arXiv Detail & Related papers (2024-06-25T17:45:26Z) - CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark [68.21939124278065]
CVQA is a culturally diverse multilingual Visual Question Answering benchmark designed to cover a rich set of languages and cultures.
CVQA includes culturally-driven images and questions from across 30 countries on four continents, covering 31 languages with 13 scripts, providing a total of 10k questions.
We benchmark several Multimodal Large Language Models (MLLMs) on CVQA, and show that the dataset is challenging for the current state-of-the-art models.
arXiv Detail & Related papers (2024-06-10T01:59:00Z) - Evaluating and Modeling Attribution for Cross-Lingual Question Answering [80.4807682093432]
This work is the first to study attribution for cross-lingual question answering.
We collect data in 5 languages to assess the attribution level of a state-of-the-art cross-lingual QA system.
We find that a substantial portion of the answers is not attributable to any retrieved passages.
arXiv Detail & Related papers (2022-05-25T02:58:54Z) - Investigating Information Inconsistency in Multilingual Open-Domain Question Answering [18.23417521199809]
We analyze the behavior of multilingual open-domain question answering models with a focus on retrieval bias.
We speculate that the content differences in documents across languages might reflect cultural divergences and/or social biases.
arXiv Detail & Related papers (2022-05-25T02:58:54Z) - Delving Deeper into Cross-lingual Visual Question Answering [115.16614806717341]
We show that simple modifications to the standard training setup can substantially reduce the transfer gap to monolingual English performance.
We analyze cross-lingual VQA across different question types of varying complexity for different multilingual multimodal Transformers.
arXiv Detail & Related papers (2022-02-15T18:22:18Z)