Bridging the Culture Gap: A Framework for LLM-Driven Socio-Cultural Localization of Math Word Problems in Low-Resource Languages
- URL: http://arxiv.org/abs/2508.14913v3
- Date: Tue, 07 Oct 2025 09:29:49 GMT
- Title: Bridging the Culture Gap: A Framework for LLM-Driven Socio-Cultural Localization of Math Word Problems in Low-Resource Languages
- Authors: Israel Abebe Azime, Tadesse Destaw Belay, Dietrich Klakow, Philipp Slusallek, Anshuman Chhabra
- Abstract summary: We introduce a framework for cultural localization of math word problems in languages other than English. We find that translated benchmarks can obscure true multilingual math ability under appropriate socio-cultural contexts. Our framework can help mitigate English-centric entity bias and improve robustness when native entities are introduced across various languages.
- Score: 32.87800105020907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have demonstrated significant capabilities in solving mathematical problems expressed in natural language. However, multilingual and culturally-grounded mathematical reasoning in low-resource languages lags behind English due to the scarcity of socio-cultural task datasets that reflect accurate native entities such as person names, organization names, and currencies. Existing multilingual benchmarks are predominantly produced via translation and typically retain English-centric entities, owing to the high cost of human annotator-based localization. Moreover, automated localization tools are limited, and hence truly localized datasets remain scarce. To bridge this gap, we introduce a framework for LLM-driven cultural localization of math word problems that automatically constructs datasets with native names, organizations, and currencies from existing sources. We find that translated benchmarks can obscure true multilingual math ability under appropriate socio-cultural contexts. Through extensive experiments, we also show that our framework helps mitigate English-centric entity bias and improves robustness when native entities are introduced across various languages.
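The core idea of entity localization can be illustrated with a minimal sketch. The names, locale, and entity tables below are hypothetical examples, not taken from the paper; in the actual framework an LLM proposes culturally grounded substitutions per language, whereas this toy version only performs deterministic dictionary replacement while checking that the numeric content (and hence the math) is preserved:

```python
import re

# Illustrative entity tables for an Amharic-speaking locale (assumed values,
# not from the paper); the framework would have an LLM generate these.
ENTITY_MAP = {
    "person": {"John": "Abebe", "Mary": "Almaz"},
    "currency": {"dollars": "birr"},
}

def localize(problem: str, entity_map: dict) -> str:
    """Swap English-centric entities for locale-appropriate ones,
    leaving the numbers untouched."""
    for table in entity_map.values():
        for src, tgt in table.items():
            problem = re.sub(rf"\b{re.escape(src)}\b", tgt, problem)
    return problem

def numbers(text: str) -> list:
    """Extract all numeric tokens, used to verify the math is unchanged."""
    return re.findall(r"\d+(?:\.\d+)?", text)

original = "John gave Mary 5 dollars and kept 3 dollars."
localized = localize(original, ENTITY_MAP)
assert numbers(original) == numbers(localized)  # mathematical content preserved
print(localized)  # Abebe gave Almaz 5 birr and kept 3 birr.
```

The invariant check on the extracted numbers mirrors the paper's motivation: localization should change socio-cultural surface forms without altering the underlying math problem.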
Related papers
- CLM-Bench: Benchmarking and Analyzing Cross-lingual Misalignment of LLMs in Knowledge Editing [5.137059606366328]
We propose CLM-Bench, a culture-aware benchmark constructed using a native Chinese-first methodology. We conduct extensive experiments on representative LLMs and reveal a significant Cross-lingual Misalignment. Our findings challenge the effectiveness of current methods in cross-lingual transfer and underscore the importance of culturally native benchmarks.
arXiv Detail & Related papers (2026-01-24T09:55:34Z) - Breaking Physical and Linguistic Borders: Multilingual Federated Prompt Tuning for Low-Resource Languages [27.63253872229416]
We propose a Federated Prompt Tuning Paradigm for multilingual scenarios. Our approach achieves 6.9% higher accuracy with improved data efficiency. These findings underscore the potential of our approach to promote social equality and champion linguistic diversity.
arXiv Detail & Related papers (2025-07-02T05:23:20Z) - Natural language processing for African languages [7.884789325654572]
This dissertation focuses on languages spoken in Sub-Saharan Africa, where all the indigenous languages can be regarded as low-resourced. We show that the quality of semantic representations learned in word embeddings depends not only on the amount of data but on the quality of pre-training data. We develop large-scale human-annotated labelled datasets for 21 African languages in two impactful NLP tasks.
arXiv Detail & Related papers (2025-06-30T22:26:36Z) - NativQA Framework: Enabling LLMs with Native, Local, and Everyday Knowledge [11.430887334254422]
We propose the NativQA framework, which can seamlessly construct large-scale, culturally and regionally aligned QA datasets in native languages. The framework has been evaluated across 39 locations in 24 countries and in 7 languages, resulting in over 300K Question-Answer pairs.
arXiv Detail & Related papers (2025-04-08T13:01:51Z) - INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge [36.234295907476515]
The development of functional large language models (LLMs) is bottlenecked by the lack of high-quality evaluation resources in languages other than English. In this work, we construct an evaluation suite of 197,243 QA pairs from local exam sources to measure the capabilities of multilingual LLMs in a variety of regional contexts.
arXiv Detail & Related papers (2024-11-29T16:03:14Z) - Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
We propose Lens, a novel approach to enhance multilingual capabilities in large language models (LLMs). Lens operates on two subspaces: the language-agnostic subspace, where it aligns target languages with the central language to inherit strong semantic representations, and the language-specific subspace, where it separates target and central languages to preserve linguistic specificity. Lens significantly improves multilingual performance while maintaining the model's English proficiency, achieving better results with less computational cost compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z) - Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, i.e., be crosslingual? This study evaluates state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z) - NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages.
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z) - Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars.
We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English.
Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
arXiv Detail & Related papers (2023-06-20T08:27:47Z) - CLSE: Corpus of Linguistically Significant Entities [58.29901964387952]
We release a Corpus of Linguistically Significant Entities (CLSE) annotated by experts.
CLSE covers 74 different semantic types to support various applications from airline ticketing to video games.
We create a linguistically representative NLG evaluation benchmark in three languages: French, Marathi, and Russian.
arXiv Detail & Related papers (2022-11-04T12:56:12Z) - AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.