Related papers: BengaliFig: A Low-Resource Challenge for Figurative and Culturally Grounded Reasoning in Bengali

URL: http://arxiv.org/abs/2511.20399v2
Date: Wed, 26 Nov 2025 17:08:26 GMT
Title: BengaliFig: A Low-Resource Challenge for Figurative and Culturally Grounded Reasoning in Bengali
Authors: Abdullah Al Sefat
Abstract summary: We present BengaliFig, a compact yet richly annotated challenge set. The dataset contains 435 unique riddles drawn from Bengali oral and literary traditions. Each item is annotated along five dimensions capturing reasoning type, trap type, cultural depth, answer category, and difficulty.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models excel on broad multilingual benchmarks but remain to be evaluated extensively in figurative and culturally grounded reasoning, especially in low-resource contexts. We present BengaliFig, a compact yet richly annotated challenge set that targets this gap in Bengali, a widely spoken low-resourced language. The dataset contains 435 unique riddles drawn from Bengali oral and literary traditions. Each item is annotated along five orthogonal dimensions capturing reasoning type, trap type, cultural depth, answer category, and difficulty, and is automatically converted to multiple-choice format through a constraint-aware, AI-assisted pipeline. We evaluate eight frontier LLMs from major providers under zero-shot and few-shot chain-of-thought prompting, revealing consistent weaknesses in metaphorical and culturally specific reasoning. BengaliFig thus contributes both a diagnostic probe for evaluating LLM robustness in low-resource cultural contexts and a step toward inclusive and heritage-aware NLP evaluation.
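As a rough illustration of how an item annotated along the five dimensions named in the abstract could be represented and scored per dimension, here is a minimal Python sketch. The class name `RiddleItem`, the field names, the label values, and the helper `accuracy_by_dimension` are all hypothetical; the actual BengaliFig schema and evaluation code are not given in this listing, and the sample items are placeholders, not dataset entries.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class RiddleItem:
    """Hypothetical multiple-choice item carrying five annotation dimensions."""
    riddle: str
    options: Tuple[str, ...]
    answer_index: int
    reasoning_type: str
    trap_type: str
    cultural_depth: str
    answer_category: str
    difficulty: str


def accuracy_by_dimension(
    items: Sequence[RiddleItem],
    predictions: Sequence[int],
    dimension: str,
) -> Dict[str, float]:
    """Group items by one annotation dimension and compute accuracy per label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, predicted in zip(items, predictions):
        label = getattr(item, dimension)
        total[label] += 1
        correct[label] += int(predicted == item.answer_index)
    return {label: correct[label] / total[label] for label in total}


# Placeholder items with made-up labels (assumptions, not real dataset content).
sample_items = [
    RiddleItem("placeholder riddle 1", ("a", "b", "c", "d"), 0,
               "metaphorical", "literal_distractor", "high", "object", "hard"),
    RiddleItem("placeholder riddle 2", ("a", "b", "c", "d"), 2,
               "metaphorical", "literal_distractor", "low", "animal", "medium"),
    RiddleItem("placeholder riddle 3", ("a", "b", "c", "d"), 2,
               "descriptive", "none", "low", "object", "easy"),
]
sample_predictions = [0, 1, 2]  # option indices a model might return

results = accuracy_by_dimension(sample_items, sample_predictions, "reasoning_type")
print(results)
```

Breaking accuracy down by a single dimension like this is one simple way to surface the kind of category-specific weaknesses the abstract describes; the same helper can be called with `"trap_type"`, `"cultural_depth"`, `"answer_category"`, or `"difficulty"`.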

Related papers

When Words Don't Mean What They Say: Figurative Understanding in Bengali Idioms [1.5840067220859924]
Figurative language understanding remains a significant challenge for Large Language Models (LLMs). We introduce a new idiom dataset, a large-scale, culturally-grounded corpus of 10,361 Bengali idioms. Each idiom is annotated under a comprehensive 19-field schema, established and refined through a deliberative expert consensus process. We evaluate 30 state-of-the-art multilingual and instruction-tuned LLMs on the task of inferring figurative meaning.
arXiv Detail & Related papers (2026-02-13T13:26:11Z)
BNLI: A Linguistically-Refined Bengali Dataset for Natural Language Inference [1.7688536690159165]
Existing Bengali NLI datasets exhibit several inconsistencies, including annotation errors, ambiguous sentence pairs, and inadequate linguistic diversity. We introduce BNLI, a refined and linguistically curated Bengali NLI dataset designed to support robust language understanding and inference modeling. We benchmarked BNLI using a suite of state-of-the-art transformer-based architectures, including multilingual and Bengali-specific models, to assess their ability to capture complex semantic relations.
arXiv Detail & Related papers (2025-11-11T22:29:14Z)
BengaliMoralBench: A Benchmark for Auditing Moral Reasoning in Large Language Models within Bengali Language and Culture [5.215285027585101]
Bengali is spoken by over 285 million people and ranked 6th globally. Existing ethics benchmarks are largely English-centric and shaped by Western frameworks. We introduce BengaliMoralBench, the first large-scale ethics benchmark for the Bengali language and socio-cultural contexts.
arXiv Detail & Related papers (2025-11-05T04:55:35Z)
MultiNRC: A Challenging and Native Multilingual Reasoning Evaluation Benchmark for LLMs [56.87573414161703]
We introduce the Multilingual Native Reasoning Challenge (MultiNRC), a benchmark to assess Large Language Models (LLMs). MultiNRC covers four core reasoning categories: language-specific linguistic reasoning, wordplay & riddles, cultural/tradition reasoning, and math reasoning with cultural relevance. For cultural/tradition reasoning and math reasoning with cultural relevance, we also provide English equivalent translations of the multilingual questions by manual translation from native speakers fluent in English.
arXiv Detail & Related papers (2025-07-23T12:56:31Z)
Leveraging Large Language Models for Bengali Math Word Problem Solving with Chain of Thought Reasoning [0.0]
Solving Bengali Math Word Problems (MWPs) remains a major challenge in natural language processing (NLP). No human-annotated Bengali dataset has previously addressed this task. We created SOMADHAN, a dataset of 8792 complex Bengali MWPs with manually written, step-by-step solutions.
arXiv Detail & Related papers (2025-05-27T15:47:10Z)
XIFBench: Evaluating Large Language Models on Multilingual Instruction Following [59.549015333755186]
Large Language Models (LLMs) have demonstrated remarkable instruction-following capabilities across various applications. Existing evaluations lack fine-grained constraint analysis across diverse linguistic contexts. We introduce XIFBench, a comprehensive benchmark for evaluating multilingual instruction-following abilities of LLMs.
arXiv Detail & Related papers (2025-03-10T17:07:52Z)
CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models [53.9835961434552]
We introduce the Chinese Instruction-Following Benchmark (CIF-Bench) to evaluate the generalizability of large language models (LLMs) to the Chinese language. CIF-Bench comprises 150 tasks and 15,000 input-output pairs, developed by native speakers to test complex reasoning and Chinese cultural nuances. To mitigate data contamination, we release only half of the dataset publicly, with the remainder kept private, and introduce diversified instructions to minimize score variance.
arXiv Detail & Related papers (2024-02-20T16:02:12Z)
BenLLMEval: A Comprehensive Evaluation into the Potentials and Pitfalls of Large Language Models on Bengali NLP [17.362068473064717]
Large Language Models (LLMs) have emerged as one of the most important breakthroughs in NLP. This paper introduces BenLLM-Eval, which consists of a comprehensive evaluation of LLMs to benchmark their performance in the Bengali language. Our experimental results demonstrate that in some Bengali NLP tasks, zero-shot LLMs could achieve performance on par with, or even better than, current SOTA fine-tuned models.
arXiv Detail & Related papers (2023-09-22T20:29:34Z)
NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages. We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets. Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z)
Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars. We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English. Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
arXiv Detail & Related papers (2023-06-20T08:27:47Z)
AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context. It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts. Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.