BNLI: A Linguistically-Refined Bengali Dataset for Natural Language Inference
- URL: http://arxiv.org/abs/2511.08813v1
- Date: Thu, 13 Nov 2025 01:09:46 GMT
- Title: BNLI: A Linguistically-Refined Bengali Dataset for Natural Language Inference
- Authors: Farah Binta Haque, Md Yasin, Shishir Saha, Md Shoaib Akhter Rafi, Farig Sadeque,
- Abstract summary: Existing Bengali NLI datasets exhibit several inconsistencies, including annotation errors, ambiguous sentence pairs, and inadequate linguistic diversity.<n>We introduce BNLI, a refined and linguistically curated Bengali NLI dataset designed to support robust language understanding and inference modeling.<n>We benchmarked BNLI using a suite of state-of-the-art transformer-based architectures, including multilingual and Bengali-specific models, to assess their ability to capture complex semantic relations.
- Score: 1.7688536690159165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the growing progress in Natural Language Inference (NLI) research, resources for the Bengali language remain extremely limited. Existing Bengali NLI datasets exhibit several inconsistencies, including annotation errors, ambiguous sentence pairs, and inadequate linguistic diversity, which hinder effective model training and evaluation. To address these limitations, we introduce BNLI, a refined and linguistically curated Bengali NLI dataset designed to support robust language understanding and inference modeling. The dataset was constructed through a rigorous annotation pipeline emphasizing semantic clarity and balance across entailment, contradiction, and neutrality classes. We benchmarked BNLI using a suite of state-of-the-art transformer-based architectures, including multilingual and Bengali-specific models, to assess their ability to capture complex semantic relations in Bengali text. The experimental findings highlight the improved reliability and interpretability achieved with BNLI, establishing it as a strong foundation for advancing research in Bengali and other low-resource language inference tasks.
Related papers
- When Words Don't Mean What They Say: Figurative Understanding in Bengali Idioms [1.5840067220859924]
Figurative language understanding remains a significant challenge for Large Language Models (LLMs)<n>We introduce a new idiom dataset, a large-scale, culturally-grounded corpus of 10,361 Bengali idioms.<n>Each idiom is annotated under a comprehensive 19-field schema, established and refined through a deliberative expert consensus process.<n>We evaluate 30 state-of-the-art multilingual and instruction-tuned LLMs on the task of inferring figurative meaning.
arXiv Detail & Related papers (2026-02-13T13:26:11Z) - BengaliFig: A Low-Resource Challenge for Figurative and Culturally Grounded Reasoning in Bengali [0.0]
We present BengaliFig, a compact yet richly annotated challenge set.<n>The dataset contains 435 unique riddles drawn from Bengali oral and literary traditions.<n>Each item is annotated along five dimensions capturing reasoning type, trap type, cultural depth, answer category, and difficulty.
arXiv Detail & Related papers (2025-11-25T15:26:47Z) - Evaluating LLMs' Multilingual Capabilities for Bengali: Benchmark Creation and Performance Analysis [0.0]
Bengali is an underrepresented language in NLP research.<n>We systematically investigate the challenges that hinder Bengali NLP performance.<n>Our findings reveal consistent performance gaps for Bengali compared to English.
arXiv Detail & Related papers (2025-07-31T05:16:43Z) - Leveraging Large Language Models for Bengali Math Word Problem Solving with Chain of Thought Reasoning [0.0]
Solving Bengali Math Word Problems (MWPs) remains a major challenge in natural language processing (NLP)<n>No human-annotated Bengali dataset has previously addressed this task.<n>We created SOMADHAN, a dataset of 8792 complex Bengali MWPs with manually written, step-by-step solutions.
arXiv Detail & Related papers (2025-05-27T15:47:10Z) - Cross-Lingual Pitfalls: Automatic Probing Cross-Lingual Weakness of Multilingual Large Language Models [55.14276067678253]
This paper introduces a novel methodology for efficiently identifying inherent cross-lingual weaknesses in Large Language Models (LLMs)<n>We construct a new dataset of over 6,000 bilingual pairs across 16 languages using this methodology, demonstrating its effectiveness in revealing weaknesses even in state-of-the-art models.<n>Further experiments investigate the relationship between linguistic similarity and cross-lingual weaknesses, revealing that linguistically related languages share similar performance patterns.
arXiv Detail & Related papers (2025-05-24T12:31:27Z) - Unraveling the Dominance of Large Language Models Over Transformer Models for Bangla Natural Language Inference: A Comprehensive Study [0.0]
Natural Language Inference (NLI) is a cornerstone of Natural Language Processing (NLP)
This study addresses the underexplored area of evaluating Large Language Models (LLMs) in low-resourced languages like Bengali.
arXiv Detail & Related papers (2024-05-05T13:57:05Z) - Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.<n>This survey delves into an important attribute of these datasets: the dialect of a language.<n>Motivated by the performance degradation of NLP models for dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z) - MoSECroT: Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer [50.40191599304911]
We introduce MoSECroT Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer.
In this paper, we present the first framework that leverages relative representations to construct a common space for the embeddings of a source language PLM and the static word embeddings of a target language.
We show that although our proposed framework is competitive with weak baselines when addressing MoSECroT, it fails to achieve competitive results compared with some strong baselines.
arXiv Detail & Related papers (2024-01-09T21:09:07Z) - NusaWrites: Constructing High-Quality Corpora for Underrepresented and
Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages.
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z) - Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language
Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z) - Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not well-represent natural language semantics.
arXiv Detail & Related papers (2022-10-14T02:35:19Z) - A Continuous Space Neural Language Model for Bengali Language [0.4799822253865053]
This paper proposes a continuous-space neural language model, or more specifically an ASGD weight dropped LSTM language model, along with techniques to efficiently train it for Bengali Language.
The proposed architecture outperforms its counterparts by achieving an inference perplexity as low as 51.2 on the held out data set for Bengali.
arXiv Detail & Related papers (2020-01-11T14:50:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.