GanitLLM: Difficulty-Aware Bengali Mathematical Reasoning through Curriculum-GRPO
- URL: http://arxiv.org/abs/2601.06767v1
- Date: Sun, 11 Jan 2026 03:49:18 GMT
- Authors: Shubhashis Roy Dipta, Khairul Mahbub, Nadia Najjar
- Abstract summary: We present a Bengali mathematical reasoning model called GanitLLM. We also present a new difficulty-aware Bengali math corpus and a curriculum-based GRPO pipeline.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a Bengali mathematical reasoning model called GanitLLM (named after the Bangla word for mathematics, "Ganit"), together with a new difficulty-aware Bengali math corpus and a curriculum-based GRPO pipeline. Bengali is one of the world's most widely spoken languages, yet existing LLMs either reason in English and then translate, or simply fail on multi-step Bengali math, in part because reinforcement learning recipes are tuned for high-resource languages and collapse under reward sparsity in low-resource settings. To address this, we construct Ganit, a rigorously filtered and decontaminated Bengali math dataset with automatic difficulty tags derived from the pass@k of a strong evaluator model. Building on this dataset, we propose Curriculum-GRPO, which combines multi-stage training (SFT + GRPO) with difficulty-aware sampling and verifiable rewards for format, numerical correctness, and Bengali reasoning. On Bn-MGSM and Bn-MSVAMP, GanitLLM-4B improves over its Qwen3-4B base by +8 and +7 accuracy points, respectively, while increasing the percentage of Bengali reasoning tokens from 14% to over 88% and reducing average solution length from 943 to 193 words.
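The abstract's difficulty tags are derived from the pass@k of a strong evaluator model. A minimal sketch of that step, using the standard unbiased pass@k estimator; the bucket thresholds and function names are illustrative assumptions, not the paper's exact recipe:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    completions sampled from n (of which c are correct) is correct."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def difficulty_tag(p: float) -> str:
    """Map an evaluator's pass@k to a coarse difficulty bucket
    (thresholds are assumed for illustration)."""
    if p >= 0.8:
        return "easy"
    if p >= 0.3:
        return "medium"
    return "hard"
```

A curriculum sampler could then draw training problems bucket by bucket, moving from "easy" toward "hard" as GRPO training progresses.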
Related papers
- BengaliFig: A Low-Resource Challenge for Figurative and Culturally Grounded Reasoning in Bengali
We present BengaliFig, a compact yet richly annotated challenge set. The dataset contains 435 unique riddles drawn from Bengali oral and literary traditions. Each item is annotated along five dimensions capturing reasoning type, trap type, cultural depth, answer category, and difficulty.
arXiv Detail & Related papers (2025-11-25T15:26:47Z) - Leveraging Large Language Models for Bengali Math Word Problem Solving with Chain of Thought Reasoning
Solving Bengali Math Word Problems (MWPs) remains a major challenge in natural language processing (NLP). No human-annotated Bengali dataset has previously addressed this task. We created SOMADHAN, a dataset of 8792 complex Bengali MWPs with manually written, step-by-step solutions.
arXiv Detail & Related papers (2025-05-27T15:47:10Z) - Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models
We present a novel and simple yet effective method called Dictionary Insertion Prompting (DIP).
When given a non-English prompt, DIP looks up a word dictionary and inserts the words' English counterparts into the prompt for LLMs.
This enables better translation into English and better English reasoning steps, which leads to markedly better results.
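The DIP lookup-and-insert step can be sketched as a small preprocessing function; the toy Bengali-to-English lexicon and the "word (gloss)" insertion format below are illustrative assumptions, not the paper's exact setup:

```python
# Toy Bengali-to-English lexicon (illustrative; a real system would load a full dictionary).
TOY_DICT = {
    "যোগ": "add",
    "বিয়োগ": "subtract",
    "সংখ্যা": "number",
}

def dictionary_insertion_prompt(prompt: str, dictionary: dict) -> str:
    """Insert an English gloss after each dictionary word, DIP-style: 'word (gloss)'."""
    out = []
    for token in prompt.split():
        gloss = dictionary.get(token)
        out.append(f"{token} ({gloss})" if gloss else token)
    return " ".join(out)
```

For example, a Bengali prompt meaning "add the two numbers" would be augmented with the English counterparts of the known words before being sent to the LLM.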
arXiv Detail & Related papers (2024-11-02T05:10:50Z) - Too Late to Train, Too Early To Use? A Study on Necessity and Viability of Low-Resource Bengali LLMs
We explore whether a dedicated Large Language Model is needed for a low-resource language, as opposed to relying on English-oriented LLMs. We compare the performance of open-weight and closed-source LLMs against fine-tuned encoder-decoder models. Our findings reveal that while LLMs generally excel in reasoning tasks, their performance in tasks requiring Bengali script generation is inconsistent.
arXiv Detail & Related papers (2024-06-29T11:50:16Z) - Lean Workbook: A large-scale Lean problem set formalized from natural language math problems
Large language models are not good at math theorem proving using formal languages like Lean. A significant challenge in this area is the scarcity of training data available in these formal languages. We propose a novel pipeline that iteratively generates and filters synthetic data to translate natural language mathematical problems into Lean 4 statements.
arXiv Detail & Related papers (2024-06-06T08:25:43Z) - Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations
This paper pioneers exploring and training powerful Multilingual Math Reasoning (xMR) LLMs.
By utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct, encompassing ten distinct languages.
arXiv Detail & Related papers (2023-10-31T08:09:20Z) - BenLLMEval: A Comprehensive Evaluation into the Potentials and Pitfalls of Large Language Models on Bengali NLP
Large Language Models (LLMs) have emerged as one of the most important breakthroughs in NLP.
This paper introduces BenLLM-Eval, a comprehensive evaluation of LLMs that benchmarks their performance in the Bengali language.
Our experimental results demonstrate that in some Bengali NLP tasks, zero-shot LLMs can achieve performance on par with, or even better than, current SOTA fine-tuned models.
arXiv Detail & Related papers (2023-09-22T20:29:34Z) - Baichuan 2: Open Large-scale Language Models
We present Baichuan 2, a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch on 2.6 trillion tokens. Baichuan 2 matches or outperforms other open-source models of similar size on public benchmarks like MMLU, CMMLU, GSM8K, and HumanEval.
arXiv Detail & Related papers (2023-09-19T04:13:22Z) - Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts
Large language models (LLMs) are known to effectively perform tasks by simply observing a few exemplars.
We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English.
Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
arXiv Detail & Related papers (2023-06-20T08:27:47Z) - Language Models are Multilingual Chain-of-Thought Reasoners
We introduce the Multilingual Grade School Math (MGSM) benchmark by manually translating 250 grade-school math problems into ten typologically diverse languages.
We find that the ability to solve MGSM problems via chain-of-thought prompting emerges with increasing model scale.
We show that the multilingual reasoning abilities of language models extend to other tasks such as commonsense reasoning and word-in-context semantic judgment.
arXiv Detail & Related papers (2022-10-06T17:03:34Z) - Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation
Despite being the seventh most widely spoken language in the world, Bengali has received much less attention in the machine translation literature due to its low-resource status.
We build a customized sentence segmenter for Bengali and propose two novel methods for parallel corpus creation on low-resource setups.
With the segmenter and the two methods combined, we compile a high-quality Bengali-English parallel corpus comprising 2.75 million sentence pairs.
arXiv Detail & Related papers (2020-09-20T06:06:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.