Related papers: BanglaForge: LLM Collaboration with Self-Refinement for Bangla Code Generation

URL: http://arxiv.org/abs/2512.19122v1
Date: Mon, 22 Dec 2025 07:53:16 GMT
Title: BanglaForge: LLM Collaboration with Self-Refinement for Bangla Code Generation
Authors: Mahir Labib Dihan, Sadif Ahmed, Md Nafiu Rahman,
Abstract summary: We introduce BanglaForge, a novel framework for generating code from Bangla function descriptions. On the BLP-2025 Bangla Code Generation benchmark, BanglaForge achieves a competitive Pass@1 accuracy of 84.00%.
Score: 0.2761313371455893
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Bangla is a low-resource language for code generation, lacking large-scale annotated datasets and tools to transform natural language specifications into executable programs. This makes Bangla-to-code generation a challenging task requiring innovative solutions. To address this, we introduce BanglaForge, a novel framework for generating code from Bangla function descriptions. BanglaForge leverages a retrieval-augmented dual-model collaboration paradigm with self-refinement, combining in-context learning, LLM-based translation, systematic prompt engineering, and iterative self-refinement based on execution feedback, where a coder generates initial solutions and a reviewer enhances them for robustness. On the BLP-2025 Bangla Code Generation benchmark, BanglaForge achieves a competitive Pass@1 accuracy of 84.00%, demonstrating the effectiveness of retrieval, model collaboration, and self-refinement for low-resource Bangla code generation.
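The coder/reviewer loop with execution feedback described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `run_tests`, `self_refine`, `stub_coder`, and `stub_reviewer` are hypothetical, the two "models" are plain Python stand-ins for LLM calls, the round limit is an arbitrary assumption, and the retrieval and translation stages are omitted.

```python
from typing import Callable, List, Optional, Tuple


def run_tests(code: str, tests: List[str]) -> Optional[str]:
    """Run candidate code against assert-style tests.

    Returns None on success, or an error string used as feedback.
    NOTE: plain exec() is unsafe for untrusted model output; a real
    system would run this inside a sandbox.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)        # define the candidate function
        for test in tests:
            exec(test, namespace)    # each test is an assert statement
    except Exception as exc:         # any failure becomes feedback text
        return f"{type(exc).__name__}: {exc}"
    return None


def self_refine(
    description: str,
    tests: List[str],
    coder: Callable[[str], str],
    reviewer: Callable[[str, str, str], str],
    max_rounds: int = 3,             # assumed limit, not from the paper
) -> Tuple[str, bool]:
    """Coder proposes a solution; reviewer revises it using execution errors."""
    code = coder(description)
    for _ in range(max_rounds):
        error = run_tests(code, tests)
        if error is None:
            return code, True
        code = reviewer(description, code, error)
    return code, run_tests(code, tests) is None


# Hypothetical stand-ins for the two LLMs.
def stub_coder(description: str) -> str:
    return "def add(a, b):\n    return a - b\n"   # deliberately buggy


def stub_reviewer(description: str, code: str, error: str) -> str:
    return "def add(a, b):\n    return a + b\n"   # corrected version
```

Calling `self_refine("Add two numbers", ["assert add(2, 3) == 5"], stub_coder, stub_reviewer)` runs the buggy draft, passes the assertion failure to the reviewer, and returns the revised code once the test passes.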

Related papers

PyBangla at BLP-2025 Task 2: Enhancing Bangla-to-Python Code Generation with Iterative Self-Correction and Multilingual Agents [0.5735035463793009]
We introduce BanglaCodeAct, an agent-based framework for code generation in Bangla-to-Python. BanglaCodeAct employs an open-source multilingual LLM within a Thought-Code-Observation loop, enabling dynamic generation, testing, and refinement of code from Bangla instructions. Our results establish a new benchmark for Bangla-to-Python translation and highlight the potential of agent-based reasoning for reliable code generation in low-resource languages.
arXiv Detail & Related papers (2025-11-27T07:09:47Z)
Retriv at BLP-2025 Task 2: Test-Driven Feedback-Guided Framework for Bangla-to-Python Code Generation [7.459430148112738]
We propose a method that combines instruction prompting with a test-driven, feedback-guided iterative refinement process. The model generates code from Bangla instructions, tests it against unit tests, and iteratively refines any failing outputs through three evaluation passes. This approach helped our team "Retriv" to secure 2nd place in the shared task with a Pass@1 score of 0.934.
arXiv Detail & Related papers (2025-11-10T18:41:44Z)
TigerCoder: A Novel Suite of LLMs for Code Generation in Bangla [37.210208249613]
Despite being the 5th most spoken language, Bangla remains underrepresented in Large Language Models (LLMs). This primarily stems from the scarcity of high-quality data to pre-train and/or finetune such models. We offer three major contributions: (1) comprehensive Bangla code instruction datasets for programming domain adaptation; (2) MBPP-Bangla, an evaluation benchmark for Bangla code generation; and (3) the TigerCoder-family of Code LLMs, achieving significant 11-18% performance gains at Pass@1 over existing multilingual and general-purpose Bangla LLMs.
arXiv Detail & Related papers (2025-09-11T02:25:49Z)
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers [103.4410890572479]
We introduce the Loong Project: an open-source framework for scalable synthetic data generation and verification. LoongBench is a curated seed dataset containing 8,729 human-vetted examples across 12 domains. LoongEnv is a modular synthetic data generation environment that supports multiple prompting strategies to produce new question-answer-code triples.
arXiv Detail & Related papers (2025-09-03T06:42:40Z)
Dream-Coder 7B: An Open Diffusion Language Model for Code [99.14959222355988]
We present Dream-Coder 7B, an open-source discrete diffusion language model for code generation that exhibits emergent any-order generation capabilities. Unlike traditional autoregressive (AR) models that decode strictly left-to-right, Dream-Coder 7B adaptively determines its decoding strategy based on the coding task.
arXiv Detail & Related papers (2025-09-01T05:30:56Z)
CodeRAG-Bench: Can Retrieval Augment Code Generation? [78.37076502395699]
We conduct a systematic, large-scale analysis of code generation using retrieval-augmented generation. We first curate a comprehensive evaluation benchmark, CodeRAG-Bench, encompassing three categories of code generation tasks. We examine top-performing models on CodeRAG-Bench by providing contexts retrieved from one or multiple sources.
arXiv Detail & Related papers (2024-06-20T16:59:52Z)
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components. CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks. FGO only optimizes the model by masking the unexecuted code segments to provide Fine-Grained Optimization. Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
Self-Infilling Code Generation [60.12883980846781]
We introduce self-infilling code generation, a general framework that incorporates infilling operations into auto-regressive decoding. We utilize this capability to introduce novel interruption and looping mechanisms in conventional decoding. Our proposed decoding process is effective in enhancing both regularity and quality across several code generation benchmarks.
arXiv Detail & Related papers (2023-11-29T16:02:06Z)
Coder Reviewer Reranking for Code Generation [56.80381384717]
We propose Coder-Reviewer reranking as a method for sampling diverse programs from a code language model and reranking with model likelihood. Experimental results show that Coder-Reviewer reranking leads to consistent and significant improvement over reranking with the Coder model only. Coder-Reviewer reranking is easy to implement by prompting, can generalize to different programming languages, and works well with off-the-shelf hyperparameters.
arXiv Detail & Related papers (2022-11-29T18:56:33Z)
BanglaParaphrase: A High-Quality Bangla Paraphrase Dataset [3.922582192616519]
We present BanglaParaphrase, a high-quality synthetic Bangla Paraphrase dataset curated by a novel filtering pipeline. We aim to take a step towards alleviating the low resource status of the Bangla language in the NLP domain through the introduction of BanglaParaphrase.
arXiv Detail & Related papers (2022-10-11T02:52:31Z)
BanglaNLG: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla [21.47743471497797]
This work presents a benchmark for evaluating natural language generation models in Bangla. We aggregate three challenging conditional text generation tasks under the BanglaNLG benchmark. Using a clean corpus of 27.5 GB of Bangla data, we pretrain BanglaT5, a sequence-to-sequence Transformer model for Bangla. BanglaT5 achieves state-of-the-art performance in all of these tasks, outperforming mT5 (base) by up to 5.4%.
arXiv Detail & Related papers (2022-05-23T06:54:56Z)
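
Several entries above report Pass@1. As a rough illustration of how that metric is commonly computed, the sketch below implements the widely used unbiased pass@k estimator (n samples per problem, c of them correct) and the single-sample special case. The function names are hypothetical, and the listing does not state which exact evaluation procedure each paper used.

```python
from math import comb
from typing import List


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated, c: number that pass all tests,
    k: evaluation budget. Computes 1 - C(n - c, k) / C(n, k).
    """
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(solved: List[bool]) -> float:
    """With one sample per problem, Pass@1 is the fraction of problems solved."""
    return sum(solved) / len(solved)
```

For example, a benchmark run that solves three of four problems with one attempt each gives `benchmark_pass_at_1([True, True, False, True])`, which evaluates to 0.75.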

This list is automatically generated from the titles and abstracts of the papers on this site.