PyBangla at BLP-2025 Task 2: Enhancing Bangla-to-Python Code Generation with Iterative Self-Correction and Multilingual Agents
- URL: http://arxiv.org/abs/2512.23713v1
- Date: Thu, 27 Nov 2025 07:09:47 GMT
- Title: PyBangla at BLP-2025 Task 2: Enhancing Bangla-to-Python Code Generation with Iterative Self-Correction and Multilingual Agents
- Authors: Jahidul Islam, Md Ataullha, Saiful Azad
- Abstract summary: We introduce BanglaCodeAct, an agent-based framework for Bangla-to-Python code generation. BanglaCodeAct employs an open-source multilingual LLM within a Thought-Code-Observation loop, enabling dynamic generation, testing, and refinement of code from Bangla instructions. Our results establish a new benchmark for Bangla-to-Python translation and highlight the potential of agent-based reasoning for reliable code generation in low-resource languages.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: LLMs excel at code generation from English prompts, but this progress has not extended to low-resource languages. We address Bangla-to-Python code generation by introducing BanglaCodeAct, an agent-based framework that leverages multi-agent prompting and iterative self-correction. Unlike prior approaches relying on task-specific fine-tuning, BanglaCodeAct employs an open-source multilingual LLM within a Thought-Code-Observation loop, enabling dynamic generation, testing, and refinement of code from Bangla instructions. We benchmark several small-parameter open-source LLMs and evaluate their effectiveness on the mHumanEval dataset for Bangla NL2Code. Our results show that Qwen3-8B, when deployed with BanglaCodeAct, achieves the best performance, with pass@1 accuracy of 94.0% on the development set and 71.6% on the blind test set. These results establish a new benchmark for Bangla-to-Python translation and highlight the potential of agent-based reasoning for reliable code generation in low-resource languages. Experimental scripts are publicly available at github.com/jahidulzaid/PyBanglaCodeActAgent.
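The Thought-Code-Observation loop with iterative self-correction described in the abstract can be sketched roughly as below. This is a minimal illustration, not the paper's implementation (see the linked repository for that): `generate_candidate` is a stub standing in for the multilingual LLM call, and all names and the three-iteration budget are assumptions.

```python
import subprocess
import sys
import tempfile

def generate_candidate(instruction, feedback=None):
    # Stub for the multilingual LLM call: a real agent would prompt the
    # model with the Bangla instruction plus the previous observation
    # (feedback) and parse a code block out of the response.
    return "def add(a, b):\n    return a + b\n"

def run_tests(code, test_code):
    # Execute the candidate together with its unit tests in a fresh
    # interpreter; the stderr text becomes the agent's "observation".
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + test_code)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True, text=True)
    return proc.returncode == 0, proc.stderr

def thought_code_observation(instruction, test_code, max_iters=3):
    feedback = None
    for _ in range(max_iters):
        code = generate_candidate(instruction, feedback)  # Thought -> Code
        passed, observation = run_tests(code, test_code)  # -> Observation
        if passed:
            return code
        feedback = observation  # feed the failure back for self-correction
    return code  # best effort once the iteration budget is exhausted
```

Feeding the interpreter's error text back into the next prompt is what lets the agent refine failing code rather than regenerate it blindly.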
Related papers
- BanglaForge: LLM Collaboration with Self-Refinement for Bangla Code Generation
We introduce BanglaForge, a novel framework for generating code from Bangla function descriptions. On the BLP-2025 Bangla Code Generation benchmark, BanglaForge achieves a competitive Pass@1 accuracy of 84.00%.
arXiv Detail & Related papers (2025-12-22T07:53:16Z)
- Retriv at BLP-2025 Task 2: Test-Driven Feedback-Guided Framework for Bangla-to-Python Code Generation
We propose a method that combines instruction prompting with a test-driven, feedback-guided iterative refinement process. The model generates code from Bangla instructions, tests it against unit tests, and iteratively refines any failing outputs through three evaluation passes. This approach helped our team "Retriv" secure 2nd place in the shared task with a Pass@1 score of 0.934.
arXiv Detail & Related papers (2025-11-10T18:41:44Z)
- TigerCoder: A Novel Suite of LLMs for Code Generation in Bangla
Despite being the 5th most spoken language, Bangla remains underrepresented in Large Language Models (LLMs). This primarily stems from the scarcity of high-quality data to pre-train and/or finetune such models. We offer three major contributions: (1) comprehensive Bangla code instruction datasets for programming domain adaptation; (2) MBPP-Bangla, an evaluation benchmark for Bangla code generation; and (3) the TigerCoder family of Code LLMs, achieving significant 11-18% performance gains at Pass@1 over existing multilingual and general-purpose Bangla LLMs.
arXiv Detail & Related papers (2025-09-11T02:25:49Z)
- IFEvalCode: Controlled Code Generation
The paper introduces forward and backward constraints generation to improve the instruction-following capabilities of Code LLMs. The authors present IFEvalCode, a multilingual benchmark comprising 1.6K test samples across seven programming languages.
arXiv Detail & Related papers (2025-07-30T08:08:48Z)
- ObscuraCoder: Powering Efficient Code LM Pre-Training Via Obfuscation Grounding
Language models (LMs) have become a staple of the code-writing toolbox. Research exploring modifications to Code-LMs' pre-training objectives, geared towards improving data efficiency and better disentangling between syntax and semantics, has been noticeably sparse. In this work, we examine grounding on obfuscated code as a means of helping Code-LMs look beyond the surface-form syntax and enhance their pre-training sample efficiency.
arXiv Detail & Related papers (2025-03-27T23:08:53Z)
- Executable Code Actions Elicit Better LLM Agents
This work proposes to use Python code to consolidate Large Language Model (LLM) agents' actions into a unified action space (CodeAct).
Integrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations through multi-turn interactions.
The encouraging performance of CodeAct motivates us to build an open-source LLM agent that interacts with environments by executing interpretable code and collaborates with users using natural language.
arXiv Detail & Related papers (2024-02-01T21:38:58Z)
- CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model
This paper introduces CodeFuse-13B, an open-sourced pre-trained code LLM.
It is specifically designed for code-related tasks with both English and Chinese prompts.
CodeFuse achieves its effectiveness by utilizing a high-quality pre-training dataset.
arXiv Detail & Related papers (2023-10-10T02:38:44Z)
- Code Llama: Open Foundation Models for Code
We release Code Llama, a family of large language models for code based on Llama 2.
Code Llama reaches state-of-the-art performance among open models on several code benchmarks.
We release Code Llama under a permissive license that allows for both research and commercial use.
arXiv Detail & Related papers (2023-08-24T17:39:13Z)
- LEVER: Learning to Verify Language-to-Code Generation with Execution
We propose LEVER, a simple approach to improve language-to-code generation by learning to verify the generated programs with their execution results.
Specifically, we train verifiers to determine whether a program sampled from the LLMs is correct or not based on the natural language input, the program itself and its execution results.
LEVER consistently improves over the base code LLMs (4.6% to 10.9% with code-davinci) and achieves new state-of-the-art results on all of them.
arXiv Detail & Related papers (2023-02-16T18:23:22Z)
- End-to-End Natural Language Understanding Pipeline for Bangla Conversational Agents
We propose a novel approach to build a business assistant which can communicate in Bangla and Bangla Transliteration in English with high confidence consistently.
We use Rasa Open Source Framework, fastText embeddings, Polyglot embeddings, Flask, and other systems as building blocks.
We present a pipeline for intent classification and entity extraction which achieves reasonable performance.
arXiv Detail & Related papers (2021-07-12T16:09:22Z)
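The execution-guided verification idea from LEVER above can be sketched roughly as follows. This is an illustrative assumption-laden toy, not the paper's method: `verifier_score` is a stub for the learned verifier (which LEVER trains on the NL input, the program, and its execution result), and the `answer`-variable convention is invented for the example.

```python
def execute(program):
    # Run a candidate program; by convention (assumed here, not from the
    # paper) the result is stored in a variable named `answer`.
    # Execution errors yield None.
    env = {}
    try:
        exec(program, env)
        return env.get("answer")
    except Exception:
        return None

def verifier_score(nl_input, program, exec_result):
    # Stub for LEVER's learned verifier, which estimates
    # P(correct | NL input, program, execution result). Here we only
    # reward candidates that executed and produced a result.
    return 1.0 if exec_result is not None else 0.0

def rerank(nl_input, candidates):
    # Score every sampled program and keep the highest-scoring one.
    scored = [(verifier_score(nl_input, p, execute(p)), p) for p in candidates]
    return max(scored, key=lambda s: s[0])[1]
```

Even this crude execution filter discards programs that crash; the learned verifier additionally ranks among programs that all run but may compute the wrong thing.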
This list is automatically generated from the titles and abstracts of the papers in this site.