BanglaMedQA and BanglaMMedBench: Evaluating Retrieval-Augmented Generation Strategies for Bangla Biomedical Question Answering
- URL: http://arxiv.org/abs/2511.04560v1
- Date: Thu, 06 Nov 2025 17:15:33 GMT
- Title: BanglaMedQA and BanglaMMedBench: Evaluating Retrieval-Augmented Generation Strategies for Bangla Biomedical Question Answering
- Authors: Sadia Sultana, Saiyma Sittul Muna, Mosammat Zannatul Samarukh, Ajwad Abrar, Tareque Mohmud Chowdhury,
- Abstract summary: This paper introduces the first large-scale datasets designed to evaluate reasoning and retrieval in medical artificial intelligence (AI)<n>The study applies and benchmarks several Retrieval-Augmented Generation (RAG) strategies, including Traditional, Zero-Shot Fallback, Agentic, Iterative Feedback, and Aggregate RAG.<n> Experimental results show that the Agentic RAG achieved the highest accuracy 89.54% with openai/gpt-oss-120b, outperforming other configurations and demonstrating superior rationale quality.
- Score: 0.13048920509133807
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Developing accurate biomedical Question Answering (QA) systems in low-resource languages remains a major challenge, limiting equitable access to reliable medical knowledge. This paper introduces BanglaMedQA and BanglaMMedBench, the first large-scale Bangla biomedical Multiple Choice Question (MCQ) datasets designed to evaluate reasoning and retrieval in medical artificial intelligence (AI). The study applies and benchmarks several Retrieval-Augmented Generation (RAG) strategies, including Traditional, Zero-Shot Fallback, Agentic, Iterative Feedback, and Aggregate RAG, combining textbook-based and web retrieval with generative reasoning to improve factual accuracy. A key novelty lies in integrating a Bangla medical textbook corpus through Optical Character Recognition (OCR) and implementing an Agentic RAG pipeline that dynamically selects between retrieval and reasoning strategies. Experimental results show that the Agentic RAG achieved the highest accuracy 89.54% with openai/gpt-oss-120b, outperforming other configurations and demonstrating superior rationale quality. These findings highlight the potential of RAG-based methods to enhance the reliability and accessibility of Bangla medical QA, establishing a foundation for future research in multilingual medical artificial intelligence.
Related papers
- MedBioRAG: Semantic Search and Retrieval-Augmented Generation with Large Language Models for Medical and Biological QA [0.0]
MedBioRAG is a retrieval-augmented model designed to improve biomedical question-answering (QA) performance.<n>We evaluate MedBioRAG across text retrieval, close-ended QA, and long-form QA tasks using benchmark datasets.
arXiv Detail & Related papers (2025-12-10T15:43:25Z) - From Retrieval to Generation: Unifying External and Parametric Knowledge for Medical Question Answering [26.91142737000078]
Existing approaches typically fall into two categories: Retrieval-Augmented Generation (RAG) and Generation-Augmented Generation (GAG)<n>We propose MedRGAG, a unified retrieval-generation augmented framework that seamlessly integrates external and parametric knowledge for medical QA.
arXiv Detail & Related papers (2025-10-21T04:58:29Z) - RAG-BioQA Retrieval-Augmented Generation for Long-Form Biomedical Question Answering [0.0]
We present RAG-BioQA, a novel framework combining retrieval-augmented generation with domain-specific fine-tuning to produce evidence-based, long-form answers.<n> Experimental results on the PubMedQA dataset show significant improvements over baselines.
arXiv Detail & Related papers (2025-10-02T02:49:09Z) - Biomedical Literature Q&A System Using Retrieval-Augmented Generation (RAG) [0.0]
This work presents a Biomedical Literature Question Answering (Q&A) system based on a Retrieval-Augmented Generation architecture.<n>The system integrates diverse sources, including PubMed articles, curated Q&A datasets, and medical encyclopedias.<n>The system supports both general medical queries and domain-specific tasks, with a focused evaluation on breast cancer literature.
arXiv Detail & Related papers (2025-09-05T21:29:52Z) - End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning [52.12425911708585]
Deep-DxSearch is an agentic RAG system trained end-to-end with reinforcement learning (RL)<n>In Deep-DxSearch, we first construct a large-scale medical retrieval corpus comprising patient records and reliable medical knowledge sources.<n> Experiments demonstrate that our end-to-end RL training framework consistently outperforms prompt-engineering and training-free RAG approaches.
arXiv Detail & Related papers (2025-08-21T17:42:47Z) - Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications [59.721265428780946]
Large Language Models (LLMs) in medicine have enabled impressive capabilities, yet a critical gap remains in their ability to perform systematic, transparent, and verifiable reasoning.<n>This paper provides the first systematic review of this emerging field.<n>We propose a taxonomy of reasoning enhancement techniques, categorized into training-time strategies and test-time mechanisms.
arXiv Detail & Related papers (2025-08-01T14:41:31Z) - MedGemma Technical Report [75.88152277443179]
We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B.<n>MedGemma demonstrates advanced medical understanding and reasoning on images and text.<n>We additionally introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP.
arXiv Detail & Related papers (2025-07-07T17:01:44Z) - DoctorRAG: Medical RAG Fusing Knowledge with Patient Analogy through Textual Gradients [4.062920795080979]
Existing medical RAG systems mainly leverage knowledge from medical knowledge bases.<n>We propose DoctorRAG, a RAG framework that emulates doctor-like reasoning by integrating both explicit clinical knowledge and implicit case-based experience.<n>Our approach generates more accurate, relevant, and comprehensive responses, taking a step towards more doctor-like medical reasoning systems.
arXiv Detail & Related papers (2025-05-26T05:56:23Z) - Talk Before You Retrieve: Agent-Led Discussions for Better RAG in Medical QA [17.823588070044217]
We propose Discuss-RAG, a plug-and-play module designed to enhance the medical question answering system.<n>Our method introduces a summarizer agent that orchestrates a team of medical experts to emulate multi-turn brainstorming, thereby improving the relevance of retrieved content.<n> Experimental results on four benchmark medical QA datasets show that Discuss-RAG consistently outperforms MedRAG.
arXiv Detail & Related papers (2025-04-30T01:37:44Z) - Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs)
We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets.
Our experimental results reveals current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z) - MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models [49.765466293296186]
Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools.<n>Med-LVLMs often suffer from factual hallucination, which can lead to incorrect diagnoses.<n>We propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs.
arXiv Detail & Related papers (2024-10-16T23:03:27Z) - SeRTS: Self-Rewarding Tree Search for Biomedical Retrieval-Augmented Generation [50.26966969163348]
Large Language Models (LLMs) have shown great potential in the biomedical domain with the advancement of retrieval-augmented generation (RAG)
Existing retrieval-augmented approaches face challenges in addressing diverse queries and documents, particularly for medical knowledge queries.
We propose Self-Rewarding Tree Search (SeRTS) based on Monte Carlo Tree Search (MCTS) and a self-rewarding paradigm.
arXiv Detail & Related papers (2024-06-17T06:48:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.