Bridging the Gap in Bangla Healthcare: Machine Learning Based Disease Prediction Using a Symptoms-Disease Dataset
- URL: http://arxiv.org/abs/2601.12068v1
- Date: Sat, 17 Jan 2026 14:33:01 GMT
- Title: Bridging the Gap in Bangla Healthcare: Machine Learning Based Disease Prediction Using a Symptoms-Disease Dataset
- Authors: Rowzatul Zannat, Abdullah Al Shafi, Abdul Muntakim
- Abstract summary: This study develops a comprehensive Bangla symptoms-disease dataset containing 758 unique symptom-disease relationships spanning 85 diseases. The dataset enables the prediction of diseases based on Bangla symptom inputs, supporting healthcare accessibility for Bengali-speaking populations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Increased access to reliable health information is essential for non-English-speaking populations, yet resources in Bangla for disease prediction remain limited. This study addresses this gap by developing a comprehensive Bangla symptoms-disease dataset containing 758 unique symptom-disease relationships spanning 85 diseases. To ensure transparency and reproducibility, we also make our dataset publicly available. The dataset enables the prediction of diseases based on Bangla symptom inputs, supporting healthcare accessibility for Bengali-speaking populations. Using this dataset, we evaluated multiple machine learning models to predict diseases based on symptoms provided in Bangla and analyzed their performance on our dataset. Both soft and hard voting ensemble approaches combining top-performing models achieved 98% accuracy, demonstrating superior robustness and generalization. Our work establishes a foundational resource for disease prediction in Bangla, paving the way for future advancements in localized health informatics and diagnostic tools. This contribution aims to enhance equitable access to health information for Bangla-speaking communities, particularly for early disease detection and healthcare interventions.
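The difference between the hard and soft voting ensembles mentioned in the abstract can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the function names, the disease labels, and the per-model probabilities are all hypothetical and chosen only to show how the two schemes can disagree.

```python
from collections import Counter


def hard_vote(probas):
    """Majority vote over each model's top class.

    Ties go to the label that was voted for first.
    """
    votes = [max(p, key=p.get) for p in probas]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(probas):
    """Average class probabilities across models, then pick the highest mean."""
    labels = probas[0].keys()
    mean = {c: sum(p[c] for p in probas) / len(probas) for c in labels}
    return max(mean, key=mean.get)


# Hypothetical class probabilities from three models for one symptom input.
model_probas = [
    {"dengue": 0.55, "malaria": 0.45},
    {"dengue": 0.60, "malaria": 0.40},
    {"dengue": 0.10, "malaria": 0.90},
]

print(hard_vote(model_probas))  # two of three models favour "dengue"
print(soft_vote(model_probas))  # the averaged probability favours "malaria"
```

Hard voting counts only each model's top choice, while soft voting lets one very confident model outweigh two marginal ones, which is why the two schemes can return different labels on the same input.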
Related papers
- Beyond Traditional Diagnostics: Transforming Patient-Side Information into Predictive Insights with Knowledge Graphs and Prototypes [55.310195121276074]
We propose a Knowledge graph-enhanced, Prototype-aware, and Interpretable (KPI) framework to predict diseases. It integrates structured and trusted medical knowledge into a unified disease knowledge graph, constructs clinically meaningful disease prototypes, and employs contrastive learning to enhance predictive accuracy. It provides clinically valid explanations that closely align with patient narratives, highlighting its practical value for patient-centered healthcare delivery.
arXiv Detail & Related papers (2025-12-09T05:37:54Z)
- Generative AI-Driven Decision-Making for Disease Control and Pandemic Preparedness Model 4.0 in Rural Communities of Bangladesh: Management Informatics Approach [0.7067443325368975]
Rural Bangladesh is confronted with substantial healthcare obstacles. These obstacles impede effective disease control and pandemic preparedness. The study concludes that the health resilience and pandemic preparedness of marginalized rural populations can be improved through AI-driven, localized disease control strategies.
arXiv Detail & Related papers (2025-08-02T01:54:16Z)
- Integrating Knowledge Graphs and Bayesian Networks: A Hybrid Approach for Explainable Disease Risk Prediction [0.0]
We present a novel approach for constructing BNs from knowledge graphs and multimodal EHR data for explainable disease risk prediction. We demonstrate that the approach balances generalised medical knowledge with patient-specific context, effectively handles uncertainty, is highly explainable, and achieves good predictive performance.
arXiv Detail & Related papers (2025-06-16T18:57:07Z)
- A Structured Dataset of Disease-Symptom Associations to Improve Diagnostic Accuracy [0.7349727826230863]
Disease-symptom datasets are significant and in demand for medical research, disease diagnosis, clinical decision-making, and AI-driven health management applications. This study systematically compiles disease-symptom relationships from various online sources, medical literature, and publicly available health databases. The data was gathered through analyzing peer-reviewed medical articles, clinical case studies, and disease-symptom association reports.
arXiv Detail & Related papers (2025-06-16T15:38:39Z)
- CureGraph: Contrastive Multi-Modal Graph Representation Learning for Urban Living Circle Health Profiling and Prediction [13.681538916025021]
We propose CureGraph, a contrastive multi-modal representation learning framework for urban health prediction. CureGraph infers the prevalence of common chronic diseases among the elderly within the urban living circles of each neighborhood. It captures cross-modal spatial dependencies, offering a comprehensive understanding of urban environments tailored to elderly health considerations.
arXiv Detail & Related papers (2025-01-13T09:30:38Z)
- Artificial Intelligence for Infectious Disease Prediction and Prevention: A Comprehensive Review [1.4874449172133888]
The paper critically assesses the potential of AI and outlines its limitations in infectious disease management.
It categorizes contributions into three areas: predictions using Public Health Data to prevent the spread of a transmissible disease within a region; predictions using Patients' Medical Data to detect whether a person is infected by a transmissible disease; and predictions using both Public and patient medical data to estimate the extent of disease spread in a population.
arXiv Detail & Related papers (2024-11-14T00:43:32Z)
- Assessing and Enhancing Large Language Models in Rare Disease Question-answering [64.32570472692187]
We introduce a rare disease question-answering (ReDis-QA) dataset to evaluate the performance of Large Language Models (LLMs) in diagnosing rare diseases.
We collected 1360 high-quality question-answer pairs within the ReDis-QA dataset, covering 205 rare diseases.
We then benchmarked several open-source LLMs, revealing that diagnosing rare diseases remains a significant challenge for these models.
Experiment results demonstrate that ReCOP can effectively improve the accuracy of LLMs on the ReDis-QA dataset by an average of 8%.
arXiv Detail & Related papers (2024-08-15T21:09:09Z)
- Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank [69.90493129893112]
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
arXiv Detail & Related papers (2024-04-26T16:39:50Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, outperforming the best state-of-the-art baseline by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
- Predicting Parkinson's Disease with Multimodal Irregularly Collected Longitudinal Smartphone Data [75.23250968928578]
Parkinson's Disease is a neurological disorder prevalent in elderly people.
Traditional ways to diagnose the disease rely on in-person subjective clinical evaluations of the quality of a set of activity tests.
We propose a novel time-series based approach to predicting Parkinson's Disease with raw activity test data collected by smartphones in the wild.
arXiv Detail & Related papers (2020-09-25T01:50:15Z)
- Predictive Modeling of ICU Healthcare-Associated Infections from Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling Approach [55.41644538483948]
This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units.
The aim is to support decision making aimed at reducing the incidence rate of infections.
arXiv Detail & Related papers (2020-05-07T16:13:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.