A Structured Bangla Dataset of Disease-Symptom Associations to Improve Diagnostic Accuracy
- URL: http://arxiv.org/abs/2506.13610v3
- Date: Sat, 26 Jul 2025 06:23:47 GMT
- Title: A Structured Bangla Dataset of Disease-Symptom Associations to Improve Diagnostic Accuracy
- Authors: Abdullah Al Shafi, Rowzatul Zannat, Abdul Muntakim, Mahmudul Hasan
- Abstract summary: Disease-symptom datasets are significant and in demand for medical research, disease diagnosis, clinical decision-making, and AI-driven health management applications. This study systematically compiles disease-symptom relationships from various online sources, medical literature, and publicly available health databases. The data was gathered through analyzing peer-reviewed medical articles, clinical case studies, and disease-symptom association reports.
- Score: 0.562479170374811
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Disease-symptom datasets are significant and in demand for medical research, disease diagnosis, clinical decision-making, and AI-driven health management applications. These datasets help identify symptom patterns associated with specific diseases, thus improving diagnostic accuracy and enabling early detection. The dataset presented in this study systematically compiles disease-symptom relationships from various online sources, medical literature, and publicly available health databases. The data was gathered by analyzing peer-reviewed medical articles, clinical case studies, and disease-symptom association reports. Only verified medical sources were included in the dataset; non-peer-reviewed and anecdotal sources were excluded. The dataset is structured in a tabular format, where the first column represents diseases and the remaining columns represent symptoms. Each symptom cell contains a binary value indicating whether the symptom is associated with the disease. This structured representation makes the dataset useful for a wide range of applications, including machine learning-based disease prediction, clinical decision support systems, and epidemiological studies. Although there have been some advancements in the field of disease-symptom datasets, there is a significant gap in structured datasets for the Bangla language. This dataset aims to bridge that gap by facilitating the development of multilingual medical informatics tools and improving disease prediction models for underrepresented linguistic communities. Further developments should include region-specific diseases and further fine-tuning of symptom associations for better diagnostic performance.
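The tabular layout described in the abstract (first column for the disease, one binary column per symptom) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration only: the CSV text, the English column names, and the three sample rows are made-up placeholders rather than rows from the dataset (which is in Bangla), and the helper names `load_disease_symptoms` and `rank_diseases` are hypothetical. Ranking by Jaccard overlap is just one simple way to query such a table, not the method used by the authors.

```python
import csv
import io

# Hypothetical sample in the layout the abstract describes: the first column
# is the disease, the remaining columns are symptoms, and each cell is 0 or 1.
# These rows are illustrative placeholders, not taken from the dataset.
SAMPLE_CSV = """disease,fever,cough,headache,rash
influenza,1,1,1,0
measles,1,1,0,1
migraine,0,0,1,0
"""


def load_disease_symptoms(csv_text):
    """Return {disease: set of symptom names whose cell is 1}."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    symptom_names = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        disease, flags = row[0], row[1:]
        table[disease] = {
            name
            for name, flag in zip(symptom_names, flags)
            if flag.strip() == "1"
        }
    return table


def rank_diseases(table, observed):
    """Rank diseases by Jaccard overlap with the observed symptoms."""
    observed = set(observed)
    scores = []
    for disease, symptoms in table.items():
        union = symptoms | observed
        score = len(symptoms & observed) / len(union) if union else 0.0
        scores.append((disease, score))
    # Highest overlap first; sorted() is stable, so ties keep file order.
    return sorted(scores, key=lambda item: item[1], reverse=True)


table = load_disease_symptoms(SAMPLE_CSV)
ranking = rank_diseases(table, ["fever", "cough", "rash"])
```

With the placeholder rows above, `ranking` lists `measles` first because all three observed symptoms match its row exactly. A real application would more likely feed the binary matrix into a trained classifier, as the abstract suggests.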
Related papers
- Patient Trajectory Prediction: Integrating Clinical Notes with Transformers [0.0]
We propose an approach that integrates unstructured clinical notes into transformer-based deep learning models for sequential disease prediction. Experiments on MIMIC-IV datasets demonstrate that the proposed approach outperforms traditional models relying solely on structured data.
arXiv Detail & Related papers (2025-02-25T09:14:07Z)
- Assessing and Enhancing Large Language Models in Rare Disease Question-answering [64.32570472692187]
We introduce a rare disease question-answering (ReDis-QA) dataset to evaluate the performance of Large Language Models (LLMs) in diagnosing rare diseases.
We collected 1360 high-quality question-answer pairs within the ReDis-QA dataset, covering 205 rare diseases.
We then benchmarked several open-source LLMs, revealing that diagnosing rare diseases remains a significant challenge for these models.
Experiment results demonstrate that ReCOP can effectively improve the accuracy of LLMs on the ReDis-QA dataset by an average of 8%.
arXiv Detail & Related papers (2024-08-15T21:09:09Z)
- Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision.
This study evaluated the performance of Gemini, GPT-4, and four popular large models across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z)
- Explorative analysis of human disease-symptoms relations using the Convolutional Neural Network [0.0]
This study aims to understand the extent of symptom types in disease prediction tasks.
Our results indicate that machine learning can potentially diagnose diseases with 98-100% accuracy at an early stage.
arXiv Detail & Related papers (2023-02-23T15:02:07Z)
- DDXPlus: A new Dataset for Medical Automatic Diagnosis [2.7126836481535213]
We present a large-scale synthetic dataset that includes a differential diagnosis, along with the ground truth pathology, for each patient.
As a proof-of-concept, we extend several existing AD and ASD systems to incorporate differential diagnosis.
We provide empirical evidence that using differentials in training signals is essential for such systems to learn to predict differentials.
arXiv Detail & Related papers (2022-05-18T18:03:39Z)
- Domain Invariant Model with Graph Convolutional Network for Mammogram Classification [49.691629817104925]
We propose a novel framework, namely Domain Invariant Model with Graph Convolutional Network (DIM-GCN).
We first propose a Bayesian network, which explicitly decomposes the latent variables into disease-related and other disease-irrelevant parts that are provably disentangled from each other.
To better capture the macroscopic features, we leverage the observed clinical attributes as a goal for reconstruction, via Graph Convolutional Network (GCN).
arXiv Detail & Related papers (2022-04-21T08:23:44Z)
- Context-aware Health Event Prediction via Transition Functions on Dynamic Disease Graphs [15.17817233616652]
Many machine learning approaches assume disease representations are static in different visits of a patient.
We propose a novel context-aware learning framework using transition functions on dynamic disease graphs.
Experimental results on two real-world EHR datasets show that the proposed model outperforms the state of the art in predicting health events.
arXiv Detail & Related papers (2021-12-09T20:06:39Z)
- Graph-Evolving Meta-Learning for Low-Resource Medical Dialogue Generation [150.52617238140868]
We propose low-resource medical dialogue generation to transfer the diagnostic experience from source diseases to target ones.
We also develop a Graph-Evolving Meta-Learning framework that learns to evolve the commonsense graph for reasoning disease-symptom correlations in a new disease.
arXiv Detail & Related papers (2020-12-22T13:20:23Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)
- A Deep Learning Pipeline for Patient Diagnosis Prediction Using Electronic Health Records [0.5672132510411464]
We develop and publish a Python package to transform a public health dataset into an easy-to-access universal format.
We propose two novel model architectures to predict multiple diagnoses simultaneously.
Both models can predict multiple diagnoses simultaneously with high accuracy.
arXiv Detail & Related papers (2020-06-23T14:58:58Z)
- Dynamic Graph Correlation Learning for Disease Diagnosis with Incomplete Labels [66.57101219176275]
Disease diagnosis on chest X-ray images is a challenging multi-label classification task.
We propose a Disease Diagnosis Graph Convolutional Network (DD-GCN) that presents a novel view of investigating the inter-dependency among different diseases.
Our method is the first to build a graph over the feature maps with a dynamic adjacency matrix for correlation learning.
arXiv Detail & Related papers (2020-02-26T17:10:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.