Automatic Extraction of Disease Risk Factors from Medical Publications
- URL: http://arxiv.org/abs/2407.07373v1
- Date: Wed, 10 Jul 2024 05:17:55 GMT
- Title: Automatic Extraction of Disease Risk Factors from Medical Publications
- Authors: Maxim Rubchinsky, Ella Rabinovich, Adi Shraibman, Netanel Golan, Tali Sahar, Dorit Shweiki,
- Abstract summary: We present a novel approach to automating the identification of risk factors for diseases from medical literature.
We first identify relevant articles, then classify them based on the presence of risk factor discussions, and finally extract specific risk factor information for a disease.
Our contributions include the development of a comprehensive pipeline for the automated extraction of risk factors and the compilation of several datasets.
- Score: 1.321009936753118
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present a novel approach to automating the identification of risk factors for diseases from medical literature, leveraging pre-trained models in the biomedical domain while tuning them for the specific task. To address the diverse and unstructured nature of medical articles, our study introduces a multi-step system to first identify relevant articles, then classify them based on the presence of risk factor discussions and, finally, extract specific risk factor information for a disease through a question-answering model. Our contributions include the development of a comprehensive pipeline for the automated extraction of risk factors and the compilation of several datasets, which can serve as valuable resources for further research in this area. These datasets encompass a wide range of diseases, as well as their associated risk factors, meticulously identified and validated through a fine-grained evaluation scheme. We conducted both automatic and thorough manual evaluation, demonstrating encouraging results. We also highlight the importance of improving models and expanding dataset comprehensiveness to keep pace with the rapidly evolving field of medical research.
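The three-stage pipeline described in the abstract (relevance filtering, risk-factor classification, question-answering extraction) can be sketched as a chain of pluggable stages. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Article` fields, all function names, and the keyword-based stand-ins for the three stages are hypothetical. The paper instead relies on pre-trained biomedical models tuned for the task.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Article:
    pmid: str      # hypothetical identifier field
    title: str
    abstract: str

# Stage signatures; in practice each would be backed by a tuned model (assumption).
RelevanceFn = Callable[[Article, str], bool]   # (article, disease) -> is it about the disease?
ClassifierFn = Callable[[Article], bool]       # article -> does it discuss risk factors?
QAFn = Callable[[str, str], List[str]]         # (question, context) -> answer spans

def extract_risk_factors(articles: List[Article], disease: str,
                         is_relevant: RelevanceFn, discusses_risk: ClassifierFn,
                         answer: QAFn) -> List[str]:
    """Run the three stages in order and return unique risk factors in first-seen order."""
    found: List[str] = []
    question = f"What are the risk factors for {disease}?"
    for art in articles:
        if not is_relevant(art, disease):            # stage 1: keep articles about the disease
            continue
        if not discusses_risk(art):                  # stage 2: keep articles discussing risk factors
            continue
        for span in answer(question, art.abstract):  # stage 3: QA-style extraction
            if span not in found:
                found.append(span)
    return found

# --- Toy stand-ins (hypothetical), used only to make the sketch runnable ---
def toy_relevance(art: Article, disease: str) -> bool:
    return disease.lower() in f"{art.title} {art.abstract}".lower()

def toy_classifier(art: Article) -> bool:
    return "risk factor" in art.abstract.lower()

def toy_qa(question: str, context: str) -> List[str]:
    # Mimics an extractive QA model by returning the list that follows "risk factors include".
    marker = "risk factors include "
    start = context.lower().find(marker)
    if start < 0:
        return []
    tail = context[start + len(marker):].split(".")[0]
    return [s.strip() for s in tail.replace(" and ", ", ").split(",") if s.strip()]

articles = [
    Article("1", "Stroke in older adults", "Major risk factors include smoking, obesity and hypertension."),
    Article("2", "Stroke rehabilitation", "We evaluate a physiotherapy protocol."),
    Article("3", "Asthma in children", "Risk factors include allergens."),
]
print(extract_risk_factors(articles, "stroke", toy_relevance, toy_classifier, toy_qa))
# ['smoking', 'obesity', 'hypertension']
```

Keeping each stage behind a plain callable means a toy rule can later be swapped for a fine-tuned classifier or QA model without touching the orchestration loop.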
Related papers
- Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs).
We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets.
Our experimental results reveal current models' limited ability to handle noise and misinformation in the retrieved documents.
Date: 2024-11-14T06:19:18Z
- A Survey of Deep Learning-based Radiology Report Generation Using Multimodal Data
Automatic radiology report generation can alleviate the workload for physicians and minimize regional disparities in medical resources.
It is a challenging task, as the computational model needs to mimic physicians to obtain information from multi-modal input data.
Recent works emerged to address this issue using deep learning-based methods, such as transformers, contrastive learning, and knowledge-base construction.
This survey summarizes the key techniques developed in the most recent works and proposes a general workflow for deep learning-based report generation.
Date: 2024-05-21T14:37:35Z
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
Date: 2024-03-02T00:56:05Z
- Feasibility of Identifying Factors Related to Alzheimer's Disease and Related Dementia in Real-World Data
In total, we extracted 477 risk factors in 10 categories from 537 studies.
Genetic testing for AD/ADRD is still not a common practice and is poorly documented in both structured and unstructured EHRs.
Considering the constantly evolving research on AD/ADRD risk factors, literature mining via NLP methods offers a solution to automatically update our knowledge map.
Date: 2024-02-03T18:17:19Z
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
Date: 2023-10-04T01:36:30Z
- Machine Learning for Infectious Disease Risk Prediction: A Survey
We systematically describe how machine learning can play an essential role in quantitatively characterizing disease transmission patterns.
We discuss challenges encountered when dealing with model inputs, designing task-oriented objectives, and conducting performance evaluation.
Date: 2023-08-06T06:57:11Z
- Typology of Risks of Generative Text-to-Image Models
This paper investigates the direct risks and harms associated with modern text-to-image generative models, such as DALL-E and Midjourney.
Our review reveals significant knowledge gaps concerning the understanding and treatment of these risks despite some already being addressed.
We identify 22 distinct risk types, spanning issues from data bias to malicious use.
Date: 2023-07-08T20:33:30Z
- Data-Centric Epidemic Forecasting: A Survey
This survey delves into various data-driven methodological and practical advancements.
We enumerate the large number of epidemiological datasets and novel data streams that are relevant to epidemic forecasting.
We also discuss experiences and challenges that arise in real-world deployment of these forecasting systems.
Date: 2022-07-19T16:15:11Z
- Label Dependent Attention Model for Disease Risk Prediction Using Multimodal Electronic Health Records
Disease risk prediction has attracted increasing attention in the field of modern healthcare.
One challenge of applying AI models for risk prediction lies in generating interpretable evidence.
We propose a method that jointly embeds words and labels.
Date: 2022-01-18T07:21:20Z
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta-learning machine for few-shot image classification.
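The Prototypical Network that Select-ProtoNet builds on classifies a query by its distance to per-class prototypes, each being the mean embedding of that class's few labelled support examples. Below is a minimal sketch of that base mechanism only, using squared Euclidean distance and a softmax over negative distances as in the original Prototypical Networks formulation. It assumes patient records have already been mapped to fixed-length vectors by some encoder, and it leaves out both the learned encoder and Select-ProtoNet's own selection component. The subtype labels and toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

def prototypes(support: List[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Average the (already embedded) support examples of each class into one prototype."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in support:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def class_probs(query: Vector, protos: Dict[str, List[float]]) -> Dict[str, float]:
    """Softmax over negative distances: closer prototypes get higher probability."""
    scores = {label: -sq_dist(query, p) for label, p in protos.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def classify(query: Vector, protos: Dict[str, List[float]]) -> str:
    """Predict the class whose prototype is nearest to the query."""
    return min(protos, key=lambda label: sq_dist(query, protos[label]))

# Toy 2-D "embeddings" for two hypothetical patient subtypes (two shots each).
support = [([0.0, 0.0], "subtype_A"), ([0.0, 2.0], "subtype_A"),
           ([4.0, 4.0], "subtype_B"), ([6.0, 4.0], "subtype_B")]
protos = prototypes(support)
print(protos)                        # {'subtype_A': [0.0, 1.0], 'subtype_B': [5.0, 4.0]}
print(classify([1.0, 1.0], protos))  # subtype_A
```

Because a prototype is just an average, adding a new subtype needs only a handful of embedded examples and no retraining of the classifier head, which is what makes the approach attractive in the few-shot setting.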
Date: 2020-09-02T02:50:30Z
- Semi-Automating Knowledge Base Construction for Cancer Genetics
We propose models to automatically surface key elements from full-text cancer genetics articles.
We induce distant supervision over tokens and snippets in full-text articles using the manually constructed knowledge base.
Date: 2020-05-17T02:01:43Z
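The distant-supervision idea in the last entry (using an existing knowledge base to label text automatically instead of annotating it by hand) can be illustrated with a minimal sketch. It tags tokens with BIO labels wherever a knowledge-base value appears verbatim, and it marks a sentence (snippet) with every knowledge-base field it mentions. The knowledge-base fields, the case-insensitive exact-match rule, and the function names are assumptions for illustration; the paper's actual labelling scheme may differ.

```python
import re
from typing import Dict, List, Set

def tokenize(text: str) -> List[str]:
    """Split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def distant_labels(tokens: List[str], kb: Dict[str, List[str]]) -> List[str]:
    """BIO-tag tokens wherever a KB value occurs verbatim (case-insensitive)."""
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for field, values in kb.items():
        for value in values:
            target = [t.lower() for t in tokenize(value)]
            n = len(target)
            if n == 0:
                continue
            for i in range(len(tokens) - n + 1):
                # Only label spans that no earlier KB match has already claimed.
                if lowered[i:i + n] == target and all(lab == "O" for lab in labels[i:i + n]):
                    labels[i] = f"B-{field}"
                    for j in range(i + 1, i + n):
                        labels[j] = f"I-{field}"
    return labels

def snippet_fields(sentence: str, kb: Dict[str, List[str]]) -> Set[str]:
    """Weak snippet-level label: the set of KB fields mentioned in the sentence."""
    return {tag.split("-", 1)[1] for tag in distant_labels(tokenize(sentence), kb) if tag != "O"}

# Hypothetical knowledge-base entries for a single article.
kb = {"gene": ["BRCA1"], "cancer": ["breast cancer"]}
sentence = "Carriers of BRCA1 mutations face elevated breast cancer risk."
tokens = tokenize(sentence)
print(list(zip(tokens, distant_labels(tokens, kb))))
print(snippet_fields(sentence, kb))
```

Exact matching is the simplest weak labeller. It yields noisy labels (missed synonyms, spurious matches), which is why such labels are typically used to train a model rather than taken as ground truth.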