Text Mining to Identify and Extract Novel Disease Treatments From
Unstructured Datasets
- URL: http://arxiv.org/abs/2011.07959v1
- Date: Thu, 22 Oct 2020 19:52:49 GMT
- Title: Text Mining to Identify and Extract Novel Disease Treatments From
Unstructured Datasets
- Authors: Rahul Yedida, Saad Mohammad Abrar, Cleber Melo-Filho, Eugene Muratov,
Rada Chirkova, Alexander Tropsha
- Abstract summary: We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
- Score: 56.38623317907416
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Objective: We aim to learn potential novel cures for diseases from
unstructured text sources. More specifically, we seek to extract drug-disease
pairs of potential cures to diseases by a simple reasoning over the structure
of spoken text.
Materials and Methods: We use Google Cloud to transcribe podcast episodes of
an NPR radio show. We then build a pipeline for systematically pre-processing
the text to ensure quality input to the core classification model, which feeds
to a series of post-processing steps for obtaining filtered results. Our
classification model itself uses a language model pre-trained on PubMed text.
The modular nature of our pipeline allows for ease of future developments in
this area by substituting higher quality components at each stage of the
pipeline. As a validation measure, we use ROBOKOP, an engine over a medical
knowledge graph with only validated pathways, as a ground truth source for
checking the existence of the proposed pairs. For the proposed pairs not found
in ROBOKOP, we provide further verification using Chemotext.
Results: We found 30.4% of our proposed pairs in the ROBOKOP database. For
example, our model successfully identified that Omeprazole can help treat
heartburn. We discuss the significance of this result, showing some examples of
the proposed pairs.
Discussion and Conclusion: The agreement of our results with the existing
knowledge source indicates a step in the right direction. Given the
plug-and-play nature of our framework, it is easy to add, remove, or modify
parts to improve the model as necessary. We discuss the results showing some
examples, and note that this is a potentially new line of research that has
further scope to be explored. Although our approach was originally designed for
radio podcast transcripts, it is input-agnostic and could be applied to any
source of textual data and to any problem of interest.
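The modular pipeline described in the abstract (pre-processing, classification, post-processing, then validation against a knowledge source) can be illustrated with a minimal sketch. This is not the authors' implementation: every function name here is hypothetical, the co-occurrence classifier is a toy stand-in for the PubMed-pre-trained language model, the `known` set stands in for a ROBOKOP lookup, and the transcript is made up ("drugx" and "conditiony" are placeholders, not real entities). It only shows how swappable stages and a validation rate might fit together.

```python
from typing import Callable, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (drug, disease)


def preprocess(transcript: str) -> List[str]:
    """Hypothetical pre-processing: lowercase and split into rough sentences."""
    sentences = [s.strip().lower() for s in transcript.replace("\n", " ").split(".")]
    return [s for s in sentences if s]


def make_cooccurrence_classifier(
    drugs: List[str], diseases: List[str]
) -> Callable[[List[str]], List[Pair]]:
    """Toy stand-in for the classification model (an assumption, not the
    paper's model): propose a pair when a drug and a disease share a sentence."""

    def classify(sentences: List[str]) -> List[Pair]:
        pairs: List[Pair] = []
        for sentence in sentences:
            for drug in drugs:
                for disease in diseases:
                    if drug in sentence and disease in sentence:
                        pairs.append((drug, disease))
        return pairs

    return classify


def postprocess(pairs: List[Pair]) -> List[Pair]:
    """Hypothetical post-processing: drop duplicates, keep first-seen order."""
    return list(dict.fromkeys(pairs))


def run_pipeline(transcript: str, stages: Iterable[Callable]):
    """Apply each stage in order; any stage can be swapped for a better one."""
    data = transcript
    for stage in stages:
        data = stage(data)
    return data


def validation_rate(pairs: List[Pair], known: Set[Pair]) -> float:
    """Fraction of proposed pairs found in a reference knowledge source."""
    if not pairs:
        return 0.0
    return sum(1 for pair in pairs if pair in known) / len(pairs)


# Made-up transcript for illustration only.
transcript = (
    "My doctor said omeprazole can help with heartburn. "
    "Another caller mentioned drugx for conditiony. "
    "Omeprazole is often used for heartburn."
)
classifier = make_cooccurrence_classifier(
    drugs=["omeprazole", "drugx"], diseases=["heartburn", "conditiony"]
)
proposed = run_pipeline(transcript, [preprocess, classifier, postprocess])

# Stand-in for a knowledge-graph lookup; a real check would query ROBOKOP.
known = {("omeprazole", "heartburn")}
rate = validation_rate(proposed, known)
print(proposed, rate)
```

Because each stage is an ordinary callable, replacing the toy classifier with a stronger model would only change the list passed to `run_pipeline`, which is the plug-and-play property the abstract emphasizes.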
Related papers
- Facilitating phenotyping from clinical texts: the medkit library [1.7924255866089314]
Phenotyping consists in applying algorithms to identify individuals associated with a specific, potentially complex, trait or condition.
Because much of the clinical information in EHRs lies in text, phenotyping from text plays an important role in studies that rely on the secondary use of EHRs.
We developed an open-source Python library named medkit to facilitate the development, evaluation and reproducibility of phenotyping pipelines.
arXiv Detail & Related papers (2024-08-30T16:54:06Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Zero-shot information extraction from radiological reports using ChatGPT [19.457604666012767]
Information extraction is the strategy to transform the sequence of characters into structured data.
With the large language models achieving good performances on various downstream NLP tasks, it becomes possible to use large language models for zero-shot information extraction.
In this study, we aim to explore whether the most popular large language model, ChatGPT, can extract useful information from the radiological reports.
arXiv Detail & Related papers (2023-09-04T07:00:26Z)
- SynerGPT: In-Context Learning for Personalized Drug Synergy Prediction and Drug Design [64.69434941796904]
We propose a novel setting and models for in-context drug synergy learning.
We are given a small "personalized dataset" of 10-20 drug synergy relationships in the context of specific cancer cell targets.
Our goal is to predict additional drug synergy relationships in that context.
arXiv Detail & Related papers (2023-06-19T17:03:46Z)
- Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts have been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z)
- Graph Enhanced Contrastive Learning for Radiology Findings Summarization [25.377658879658306]
A section of a radiology report summarizes the most prominent observation from the findings.
We propose a unified framework for exploiting both extra knowledge and the original findings.
Key words and their relations can be extracted in an appropriate way to facilitate impression generation.
arXiv Detail & Related papers (2022-04-01T04:39:44Z)
- Mining Adverse Drug Reactions from Unstructured Mediums at Scale [0.0]
Adverse drug reactions / events (ADR/ADE) have a major impact on patient health and health care costs.
Most ADRs are not reported via formal channels, but they are often documented in unstructured conversations.
We propose a natural language processing (NLP) solution that detects ADRs in such unstructured free-text conversations.
arXiv Detail & Related papers (2022-01-05T01:52:42Z)
- Slot Filling for Biomedical Information Extraction [0.5330240017302619]
We present a slot filling approach to the task of biomedical IE.
We follow the proposed paradigm of coupling a Transformer-based bi-encoder, Dense Passage Retrieval, with a Transformer-based reader model.
arXiv Detail & Related papers (2021-09-17T14:16:00Z)