Related papers: Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets

URL: http://arxiv.org/abs/2011.07959v1
Date: Thu, 22 Oct 2020 19:52:49 GMT
Title: Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets
Authors: Rahul Yedida, Saad Mohammad Abrar, Cleber Melo-Filho, Eugene Muratov, Rada Chirkova, Alexander Tropsha
Abstract summary: We use Google Cloud to transcribe podcast episodes of an NPR radio show. We then build a pipeline for systematically pre-processing the text. Our model successfully identified that Omeprazole can help treat heartburn.
Score: 56.38623317907416
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Objective: We aim to learn potential novel cures for diseases from unstructured text sources. More specifically, we seek to extract drug-disease pairs of potential cures to diseases by a simple reasoning over the structure of spoken text. Materials and Methods: We use Google Cloud to transcribe podcast episodes of an NPR radio show. We then build a pipeline for systematically pre-processing the text to ensure quality input to the core classification model, which feeds to a series of post-processing steps for obtaining filtered results. Our classification model itself uses a language model pre-trained on PubMed text. The modular nature of our pipeline allows for ease of future developments in this area by substituting higher quality components at each stage of the pipeline. As a validation measure, we use ROBOKOP, an engine over a medical knowledge graph with only validated pathways, as a ground truth source for checking the existence of the proposed pairs. For the proposed pairs not found in ROBOKOP, we provide further verification using Chemotext. Results: We found 30.4% of our proposed pairs in the ROBOKOP database. For example, our model successfully identified that Omeprazole can help treat heartburn. We discuss the significance of this result, showing some examples of the proposed pairs. Discussion and Conclusion: The agreement of our results with the existing knowledge source indicates a step in the right direction. Given the plug-and-play nature of our framework, it is easy to add, remove, or modify parts to improve the model as necessary. We discuss the results showing some examples, and note that this is a potentially new line of research that has further scope to be explored. Although our approach was originally oriented toward radio podcast transcripts, it is input-agnostic and could be applied to any source of textual data and to any problem of interest.

Related papers

Facilitating phenotyping from clinical texts: the medkit library [1.7924255866089314]
Phenotyping consists of applying algorithms to identify individuals associated with a specific, potentially complex, trait or condition. Because much of the clinical information in EHRs lies in text, phenotyping from text plays an important role in studies that rely on the secondary use of EHRs. We developed an open-source Python library named medkit to facilitate the development, evaluation and reproducibility of phenotyping pipelines.
arXiv Detail & Related papers (2024-08-30T16:54:06Z)
Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed. In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset. We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG). MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner. We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
arXiv Detail & Related papers (2024-04-10T07:41:35Z)
Zero-shot information extraction from radiological reports using ChatGPT [19.457604666012767]
Information extraction is the strategy to transform the sequence of characters into structured data. With the large language models achieving good performances on various downstream NLP tasks, it becomes possible to use large language models for zero-shot information extraction. In this study, we aim to explore whether the most popular large language model, ChatGPT, can extract useful information from the radiological reports.
arXiv Detail & Related papers (2023-09-04T07:00:26Z)
SynerGPT: In-Context Learning for Personalized Drug Synergy Prediction and Drug Design [64.69434941796904]
We propose a novel setting and models for in-context drug synergy learning. We are given a small "personalized dataset" of 10-20 drug synergy relationships in the context of specific cancer cell targets. Our goal is to predict additional drug synergy relationships in that context.
arXiv Detail & Related papers (2023-06-19T17:03:46Z)
Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining. We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data. Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z)
Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation. Deep learning models have emerged as an efficient way to discover synergistic combinations. Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations. Test data coming from a subset of DISNET and automatic association extractions from texts have been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z)
Graph Enhanced Contrastive Learning for Radiology Findings Summarization [25.377658879658306]
A section of a radiology report summarizes the most prominent observation from the findings. We propose a unified framework for exploiting both extra knowledge and the original findings. Key words and their relations can be extracted in an appropriate way to facilitate impression generation.
arXiv Detail & Related papers (2022-04-01T04:39:44Z)
Mining Adverse Drug Reactions from Unstructured Mediums at Scale [0.0]
Adverse drug reactions / events (ADR/ADE) have a major impact on patient health and health care costs. Most ADRs are not reported via formal channels, but they are often documented in unstructured conversations. We propose a natural language processing (NLP) solution that detects ADRs in such unstructured free-text conversations.
arXiv Detail & Related papers (2022-01-05T01:52:42Z)
Slot Filling for Biomedical Information Extraction [0.5330240017302619]
We present a slot filling approach to the task of biomedical IE. We follow the proposed paradigm of coupling a Transformer-based bi-encoder, Dense Passage Retrieval, with a Transformer-based reader model.
arXiv Detail & Related papers (2021-09-17T14:16:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
