Rare Disease Identification from Clinical Notes with Ontologies and Weak
Supervision
- URL: http://arxiv.org/abs/2105.01995v1
- Date: Wed, 5 May 2021 11:49:09 GMT
- Title: Rare Disease Identification from Clinical Notes with Ontologies and Weak
Supervision
- Authors: Hang Dong, Víctor Suárez-Paniagua, Huayu Zhang, Minhong Wang, Emma
Whitfield, Honghan Wu
- Abstract summary: We show that the Text-to-UMLS process can be greatly improved with weak supervision, without any annotated data from domain experts.
Our analysis shows that the overall pipeline processing discharge summaries can surface rare disease cases that are mostly uncaptured in the manual ICD codes of the hospital admissions.
- Score: 3.6471045233540806
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The identification of rare diseases from clinical notes with Natural Language
Processing (NLP) is challenging due to the few cases available for machine
learning and the need for data annotation from clinical experts. We propose a
method using ontologies and weak supervision. The approach includes two steps:
(i) Text-to-UMLS, linking text mentions to concepts in Unified Medical Language
System (UMLS), with a named entity linking tool (e.g. SemEHR) and weak
supervision based on customised rules and Bidirectional Encoder Representations
from Transformers (BERT) based contextual representations, and (ii)
UMLS-to-ORDO, matching UMLS concepts to rare diseases in Orphanet Rare Disease
Ontology (ORDO). Using MIMIC-III discharge summaries as a case study, we show
that the Text-to-UMLS process can be greatly improved with weak supervision,
without any annotated data from domain experts. Our analysis shows that the
overall pipeline processing discharge summaries can surface rare disease cases,
which are mostly uncaptured in manual ICD codes of the hospital admissions.
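The two steps named in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the lexicon, the concept identifiers (`CUI_A`, `ORDO_X`), the function names, and the length-based rule are all hypothetical placeholders. A dictionary lookup stands in for an entity linking tool such as SemEHR, and a single hand-written rule stands in for the weak supervision stage; in the paper the rule-derived labels are combined with BERT-based contextual representations, which this sketch omits.

```python
import re

# Hypothetical toy lexicon: surface form -> placeholder UMLS concept ID.
# A real system would obtain these links from an entity linking tool.
LEXICON = {
    "cystic fibrosis": "CUI_A",
    "cf": "CUI_A",
    "hypertension": "CUI_B",
}

# Hypothetical mapping table: placeholder UMLS ID -> placeholder ORDO ID.
# Only concepts that correspond to a rare disease appear here.
UMLS_TO_ORDO = {"CUI_A": "ORDO_X"}


def link_mentions(text):
    """Step (i), linking: find lexicon entries in the text (whole words only)."""
    lowered = text.lower()
    found = []
    for surface, cui in LEXICON.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, lowered):
            found.append((match.start(), surface, cui))
    return sorted(found)


def weak_label(surface):
    """Step (i), weak supervision: an illustrative rule, assumed for this sketch.

    Very short mentions are often ambiguous abbreviations, so they are
    treated as unreliable links.
    """
    return len(surface) > 3


def identify_rare_diseases(text):
    """Run both steps and return (mention, UMLS ID, ORDO ID) triples."""
    results = []
    for _, surface, cui in link_mentions(text):
        if not weak_label(surface):
            continue  # drop links the rule considers unreliable
        ordo = UMLS_TO_ORDO.get(cui)  # step (ii): UMLS-to-ORDO matching
        if ordo is not None:
            results.append((surface, cui, ordo))
    return results


note = "History of cystic fibrosis and hypertension. CF clinic follow-up."
print(identify_rare_diseases(note))
```

In this toy note, the full mention is kept and mapped, the abbreviation is filtered by the rule, and the common condition is dropped because it has no entry in the rare disease mapping.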
Related papers
- Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z)
- SNOBERT: A Benchmark for clinical notes entity linking in the SNOMED CT clinical terminology [43.89160296332471]
We propose a method for linking text spans in clinical notes to specific concepts in the SNOMED CT using BERT-based models.
The method consists of two stages: candidate selection and candidate matching. The models were trained on one of the largest publicly available datasets of labeled clinical notes.
arXiv Detail & Related papers (2024-05-25T08:00:44Z)
- A Hybrid Framework with Large Language Models for Rare Disease Phenotyping [4.550497164299771]
Rare diseases pose significant challenges in diagnosis and treatment due to their low prevalence and heterogeneous clinical presentations.
This study aims to develop a hybrid approach combining dictionary-based natural language processing (NLP) tools with large language models (LLMs).
We propose a novel hybrid framework that integrates the Orphanet Rare Disease Ontology (ORDO) and the Unified Medical Language System (UMLS) to create a comprehensive rare disease vocabulary.
arXiv Detail & Related papers (2024-05-16T20:59:28Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
- Cross-Modal Causal Intervention for Medical Report Generation [109.83549148448469]
Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance.
Due to the spurious correlations within image-text data induced by visual and linguistic biases, it is challenging to generate accurate reports reliably describing lesion areas.
We propose a novel Visual-Linguistic Causal Intervention (VLCI) framework for MRG, which consists of a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM).
arXiv Detail & Related papers (2023-03-16T07:23:55Z)
- NEEDED: Introducing Hierarchical Transformer to Eye Diseases Diagnosis [5.608716029921948]
We present an effective automatic eye disease diagnosis framework, NEEDED.
A preprocessing module is integrated to improve the density and quality of information.
For the diagnosis part, we propose an attention-based predictor that enables traceable diagnosis by obtaining disease-specific information.
arXiv Detail & Related papers (2022-12-27T08:37:57Z)
- Ontology-Driven and Weakly Supervised Rare Disease Identification from Clinical Notes [13.096008602034086]
Rare diseases are challenging to identify because few cases are available for machine learning and data annotation from domain experts is needed.
We propose a method using ontologies and weak supervision, with recent pre-trained contextual representations from Bi-directional Transformers (e.g. BERT).
The weakly supervised approach is proposed to learn a phenotype confirmation model to improve Text-to-UMLS linking, without annotated data from domain experts.
arXiv Detail & Related papers (2022-05-11T17:38:24Z)
- VBridge: Connecting the Dots Between Features, Explanations, and Data for Healthcare Models [85.4333256782337]
VBridge is a visual analytics tool that seamlessly incorporates machine learning explanations into clinicians' decision-making workflow.
We identified three key challenges, including clinicians' unfamiliarity with ML features, lack of contextual information, and the need for cohort-level evidence.
We demonstrated the effectiveness of VBridge through two case studies and expert interviews with four clinicians.
arXiv Detail & Related papers (2021-08-04T17:34:13Z)
- Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
- Inheritance-guided Hierarchical Assignment for Clinical Automatic Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making.
We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
arXiv Detail & Related papers (2021-01-27T13:16:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.