Ontology-Driven and Weakly Supervised Rare Disease Identification from
Clinical Notes
- URL: http://arxiv.org/abs/2205.05656v5
- Date: Wed, 3 May 2023 06:55:48 GMT
- Title: Ontology-Driven and Weakly Supervised Rare Disease Identification from
Clinical Notes
- Authors: Hang Dong, V\'ictor Su\'arez-Paniagua, Huayu Zhang, Minhong Wang,
Arlene Casey, Emma Davidson, Jiaoyan Chen, Beatrice Alex, William Whiteley,
Honghan Wu
- Abstract summary: Rare diseases are challenging to identify because few cases are available for machine learning and data annotation requires domain experts.
We propose a method using ontologies and weak supervision, with recent pre-trained contextual representations from Bi-directional Transformers (e.g. BERT).
The weakly supervised approach learns a phenotype confirmation model to improve Text-to-UMLS linking, without annotated data from domain experts.
- Score: 13.096008602034086
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computational text phenotyping is the practice of identifying patients with
certain disorders and traits from clinical notes. Rare diseases are challenging
to identify due to the few cases available for machine learning and the need
for data annotation from domain experts. We propose a method using ontologies
and weak supervision, with recent pre-trained contextual representations from
Bi-directional Transformers (e.g. BERT). The ontology-based framework includes
two steps: (i) Text-to-UMLS, extracting phenotypes by contextually linking
mentions to concepts in Unified Medical Language System (UMLS), with a Named
Entity Recognition and Linking (NER+L) tool, SemEHR, and weak supervision with
customised rules and contextual mention representation; (ii) UMLS-to-ORDO,
matching UMLS concepts to rare diseases in Orphanet Rare Disease Ontology
(ORDO). The weakly supervised approach is proposed to learn a phenotype
confirmation model to improve Text-to-UMLS linking, without annotated data from
domain experts. We evaluated the approach on three annotated clinical datasets
from two institutions in the US and the UK: MIMIC-III discharge summaries,
MIMIC-III radiology reports, and NHS Tayside brain imaging reports. The
improvements in precision were pronounced (over 30% to 50% in absolute score
for Text-to-UMLS linking), with almost no loss of recall compared to the
existing NER+L tool, SemEHR. Results on radiology reports from MIMIC-III and
NHS Tayside were consistent with the discharge summaries. The overall pipeline
processing clinical notes can extract rare disease cases, mostly uncaptured in
structured data (manually assigned ICD codes). We discuss the usefulness of the
weak supervision approach and propose directions for future studies.
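To make the two-step pipeline concrete, below is a minimal Python sketch of how candidate Text-to-UMLS links from an NER+L tool such as SemEHR could be filtered with weak-supervision rules and then mapped from UMLS to ORDO. The Candidate fields, the rule thresholds, and the umls_to_ordo mapping table are illustrative assumptions rather than the authors' implementation; in the paper, the weakly labelled candidates are used to train a BERT-based phenotype confirmation model over contextual mention representations, instead of applying the rules directly at inference time.

```python
# Minimal sketch (illustrative, not the authors' code) of the two-step
# ontology-driven pipeline: (i) confirm Text-to-UMLS candidate links with
# weak-supervision rules, (ii) map confirmed UMLS concepts to ORDO.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """A mention-to-UMLS candidate produced by an NER+L tool (e.g. SemEHR)."""
    sentence: str   # context sentence containing the mention
    mention: str    # surface form of the mention, e.g. "HD"
    cui: str        # candidate UMLS concept ID, e.g. "C0020179"


def weak_label(c: Candidate, concept_freq: Dict[str, int],
               min_len: int = 4, max_freq: int = 1000) -> int:
    """Illustrative weak rules: very short (ambiguous) mentions and very
    frequent concepts are treated as unlikely true links (label 0)."""
    if len(c.mention) < min_len:
        return 0
    if concept_freq.get(c.cui, 0) > max_freq:
        return 0
    return 1


def confirm_links(candidates: List[Candidate],
                  concept_freq: Dict[str, int]) -> List[Candidate]:
    # In the paper, the weak labels train a BERT-based phenotype
    # confirmation model; here the rules stand in for that classifier.
    return [c for c in candidates if weak_label(c, concept_freq) == 1]


def umls_to_ordo(cuis: List[str], mapping: Dict[str, str]) -> List[str]:
    """Step (ii): map confirmed UMLS CUIs to ORDO rare-disease identifiers."""
    return [mapping[cui] for cui in cuis if cui in mapping]


if __name__ == "__main__":
    cands = [Candidate("Family history of Huntington's disease.",
                       "Huntington's disease", "C0020179")]
    confirmed = confirm_links(cands, concept_freq={"C0020179": 12})
    print(umls_to_ordo([c.cui for c in confirmed],
                       mapping={"C0020179": "ORDO:Orphanet_399"}))
```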
Related papers
- Improving Extraction of Clinical Event Contextual Properties from Electronic Health Records: A Comparative Study [2.0884301753594334]
This study performs a comparative analysis of various natural language models for medical text classification.
A BERT-based model outperforms Bi-LSTM models by up to 28% and the baseline BERT model by up to 16% in recall of the minority classes.
arXiv Detail & Related papers (2024-08-30T10:28:49Z)
- Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision.
This study evaluated Gemini, GPT-4, and 4 popular large models in an exhaustive evaluation across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z)
- SNOBERT: A Benchmark for clinical notes entity linking in the SNOMED CT clinical terminology [43.89160296332471]
We propose a method for linking text spans in clinical notes to specific concepts in SNOMED CT using BERT-based models.
The method consists of two stages: candidate selection and candidate matching; a toy sketch of this two-stage pattern is given after this list. The models were trained on one of the largest publicly available datasets of labeled clinical notes.
arXiv Detail & Related papers (2024-05-25T08:00:44Z)
- Cross-Modal Causal Intervention for Medical Report Generation [109.83549148448469]
Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance.
Due to the spurious correlations within image-text data induced by visual and linguistic biases, it is challenging to generate accurate reports reliably describing lesion areas.
We propose a novel Visual-Linguistic Causal Intervention (VLCI) framework for MRG, which consists of a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM).
arXiv Detail & Related papers (2023-03-16T07:23:55Z)
- MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology [40.52487429030841]
We consider enhancing medical visual-language pre-training with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice.
First, unlike existing works that directly process the raw reports, we adopt a novel triplet extraction module to extract the medical-related information.
Second, we propose a novel triplet encoding module with entity translation by querying a knowledge base, to exploit the rich domain knowledge in the medical field.
Third, we propose a Transformer-based fusion model for spatially aligning the entity descriptions with visual signals at the image patch level, enabling their use for medical diagnosis.
arXiv Detail & Related papers (2023-01-05T18:55:09Z)
- A Marker-based Neural Network System for Extracting Social Determinants of Health [12.6970199179668]
The impact of social determinants of health (SDoH) on patients' healthcare quality and the disparity is well known.
Many SDoH items are not coded in structured forms in electronic health records.
We explore a multi-stage pipeline involving named entity recognition (NER), relation classification (RC), and text classification methods to extract SDoH information from clinical notes automatically.
arXiv Detail & Related papers (2022-12-24T18:40:23Z)
- Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two language models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and the general-purpose MS-COCO.
arXiv Detail & Related papers (2022-09-28T10:27:10Z)
- Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
- MIMO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation Learning [49.57261599776167]
We propose an end-to-end robust Transformer-based solution, Mutual Integration of patient journey and Medical Ontology (MIMO) for healthcare representation learning and predictive analytics.
arXiv Detail & Related papers (2021-07-20T07:04:52Z)
- Rare Disease Identification from Clinical Notes with Ontologies and Weak Supervision [3.6471045233540806]
We show that the Text-to-UMLS process can be greatly improved with weak supervision, without any annotated data from domain experts.
Our analysis shows that the overall pipeline processing discharge summaries can surface cases that are mostly uncaptured in the manually assigned ICD codes of hospital admissions.
arXiv Detail & Related papers (2021-05-05T11:49:09Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits unlabeled data by encouraging prediction consistency for a given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
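
For the SNOBERT entry above, here is a toy sketch of the two-stage entity-linking pattern it describes (candidate selection followed by candidate matching). The miniature SNOMED CT dictionary, the string-similarity scoring, and the acceptance threshold are assumptions for illustration only; SNOBERT itself relies on BERT-based models rather than simple string similarity.

```python
# Toy two-stage entity-linking sketch (candidate selection, then candidate
# matching); an illustration of the pattern only, not SNOBERT's pipeline.
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

# Hypothetical miniature SNOMED CT dictionary: concept ID -> preferred term.
SNOMED_SAMPLE: Dict[str, str] = {
    "38341003": "Hypertensive disorder",
    "44054006": "Diabetes mellitus type 2",
    "22298006": "Myocardial infarction",
}


def select_candidates(span: str, k: int = 3) -> List[Tuple[str, str, float]]:
    """Stage 1: rank concepts by cheap string similarity to the text span."""
    scored = [
        (code, term, SequenceMatcher(None, span.lower(), term.lower()).ratio())
        for code, term in SNOMED_SAMPLE.items()
    ]
    return sorted(scored, key=lambda x: x[2], reverse=True)[:k]


def match_candidate(candidates: List[Tuple[str, str, float]],
                    threshold: float = 0.4) -> Optional[Tuple[str, str, float]]:
    """Stage 2: pick the best candidate, allowing a 'no link' outcome.
    In a BERT-based system this score would come from a trained matcher."""
    best = max(candidates, key=lambda x: x[2])
    return best if best[2] >= threshold else None


print(match_candidate(select_candidates("type 2 diabetes")))
```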