Enriching Unsupervised User Embedding via Medical Concepts
- URL: http://arxiv.org/abs/2203.10627v1
- Date: Sun, 20 Mar 2022 18:54:05 GMT
- Title: Enriching Unsupervised User Embedding via Medical Concepts
- Authors: Xiaolei Huang, Franck Dernoncourt, Mark Dredze
- Abstract summary: Unsupervised user embedding aims to encode patients into fixed-length vectors without human supervision.
Medical concepts extracted from the clinical notes contain rich connections between patients and their clinical categories.
We propose a concept-aware unsupervised user embedding that jointly leverages text documents and medical concepts from two clinical corpora.
- Score: 51.17532619610099
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Clinical notes in Electronic Health Records (EHR) present rich documented
information about patients that can be used to infer phenotypes for disease diagnosis and to study
patient characteristics for cohort selection. Unsupervised user embedding aims
to encode patients into fixed-length vectors without human supervision.
Medical concepts extracted from the clinical notes contain rich connections
between patients and their clinical categories. However, existing unsupervised
approaches to user embedding from clinical notes do not explicitly incorporate
medical concepts. In this study, we propose a concept-aware unsupervised user
embedding that jointly leverages text documents and medical concepts from two
clinical corpora, MIMIC-III and Diabetes. We evaluate user embeddings on both
extrinsic and intrinsic tasks, including phenotype classification, in-hospital
mortality prediction, patient retrieval, and patient relatedness. Experiments
on the two clinical corpora show that our approach outperforms unsupervised baselines,
and incorporating medical concepts can significantly improve the baseline
performance.
Related papers
- Next Visit Diagnosis Prediction via Medical Code-Centric Multimodal Contrastive EHR Modelling with Hierarchical Regularisation [0.0]
We propose NECHO, a novel medical code-centric multimodal contrastive EHR learning framework with hierarchical regularisation.
First, we integrate multifaceted information encompassing medical codes, demographics, and clinical notes using a tailored network design.
We also regularise modality-specific encoders using parent-level information in the medical ontology to learn the hierarchical structure of EHR data.
arXiv Detail & Related papers (2024-01-22T01:58:32Z)
- MD-Manifold: A Medical-Distance-Based Representation Learning Approach for Medical Concept and Patient Representation [6.795388490479779]
Representing medical concepts for healthcare analytical tasks requires incorporating medical domain knowledge and prior data information.
Our proposed framework, MD-Manifold, introduces a novel approach to medical concept and patient representation.
It includes a new data augmentation approach, concept distance metric, and patient-patient network to incorporate crucial medical domain knowledge and prior data information.
arXiv Detail & Related papers (2023-04-30T18:58:32Z)
- Informing clinical assessment by contextualizing post-hoc explanations of risk prediction models in type-2 diabetes [50.8044927215346]
We consider a comorbidity risk prediction scenario and focus on contexts regarding the patient's clinical state.
We employ several state-of-the-art LLMs to present contexts around risk prediction model inferences and evaluate their acceptability.
Our paper is one of the first end-to-end analyses identifying the feasibility and benefits of contextual explanations in a real-world clinical use case.
arXiv Detail & Related papers (2023-02-11T18:07:11Z)
- sEHR-CE: Language modelling of structured EHR data for efficient and generalizable patient cohort expansion [0.0]
sEHR-CE is a novel framework based on transformers to enable integrated phenotyping and analyses of heterogeneous clinical datasets.
We validate our approach using primary and secondary care data from the UK Biobank, a large-scale research study.
arXiv Detail & Related papers (2022-11-30T16:00:43Z)
- This Patient Looks Like That Patient: Prototypical Networks for Interpretable Diagnosis Prediction from Clinical Text [56.32427751440426]
In clinical practice, such models must not only be accurate but also provide doctors with interpretable and helpful results.
We introduce ProtoPatient, a novel method based on prototypical networks and label-wise attention.
We evaluate the model on two publicly available clinical datasets and show that it outperforms existing baselines.
arXiv Detail & Related papers (2022-10-16T10:12:07Z)
- Modelling Patient Trajectories Using Multimodal Information [0.0]
We propose a solution to model patient trajectories that combines different types of information and considers the temporal aspect of clinical data.
The developed solution was evaluated on two different clinical outcomes, unexpected patient readmission and disease progression.
arXiv Detail & Related papers (2022-09-09T10:20:54Z)
- Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z)
- Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)
- BiteNet: Bidirectional Temporal Encoder Network to Predict Medical Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey.
An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys.
We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
arXiv Detail & Related papers (2020-09-24T00:42:36Z)
- A Corpus for Detecting High-Context Medical Conditions in Intensive Care Patient Notes Focusing on Frequently Readmitted Patients [28.668217175230822]
This dataset contains 1102 Discharge Summaries and 1000 Nursing Progress Notes.
Annotated phenotypes include treatment non-adherence, chronic pain, advanced/metastatic cancer, as well as 10 other phenotypes.
This dataset can be utilized for academic and industrial research in medicine and computer science, particularly within the field of medical natural language processing.
arXiv Detail & Related papers (2020-03-06T05:56:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.