Language Model Training Paradigms for Clinical Feature Embeddings
- URL: http://arxiv.org/abs/2311.00768v2
- Date: Tue, 6 Feb 2024 16:33:48 GMT
- Title: Language Model Training Paradigms for Clinical Feature Embeddings
- Authors: Yurong Hu, Manuel Burger, Gunnar Rätsch, Rita Kuznetsova
- Abstract summary: We use self-supervised training paradigms for language models to learn high-quality clinical feature embeddings.
We visualize the learnt embeddings via unsupervised dimension reduction techniques and observe a high degree of consistency with prior clinical knowledge.
- Score: 1.4513150969598638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In research areas with scarce data, representation learning plays a
significant role. This work aims to enhance representation learning for
clinical time series by deriving universal embeddings for clinical features,
such as heart rate and blood pressure. We use self-supervised training
paradigms for language models to learn high-quality clinical feature
embeddings, achieving a finer granularity than existing time-step and
patient-level representation learning. We visualize the learnt embeddings via
unsupervised dimension reduction techniques and observe a high degree of
consistency with prior clinical knowledge. We also evaluate the model
performance on the MIMIC-III benchmark and demonstrate the effectiveness of
using clinical feature embeddings. We publish our code online for replication.
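The abstract describes the approach only at a high level, so the following is a minimal, hypothetical sketch rather than the authors' published implementation: clinical feature identifiers are treated as tokens, a small Transformer is trained with a BERT-style masked-prediction objective, and the resulting embedding table is projected to two dimensions with an unsupervised reduction (PCA here) for inspection. The toy vocabulary, model sizes, and random training data below are all assumptions.

```python
# Minimal sketch (not the authors' code): learn embeddings for clinical feature
# identifiers via masked prediction, then project them to 2-D for inspection.
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

VOCAB = ["heart_rate", "blood_pressure_sys", "blood_pressure_dia",
         "spo2", "respiratory_rate", "temperature", "lactate", "creatinine"]
PAD, MASK = len(VOCAB), len(VOCAB) + 1          # special token ids
VOCAB_SIZE = len(VOCAB) + 2

class FeatureEncoder(nn.Module):
    """Embedding table + small Transformer encoder over feature-id sequences."""
    def __init__(self, dim=64, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, dim, padding_idx=PAD)
        enc_layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, layers)
        self.head = nn.Linear(dim, VOCAB_SIZE)   # predicts the masked feature id

    def forward(self, ids):
        return self.head(self.encoder(self.embed(ids)))

def mask_tokens(ids, p=0.15):
    """BERT-style masking: replace ~p of tokens with MASK, predict originals."""
    labels = ids.clone()
    masked = torch.rand(ids.shape) < p
    labels[~masked] = -100                      # ignored by cross-entropy
    ids = ids.clone()
    ids[masked] = MASK
    return ids, labels

model = FeatureEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

# Toy "measurement sequences": in practice these would come from ICU time series.
batch = torch.randint(0, len(VOCAB), (32, 16))
for _ in range(100):
    inputs, labels = mask_tokens(batch)
    logits = model(inputs)
    loss = loss_fn(logits.view(-1, VOCAB_SIZE), labels.view(-1))
    opt.zero_grad(); loss.backward(); opt.step()

# Unsupervised dimension reduction of the learnt feature embeddings (PCA here).
coords = PCA(n_components=2).fit_transform(
    model.embed.weight[:len(VOCAB)].detach().numpy())
for name, (x, y) in zip(VOCAB, coords):
    print(f"{name:22s} {x:+.2f} {y:+.2f}")
```

In the paper, the learnt embeddings are additionally evaluated on the MIMIC-III benchmark; the projection step above only mirrors the qualitative visualisation described in the abstract.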
Related papers
- Named Clinical Entity Recognition Benchmark [2.9332007863461893]
This report introduces a Named Clinical Entity Recognition Benchmark.
It addresses the crucial natural language processing (NLP) task of extracting structured information from clinical narratives.
The leaderboard provides a standardized platform for assessing diverse language models.
arXiv Detail & Related papers (2024-10-07T14:00:18Z)
- Harmonising the Clinical Melody: Tuning Large Language Models for Hospital Course Summarisation in Clinical Coding [5.279406017862076]
The challenge of summarising a hospital course remains an open area for further research and development.
We adapted three pre-trained LLMs (Llama 3, BioMistral, and Mistral Instruct v0.1) for the hospital course summarisation task.
The fine-tuned models were evaluated using BERTScore and ROUGE metrics to assess the effectiveness of clinical-domain fine-tuning; a minimal sketch of this kind of metric computation appears after the related-papers list.
arXiv Detail & Related papers (2024-09-23T00:35:23Z)
- How Deep is your Guess? A Fresh Perspective on Deep Learning for Medical Time-Series Imputation [6.547981908229007]
We introduce a novel classification framework for time-series imputation using deep learning.
By identifying conceptual gaps in the literature and existing reviews, we devise a taxonomy grounded on the inductive bias of neural imputation frameworks.
arXiv Detail & Related papers (2024-07-11T12:33:28Z)
- Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
arXiv Detail & Related papers (2024-05-29T23:19:28Z)
- Aiming for Relevance [12.924312063047816]
We introduce novel vital sign prediction performance metrics that align with clinical contexts.
These metrics are derived from empirical utility curves obtained in a previous study through interviews with ICU clinicians.
We employ these metrics as loss functions for neural networks, resulting in models that excel in predicting clinically significant events.
arXiv Detail & Related papers (2024-03-27T15:11:07Z)
- CoRelation: Boosting Automatic ICD Coding Through Contextualized Code Relation Learning [56.782963838838036]
We propose a contextualized and flexible framework to enhance the learning of ICD code representations.
Our approach employs a dependent learning paradigm that considers the context of clinical notes in modeling all possible code relations.
arXiv Detail & Related papers (2024-02-24T03:25:28Z)
- On the Importance of Step-wise Embeddings for Heterogeneous Clinical Time-Series [1.3285222309805063]
Recent advances in deep learning for sequence modeling have not fully transferred to tasks handling time-series from electronic health records.
In particular, for problems related to the Intensive Care Unit (ICU), the state of the art remains tree-based methods that tackle sequence classification in a tabular manner.
arXiv Detail & Related papers (2023-11-15T12:18:15Z)
- TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment.
In recent years, machine learning models have been proposed for speeding up patient recruitment by automatically matching patients with clinical trials.
We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient-trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We built a hybrid system that merges the results of a deep learning model with manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- Cross-Lingual Knowledge Transfer for Clinical Phenotyping [55.92262310716537]
We investigate cross-lingual knowledge transfer strategies to execute this task for clinics that do not use the English language.
We evaluate these strategies for a Greek and a Spanish clinic leveraging clinical notes from different clinical domains.
Our results show that using multilingual data overall improves clinical phenotyping models and can compensate for data sparseness.
arXiv Detail & Related papers (2022-08-03T08:33:21Z)
- LifeLonger: A Benchmark for Continual Disease Classification [59.13735398630546]
We introduce LifeLonger, a benchmark for continual disease classification on the MedMNIST collection.
Task- and class-incremental learning of diseases address the issue of classifying new samples without re-training the models from scratch.
Cross-domain incremental learning addresses the issue of dealing with datasets originating from different institutions while retaining the previously obtained knowledge.
arXiv Detail & Related papers (2022-04-12T12:25:05Z)
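As a side note on the "Harmonising the Clinical Melody" entry above, which evaluates fine-tuned summarisers with BERTScore and ROUGE: the snippet below is a small, self-contained illustration of that kind of evaluation, not the paper's own pipeline. It uses the `rouge_score` and `bert_score` packages on invented example texts.

```python
# Illustrative only: score a model-generated summary against a clinician-written
# reference with ROUGE and BERTScore, as in typical summarisation evaluations.
# Requires: pip install rouge-score bert-score
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = ("Patient admitted with community-acquired pneumonia, "
             "treated with IV antibiotics, weaned to room air, discharged home.")
generated = ("Admitted for pneumonia; received intravenous antibiotics "
             "and was discharged in stable condition on room air.")

# ROUGE-1 and ROUGE-L F-measures (n-gram / longest-common-subsequence overlap).
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, generated)
print({k: round(v.fmeasure, 3) for k, v in rouge.items()})

# BERTScore compares contextual embeddings rather than surface n-grams
# (downloads a pretrained model on first use).
P, R, F1 = bert_score([generated], [reference], lang="en", verbose=False)
print(f"BERTScore F1: {F1.item():.3f}")
```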
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.