Making the Most Out of the Limited Context Length: Predictive Power
Varies with Clinical Note Type and Note Section
- URL: http://arxiv.org/abs/2307.07051v1
- Date: Thu, 13 Jul 2023 20:04:05 GMT
- Title: Making the Most Out of the Limited Context Length: Predictive Power
Varies with Clinical Note Type and Note Section
- Authors: Hongyi Zheng, Yixin Zhu, Lavender Yao Jiang, Kyunghyun Cho, Eric Karl
Oermann
- Abstract summary: We propose a framework to analyze the sections with high predictive power.
Using MIMIC-III, we show that: 1) predictive power distribution is different between nursing notes and discharge notes and 2) combining different types of notes could improve performance when the context length is large.
- Score: 70.37720062263176
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Recent advances in large language models have led to renewed interest in
natural language processing in healthcare using the free text of clinical
notes. One distinguishing characteristic of clinical notes is their long time
span over multiple long documents. The unique structure of clinical notes
creates a new design choice: when the context length for a language model
predictor is limited, which part of clinical notes should we choose as the
input? Existing studies either choose the inputs with domain knowledge or
simply truncate them. We propose a framework to analyze the sections with high
predictive power. Using MIMIC-III, we show that: 1) predictive power
distribution is different between nursing notes and discharge notes and 2)
combining different types of notes could improve performance when the context
length is large. Our findings suggest that a carefully selected sampling
function could enable more efficient information extraction from clinical
notes.
Related papers
- Improving Clinical Note Generation from Complex Doctor-Patient Conversation [20.2157016701399]
We present three key contributions to the field of clinical note generation using large language models (LLMs)
First, we introduce CliniKnote, a dataset consisting of 1,200 complex doctor-patient conversations paired with their full clinical notes.
Second, we propose K-SOAP, which enhances traditional SOAPcitepodder20soap (Subjective, Objective, Assessment, and Plan) notes by adding a keyword section at the top, allowing for quick identification of essential information.
Third, we develop an automatic pipeline to generate K-SOAP notes from doctor-patient conversations and benchmark various modern LLMs using various
arXiv Detail & Related papers (2024-08-26T18:39:31Z) - Preserving the knowledge of long clinical texts using aggregated
ensembles of large language models [0.0]
Clinical texts contain rich and valuable information that can be used for various clinical outcome prediction tasks.
Applying large language models, such as BERT-based models, to clinical texts poses two major challenges.
This paper proposes a novel method to preserve the knowledge of long clinical texts using aggregated ensembles of large language models.
arXiv Detail & Related papers (2023-11-02T19:50:02Z) - Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z) - A Meta-Evaluation of Faithfulness Metrics for Long-Form Hospital-Course
Summarization [2.8575516056239576]
Long-form clinical summarization of hospital admissions has real-world significance because of its potential to help both clinicians and patients.
We benchmark faithfulness metrics against fine-grained human annotations for model-generated summaries of a patient's Brief Hospital Course.
arXiv Detail & Related papers (2023-03-07T14:57:06Z) - A Comparative Study of Pretrained Language Models for Long Clinical Text [4.196346055173027]
We introduce two domain enriched language models, Clinical-Longformer and Clinical-BigBird, which are pre-trained on a large-scale clinical corpus.
We evaluate both language models using 10 baseline tasks including named entity recognition, question answering, natural language inference, and document classification tasks.
arXiv Detail & Related papers (2023-01-27T16:50:29Z) - Cross-Lingual Knowledge Transfer for Clinical Phenotyping [55.92262310716537]
We investigate cross-lingual knowledge transfer strategies to execute this task for clinics that do not use the English language.
We evaluate these strategies for a Greek and a Spanish clinic leveraging clinical notes from different clinical domains.
Our results show that using multilingual data overall improves clinical phenotyping models and can compensate for data sparseness.
arXiv Detail & Related papers (2022-08-03T08:33:21Z) - Assessing mortality prediction through different representation models
based on concepts extracted from clinical notes [2.707154152696381]
Learning of embedding is a method for converting notes into a format that makes them comparable.
Transformer-based representation models have recently made a great leap forward.
We performed experiments to measure the usefulness of the learned embedding vectors in the task of hospital mortality prediction.
arXiv Detail & Related papers (2022-07-22T04:34:33Z) - Towards more patient friendly clinical notes through language models and
ontologies [57.51898902864543]
We present a novel approach to automated medical text based on word simplification and language modelling.
We use a new dataset pairs of publicly available medical sentences and a version of them simplified by clinicians.
Our method based on a language model trained on medical forum data generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z) - Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z) - An Interpretable End-to-end Fine-tuning Approach for Long Clinical Text [72.62848911347466]
Unstructured clinical text in EHRs contains crucial information for applications including decision support, trial matching, and retrospective research.
Recent work has applied BERT-based models to clinical information extraction and text classification, given these models' state-of-the-art performance in other NLP domains.
In this work, we propose a novel fine-tuning approach called SnipBERT. Instead of using entire notes, SnipBERT identifies crucial snippets and feeds them into a truncated BERT-based model in a hierarchical manner.
arXiv Detail & Related papers (2020-11-12T17:14:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.