CLIP: A Dataset for Extracting Action Items for Physicians from Hospital
Discharge Notes
- URL: http://arxiv.org/abs/2106.02524v1
- Date: Fri, 4 Jun 2021 14:49:02 GMT
- Authors: James Mullenbach, Yada Pruksachatkun, Sean Adler, Jennifer Seale,
Jordan Swartz, T. Greg McKelvey, Hui Dai, Yi Yang, David Sontag
- Abstract summary: We create a dataset of clinical action items annotated over MIMIC-III, the largest publicly available dataset of real clinical notes.
This dataset, which we call CLIP, is annotated by physicians and covers 718 documents representing 100K sentences.
We describe the task of extracting the action items from these documents as multi-aspect extractive summarization, with each aspect representing a type of action to be taken.
- Score: 17.107315598110183
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continuity of care is crucial to ensuring positive health outcomes for
patients discharged from an inpatient hospital setting, and improved
information sharing can help. To share information, caregivers write discharge
notes containing action items to share with patients and their future
caregivers, but these action items are easily lost due to the lengthiness of
the documents. In this work, we describe our creation of a dataset of clinical
action items annotated over MIMIC-III, the largest publicly available dataset
of real clinical notes. This dataset, which we call CLIP, is annotated by
physicians and covers 718 documents representing 100K sentences. We describe
the task of extracting the action items from these documents as multi-aspect
extractive summarization, with each aspect representing a type of action to be
taken. We evaluate several machine learning models on this task, and show that
the best models exploit in-domain language model pre-training on 59K
unannotated documents, and incorporate context from neighboring sentences. We
also propose an approach to pre-training data selection that allows us to
explore the trade-off between size and domain-specificity of pre-training
datasets for this task.
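The task framing above, multi-aspect extractive summarization with context from neighboring sentences, can be sketched as multi-label sentence classification: each sentence in a discharge note receives zero or more action-type labels, and the model scores a sentence together with its neighbors. The aspect names, keyword triggers, and window size below are illustrative assumptions, not the dataset's actual label set or the paper's model:

```python
from typing import List, Tuple

# Hypothetical action-type aspects (illustrative, not CLIP's label set).
ASPECTS = ["medication", "follow-up", "lab-test", "imaging"]

def build_examples(sentences: List[str], window: int = 1) -> List[Tuple[str, str]]:
    """Pair each sentence with its neighbors so a classifier sees local context."""
    examples = []
    for i, sent in enumerate(sentences):
        lo = max(0, i - window)
        hi = min(len(sentences), i + window + 1)
        context = " ".join(sentences[lo:hi])
        examples.append((sent, context))
    return examples

def label_sentence(sentence: str) -> List[str]:
    """Toy multi-label rule: keyword triggers stand in for a trained model."""
    triggers = {
        "medication": ["mg", "tablet", "dose"],
        "follow-up": ["follow up", "appointment"],
        "lab-test": ["cbc", "labs", "blood"],
        "imaging": ["x-ray", "mri", "ultrasound"],
    }
    text = sentence.lower()
    return [a for a in ASPECTS if any(t in text for t in triggers[a])]
```

In this framing, extraction reduces to selecting the sentences whose label set is non-empty, grouped by aspect; a real system would replace the keyword rules with a pre-trained in-domain language model, as the paper's best models do.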
Related papers
- NOTE: Notable generation Of patient Text summaries through Efficient
approach based on direct preference optimization [0.0]
"NOTE" stands for "Notable generation Of patient Text summaries through an Efficient approach based on direct preference optimization"
Patient events are sequentially combined and used to generate a discharge summary for each hospitalization.
NOTE can be used to generate summaries not only at discharge but also throughout a patient's journey.
arXiv Detail & Related papers (2024-02-19T06:43:25Z) - Multimodal Pretraining of Medical Time Series and Notes [45.89025874396911]
Deep learning models show promise in extracting meaningful patterns, but they require extensive labeled data.
We propose a novel approach employing self-supervised pretraining, focusing on the alignment of clinical measurements and notes.
In downstream tasks, including in-hospital mortality prediction and phenotyping, our model outperforms baselines in settings where only a fraction of the data is labeled.
arXiv Detail & Related papers (2023-12-11T21:53:40Z) - Interpretable Medical Diagnostics with Structured Data Extraction by
Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports.
We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM.
We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z) - PULSAR: Pre-training with Extracted Healthcare Terms for Summarising
Patients' Problems and Data Augmentation with Black-box Large Language Models [25.363775123262307]
Automatic summarisation of a patient's problems in the form of a problem list can aid stakeholders in understanding a patient's condition, reducing workload and cognitive bias.
BioNLP 2023 Shared Task 1A focuses on generating a list of diagnoses and problems from the provider's progress notes during hospitalisation.
Our approach has two components: one employs large language models (LLMs) for data augmentation; the other is an abstractive summarisation LLM with a novel pre-training objective that generates the patient's problems as a list.
Our approach was ranked second among all submissions to the shared task.
arXiv Detail & Related papers (2023-06-05T10:17:50Z) - Development and validation of a natural language processing algorithm to
pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system that merges the results of a deep learning model with manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z) - Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine [68.7814360102644]
We propose the Re$3$Writer method with retrieval-augmented generation and knowledge-grounded reasoning.
We demonstrate the effectiveness of our method in generating patient discharge instructions.
arXiv Detail & Related papers (2022-10-23T16:34:39Z) - DICE: Data-Efficient Clinical Event Extraction with Generative Models [93.49354508621232]
Event extraction for the clinical domain is an under-explored research area.
We introduce DICE, a robust and data-efficient generative model for clinical event extraction.
Our experiments demonstrate state-of-the-art performance of DICE on clinical and news-domain event extraction.
arXiv Detail & Related papers (2022-08-16T23:12:04Z) - Self-supervised Answer Retrieval on Clinical Notes [68.87777592015402]
We introduce CAPR, a rule-based self-supervision objective for training Transformer language models for domain-specific passage matching.
We apply our objective in four Transformer-based architectures: Contextual Document Vectors, Bi-, Poly- and Cross-encoders.
We report that CAPR outperforms strong baselines in the retrieval of domain-specific passages and effectively generalizes across rule-based and human-labeled passages.
arXiv Detail & Related papers (2021-08-02T10:42:52Z) - Towards Clinical Encounter Summarization: Learning to Compose Discharge
Summaries from Prior Notes [15.689048077818324]
This paper introduces the task of generating discharge summaries for a clinical encounter.
We introduce two new measures for evaluation: faithfulness and hallucination rate.
Results across seven medical sections and five models show that a summarization architecture that supports traceability yields promising results.
arXiv Detail & Related papers (2021-04-27T22:45:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.