Exploring Optimal Granularity for Extractive Summarization of
Unstructured Health Records: Analysis of the Largest Multi-Institutional
Archive of Health Records in Japan
- URL: http://arxiv.org/abs/2209.10041v1
- Date: Tue, 20 Sep 2022 23:26:02 GMT
- Authors: Kenichiro Ando, Takashi Okumura, Mamoru Komachi, Hiromasa Horiguchi,
Yuji Matsumoto
- Abstract summary: "Discharge summaries" are one promising application of summarization.
It remains unclear how the summaries should be generated from the unstructured source.
This study aimed to identify the optimal granularity for summarization.
- Score: 25.195233641408233
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated summarization of clinical texts can reduce the burden on medical
professionals. "Discharge summaries" are one promising application of
summarization, because they can be generated from daily inpatient records. Our
preliminary experiment suggests that 20-31% of the descriptions in discharge
summaries overlap with the content of the inpatient records. However, it
remains unclear how the summaries should be generated from the unstructured
source. To decompose the physician's summarization process, this study aimed to
identify the optimal granularity for summarization. We first defined three types
of summarization units with different granularities and compared their
performance in discharge summary generation: whole sentences, clinical
segments, and clauses. We defined clinical segments in this study to express
the smallest medically meaningful concepts. Obtaining the clinical segments
required automatically splitting the texts in the first stage of the pipeline.
Accordingly, we compared rule-based methods with a machine learning method; the
latter outperformed the former with an F1 score of 0.846 on the splitting task.
Next, we experimentally measured the accuracy of extractive summarization
using the three types of units, based on the ROUGE-1 metric, on a
multi-institutional national archive of health records in Japan. The measured
ROUGE-1 scores of extractive summarization using whole sentences, clinical
segments, and clauses were 31.91, 36.15, and 25.18, respectively. Clinical
segments thus yielded higher accuracy than sentences and clauses, indicating
that summarization of inpatient records demands finer granularity than
sentence-oriented processing. Although we used only Japanese health records,
the result can be interpreted as follows: physicians extract "concepts
of medical significance" from patient records and recombine them ...
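The evaluation described in the abstract (extracting units at some granularity and scoring the result with ROUGE-1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the whitespace tokenizer and the toy English records are placeholders (the paper's Japanese records would require morphological tokenization), and `greedy_extract` is one simple way to build an extractive summary from units of any granularity.

```python
from collections import Counter

def rouge1_f(candidate_tokens, reference_tokens):
    """ROUGE-1 F1: unigram overlap between a candidate and a reference."""
    cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def greedy_extract(units, reference_tokens, k=3):
    """Greedily pick up to k units (sentences, clinical segments, or
    clauses) that maximize ROUGE-1 of the growing summary against the
    reference discharge summary."""
    selected, summary_tokens = [], []
    for _ in range(k):
        best, best_score = None, -1.0
        for unit in units:
            if unit in selected:
                continue
            score = rouge1_f(summary_tokens + unit.split(), reference_tokens)
            if score > best_score:
                best, best_score = unit, score
        if best is None:
            break
        selected.append(best)
        summary_tokens += best.split()
    return selected
```

Varying only what counts as a "unit" in `units` (whole sentences vs. finer segments) reproduces the kind of granularity comparison the study performs.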
Related papers
- Towards Efficient Patient Recruitment for Clinical Trials: Application of a Prompt-Based Learning Model [0.7373617024876725]
Clinical trials are essential for advancing pharmaceutical interventions, but they face a bottleneck in selecting eligible participants.
The complex nature of unstructured medical texts presents challenges in efficiently identifying participants.
In this study, we aimed to evaluate the performance of a prompt-based large language model for the cohort selection task.
arXiv Detail & Related papers (2024-04-24T20:42:28Z)
- Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries [62.32403630651586]
Large language models (LLMs) have shown the potential to generate accurate clinical text summaries, but still struggle with issues regarding grounding and evaluation.
Here, we explore a general mitigation framework using Attribute Structuring (AS), which structures the summary evaluation process.
AS consistently improves the correspondence between human annotations and automated metrics in clinical text summarization.
arXiv Detail & Related papers (2024-03-01T21:59:03Z)
- FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence [46.71469172542448]
This paper presents FactPICO, a factuality benchmark for plain language summarization of medical texts.
It consists of 345 plain language summaries of abstracts generated from three randomized controlled trials (RCTs).
We assess the factuality of critical elements of the RCTs in those summaries, as well as the reported findings concerning these elements.
arXiv Detail & Related papers (2024-02-18T04:45:01Z)
- Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section [70.37720062263176]
We propose a framework to analyze the sections with high predictive power.
Using MIMIC-III, we show that: 1) predictive power distribution is different between nursing notes and discharge notes and 2) combining different types of notes could improve performance when the context length is large.
arXiv Detail & Related papers (2023-07-13T20:04:05Z)
- Detecting automatically the layout of clinical documents to enhance the performances of downstream natural language processing [53.797797404164946]
We designed an algorithm to process clinical PDF documents and extract only clinically relevant text.
The algorithm consists of several steps: initial text extraction from the PDF, followed by classification into categories such as body text, left notes, and footers.
Medical performance was evaluated by examining the extraction of medical concepts of interest from the text in their respective sections.
arXiv Detail & Related papers (2023-05-23T08:38:33Z)
- A Meta-Evaluation of Faithfulness Metrics for Long-Form Hospital-Course Summarization [2.8575516056239576]
Long-form clinical summarization of hospital admissions has real-world significance because of its potential to help both clinicians and patients.
We benchmark faithfulness metrics against fine-grained human annotations for model-generated summaries of a patient's Brief Hospital Course.
arXiv Detail & Related papers (2023-03-07T14:57:06Z)
- NapSS: Paragraph-level Medical Text Simplification via Narrative Prompting and Sentence-matching Summarization [46.772517928718216]
We propose a summarize-then-simplify two-stage strategy, which we call NapSS.
NapSS identifies the relevant content to simplify while ensuring that the original narrative flow is preserved.
Our model performs significantly better than the seq2seq baseline on an English medical corpus.
arXiv Detail & Related papers (2023-02-11T02:20:25Z)
- A Unified Framework of Medical Information Annotation and Extraction for Chinese Clinical Text [1.4841452489515765]
Current state-of-the-art (SOTA) NLP models are highly integrated with deep learning techniques.
This study presents an engineering framework of medical entity recognition, relation extraction and attribute extraction.
arXiv Detail & Related papers (2022-03-08T03:19:16Z)
- Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated simplification of medical text based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
- What's in a Summary? Laying the Groundwork for Advances in Hospital-Course Summarization [2.432409923443071]
Given the documentation authored throughout a patient's hospitalization, generate a paragraph that tells the story of the patient admission.
We construct an English, text-to-text dataset of 109,000 hospitalizations (2M source notes) and their corresponding summary proxy: the clinician-authored "Brief Hospital Course".
Our analysis identifies multiple implications for modeling this complex, multi-document summarization task.
arXiv Detail & Related papers (2021-04-12T19:31:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.