A LongFormer-Based Framework for Accurate and Efficient Medical Text Summarization
- URL: http://arxiv.org/abs/2503.06888v1
- Date: Mon, 10 Mar 2025 03:33:45 GMT
- Title: A LongFormer-Based Framework for Accurate and Efficient Medical Text Summarization
- Authors: Dan Sun, Jacky He, Hanlu Zhang, Zhen Qi, Hongye Zheng, Xiaokai Wang
- Abstract summary: This paper proposes a medical text summarization method based on LongFormer. Traditional summarization methods are often limited by short-term memory. LongFormer effectively captures long-range dependencies in the text, retaining more key information.
- Score: 3.4635278365524673
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This paper proposes a medical text summarization method based on LongFormer, aimed at addressing the challenges faced by existing models when processing long medical texts. Traditional summarization methods are often limited by short-term memory, leading to information loss or reduced summary quality in long texts. By introducing long-range self-attention, LongFormer effectively captures long-range dependencies in the text, retaining more key information and improving the accuracy and information retention of summaries. Experimental results show that the LongFormer-based model outperforms traditional models such as RNNs, T5, and BERT on automatic evaluation metrics like ROUGE. It also receives high scores in expert evaluations, particularly excelling in information retention and grammatical accuracy. However, there is still room for improvement in conciseness and readability: some experts noted that the generated summaries contain redundant information, which hurts conciseness. Future research will focus on further optimizing the model structure to improve conciseness and fluency, achieving more efficient medical text summarization. As medical data continues to grow, automated summarization technology will play an increasingly important role in fields such as medical research, clinical decision support, and knowledge management.
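The paper itself does not ship code, but the general recipe is easy to illustrate. The sketch below uses the publicly available Longformer Encoder-Decoder (LED) checkpoint `allenai/led-base-16384` from Hugging Face; the checkpoint, input lengths, and decoding settings are illustrative assumptions, not the authors' configuration.

```python
# Hedged sketch: long-document summarization with a Longformer
# encoder-decoder (LED); checkpoint and settings are assumptions,
# not the paper's configuration.
import torch
from transformers import LEDForConditionalGeneration, LEDTokenizer

tokenizer = LEDTokenizer.from_pretrained("allenai/led-base-16384")
model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")

document = "..."  # a long medical text, up to 16k tokens

inputs = tokenizer(document, return_tensors="pt",
                   truncation=True, max_length=16384)

# LED uses windowed (local) self-attention; marking the first token as
# global lets every position exchange information through it.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    num_beams=4,
    max_length=256,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

The point of this architecture is that attention cost grows linearly with the local window size rather than quadratically with document length, which is what lets the model keep distant clinical details in scope.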
Related papers
- Temporal Relation Extraction in Clinical Texts: A Span-based Graph Transformer Approach [3.5309406714258764]
We address the task of extracting clinical events and their temporal relations using the well-studied I2B2 2012 Temporal Relations Challenge corpus.
We introduce GRAPHTREX, a novel method integrating span-based entity-relation extraction, clinical large pre-trained language models (LPLMs), and Heterogeneous Graph Transformers (HGT).
This work not only advances temporal information extraction but also lays the groundwork for improved diagnostic and prognostic models through enhanced temporal reasoning.
arXiv Detail & Related papers (2025-03-23T14:34:49Z)
- RAPID: Efficient Retrieval-Augmented Long Text Generation with Writing Planning and Information Discovery [69.41989381702858]
Existing methods, such as direct generation and multi-agent discussion, often struggle with issues like hallucinations, topic incoherence, and significant latency. We propose RAPID, an efficient retrieval-augmented long text generation framework. Our work provides a robust and efficient solution to the challenges of automated long-text generation.
arXiv Detail & Related papers (2025-03-02T06:11:29Z)
- HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation [89.3260120072177]
We propose a novel Historical-Constrained Large Language Models (HC-LLM) framework for radiology report generation. Our approach extracts both time-shared and time-specific features from longitudinal chest X-rays and diagnostic reports to capture disease progression. Notably, our approach performs well even without historical data during testing and can be easily adapted to other multimodal large models.
arXiv Detail & Related papers (2024-12-15T06:04:16Z)
- LongBoX: Evaluating Transformers on Long-Sequence Clinical Tasks [44.89857441408805]
LongBoX is a collection of seven medical datasets in text-to-text format.
Preliminary experiments reveal that both medical LLMs and strong general domain LLMs struggle on this benchmark.
We evaluate two techniques designed for long-sequence handling: (i) local-global attention and (ii) Fusion-in-Decoder (FiD); a minimal sketch of the FiD idea follows.
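FiD is straightforward to sketch: each chunk of the long input is encoded independently, and the decoder cross-attends over the concatenated encoder states. The `t5-small` checkpoint, the chunking, and the decoding settings below are illustrative assumptions, not the benchmark's setup.

```python
# Hedged FiD-style sketch: encode chunks separately, fuse encoder states,
# decode once over all of them. Model and chunking are illustrative.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tok = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

chunks = [
    "summarize: first part of a long clinical note ...",
    "summarize: second part of the same note ...",
]

# Per-chunk encoding keeps attention quadratic in the chunk length,
# not in the full document length.
states = [
    model.encoder(**tok(c, return_tensors="pt", truncation=True,
                        max_length=512)).last_hidden_state
    for c in chunks
]
fused = torch.cat(states, dim=1)  # (1, total_chunk_tokens, hidden)

# The decoder now cross-attends over every chunk at once.
out = model.generate(
    encoder_outputs=BaseModelOutput(last_hidden_state=fused),
    attention_mask=torch.ones(fused.shape[:2], dtype=torch.long),
    max_length=64,
)
print(tok.decode(out[0], skip_special_tokens=True))
```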
arXiv Detail & Related papers (2023-11-16T04:57:49Z)
- On Preserving the Knowledge of Long Clinical Texts [0.0]
A bottleneck in using transformer encoders for processing clinical texts comes from the input length limit of these models. This paper proposes a novel method to preserve the knowledge of long clinical texts in the models using aggregated ensembles of transformer encoders.
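The paper's exact aggregation scheme is not reproduced here; the sketch below shows the underlying pattern under simple assumptions: slide a window over the note, encode each chunk with a vanilla BERT, and average the [CLS] vectors into one document representation.

```python
# Hedged sketch of chunk-encode-aggregate for long clinical text;
# the windowing and mean-pooling are illustrative choices, not the
# paper's aggregation method.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")

def encode_long_note(text: str, window: int = 510, stride: int = 384) -> torch.Tensor:
    ids = tok(text, add_special_tokens=False)["input_ids"]
    cls_vecs = []
    for start in range(0, len(ids), stride):
        # Re-wrap each window with [CLS] ... [SEP] so it is a valid input.
        chunk = [tok.cls_token_id] + ids[start:start + window] + [tok.sep_token_id]
        input_ids = torch.tensor([chunk])
        with torch.no_grad():
            out = enc(input_ids=input_ids,
                      attention_mask=torch.ones_like(input_ids))
        cls_vecs.append(out.last_hidden_state[:, 0])  # per-chunk [CLS] vector
        if start + window >= len(ids):
            break
    # Aggregate the chunk views into one fixed-size document vector.
    return torch.cat(cls_vecs).mean(dim=0)
```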
arXiv Detail & Related papers (2023-11-02T19:50:02Z)
- Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports.
We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM.
We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
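TEMED-LLM's prompts and pipeline are not given in this listing, so the sketch below only illustrates the general pattern: prompt an LLM to emit a fixed JSON schema per report, then parse each response into a table row. The schema and the `llm_complete` callable are hypothetical placeholders.

```python
# Hedged sketch of LLM-based tabular extraction from medical reports.
# The JSON schema and the `llm_complete` backend are hypothetical.
import json

PROMPT = """Extract these fields from the medical report and answer with JSON only:
{{"age": int or null, "sex": "M" or "F" or null, "primary_diagnosis": str or null}}

Report:
{report}

JSON:"""

def extract_row(report: str, llm_complete) -> dict:
    raw = llm_complete(PROMPT.format(report=report))
    try:
        return json.loads(raw)  # one structured row per report
    except json.JSONDecodeError:
        # Malformed generations become empty rows rather than crashes.
        return {"age": None, "sex": None, "primary_diagnosis": None}
```

Rows produced this way can then feed a transparent model such as logistic regression or a decision tree, which is where the interpretability claim comes from.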
arXiv Detail & Related papers (2023-06-08T09:12:28Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
However, they still struggle with accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
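The two-pass shape of self-verification can be sketched under stated assumptions: an extraction prompt followed by an evidence-checking prompt over each candidate. The prompts and the `llm` callable are illustrative, not the paper's.

```python
# Hedged sketch of self-verification: extract candidates, then ask the
# same model to ground each one in the source text. Prompts are illustrative.
EXTRACT = "List every medication mentioned in this clinical note, one per line:\n{note}"
VERIFY = ("Quote the exact sentence from the note that mentions '{item}'. "
          "If no such sentence exists, reply NO EVIDENCE.\n\nNote:\n{note}")

def extract_with_verification(note: str, llm) -> list:
    lines = llm(EXTRACT.format(note=note)).splitlines()
    candidates = [l.strip() for l in lines if l.strip()]
    verified = []
    for item in candidates:
        evidence = llm(VERIFY.format(item=item, note=note))
        if "NO EVIDENCE" not in evidence:  # keep only grounded extractions
            verified.append((item, evidence))  # provenance travels with the item
    return verified
```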
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
- Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
- Efficient Attentions for Long Document Summarization [25.234852272297598]
Hepos is a novel, efficient encoder-decoder attention mechanism with head-wise positional strides.
We are able to process ten times more tokens than existing models that use full attention.
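The "head-wise positional strides" can be pictured with a small sketch: in cross-attention, head h looks only at every H-th encoder position starting at offset h, so the heads together still cover the whole source while each pays only 1/H of the key/value cost. This is a simplification, not the paper's implementation.

```python
# Hedged sketch of head-wise positional strides in cross-attention:
# head h attends to source positions h, h+H, h+2H, ... (a simplification
# of Hepos, not its code).
import torch
import torch.nn.functional as F

def strided_cross_attention(q, k, v):
    # q: (heads, tgt_len, d); k, v: (heads, src_len, d); assumes src_len >= heads
    heads, _, d = q.shape
    outs = []
    for h in range(heads):
        k_h, v_h = k[h, h::heads], v[h, h::heads]  # this head's stride of sources
        scores = q[h] @ k_h.transpose(0, 1) / d ** 0.5
        outs.append(F.softmax(scores, dim=-1) @ v_h)
    return torch.stack(outs)  # (heads, tgt_len, d)
```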
arXiv Detail & Related papers (2021-04-05T18:45:13Z)
- An Interpretable End-to-end Fine-tuning Approach for Long Clinical Text [72.62848911347466]
Unstructured clinical text in EHRs contains crucial information for applications including decision support, trial matching, and retrospective research.
Recent work has applied BERT-based models to clinical information extraction and text classification, given these models' state-of-the-art performance in other NLP domains.
In this work, we propose a novel fine-tuning approach called SnipBERT. Instead of using entire notes, SnipBERT identifies crucial snippets and feeds them into a truncated BERT-based model in a hierarchical manner.
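SnipBERT's snippet selector is learned; the sketch below substitutes a crude keyword heuristic just to show the snippet-then-encode shape. The checkpoint, sentence splitting, and pooling are all illustrative assumptions.

```python
# Hedged sketch of snippet-then-encode: pick likely-relevant sentences,
# encode each one, pool into a single representation. The keyword scoring
# stands in for SnipBERT's learned snippet identification.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")

def snippet_vector(note: str, keywords: set, top_k: int = 8) -> torch.Tensor:
    sents = [s.strip() for s in note.split(".") if s.strip()]
    # Crude relevance score: keyword hits per sentence.
    ranked = sorted(sents, key=lambda s: -sum(w in s.lower() for w in keywords))
    vecs = []
    for snippet in ranked[:top_k]:
        inputs = tok(snippet, return_tensors="pt", truncation=True, max_length=64)
        with torch.no_grad():
            vecs.append(enc(**inputs).last_hidden_state[:, 0])  # [CLS] vector
    return torch.cat(vecs).mean(dim=0)  # pooled snippet representation
```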
arXiv Detail & Related papers (2020-11-12T17:14:32Z)