LongBoX: Evaluating Transformers on Long-Sequence Clinical Tasks
- URL: http://arxiv.org/abs/2311.09564v1
- Date: Thu, 16 Nov 2023 04:57:49 GMT
- Title: LongBoX: Evaluating Transformers on Long-Sequence Clinical Tasks
- Authors: Mihir Parmar, Aakanksha Naik, Himanshu Gupta, Disha Agrawal, Chitta Baral
- Abstract summary: LongBoX is a collection of seven medical datasets in text-to-text format.
Preliminary experiments reveal that both medical LLMs and strong general domain LLMs struggle on this benchmark.
We evaluate two techniques designed for long-sequence handling: (i) local-global attention, and (ii) Fusion-in-Decoder (FiD).
- Score: 44.89857441408805
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Many large language models (LLMs) for medicine have largely been evaluated on
short texts, and their ability to handle longer sequences such as a complete
electronic health record (EHR) has not been systematically explored. Assessing
these models on long sequences is crucial since prior work in the general
domain has demonstrated performance degradation of LLMs on longer texts.
Motivated by this, we introduce LongBoX, a collection of seven medical datasets
in text-to-text format, designed to investigate model performance on long
sequences. Preliminary experiments reveal that both medical LLMs (e.g., BioGPT)
and strong general domain LLMs (e.g., FLAN-T5) struggle on this benchmark. We
further evaluate two techniques designed for long-sequence handling: (i)
local-global attention, and (ii) Fusion-in-Decoder (FiD); a minimal FiD sketch
follows this abstract. These techniques yield mixed results - while scores on
some datasets increase, substantial room for improvement remains. We hope that
LongBoX facilitates the development of more effective long-sequence techniques
for the medical domain. Data and source code are available at
https://github.com/Mihir3009/LongBoX.
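For orientation, here is a minimal Fusion-in-Decoder (FiD) sketch, not the authors' released code: a long clinical document is split into chunks, each (prompt + chunk) pair is encoded independently by a shared T5 encoder, and the decoder cross-attends over the concatenated encoder states. The t5-small backbone, the chunk size, and the fid_generate helper are illustrative assumptions.

```python
# Hedged Fusion-in-Decoder (FiD) sketch with a T5 backbone.
# The backbone name, chunk size, and fid_generate helper are illustrative,
# not the LongBoX implementation.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tok = AutoTokenizer.from_pretrained("t5-small")                # placeholder backbone
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def fid_generate(prompt, chunks, max_new_tokens=64):
    """Encode each (prompt + chunk) separately, then decode over fused states."""
    states, masks = [], []
    for chunk in chunks:
        enc = tok(prompt + " " + chunk, return_tensors="pt",
                  truncation=True, max_length=512)
        with torch.no_grad():
            out = model.encoder(input_ids=enc.input_ids,
                                attention_mask=enc.attention_mask)
        states.append(out.last_hidden_state)
        masks.append(enc.attention_mask)
    # Fusion step: concatenate encoder states along the sequence dimension so
    # the decoder cross-attends over all chunks jointly.
    fused = BaseModelOutput(last_hidden_state=torch.cat(states, dim=1))
    fused_mask = torch.cat(masks, dim=1)
    gen = model.generate(encoder_outputs=fused,
                         attention_mask=fused_mask,
                         max_new_tokens=max_new_tokens)
    return tok.decode(gen[0], skip_special_tokens=True)

# Illustrative call on a synthetic long note split into 300-word chunks.
note = "The patient was admitted with chest pain and treated conservatively. " * 200
words = note.split()
chunks = [" ".join(words[i:i + 300]) for i in range(0, len(words), 300)]
print(fid_generate("summarize the hospital course:", chunks))
```

Because each chunk is encoded separately, the quadratic self-attention cost applies only within a chunk, while the decoder still sees the whole document through cross-attention.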
Related papers
- Beyond Prompting: Time2Lang -- Bridging Time-Series Foundation Models and Large Language Models for Health Sensing [3.2688127177376227]
Large language models (LLMs) show promise for health applications when combined with behavioral sensing data.
Traditional approaches convert sensor data into text prompts, but this process is error-prone, computationally expensive, and reliant on domain expertise.
Here, we present Time2Lang, a framework that directly maps time-series foundation model (TFM) outputs to LLM representations without intermediate text conversion.
arXiv Detail & Related papers (2025-02-11T14:58:54Z)
- HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation [89.3260120072177]
We propose a novel Historical-Constrained Large Language Models (HC-LLM) framework for Radiology report generation.
Our approach extracts both time-shared and time-specific features from longitudinal chest X-rays and diagnostic reports to capture disease progression.
Notably, our approach performs well even without historical data during testing and can be easily adapted to other multimodal large models.
arXiv Detail & Related papers (2024-12-15T06:04:16Z)
- Longitudinal Ensemble Integration for sequential classification with multimodal data [2.4554016712597138]
We developed Longitudinal Ensemble Integration (LEI) for sequential classification.
We evaluated LEI's performance, and compared it against existing approaches, for the early detection of dementia.
LEI's design also enabled the identification of features that were consistently important across time for the effective prediction of dementia-related diagnoses.
arXiv Detail & Related papers (2024-11-08T21:31:48Z)
- Language Models can Self-Lengthen to Generate Long Texts [74.96074422345806]
This paper introduces an innovative iterative training framework called Self-Lengthen.
It leverages only the intrinsic knowledge and skills of Large Language Models without the need for auxiliary data or proprietary models.
Experiments on benchmarks and human evaluations show that Self-Lengthen outperforms existing methods in long-text generation.
arXiv Detail & Related papers (2024-10-31T13:47:10Z)
- LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models [61.12177317970258]
LongSkywork is a long-context Large Language Model capable of processing up to 200,000 tokens.
We develop two novel methods for creating synthetic data.
LongSkywork achieves outstanding performance on a variety of long-context benchmarks.
arXiv Detail & Related papers (2024-06-02T03:34:41Z)
- Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks [76.43527940649939]
We introduce Ada-LEval, a benchmark for evaluating the long-context understanding of large language models (LLMs).
Ada-LEval includes two challenging subsets, TSort and BestAnswer, which enable a more reliable evaluation of LLMs' long context capabilities.
We evaluate 4 state-of-the-art closed-source API models and 6 open-source models with Ada-LEval.
arXiv Detail & Related papers (2024-04-09T17:30:48Z)
- Effective Long-Context Scaling of Foundation Models [90.57254298730923]
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens.
Our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2.
arXiv Detail & Related papers (2023-09-27T21:41:49Z)
- How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling [37.247872987053654]
Large pre-trained language models (LMs) have been widely adopted in biomedical and clinical domains.
This work explores long-range adaptation from such LMs with Longformer, allowing the LMs to capture longer clinical notes context.
We conduct experiments on three n2c2 challenge datasets and a longitudinal clinical dataset from the Hong Kong Hospital Authority electronic health record system.
arXiv Detail & Related papers (2022-10-25T09:21:28Z)
- Longformer: The Long-Document Transformer [40.18988262517733]
Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length.
We introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer.
Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local windowed attention with a task-motivated global attention; a usage sketch follows this entry.
arXiv Detail & Related papers (2020-04-10T17:54:09Z)
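As a minimal illustration of the local-global attention idea above, here is a hedged usage sketch built on the Hugging Face transformers Longformer; the synthetic note and the choice to give only the first token global attention are assumptions for illustration, not the LongBoX evaluation setup.

```python
# Hedged local-global attention sketch using Hugging Face's Longformer.
# The synthetic note and global-attention placement are illustrative assumptions.
import torch
from transformers import LongformerModel, LongformerTokenizerFast

tok = LongformerTokenizerFast.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

# Placeholder long clinical-style note.
text = "Discharge summary: the patient remained hemodynamically stable. " * 200
enc = tok(text, return_tensors="pt", truncation=True, max_length=4096)

# All tokens use sliding-window (local) attention; only the first token gets
# global attention, i.e. it attends to and is attended by every position.
global_attention_mask = torch.zeros_like(enc.input_ids)
global_attention_mask[:, 0] = 1

with torch.no_grad():
    out = model(input_ids=enc.input_ids,
                attention_mask=enc.attention_mask,
                global_attention_mask=global_attention_mask)

print(out.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```

Because the sliding window is fixed, compute grows linearly with sequence length, which is what makes documents of several thousand tokens tractable.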