LongBoX: Evaluating Transformers on Long-Sequence Clinical Tasks
- URL: http://arxiv.org/abs/2311.09564v1
- Date: Thu, 16 Nov 2023 04:57:49 GMT
- Title: LongBoX: Evaluating Transformers on Long-Sequence Clinical Tasks
- Authors: Mihir Parmar, Aakanksha Naik, Himanshu Gupta, Disha Agrawal, Chitta Baral
- Abstract summary: LongBoX is a collection of seven medical datasets in text-to-text format.
Preliminary experiments reveal that both medical LLMs and strong general domain LLMs struggle on this benchmark.
We evaluate two techniques designed for long-sequence handling: (i) local-global attention, and (ii) Fusion-in-Decoder (FiD).
- Score: 44.89857441408805
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Many large language models (LLMs) for medicine have largely been evaluated on
short texts, and their ability to handle longer sequences such as a complete
electronic health record (EHR) has not been systematically explored. Assessing
these models on long sequences is crucial since prior work in the general
domain has demonstrated performance degradation of LLMs on longer texts.
Motivated by this, we introduce LongBoX, a collection of seven medical datasets
in text-to-text format, designed to investigate model performance on long
sequences. Preliminary experiments reveal that both medical LLMs (e.g., BioGPT)
and strong general domain LLMs (e.g., FLAN-T5) struggle on this benchmark. We
further evaluate two techniques designed for long-sequence handling: (i)
local-global attention, and (ii) Fusion-in-Decoder (FiD); a minimal FiD sketch
follows this abstract. These techniques yield mixed results - while scores on
some datasets increase, substantial room for improvement remains. We hope that
LongBoX facilitates the development of more effective long-sequence techniques
for the medical domain. Data and source code are available at
https://github.com/Mihir3009/LongBoX.
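For orientation, here is a minimal Fusion-in-Decoder (FiD) sketch, not the authors' released code: a long clinical document is split into chunks, each (prompt + chunk) pair is encoded independently by a shared T5 encoder, and the decoder cross-attends over the concatenated encoder states. The t5-small backbone, the chunk size, and the fid_generate helper are illustrative assumptions.

```python
# Hedged Fusion-in-Decoder (FiD) sketch with a T5 backbone.
# The backbone name, chunk size, and fid_generate helper are illustrative,
# not the LongBoX implementation.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tok = AutoTokenizer.from_pretrained("t5-small")                # placeholder backbone
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def fid_generate(prompt, chunks, max_new_tokens=64):
    """Encode each (prompt + chunk) separately, then decode over fused states."""
    states, masks = [], []
    for chunk in chunks:
        enc = tok(prompt + " " + chunk, return_tensors="pt",
                  truncation=True, max_length=512)
        with torch.no_grad():
            out = model.encoder(input_ids=enc.input_ids,
                                attention_mask=enc.attention_mask)
        states.append(out.last_hidden_state)
        masks.append(enc.attention_mask)
    # Fusion step: concatenate encoder states along the sequence dimension so
    # the decoder cross-attends over all chunks jointly.
    fused = BaseModelOutput(last_hidden_state=torch.cat(states, dim=1))
    fused_mask = torch.cat(masks, dim=1)
    gen = model.generate(encoder_outputs=fused,
                         attention_mask=fused_mask,
                         max_new_tokens=max_new_tokens)
    return tok.decode(gen[0], skip_special_tokens=True)

# Illustrative call on a synthetic long note split into 300-word chunks.
note = "The patient was admitted with chest pain and treated conservatively. " * 200
words = note.split()
chunks = [" ".join(words[i:i + 300]) for i in range(0, len(words), 300)]
print(fid_generate("summarize the hospital course:", chunks))
```

Because each chunk is encoded separately, the quadratic self-attention cost applies only within a chunk, while the decoder still sees the whole document through cross-attention.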
Related papers
- Beyond Prompting: Time2Lang -- Bridging Time-Series Foundation Models and Large Language Models for Health Sensing [3.2688127177376227]
Large language models (LLMs) show promise for health applications when combined with behavioral sensing data.
Traditional approaches convert sensor data into text prompts, but this process is error-prone, computationally expensive, and reliant on domain expertise.
Here, we present Time2Lang, a framework that directly maps time-series foundation model (TFM) outputs to LLM representations without intermediate text conversion.
arXiv Detail & Related papers (2025-02-11T14:58:54Z)
- HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation [89.3260120072177]
We propose a novel Historical-Constrained Large Language Models (HC-LLM) framework for Radiology report generation.
Our approach extracts both time-shared and time-specific features from longitudinal chest X-rays and diagnostic reports to capture disease progression.
Notably, our approach performs well even without historical data during testing and can be easily adapted to other multimodal large models.
arXiv Detail & Related papers (2024-12-15T06:04:16Z)
- Longitudinal Ensemble Integration for sequential classification with multimodal data [2.4554016712597138]
We developed Longitudinal Ensemble Integration (LEI) for sequential classification.
We evaluated LEI's performance, and compared it against existing approaches, for the early detection of dementia.
LEI's design also enabled the identification of features that were consistently important across time for the effective prediction of dementia-related diagnoses.
arXiv Detail & Related papers (2024-11-08T21:31:48Z)
- Language Models can Self-Lengthen to Generate Long Texts [74.96074422345806]
This paper introduces an innovative iterative training framework called Self-Lengthen.
It leverages only the intrinsic knowledge and skills of Large Language Models without the need for auxiliary data or proprietary models.
Experiments on benchmarks and human evaluations show that Self-Lengthen outperforms existing methods in long-text generation.
arXiv Detail & Related papers (2024-10-31T13:47:10Z)
- LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models [61.12177317970258]
LongSkywork is a long-context Large Language Model capable of processing up to 200,000 tokens.
We develop two novel methods for creating synthetic data.
LongSkywork achieves outstanding performance on a variety of long-context benchmarks.
arXiv Detail & Related papers (2024-06-02T03:34:41Z)
- Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks [76.43527940649939]
We introduce Ada-LEval, a benchmark for evaluating the long-context understanding of large language models (LLMs).
Ada-LEval includes two challenging subsets, TSort and BestAnswer, which enable a more reliable evaluation of LLMs' long context capabilities.
We evaluate 4 state-of-the-art closed-source API models and 6 open-source models with Ada-LEval.
arXiv Detail & Related papers (2024-04-09T17:30:48Z)
- Effective Long-Context Scaling of Foundation Models [90.57254298730923]
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens.
Our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2.
arXiv Detail & Related papers (2023-09-27T21:41:49Z)
- How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling [37.247872987053654]
Large pre-trained language models (LMs) have been widely adopted in biomedical and clinical domains.
This work explores long-range adaptation from such LMs with Longformer, allowing the LMs to capture longer clinical notes context.
We conduct experiments on three n2c2 challenge datasets and a longitudinal clinical dataset from the Hong Kong Hospital Authority electronic health record system.
arXiv Detail & Related papers (2022-10-25T09:21:28Z)
- Longformer: The Long-Document Transformer [40.18988262517733]
Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length.
We introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer.
Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local windowed attention with a task-motivated global attention; a usage sketch follows this entry.
arXiv Detail & Related papers (2020-04-10T17:54:09Z)
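As a minimal illustration of the local-global attention idea above, here is a hedged usage sketch built on the Hugging Face transformers Longformer; the synthetic note and the choice to give only the first token global attention are assumptions for illustration, not the LongBoX evaluation setup.

```python
# Hedged local-global attention sketch using Hugging Face's Longformer.
# The synthetic note and global-attention placement are illustrative assumptions.
import torch
from transformers import LongformerModel, LongformerTokenizerFast

tok = LongformerTokenizerFast.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

# Placeholder long clinical-style note.
text = "Discharge summary: the patient remained hemodynamically stable. " * 200
enc = tok(text, return_tensors="pt", truncation=True, max_length=4096)

# All tokens use sliding-window (local) attention; only the first token gets
# global attention, i.e. it attends to and is attended by every position.
global_attention_mask = torch.zeros_like(enc.input_ids)
global_attention_mask[:, 0] = 1

with torch.no_grad():
    out = model(input_ids=enc.input_ids,
                attention_mask=enc.attention_mask,
                global_attention_mask=global_attention_mask)

print(out.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```

Because the sliding window is fixed, compute grows linearly with sequence length, which is what makes documents of several thousand tokens tractable.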