LongBoX: Evaluating Transformers on Long-Sequence Clinical Tasks
- URL: http://arxiv.org/abs/2311.09564v1
- Date: Thu, 16 Nov 2023 04:57:49 GMT
- Title: LongBoX: Evaluating Transformers on Long-Sequence Clinical Tasks
- Authors: Mihir Parmar, Aakanksha Naik, Himanshu Gupta, Disha Agrawal, Chitta Baral
- Abstract summary: LongBoX is a collection of seven medical datasets in text-to-text format.
Preliminary experiments reveal that both medical LLMs and strong general domain LLMs struggle on this benchmark.
We evaluate two techniques designed for long-sequence handling: (i) local-global attention, and (ii) Fusion-in-Decoder (FiD).
- Score: 44.89857441408805
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Many large language models (LLMs) for medicine have largely been evaluated on
short texts, and their ability to handle longer sequences such as a complete
electronic health record (EHR) has not been systematically explored. Assessing
these models on long sequences is crucial since prior work in the general
domain has demonstrated performance degradation of LLMs on longer texts.
Motivated by this, we introduce LongBoX, a collection of seven medical datasets
in text-to-text format, designed to investigate model performance on long
sequences. Preliminary experiments reveal that both medical LLMs (e.g., BioGPT)
and strong general domain LLMs (e.g., FLAN-T5) struggle on this benchmark. We
further evaluate two techniques designed for long-sequence handling: (i)
local-global attention, and (ii) Fusion-in-Decoder (FiD). Results with
long-sequence handling are mixed: while scores on some datasets increase, there
is substantial room for improvement. We hope that
LongBoX facilitates the development of more effective long-sequence techniques
for the medical domain. Data and source code are available at
https://github.com/Mihir3009/LongBoX.
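
As a concrete illustration of technique (ii), below is a minimal Fusion-in-Decoder-style sketch assuming a Hugging Face T5 backbone: the long input is split into fixed-size chunks, each chunk is encoded independently, and the decoder cross-attends over the concatenated chunk representations. The checkpoint name, chunk size, and the helper fid_generate are illustrative assumptions rather than the paper's exact setup, and passing precomputed encoder_outputs to generate may behave slightly differently across transformers versions.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

# Illustrative checkpoint; the LongBoX experiments may use different backbones.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
model.eval()

def fid_generate(long_text: str, prompt: str, chunk_tokens: int = 512,
                 max_new_tokens: int = 64) -> str:
    # 1) Split the long document into fixed-size token chunks.
    ids = tokenizer(long_text, add_special_tokens=False)["input_ids"]
    chunks = [ids[i:i + chunk_tokens] for i in range(0, len(ids), chunk_tokens)]

    # 2) Prepend the task prompt to every chunk and encode each chunk
    #    independently, so self-attention cost is bounded by the chunk length.
    texts = [prompt + " " + tokenizer.decode(c) for c in chunks]
    enc = tokenizer(texts, return_tensors="pt", padding=True,
                    truncation=True, max_length=chunk_tokens + 32)
    with torch.no_grad():
        encoded = model.encoder(input_ids=enc["input_ids"],
                                attention_mask=enc["attention_mask"])

    # 3) "Fuse": concatenate all chunk representations along the sequence axis
    #    so the decoder can cross-attend over the whole document at once.
    hidden = encoded.last_hidden_state                 # (num_chunks, len, dim)
    fused = hidden.reshape(1, -1, hidden.size(-1))     # (1, num_chunks * len, dim)
    fused_mask = enc["attention_mask"].reshape(1, -1)

    # 4) Decode against the fused encoder states.
    out = model.generate(
        encoder_outputs=BaseModelOutput(last_hidden_state=fused),
        attention_mask=fused_mask,
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

The point of the design is that encoder self-attention is quadratic only within a chunk, so total encoding cost grows roughly linearly with document length, while the decoder still sees the entire document through cross-attention.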
Related papers
- Longitudinal Ensemble Integration for sequential classification with multimodal data [2.4554016712597138]
We developed Longitudinal Ensemble Integration (LEI) for sequential classification.
We evaluated LEI's performance, and compared it against existing approaches, for the early detection of dementia.
LEI's design also enabled the identification of features that were consistently important across time for the effective prediction of dementia-related diagnoses.
arXiv Detail & Related papers (2024-11-08T21:31:48Z)
- Language Models can Self-Lengthen to Generate Long Texts [74.96074422345806]
This paper introduces an innovative iterative training framework called Self-Lengthen.
It leverages only the intrinsic knowledge and skills of Large Language Models without the need for auxiliary data or proprietary models.
Experiments on benchmarks and human evaluations show that Self-Lengthen outperforms existing methods in long-text generation.
arXiv Detail & Related papers (2024-10-31T13:47:10Z)
- Selecting Influential Samples for Long Context Alignment via Homologous Models' Guidance and Contextual Awareness Measurement [62.87020831987625]
We propose a novel framework designed to identify the influential and high-quality samples enriched with long-range dependency relations.
We select the most challenging samples as the influential data to effectively frame the long-range dependencies.
Experiments indicate that GATEAU effectively identifies samples enriched with long-range dependency relations and the model trained on these selected samples exhibits better instruction-following and long-context understanding capabilities.
arXiv Detail & Related papers (2024-10-21T04:30:53Z)
- LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models [61.12177317970258]
LongSkywork is a long-context Large Language Model capable of processing up to 200,000 tokens.
We develop two novel methods for creating synthetic data.
LongSkywork achieves outstanding performance on a variety of long-context benchmarks.
arXiv Detail & Related papers (2024-06-02T03:34:41Z)
- Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks [76.43527940649939]
We introduce Ada-LEval, a benchmark for evaluating the long-context understanding of large language models (LLMs).
Ada-LEval includes two challenging subsets, TSort and BestAnswer, which enable a more reliable evaluation of LLMs' long context capabilities.
We evaluate 4 state-of-the-art closed-source API models and 6 open-source models with Ada-LEval.
arXiv Detail & Related papers (2024-04-09T17:30:48Z)
- Effective Long-Context Scaling of Foundation Models [90.57254298730923]
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens.
Our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2.
arXiv Detail & Related papers (2023-09-27T21:41:49Z)
- How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling [37.247872987053654]
Large pre-trained language models (LMs) have been widely adopted in biomedical and clinical domains.
This work explores long-range adaptation from such LMs with Longformer, allowing the LMs to capture longer clinical notes context.
We conduct experiments on three n2c2 challenges datasets and a longitudinal clinical dataset from Hong Kong Hospital Authority electronic health record system.
arXiv Detail & Related papers (2022-10-25T09:21:28Z)
- Extend and Explain: Interpreting Very Long Language Models [0.0]
We introduce a novel Masked Sampling Procedure (MSP) to identify the text blocks that contribute to a prediction.
MSP identifies 1.7x more clinically informative text blocks than the previous state-of-the-art, runs up to 100x faster, and is tractable for generating important phrase pairs.
arXiv Detail & Related papers (2022-09-02T17:15:43Z)
- Longformer: The Long-Document Transformer [40.18988262517733]
Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length.
We introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer.
Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local windowed attention with a task motivated global attention.
arXiv Detail & Related papers (2020-04-10T17:54:09Z)
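
For reference, the local-global attention pattern described in the Longformer entry above (the same family of technique evaluated as (i) in LongBoX) can be exercised with the Hugging Face Longformer implementation. This is a minimal sketch under assumptions: the checkpoint, the truncation length, and the choice of giving only the first token global attention are illustrative, not the configuration used in any of the papers listed here.

```python
import torch
from transformers import AutoTokenizer, LongformerModel

tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")
model.eval()

note = "..."  # placeholder for a long clinical note; this checkpoint handles up to 4096 tokens
enc = tokenizer(note, return_tensors="pt", truncation=True, max_length=4096)

# Every token gets sliding-window (local) attention by default; global attention
# is enabled only at selected positions, so cost grows linearly with length.
global_attention_mask = torch.zeros_like(enc["input_ids"])
global_attention_mask[:, 0] = 1  # first token attends to, and is attended by, all tokens

with torch.no_grad():
    out = model(**enc, global_attention_mask=global_attention_mask)
print(out.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```

In practice, positions marked in global_attention_mask (e.g., a classification token or question tokens) act as the "global" part of the local-global scheme, while all other tokens only attend within a fixed local window.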