Length-Aware Multi-Kernel Transformer for Long Document Classification
- URL: http://arxiv.org/abs/2405.07052v1
- Date: Sat, 11 May 2024 16:48:06 GMT
- Title: Length-Aware Multi-Kernel Transformer for Long Document Classification
- Authors: Guangzeng Han, Jack Tsao, Xiaolei Huang
- Abstract summary: Lengthy documents pose a unique challenge to neural language models due to substantial memory consumption.
We propose a Length-Aware Multi-Kernel Transformer (LAMKIT) to address these new challenges for long document classification.
- Score: 4.796752450839119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lengthy documents pose a unique challenge to neural language models due to substantial memory consumption. While existing state-of-the-art (SOTA) models segment long texts into equal-length snippets (e.g., 128 tokens per snippet) or deploy sparse attention networks, these methods face new challenges of context fragmentation and limited generalizability caused by sentence boundaries and varying text lengths. For example, our empirical analysis has shown that SOTA models consistently overfit to one set of lengthy documents (e.g., 2000 tokens) while performing worse on texts of other lengths (e.g., 1000 or 4000 tokens). In this study, we propose a Length-Aware Multi-Kernel Transformer (LAMKIT) to address these new challenges for long document classification. LAMKIT encodes lengthy documents with diverse transformer-based kernels to bridge context boundaries and vectorizes the text length with the kernels to promote model robustness over varying document lengths. Experiments on five standard benchmarks from the health and law domains show that LAMKIT outperforms SOTA models by up to an absolute 10.9% improvement. We conduct extensive ablation analyses to examine model robustness and effectiveness over varying document lengths.
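To make the multi-kernel idea above concrete, here is a minimal sketch assuming convolution-style kernels of several widths pooled over segment-level encoder states, with the document length vectorized and concatenated before classification. All module names, kernel widths, and dimensions below are hypothetical; the actual LAMKIT architecture is the one described in the paper.

```python
# Hypothetical sketch of a length-aware multi-kernel classification head
# (not the authors' code): convolutions of several widths run over segment
# encodings to bridge segment boundaries, and a normalized document length
# is embedded and concatenated before the classifier.
import torch
import torch.nn as nn


class LengthAwareMultiKernelHead(nn.Module):
    def __init__(self, hidden=768, kernel_sizes=(3, 5, 7), num_labels=2, max_len=4096):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, hidden, k, padding=k // 2) for k in kernel_sizes]
        )
        self.len_embed = nn.Linear(1, hidden)   # vectorize the document length
        self.max_len = max_len
        self.classifier = nn.Linear(hidden * len(kernel_sizes) + hidden, num_labels)

    def forward(self, segment_states, doc_len):
        # segment_states: (batch, num_segments, hidden) from a segment encoder
        x = segment_states.transpose(1, 2)                      # (batch, hidden, segments)
        pooled = [conv(x).amax(dim=-1) for conv in self.convs]  # max-pool each kernel
        length = (doc_len.float() / self.max_len).unsqueeze(-1)
        pooled.append(self.len_embed(length))                   # length-aware feature
        return self.classifier(torch.cat(pooled, dim=-1))


head = LengthAwareMultiKernelHead()
logits = head(torch.randn(2, 16, 768), torch.tensor([2000, 4000]))
print(logits.shape)  # torch.Size([2, 2])
```

Pooling over kernels of different widths is one plausible way to smooth over hard snippet boundaries, and feeding the length in explicitly is one plausible way to keep the classifier calibrated across 1000-, 2000-, and 4000-token documents.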
Related papers
- Summarizing long regulatory documents with a multi-step pipeline [2.2591852560804675]
We show that the effectiveness of a two-step architecture for summarizing long regulatory texts varies depending on the model used.
For abstractive encoder-decoder models with short context lengths, the effectiveness of an extractive step varies, whereas for long-context encoder-decoder models, the extractive step worsens their performance.
arXiv Detail & Related papers (2024-08-19T08:07:25Z) - mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval [67.50604814528553]
We first introduce a text encoder enhanced with RoPE and unpadding, pre-trained in a native 8192-token context.
Then we construct a hybrid text representation model (TRM) and a cross-encoder reranker by contrastive learning.
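For reference, the rotary position embedding (RoPE) mentioned above can be sketched as follows; this is the standard formulation rather than code from the mGTE paper, and the 8192-token length in the usage line is just the native context noted in the summary.

```python
# Standard rotary position embedding (RoPE): rotate pairs of query/key
# dimensions by an angle proportional to the token position.
import torch


def rope(x, base=10000.0):
    # x: (batch, seq, dim) with even dim
    _, seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()          # (seq, half)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


q = torch.randn(1, 8192, 64)   # native 8192-token context as in the summary
print(rope(q).shape)           # torch.Size([1, 8192, 64])
```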
arXiv Detail & Related papers (2024-07-29T03:12:28Z) - Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA [71.04146366608904]
Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-long context windows.
We propose a novel long-context benchmark, Loong, aligning with realistic scenarios through extended multi-document question answering (QA).
Loong introduces four types of tasks with a range of context lengths: Spotlight Locating, Comparison, Clustering, and Chain of Reasoning.
arXiv Detail & Related papers (2024-06-25T09:42:56Z) - Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning [68.43706033424378]
This study introduces an innovative method designed to efficiently increase in-context text length in multi-modal large language models (MLLMs).
We present Visualized In-Context Text Processing (VisInContext), which processes long in-context text using visual tokens.
This technique significantly reduces GPU memory usage and floating point operations (FLOPs) for both the training and inference stages.
arXiv Detail & Related papers (2024-06-04T17:59:25Z) - LOCOST: State-Space Models for Long Document Abstractive Summarization [76.31514220737272]
We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs.
With a computational complexity of $O(L \log L)$, this architecture can handle significantly longer sequences than state-of-the-art models that are based on sparse attention patterns.
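As a rough illustration of where the $O(L \log L)$ cost comes from, a state-space layer's sequence mixing can be evaluated as a long convolution via the FFT; the sketch below is generic and is not LOCOST's actual layer.

```python
# Generic FFT-based long convolution: mixing a length-L sequence with a
# length-L kernel costs O(L log L), versus O(L^2) for full self-attention.
import torch


def long_conv(u, k):
    # u: (batch, L, channels) input; k: (L, channels) learned kernel
    L = u.shape[1]
    u_f = torch.fft.rfft(u, n=2 * L, dim=1)
    k_f = torch.fft.rfft(k, n=2 * L, dim=0)
    return torch.fft.irfft(u_f * k_f[None], n=2 * L, dim=1)[:, :L]


u = torch.randn(2, 16384, 64)
k = torch.randn(16384, 64)
print(long_conv(u, k).shape)   # torch.Size([2, 16384, 64])
```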
arXiv Detail & Related papers (2024-01-31T15:33:37Z) - Modeling Context With Linear Attention for Scalable Document-Level Translation [72.41955536834702]
We investigate the efficacy of a recent linear attention model on document translation and augment it with a sentential gate to promote a recency inductive bias.
We show that sentential gating further improves translation quality on IWSLT.
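For background, kernelized linear attention replaces the softmax with a feature map so a sequence can be mixed in time linear in its length; the sketch below shows the standard non-causal form and omits the paper's sentential gate.

```python
# Generic kernelized linear attention: phi(q) (phi(k)^T v) costs
# O(L * d^2) instead of the O(L^2 * d) of softmax attention.
import torch


def linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, seq, dim); feature map phi(x) = elu(x) + 1
    phi_q = torch.nn.functional.elu(q) + 1
    phi_k = torch.nn.functional.elu(k) + 1
    kv = torch.einsum("bld,ble->bde", phi_k, v)                     # (batch, dim, dim)
    z = 1.0 / (torch.einsum("bld,bd->bl", phi_q, phi_k.sum(dim=1)) + eps)
    return torch.einsum("bld,bde,bl->ble", phi_q, kv, z)


q = k = v = torch.randn(2, 1024, 64)
print(linear_attention(q, k, v).shape)   # torch.Size([2, 1024, 64])
```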
arXiv Detail & Related papers (2022-10-16T03:41:50Z) - Adapting Pretrained Text-to-Text Models for Long Text Sequences [39.62224414485055]
We adapt an existing pretrained text-to-text model for long-sequence inputs.
We build a long-context model that achieves competitive performance on long-text QA tasks.
arXiv Detail & Related papers (2022-09-21T00:41:07Z) - Revisiting Transformer-based Models for Long Document Classification [31.60414185940218]
In real-world applications, multi-page multi-paragraph documents are common and cannot be efficiently encoded by vanilla Transformer-based models.
We compare different Transformer-based Long Document Classification (TrLDC) approaches that aim to mitigate the computational overhead of vanilla transformers.
We observe a clear benefit from being able to process longer text, and, based on our results, we derive practical advice on applying Transformer-based models to long document classification tasks.
arXiv Detail & Related papers (2022-04-14T00:44:36Z) - HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attention for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in ROUGE F1.
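As a generic illustration of the kind of sparse attention named above, the sketch below builds a mask combining a local sliding window with a few global tokens; it does not reproduce HETFORMER's specific multi-granularity design over tokens, sentences, and paragraphs.

```python
# Generic sparse attention mask: each token attends to a local window,
# plus a small set of global tokens that attend to (and are attended by) all.
import torch


def sparse_mask(seq_len, window=4, global_idx=(0,)):
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = True                      # local sliding window
    g = torch.tensor(global_idx)
    mask[g, :] = True                              # global tokens see everything
    mask[:, g] = True                              # everything sees global tokens
    return mask


m = sparse_mask(12, window=2, global_idx=(0, 6))
print(m.sum().item(), "allowed pairs out of", m.numel())
```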
arXiv Detail & Related papers (2021-10-12T22:42:31Z) - ERNIE-DOC: The Retrospective Long-Document Modeling Transformer [24.426571160930635]
We propose ERNIE-DOC, a document-level language pretraining model based on Recurrence Transformers.
Two well-designed techniques, namely the retrospective feed mechanism and the enhanced recurrence mechanism, enable ERNIE-DOC to use a much longer effective context length.
Various experiments on both English and Chinese document-level tasks are conducted.
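A minimal sketch of the segment-level recurrence idea that recurrence Transformers such as ERNIE-DOC build on is shown below; the retrospective feed and enhanced recurrence mechanisms themselves are not reproduced, and all names and dimensions are illustrative.

```python
# Segment-level recurrence: each segment attends over [cached memory; current
# segment] states, extending the effective context beyond a single segment.
import torch
import torch.nn as nn


class RecurrentSegmentEncoder(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, segment, memory):
        # segment: (batch, seg_len, dim); memory: (batch, mem_len, dim)
        context = torch.cat([memory, segment], dim=1)
        out, _ = self.attn(segment, context, context)   # queries from current segment
        return out, out.detach()                         # cache states for next segment


enc = RecurrentSegmentEncoder()
memory = torch.zeros(2, 0, 64)
for segment in torch.randn(4, 2, 128, 64):               # four consecutive segments
    out, memory = enc(segment, memory)
print(out.shape)  # torch.Size([2, 128, 64])
```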
arXiv Detail & Related papers (2020-12-31T16:12:48Z) - Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching [28.190001111358438]
We propose the Siamese Multi-depth Transformer-based Hierarchical Encoder (SMITH) for long-form document matching.
Our model contains several innovations to adapt self-attention models for longer text input.
We will open source a Wikipedia-based benchmark dataset, code, and a pre-trained checkpoint to accelerate future research on long-form document matching.
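A minimal sketch of the hierarchical Siamese matching idea, assuming sentence blocks are encoded independently, pooled, and then mixed by a shared document-level encoder before cosine comparison; this is illustrative only and is not the SMITH implementation.

```python
# Hierarchical Siamese matching sketch: encode fixed-size sentence blocks,
# pool them, run a document-level encoder over the block vectors, and
# compare the two document vectors with cosine similarity.
import torch
import torch.nn as nn


class HierarchicalDocEncoder(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        block_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        doc_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.block_encoder = nn.TransformerEncoder(block_layer, num_layers=2)
        self.doc_encoder = nn.TransformerEncoder(doc_layer, num_layers=2)

    def forward(self, blocks):
        # blocks: (num_blocks, block_len, dim) token embeddings of one document
        block_vecs = self.block_encoder(blocks).mean(dim=1)          # (num_blocks, dim)
        doc = self.doc_encoder(block_vecs.unsqueeze(0)).mean(dim=1)  # (1, dim)
        return doc.squeeze(0)


enc = HierarchicalDocEncoder()                       # shared by both Siamese towers
doc_a, doc_b = torch.randn(8, 32, 64), torch.randn(6, 32, 64)
score = torch.cosine_similarity(enc(doc_a), enc(doc_b), dim=0)
print(score.item())
```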
arXiv Detail & Related papers (2020-04-26T07:04:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.