Can Model Fusing Help Transformers in Long Document Classification? An Empirical Study
- URL: http://arxiv.org/abs/2307.09532v1
- Date: Tue, 18 Jul 2023 18:21:26 GMT
- Authors: Damith Premasiri, Tharindu Ranasinghe, Ruslan Mitkov
- Abstract summary: Adapting NLP to multiple domains has introduced many new challenges for text classification.
The majority of the transformer models are limited to 512 tokens, and therefore, they struggle with long document classification problems.
In this research, we explore employing Model Fusing for long document classification and compare the results with the well-known BERT and Longformer architectures.
- Score: 11.395215994671863
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text classification is an area of research which has been studied over the
years in Natural Language Processing (NLP). Adapting NLP to multiple domains
has introduced many new challenges for text classification and one of them is
long document classification. While state-of-the-art transformer models provide
excellent results in text classification, most of them restrict the maximum
length of the input sequence. The majority of the transformer models are
limited to 512 tokens, and therefore they struggle with long document
classification problems. In this research, we explore employing Model Fusing
for long document classification and compare the results with the well-known
BERT and Longformer architectures.
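To make the 512-token limit concrete, the sketch below contrasts the two simplest ways of fitting a long, already-tokenized document into a fixed-length encoder: keeping only the first window (truncation) versus splitting the document into consecutive windows (chunking). It is a generic illustration under stated assumptions, not the Model Fusing approach studied in the paper: the token ids are synthetic, `MAX_LEN = 512` mirrors the limit quoted above, and `truncate` and `chunk` are hypothetical helper names. Real tokenizers also reserve positions for special tokens such as `[CLS]` and `[SEP]`, which this sketch ignores.

```python
from typing import List

MAX_LEN = 512  # assumed input limit, matching the 512 tokens quoted above


def truncate(token_ids: List[int], max_len: int = MAX_LEN) -> List[int]:
    """Keep only the first max_len tokens; everything after is discarded."""
    return token_ids[:max_len]


def chunk(token_ids: List[int], max_len: int = MAX_LEN,
          stride: int = MAX_LEN) -> List[List[int]]:
    """Split a token sequence into windows of at most max_len tokens.

    With stride == max_len the windows are disjoint; a smaller stride
    makes consecutive windows overlap by (max_len - stride) tokens.
    """
    if not 0 < stride <= max_len:
        raise ValueError("stride must be in (0, max_len]")
    windows: List[List[int]] = []
    for start in range(0, len(token_ids), stride):
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the document
    return windows


if __name__ == "__main__":
    document = list(range(2000))  # stand-in for a 2,000-token document
    kept = truncate(document)
    print(f"truncation keeps {len(kept)} of {len(document)} tokens "
          f"({len(kept) / len(document):.1%})")
    windows = chunk(document)
    print(f"chunking yields {len(windows)} windows of sizes "
          f"{[len(w) for w in windows]}")
```

For a 2,000-token document, truncation keeps only about a quarter of the text, while chunking covers all of it at the cost of several encoder passes whose outputs must then be combined somehow.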
Related papers
- Length-Aware Multi-Kernel Transformer for Long Document Classification [4.796752450839119]
Lengthy documents pose a unique challenge to neural language models due to substantial memory consumption.
We propose a Length-Aware Multi-Kernel Transformer (LAMKIT) to address the new challenges of long document classification.
(arXiv, 2024-05-11T16:48:06Z)
- Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT [0.0]
Transformer-based models, specifically BERT, have propelled research in various NLP tasks.
BERT models are limited to a maximum of 512 tokens, which makes them non-trivial to apply in practical settings with long inputs.
We propose a relatively simple extension to the vanilla BERT architecture, called ChunkBERT, that allows fine-tuning any pretrained model to perform inference on arbitrarily long text.
(arXiv, 2023-10-31T15:41:08Z)
- Attention over pre-trained Sentence Embeddings for Long Document Classification [4.38566347001872]
Transformers are often limited to short sequences because the complexity of attention is quadratic in the number of tokens.
We suggest taking advantage of pre-trained sentence transformers to start from semantically meaningful embeddings of the individual sentences.
We report the results obtained by this simple architecture on three standard document classification datasets.
(arXiv, 2023-07-18T09:06:35Z)
- Revisiting Transformer-based Models for Long Document Classification [31.60414185940218]
In real-world applications, multi-page multi-paragraph documents are common and cannot be efficiently encoded by vanilla Transformer-based models.
We compare different Transformer-based Long Document Classification (TrLDC) approaches that aim to mitigate the computational overhead of vanilla transformers.
We observe a clear benefit from being able to process longer text and, based on our results, derive practical advice on applying Transformer-based models to long document classification tasks.
(arXiv, 2022-04-14T00:44:36Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document, where the top level captures long-range dependencies.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
(arXiv, 2022-03-15T01:24:51Z)
- HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attentions for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in ROUGE F1.
(arXiv, 2021-10-12T22:42:31Z)
- DocNLI: A Large-scale Dataset for Document-level Natural Language Inference [55.868482696821815]
Natural language inference (NLI) is formulated as a unified framework for solving various NLP problems.
This work presents DocNLI, a newly constructed large-scale dataset for document-level NLI.
(arXiv, 2021-06-17T13:02:26Z)
- Long Range Arena: A Benchmark for Efficient Transformers [115.1654897514089]
The Long Range Arena benchmark is a suite of tasks consisting of sequences ranging from 1K to 16K tokens.
We systematically evaluate ten well-established long-range Transformer models on our newly proposed benchmark suite.
(arXiv, 2020-11-08T15:53:56Z)
- Pretrained Transformers for Text Ranking: BERT and Beyond [53.83210899683987]
This survey provides an overview of text ranking with neural network architectures known as transformers.
The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing.
(arXiv, 2020-10-13T15:20:32Z)
- Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching [28.190001111358438]
We propose a Siamese Multi-depth Transformer-based Hierarchical (SMITH) Encoder for long-form document matching.
Our model contains several innovations to adapt self-attention models for longer text input.
We will open source a Wikipedia based benchmark dataset, code and a pre-trained checkpoint to accelerate future research on long-form document matching.
(arXiv, 2020-04-26T07:04:08Z)
- SPECTER: Document-level Representation Learning using Citation-informed Transformers [51.048515757909215]
SPECTER generates document-level embeddings of scientific documents by pretraining a Transformer language model.
We introduce SciDocs, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction to document classification and recommendation.
(arXiv, 2020-04-15T16:05:51Z)
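The idea behind the "Attention over pre-trained Sentence Embeddings for Long Document Classification" entry listed above, building a document representation from per-sentence embeddings instead of from tokens, can be illustrated with simple attention pooling. Each sentence embedding is scored against a query vector, the scores are normalized with a softmax, and the document vector is the weighted sum. This is a minimal sketch under stated assumptions, not that paper's exact architecture: the 3-dimensional embeddings and the query vector are made up (in practice the embeddings would come from a sentence transformer and the query would be learned), and `attention_pool` is a hypothetical name. With a single query, the cost of this pooling grows linearly with the number of sentences, in contrast to the quadratic token-level attention mentioned in that entry.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(sentence_embs: List[Vector],
                   query: Vector) -> Tuple[Vector, List[float]]:
    """Pool sentence embeddings into one document vector.

    Scores are dot products with the (assumed, normally learned) query,
    scaled by 1/sqrt(dim); the weights are their softmax, and the document
    vector is the weighted sum of the sentence embeddings.
    """
    dim = len(query)
    scale = 1.0 / math.sqrt(dim)
    scores = [scale * sum(q * x for q, x in zip(query, emb))
              for emb in sentence_embs]
    weights = softmax(scores)
    doc = [sum(w * emb[i] for w, emb in zip(weights, sentence_embs))
           for i in range(dim)]
    return doc, weights


if __name__ == "__main__":
    # Toy 3-d "sentence embeddings"; real ones would come from a sentence encoder.
    sentences = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    query = [2.0, 0.0, 0.0]  # aligned with the first sentence, so it gets the most weight
    doc_vec, weights = attention_pool(sentences, query)
    print("weights:", [round(w, 3) for w in weights])
    print("document vector:", [round(v, 3) for v in doc_vec])
```

The resulting document vector would then be fed to a classification head; because the sentence encoder only ever sees one sentence at a time, no single input needs to exceed the encoder's token limit.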