Transformer-based Models for Long-Form Document Matching: Challenges and
Empirical Analysis
- URL: http://arxiv.org/abs/2302.03765v1
- Date: Tue, 7 Feb 2023 21:51:05 GMT
- Title: Transformer-based Models for Long-Form Document Matching: Challenges and
Empirical Analysis
- Authors: Akshita Jha, Adithya Samavedhi, Vineeth Rakesh, Jaideep Chandrashekar,
Chandan K. Reddy
- Abstract summary: We show that simple neural models outperform the more complex BERT-based models.
Simple models are also more robust to variations in document length and text perturbations.
- Score: 12.269318291685753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in the area of long document matching have primarily focused
on using transformer-based models for long document encoding and matching.
There are two primary challenges associated with these models. Firstly, the
performance gain provided by transformer-based models comes at a steep cost -
both in terms of the required training time and the resource (memory and
energy) consumption. The second major limitation is their inability to handle
more than a pre-defined input token length at a time. In this work, we
empirically demonstrate the effectiveness of simple neural models (such as
feed-forward networks, and CNNs) and simple embeddings (like GloVe, and
Paragraph Vector) over transformer-based models on the task of document
matching. We show that simple models outperform the more complex BERT-based
models while taking significantly less training time, energy, and memory. The
simple models are also more robust to variations in document length and text
perturbations.
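To make the comparison concrete, below is a minimal PyTorch sketch of the kind of simple matcher the abstract argues for: each document is mean-pooled over frozen pre-trained GloVe vectors, and a small feed-forward network scores the pair. The class name, the pairwise feature combination, and all dimensions are illustrative assumptions, not the authors' exact architecture.

```python
# Hypothetical sketch of a simple embedding + feed-forward document matcher.
import torch
import torch.nn as nn

class AveragedEmbeddingMatcher(nn.Module):
    def __init__(self, embedding_matrix: torch.Tensor, hidden_dim: int = 256):
        super().__init__()
        # Frozen pre-trained word vectors (e.g., GloVe rows indexed by vocabulary id; 0 = padding).
        self.embedding = nn.Embedding.from_pretrained(embedding_matrix, freeze=True, padding_idx=0)
        dim = embedding_matrix.size(1)
        # Small feed-forward head over simple pairwise features.
        self.classifier = nn.Sequential(
            nn.Linear(4 * dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def encode(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Mean-pool word vectors over non-padding positions: one vector per document.
        mask = (token_ids != 0).unsqueeze(-1).float()
        vectors = self.embedding(token_ids) * mask
        return vectors.sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)

    def forward(self, doc_a: torch.Tensor, doc_b: torch.Tensor) -> torch.Tensor:
        a, b = self.encode(doc_a), self.encode(doc_b)
        features = torch.cat([a, b, torch.abs(a - b), a * b], dim=-1)
        return self.classifier(features).squeeze(-1)  # one matching logit per document pair

# Usage with random stand-in data; a real setup would load GloVe and tokenized documents.
glove = torch.randn(10_000, 300)
model = AveragedEmbeddingMatcher(glove)
doc_a = torch.randint(1, 10_000, (8, 1200))  # batch of eight long documents
doc_b = torch.randint(1, 10_000, (8, 900))
logits = model(doc_a, doc_b)
```

Because the encoder only averages word vectors, nothing in the model imposes a fixed maximum token length, which is one reason such baselines degrade gracefully as documents grow.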
Related papers
- sTransformer: A Modular Approach for Extracting Inter-Sequential and Temporal Information for Time-Series Forecasting [6.434378359932152]
We review and categorize existing Transformer-based models into two main types: (1) modifications to the model structure and (2) modifications to the input data.
We propose sTransformer, which introduces the Sequence and Temporal Convolutional Network (STCN) to fully capture both sequential and temporal information.
We compare our model with linear models and existing forecasting models on long-term time-series forecasting, achieving new state-of-the-art results.
arXiv Detail & Related papers (2024-08-19T06:23:41Z)
- Long-Range Transformer Architectures for Document Understanding [1.9331361036118608]
Document Understanding (DU) was not left behind, with the first Transformer-based models for DU dating from late 2019.
We introduce two new multi-modal (text + layout) long-range models for DU based on efficient implementations of Transformers for long sequences.
Relative 2D attention proved to be effective on dense text for both normal and long-range models.
arXiv Detail & Related papers (2023-09-11T14:45:24Z)
- Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator [24.690247474891958]
Fourier Transformer is able to significantly reduce computational costs while retaining the ability to inherit from various large pretrained models.
Our model achieves state-of-the-art performances among all transformer-based models on the long-range modeling benchmark LRA.
For generative seq-to-seq tasks, including CNN/DailyMail and ELI5, our model inherits the BART weights and outperforms the standard BART.
arXiv Detail & Related papers (2023-05-24T12:33:06Z)
- An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification [37.069127262896764]
Non-hierarchical sparse attention Transformer-based models, such as Longformer and Big Bird, are popular approaches to working with long documents.
We develop and release fully pre-trained HAT models that use segment-wise followed by cross-segment encoders (a minimal sketch of this pattern appears after this list).
Our best HAT model outperforms equally-sized Longformer models while using 10-20% less GPU memory and processing documents 40-45% faster.
arXiv Detail & Related papers (2022-10-11T15:17:56Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attentions for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in Rouge F1.
arXiv Detail & Related papers (2021-10-12T22:42:31Z)
- Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from 'Vision-friendly Transformer'.
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z)
- ViViT: A Video Vision Transformer [75.74690759089529]
We present pure-transformer based models for video classification.
Our model extracts spatio-temporal tokens from the input video, which are then encoded by a series of transformer layers.
We show how we can effectively regularise the model during training and leverage pretrained image models to be able to train on comparatively small datasets.
arXiv Detail & Related papers (2021-03-29T15:27:17Z)
- Coreference Resolution without Span Representations [20.84150608402576]
We introduce a lightweight coreference model that removes the dependency on span representations, handcrafted features, and heuristics.
Our model performs competitively with the current end-to-end model, while being simpler and more efficient.
arXiv Detail & Related papers (2021-01-02T11:46:51Z)
- Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers [94.43313684188819]
We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute.
We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps.
This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models.
arXiv Detail & Related papers (2020-02-26T21:17:13Z)
- Pre-training Tasks for Embedding-based Large-scale Retrieval [68.01167604281578]
We consider the large-scale query-document retrieval problem.
Given a query (e.g., a question), return the set of relevant documents from a large document corpus.
We show that the key ingredient of learning a strong embedding-based Transformer model is the set of pre-training tasks.
arXiv Detail & Related papers (2020-02-10T16:44:00Z)
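The segment-wise followed by cross-segment encoding mentioned in the hierarchical attention (HAT) entry above can be sketched as follows: encode fixed-size segments independently, pool each segment to a single vector, then run a second encoder over the segment vectors. The layer counts, mean pooling, and dimensions here are illustrative assumptions, not the released HAT configuration.

```python
# Hypothetical sketch of a segment-wise encoder followed by a cross-segment encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentThenCrossSegmentEncoder(nn.Module):
    def __init__(self, vocab_size: int = 30_000, dim: int = 256, segment_len: int = 128):
        super().__init__()
        self.segment_len = segment_len
        self.embedding = nn.Embedding(vocab_size, dim, padding_idx=0)
        # Two independent stacks: one attends within each segment, the other across segment summaries.
        self.segment_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True), num_layers=2)
        self.cross_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True), num_layers=2)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        batch, length = token_ids.shape
        pad = (-length) % self.segment_len
        token_ids = F.pad(token_ids, (0, pad))                       # pad to a multiple of segment_len
        segments = token_ids.view(batch, -1, self.segment_len)       # (batch, n_segments, segment_len)
        n_segments = segments.size(1)
        flat = segments.reshape(batch * n_segments, self.segment_len)
        encoded = self.segment_encoder(self.embedding(flat))         # attention only within each segment
        summaries = encoded.mean(dim=1).view(batch, n_segments, -1)  # one summary vector per segment
        return self.cross_encoder(summaries)                         # contextualised segment representations

doc = torch.randint(1, 30_000, (2, 1000))               # a batch of two long documents
segment_reps = SegmentThenCrossSegmentEncoder()(doc)    # shape: (2, n_segments, 256)
```

Because full self-attention is confined to short segments, the quadratic attention cost is paid only over segment_len tokens at a time, which is what makes this family of models cheaper than applying dense attention to the whole document.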
This list is automatically generated from the titles and abstracts of the papers on this site.