Hi-Transformer: Hierarchical Interactive Transformer for Efficient and
Effective Long Document Modeling
- URL: http://arxiv.org/abs/2106.01040v1
- Date: Wed, 2 Jun 2021 09:30:29 GMT
- Title: Hi-Transformer: Hierarchical Interactive Transformer for Efficient and
Effective Long Document Modeling
- Authors: Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang
- Abstract summary: We propose a hierarchical interactive Transformer (Hi-Transformer) for efficient and effective long document modeling.
Hi-Transformer models documents in a hierarchical way: it first learns sentence representations and then learns document representations.
Experiments on three benchmark datasets validate the efficiency and effectiveness of Hi-Transformer in long document modeling.
- Score: 51.79399904527525
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The Transformer is important for text modeling. However, it has difficulty
handling long documents due to its quadratic complexity with respect to input text length.
In order to handle this problem, we propose a hierarchical interactive
Transformer (Hi-Transformer) for efficient and effective long document
modeling. Hi-Transformer models documents in a hierarchical way, i.e., first
learns sentence representations and then learns document representations. It
can effectively reduce the complexity and meanwhile capture global document
context in the modeling of each sentence. More specifically, we first use a
sentence Transformer to learn the representations of each sentence. Then we use
a document Transformer to model the global document context from these sentence
representations. Next, we use another sentence Transformer to enhance sentence
modeling using the global document context. Finally, we use a hierarchical
pooling method to obtain the document embedding. Extensive experiments on three
benchmark datasets validate the efficiency and effectiveness of Hi-Transformer
in long document modeling.
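Below is a minimal PyTorch sketch of the hierarchical interaction described in the abstract: per-sentence encoding, document-level context modeling over sentence vectors, context-enhanced sentence re-encoding, and hierarchical pooling. All module names, dimensions, and the choice of mean pooling are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


def make_encoder(d_model: int, n_heads: int, n_layers: int) -> nn.TransformerEncoder:
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=n_layers)


class HiTransformerSketch(nn.Module):
    """Illustrative sketch of the hierarchical scheme, not the official implementation."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, n_layers: int = 1):
        super().__init__()
        self.sent_encoder = make_encoder(d_model, n_heads, n_layers)   # step 1: per-sentence tokens
        self.doc_encoder = make_encoder(d_model, n_heads, n_layers)    # step 2: global document context
        self.sent_enhancer = make_encoder(d_model, n_heads, n_layers)  # step 3: context-enhanced sentences

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_sentences, n_tokens, d_model) pre-embedded tokens
        b, s, t, d = x.shape
        tokens = self.sent_encoder(x.reshape(b * s, t, d))      # contextualize tokens within each sentence
        sent_vecs = tokens.mean(dim=1).reshape(b, s, d)         # pool tokens -> one vector per sentence
        sent_ctx = self.doc_encoder(sent_vecs)                  # propagate global document context
        # prepend each sentence's document-aware vector to its tokens and re-encode
        enhanced = self.sent_enhancer(torch.cat([sent_ctx.reshape(b * s, 1, d), tokens], dim=1))
        sent_out = enhanced.mean(dim=1).reshape(b, s, d)        # pool tokens -> sentences
        return sent_out.mean(dim=1)                             # pool sentences -> document embedding


doc = torch.randn(2, 8, 32, 256)          # 2 documents, 8 sentences of 32 tokens each
print(HiTransformerSketch()(doc).shape)   # torch.Size([2, 256])
```

Because each sentence Transformer only attends within a sentence and the document Transformer only attends over sentence vectors, no attention matrix is computed over the full token sequence, which is how the complexity reduction described in the abstract arises.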
Related papers
- Long-Range Transformer Architectures for Document Understanding [1.9331361036118608]
Document Understanding (DU) has not been left behind, with the first Transformer-based models for DU dating from late 2019.
We introduce 2 new multi-modal (text + layout) long-range models for DU based on efficient implementations of Transformers for long sequences.
Relative 2D attention proved effective on dense text for both normal and long-range models.
arXiv Detail & Related papers (2023-09-11T14:45:24Z)
- Revisiting Transformer-based Models for Long Document Classification [31.60414185940218]
In real-world applications, multi-page multi-paragraph documents are common and cannot be efficiently encoded by vanilla Transformer-based models.
We compare different Transformer-based Long Document Classification (TrLDC) approaches that aim to mitigate the computational overhead of vanilla transformers.
We observe a clear benefit from being able to process longer text and, based on our results, derive practical advice for applying Transformer-based models to long document classification tasks.
arXiv Detail & Related papers (2022-04-14T00:44:36Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- Hierarchical Transformers Are More Efficient Language Models [19.061388006885686]
Transformer models yield impressive results on many NLP and sequence modeling tasks.
Remarkably, Transformers can handle long sequences, which allows them to produce long coherent outputs.
We postulate that having an explicit hierarchical architecture is the key to Transformers that efficiently handle long sequences.
arXiv Detail & Related papers (2021-10-26T14:00:49Z)
- HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization [57.798070356553936]
HETFORMER is a Transformer-based pre-trained model with multi-granularity sparse attentions for extractive summarization.
Experiments on both single- and multi-document summarization tasks show that HETFORMER achieves state-of-the-art performance in Rouge F1.
arXiv Detail & Related papers (2021-10-12T22:42:31Z)
- Fastformer: Additive Attention Can Be All You Need [51.79399904527525]
We propose Fastformer, which is an efficient Transformer model based on additive attention.
In Fastformer, instead of modeling the pair-wise interactions between tokens, we first use an additive attention mechanism to model global contexts.
In this way, Fastformer can achieve effective context modeling with linear complexity; a minimal sketch of this idea appears after this list.
arXiv Detail & Related papers (2021-08-20T09:44:44Z)
- Transformer-F: A Transformer network with effective methods for learning universal sentence representation [8.225067988604351]
The Transformer model is widely used in natural language processing for sentence representation.
In this paper, two approaches are introduced to improve the performance of Transformers.
arXiv Detail & Related papers (2021-07-02T03:20:11Z)
- Rethinking Document-level Neural Machine Translation [73.42052953710605]
We try to answer the question: Is the capacity of current models strong enough for document-level translation?
We observe that the original Transformer with appropriate training techniques can achieve strong results for document translation, even with a length of 2000 words.
arXiv Detail & Related papers (2020-10-18T11:18:29Z)
- Applying the Transformer to Character-level Transduction [68.91664610425114]
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence models in various word-level NLP tasks.
We show that with a large enough batch size, the transformer does indeed outperform recurrent models for character-level tasks.
arXiv Detail & Related papers (2020-05-20T17:25:43Z)
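As referenced in the Fastformer entry above, here is a minimal sketch of the additive-attention idea: one learned score per token produces a global context vector in linear time, which then interacts with each token element-wise. The class name, single-head layout, and element-wise interaction are illustrative assumptions, not the paper's exact multi-head formulation.

```python
import torch
import torch.nn as nn


class AdditiveGlobalAttention(nn.Module):
    """Illustrative single-head sketch of additive attention (not the paper's exact model)."""

    def __init__(self, d_model: int = 256):
        super().__init__()
        self.score = nn.Linear(d_model, 1)   # one scalar attention score per token

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        alpha = torch.softmax(self.score(x), dim=1)         # O(n) attention weights over the sequence
        global_ctx = (alpha * x).sum(dim=1, keepdim=True)   # (batch, 1, d_model) global summary
        # each token interacts with the global summary element-wise,
        # keeping the overall cost linear in sequence length
        return x * global_ctx


tokens = torch.randn(2, 1024, 256)                 # a long sequence, still linear cost
print(AdditiveGlobalAttention()(tokens).shape)     # torch.Size([2, 1024, 256])
```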