HiPool: Modeling Long Documents Using Graph Neural Networks
- URL: http://arxiv.org/abs/2305.03319v2
- Date: Mon, 15 May 2023 03:48:36 GMT
- Title: HiPool: Modeling Long Documents Using Graph Neural Networks
- Authors: Irene Li, Aosong Feng, Dragomir Radev, Rex Ying
- Abstract summary: Encoding long sequences in Natural Language Processing (NLP) is a challenging problem.
Recent pretrained language models perform well on many NLP tasks but are restricted by a predefined maximum length.
We propose a new challenging benchmark of six datasets with up to 53k samples and an average length of 4,034 tokens.
- Score: 24.91040673099863
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Encoding long sequences in Natural Language Processing (NLP) is a challenging
problem. Although recent pretrained language models perform well on many NLP tasks,
they are still restricted by a predefined maximum length, which makes them difficult
to extend to longer sequences. Some recent works therefore use hierarchies to model
long sequences; however, most apply sequential models at the upper hierarchy levels
and thus suffer from long-dependency issues. In this paper, we alleviate these issues
with a graph-based method. We first chunk the sequence into fixed-length segments to
model sentence-level information. We then leverage graphs, with a new attention
mechanism, to model intra- and cross-sentence correlations. Additionally, because
standard benchmarks for long document classification (LDC) are limited, we propose a
new challenging benchmark of six datasets with up to 53k samples and an average length
of 4,034 tokens. Evaluation shows our model surpasses competitive baselines by 2.6% in
F1 score, and by 4.8% on the dataset with the longest sequences. Our method outperforms
hierarchical sequential models in both performance and scalability, especially for
longer sequences.
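For intuition, here is a minimal sketch of the chunk-then-graph idea the abstract describes, not the authors' implementation: the long sequence is split into fixed-length chunks, each chunk is encoded by a pretrained encoder, and a single attention layer over chunk nodes stands in for the graph layer that exchanges intra- and cross-chunk information before pooling. The chunk length, encoder name, and dense attention layer are illustrative assumptions.

```python
# Minimal sketch of a chunk-then-graph long-document classifier.
# Assumptions (not from the paper): chunk length 512, a BERT-style encoder,
# one dense self-attention layer standing in for the graph layer over chunk nodes.
import torch
import torch.nn as nn
from transformers import AutoModel

class ChunkGraphClassifier(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", chunk_len=512,
                 hidden=768, num_classes=2):
        super().__init__()
        self.chunk_len = chunk_len
        self.encoder = AutoModel.from_pretrained(encoder_name)
        # Cross-chunk interaction: one attention layer over chunk embeddings.
        self.cross_chunk = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        # input_ids: (batch, seq_len); split the long sequence into fixed-length chunks.
        b, L = input_ids.shape
        pad = (-L) % self.chunk_len
        input_ids = nn.functional.pad(input_ids, (0, pad))        # pad id 0 assumed
        attention_mask = nn.functional.pad(attention_mask, (0, pad))
        n_chunks = input_ids.size(1) // self.chunk_len
        ids = input_ids.view(b * n_chunks, self.chunk_len)
        mask = attention_mask.view(b * n_chunks, self.chunk_len)

        # Encode each chunk independently (chunk/sentence-level information).
        out = self.encoder(input_ids=ids, attention_mask=mask).last_hidden_state
        chunk_emb = out[:, 0, :].view(b, n_chunks, -1)             # [CLS] per chunk

        # Let chunk nodes exchange information (stand-in for the graph layer).
        mixed, _ = self.cross_chunk(chunk_emb, chunk_emb, chunk_emb)

        # Pool chunk nodes into a document representation and classify.
        doc = mixed.mean(dim=1)
        return self.classifier(doc)
```

Because each chunk is encoded independently and only chunk-level nodes attend to each other, cost grows with the number of chunks rather than quadratically with the full document length, which is the practical motivation for the hierarchy.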
Related papers
- LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory [63.41820940103348]
The computational cost of self-attention limits its practicality for long sequences.
We propose a new method, LongVQ, that uses vector quantization to compress the global abstraction into a fixed-length codebook.
LongVQ effectively maintains dynamic global and local patterns, which helps compensate for the lack of long-range dependency modeling.
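As a rough illustration of the fixed-size-codebook idea, the sketch below quantizes each hidden state of a long sequence to its nearest entry in a learned codebook; the codebook size and the straight-through gradient trick are standard vector-quantization choices, not details taken from the LongVQ paper.

```python
# Generic vector-quantization sketch: map each hidden state to its nearest
# codebook vector, so global context is carried by a fixed-size codebook.
import torch
import torch.nn as nn

class FixedCodebook(nn.Module):
    def __init__(self, num_codes=256, dim=512):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, h):
        # h: (batch, length, dim) hidden states of a long sequence.
        flat = h.reshape(-1, h.size(-1))                 # (B*L, D)
        d = torch.cdist(flat, self.codebook.weight)      # distance to every code
        idx = d.argmin(dim=-1)                           # nearest code per position
        quantized = self.codebook(idx).view_as(h)
        # Straight-through estimator: gradients flow to h as if quantization
        # were the identity.
        quantized = h + (quantized - h).detach()
        return quantized, idx.view(h.shape[:-1])
```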
arXiv Detail & Related papers (2024-04-17T08:26:34Z)
- LOCOST: State-Space Models for Long Document Abstractive Summarization [76.31514220737272]
We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs.
With a computational complexity of $O(L \log L)$, this architecture can handle significantly longer sequences than state-of-the-art models that are based on sparse attention patterns.
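State-space layers of this kind can be evaluated as long convolutions, and the $O(L \log L)$ cost comes from computing those convolutions with the FFT; the snippet below illustrates that generic mechanism and is not the LOCOST implementation.

```python
# FFT-based long convolution: the standard route to O(L log L) instead of O(L^2).
import torch

def long_conv(u, k):
    # u: (batch, length) input sequence; k: (length,) convolution kernel
    # implied by the state-space parameters.
    L = u.size(-1)
    n = 2 * L                                  # zero-pad to avoid circular wrap-around
    U = torch.fft.rfft(u, n=n)
    K = torch.fft.rfft(k, n=n)
    y = torch.fft.irfft(U * K, n=n)[..., :L]   # causal part of the convolution
    return y
```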
arXiv Detail & Related papers (2024-01-31T15:33:37Z)
- Effective Long-Context Scaling of Foundation Models [90.57254298730923]
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens.
Our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2.
arXiv Detail & Related papers (2023-09-27T21:41:49Z)
- No Length Left Behind: Enhancing Knowledge Tracing for Modeling Sequences of Excessive or Insufficient Lengths [3.2687390531088414]
Knowledge tracing aims to predict students' responses to practices based on their historical question-answering behaviors.
As sequences get longer, computational costs will increase exponentially.
We propose a model called Sequence-Flexible Knowledge Tracing (SFKT).
arXiv Detail & Related papers (2023-08-07T11:30:58Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Do Long-Range Language Models Actually Use Long-Range Context? [27.084888397778823]
Language models are generally trained on short, truncated input sequences.
Recent efforts to improve the efficiency of self-attention have led to a proliferation of long-range Transformer language models.
arXiv Detail & Related papers (2021-09-19T12:49:43Z)
- Neural Latent Dependency Model for Sequence Labeling [47.32215014130811]
A classic approach to sequence labeling is linear chain conditional random fields (CRFs).
One limitation of linear chain CRFs is their inability to model long-range dependencies between labels.
High-order CRFs extend linear chain CRFs by modeling dependencies no longer than their order, but their computational complexity grows exponentially in the order.
We propose a Neural Latent Dependency Model (NLDM) that models dependencies of arbitrary length between labels using a latent tree structure.
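For context on where the exponential cost comes from: a linear-chain (order-1) CRF computes its partition function with a forward recursion over $|Y|$ states per position, while an order-$k$ CRF must track $|Y|^k$ label histories. A minimal forward-algorithm sketch for the order-1 case (toy scores, illustrative only):

```python
# Forward algorithm for a linear-chain CRF: O(T * |Y|^2) overall.
# An order-k CRF would need |Y|^k states per position, hence the
# exponential growth in k mentioned above.
import torch

def crf_log_partition(emissions, transitions):
    # emissions: (T, Y) per-position label scores; transitions: (Y, Y) pairwise scores.
    T, Y = emissions.shape
    alpha = emissions[0]                                          # (Y,)
    for t in range(1, T):
        # alpha[j] = logsumexp_i( alpha[i] + transitions[i, j] ) + emissions[t, j]
        alpha = torch.logsumexp(alpha.unsqueeze(1) + transitions, dim=0) + emissions[t]
    return torch.logsumexp(alpha, dim=0)                          # log Z
```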
arXiv Detail & Related papers (2020-11-10T10:05:21Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our model achieves state-of-the-art performance across a wide range of applications and datasets.
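The generic tensor-train idea behind such a module is to represent a large weight tensor as a chain of small three-way cores; the toy example below reconstructs a full tensor from its cores (shapes and ranks are illustrative, not taken from the paper, which applies the decomposition inside an LSTM cell).

```python
# Generic tensor-train reconstruction: a (d1, d2, d3) tensor stored as three
# small cores and rebuilt by contracting them in a chain.
import torch

d1, d2, d3, r1, r2 = 4, 5, 6, 2, 3
g1 = torch.randn(1, d1, r1)      # core 1
g2 = torch.randn(r1, d2, r2)     # core 2
g3 = torch.randn(r2, d3, 1)      # core 3

# W[i, j, k] = sum over ranks of g1[0, i, a] * g2[a, j, b] * g3[b, k, 0]
W = torch.einsum('xia,ajb,bky->ijk', g1, g2, g3)
print(W.shape)                   # torch.Size([4, 5, 6])

# Storage: d1*r1 + r1*d2*r2 + r2*d3 parameters instead of d1*d2*d3.
```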
arXiv Detail & Related papers (2020-02-21T05:00:01Z)