HiPool: Modeling Long Documents Using Graph Neural Networks
- URL: http://arxiv.org/abs/2305.03319v2
- Date: Mon, 15 May 2023 03:48:36 GMT
- Title: HiPool: Modeling Long Documents Using Graph Neural Networks
- Authors: Irene Li, Aosong Feng, Dragomir Radev, Rex Ying
- Abstract summary: Encoding long sequences in Natural Language Processing (NLP) is a challenging problem.
Recent pretrained language models perform well on many NLP tasks but are restricted by a predefined maximum length.
We propose a new challenging benchmark of six datasets with up to 53k samples and an average length of 4,034 tokens.
- Score: 24.91040673099863
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Encoding long sequences in Natural Language Processing (NLP) is a challenging
problem. Although recent pretrained language models perform well on many NLP tasks,
they are still restricted by a predefined maximum length, which makes them difficult
to extend to longer sequences. Some recent works therefore use hierarchies to model
long sequences; however, most apply sequential models at the upper hierarchy levels
and thus suffer from long-dependency issues. In this paper, we alleviate these issues
with a graph-based method. We first chunk the sequence into fixed-length segments to
model sentence-level information. We then leverage graphs, with a new attention
mechanism, to model intra- and cross-sentence correlations. Additionally, because
standard benchmarks for long document classification (LDC) are limited, we propose a
new challenging benchmark of six datasets with up to 53k samples and an average length
of 4,034 tokens. Evaluation shows our model surpasses competitive baselines by 2.6% in
F1 score, and by 4.8% on the dataset with the longest sequences. Our method outperforms
hierarchical sequential models in both performance and scalability, especially for
longer sequences.
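For intuition, here is a minimal sketch of the chunk-then-graph idea the abstract describes, not the authors' implementation: the long sequence is split into fixed-length chunks, each chunk is encoded by a pretrained encoder, and a single attention layer over chunk nodes stands in for the graph layer that exchanges intra- and cross-chunk information before pooling. The chunk length, encoder name, and dense attention layer are illustrative assumptions.

```python
# Minimal sketch of a chunk-then-graph long-document classifier.
# Assumptions (not from the paper): chunk length 512, a BERT-style encoder,
# one dense self-attention layer standing in for the graph layer over chunk nodes.
import torch
import torch.nn as nn
from transformers import AutoModel

class ChunkGraphClassifier(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", chunk_len=512,
                 hidden=768, num_classes=2):
        super().__init__()
        self.chunk_len = chunk_len
        self.encoder = AutoModel.from_pretrained(encoder_name)
        # Cross-chunk interaction: one attention layer over chunk embeddings.
        self.cross_chunk = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        # input_ids: (batch, seq_len); split the long sequence into fixed-length chunks.
        b, L = input_ids.shape
        pad = (-L) % self.chunk_len
        input_ids = nn.functional.pad(input_ids, (0, pad))        # pad id 0 assumed
        attention_mask = nn.functional.pad(attention_mask, (0, pad))
        n_chunks = input_ids.size(1) // self.chunk_len
        ids = input_ids.view(b * n_chunks, self.chunk_len)
        mask = attention_mask.view(b * n_chunks, self.chunk_len)

        # Encode each chunk independently (chunk/sentence-level information).
        out = self.encoder(input_ids=ids, attention_mask=mask).last_hidden_state
        chunk_emb = out[:, 0, :].view(b, n_chunks, -1)             # [CLS] per chunk

        # Let chunk nodes exchange information (stand-in for the graph layer).
        mixed, _ = self.cross_chunk(chunk_emb, chunk_emb, chunk_emb)

        # Pool chunk nodes into a document representation and classify.
        doc = mixed.mean(dim=1)
        return self.classifier(doc)
```

Because each chunk is encoded independently and only chunk-level nodes attend to each other, cost grows with the number of chunks rather than quadratically with the full document length, which is the practical motivation for the hierarchy.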
Related papers
- LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory [63.41820940103348]
The computational cost of self-attention limits its practicality for long sequences.
We propose a new method, LongVQ, that uses vector quantization to compress the global abstraction into a fixed-length codebook.
LongVQ effectively maintains dynamic global and local patterns, which helps compensate for the lack of long-range dependency modeling.
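As a rough illustration of the fixed-size-codebook idea, the sketch below quantizes each hidden state of a long sequence to its nearest entry in a learned codebook; the codebook size and the straight-through gradient trick are standard vector-quantization choices, not details taken from the LongVQ paper.

```python
# Generic vector-quantization sketch: map each hidden state to its nearest
# codebook vector, so global context is carried by a fixed-size codebook.
import torch
import torch.nn as nn

class FixedCodebook(nn.Module):
    def __init__(self, num_codes=256, dim=512):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, h):
        # h: (batch, length, dim) hidden states of a long sequence.
        flat = h.reshape(-1, h.size(-1))                 # (B*L, D)
        d = torch.cdist(flat, self.codebook.weight)      # distance to every code
        idx = d.argmin(dim=-1)                           # nearest code per position
        quantized = self.codebook(idx).view_as(h)
        # Straight-through estimator: gradients flow to h as if quantization
        # were the identity.
        quantized = h + (quantized - h).detach()
        return quantized, idx.view(h.shape[:-1])
```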
arXiv Detail & Related papers (2024-04-17T08:26:34Z)
- LOCOST: State-Space Models for Long Document Abstractive Summarization [76.31514220737272]
We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs.
With a computational complexity of $O(L \log L)$, this architecture can handle significantly longer sequences than state-of-the-art models that are based on sparse attention patterns.
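State-space layers of this kind can be evaluated as long convolutions, and the $O(L \log L)$ cost comes from computing those convolutions with the FFT; the snippet below illustrates that generic mechanism and is not the LOCOST implementation.

```python
# FFT-based long convolution: the standard route to O(L log L) instead of O(L^2).
import torch

def long_conv(u, k):
    # u: (batch, length) input sequence; k: (length,) convolution kernel
    # implied by the state-space parameters.
    L = u.size(-1)
    n = 2 * L                                  # zero-pad to avoid circular wrap-around
    U = torch.fft.rfft(u, n=n)
    K = torch.fft.rfft(k, n=n)
    y = torch.fft.irfft(U * K, n=n)[..., :L]   # causal part of the convolution
    return y
```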
arXiv Detail & Related papers (2024-01-31T15:33:37Z)
- Effective Long-Context Scaling of Foundation Models [90.57254298730923]
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens.
Our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2.
arXiv Detail & Related papers (2023-09-27T21:41:49Z)
- No Length Left Behind: Enhancing Knowledge Tracing for Modeling Sequences of Excessive or Insufficient Lengths [3.2687390531088414]
Knowledge tracing aims to predict students' responses to practices based on their historical question-answering behaviors.
As sequences get longer, computational costs will increase exponentially.
We propose a model called Sequence-Flexible Knowledge Tracing (SFKT).
arXiv Detail & Related papers (2023-08-07T11:30:58Z)
- SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking [60.109453252858806]
A maximum-likelihood (MLE) objective does not match a downstream use-case of autoregressively generating high-quality sequences.
We formulate sequence generation as an imitation learning (IL) problem.
This allows us to minimize a variety of divergences between the distribution of sequences generated by an autoregressive model and sequences from a dataset.
Our resulting method, SequenceMatch, can be implemented without adversarial training or architectural changes.
arXiv Detail & Related papers (2023-06-08T17:59:58Z)
- Do Long-Range Language Models Actually Use Long-Range Context? [27.084888397778823]
Language models are generally trained on short, truncated input sequences.
Recent efforts to improve the efficiency of self-attention have led to a proliferation of long-range Transformer language models.
arXiv Detail & Related papers (2021-09-19T12:49:43Z)
- Neural Latent Dependency Model for Sequence Labeling [47.32215014130811]
A classic approach to sequence labeling is linear chain conditional random fields (CRFs).
One limitation of linear chain CRFs is their inability to model long-range dependencies between labels.
High-order CRFs extend linear chain CRFs by modeling dependencies no longer than their order, but their computational complexity grows exponentially in the order.
We propose a Neural Latent Dependency Model (NLDM) that models dependencies of arbitrary length between labels using a latent tree structure.
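For context on where the exponential cost comes from: a linear-chain (order-1) CRF computes its partition function with a forward recursion over $|Y|$ states per position, while an order-$k$ CRF must track $|Y|^k$ label histories. A minimal forward-algorithm sketch for the order-1 case (toy scores, illustrative only):

```python
# Forward algorithm for a linear-chain CRF: O(T * |Y|^2) overall.
# An order-k CRF would need |Y|^k states per position, hence the
# exponential growth in k mentioned above.
import torch

def crf_log_partition(emissions, transitions):
    # emissions: (T, Y) per-position label scores; transitions: (Y, Y) pairwise scores.
    T, Y = emissions.shape
    alpha = emissions[0]                                          # (Y,)
    for t in range(1, T):
        # alpha[j] = logsumexp_i( alpha[i] + transitions[i, j] ) + emissions[t, j]
        alpha = torch.logsumexp(alpha.unsqueeze(1) + transitions, dim=0) + emissions[t]
    return torch.logsumexp(alpha, dim=0)                          # log Z
```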
arXiv Detail & Related papers (2020-11-10T10:05:21Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our model achieves state-of-the-art performance across a wide range of applications and datasets.
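The generic tensor-train idea behind such a module is to represent a large weight tensor as a chain of small three-way cores; the toy example below reconstructs a full tensor from its cores (shapes and ranks are illustrative, not taken from the paper, which applies the decomposition inside an LSTM cell).

```python
# Generic tensor-train reconstruction: a (d1, d2, d3) tensor stored as three
# small cores and rebuilt by contracting them in a chain.
import torch

d1, d2, d3, r1, r2 = 4, 5, 6, 2, 3
g1 = torch.randn(1, d1, r1)      # core 1
g2 = torch.randn(r1, d2, r2)     # core 2
g3 = torch.randn(r2, d3, 1)      # core 3

# W[i, j, k] = sum over ranks of g1[0, i, a] * g2[a, j, b] * g3[b, k, 0]
W = torch.einsum('xia,ajb,bky->ijk', g1, g2, g3)
print(W.shape)                   # torch.Size([4, 5, 6])

# Storage: d1*r1 + r1*d2*r2 + r2*d3 parameters instead of d1*d2*d3.
```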
arXiv Detail & Related papers (2020-02-21T05:00:01Z)