DocMamba: Efficient Document Pre-training with State Space Model
- URL: http://arxiv.org/abs/2409.11887v1
- Date: Wed, 18 Sep 2024 11:34:28 GMT
- Title: DocMamba: Efficient Document Pre-training with State Space Model
- Authors: Pengfei Hu, Zhenrong Zhang, Jiefeng Ma, Shuhang Liu, Jun Du, Jianshu Zhang
- Abstract summary: We present DocMamba, a novel framework based on the state space model.
It is designed to reduce computational complexity to linear while preserving global modeling capabilities.
Experiments on HRDoc confirm DocMamba's potential for length extrapolation.
- Score: 56.84200017560988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, visually-rich document understanding has attracted increasing attention. Transformer-based pre-trained models have become the mainstream approach, yielding significant performance gains in this field. However, the self-attention mechanism's quadratic computational complexity hinders their efficiency and ability to process long documents. In this paper, we present DocMamba, a novel framework based on the state space model. It is designed to reduce computational complexity to linear while preserving global modeling capabilities. To further enhance its effectiveness in document processing, we introduce the Segment-First Bidirectional Scan (SFBS) to capture contiguous semantic information. Experimental results demonstrate that DocMamba achieves new state-of-the-art results on downstream datasets such as FUNSD, CORD, and SROIE, while significantly improving speed and reducing memory usage. Notably, experiments on HRDoc confirm DocMamba's potential for length extrapolation. The code will be available online.
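The abstract names two ingredients: a linear-complexity state space (Mamba-style) backbone and the Segment-First Bidirectional Scan (SFBS), which keeps tokens of the same segment contiguous before scanning. Since the listing gives no implementation details, the following is a minimal Python sketch under stated assumptions: the Token fields, sfbs_order, and the toy linear_scan recurrence are hypothetical stand-ins for the paper's selective scan, not the authors' code.

```python
# Hypothetical sketch of SFBS-style token ordering plus a bidirectional linear scan.
# All names and the recurrence are illustrative assumptions, not DocMamba's implementation.
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class Token:
    text: str
    segment_id: int        # which semantic segment (e.g. a field or paragraph) the token belongs to
    order_in_segment: int  # reading order inside that segment

def sfbs_order(tokens: List[Token]) -> List[Token]:
    """Arrange tokens segment-first: tokens of one segment stay contiguous,
    segments follow one another, so the scan sees contiguous semantics."""
    return sorted(tokens, key=lambda t: (t.segment_id, t.order_in_segment))

def linear_scan(x: np.ndarray, decay: float = 0.9) -> np.ndarray:
    """Toy linear-time recurrence h_t = decay * h_{t-1} + x_t, standing in for a
    selective state space update (O(n) in sequence length)."""
    h = np.zeros_like(x[0])
    out = np.empty_like(x)
    for t in range(len(x)):
        h = decay * h + x[t]
        out[t] = h
    return out

def bidirectional_scan(x: np.ndarray) -> np.ndarray:
    """Forward plus backward scan, summed, so every position sees both directions."""
    return linear_scan(x) + linear_scan(x[::-1])[::-1]

if __name__ == "__main__":
    tokens = [Token("Total:", 2, 0), Token("Invoice", 0, 0), Token("$42", 2, 1),
              Token("No.", 0, 1), Token("123", 0, 2)]
    ordered = sfbs_order(tokens)
    print([t.text for t in ordered])                    # segments kept contiguous before scanning
    feats = np.random.default_rng(0).normal(size=(len(ordered), 4))
    print(bidirectional_scan(feats).shape)               # (5, 4): cost grows linearly with length
```

The point of the sketch is only that segment-first ordering plus a forward and a backward linear recurrence gives O(n) work per token while still mixing information from both directions.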
Related papers
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception [16.301481927603554]
We introduce DocLayout-YOLO, a novel approach that enhances accuracy while maintaining speed advantages.
For robust document pre-training, we introduce the Mesh-candidate BestFit algorithm.
In terms of model optimization, we propose a Global-to-Local Controllable Receptive Module.
arXiv Detail & Related papers (2024-10-16T14:50:47Z) - Scaling Up Diffusion and Flow-based XGBoost Models [5.944645679491607]
We investigate a recent proposal to use XGBoost as the function approximator in diffusion and flow-matching models.
With a better implementation, it can be scaled to datasets 370x larger than previously used.
We present results on large-scale scientific datasets as part of the Fast Calorimeter Simulation Challenge.
arXiv Detail & Related papers (2024-08-28T18:00:00Z) - Bidirectional Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction.
We introduce a new framework named Selective Gated Mamba (SIGMA) for Sequential Recommendation.
Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z) - Efficient Document Ranking with Learnable Late Interactions [73.41976017860006]
Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval.
To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings.
Recently, late-interaction models have been proposed to realize more favorable latency-quality tradeoffs by using a DE structure followed by a lightweight scorer (a generic sketch of this scoring style follows the list below).
arXiv Detail & Related papers (2024-06-25T22:50:48Z) - ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding [52.3895498789521]
We propose ERNIE-Layout, a novel document pre-training solution with layout knowledge enhancement.
We first rearrange input sequences in the serialization stage, then present a correlative pre-training task, reading order prediction, to learn the proper reading order of documents.
Experimental results show that ERNIE-Layout achieves superior performance on various downstream tasks, setting new state-of-the-art results on key information extraction and document question answering.
arXiv Detail & Related papers (2022-10-12T12:59:24Z) - Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work, we propose an alternative that does not force any structure on the search space: using all n-grams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z) - Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z) - VAULT: VAriable Unified Long Text Representation for Machine Reading Comprehension [31.639069657951747]
Existing models for Machine Reading Comprehension require complex architectures to model long texts with paragraph representation and classification.
We propose VAULT: a lightweight and parallel-efficient paragraph representation for MRC, built on contextualized representations of long-document input.
arXiv Detail & Related papers (2021-05-07T13:03:43Z) - Efficient Attentions for Long Document Summarization [25.234852272297598]
Hepos is a novel efficient encoder-decoder attention with head-wise positional strides.
We are able to process ten times more tokens than existing models that use full attention.
arXiv Detail & Related papers (2021-04-05T18:45:13Z) - ERNIE-DOC: The Retrospective Long-Document Modeling Transformer [24.426571160930635]
We propose ERNIE-DOC, a document-level language pretraining model based on Recurrence Transformers.
Two well-designed techniques, namely the retrospective feed mechanism and the enhanced recurrence mechanism, equip ERNIE-DOC with a much longer effective context length.
Various experiments on both English and Chinese document-level tasks are conducted.
arXiv Detail & Related papers (2020-12-31T16:12:48Z)
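The late-interaction entry above contrasts Cross-Encoder scoring (joint query-document embeddings) with Dual-Encoder scoring (factorized embeddings) and with a DE structure followed by a lightweight scorer. As a generic reference only, here is a tiny NumPy sketch of the common MaxSim-style late-interaction score; it is not the learnable scorer that paper proposes, and all shapes and values are fabricated for illustration.

```python
# Toy MaxSim-style late-interaction scorer, shown only to illustrate the
# DE vs. late-interaction contrast summarized above; this is the generic recipe,
# not the paper's learnable scorer, and the embeddings are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
q_tok = rng.normal(size=(4, 8))    # per-token query embeddings (4 tokens, dim 8)
d_tok = rng.normal(size=(12, 8))   # per-token document embeddings, stored offline (DE-style)

# Dual-Encoder baseline: pool each side independently, one dot product per document.
de_score = float(q_tok.mean(axis=0) @ d_tok.mean(axis=0))

# Late interaction: keep factorized token embeddings, then apply a lightweight scorer:
# for each query token take its best-matching document token and sum those maxima.
sim = q_tok @ d_tok.T                      # (4, 12) token-to-token similarities
late_score = float(sim.max(axis=1).sum())  # MaxSim reduction

print(f"DE score: {de_score:.3f}, late-interaction score: {late_score:.3f}")
```

The design point is that the document-side token embeddings can still be precomputed and indexed as in a Dual-Encoder, while the cheap token-level interaction recovers much of the quality of joint encoding.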