Faster DAN: Multi-target Queries with Document Positional Encoding for
End-to-end Handwritten Document Recognition
- URL: http://arxiv.org/abs/2301.10593v1
- Date: Wed, 25 Jan 2023 13:55:14 GMT
- Title: Faster DAN: Multi-target Queries with Document Positional Encoding for
End-to-end Handwritten Document Recognition
- Authors: Denis Coquenet and Clément Chatelain and Thierry Paquet
- Abstract summary: Faster DAN is a two-step strategy to speed up the recognition process at prediction time.
It is at least 4 times faster on whole single-page and double-page images of the RIMES 2009, READ 2016 and MAURDOR datasets.
- Score: 1.7875811547963403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in handwritten text recognition have made it possible to
recognize whole documents in an end-to-end way: the Document Attention Network (DAN) recognizes
the characters one after the other through an attention-based prediction
process until reaching the end of the document. However, this autoregressive
process leads to inference that cannot benefit from any parallelization
optimization. In this paper, we propose Faster DAN, a two-step strategy to
speed up the recognition process at prediction time: the model predicts the
first character of each text line in the document, and then completes all the
text lines in parallel through multi-target queries and a specific document
positional encoding scheme. Faster DAN reaches competitive results compared to
standard DAN, while being at least 4 times faster on whole single-page and
double-page images of the RIMES 2009, READ 2016 and MAURDOR datasets. Source
code and trained model weights are available at
https://github.com/FactoDeepLearning/FasterDAN.
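The two-step strategy described in the abstract can be sketched in plain Python. This is a control-flow illustration only: `predict_next` is a hypothetical stand-in for the DAN transformer decoder (not the paper's API), and the toy oracle exists solely to make the sketch runnable.

```python
def faster_dan_decode(predict_next, num_lines, max_len):
    """Sketch of Faster DAN's two-step decoding.

    Step 1: predict the first character of each text line, one line
    after another. Step 2: complete all lines together -- each decoding
    step extends every unfinished line at once (multi-target queries).
    """
    # Step 1: first character of every line.
    lines = [[predict_next(i, [])] for i in range(num_lines)]

    # Step 2: complete all lines "in parallel".
    active = set(range(num_lines))
    for _ in range(max_len - 1):
        if not active:
            break
        for i in list(active):  # batched in a single forward pass in the real model
            ch = predict_next(i, lines[i])
            if ch == "<eol>":
                active.discard(i)
            else:
                lines[i].append(ch)
    return ["".join(chars) for chars in lines]

# Toy oracle that plays the role of the trained decoder.
truth = ["ab", "cde"]
def oracle(i, prefix):
    return truth[i][len(prefix)] if len(prefix) < len(truth[i]) else "<eol>"

print(faster_dan_decode(oracle, 2, 10))  # -> ['ab', 'cde']
```

The speed-up in the paper comes from step 2: one decoding step advances every line at once, so the number of sequential steps scales with the longest line rather than with the total character count of the document.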
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding [103.05835688963947]
We propose a High-resolution DocCompressor module to compress each high-resolution document image into 324 tokens.
DocOwl2 sets a new state-of-the-art across multi-page document understanding benchmarks and reduces first token latency by more than 50%.
Compared to single-image MLLMs trained on similar data, our DocOwl2 achieves comparable single-page understanding performance with less than 20% of the visual tokens.
arXiv Detail & Related papers (2024-09-05T11:09:00Z)
- Planning Ahead in Generative Retrieval: Guiding Autoregressive Generation through Simultaneous Decoding [23.061797784952855]
This paper introduces PAG, a novel optimization and decoding approach that guides autoregressive generation of document identifiers.
Experiments on MSMARCO and TREC Deep Learning Track data reveal that PAG outperforms the state-of-the-art generative retrieval model by a large margin.
arXiv Detail & Related papers (2024-04-22T21:50:01Z)
- REXEL: An End-to-end Model for Document-Level Relation Extraction and Entity Linking [11.374031643273941]
REXEL is a highly efficient and accurate model for the joint task of document-level cIE (DocIE).
It is on average 11 times faster than competitive existing approaches in a similar setting.
The combination of speed and accuracy makes REXEL an accurate, cost-efficient system for extracting structured information at web scale.
arXiv Detail & Related papers (2024-04-19T11:04:27Z)
- Context Perception Parallel Decoder for Scene Text Recognition [52.620841341333524]
Scene text recognition methods have struggled to attain high accuracy and fast inference speed.
We present an empirical study of AR decoding in STR, and discover that the AR decoder not only models linguistic context, but also provides guidance on visual context perception.
We construct a series of CPPD models and also plug the proposed modules into existing STR decoders. Experiments on both English and Chinese benchmarks demonstrate that the CPPD models achieve highly competitive accuracy while running approximately 8x faster than their AR-based counterparts.
arXiv Detail & Related papers (2023-07-23T09:04:13Z)
- Towards End-to-end Handwritten Document Recognition [0.0]
Handwritten text recognition has been widely studied in the last decades for its numerous applications.
In this thesis, we propose to tackle these issues by performing handwritten text recognition of whole documents in an end-to-end way.
We reached state-of-the-art results at paragraph level on the RIMES 2011, IAM and READ 2016 datasets and outperformed the line-level state of the art on these datasets.
arXiv Detail & Related papers (2022-09-30T10:31:22Z)
- Learning Diverse Document Representations with Deep Query Interactions for Dense Retrieval [79.37614949970013]
We propose a new dense retrieval model which learns diverse document representations with deep query interactions.
Our model encodes each document with a set of generated pseudo-queries to get query-informed, multi-view document representations.
arXiv Detail & Related papers (2022-08-08T16:00:55Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work, we propose an alternative that does not force any structure on the search space: using all n-grams in a passage as its possible identifiers.
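The n-gram identifier idea can be illustrated in a few lines of Python; `all_ngrams` is a hypothetical helper written for this sketch, not code from the paper:

```python
def all_ngrams(passage, max_n=3):
    """Return every n-gram (up to max_n tokens) of a passage.

    Any of these n-grams can serve as an identifier pointing back to the
    passage, so retrieval is not forced into a fixed hierarchical structure.
    """
    tokens = passage.split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }

print(sorted(all_ngrams("the cat sat", 2)))
# -> ['cat', 'cat sat', 'sat', 'the', 'the cat']
```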
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
- DAN: a Segmentation-free Document Attention Network for Handwritten Document Recognition [1.7875811547963403]
We propose an end-to-end segmentation-free architecture for handwritten document recognition.
The model is trained to label text parts using begin and end tags in an XML-like fashion.
We achieve competitive results on the READ dataset at page level, as well as double-page level with a CER of 3.53% and 3.69%, respectively.
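The begin/end tagging scheme can be illustrated with a small serializer. The tag names `<page>` and `<line>` are illustrative assumptions, not the paper's exact token set:

```python
def to_tagged_target(pages):
    """Serialize a document's text parts with begin and end tags,
    in the XML-like fashion the training targets are labeled.
    `pages` is a list of pages, each a list of text-line strings."""
    out = []
    for page in pages:
        out.append("<page>")
        for line in page:
            out.append("<line>" + line + "</line>")
        out.append("</page>")
    return "".join(out)

print(to_tagged_target([["hello", "world"]]))
# -> <page><line>hello</line><line>world</line></page>
```

Training on such serialized targets lets a single character-level decoder recover both the text and the document's layout structure, with no explicit segmentation step.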
arXiv Detail & Related papers (2022-03-23T08:40:42Z)
- SDR: Efficient Neural Re-ranking using Succinct Document Representation [4.9278175139681215]
We propose the Succinct Document Representation scheme that computes highly compressed intermediate document representations.
Our method is highly efficient, achieving 4x-11.6x better compression rates for the same ranking quality.
arXiv Detail & Related papers (2021-10-03T07:43:16Z)
- Rethinking Text Line Recognition Models [57.47147190119394]
We consider two decoder families (Connectionist Temporal Classification and Transformer) and three encoder modules (Bidirectional LSTMs, Self-Attention, and GRCLs).
We compare their accuracy and performance on widely used public datasets of scene and handwritten text.
Unlike the more common Transformer-based models, this architecture can handle inputs of arbitrary length.
arXiv Detail & Related papers (2021-04-15T21:43:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.