LayoutParser: A Unified Toolkit for Deep Learning Based Document Image
Analysis
- URL: http://arxiv.org/abs/2103.15348v1
- Date: Mon, 29 Mar 2021 05:55:08 GMT
- Title: LayoutParser: A Unified Toolkit for Deep Learning Based Document Image
Analysis
- Authors: Zejiang Shen, Ruochen Zhang, Melissa Dell, Benjamin Charles Germain
Lee, Jacob Carlson, Weining Li
- Abstract summary: This paper introduces layoutparser, an open-source library for streamlining the usage of deep learning (DL) models in document image analysis (DIA) research and applications.
layoutparser comes with a set of simple and intuitive interfaces for applying and customizing DL models for layout detection, character recognition, and many other document processing tasks.
We demonstrate that layoutparser is helpful for both lightweight and large-scale pipelines in real-world use cases.
- Score: 3.4253416336476246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in document image analysis (DIA) have been primarily driven
by the application of neural networks. Ideally, research outcomes could be
easily deployed in production and extended for further investigation. However,
various factors like loosely organized codebases and sophisticated model
configurations complicate the easy reuse of important innovations by a wide
audience. Though there have been on-going efforts to improve reusability and
simplify deep learning (DL) model development in disciplines like natural
language processing and computer vision, none of them are optimized for
challenges in the domain of DIA. This represents a major gap in the existing
toolkit, as DIA is central to academic research across a wide range of
disciplines in the social sciences and humanities. This paper introduces
layoutparser, an open-source library for streamlining the usage of DL in DIA
research and applications. The core layoutparser library comes with a set of
simple and intuitive interfaces for applying and customizing DL models for
layout detection, character recognition, and many other document processing
tasks. To promote extensibility, layoutparser also incorporates a community
platform for sharing both pre-trained models and full document digitization
pipelines. We demonstrate that layoutparser is helpful for both lightweight and
large-scale digitization pipelines in real-world use cases. The library is
publicly available at https://layout-parser.github.io/.
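As a concrete illustration of the interfaces described above, here is a minimal sketch of a detection-plus-OCR pipeline built on layoutparser's documented Detectron2 and Tesseract wrappers; the model zoo path, score threshold, and label map are illustrative choices rather than prescribed defaults, and the detectron2 and pytesseract backends must be installed separately.

```python
import cv2
import layoutparser as lp

# Load a page image and convert BGR (OpenCV default) to RGB.
image = cv2.imread("page.png")[..., ::-1]

# A pre-trained layout detection model from the layoutparser model zoo
# (PubLayNet Faster R-CNN); threshold and label map are illustrative.
model = lp.Detectron2LayoutModel(
    "lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config",
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
)

# Detect layout regions, then run OCR on the text blocks only.
layout = model.detect(image)
text_blocks = lp.Layout([b for b in layout if b.type == "Text"])

ocr_agent = lp.TesseractAgent(languages="eng")
for block in text_blocks:
    segment = block.pad(left=5, right=5, top=5, bottom=5).crop_image(image)
    block.set(text=ocr_agent.detect(segment), inplace=True)

for block in text_blocks:
    print(block.text)
```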
Related papers
- EduNLP: Towards a Unified and Modularized Library for Educational Resources [78.8523961816045]
We present a unified, modularized, and extensive library, EduNLP, focusing on educational resource understanding.
In the library, we decouple the whole workflow into four key modules with consistent interfaces: data configuration, processing, model implementation, and model evaluation.
For the current version, we primarily provide 10 typical models from four categories, and 5 common downstream evaluation tasks in the education domain across 8 subjects.
arXiv Detail & Related papers (2024-06-03T12:45:40Z)
- U-DIADS-Bib: a full and few-shot pixel-precise dataset for document layout analysis of ancient manuscripts [9.76730765089929]
U-DIADS-Bib is a novel, pixel-precise, non-overlapping and noiseless document layout analysis dataset developed in close collaboration between specialists in the fields of computer vision and humanities.
We also propose a novel computer-aided segmentation pipeline to alleviate the burden of the time-consuming manual annotation process.
arXiv Detail & Related papers (2024-01-16T15:11:18Z)
- Enhancing Visually-Rich Document Understanding via Layout Structure Modeling [91.07963806829237]
We propose GraphLM, a novel document understanding model that injects layout knowledge into the model.
We evaluate our model on various benchmarks, including FUNSD, XFUND and CORD, and achieve state-of-the-art results.
arXiv Detail & Related papers (2023-08-15T13:53:52Z)
- Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents [54.744701806413204]
Recent work has shown that infusing layout features into language models (LMs) improves processing of visually-rich documents such as scientific papers.
We test whether layout-infused LMs are robust to layout distribution shifts.
arXiv Detail & Related papers (2023-06-01T18:01:33Z)
- SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation [15.953725529361874]
Document layout analysis is a well-known problem in the document research community.
With growing internet connectivity in personal life, an enormous number of documents have become available in the public domain.
We address this challenge using self-supervision, in contrast to the few existing self-supervised document segmentation approaches.
arXiv Detail & Related papers (2023-05-01T12:47:55Z)
- ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding [52.3895498789521]
We propose ERNIE-Layout, a novel document pre-training solution with layout knowledge enhancement.
We first rearrange input sequences in the serialization stage, then present a correlative pre-training task, reading order prediction, to learn the proper reading order of documents.
Experimental results show ERNIE-Layout achieves superior performance on various downstream tasks, setting a new state of the art on key information extraction and document question answering.
arXiv Detail & Related papers (2022-10-12T12:59:24Z)
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis [2.9923891863939938]
Document layout analysis is a key requirement for high-quality PDF document conversion.
Deep-learning models have proven to be very effective at layout detection and segmentation.
We present DocLayNet, a new, publicly available, document-layout annotation dataset.
arXiv Detail & Related papers (2022-06-02T14:25:12Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
- DIVA-DAF: A Deep Learning Framework for Historical Document Image Analysis [0.6551090704585544]
We propose an open-source deep learning framework, DIVA-DAF, specifically designed for historical document analysis.
It makes it easy to create custom tasks, with the benefit of powerful modules for loading data, even large datasets.
Thanks to its data module, the framework also allows model training time to be reduced significantly.
arXiv Detail & Related papers (2022-01-20T17:02:46Z)
- One-shot Key Information Extraction from Document with Deep Partial Graph Matching [60.48651298832829]
Key Information Extraction (KIE) from documents improves efficiency, productivity, and security in many industrial scenarios.
Existing supervised learning methods for the KIE task require a large number of labeled samples and learn separate models for different types of documents.
We propose a deep end-to-end trainable network for one-shot KIE using partial graph matching.
arXiv Detail & Related papers (2021-09-26T07:45:53Z)
- CLEVR Parser: A Graph Parser Library for Geometric Learning on Language Grounded Image Scenes [2.750124853532831]
The CLEVR dataset has been used extensively for language-grounded visual reasoning in the Machine Learning (ML) and Natural Language Processing (NLP) domains.
We present a graph library for CLEVR that provides functionalities for object-centric attributes and relationships extraction, and construction of structural graph representations for dual modalities.
We discuss downstream usage and applications of the library, and how it accelerates research for the NLP research community.
arXiv Detail & Related papers (2020-09-19T03:32:37Z)
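The CLEVR Parser library's own API is not reproduced here, but the kind of object-centric graph construction it describes can be sketched directly from a CLEVR scene annotation. The snippet below is a minimal illustration using networkx, assuming the standard CLEVR scenes JSON layout (an "objects" list with attribute fields and a "relationships" dict of per-object index lists); the file path is hypothetical.

```python
import json
import networkx as nx

def clevr_scene_to_graph(scene: dict) -> nx.DiGraph:
    """Build a directed scene graph: nodes carry object attributes,
    edges carry spatial relations such as "left" or "behind"."""
    G = nx.DiGraph()
    for idx, obj in enumerate(scene["objects"]):
        G.add_node(idx, shape=obj["shape"], size=obj["size"],
                   color=obj["color"], material=obj["material"])
    for relation, per_object in scene["relationships"].items():
        for target, sources in enumerate(per_object):
            for source in sources:
                # An edge (source -> target) labeled "left" means object
                # `source` stands in the "left" relation to object `target`.
                G.add_edge(source, target, relation=relation)
    return G

if __name__ == "__main__":
    with open("CLEVR_val_scenes.json") as f:  # hypothetical local path
        scenes = json.load(f)["scenes"]
    graph = clevr_scene_to_graph(scenes[0])
    print(graph.number_of_nodes(), "objects,", graph.number_of_edges(), "relations")
```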
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all of its content) and is not responsible for any consequences of its use.