DIVA-DAF: A Deep Learning Framework for Historical Document Image Analysis
- URL: http://arxiv.org/abs/2201.08295v3
- Date: Thu, 15 Feb 2024 10:42:25 GMT
- Title: DIVA-DAF: A Deep Learning Framework for Historical Document Image Analysis
- Authors: Lars Vögtlin, Anna Scius-Bertrand, Paul Maergner, Andreas Fischer, Rolf Ingold
- Abstract summary: We propose an open-source deep learning framework, DIVA-DAF, specifically designed for historical document analysis.
It is easy to create one's own tasks with the benefit of powerful modules for loading data, even large data sets.
Thanks to its data module, the framework also makes it possible to reduce model training time significantly.
- Score: 0.6551090704585544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning methods have shown strong performance in solving tasks for
historical document image analysis. However, despite current libraries and
frameworks, programming an experiment or a set of experiments and executing
them can be time-consuming. This is why we propose an open-source deep learning
framework, DIVA-DAF, which is based on PyTorch Lightning and specifically
designed for historical document analysis. Pre-implemented tasks such as
segmentation and classification can be easily used or customized. It is also
easy to create one's own tasks with the benefit of powerful modules for loading
data, even large data sets, and different forms of ground truth. The
applications conducted have demonstrated time savings for the programming of a
document analysis task, as well as for different scenarios such as pre-training
or changing the architecture. Thanks to its data module, the framework also
makes it possible to reduce model training time significantly.
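DIVA-DAF inherits PyTorch Lightning's separation between a task (the training logic) and a data module (data loading and splitting). The plain-Python sketch below only illustrates that design idea; the class and method names loosely mirror Lightning conventions and are not the actual DIVA-DAF or Lightning API:

```python
# Illustrative sketch of the task / data-module separation used by
# PyTorch Lightning-style frameworks. These classes are NOT part of
# the real DIVA-DAF API; they only mirror the design idea.

class DataModule:
    """Owns data loading and splitting, independent of any model."""
    def __init__(self, samples):
        self.samples = samples
        self.train, self.val = [], []

    def setup(self):
        split = int(0.8 * len(self.samples))
        self.train = self.samples[:split]
        self.val = self.samples[split:]

class ClassificationTask:
    """Owns the training logic; knows nothing about where data comes from."""
    def __init__(self):
        self.steps = 0

    def training_step(self, batch):
        self.steps += 1  # a real task would compute a loss here
        return {"loss": 0.0, "batch_size": len(batch)}

def fit(task, datamodule, batch_size=2):
    """A trainer wires the two together without either knowing the other."""
    datamodule.setup()
    for i in range(0, len(datamodule.train), batch_size):
        task.training_step(datamodule.train[i:i + batch_size])
    return task.steps

dm = DataModule(samples=list(range(10)))
print(fit(ClassificationTask(), dm))  # 4 steps: 8 train samples, batch size 2
```

Because the task and the data module only meet inside the trainer, either side can be swapped independently, which is what makes scenarios like pre-training or changing the architecture cheap.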
Related papers
- CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation [51.2289822267563]
We propose Corpus Retrieval and Augmentation for Fine-Tuning (CRAFT), a method for generating synthetic datasets.
We use large-scale public web-crawled corpora and similarity-based document retrieval to find other relevant human-written documents.
We demonstrate that CRAFT can efficiently generate large-scale task-specific training datasets for four diverse tasks.
arXiv Detail & Related papers (2024-09-03T17:54:40Z)
- DistilDoc: Knowledge Distillation for Visually-Rich Document Applications [22.847266820057985]
This work explores knowledge distillation for visually-rich document applications such as document layout analysis (DLA) and document image classification (DIC).
We design a KD experimentation methodology for more lean, performant models on document understanding tasks that are integral within larger task pipelines.
We study what affects the teacher-student knowledge gap and find that some methods (tuned vanilla KD, MSE, SimKD with an apt projector) can consistently outperform supervised student training.
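The distillation objectives named above (tuned vanilla KD with softened distributions, and MSE logit matching) can be sketched in a few lines. The temperature and the example logits below are illustrative values, not taken from the paper:

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with optional temperature softening."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_kl_loss(teacher_logits, student_logits, temperature=2.0):
    """Vanilla KD: KL divergence between softened teacher and student distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def kd_mse_loss(teacher_logits, student_logits):
    """Logit-matching KD: mean squared error between raw logits."""
    n = len(teacher_logits)
    return sum((t - s) ** 2 for t, s in zip(teacher_logits, student_logits)) / n

teacher = [3.0, 1.0, 0.2]   # illustrative teacher logits
student = [2.5, 1.2, 0.1]   # illustrative student logits
print(round(kd_kl_loss(teacher, student), 4))
print(round(kd_mse_loss(teacher, student), 4))  # (0.25 + 0.04 + 0.01) / 3 = 0.1
```

Both losses go to zero when the student matches the teacher, but they weight errors differently: KL emphasizes the teacher's high-probability classes, while MSE penalizes every logit equally.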
arXiv Detail & Related papers (2024-06-12T13:55:12Z)
- An Integrated Data Processing Framework for Pretraining Foundation Models [57.47845148721817]
Researchers and practitioners often have to manually curate datasets from different sources.
We propose a data processing framework that integrates a Processing Module and an Analyzing Module.
The proposed framework is easy to use and highly flexible.
arXiv Detail & Related papers (2024-02-26T07:22:51Z)
- Data Efficient Training of a U-Net Based Architecture for Structured Documents Localization [0.0]
We propose SDL-Net: a novel U-Net-like encoder-decoder architecture for the localization of structured documents.
Our approach allows pre-training the encoder of SDL-Net on a generic dataset containing samples of various document classes.
arXiv Detail & Related papers (2023-10-02T07:05:19Z)
- Data-Free Sketch-Based Image Retrieval [56.96186184599313]
We propose Data-Free (DF)-SBIR, where pre-trained, single-modality classification models have to be leveraged to learn cross-modal metric-space for retrieval without access to any training data.
We present a methodology for DF-SBIR, which can leverage knowledge from models independently trained to perform classification on photos and sketches.
Our method also achieves mAPs competitive with data-dependent approaches, all the while requiring no training data.
arXiv Detail & Related papers (2023-03-14T10:34:07Z)
- Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer [80.50327229467993]
We show that a single model trained end-to-end can achieve both competitive retrieval and QA performance.
We show that end-to-end adaptation significantly boosts its performance on out-of-domain datasets in both supervised and unsupervised settings.
arXiv Detail & Related papers (2022-12-05T04:51:21Z)
- Self-Supervised Visual Representation Learning Using Lightweight Architectures [0.0]
In self-supervised learning, a model is trained to solve a pretext task, using a data set whose annotations are created by a machine.
We critically examine the most notable pretext tasks to extract features from image data.
We study the performance of various self-supervised techniques keeping all other parameters uniform.
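A classic example of a machine-created pretext annotation is rotation prediction: rotate each image by a random multiple of 90 degrees and ask the model to recover the rotation. A toy sketch (the 2x2 "image" and the helper names are illustrative, not from the paper):

```python
import random

def rotate90(image):
    """Rotate a 2D list (rows x cols) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def make_rotation_pretext(image, rng):
    """Machine-generated annotation: rotate the image by k * 90 degrees
    and use k (0..3) as the classification label -- no human labeling needed."""
    k = rng.randrange(4)
    rotated = image
    for _ in range(k):
        rotated = rotate90(rotated)
    return rotated, k

rng = random.Random(0)          # seeded for reproducibility
image = [[1, 2], [3, 4]]        # stand-in for a real pixel array
sample, label = make_rotation_pretext(image, rng)
print(label, sample)
```

The point of such pretext tasks is that the labels are free: solving them forces the encoder to learn features (orientation, layout) that transfer to downstream tasks.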
arXiv Detail & Related papers (2021-10-21T14:13:10Z)
- LibFewShot: A Comprehensive Library for Few-shot Learning [78.58842209282724]
Few-shot learning, especially few-shot image classification, has received increasing attention and witnessed significant advances in recent years.
Some recent studies implicitly show that many generic techniques or tricks, such as data augmentation, pre-training, knowledge distillation, and self-supervision, may greatly boost the performance of a few-shot learning method.
We propose a comprehensive library for few-shot learning (LibFewShot) by re-implementing seventeen state-of-the-art few-shot learning methods in a unified framework with the same single codebase in PyTorch.
arXiv Detail & Related papers (2021-09-10T14:12:37Z)
- LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis [3.4253416336476246]
This paper introduces layoutparser, an open-source library for streamlining the usage of deep learning (DL) models in document image analysis (DIA) research and applications.
layoutparser comes with a set of simple and intuitive interfaces for applying and customizing DL models for layout detection, character recognition, and many other document processing tasks.
We demonstrate that layoutparser is helpful for both lightweight and large-scale pipelines in real-world use cases.
arXiv Detail & Related papers (2021-03-29T05:55:08Z)
- Comparative Code Structure Analysis using Deep Learning for Performance Prediction [18.226950022938954]
This paper aims to assess the feasibility of using purely static information (e.g., abstract syntax tree or AST) of applications to predict performance change based on the change in code structure.
Our evaluations of several deep embedding learning methods demonstrate that tree-based Long Short-Term Memory (LSTM) models can leverage the hierarchical structure of source code to discover latent representations and achieve up to 84% (individual problem) and 73% (combined dataset with multiple problems) accuracy in predicting the change in performance.
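The purely static structural input such a model consumes can be illustrated with Python's built-in ast module. The histogram diff below is a deliberately simplified stand-in for the learned tree embeddings described in the paper; the example functions are invented for illustration:

```python
import ast
from collections import Counter

def ast_node_histogram(source):
    """Count AST node types -- a purely static structural feature of the
    code, extracted without running it. A Tree-LSTM would instead embed
    the tree itself, preserving its hierarchy."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))

# Two versions of the same (hypothetical) function:
before = "def f(xs):\n    return sum(xs)\n"
after = "def f(xs):\n    total = 0\n    for x in xs:\n        total += x\n    return total\n"

h_before = ast_node_histogram(before)
h_after = ast_node_histogram(after)

# A structural diff between the two versions -- the kind of signal a
# performance-change predictor could learn from:
changed = {k: h_after[k] - h_before[k]
           for k in set(h_before) | set(h_after)
           if h_after[k] != h_before[k]}
print(sorted(changed))
```

Here the rewrite introduces a For loop and an AugAssign where a single Call used to be; a learned model would map such structural changes to an expected performance delta.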
arXiv Detail & Related papers (2021-02-12T16:59:12Z)
- KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT).
All tasks in KILT are grounded in the same snapshot of Wikipedia.
We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
arXiv Detail & Related papers (2020-09-04T15:32:19Z)
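The "shared dense vector index" part of that baseline can be illustrated with a toy retriever. The normalized bag-of-words vectors below merely stand in for learned dense embeddings (a real KILT baseline pairs a neural encoder such as DPR with a seq2seq reader, neither of which this sketch implements):

```python
import math

def build_vocab(texts):
    """Fixed vocabulary over the document collection."""
    vocab = sorted({tok for t in texts for tok in t.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}

def embed(text, vocab):
    """Toy 'dense' vector: L2-normalized bag of words. A learned encoder
    would produce a low-dimensional semantic vector instead."""
    vec = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(query, docs, vocab, k=1):
    """Rank documents by inner product with the query vector."""
    q = embed(query, vocab)
    scored = sorted(docs,
                    key=lambda d: -sum(a * b for a, b in zip(q, embed(d, vocab))))
    return scored[:k]

docs = [
    "paris is the capital of france",
    "the eiffel tower was completed in 1889",
    "document image analysis of historical manuscripts",
]
vocab = build_vocab(docs)
print(retrieve("what is the capital of france", docs, vocab))
# → ['paris is the capital of france']
```

Because every task in KILT is grounded in the same Wikipedia snapshot, a single such index can serve all of them; the retrieved passage is then handed to a seq2seq model to generate the answer.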
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.