DIVA-DAF: A Deep Learning Framework for Historical Document Image Analysis
- URL: http://arxiv.org/abs/2201.08295v3
- Date: Thu, 15 Feb 2024 10:42:25 GMT
- Title: DIVA-DAF: A Deep Learning Framework for Historical Document Image Analysis
- Authors: Lars Vögtlin, Anna Scius-Bertrand, Paul Maergner, Andreas Fischer, Rolf Ingold
- Abstract summary: We propose an open-source deep learning framework, DIVA-DAF, specifically designed for historical document analysis.
It is easy to create one's own tasks with the benefit of powerful modules for loading data, even large data sets.
Thanks to its data module, the framework also makes it possible to reduce model training time significantly.
- Score: 0.6551090704585544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning methods have shown strong performance in solving tasks for
historical document image analysis. However, despite current libraries and
frameworks, programming an experiment or a set of experiments and executing
them can be time-consuming. This is why we propose an open-source deep learning
framework, DIVA-DAF, which is based on PyTorch Lightning and specifically
designed for historical document analysis. Pre-implemented tasks such as
segmentation and classification can be easily used or customized. It is also
easy to create one's own tasks with the benefit of powerful modules for loading
data, even large data sets, and different forms of ground truth. The
applications conducted have demonstrated time savings for the programming of a
document analysis task, as well as for different scenarios such as pre-training
or changing the architecture. Thanks to its data module, the framework also
makes it possible to reduce model training time significantly.
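DIVA-DAF inherits PyTorch Lightning's separation between a task (the training logic) and a data module (data loading and splitting). The plain-Python sketch below only illustrates that design idea; the class and method names loosely mirror Lightning conventions and are not the actual DIVA-DAF or Lightning API:

```python
# Illustrative sketch of the task / data-module separation used by
# PyTorch Lightning-style frameworks. These classes are NOT part of
# the real DIVA-DAF API; they only mirror the design idea.

class DataModule:
    """Owns data loading and splitting, independent of any model."""
    def __init__(self, samples):
        self.samples = samples
        self.train, self.val = [], []

    def setup(self):
        split = int(0.8 * len(self.samples))
        self.train = self.samples[:split]
        self.val = self.samples[split:]

class ClassificationTask:
    """Owns the training logic; knows nothing about where data comes from."""
    def __init__(self):
        self.steps = 0

    def training_step(self, batch):
        self.steps += 1  # a real task would compute a loss here
        return {"loss": 0.0, "batch_size": len(batch)}

def fit(task, datamodule, batch_size=2):
    """A trainer wires the two together without either knowing the other."""
    datamodule.setup()
    for i in range(0, len(datamodule.train), batch_size):
        task.training_step(datamodule.train[i:i + batch_size])
    return task.steps

dm = DataModule(samples=list(range(10)))
print(fit(ClassificationTask(), dm))  # 4 steps: 8 train samples, batch size 2
```

Because the task and the data module only meet inside the trainer, either side can be swapped independently, which is what makes scenarios like pre-training or changing the architecture cheap.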
Related papers
- CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation [51.2289822267563]
We propose Corpus Retrieval and Augmentation for Fine-Tuning (CRAFT), a method for generating synthetic datasets.
We use large-scale public web-crawled corpora and similarity-based document retrieval to find other relevant human-written documents.
We demonstrate that CRAFT can efficiently generate large-scale task-specific training datasets for four diverse tasks.
arXiv Detail & Related papers (2024-09-03T17:54:40Z)
- DistilDoc: Knowledge Distillation for Visually-Rich Document Applications [22.847266820057985]
This work explores knowledge distillation for visually-rich document applications such as document layout analysis (DLA) and document image classification (DIC).
We design a KD experimentation methodology for more lean, performant models on document understanding tasks that are integral within larger task pipelines.
We study what affects the teacher-student knowledge gap and find that some methods (tuned vanilla KD, MSE, SimKD with an apt projector) can consistently outperform supervised student training.
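The distillation objectives named above (tuned vanilla KD with softened distributions, and MSE logit matching) can be sketched in a few lines. The temperature and the example logits below are illustrative values, not taken from the paper:

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with optional temperature softening."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_kl_loss(teacher_logits, student_logits, temperature=2.0):
    """Vanilla KD: KL divergence between softened teacher and student distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def kd_mse_loss(teacher_logits, student_logits):
    """Logit-matching KD: mean squared error between raw logits."""
    n = len(teacher_logits)
    return sum((t - s) ** 2 for t, s in zip(teacher_logits, student_logits)) / n

teacher = [3.0, 1.0, 0.2]   # illustrative teacher logits
student = [2.5, 1.2, 0.1]   # illustrative student logits
print(round(kd_kl_loss(teacher, student), 4))
print(round(kd_mse_loss(teacher, student), 4))  # (0.25 + 0.04 + 0.01) / 3 = 0.1
```

Both losses go to zero when the student matches the teacher, but they weight errors differently: KL emphasizes the teacher's high-probability classes, while MSE penalizes every logit equally.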
arXiv Detail & Related papers (2024-06-12T13:55:12Z)
- An Integrated Data Processing Framework for Pretraining Foundation Models [57.47845148721817]
Researchers and practitioners often have to manually curate datasets from different sources.
We propose a data processing framework that integrates a Processing Module and an Analyzing Module.
The proposed framework is easy to use and highly flexible.
arXiv Detail & Related papers (2024-02-26T07:22:51Z)
- Data Efficient Training of a U-Net Based Architecture for Structured Documents Localization [0.0]
We propose SDL-Net: a novel U-Net-like encoder-decoder architecture for the localization of structured documents.
Our approach allows pre-training the encoder of SDL-Net on a generic dataset containing samples of various document classes.
arXiv Detail & Related papers (2023-10-02T07:05:19Z)
- Data-Free Sketch-Based Image Retrieval [56.96186184599313]
We propose Data-Free (DF)-SBIR, where pre-trained, single-modality classification models have to be leveraged to learn cross-modal metric-space for retrieval without access to any training data.
We present a methodology for DF-SBIR, which can leverage knowledge from models independently trained to perform classification on photos and sketches.
Our method also achieves mAPs competitive with data-dependent approaches, all the while requiring no training data.
arXiv Detail & Related papers (2023-03-14T10:34:07Z)
- Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer [80.50327229467993]
We show that a single model trained end-to-end can achieve both competitive retrieval and QA performance.
We show that end-to-end adaptation significantly boosts its performance on out-of-domain datasets in both supervised and unsupervised settings.
arXiv Detail & Related papers (2022-12-05T04:51:21Z)
- Self-Supervised Visual Representation Learning Using Lightweight Architectures [0.0]
In self-supervised learning, a model is trained to solve a pretext task, using a data set whose annotations are created by a machine.
We critically examine the most notable pretext tasks to extract features from image data.
We study the performance of various self-supervised techniques keeping all other parameters uniform.
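A classic example of a machine-created pretext annotation is rotation prediction: rotate each image by a random multiple of 90 degrees and ask the model to recover the rotation. A toy sketch (the 2x2 "image" and the helper names are illustrative, not from the paper):

```python
import random

def rotate90(image):
    """Rotate a 2D list (rows x cols) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def make_rotation_pretext(image, rng):
    """Machine-generated annotation: rotate the image by k * 90 degrees
    and use k (0..3) as the classification label -- no human labeling needed."""
    k = rng.randrange(4)
    rotated = image
    for _ in range(k):
        rotated = rotate90(rotated)
    return rotated, k

rng = random.Random(0)          # seeded for reproducibility
image = [[1, 2], [3, 4]]        # stand-in for a real pixel array
sample, label = make_rotation_pretext(image, rng)
print(label, sample)
```

The point of such pretext tasks is that the labels are free: solving them forces the encoder to learn features (orientation, layout) that transfer to downstream tasks.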
arXiv Detail & Related papers (2021-10-21T14:13:10Z)
- LibFewShot: A Comprehensive Library for Few-shot Learning [78.58842209282724]
Few-shot learning, especially few-shot image classification, has received increasing attention and witnessed significant advances in recent years.
Some recent studies implicitly show that many generic techniques or tricks, such as data augmentation, pre-training, knowledge distillation, and self-supervision, may greatly boost the performance of a few-shot learning method.
We propose a comprehensive library for few-shot learning (LibFewShot) by re-implementing seventeen state-of-the-art few-shot learning methods in a unified framework with the same single codebase in PyTorch.
arXiv Detail & Related papers (2021-09-10T14:12:37Z)
- LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis [3.4253416336476246]
This paper introduces layoutparser, an open-source library for streamlining the usage of deep learning (DL) models in document image analysis (DIA) research and applications.
layoutparser comes with a set of simple and intuitive interfaces for applying and customizing DL models for layout detection, character recognition, and many other document processing tasks.
We demonstrate that layoutparser is helpful for both lightweight and large-scale pipelines in real-world use cases.
arXiv Detail & Related papers (2021-03-29T05:55:08Z)
- Comparative Code Structure Analysis using Deep Learning for Performance Prediction [18.226950022938954]
This paper aims to assess the feasibility of using purely static information (e.g., abstract syntax tree or AST) of applications to predict performance change based on the change in code structure.
Our evaluations of several deep embedding learning methods demonstrate that tree-based Long Short-Term Memory (LSTM) models can leverage the hierarchical structure of source code to discover latent representations and achieve up to 84% (individual problem) and 73% (combined dataset with multiple problems) accuracy in predicting the change in performance.
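The purely static structural input such a model consumes can be illustrated with Python's built-in ast module. The histogram diff below is a deliberately simplified stand-in for the learned tree embeddings described in the paper; the example functions are invented for illustration:

```python
import ast
from collections import Counter

def ast_node_histogram(source):
    """Count AST node types -- a purely static structural feature of the
    code, extracted without running it. A Tree-LSTM would instead embed
    the tree itself, preserving its hierarchy."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))

# Two versions of the same (hypothetical) function:
before = "def f(xs):\n    return sum(xs)\n"
after = "def f(xs):\n    total = 0\n    for x in xs:\n        total += x\n    return total\n"

h_before = ast_node_histogram(before)
h_after = ast_node_histogram(after)

# A structural diff between the two versions -- the kind of signal a
# performance-change predictor could learn from:
changed = {k: h_after[k] - h_before[k]
           for k in set(h_before) | set(h_after)
           if h_after[k] != h_before[k]}
print(sorted(changed))
```

Here the rewrite introduces a For loop and an AugAssign where a single Call used to be; a learned model would map such structural changes to an expected performance delta.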
arXiv Detail & Related papers (2021-02-12T16:59:12Z)
- KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT).
All tasks in KILT are grounded in the same snapshot of Wikipedia.
We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
arXiv Detail & Related papers (2020-09-04T15:32:19Z)
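The "shared dense vector index" part of that baseline can be illustrated with a toy retriever. The normalized bag-of-words vectors below merely stand in for learned dense embeddings (a real KILT baseline pairs a neural encoder such as DPR with a seq2seq reader, neither of which this sketch implements):

```python
import math

def build_vocab(texts):
    """Fixed vocabulary over the document collection."""
    vocab = sorted({tok for t in texts for tok in t.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}

def embed(text, vocab):
    """Toy 'dense' vector: L2-normalized bag of words. A learned encoder
    would produce a low-dimensional semantic vector instead."""
    vec = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(query, docs, vocab, k=1):
    """Rank documents by inner product with the query vector."""
    q = embed(query, vocab)
    scored = sorted(docs,
                    key=lambda d: -sum(a * b for a, b in zip(q, embed(d, vocab))))
    return scored[:k]

docs = [
    "paris is the capital of france",
    "the eiffel tower was completed in 1889",
    "document image analysis of historical manuscripts",
]
vocab = build_vocab(docs)
print(retrieve("what is the capital of france", docs, vocab))
# → ['paris is the capital of france']
```

Because every task in KILT is grounded in the same Wikipedia snapshot, a single such index can serve all of them; the retrieved passage is then handed to a seq2seq model to generate the answer.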
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.