On the Accuracy of CRNNs for Line-Based OCR: A Multi-Parameter
Evaluation
- URL: http://arxiv.org/abs/2008.02777v1
- Date: Thu, 6 Aug 2020 17:20:56 GMT
- Title: On the Accuracy of CRNNs for Line-Based OCR: A Multi-Parameter
Evaluation
- Authors: Bernhard Liebl, Manuel Burghardt
- Abstract summary: We train a high quality optical character recognition (OCR) model for difficult historical typefaces on degraded paper.
We are able to obtain a 0.44% character error rate (CER) model from only 10,000 lines of training data.
We show ablations for all components of our training pipeline, which relies on the open source framework Calamari.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate how to train a high quality optical character recognition
(OCR) model for difficult historical typefaces on degraded paper. Through
extensive grid searches, we obtain a neural network architecture and a set of
optimal data augmentation settings. We discuss the influence of factors such as
binarization, input line height, network width, network depth, and other
network training parameters such as dropout. Implementing these findings into a
practical model, we are able to obtain a 0.44% character error rate (CER) model
from only 10,000 lines of training data, outperforming currently available
pretrained models that were trained on more than 20 times the amount of data.
We show ablations for all components of our training pipeline, which relies on
the open source framework Calamari.
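
For reference, the character error rate (CER) reported above is the Levenshtein edit distance between the recognized line and the ground-truth line, normalized by the length of the ground truth. A minimal sketch in Python (illustrative only, not part of Calamari's API):

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance between strings a and b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(predicted: str, ground_truth: str) -> float:
    # Character error rate: edit distance normalized by reference length.
    return levenshtein(predicted, ground_truth) / len(ground_truth)

# One substitution in a 19-character reference gives a CER of about 5.3%;
# the paper's 0.44% CER corresponds to roughly one error per 227 characters.
print(cer("Histori<al typeface", "Historical typeface"))
```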
Related papers
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- TEN-GUARD: Tensor Decomposition for Backdoor Attack Detection in Deep Neural Networks [3.489779105594534]
We introduce a novel approach to backdoor detection using two tensor decomposition methods applied to network activations.
This has a number of advantages relative to existing detection methods, including the ability to analyze multiple models at the same time.
Results show that our method detects backdoored networks more accurately and efficiently than current state-of-the-art methods.
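
As a rough illustration of the general idea (a sketch with assumed shapes and a made-up outlier test, not the paper's actual procedure), activations collected from many candidate models can be stacked into a tensor, decomposed, and screened for outliers:

```python
import numpy as np

# Hypothetical setup: one activation matrix per candidate model, stacked into
# a (n_models, n_inputs, n_features) tensor; the data here is synthetic.
rng = np.random.default_rng(0)
acts = rng.normal(size=(20, 64, 128))
acts[7] += 3.0  # plant a strongly deviating model to play the "backdoored" one

# Unfold along the model mode and take a low-rank factorization (plain SVD
# here; the paper uses dedicated tensor decomposition methods).
unfolded = acts.reshape(acts.shape[0], -1)
U, S, _ = np.linalg.svd(unfolded - unfolded.mean(axis=0), full_matrices=False)
embedding = U[:, :2] * S[:2]  # low-dimensional embedding of each model

# Crude outlier test on the embeddings; model 7 should stand out.
dist = np.linalg.norm(embedding - np.median(embedding, axis=0), axis=1)
print("suspicious models:", np.where(dist > dist.mean() + 2 * dist.std())[0])
```

Note how all models are screened in one pass over the stacked tensor, which is the advantage the summary points to over per-model detection.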
arXiv Detail & Related papers (2024-01-06T03:08:28Z)
- Identifying and Mitigating Model Failures through Few-shot CLIP-aided Diffusion Generation [65.268245109828]
We propose an end-to-end framework to generate text descriptions of failure modes associated with spurious correlations.
These descriptions can be used to generate synthetic data using generative models, such as diffusion models.
Our experiments have shown remarkable improvements in accuracy (~21%) on hard sub-populations.
arXiv Detail & Related papers (2023-12-09T04:43:49Z)
- Optimizing the Neural Network Training for OCR Error Correction of Historical Hebrew Texts [0.934612743192798]
This paper proposes an innovative method for training a lightweight neural network for Hebrew OCR post-correction using significantly less manually created data.
An analysis of historical OCRed newspapers was conducted to identify common language and corpus-specific OCR errors.
arXiv Detail & Related papers (2023-07-30T12:59:06Z)
- Training Deep Surrogate Models with Large Scale Online Learning [48.7576911714538]
Deep learning algorithms have emerged as a viable alternative for obtaining fast solutions for PDEs.
Models are usually trained on synthetic data generated by solvers, stored on disk and read back for training.
This paper proposes an open-source online training framework for deep surrogate models.
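
The offline/online contrast can be sketched as follows; the toy solver and linear surrogate below are stand-ins, not the paper's framework:

```python
import numpy as np

def solver_step(rng):
    # Stand-in for an expensive PDE solver emitting one (input, solution) pair.
    x = rng.normal(size=8)
    return x, float(np.sin(x).sum())

# Online training: consume samples as the solver produces them, rather than
# writing them to disk and reading them back in a separate offline phase.
rng = np.random.default_rng(42)
w = np.zeros(8)            # weights of a toy linear surrogate model
lr = 1e-2
for step in range(1000):
    x, y = solver_step(rng)          # fresh sample streamed from the solver
    grad = 2 * (w @ x - y) * x       # gradient of the squared error
    w -= lr * grad
```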
arXiv Detail & Related papers (2023-06-28T12:02:27Z)
- Retrieval-Enhanced Contrastive Vision-Text Models [61.783728119255365]
We propose to equip vision-text models with the ability to refine their embedding with cross-modal retrieved information from a memory at inference time.
Remarkably, we show that this can be done with a lightweight, single-layer fusion transformer on top of a frozen CLIP.
Our experiments validate that our retrieval-enhanced contrastive (RECO) training improves CLIP performance substantially on several challenging fine-grained tasks.
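
A rough PyTorch sketch of the retrieval-plus-fusion idea; the random memory bank and dimensions are placeholders, and the fusion layer below merely stands in for the paper's trained component on top of a frozen CLIP:

```python
import torch
import torch.nn.functional as F

d, mem_size, k = 512, 10000, 8

# Stand-ins for a frozen CLIP embedding of the query and a retrieval memory.
query = F.normalize(torch.randn(1, d), dim=-1)
memory = F.normalize(torch.randn(mem_size, d), dim=-1)

# Retrieve the k most similar memory entries by cosine similarity.
scores = query @ memory.T                              # (1, mem_size)
retrieved = memory[scores.topk(k, dim=-1).indices[0]]  # (k, d)

# Fuse the query with the retrieved entries using a single transformer layer;
# the CLIP backbone stays frozen, so only this layer would be trained.
fusion = torch.nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True)
tokens = torch.cat([query.unsqueeze(1), retrieved.unsqueeze(0)], dim=1)
refined = F.normalize(fusion(tokens)[:, 0], dim=-1)    # refined query embedding
```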
arXiv Detail & Related papers (2023-06-12T15:52:02Z)
- Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes [100.69714600180895]
Offline Q-learning algorithms exhibit strong performance that scales with model capacity.
We train a single policy on 40 games with near-human performance using networks of up to 80 million parameters.
Compared to return-conditioned supervised approaches, offline Q-learning scales similarly with model capacity and has better performance, especially when the dataset is suboptimal.
arXiv Detail & Related papers (2022-11-28T08:56:42Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Efficient deep learning models for land cover image classification [0.29748898344267777]
This work experiments with the BigEarthNet dataset for land use land cover (LULC) image classification.
We benchmark different state-of-the-art models, including Convolutional Neural Networks, Multi-Layer Perceptrons, Vision Transformers, EfficientNets and Wide Residual Networks (WRN).
Our proposed lightweight model has an order of magnitude fewer trainable parameters, achieves a 4.5% higher averaged f-score across all 19 LULC classes, and trains two times faster than the state-of-the-art ResNet50 model that we use as a baseline.
arXiv Detail & Related papers (2021-11-18T00:03:14Z)
- Parameter Prediction for Unseen Deep Architectures [23.79630072083828]
We study whether deep learning can directly predict a network's parameters by exploiting past knowledge from training other networks.
We propose a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU.
The proposed model achieves surprisingly good performance on unseen and diverse networks.
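
Schematically, a hypernetwork maps a description of the target architecture to a flat parameter vector in one forward pass; the toy descriptor and MLP hypernetwork below are illustrative stand-ins (the paper works from a richer representation of the architecture):

```python
import torch
import torch.nn as nn

# Target architecture whose parameters we want to predict, not train.
target = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
n_params = sum(p.numel() for p in target.parameters())

# Toy hypernetwork: architecture descriptor in, all target parameters out.
desc = torch.tensor([[16.0, 32.0, 4.0]])   # crude descriptor: the layer sizes
hyper = nn.Sequential(nn.Linear(3, 256), nn.ReLU(), nn.Linear(256, n_params))

# Single forward pass, then slice the flat vector into the target's tensors.
flat = hyper(desc)[0].detach()
offset = 0
with torch.no_grad():
    for p in target.parameters():
        p.copy_(flat[offset:offset + p.numel()].view_as(p))
        offset += p.numel()
```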
arXiv Detail & Related papers (2021-10-25T16:52:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.