Improving accuracy and speeding up Document Image Classification through
parallel systems
- URL: http://arxiv.org/abs/2006.09141v1
- Date: Tue, 16 Jun 2020 13:36:07 GMT
- Title: Improving accuracy and speeding up Document Image Classification through
parallel systems
- Authors: Javier Ferrando and Juan Luis Dominguez and Jordi Torres and Raul
Garcia and David Garcia and Daniel Garrido and Jordi Cortada and Mateo Valero
- Abstract summary: We show in the RVL-CDIP dataset that we can improve previous results with a much lighter model.
We present an ensemble pipeline which is able to boost image-only input.
Lastly, we expose the training performance differences between the PyTorch and TensorFlow Deep Learning frameworks.
- Score: 4.102028235659611
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a study showing the benefits of the EfficientNet models
compared with heavier Convolutional Neural Networks (CNNs) in the Document
Classification task, an essential problem in the digitalization process of
institutions. We show on the RVL-CDIP dataset that we can improve previous
results with a much lighter model and present its transfer learning
capabilities on a smaller in-domain dataset such as Tobacco3482. Moreover, we
present an ensemble pipeline able to boost image-only input by combining
image model predictions with those generated by a BERT model on text
extracted by OCR. We also show that the batch size can be effectively
increased without hindering accuracy, so that training can be sped up by
parallelizing across multiple GPUs, decreasing the computational time needed.
Lastly, we expose the training performance differences between the PyTorch
and TensorFlow Deep Learning frameworks.
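The ensemble idea described above can be sketched as a simple late-fusion step: convert each model's logits into class probabilities with a softmax and average them. This is a minimal illustrative sketch, not the paper's exact pipeline; the logit values, the four document classes, and the equal weighting are assumptions for demonstration.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(image_logits, text_logits, image_weight=0.5):
    """Late fusion: weighted average of per-model class probabilities."""
    p_img = softmax(image_logits)
    p_txt = softmax(text_logits)
    w = image_weight
    return [w * a + (1 - w) * b for a, b in zip(p_img, p_txt)]

# Hypothetical logits over 4 document classes (e.g. letter, invoice, email, resume)
image_logits = [2.0, 0.5, 0.1, -1.0]  # from an image model such as EfficientNet
text_logits = [0.3, 2.5, 0.2, -0.5]   # from a text model such as BERT on OCR output

probs = ensemble_predict(image_logits, text_logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

In this toy case the text model's confidence in class 1 outweighs the image model's preference for class 0, so the fused prediction is class 1; the `image_weight` parameter could be tuned on a validation set.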
Related papers
- Make Prompts Adaptable: Bayesian Modeling for Vision-Language Prompt
Learning with Data-Dependent Prior [14.232144691524528]
Recent Vision-Language Pretrained models have become the backbone for many downstream tasks.
MLE training can lead the context vector to over-fit dominant image features in the training data.
This paper presents a Bayesian framework for prompt learning, which can alleviate overfitting issues in few-shot learning applications.
arXiv Detail & Related papers (2024-01-09T10:15:59Z)
- Deep Multi-Threshold Spiking-UNet for Image Processing [51.88730892920031]
This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture.
To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy.
Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart.
arXiv Detail & Related papers (2023-07-20T16:00:19Z)
- CoV-TI-Net: Transferred Initialization with Modified End Layer for COVID-19 Diagnosis [5.546855806629448]
Transfer learning is a relatively new learning method that has been employed in many sectors to achieve good performance with fewer computations.
In this research, the PyTorch pre-trained models (VGG19_bn and WideResNet-101) are applied to the MNIST dataset.
The proposed model is developed and verified in a Kaggle notebook, and it reached an outstanding accuracy of 99.77% without requiring substantial computational time.
arXiv Detail & Related papers (2022-09-20T08:52:52Z)
- On Robustness and Transferability of Convolutional Neural Networks [147.71743081671508]
Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts.
We study the interplay between out-of-distribution and transfer performance of modern image classification CNNs for the first time.
We find that increasing both the training set and model sizes significantly improves distributional shift robustness.
arXiv Detail & Related papers (2020-07-16T18:39:04Z)
- Learning to Learn Parameterized Classification Networks for Scalable Input Images [76.44375136492827]
Convolutional Neural Networks (CNNs) do not have a predictable recognition behavior with respect to changes in input resolution.
We employ meta learners to generate convolutional weights of main networks for various input scales.
We further utilize knowledge distillation on the fly over model predictions based on different input resolutions.
arXiv Detail & Related papers (2020-07-13T04:27:25Z)
- Low-Dose CT Image Denoising Using Parallel-Clone Networks [9.318613261995406]
We propose a parallel-clone neural network method that exploits the benefits of parallel input, parallel-output loss, and clone-to-clone feature transfer.
The proposed model keeps a similar or smaller number of unknown network weights compared to conventional models but can accelerate the learning process significantly.
arXiv Detail & Related papers (2020-05-14T05:21:33Z)
- Multi-task pre-training of deep neural networks for digital pathology [8.74883469030132]
We first assemble and transform many digital pathology datasets into a pool of 22 classification tasks and almost 900k images.
We show that our models used as feature extractors either improve significantly over ImageNet pre-trained models or provide comparable performance.
arXiv Detail & Related papers (2020-05-05T08:50:17Z)
- Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
- Radon cumulative distribution transform subspace modeling for image classification [18.709734704950804]
We present a new supervised image classification method applicable to a broad class of image deformation models.
The method makes use of the previously described Radon Cumulative Distribution Transform (R-CDT) for image data.
In addition to the test accuracy performances, we show improvements in terms of computational efficiency.
arXiv Detail & Related papers (2020-04-07T19:47:26Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.