DWT-CompCNN: Deep Image Classification Network for High Throughput JPEG
2000 Compressed Documents
- URL: http://arxiv.org/abs/2306.01359v2
- Date: Sat, 15 Jul 2023 04:09:31 GMT
- Title: DWT-CompCNN: Deep Image Classification Network for High Throughput JPEG
2000 Compressed Documents
- Authors: Tejasvee Bisen, Mohammed Javed, Shashank Kirtania, P. Nagabhushan
- Abstract summary: DWT-CompCNN is proposed for classification of documents that are compressed using the High Throughput JPEG 2000 (HTJ2K) algorithm.
The proposed model is time and space efficient, and also achieves better classification accuracy in the compressed domain.
- Score: 0.9405458160620535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For any digital application that handles document images, such as
retrieval, the classification of document images is an essential stage.
Conventionally, the full, uncompressed versions of the documents form the input
dataset, which is a burden because of the large volume needed to store them.
Therefore, it would be novel if the same classification task could be
accomplished directly on the compressed representation of the documents (with
some partial decompression), making the whole process computationally more
efficient. In this research work, a novel deep learning model, DWT-CompCNN, is
proposed for classification of documents that are compressed using the High
Throughput JPEG 2000 (HTJ2K) algorithm. The proposed DWT-CompCNN comprises five
convolutional layers with filter sizes of 16, 32, 64, 128, and 256, increasing
consecutively with each layer, to improve learning from the wavelet
coefficients extracted from the compressed images. Experiments are performed on
two benchmark datasets, Tobacco-3482 and RVL-CDIP, which demonstrate that the
proposed model is time and space efficient and also achieves better
classification accuracy in the compressed domain.
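The two ingredients named in the abstract, wavelet coefficients as input and a five-layer convolutional stack, can be illustrated with a minimal sketch. It is not the authors' implementation. The Haar transform stands in for the JPEG 2000 wavelet filters, and the 3x3 kernel size and 4-channel input (one channel per subband) are assumptions. The abstract's "filter sizes" are read here as the number of filters per layer, and the function names are hypothetical.

```python
# Illustrative sketch only, not the paper's code.
# Assumptions: Haar wavelet (stand-in for the JPEG 2000 filters),
# 3x3 kernels, and a 4-channel input built from the four subbands.

FILTERS = [16, 32, 64, 128, 256]  # per-layer values quoted in the abstract


def haar_dwt2(image):
    """One-level 2D Haar transform of a 2D list with even height and width.

    Returns four subbands, each half the input size in both dimensions:
    approximation, column difference, row difference, diagonal difference.
    """
    approx, col_diff, row_diff, diag = [], [], [], []
    for r in range(0, len(image), 2):
        a_row, c_row, r_row, d_row = [], [], [], []
        for c in range(0, len(image[0]), 2):
            p, q = image[r][c], image[r][c + 1]
            s, t = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((p + q + s + t) / 4)  # local average
            c_row.append((p - q + s - t) / 4)  # left column minus right column
            r_row.append((p + q - s - t) / 4)  # top row minus bottom row
            d_row.append((p - q - s + t) / 4)  # diagonal contrast
        approx.append(a_row)
        col_diff.append(c_row)
        row_diff.append(r_row)
        diag.append(d_row)
    return approx, col_diff, row_diff, diag


def conv_param_counts(in_channels, filters, kernel=3):
    """Weights plus biases for each layer of a plain stack of conv layers."""
    counts = []
    for out_channels in filters:
        counts.append(kernel * kernel * in_channels * out_channels + out_channels)
        in_channels = out_channels  # next layer consumes this layer's output
    return counts


if __name__ == "__main__":
    subbands = haar_dwt2([[1, 2], [3, 4]])
    print(subbands)
    print(conv_param_counts(in_channels=4, filters=FILTERS))
```

Under these assumptions the parameter count grows roughly fourfold from one layer to the next, so most of the convolutional weights sit in the last layer.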
Related papers
- Unifying Multimodal Retrieval via Document Screenshot Embedding [92.03571344075607]
Document Screenshot Embedding (DSE) is a novel retrieval paradigm that regards document screenshots as a unified input format.
We first craft Wiki-SS, a dataset of 1.3M Wikipedia web page screenshots, as the corpus to answer the questions from the Natural Questions dataset.
In such a text-intensive document retrieval setting, DSE shows competitive effectiveness compared to other text retrieval methods relying on parsing.
arXiv Detail & Related papers (2024-06-17T06:27:35Z)
- A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical Document Image Enhancement [13.27528507177775]
We propose T2T-BinFormer, which is a novel document binarization encoder-decoder architecture based on a Tokens-to-token vision transformer.
Experiments on various DIBCO and H-DIBCO benchmarks demonstrate that the proposed model outperforms the existing CNN and ViT-based state-of-the-art methods.
arXiv Detail & Related papers (2023-12-06T23:01:11Z)
- Deep Unrestricted Document Image Rectification [110.61517455253308]
We present DocTr++, a novel unified framework for document image rectification.
We upgrade the original architecture by adopting a hierarchical encoder-decoder structure for multi-scale representation extraction and parsing.
We contribute a real-world test set and metrics applicable for evaluating the rectification quality.
arXiv Detail & Related papers (2023-04-18T08:00:54Z)
- Data-Efficient Sequence-Based Visual Place Recognition with Highly Compressed JPEG Images [17.847661026367767]
Visual Place Recognition (VPR) is a fundamental task that allows a robotic platform to successfully localise itself in the environment.
JPEG is an image compression standard that can employ high compression ratios to facilitate lower data transmission for VPR applications.
When applying high levels of JPEG compression, both the image clarity and size are drastically reduced.
arXiv Detail & Related papers (2023-02-26T13:13:51Z)
- Deep Selector-JPEG: Adaptive JPEG Image Compression for Computer Vision in Image classification with Human Vision Criteria [8.615661848178183]
This paper presents Deep Selector-HV, an adaptive JPEG compression method that targets image classification.
Deep Selector-HV adaptively selects a Quality Factor (QF) to compress the image so that a good trade-off between the Compression Ratio (CR) and classifier accuracy can be achieved.
arXiv Detail & Related papers (2023-02-19T12:38:20Z)
- Document Image Binarization in JPEG Compressed Domain using Dual Discriminator Generative Adversarial Networks [0.0]
The proposed model has been thoroughly tested with different versions of DIBCO dataset having challenges like holes, erased or smudged ink, dust, and misplaced fibres.
The model proved to be highly robust, efficient both in terms of time and space complexities, and also resulted in state-of-the-art performance in JPEG compressed domain.
arXiv Detail & Related papers (2022-09-13T12:07:32Z)
- Pattern Spotting and Image Retrieval in Historical Documents using Deep Hashing [60.67014034968582]
This paper presents a deep learning approach for image retrieval and pattern spotting in digital collections of historical documents.
Deep learning models are used for feature extraction, considering two distinct variants, which provide either real-valued or binary code representations.
The proposed approach also reduces the search time by up to 200x and the storage cost up to 6,000x when compared to related works.
arXiv Detail & Related papers (2022-08-04T01:39:37Z)
- DocScanner: Robust Document Image Rectification with Progressive Learning [162.03694280524084]
This work presents DocScanner, a new deep network architecture for document image rectification.
DocScanner maintains a single estimate of the rectified image, which is progressively corrected with a recurrent architecture.
The iterative refinements make DocScanner converge to a robust and superior performance, and the lightweight recurrent architecture ensures the running efficiency.
arXiv Detail & Related papers (2021-10-28T09:15:02Z)
- Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform [58.60004238261117]
We propose a versatile deep image compression network based on Spatial Feature Transform (SFT, arXiv:1804.02815).
Our model covers a wide range of compression rates using a single model, which is controlled by arbitrary pixel-wise quality maps.
The proposed framework allows us to perform task-aware image compressions for various tasks.
arXiv Detail & Related papers (2021-08-21T17:30:06Z)
- Learning to Improve Image Compression without Changing the Standard Decoder [100.32492297717056]
We propose learning to improve the encoding performance with the standard decoder.
Specifically, a frequency-domain pre-editing method is proposed to optimize the distribution of DCT coefficients.
We do not modify the JPEG decoder and therefore our approach is applicable when viewing images with the widely used standard JPEG decoder.
arXiv Detail & Related papers (2020-09-27T19:24:42Z)
- Remote Sensing Image Scene Classification with Deep Neural Networks in JPEG 2000 Compressed Domain [8.296684637620553]
Existing scene classification approaches using deep neural networks (DNNs) require the images to be fully decompressed.
We propose a novel approach to achieve scene classification in JPEG 2000 compressed RS images.
arXiv Detail & Related papers (2020-06-20T09:13:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.